Seagate, APM, pfSense and a dreadful Start Stop Count rise slowly killing my hard drive

One day, tinkering with my pfSense box, I got quite annoyed at how long it takes to boot up to the user interface. OK, I had an old 2.5in 320GB HDD from Hitachi; it was bulletproof TBH, but sooo slow.

Not long ago, I came across one of the 2.5in, 500GB Seagate FireCuda SSHD hybrids – the ST500LX025. A fast 8GB SSD inside a normal HDD, up to 140MB/s transfer, SATA III. In one word: NICE! Normally, I would not use a full SSD in something like pfSense, due to wear and tear, but… the idea that the internal SSD is only utilised for the most often used files… it appealed to me very quickly. Restart the system a few times and the boot files should end up on the internal SSD. Let’s see.

THE PROBLEM

Installation was very straightforward, as usual. About 800 hours later, I had a peek at the S.M.A.R.T. details and I was gutted. Not again. All that singing and dancing about “green drives” and “save the environment!” usually ends up as “Penny wise – dollar stupid!”. Why? Let me ramble a bit: this is about the third time I have had to resort to digging around the Internet to find out how to disable something I don’t really need, and it was quite difficult to find info about it. OK, I’ve saved a few watts of electricity by having all those fancy features on, but wasted much more looking for a way to disable them. Let me show you the SMART attributes after some 886 working hours.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   067   064   006    Pre-fail  Always       -       5411719
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   084   084   020    Old_age   Always       -       17116
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   045    Pre-fail  Always       -       13991423
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       886 (200 24 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       96
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       3
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   049   040    Old_age   Always       -       47 (Min/Max 25/51)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17171
194 Temperature_Celsius     0x0022   047   051   000    Old_age   Always       -       47 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       517 (152 84 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       233859569
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       15692817
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged
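If you don’t fancy reading the whole table every time, the raw values can be pulled out with a few lines of Python. This is only a rough sketch: it assumes the attribute table is pasted in exactly as printed above, and the function name parse_smart_attributes is my own, not part of any tool.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from a SMART attribute table."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        # The raw value is the 10th column; anything after it, such as
        # "(200 24 0)", is extra vendor detail and is ignored here.
        raw = fields[9]
        if raw.isdigit():
            attributes[name] = int(raw)
    return attributes


# Three rows copied from the table above (assumed input format).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   084   084   020    Old_age   Always       -       17116
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       886 (200 24 0)
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17171
"""

smart = parse_smart_attributes(sample)
for name in ("Start_Stop_Count", "Load_Cycle_Count", "Power_On_Hours"):
    print(name, smart[name])
```

The three attributes printed at the end are the ones that matter for the head parking problem.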

Just don’t look at Raw_Read_Error_Rate and Seek_Error_Rate; it is the same rubbish I have seen since my first Seagate drives from the 7200.11 series. Somehow they always give me weird numbers that don’t relate to any actual valid information, so with Seagate, I just skip them. What you should be concerned about is Start_Stop_Count and Load_Cycle_Count, with BOTH at over 17000 counts!!! In 886 hours??!! Let us do some mathematics:

886h (working time) / 24h = 36.9 days
17116 / 36.9 days = 464 Start Stop cycles per day
17171 / 36.9 days = 465 Load Cycles per day

These drives are rated for at most 600,000 of those cycles, so:

600,000 / 465 = 1290 days
1290 / 365 days per year = 3.5 years

Basically… the warranty lasts 5 years… but a drive racking up cycles at this rate may not.
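The same back-of-the-envelope maths can be written as a small Python sketch, handy if you want to plug in the numbers from your own drive. The function names are made up for illustration, the 600,000 figure is the rating quoted above, and the projection assumes the box runs 24/7 at a constant rate.

```python
def cycles_per_day(count, power_on_hours):
    """Average number of cycles per 24 hours of powered-on time."""
    return count / (power_on_hours / 24)


def years_to_limit(count, power_on_hours, rated_cycles):
    """Years of 24/7 operation needed to reach rated_cycles at the current rate."""
    days = rated_cycles / cycles_per_day(count, power_on_hours)
    return days / 365


POWER_ON_HOURS = 886     # attribute 9 in the table above
START_STOP = 17116       # attribute 4
LOAD_CYCLES = 17171      # attribute 193
RATED_CYCLES = 600_000   # rating quoted in the text above

print(f"Start/stop cycles per day: {cycles_per_day(START_STOP, POWER_ON_HOURS):.0f}")
print(f"Load cycles per day:       {cycles_per_day(LOAD_CYCLES, POWER_ON_HOURS):.0f}")
print(f"Years until rated limit:   {years_to_limit(LOAD_CYCLES, POWER_ON_HOURS, RATED_CYCLES):.1f}")
```

Like the hand calculation, this ignores the cycles already used up, so the real remaining time is slightly shorter.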

THE SOLUTION

Having already had problems with way too much head parking on NAS4Free, I knew the solution was exactly the same: TURNING THIS BLOODY THING OFF!

New hard drives have SMART and the ability to turn some features on and off, so run a quick command in pfSense’s Diagnostics/Command Prompt:

ataidle /dev/ada0



Model: ST500LX025-1U717D
Serial: ********
Firmware Rev: SDM1
ATA revision: ATA-10
LBA 48: yes
Geometry: 16383 cyls, 16 heads, 63 spt
Capacity: 465GB
SMART Supported: yes
SMART Enabled: yes
Write Cache Supported: yes
Write Cache Enabled: yes
APM Supported: yes
APM Enabled: yes
AAM Supported: no

What we are looking for is APM Supported: yes and APM Enabled: yes. This indicates that power management is available and is ON, so the next thing is to turn it OFF and then check the status again:

ataidle -P 0 /dev/ada0
ataidle /dev/ada0

...
APM Supported: yes
APM Enabled: no
...
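If you would rather not eyeball the output each time, the two APM lines can be checked with a short Python sketch. It assumes the “Key: value” layout that ataidle printed above; the helper name apm_state is hypothetical.

```python
def apm_state(ataidle_output):
    """Return (supported, enabled) as booleans, or None where a line is missing."""
    fields = {}
    for line in ataidle_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only keep lines that look like "Key: value"
            fields[key.strip()] = value.strip()

    def flag(name):
        value = fields.get(name)
        return None if value is None else value == "yes"

    return flag("APM Supported"), flag("APM Enabled")


# Output fragments as shown above, before and after disabling APM.
before = "Write Cache Enabled: yes\nAPM Supported: yes\nAPM Enabled: yes\nAAM Supported: no"
after = "...\nAPM Supported: yes\nAPM Enabled: no\n..."

print("before:", apm_state(before))
print("after: ", apm_state(after))
```

A result of supported but not enabled is what we are after.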

Now… check the SMART attributes and you should see that those counts no longer change as often as they did before. The setting should also survive a reboot of the machine; at least it has on mine so far.

Three days later, I went back to the SMART info and saw that the drive was parking its heads again. It turned out I had left pfSense’s System/Advanced/Miscellaneous/Hard Drive Standby at “Standby 36”, which forced the HDD back into APM mode. Leave this option at “ALWAYS ON”. The next thing is that all of this only turns off Advanced Power Management; the “old/normal” power management is still on, and that implies standby timers, where the device goes into normal standby mode as per the old ATA/SATA standards. We can take care of those with this command:

camcontrol standby ada0 -t 3600

This sets the standby timer to 3600 seconds = 1 hour of inactivity.

So, that’s it. It is off, and in machines like NAS4Free or pfSense it should stay off, as they do loads of small writes where the magnetic head stays busy for a short period of time, waking up very often and racking up those cycles. The only problem is that not every HDD lets you turn this off this way; luckily this one does. The last thing is to check that everything is going to plan by issuing the command:

camcontrol identify ada0



pass0: <ST500LX025-1U717D SDM1> ACS-3 ATA SATA 3.x device
pass0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-10 SATA 3.x
device model          ST500LX025-1U717D
firmware revision     SDM1
serial number         ********
WWN                   ********
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       976773168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             5400

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes	yes
write cache                    yes	yes
flush cache                    yes	yes
overlap                        no
Tagged Command Queuing (TCQ)   no	no
Native Command Queuing (NCQ)   yes		                32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes	yes
microcode download             yes	yes
security                       yes	no
power management               yes	yes
advanced power management      yes      no      0/0x00
automatic acoustic management  no	no
media status notification      no	no
power-up in Standby            yes	no
write-read-verify              yes	no	0/0x0
unload                         yes	yes
general purpose logging        yes	yes
free-fall                      no	no
Data Set Management (DSM/TRIM) no
Host Protected Area (HPA)      yes      no      976773168/976773168
HPA - Security                 no

 

UPDATE:

Had a peek at the SMART values and “Houston, we have NO problems” anymore. Both counts in question have increased by about 2 over the past 12h. I can live with that… 😉
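To put that in perspective, here is a rough Python comparison of the daily rate before and after the change. The helper name daily_rate is made up, and the “after” figure rests on a single 12-hour observation, so treat it as a ballpark only.

```python
def daily_rate(count_delta, hours_elapsed):
    """Scale a counter increase over some hours to a 24-hour rate."""
    return count_delta * 24 / hours_elapsed


rate_before = daily_rate(17171, 886)  # Load_Cycle_Count over the first 886 hours
rate_after = daily_rate(2, 12)        # roughly +2 over the past 12 hours

print(f"before: {rate_before:.0f} cycles per day")
print(f"after:  {rate_after:.0f} cycles per day")
```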
