SSD / Hard Disk / S.M.A.R.T.

Benefits

Many, if not all, computer users experience data loss due to hard drive failure at some point. If you're lucky, you will notice error messages while accessing your files and will be able to save some of your data to another drive before the failing one stops working completely.

Unfortunately, once a mechanical failure prevents you from accessing even some of your files, you have very little time left to react and save the most precious data that you have not yet backed up to another physical medium.

Fortunately, almost all modern hard drives support a technology called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). Studies (1) have shown that up to 70 percent of all hard disk failures can be predicted early enough for the user to save all important data before the disk fails completely. The most extensive of these studies was done by Google, which monitored 100,000 hard disk drives from all major manufacturers over a period of nine months.

Because the Windows operating systems do not inform the user of the S.M.A.R.T. status of the system's disks, additional software is required if you want to benefit from the failure prediction capabilities of your drives.

Argus Monitor runs as a background task and constantly monitors the health status of your hard disks. You will be informed whenever one of the critical S.M.A.R.T. values changes. If one of the attributes reaches a vendor-specified threshold, you will also get a warning that the hard drive is likely to fail within the next 24 hours of operation. Argus Monitor also stores the S.M.A.R.T. status of every drive (even the ones attached only from time to time - e.g. via S-ATA) and will therefore be able to warn you of decreased drive health even if the hard drive was attached to another PC when the value decreased.

Warning about a likely hard drive failure because of decreased critical S.M.A.R.T. attributes.

S.M.A.R.T. monitoring

Overview

The purpose of monitoring S.M.A.R.T. attributes is to give system administrators and users the information they need to predict an imminent hard drive failure in time to back up critical data and replace the drive.

Mechanical failure is responsible for approx. 60 percent of all hard drive failures (1). Even though a total failure of a hard drive could mean the loss of valuable data, most of those events can be predicted in advance with a reliability of up to 70 percent.

Argus Monitor checks the S.M.A.R.T. attributes of your hard drives at user-defined intervals (by default, once every 5 minutes). Monitoring is based on the values of the so-called 'critical S.M.A.R.T. attributes'.

The monitoring process is subdivided into the following three distinct categories:

  • Error (standard hard drive failure warnings)
  • Caution (extended hard drive failure warnings)
  • Information (additional information for expert users)

Each of these three warning categories can be configured individually using the extended S.M.A.R.T. warning configuration options, available via the Argus Monitor settings dialog and described in the section about configuration.
The following image gives an overview of the S.M.A.R.T. monitoring options available to the user of Argus Monitor, providing the best hard drive failure prediction possible:

S.M.A.R.T. warnings
Argus Monitor S.M.A.R.T. warnings overview

Category Failure

A S.M.A.R.T. error in the context of the standard failure prediction means that a critical S.M.A.R.T. attribute has reached its vendor specific threshold. Values and thresholds are both normalized.
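As a rough illustration of this check, the following Python sketch flags an attribute once its normalized value has dropped to or below its threshold. It assumes the usual ATA convention that normalized values fall toward the threshold as health degrades and that a threshold of 0 means "no threshold defined". The `SmartAttribute` type and `has_reached_threshold` function are hypothetical names for this example and are not part of Argus Monitor.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int    # attribute number, e.g. 5
    name: str       # attribute name
    value: int      # normalized current value
    threshold: int  # vendor-specific normalized threshold


def has_reached_threshold(attr: SmartAttribute) -> bool:
    """Return True if the normalized value is at or below its threshold.

    Assumption: a threshold of 0 means the vendor defined no threshold.
    """
    return attr.threshold > 0 and attr.value <= attr.threshold


# Illustrative readings, not taken from a real drive.
attributes = [
    SmartAttribute(5, "Reallocated Sector Count", value=140, threshold=140),
    SmartAttribute(197, "Current Pending Sector Count", value=200, threshold=0),
]
failing = [a.attr_id for a in attributes if has_reached_threshold(a)]
print(failing)  # [5]
```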

In the image “Example showing all three available S.M.A.R.T. events” the critical attribute 5 “Reallocated Sector Count” has reached its threshold of 140. According to the S.M.A.R.T. specifications, this hard drive is either failing right now or about to fail within the next 24 hours.

S.M.A.R.T. events
Example showing all three available S.M.A.R.T. events

Category Caution

The extended hard drive failure prediction monitors raw data values of certain critical S.M.A.R.T. attributes. Those values are vendor-specific and not normalized. Studies done by several online data storage providers (e.g. Google (1) and Backblaze (2)) show that heuristic algorithms based on raw data values of the critical attributes 5, 187, 196, 197 and 198 can predict hard drive failures in advance much more reliably than the standard threshold check.
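The heuristic actually used by Argus Monitor is not described here. As a minimal sketch of the general idea, the Python example below assumes the simplest possible rule: any non-zero raw value in one of the five attributes named above counts as a 'Caution'. The function name and the rule itself are assumptions made for illustration.

```python
from typing import Dict, List

# Attribute IDs named in the text as the basis for the heuristic.
WATCHED_ATTRIBUTES = {5, 187, 196, 197, 198}


def caution_attributes(raw_values: Dict[int, int]) -> List[int]:
    """Return the watched attribute IDs whose raw value is non-zero.

    Assumed rule for illustration only; a real heuristic may be stricter.
    """
    return sorted(
        attr_id
        for attr_id, raw in raw_values.items()
        if attr_id in WATCHED_ATTRIBUTES and raw > 0
    )


# Illustrative raw readings: attribute 9 is ignored, 196 is non-zero.
sample = {5: 0, 9: 12345, 187: 0, 196: 3, 197: 0, 198: 0}
print(caution_attributes(sample))  # [196]
```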

As a result of the pre-failure heuristic algorithm, the attribute 196 “Reallocated Event Count” in the above image is marked with the word 'Caution'. This in itself is NOT a hard drive failure. Nevertheless, the user should pay closer attention to this hard drive and is advised to back up all important data stored on it. Further information about this issue can be found at the end of this page.

Category Information

Changes in the values of critical S.M.A.R.T. attributes fall into this category. An example of such an event is shown in the image: the value of attribute 198 “Offline Uncorrectable” has changed and therefore the word 'Change' is displayed in the table of the attributes.

This category is intended for expert users only. A mere change of a critical S.M.A.R.T. attribute is not considered a hard drive failure as long as the vendor specific threshold has not been reached (see above). This event is only a hint to show that there was a change in one of the values.
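A change event of this kind amounts to comparing two consecutive readings. The Python sketch below shows one way to do that. Whether Argus Monitor compares raw or normalized values is not stated here, so the example simply compares whatever number was recorded. The `watched` parameter is a hypothetical stand-in for selecting which attributes are taken into consideration.

```python
from typing import Dict, Iterable, List, Optional


def changed_attributes(
    previous: Dict[int, int],
    current: Dict[int, int],
    watched: Optional[Iterable[int]] = None,
) -> List[int]:
    """Return the attribute IDs whose value differs between two readings.

    `watched` limits the check to selected attributes (None means all).
    """
    selected = set(current) if watched is None else set(watched)
    return sorted(
        attr_id
        for attr_id in selected
        if attr_id in previous
        and attr_id in current
        and previous[attr_id] != current[attr_id]
    )


# Illustrative readings taken at two consecutive checks.
earlier = {3: 4100, 5: 0, 198: 0}
later = {3: 4250, 5: 0, 198: 1}
print(changed_attributes(earlier, later))                    # [3, 198]
print(changed_attributes(earlier, later, watched={5, 198}))  # [198]
```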

For some of the critical values, such as attribute 3 “Spin Up Time”, this happens regularly and can be considered normal. For those attributes, even expert users should normally turn this warning off. For an SSD with an attribute like “SSD Life Left”, this mechanism can be used to observe how quickly flash writes cause the value to decrease. This might be useful to estimate the remaining lifetime of the SSD according to the vendor specifications.
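Such a lifetime estimate can be sketched as a simple linear extrapolation between two observations. The Python example below assumes that the attribute counts down toward 0 and that the wear rate stays constant; both are simplifications, and vendors define these attributes differently. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def estimate_remaining_days(
    first_seen: datetime,
    first_value: int,
    last_seen: datetime,
    last_value: int,
) -> Optional[float]:
    """Linearly extrapolate how many days remain until the value reaches 0.

    Returns None if no time has passed or the value has not decreased.
    """
    elapsed_days = (last_seen - first_seen).total_seconds() / 86400
    dropped = first_value - last_value
    if elapsed_days <= 0 or dropped <= 0:
        return None
    # remaining value divided by the observed decrease per day
    return last_value * elapsed_days / dropped


# Illustrative observation: the value fell from 100 to 97 within 90 days.
start = datetime(2024, 1, 1)
end = start + timedelta(days=90)
print(estimate_remaining_days(start, 100, end, 97))
```

With these illustrative numbers the estimate comes out at roughly 2910 days. Because it depends entirely on the write load staying the same, it is best read as an order of magnitude.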

Configuration

The options for the extended S.M.A.R.T. configuration can be found under Settings/S.M.A.R.T./Configuration. This will open a new dialog where all the S.M.A.R.T. warning configuration options can be defined.

The default is to configure all hard drives using the same settings. If you prefer, you can also configure every hard drive separately (in the case of removable drives, the drive you want to configure has to be present in the system when you open the dialog, but the configuration is saved and applied even after you remove or reattach the drive later). Whether to configure all drives independently or to use one configuration for all drives can be selected in the top right of the configuration dialog.

S.M.A.R.T. configuration
Configuration window for S.M.A.R.T. warnings

For each of the three available categories, you can specify whether warnings are enabled or disabled in general, and for the categories 'Caution' and 'Information' you can select which attributes should be taken into consideration. During installation, Argus Monitor will pre-configure the warnings based on our recommendations. If you have changed some values and want to revert to this default setting, the 'Default' button in the configuration dialog does that.

In the lower part of the configuration dialog, you can specify the action to take once one of the configured S.M.A.R.T. checks issues a warning. You can select a message box that will remain on the desktop until you dismiss it by clicking OK. Additionally, there is a separate notification window that will inform you of events but fade out automatically after a few seconds. Other options include playing a sound, logging the event to the Argus Monitor event log file, executing an external program or sending an email.

Further references

Quote from a study done by Google (2)

“Work at Google on over 100,000 drives over a 9-month period found correlations between certain SMART information and actual failure rates. In the 60 days following the first off-line scan uncorrectable error on a drive (SMART attribute 0xC6 or 198), the drive was, on average, 39 times more likely to fail than it would have been if no such error occurred. First errors in reallocations, offline reallocations (SMART attributes 0xC4 and 0x05 or 196 and 5) and probational counts (SMART attribute 0xC5 or 197) were also strongly correlated to higher probabilities of failure. Conversely, little correlation was found for increased temperature and no correlation for usage level. However, the research showed that a large proportion (56%) of the failed drives failed without recording any count in the 'four strong S.M.A.R.T. warnings' identified as scan errors, reallocation count, offline reallocation and probational count. Further, 36% of drives failed without recording any S.M.A.R.T. error at all (except temperature), meaning that S.M.A.R.T. data alone was of limited usefulness in anticipating failures.”

Quote from a study done by Backblaze (3)

“There are over 70 SMART statistics available, but we use only 5. To give some insight into the analysis we’ve done, we’ll look at three different SMART statistics here. The first one, SMART 187, we already use to decide when to replace a drive, it’s really a test of the analysis. The other two are SMART stats we don’t use right now, but have potentially interesting correlations with failure.

SMART 187: Reported_Uncorrect – Backblaze uses this one.

Number 187 reports the number of reads that could not be corrected using hardware ECC. Drives with 0 uncorrectable errors hardly ever fail. This is one of the SMART stats we use to determine hard drive failure; once SMART 187 goes above 0, we schedule the drive for replacement.

This first chart shows the failure rates by number of errors. Because this is one of the attributes we use to decide whether a drive has failed, there has to be a strong correlation.”

SMART 187- correlated to annual failure rate

Advice from the Argus Monitor developers

S.M.A.R.T. is not able to predict EVERY hard drive failure; for example, if a drive's electronics fail, S.M.A.R.T. cannot, by design, be effective. Nevertheless, failures that are directly linked to errors of the storage medium itself (the magnetic disk or the flash memory of an SSD) can be predicted in advance with relatively high reliability.

By monitoring the five statistically most significant of the critical S.M.A.R.T. values, Argus Monitor can reliably predict most hard drive and SSD failures in time for the user to back up important data before the drive fails completely.

S.M.A.R.T. attributes

For the actual S.M.A.R.T. attribute names and explanations, please refer to these Wikipedia articles:
Known ATA S.M.A.R.T. attributes
Known NVMe S.M.A.R.T. attributes

S.M.A.R.T. on RAID controllers

Hardware RAID

Some RAID controllers in the form of PCI/PCI Express expansion boards do not support accessing the S.M.A.R.T. data of the hard disk drives attached to them. Examples are adapters by Promise or Silicon Image. These controllers, or the drivers required to use them, block any software from reading the values of the S.M.A.R.T. attributes. Unfortunately, there is no way for Argus Monitor to access S.M.A.R.T. data on these RAID configurations.

Other RAID controllers however support reading the S.M.A.R.T. data of the attached hard disk drives. Examples for this are controllers by 3ware and LSI. With these controllers, Argus Monitor can access the S.M.A.R.T. data.

Integrated RAID controllers (on the mainboard) from Intel and JMicron also allow S.M.A.R.T. data to be read and are supported by Argus Monitor. The prerequisite is that a recent driver for them is installed, which should be available for download from the website of the mainboard vendor.

Supported RAID Controllers:

  • 3ware Escalade Series 7000/8000
  • 3ware / LSI / Avago MegaRAID Series
  • Adaptec RAID Series
  • Integrated Intel RAID controllers (Intel RST, mainboard chipset)
  • Integrated JMicron / Gigabyte RAID controllers (mainboard chipset)

Software RAID

If you are running a RAID array using the software RAID functionality of your Microsoft Windows operating system, S.M.A.R.T. data can be accessed as with any other hard disk drive running in standard AHCI or IDE mode.

S.M.A.R.T. on Network Attached Storage (NAS)

General information

In general, NAS (Network Attached Storage) devices do not provide a readout function for S.M.A.R.T. data. From a user's point of view, it is possible to read and write files - but it is not possible to communicate the temperature and S.M.A.R.T. status of the hard disks externally via an interface or to generate an external warning in the event of critical problems. However, some NAS devices perform internal S.M.A.R.T. checks under the control of the NAS operating system.

With Argus Monitor and the S.M.A.R.T. on NAS feature discussed here, it is possible to monitor the hard disks of a NAS device in the same way as the internal PC hard disks are monitored and, in case of critical S.M.A.R.T. errors, to generate warnings within Argus Monitor.

Argus Monitor can read the S.M.A.R.T. data of a NAS device under these conditions:

  • An instance of the Linux tool Smartmontools can be run periodically on the NAS, and this tool logs the S.M.A.R.T. data of the internal hard disks into a directory of the NAS.
  • This directory is accessible from the Windows PC running Argus Monitor (network share), so Argus Monitor can read the data generated by Smartmontools.

Warning

Please note that our support for problems with S.M.A.R.T. access on NAS devices can only cover the topic "Configuration of Argus Monitor". For the topic "Configuration of NAS / Smartmontools (Synology Example)", please consult the help of your NAS device or corresponding forums on the internet. Due to the diversity of NAS devices and NAS operating systems, we cannot provide support on this topic.

Configuration of NAS / Smartmontools (Synology Example)

If necessary, install Smartmontools on your NAS; consult the help for your device or corresponding forums on the internet. On NAS devices from common manufacturers, Smartmontools is often preinstalled, e.g. on Synology NAS devices. Open a command line on your NAS via SSH and determine the correct Smartmontools command to display the S.M.A.R.T. data. In case of doubt, please consult the Smartmontools help or appropriate forums on the internet. Note that Smartmontools requires root privileges (hence sudo in the examples) to read the S.M.A.R.T. data.

Example for the first hard disk:
sudo smartctl -a -d sat /dev/sda

Example for the second hard disk:
sudo smartctl -a -d sat /dev/sdb

Example of a log file after successfully reading S.M.A.R.T. data using smartmontools:

S.M.A.R.T. NAS 0

  • Create a directory on the NAS where the S.M.A.R.T. log files will be stored in the future, e.g. /Smart
  • Schedule a task via the task scheduler of your NAS operating system that periodically writes the S.M.A.R.T. log files.
  • Please make sure the generated log files start with smart_ (Argus Monitor will process only S.M.A.R.T. data files which start with this name).
  • If there is no graphical task scheduler, such a task can alternatively be done via a cronjob (Linux knowledge required).
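As one possible shape for such a periodic task, here is a minimal Python sketch. It assumes that Python 3 is available on the NAS and that the task runs with root privileges. The device list and the /Smart directory are taken from the examples above. Only the smart_ prefix is required by Argus Monitor, so the rest of the file name, including the .log extension, is an assumption. All function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

LOG_DIR = Path("/Smart")            # example directory from the text; adjust for your NAS
DEVICES = ["/dev/sda", "/dev/sdb"]  # devices from the smartctl examples above


def smartctl_command(device: str) -> List[str]:
    """Build the smartctl call from the examples (without sudo: run the task as root)."""
    return ["smartctl", "-a", "-d", "sat", device]


def log_path(log_dir: Path, device: str) -> Path:
    """File names must start with 'smart_'; the rest of the name is an assumption."""
    return log_dir / f"smart_{Path(device).name}.log"


def run_smartctl(device: str) -> str:
    # smartctl reports drive problems through a non-zero exit status,
    # so the output is kept regardless of the return code.
    result = subprocess.run(smartctl_command(device), capture_output=True, text=True)
    return result.stdout


def write_logs(
    log_dir: Path,
    devices: List[str],
    runner: Callable[[str], str] = run_smartctl,
) -> List[Path]:
    """Write one S.M.A.R.T. log file per device and return the written paths."""
    log_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for device in devices:
        path = log_path(log_dir, device)
        path.write_text(runner(device))
        written.append(path)
    return written


# In the scheduled task on the NAS, call: write_logs(LOG_DIR, DEVICES)
```

The `runner` parameter only exists so that the file handling can be tried out on a machine without smartctl by passing in a function that returns sample text.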

Example for task scheduling under Synology DSM 7 OS:

S.M.A.R.T. NAS 1 S.M.A.R.T. NAS 2 S.M.A.R.T. NAS 3

  • After the task was executed, check if the log files with the S.M.A.R.T. data were generated in the directory \Smart:

S.M.A.R.T. NAS 4

  • In each S.M.A.R.T. log file the different S.M.A.R.T. attributes must be present (in the section "Vendor Specific SMART Attributes with Thresholds", see picture with example log file).
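To check a log file for this section without opening it by hand, a small parser along the following lines may help. This Python sketch assumes the usual ten-column table layout that smartctl prints for ATA drives. The sample text is a shortened, made-up illustration in that layout, and the function name is hypothetical. How Argus Monitor itself parses the files is not documented here.

```python
from typing import Dict, List

SECTION_TITLE = "Vendor Specific SMART Attributes with Thresholds"


def parse_attribute_section(log_text: str) -> List[Dict[str, object]]:
    """Return the attribute rows found below the section title (empty if absent)."""
    rows: List[Dict[str, object]] = []
    in_section = False
    for line in log_text.splitlines():
        if line.startswith(SECTION_TITLE):
            in_section = True
            continue
        if not in_section:
            continue
        fields = line.split(None, 9)
        if fields and fields[0] == "ID#":
            continue  # column header line
        if len(fields) < 10 or not fields[0].isdigit():
            break     # a blank or unrelated line ends the table
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),      # normalized value
            "threshold": int(fields[5]),  # normalized threshold
            "raw": fields[9].strip(),     # raw value, kept as text
        })
    return rows


# Made-up sample in the smartctl table layout, for illustration only.
SAMPLE_LOG = """\
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
"""

rows = parse_attribute_section(SAMPLE_LOG)
print([row["id"] for row in rows])  # [5, 198]
```

An empty result means the section is missing, which usually points to a wrong device type option or insufficient privileges when the log was written.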

This completes the configuration on your NAS device.

Configuration of Argus Monitor

  • In Argus Monitor Settings/S.M.A.R.T., enable S.M.A.R.T. over Network Attached Storage (NAS) and specify the location of the network share path where the S.M.A.R.T. data files are stored:

S.M.A.R.T. NAS 5

Argus Monitor will scan this path within its default S.M.A.R.T. check interval and process the S.M.A.R.T. data from the NAS hard disk drives. If everything is successful, you will see these NAS drives in the same way you see the internal drives:

S.M.A.R.T. NAS 6

  • In the screenshot, please note the interface type NAS for this kind of drive.

  • Please also note that the NAS log files are scanned by Argus Monitor at a fixed interval of 10 minutes, not at the user-defined interval set for internal SSDs or hard disks. This is due to internal technical reasons and should in any case be sufficient for detecting S.M.A.R.T. errors on a NAS.

Additional feature

Argus Monitor also supports NAS devices which are not permanently powered on but are turned on/off only at certain times by user defined energy settings within the NAS operating system (e.g. for backup purposes).

To make this work, Argus Monitor by default mirrors all S.M.A.R.T. data files from the network share to the local directory c:\Users\YourUserName\AppData\Roaming\ArgusMonitor.

In this local directory, all S.M.A.R.T. data files less than 3 days old are considered valid. This way, drives in NAS devices that are powered on only part of the time will always show up in Argus Monitor, as long as the NAS can be reached at least once every 3 days.
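The validity rule can be pictured with a short Python sketch that lists the files in a directory which start with smart_ and are less than 3 days old. Using the file modification time as the age is an assumption made for this example. Argus Monitor may determine the age differently, for instance from the timestamp inside the file. The function name is hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 3 * 24 * 60 * 60  # 3 days


def valid_smart_files(directory: Path, now: Optional[float] = None) -> List[Path]:
    """Return files named smart_* whose modification time is under 3 days old."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in directory.glob("smart_*")
        if path.is_file() and now - path.stat().st_mtime < MAX_AGE_SECONDS
    )


# Usage with a local mirror directory of your choice:
# for path in valid_smart_files(Path("path/to/local/mirror")):
#     print(path.name)
```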

The field Timestamp will show the time of the S.M.A.R.T. data.

Known issues

Hard drives of the Samsung EcoGreen F4 series, more precisely the models HD204UI and HD155UI, were shipped with defective firmware that can cause loss of data under certain conditions.

As a precautionary safety measure, Argus Monitor does not access S.M.A.R.T. data of these drives.

Samsung however provides a firmware upgrade to resolve this issue (F4EG.exe, 696 KB): Samsung Firmware Patch

If you have installed this firmware upgrade and want to activate the S.M.A.R.T. features of Argus Monitor for these drives you can do this by entering the custom program parameter HD204UI_ENABLE in Settings / Miscellaneous / Custom Program parameters as shown in the picture below and restarting Argus Monitor.

Please make sure that you have upgraded the firmware of all Samsung HD204UI / HD155UI drives in your system beforehand!

S.M.A.R.T. NAS 6