Q Lately Seti@home is giving a signal 11 error while it does its number crunching. This error has been related to problems with hard disks. The problem is that I don't know the best way to check my hard disk. I use Kubuntu 8.04, and have four partitions: one for Windows (vfat), swap, root (ext3) and /home (ReiserFS). I used ReiserFS for /home because it was the default when I installed my first Linux (SUSE 7.2). I think, though I could be wrong, that I haven't formatted or checked the /home partition since I created it six years ago, which I believe is quite impressive (for me, at least): six years without any problem! What should I do? Boot using a Live CD? Just use some CLI magic? Or is there something graphical I should use?
A There are two separate entities to consider here: the physical disk and the filesystem residing on that disk. The filesystem is the more straightforward, as you can simply run fsck over it. Of course, nothing is quite that simple, and you should not try to fsck a mounted filesystem. You could unmount /home, but that would only work if you've set up a root login, which Kubuntu does not have by default. You would then log out of the desktop, switch to a virtual console with Ctrl+Alt+F1 and log in as root. You need to log out of the desktop because you cannot unmount /home while any user other than root is logged in (their processes keep files open on it); this is why you need a separate root login and cannot use sudo. Now you can unmount /home and run fsck on that partition.
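As a minimal sketch of that procedure, run as root from the virtual console, the shell functions below look up which device is mounted on a directory by reading /proc/mounts, and refuse to run fsck on a device that is still mounted. The names device_for_mount and safe_fsck are hypothetical helpers, not part of any package, and the /proc/mounts lookup is Linux-specific.

```shell
#!/bin/sh
# Hypothetical helper: print the device mounted on the given directory,
# as listed in /proc/mounts (Linux-specific). Returns 1 if nothing is
# mounted there. If several entries match, the last (topmost) one wins.
device_for_mount() {
    awk -v mp="$1" '
        $2 == mp { dev = $1 }
        END { if (dev == "") exit 1; print dev }
    ' /proc/mounts
}

# Hypothetical helper: run fsck on a device only if it is not mounted.
# fsck itself picks the right checker (fsck.ext3, fsck.reiserfs, ...)
# from the filesystem type.
safe_fsck() {
    if awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts; then
        echo "$1 is still mounted; unmount it first" >&2
        return 1
    fi
    fsck "$1"
}
```

From the root login, something like dev=$(device_for_mount /home) followed by umount /home && safe_fsck "$dev" records the device name before unmounting and then checks it. fsck may ask before fixing anything it finds.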
The alternative is to use a Live CD or DVD, which would also enable you to fsck the root partition. This is usually the simpler option if you do not mind rebooting the computer. Any Live CD distro will contain the fsck tools, but my current favourite for this type of work is GRML (www.grml.org). This is based on Debian and aimed at system administration and rescue. Another alternative would be SystemRescue CD (www.sysresccd.org). Fsck checks the filesystem for corruption but does not look at the underlying hardware. For this, Smartmontools is a good choice. It's included in the software repositories of most distros.
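Whichever live disc you use, it is worth confirming that it ships a checker for each filesystem you have. The sketch below assumes the usual convention that fsck hands off to a helper named fsck.TYPE (for example fsck.reiserfs or fsck.ext3); the function names has_checker and check_all are hypothetical. The helpers normally live in /sbin, so the search path is extended in case you run it as a normal user.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an fsck helper for filesystem type $1
# (e.g. ext3, reiserfs, vfat) can be found. The body runs in a subshell,
# so the PATH change does not leak into the caller.
has_checker() (
    PATH="$PATH:/sbin:/usr/sbin"
    command -v "fsck.$1" >/dev/null 2>&1
)

# Report, for each filesystem type given, whether its checker is present.
check_all() {
    for fs in "$@"; do
        if has_checker "$fs"; then
            echo "fsck.$fs: found"
        else
            echo "fsck.$fs: MISSING"
        fi
    done
}
```

For the partitions in the question, check_all ext3 reiserfs vfat would list the three checkers you need.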
Smart (Self-Monitoring, Analysis, and Reporting Technology) is a way for hard disks to monitor their own performance, running a set of self-tests to detect and even predict failure. You may need to enable an option in your computer's BIOS for Smart to work, then install Smartmontools and edit /etc/smartd.conf to tell it which drives to test and how. A good starting point is:
/dev/sda -d sat -I 194 -I 231 -I 9 -W 5 -a -m email@example.com
This monitors /dev/sda, a SATA drive, but tells smartd to ignore changes in attributes 9 (power-on time), 194 and 231 (temperature), which change constantly, while -W 5 reports temperature changes of five degrees or more. The -a option says to monitor all other attributes, while -m gives an email address for errors and warnings. If you have a PATA/IDE drive, use -d ata. Set smartd to start at boot in your services manager and your drive(s) will be continually monitored. You can run an immediate health check on the drive with
smartctl --health -d sat /dev/sda
and run various tests with
smartctl --test=TEST -d sat /dev/sda
where TEST is one of offline, short or long. See the smartd and smartctl man pages for (a lot) more information on the various options, attributes and tests you can use.
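The smartctl commands above can be wrapped for use in scripts or cron jobs. This is a minimal sketch under a few assumptions: the function names are hypothetical, the SMARTCTL variable exists only so the wrapper can be tried without a real disk (SMARTCTL=echo gives a dry run), and smart_test accepts only the three test names mentioned here. explain_smartctl_status decodes smartctl's exit status, which the smartctl man page documents as a bit mask.

```shell
#!/bin/sh
# Hypothetical wrapper: start a SMART self-test on a SATA drive.
# $1 = test name, $2 = device. SMARTCTL may be overridden (e.g. with
# "echo") for a dry run; it defaults to the real smartctl.
smart_test() {
    case "$1" in
        offline|short|long) ;;
        *) echo "unknown test '$1' (use offline, short or long)" >&2
           return 2 ;;
    esac
    "${SMARTCTL:-smartctl}" --test="$1" -d sat "$2"
}

# Decode smartctl's exit status bit by bit, e.g.
#   smartctl --health -d sat /dev/sda; explain_smartctl_status $?
# Returns 0 only if no bit is set.
explain_smartctl_status() {
    [ "$1" -eq 0 ] && { echo "no problems reported"; return 0; }
    [ $(($1 & 1)) -ne 0 ] && echo "command line did not parse"
    [ $(($1 & 2)) -ne 0 ] && echo "device open failed or device did not identify itself"
    [ $(($1 & 4)) -ne 0 ] && echo "a SMART/ATA command failed or data checksum error"
    [ $(($1 & 8)) -ne 0 ] && echo "SMART status: DISK FAILING"
    [ $(($1 & 16)) -ne 0 ] && echo "prefail attribute at or below threshold"
    [ $(($1 & 32)) -ne 0 ] && echo "an attribute was at or below threshold in the past"
    [ $(($1 & 64)) -ne 0 ] && echo "device error log contains errors"
    [ $(($1 & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 1
}
```

For example, smart_test short /dev/sda starts a short test; once it has had time to finish, smartctl -l selftest -d sat /dev/sda shows the self-test log.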