Restoring data in a RAID setup when a disk dies
Q I have recently set up a file server using SUSE Enterprise Server 9. There are three hard disks in the system: an 80GB disk and two 120GB disks. The 80GB disk contains the OS. The two 120GB disks are formatted as two RAID 1 partitions, primarily to store user data. The RAID is software implemented via SUSE, not hardware RAID via a controller. The filesystem is ReiserFS. Everything is working fine and hopefully will for a long time. However, at some stage, one of these mirrored disks may fail and will have to be replaced. What are the processes involved in replacing a crashed mirror and restoring the data from the other drive? Are there any methods or utilities to determine the health of a RAID system? It seems to me that there is much discussion regarding the merits of RAID and implementing it but nothing, or very little, on maintenance or recovery.
A There are several ways of examining the status of an array. The following command,
mdadm --detail /dev/md*
reports the state of each RAID array, including any failed or missing members.
The mdadm program also has a daemon mode that runs in the background and watches the arrays. You'll need to edit /etc/mdadm.conf, setting MAILADDR to the address that alerts should go to, and test it on the command line first. Then set mdadmd to start at boot in YaST > System > System Services. The daemon will email you if it detects any problems.
With RAID 1, if a disk fails the array carries on working using just the good disk. To replace the broken disk, first remove it from the RAID with
mdadm /dev/mdX --fail /dev/hdYn --remove /dev/hdYn
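where mdX and hdYn are the array and partition device nodes respectively. Since your disks hold two arrays, repeat this for each array that has a partition on the failed disk. Then you can power down, replace the disk with a new one, reboot, create the necessary partitions on the new disk as you did when setting up the array in the first place (each must be at least as large as the one it replaces), and add each one to its array with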
mdadm /dev/mdX --add /dev/hdYn
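If the replacement disk is the same size or larger, one way to recreate the partitions is to copy the layout from the surviving disk with sfdisk rather than partitioning by hand. That approach is not part of the steps above, so treat the following as a sketch only. The function name mirror_swap_plan and the example device names are hypothetical, it assumes the new disk appears at the same device node as the failed one, and it prints the commands for review rather than running them.

```shell
#!/bin/sh
# Print (do NOT run) the commands for swapping one failed RAID 1 member.
# Every device name is supplied by the caller; nothing is detected.
mirror_swap_plan() {
    array=$1      # array node, e.g. /dev/md0
    good_disk=$2  # surviving whole disk, e.g. /dev/hdb
    new_disk=$3   # failed/replacement whole disk, e.g. /dev/hdc
    part=$4       # partition number of the array member, e.g. 1

    echo "# before powering down:"
    echo "mdadm $array --fail $new_disk$part --remove $new_disk$part"
    echo "# after fitting the new disk and rebooting:"
    # Assumption: copying the partition table with sfdisk suits your disks
    echo "sfdisk -d $good_disk | sfdisk $new_disk"
    echo "mdadm $array --add $new_disk$part"
}
```

Calling mirror_swap_plan /dev/md0 /dev/hdb /dev/hdc 1 prints the sequence for one array. Call it once per array, bearing in mind that the sfdisk line only needs running once per disk, and check every device name before typing any of the commands as root.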
The array will be rebuilt automatically. There will be a slight reduction in performance while the rebuild takes place. The mdadm --detail command given earlier, or cat /proc/mdstat, will show when the rebuild is complete. You can use the raidtools package instead of mdadm for these tasks, but mdadm is my preferred choice: it is newer and more consistent to use. You may also consider running smartmontools to monitor the disks themselves, as it can sometimes warn of a failing disk before it drops out of the array.
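The kernel also exposes array status in /proc/mdstat, where a member map such as [U_] marks a missing disk and a recovery line shows rebuild progress. As a rough sketch, assuming that layout, the hypothetical helper below (mdstat_report is my own name, not part of mdadm) reads mdstat text on standard input and prints only the arrays that need attention.

```shell
#!/bin/sh
# List md arrays that are degraded or rebuilding.
# Input: /proc/mdstat-style text on stdin. Output: one line per finding.
mdstat_report() {
    awk '
        # Array header line, e.g. "md0 : active raid1 hdc1[1] hdb1[0]"
        /^md[0-9]+ :/ { name = $1; next }
        # Member map such as [UU] or [U_]; "_" marks a missing member
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name ": degraded " $NF
            next
        }
        # Progress line shown while a recovery or resync is running
        name != "" && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name ": rebuilding " $i
        }
        # A blank line ends the current array stanza
        /^$/ { name = "" }
    '
}
```

Run it as mdstat_report < /proc/mdstat. No output means no array is reported as degraded or rebuilding, so it can be dropped into a cron job that only sends mail when there is something to say.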