Configuring Software RAID1 on Fedora Core using Disk Druid during system install
RAID1, or mirroring, uses two hard drives, duplicating the contents of one drive exactly onto the other. This provides hardware redundancy: if one drive fails, the other can continue to operate independently. Hardware RAID is provided by the controller, which presents a single logical drive to the operating system; the RAID management is transparent.
Note: most onboard SATA RAID controllers are not real hardware RAID, but merely provide an extension for the operating system. A driver must be installed for proper use of such controllers. Also, Dell's PowerEdge 1850 and 1950 use MegaRAID controllers, which require a driver to work properly under Linux.
If you tried to configure software RAID without following the steps below, there is a good chance that you're not protected at all (did you ever test?). Furthermore, you might have bumped into many problems during the installation, such as the disk not booting up after the installation, GRUB error messages during boot, a system that boots only when the primary disk is online but not when only the secondary is, RAID not working as you expect, and many more.
Configuring software RAID during the system install, using Disk Druid, is not a trivial procedure. This document describes the steps you need to take for such a configuration to work.
While writing this guide, I used two 8GB SATA hard drives: primary /dev/sda and secondary /dev/sdb. The BIOS was configured with the onboard SATA RAID disabled, and both drives were controlled directly by the BIOS, so the operating system sees two hard drives.
0. To sum it up...
The following steps should be followed to achieve the goal:
- 1. Partition and configure RAID using Disk Druid
- 2. Build the RAID arrays
- 3. Configure GRUB
- 4. Test
Additional important steps:
- 5. Check RAID status and set up RAID monitoring
- 6. Recover from disk failure (god forbid)
1. Partition and configure RAID using Disk Druid
During the installation of Fedora, you'll be asked whether to partition automatically using Disk Druid or to partition manually. Whichever you choose, you should delete all existing partitions and start with clean drives (this will delete all your existing data - be warned):
There are 3 partitions you should create: /boot, swap, and / (also referred to as root). Our goal is to have both the root and /boot partitions on the RAID1. It is unwise to put the swap on the software RAID, as it would cause unnecessary overhead.
Important: The /boot partition should be the first one on the disk, i.e. start at cylinder 1. In addition, make sure you set "Force to be a primary partition" on each partition you create (unless you know what you're doing). A /boot partition of 100MB should be enough for most configurations.
Let's start with creating the /boot partition. Click on the RAID button and choose "Create a software RAID partition":
For the File System Type choose "Software RAID", select the first drive, and set a fixed size of 100MB:
Repeat the same for the second drive, resulting in two software RAID partitions of 100MB, one on each drive. Those partitions are now ready for RAID device and mount point creation:
Click on the RAID button and now choose "Create a RAID device". For the Mount Point choose "/boot", RAID Level should be RAID1, on device md0, as shown in the following figure:
Now create a swap partition. The swap partition size should at least match the size of the RAM. Swap should not reside on the software RAID so all you need to do is to click on New, and create a swap on each hard drive. The result will be two swap partitions, one on each drive:
Now, after creating the /boot and swap partitions, allocate the remaining free space as md1 and create the root partition on it. You should now be familiar with the steps. The final result of the partitioning should be similar to the following figure:
Complete the Fedora installation. When the system reboots it will probably halt ;( prior to loading GRUB. Error messages may vary between file system errors, kernel panic, and GRUB Error 17.
Don't be frustrated (yet) as there are some more actions you need to take.
2. Build the RAID arrays
Boot from the first installation CD, but instead of starting the installation, type "linux rescue" to enter the rescue-mode command prompt. At the prompt, set a new root and build the RAID array:
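The exact commands were not preserved in this copy; as a sketch, assuming the rescue environment mounted the installed system under /mnt/sysimage (the mount point the Fedora rescue mode usually offers):

```shell
# Switch into the installed system so mdadm and grub operate on it
chroot /mnt/sysimage
```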
RAID status is reported through the file /proc/mdstat. Let's view it and see how our RAID is performing:
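The original listing is not preserved here; on a healthy two-disk setup with the layout used in this guide, the output looks roughly like the following (block counts and device names are illustrative):

```shell
cat /proc/mdstat
# Personalities : [raid1]
# md1 : active raid1 sdb3[1] sda3[0]
#       7823552 blocks [2/2] [UU]
# md0 : active raid1 sdb1[1] sda1[0]
#       104320 blocks [2/2] [UU]
# unused devices: <none>
```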
If you see similar results, then the RAID configuration is correct. The [UU] means that both hard drives are up. But although the RAID is configured, it is not performing correctly, as it is not yet set as "hot". Run the following command to "hotadd" the missing partitions and rebuild the array:
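The command itself is missing from this copy; assuming the layout used in this guide (md0 = sd?1 for /boot, md1 = sd?3 for /), the hot-add would look like this (substitute whichever partitions are missing on your system):

```shell
# Add the missing halves back into each array; the kernel starts
# rebuilding immediately, which you can watch via /proc/mdstat
mdadm /dev/md0 --add /dev/sda1
mdadm /dev/md1 --add /dev/sda3
```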
During the rebuild you can cat /proc/mdstat to check the current progress and status. This process might take some time, depending on the size of the partitions.
Important: Wait until the process is done before you continue to the next step.
3. Configure GRUB
The first drive (/dev/sda on my system) is not yet bootable. In the following steps we complete the GRUB loader installation on both drives and set /boot as bootable.
Continue working at the rescue-mode command prompt and load the GRUB shell:
In the GRUB shell, type the following commands to re-install the boot loader on both drives, so that when (not if - when!) one of the drives fails, your system will still boot. You might need to substitute the hard drive locations to match your system configuration:
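The command listing is missing from this copy; a typical sequence for legacy GRUB, with /boot on the first partition of each disk, looks like the following (the device names are assumptions; adjust them to your setup):

```
grub
root (hd0,0)
setup (hd0)
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
```

The device (hd0) /dev/sdb line temporarily maps the second disk as hd0, so the MBR written to it points at its own copy of /boot; that way the second disk can boot on its own when the first one is gone.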
Quit the GRUB shell and boot from the hard disk. The system should now load.
4. Test
Don't skip the testing stage; make sure everything is REALLY working properly. The best way to test is to physically unplug each drive in turn and see whether the system boots with only the other drive connected (make sure you power down the system before unplugging a drive).
Important: Testing causes your RAID to be degraded. This means that after you reconnect the drive you must hotadd the drive back to the array using mdadm /dev/mdx --add /dev/sdxx command.
If the test completed successfully and your system boots from each drive, then you're basically done. Still, I suggest you continue with the next procedures, to be prepared in case you ever face a major crisis (touch wood).
5. Check RAID status and set up RAID monitoring
There are several ways to check the current status of your RAID; the best is the mdadm --detail command. In the following example you can see that the RAID is degraded: only /dev/sdb1 is active, while the other partition, /dev/sda1, is missing from the RAID.
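The example output is not preserved in this copy; a degraded array reported by mdadm --detail looks roughly like this (abridged and illustrative):

```shell
mdadm --detail /dev/md0
#          State : clean, degraded
# Active Devices : 1
# Failed Devices : 0
#
#    Number   Major   Minor   RaidDevice State
#       0       0        0        0      removed
#       1       8       17        1      active sync   /dev/sdb1
```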
Another way of checking the RAID is the system log:
And as always, you can check the content of the /proc/mdstat file.
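In /proc/mdstat, a degraded mirror shows up as [_U] or [U_] instead of [UU]. As a small sketch, here is a hypothetical helper (check_md is not part of mdadm; it is just a grep over text in mdstat format) that flags degradation:

```shell
# Print DEGRADED if the mdstat-style text passed in $1 shows a down
# member ("_" inside the status brackets), OK otherwise.
check_md() {
  if echo "$1" | grep -q '\[.*_.*\]'; then
    echo "DEGRADED"
  else
    echo "OK"
  fi
}
```

For example, check_md "$(cat /proc/mdstat)" prints OK on a healthy mirror.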
Now we'll start a monitoring daemon that will send an email alert when there is a problem with the RAID:
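The monitoring command itself is not preserved in this copy; a typical invocation (the mail address and delay are placeholders to adjust) would be:

```shell
# Scan all arrays and run as a daemon, mailing an alert on any
# failure or degradation event; poll every 300 seconds.
mdadm --monitor --scan --daemonise --mail=root@localhost --delay=300
```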
To test that emails are working, add a -t argument to the above line, and a test email will be sent. Don't forget to kill the test process you just created. It is recommended to put this line inside /etc/rc.local so it loads automatically after the system boots.
6. Recover from disk failure
When you encounter a failure in the RAID, the first thing I would suggest is that you DON'T PANIC! You should still be able to access your data and even boot, but the next thing you should do is back up all the data. It happened to me once that after a disk failure I accidentally deleted the good disk as well.... Luckily I didn't panic, and made a complete backup prior to any other actions I took :)
So, after you've had a cold glass of water and backed up all the data, you need to identify the faulty disk by checking the content of the /proc/mdstat file. In my example below you can see that /dev/sda3 is no longer a member of the RAID, and obviously the RAID is not performing:
/dev/sda is the SATA hard drive connected to the first SATA controller. I physically removed it from the system and replaced it with a new one. Note that /dev/sda1, which resides on the same hard drive, did not fail, but when I replace the faulty drive I will have to rebuild both arrays.
When you plug in a new hard drive, you don't have to worry about the size of the disk - just make sure it is larger than the one you already have installed. The extra free space on the new drive will simply not be a member of the RAID.
After replacing the faulty disk, create the partition table using fdisk, based on the exact partition table of the good disk. Here, /dev/sda is a completely new 250GB hard drive.
Before you continue - are you sure everything is backed up? If so, load fdisk with the new disk as a parameter. My inputs are highlighted. You will have to adjust the input to match your own system.
Here is an explanation of the procedure:
- Create 3 primary partitions using the (n) command - the sizes are based on the info from the good and working drive.
- Set partition #1 as bootable using the (a) command.
- Change the partitions system id using the (t) command - partition #1 and #3 to type fd, and partition #2 to type 82.
- Save the changes to the partition table using the (w) command.
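As an alternative to entering the fdisk commands by hand, sfdisk can clone the partition table from the good disk in one step (a sketch; be absolutely sure about the direction, since the dump is taken from the good disk /dev/sdb and written to the new /dev/sda):

```shell
# Copy sdb's partition table onto the new, empty sda
sfdisk -d /dev/sdb | sfdisk /dev/sda
```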
EDIT (2008-01-15): Reinstall the GRUB boot loader on the replaced drive, as described in Step 3 above.
Now the new hard drive is ready to participate in the RAID. We just need to hotadd it to the RAID using the mdadm /dev/mdx --add /dev/sdxx command:
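With the layout used throughout this guide, the concrete commands would be (adjust the device names to your system):

```shell
# Re-add both partitions of the replaced disk; the arrays rebuild
# one after the other.
mdadm /dev/md0 --add /dev/sda1
mdadm /dev/md1 --add /dev/sda3
```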
and check the content of the /proc/mdstat file to make sure everything is working properly.