Configuring Software RAID1 on Fedora Core using Disk Druid during system install

RAID1, or mirroring, uses two hard drives and duplicates one drive exactly onto the other. This provides hardware redundancy - if one drive fails, the other can continue to operate independently. Hardware RAID is provided by the controller, which presents a single logical drive to the operating system, so the RAID management is transparent.

Manufacturers of such RAID controllers include Adaptec, LSI (MegaRAID) and 3ware (the last provides drivers for all operating systems). Be aware of the performance issues involved with software RAID.

Note: most onboard SATA RAID controllers are not real hardware RAID, but just provide an extension for the operating system; a driver must be installed for such controllers to work properly. Also, Dell's PowerEdge 1850 and 1950 have the MegaRAID, which requires a driver to work properly under Linux.

If you tried to configure software RAID without following the steps below, there is a good chance that you're not protected at all (did you ever test?). Furthermore, you might have bumped into many problems during the installation, such as the disk not booting after the installation, GRUB error messages during boot, a system that boots only when the primary disk is online but not when only the secondary is, RAID not working as you expect, and many more.

Configuring software RAID during the system install using Disk Druid is not a trivial procedure. This document describes the steps you need to take in order for such a configuration to work.

While writing this guide, I used two 8GB SATA hard drives: the primary /dev/sda and the secondary /dev/sdb. The BIOS was configured with the onboard SATA RAID disabled, and both drives were controlled directly by the BIOS, so the operating system sees two separate hard drives.
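
You can verify that the operating system really does see both drives (for example from rescue mode, or later from the installed system); the listing below is just an illustration of my two 8GB disks:

[root@raidtest ~]# fdisk -l | grep '^Disk /dev/sd'
Disk /dev/sda: 8.0 GB, 8589934592 bytes
Disk /dev/sdb: 8.0 GB, 8589934592 bytes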

 

0. To sum it up...

The following steps should be followed to achieve the goal:

  • 1. Partition and configure RAID using Disk Druid
  • 2. Build the RAID arrays
  • 3. Configure GRUB
  • 4. Test

Additional important steps:

  • 5. Check RAID status and put the RAID under monitoring
  • 6. Recover from disk failure (god forbid)

 

1. Partition and configure RAID using Disk Druid

During the installation of Fedora, you'll be asked whether to automatically partition using Disk Druid or to partition manually. No matter which you choose, you should delete all the existing partitions and start with clean drives (this will delete all your existing data, so be warned):

There are three partitions you should create: /boot, swap and / (also referred to as root). Our goal is to have both the root and /boot partitions on the RAID1 array. It is unwise to put the swap on the software RAID as it will cause unnecessary overhead.
Important: The /boot partition should be the first one on the disk, i.e. start at cylinder 1. In addition, make sure you set "Force to be a primary partition" on each partition you create (unless you know what you're doing). A /boot partition of 100MB should be enough for most configurations.

Let's start with creating the /boot partition. Click on the RAID button and choose "Create a software RAID partition":

For the File System Type choose "Software RAID", select the first drive and set a fixed size of 100MB:

Repeat the same for the second drive, resulting in two software RAID partitions of 100MB, one on each drive. Those partitions are now ready for RAID device and mount point creation:

Click on the RAID button and now choose "Create a RAID device". For the Mount Point choose "/boot", RAID Level should be RAID1, on device md0, as shown in the following figure:

Now create the swap partitions. The swap size should at least match the amount of RAM. Swap should not reside on the software RAID, so all you need to do is click on New and create a swap partition on each hard drive. The result will be two swap partitions, one on each drive:

Now, after creating the /boot and swap partitions, allocate the remaining free space as md1 and create the root partition on it. You should now be familiar with the steps. The final result of the partitioning should be similar to the following figure:

Complete the Fedora installation. When the system reboots it will probably halt ;( prior to loading GRUB. The error message may vary: file system errors, a kernel panic, or GRUB Error 17.

Don't be frustrated (yet) as there are some more actions you need to take.

 

2. Build the RAID arrays

Boot from the first installation CD, but instead of starting the installation type "linux rescue" to start rescue mode with a command prompt. At the command prompt, set the new root and build the RAID arrays:

sh-3.00# chroot /mnt/sysimage
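
If chroot complains that it cannot find /mnt/sysimage (i.e. the rescue environment did not locate and mount your installation automatically), mount the root filesystem by hand first. This is only a sketch - use fdisk -l and cat /proc/mdstat to find the right device for your system:

sh-3.00# mkdir -p /mnt/sysimage
sh-3.00# mount -t ext3 /dev/md1 /mnt/sysimage
sh-3.00# chroot /mnt/sysimage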

RAID status is reported through the file /proc/mdstat. Let's view it and see how our RAID is performing:

[root@raidtest ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
      7060480 blocks [2/2] [UU]
     
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

If you see similar results then the RAID configuration is correct. The [UU] means that both hard drives are up. But although the RAID is configured, it is not performing correctly, as it is not set as "hot". Run the following commands to "hotadd" and rebuild the arrays:

[root@raidtest ~]# mdadm /dev/md0 --add /dev/sda1
[root@raidtest ~]# mdadm /dev/md1 --add /dev/sda3

During the rebuild you can cat /proc/mdstat to check the current progress and status. This process might take some time, depending on the sizes of the partitions.
Important: Wait until the process is done before you continue to the next step.
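
As a rough illustration (the numbers below are invented for the example), a rebuild in progress looks something like this; you can simply re-run the command every now and then, or wrap it in watch if that is available:

[root@raidtest ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[2] sdb3[1]
      7060480 blocks [2/1] [_U]
      [======>.............]  recovery = 31.5% (2227584/7060480) finish=3.2min speed=24816K/sec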

 

3. Configure GRUB

The first drive (/dev/sda on my system) is not yet bootable. In the following steps we complete the GRUB boot loader installation on both drives and set the /boot partition as bootable.

Continue working at the rescue mode command prompt and load the GRUB shell:

sh-3.00# grub

In the GRUB shell type the following commands to re-install the boot loader on both drives, so that when (not if - when!) one of the drives fails or crashes, your system will still boot. You might need to substitute
the hard drive locations to match your system configuration:

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)

grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)
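
If you are not sure how the GRUB device names map to your physical drives, the GRUB shell can locate the stage1 file for you; this assumes /boot is the first partition on each drive, as in this guide. When you are done, leave the shell with quit:

grub> find /grub/stage1
 (hd0,0)
 (hd1,0)
grub> quit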

Quit and boot from the hard disk. The system should load. Don't skip the testing stage to make sure everything is REALLY working properly.

 

4. Test

The best way to test is to physically unplug each drive in turn and see if the system boots with only the other drive connected (make sure you power down the system before unplugging a drive).
Important: Testing causes your RAID to become degraded. This means that after you reconnect the drive you must hotadd it back to the array using the mdadm /dev/mdx --add /dev/sdxx command.
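
For the layout used in this guide, re-adding the primary drive after a test would look like this (substitute the device names for whichever drive you unplugged):

[root@raidtest ~]# mdadm /dev/md0 --add /dev/sda1
[root@raidtest ~]# mdadm /dev/md1 --add /dev/sda3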


If the test completed successfully and your system boots from each drive, then you're basically done. Though I suggest you continue with the next procedures to learn more, in case you ever have a major crisis (touch wood).

 

5. Check RAID status and put the RAID under monitoring

There are several ways to check the current status of your RAID; the best is the mdadm --detail command. In the following example you can see that the RAID is degraded: only /dev/sdb1 is active, while /dev/sda1 is missing from the array.

[root@raidtest ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sun Jul 22 08:25:21 2007
     Raid Level : raid1
     Array Size : 104320 (101.88 MiB 106.82 MB)
    Device Size : 104320 (101.88 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Aug  1 15:08:24 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 08ed38e5:7ffca26e:f5ec53fc:e5d1983e
         Events : 0.1423

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       17        1      active sync   /dev/sdb1

Another way of checking the RAID is the system log:

[root@raidtest ~]# tail -n 50 /var/log/messages

Or:

[root@raidtest ~]# dmesg

And as always, you can check the content of the /proc/mdstat file:

[root@raidtest ~]# cat /proc/mdstat

Now we'll start a monitoring daemon that will send an email alert when there is a problem with the RAID:

[root@raidtest ~]# mdadm --monitor --scan --mail=you@domain.com --delay=3600 --daemonise /dev/md0 /dev/md1

To test that email alerts are working, add a -t argument to the above line and a test email will be sent. Don't forget to kill the test process you just created. It is recommended to put this line inside /etc/rc.local so it is started automatically after the system boots.
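
For example, a one-time test run and the /etc/rc.local entry might look roughly like this (replace the email address and adjust the md devices to your setup):

[root@raidtest ~]# mdadm --monitor --scan --mail=you@domain.com --delay=3600 --daemonise -t /dev/md0 /dev/md1
[root@raidtest ~]# echo "mdadm --monitor --scan --mail=you@domain.com --delay=3600 --daemonise /dev/md0 /dev/md1" >> /etc/rc.local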

 

6. Recover from disk failure

When you encounter a failure in the RAID, the first thing I would suggest is that you DON'T PANIC! You should still be able to access your data and even boot, but the next thing you should do is back up all the data. It happened to me once that after a disk failure I accidentally deleted the good disk as well... Luckily I didn't panic, and had made a complete backup prior to any other actions I took :)

So, after you've had a cold glass of water and backed up all the data, you need to identify the faulty disk by checking the content of the /proc/mdstat file. In my example below you can see that /dev/sda3 is no longer a member of the RAID, and obviously that array is not performing:

[root@raidtest ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[1]
      7060480 blocks [2/1] [_U]
     
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

/dev/sda is the SATA hard drive connected to the first SATA controller. I physically removed it from the system and replaced it with a new one. Note that /dev/sda1, which resides on the same hard drive, did not fail, but when I replace the faulty drive I will have to rebuild both arrays.
When you plug in a new hard drive you don't have to worry about matching the disk size exactly - just make sure it is at least as large as the one already installed. The extra free space on the new drive will simply not be a member of the RAID.

After replacing the faulty disk, the partition table has to be recreated using fdisk, based on the exact partition table of the good disk. Here, /dev/sda is a completely new 250GB hard drive.

[root@raidtest ~]# fdisk -l

Disk /dev/sda: 250.0 GB, 250058268160 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdb: 8.0 GB, 8589934592 bytes
255 heads, 63 sectors/track, 1019 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14         140     1020127+  82  Linux swap / Solaris
/dev/sdb3             141        1019     7060567+  fd  Linux raid autodetect

Disk /dev/md0: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 7229 MB, 7229931520 bytes
2 heads, 4 sectors/track, 1765120 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Before you continue - are you sure everything is backed up? If so, load fdisk with the new disk as a parameter. My inputs are shown in the session below; you will have to adjust them to match your own system.

[root@raidtest ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 30401.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-30401, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-30401, default 30401): 13

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-30401, default 14): 14
Last cylinder or +size or +sizeM or +sizeK (14-30401, default 30401): 140

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (141-30401, default 141): 141
Last cylinder or +size or +sizeM or +sizeK (141-30401, default 30401): 1019

Command (m for help): a
Partition number (1-4): 1

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap / Solaris)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Here is an explanation of the procedure:

  • Create 3 primary partitions using the (n) command - the sizes are based on the info from the good and working drive.
  • Set partition #1 as bootable using the (a) command.
  • Change the partitions' system IDs using the (t) command - partition #1 and #3 to type fd, and partition #2 to type 82.
  • Save the changes to the partition table using the (w) command.
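
Alternatively, if you prefer not to recreate the partitions by hand, a common shortcut is to copy the partition table from the good disk with sfdisk. This is just a sketch - double-check which disk is which before running it, because writing to the wrong disk will destroy its partition table (and you still need to reinstall GRUB afterwards, as noted below):

[root@raidtest ~]# sfdisk -d /dev/sdb | sfdisk /dev/sda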

EDIT (2008-01-15): Reinstall the GRUB boot loader on the replacement drive as described in step #3 above.

Now the new hard drive is ready to participate in the RAID. We just need to hotadd it to the arrays using the mdadm /dev/mdx --add /dev/sdxx command:

[root@raidtest ~]# mdadm /dev/md0 --add /dev/sda1
[root@raidtest ~]# mdadm /dev/md1 --add /dev/sda3

and check the content of the /proc/mdstat file to make sure everything is working properly.



Comments

Anonymous:

This was THE best tutorial I have come across..
Great JOB.

Anonymous:

do you have to :

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)

the new disk after a fail?

adambengur:

Yes. After you remove the failed drive and install a new one instead, you need to re-install the GRUB boot loader on the new disk so the system will boot properly.
Thanks for pointing my attention towards this.

Anonymous:

wow amazing tutorial.

A huge thank you!!!!

E.

Anonymous:

Do I need to do 'Comment 2' (rebuild the grub) if I am just putting back the 2nd drive that worked but I just removed it as a test?

Anonymous:

Your tutorial was exactly what I needed for a box i am working on. One note though Fedora 7 boots automatically. No problems there :)

adambengur:

Reinstalling GRUB is not necessary if you removed the drive for testing and reinstalled it back, as the GRUB loader is already installed on that drive. But if you wanna be on the safe side - just reinstall it - no harm will be done.

Anonymous:

hi,

i already installed fedora by following your instructions. Unfortunately, i cant get past item #2 Build the raid arrays. when i entered the command chroot /mnt/sysimage to change my root an error statement keeps on showing "cannot change root directory to /mnt/sysimage; no such file or directory"

help please.

- Open Noob

adambengur:

After you enter the rescue mode you need to mount the filesystem.
Type fdisk -l and try to mount the filesystem manually:
mount /dev/hda2 -t ext3 /mnt/filesystem (you might need to mkdir /mnt/filesystem first)
Then you should be able to chroot /mnt/filesystem and continue from there.

Anonymous:

hi
i have a problem on step 2:
after resync, when i run the
command mdadm /dev/md0 --add /dev/sda1
i get the message
mdadm: Cannot open /dev/sda1: Device or resource busy

same thing on sda3

help please

adambengur:

Make sure that your BIOS is not configured on RAID - as it might confuse mdadm.
Also, make sure you start the computer on rescue mode by booting from the first installation CD and typing "linux rescue".
It is also possible the mdadm discovered the superblock and automatically loaded it, so you might wanna try to stop mdadm:
mdadm --stop /dev/md0
mdadm --stop /dev/md1

Anonymous:

hi, its great. I just do the same on centos 5.1 but i have one BIG problem: I have server with only sda and sdb will be in a few weeks. I think it is impossible to do it in diskdruid without sdb. Well. I did everything manually with mdadm, mke2fs, mkswap, swapon.. everything with missing sdb. But after reboot with install dvd diskdruid is not able to use already made mdx with fill mount point. It says Mount point: not applicable
Help, please
-how to do it with diskdruid ?
-how to do it manually and skip diskdruid in installation?
-how to do it anyway?

thanks very much, Alex

adambengur:

Hi Alex (#12), after you install the second hard drive sdb, create the partitions using fdisk, and "hotadd" the new drive to the RAID array.
So basically you need to follow step #6 above, as if you were recovering from a disk failure.
When you're done, don't forget to install GRUB on sdb as described in step #3. And I strongly suggest to do the testing when it's all done.

Anonymous:

Hi, I have no raid and no installed system. I have only one clean hard drive sda. I dont know how to build raid with diskdruid (it has no "missing disk" feature). Try to plug only sda into server and install system with raid. After installation (after diskdruid) there is no problem with degraded array. But how to use diskdruid, or how to skip diskdruid during installation?

Alex

Anonymous:

Hi Adambengur,

This is really great article with lots of GUI. I really appreciate it. I have one question. If I have to create 6-7 partitions, so when shall I create these partitions ie. at the time of RAID or after the installation is over. If after installation, can you please tell me how to create them.
Thanks in advance.

Manas

manas:

Hi,

I want to create 5 partitions. Can, I create while doing RAID or will I have to do after installation. If after installation, please tell how to do it on both hard disks. Thanks in advance.

Manas

adambengur:

Hello Manas, follow the instructions in step #1 to create the boot and swap partitions as those two are required. Then, instead of allocating all the free space to the root partition, you can create as many partitions as you want (such as /home /var etc). Create them as type Software RAID, and choose /dev/md2 /dev/md3 etc as the mount point.
I suggest that you'll do the partitions during the installation of the OS, and not after the installation. Good luck!

Anonymous:

Excellent!! Thanks for this very helpful article.

Works perfectly on CentOS 4.6.

Anonymous:

Hi Ben-Gur, good work!!!

I was installing Red Hat Enterprise Linux 5 on HP Proliant ML 150 G3 Server using your tutorial. But, I´m not sure if this guide will have the same result on hot-plug hard disks. When you say "make sure you power down the system before unplugging the drive", is it necessary for hot-plug drives? When I unplug the primary hard disk on raid 1 the system freezes. Is it possible to configure the linux to be sure that the system will stay up in this situation?

adambengur:

You should probably do a full test to make sure you can recover, before you put this system on production environment.

1. At first you should determine which drive has failed. You can find that information from /proc/mdstat and mdadm
#mdadm -D /dev/md0
(For testing purposes you can fail the drive manually by running mdadm -f)

2. Then remove all the partitions that reside on the failed drive from the RAID array.
#mdadm -r /dev/md0 /dev/sda1
It is possible that this step is done automatically.

3. Try to hot-unplug your drive while the system is running. If it doesn't freeze you can proceed with the recovery. If it does... well, you might need to reboot after all.

Then continue the recovery procedure: plug a new drive, create the partitions using fdisk, install GRUB boot loader on the new drive, and hotadd the new drive to the RAID array.

When you're done reboot and check that the new RAID is working correctly.
Good luck!

Anonymous:

Dear Adambengur,

I have one doubt. According to your tutorial, only /boot should be primary. What about / ? Should it be "Primary" or an ordinary partition. Since, while installing any OS, we keep / as "Primary" partition. Please clarify.

Thanks
Manas

adambengur:

Hi Manas. Boot partition ought to be a primary partition. The rest depends on your configuration and requirements. It's better to have only primary partitions - in terms of data recovery. Keep in mind that there is a limit to the maximum number of primary partitions you can set.
Check here: http://www.linfo.org/primary_partition.html

rajasekar:

mdadm --stop /dev/md0
mdadm --stop /dev/md1

After executing the above command a Device or resource busy error is coming. I am not able to hotadd the two RAID disks.

=======================SNAP==================================================
adambengur:

Make sure that your BIOS is not configured on RAID - as it might confiuse mdadm.
Also, make sure you start the computer on rescue mode by booting from the first installation CD and typing "linux rescue".
It is also possible the mdadm discovered the superblock and automatically loaded it, so you might wanna try to stop mdadm:
mdadm --stop /dev/md0
mdadm --stop /dev/md1

adambengur:

rajasekar, hardware RAID controllers can confuse the software RAID. Make sure that the RAID option is disabled in the BIOS and that you don't have any other RAID controller installed. If you followed the exact steps on stage 1, and you're on rescue mode (or single user mode), then you might have a hardware issue. Try to investigate using dmesg.

Anonymous:

Dear Adambengur,

I installed Centos 4.5 with 2 hard disk using RAID 1 as per your tutorial. After checking also, it seemed to work fine. I have one problem, when I do "fdisk -l" it shows the following error.

Disk /dev/md1 doesn't contain a valid partition table
Disk /dev/md2 doesn't contain a valid partition table
Disk /dev/md3 doesn't contain a valid partition table
Disk /dev/md4 doesn't contain a valid partition table
Disk /dev/md5 doesn't contain a valid partition table
Disk /dev/md6 doesn't contain a valid partition table

What has gone wrong or is this OK? Please clarify.

Thanks
Manas

[root@localhost ~]# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 fd Linux raid autodetect
/dev/sda2 14 26121 209712510 fd Linux raid autodetect
/dev/sda3 26122 39175 104856255 fd Linux raid autodetect
/dev/sda4 39176 60801 173710845 5 Extended
/dev/sda5 39176 45702 52428096 fd Linux raid autodetect
/dev/sda6 45703 52229 52428096 fd Linux raid autodetect
/dev/sda7 52230 54840 20972826 fd Linux raid autodetect
/dev/sda8 54841 57451 20972826 fd Linux raid autodetect
/dev/sda9 57452 58756 10482381 82 Linux swap
/dev/sda10 58757 60801 16426431 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 26121 209712510 fd Linux raid autodetect
/dev/sdb3 26122 39175 104856255 fd Linux raid autodetect
/dev/sdb4 39176 60801 173710845 5 Extended
/dev/sdb5 39176 45702 52428096 fd Linux raid autodetect
/dev/sdb6 45703 52229 52428096 fd Linux raid autodetect
/dev/sdb7 52230 54840 20972826 fd Linux raid autodetect
/dev/sdb8 54841 57451 20972826 fd Linux raid autodetect
/dev/sdb9 57452 58756 10482381 82 Linux swap
/dev/sdb10 58757 60801 16426431 fd Linux raid autodetect

Disk /dev/md0: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Device Boot Start End Blocks Id System

Disk /dev/md7: 16.8 GB, 16820535296 bytes
2 heads, 4 sectors/track, 4106576 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md7 doesn't contain a valid partition table

Disk /dev/md5: 21.4 GB, 21476081664 bytes
2 heads, 4 sectors/track, 5243184 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md5 doesn't contain a valid partition table

Disk /dev/md6: 21.4 GB, 21476081664 bytes
2 heads, 4 sectors/track, 5243184 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md6 doesn't contain a valid partition table

Disk /dev/md1: 53.6 GB, 53686304768 bytes
2 heads, 4 sectors/track, 13107008 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md2: 53.6 GB, 53686304768 bytes
2 heads, 4 sectors/track, 13107008 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md3: 107.3 GB, 107372675072 bytes
2 heads, 4 sectors/track, 26214032 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md3 doesn't contain a valid partition table

Disk /dev/md4: 214.7 GB, 214745481216 bytes
2 heads, 4 sectors/track, 52428096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md4 doesn't contain a valid partition table

Anonymous:

Wonderful tutorial, the best out there i'd say. Work like a breeze. At least passed the software raid phobiaz. May you be rewarded.

Farihin Fong

adambengur:

Manas (26), according to this thread http://www.clarkconnect.com/forums/showflat.php?Cat=0&Number=83483&Main=..., changes in the partition table were not saved. Try to restart the process, and make sure you're not skipping any command in fdisk.

adambengur:

Thanks Farihin Fong!

Anonymous:

Thanks Adambengur,

You were right. Changes to the partition were not saved. I think, you can include this part of partition writing in your tutorial since, many users face this problem of invalid partition.

Anonymous:

hi,
tutorial was perfect, the only prob is when I reach point 2 and I hotadd using mdadm /dev/md0 --add /dev/sda1, I get the following message:

mdadm: cannot open /dev/sda1: device or resource busy

any idea to go on?
tx
francesco

geospine:

In step 3, why are the grub commands different for drives sda and sdb? If the partitions are mirrored wont the changes be nullified anyway?

adambengur:

Hi #30. Please read my comment on #11 to overcome the device or resource busy problem.

adambengur:

Hi geospine. Software RAID is controlled by the kernel (or operating system). In stage 3 you are running in rescue mode, which does not use the same kernel, hence the RAID is actually not operational. This is why you should install GRUB on both drives.
If you install GRUB on only one drive the RAID will still be operational, but only one drive will be bootable. So if the bootable drive crashes (god forbid) then you'll have a hard time accessing your data again (although it's possible, just more actions need to be taken).

Anonymous:

Hi all. Nice tutorial and comments. Setting up RAID 1 on Fedora 7 as above, I encountered the following:

After configuring the partitions, setting time zone, host name and choosing software packages to install, my system returned the error:

"An error occurred trying to format md1. The is a serious error. Click enter to restart."

This happened several times as I restarted and tried again and again. The fix was to choose Create custom partition, then leave the partitions already created except for the RAID partitions, which I deleted and recreated. It is a mystery why this worked, but it may be that the RAID partitions were incorrectly numbered and recreating them after reboot gave them the correct numbers.

Best of luck.

David

adambengur:

Thanks for your comment David!

Anonymous:

I get the same "device or resource busy errors" and nothing seems to help. The bios is correct and I definitely booted into "Linux rescue" mode. Here's some extra data:

Run:
mdadm --stop /dev/md0

...gives the same resource busy errors

Run:
#df

/dev/md1 mounted on /
/dev/md0 mounted on /boot

Run:
#umount /dev/md1

...it works, but when I re-run df, it shows /dev/md1 still mounted.

I can't run mdadm on anything that's already mounted. How do I get them to not mount in the first place?

Don

PS - I agree, great tutorial

Anonymous:

Here's a clue on the "...device or resource busy" problem. It seems that certain chipsets result in dmraid is being loaded (fakeraid driver?). I don't quite understand it all, but this thread explains it:

http://www.brandonchecketts.com/archives/disabling-dmraid-fakeraid-on-ce...

...and yes, I have nvidia sata hardware, so it makes sense. How do I get it out of that mode?

Don

Anonymous:

Great tutorial . I was wondering if you know how to make
a second hard drive with a raid1 config boot. I followed the
steps in your tutorial and i cant get my second drive to boot
when i unplug the first one.
Do you think its posible to install grub into the second hard
drive during the instalation?

thanks a lot
Herlit11

Anonymous:

i did with fedora core 9 as your manual.i can boot with any harddisk.but half of boot process was stopping & appearing like that.

If i remove any harddisk. this error message appear like that

Checking Filesystems
/dev/md1:clean, 4543466/4546567files,464564/454354 blocks
fsck.ext3 : Invalid argument while trying to open /dev/md0 /dev/md0:
--------------------------( general text message )
--------------------------( general text message )
Give root password for maintenance
(or type Control-D to continue) :

what should i do?plz point to me.
Aung lay

adambengur:

#39 - continue with the maintenance by supplying the root password, as it seems that fsck discovered inconsistency in your filesystem.
It is possible the inconsistency was caused due to the test you made to make sure each harddisk can boot.
If maintenance went well, you might need to reinstall grub on both disks using rescue mode (boot from CD and jump to step #3 above).
In any case, if you have important data - make sure you backed up everything.

adambengur:

#38 - It seems that you missed the GRUB installation as described in step #3 above. Boot from CD, select rescue mode, and jump to step #3 above.

Anonymous:

i got still this error after i did step 3.i can boot any harddisk but half of process, i still receive this error.i can't use fsck under raid mod as u know.Let me know something that i was unplugging one harddisk then i was booting only left one harddisk.otherwiese will i connect with another next one harddisk with left raid1 harddisk.or can i run only left one harddisk? plz answer to me with ur knindness.

aung lay

Anonymous:

i got still this error after i did step 3.i can boot any harddisk but half of process, i still receive this error.i can't use fsck under raid mod as u know.Let me know something that i was unplugging one harddisk then i was booting only left one harddisk.otherwiese will i connect with another next one harddisk with left raid1 harddisk.or can i run only left one harddisk? i'm in trouble.i'm in hurry with my important data.So i need to test firstly before i save my important data into the raid 1.plz answer to me with ur knindness.

aung lay

adambengur:

Aung lay, I suspect that you missed part of a stage in the process. Make sure you follow it exactly. Make sure the partitions you make are exactly the same size on each physical disk.
Running mdadm sometimes takes time. You can monitor the progress by running "cat /proc/mdstat". Proceed only when the prior action has completed.
Make sure the RAID is working properly before doing tests on it.

Anonymous:

if i type like this command in mylinux : fdisk -l

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 fd Linux raid autodetect

/dev/sda2 14 526 3423232+ 82 Linux swap /Solaris

/dev/sda3 527 30401 239970937+ fd Linux raid autodetect

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect

/dev/sdb2 14 526 3423232+ 82 Linux swap /solaris

/dev/sdb3 527 30401 239970937 fd Linux raid autodetect.

if i type like that : cat /pro/mdmstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[1] sdb1[0]
104320 blocks [2/2] [UU]

md1 : active raid1 sdb3[2] sda3[1]
239970816 blocks [2/1] [_U]
[=====>...............] recovery = 27.2% (65495360/239970816) finish=38.1min speed=76133K/sec

unused devices:
[root@linuxsrv ~]#

if i type like that : vi /etc/fstab

/dev/md1 / ext3 defaults 1 1
/dev/md0 /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
UUID=2a3d0bfc-ff98-40a4-a40c-8231f236af44 swap swap defaults 0 0
UUID=0fc71bbd-5091-43d7-8956-bfbbb31ce83c swap swap defaults 0 0
Can i boot only left mirror hardisk.otherwise will i add another new harddisk before boot when after failure.
~plz help me with ur kindness.

adambengur:

It seems that you're not waiting for the recovery process to complete ([=====>...............] recovery = 27.2% (65495360/239970816) finish=38.1min speed=76133K/sec)
If you reboot the server during the RAID creation/recovery/rebuild process then the data will be corrupted on the second disk.
Make sure you "cat /proc/mdstat" and proceed only when the process is complete.

Anonymous:

yes.i did complete.i copied this sentence when i was recovering my raid process.plz suggest to me.

Anonymous:

Thank you for your tutorial. After removing one disk the system hangs on grub. On another machine it booted to rescue mode. The Problem is that the boot partition /dev/md0 is not started, but the data partition /dev/md1 is started.
Strange thing.
So you have to run

mdadm -Ac partitions /dev/md0 -m dev

on rescue mode or boot into rescue with cd. Then it worked again.

This happens on Fedora 9

Anonymous:

Adam,

I am following your steps and I am stuck in step two as mentioned by another user in the past.

------------- Previous post -------------

hi
i have a problem on step 2:
after resync, when i run the
command mdadm /dev/md0 --add /dev/sda1
i get the message
mdadm: Cannot open /dev/sda1: Device or resource busy

same thing on sda3

help please

12/13/2007 - 08:51
11adambengur:
Make sure that your BIOS is not configured on RAID - as it might confiuse mdadm.
Also, make sure you start the computer on rescue mode by booting from the first installation CD and typing "linux rescue".
It is also possible the mdadm discovered the superblock and automatically loaded it, so you might wanna try to stop mdadm:
mdadm --stop /dev/md0
mdadm --stop /dev/md1

-------- end of previous post ------------

I have tried the --stop command and I get the same message.

Now, I am using 2 320GB SATA II hard drives and all have been succesful up to now. Oh, except, when I issue the command "chroot /mnt/sysimage" it does not change from "sh-3.02#".

I am on Fedora Core 9.

Adam, can you please assist me with this. I am on a tight deadline.

Thank you,
Nathan.

Anonymous:

Hi Adam, and thanks for your tutorial.
I'm on FC9, and all the steps are ok, before the step 2.
In particoular, i want ask you:
What means exactly that the array is "not set as hot"? How you can see if is set as hot or no?
In wich way you chose the partition to add at the raid with the mdadm command? (mdadm /dev/md0 --add /dev/sda1 ** why sda1 and not sdb1? **)
If i run "mdadm --detail /dev/md0" i read in the State field "active" and not "clear" ... there is difference between this two state?
Thank you very much,
Mauro.

Anonymous:

i have such a problem like #39. I have FC9.
Sync is 100%, errors are the same like in #39.
Pls read this https://bugzilla.redhat.com/show_bug.cgi?id=450722
System with such a bugs without solution past few months is not good.
I think that this is the end of my adventure with Fedora.
I am IT in Labour Office, f9 is dhcp, squid+proxy, smb server and I have this problem past two weeks and now i know that i wasted my time (no solution).
pete

Anonymous:

Hello, I want to be sure on the procedure for swapping a drive, after a bad drive is physically removed and the new one installed I am not sure on how to re-install
the grub boot-loader so the system will properly boot. I did see above in #2 someone
mentioned
grub> device ......
grub> root.........
grub> setup........

Im not 100% sure what to put in these fields, below you can see my mdstat. I know I
would obviously have to put the right information under device, root and setup.

[root@ikse002 ~]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
241665664 blocks [2/2] [UU]

unused devices:

adambengur:

#56, it appears you have two drives, sda and sdb.
So to install GRUB you need to follow these commands:
grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)
and then
grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)
This will install GRUB on both drives. Check the GRUB manual: http://www.gnu.org/software/grub/manual/grub.html#Installing-GRUB-native...

Espolador:

Congratulations. It works 100%.

Fabrice:

Thank you for this usefull tutorial.
I install my new linux box with 3 HD in raid 5
All is working perfectly.

Fabrice

Francesco:

Adam,

Thank you (100 times) for this usefull tutorial.
I install my new linux box (file server via Samba in Windows LAN) with 2 HD in raid 1.
Now all is working perfectly.

For you 1 week of free holidays at my home in Volterra (Tuscany-Italy)

Best regards

Francesco Stefanelli
17, orti s. agostino
56048 - Volterra (PI)


Lars Nielsen Lind:

Thanks, for the very usefull tutorial.

I had the same problem with Device or resource busy, and I was not able to stop the array either.

For my system the trick was to go directly to the Grub part and complete it first. Then do the test with only hd1 and start the system, and afterwards vice versa with hd2 alone. Then boot the system with both hd's and the install dvd (linux rescue). Now I was able to add hd1 and hd2 to the system.

Lars Nielsen Lind:

Hi,

found this usefull article:

http://www.linuxplanet.com/linuxplanet/tutorials/6518/1/

...

You'll need to unmount all filesystems on the array before you can stop it.

To remove devices from an array, they must first be failed. You can fail a healthy device manually:

# mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2

...

len:

Beautiful job. I must have visited 50 pages that did not apply. I like google, but they have some strange way of ordering things sometimes.
A few small points.
1) You never mention that fedora 10 starts out with this lvm thing. It appears that it is necessary to remove it (them).
2) for some strange reason, i had to add root before swap on /dev/sdb.
3) that part "you already know how to do that" . If i do, that how come i had to try it about 5 different times ;-)...
Thanks again for you well write help.. 2 36 gig drive for 30 bucks, and Im runnin' raid 1 Linux. What a long strange trip its been

Anonymous:

This simply worked with no problems. I made a mirror 1 raid with Fedora 10 x86 64 bits. Works great, and you don't have to make the grub part, it booted with no problem.

Thank you very much.

Erick:

Thanks for writing this guide. I've almost got it working. After I quit the grub and reboot, A lot of info goes up the screen and it just stops. Here are the last three lines.

sdb: sdb1 sdb2 sdb3
sd 0:0:1:0: [sdb] Attached SCSI disk
sd 0:0:1:0: Attached scsi generic sg1 type 0

Note: All my partitions are set as primary partitions.

Erick:

UPDATE:

I replaced my SCSI drives with SATA drives and everything worked on the first try.
I'm not sure why my system just stopped with the following on the screen when I used the SCSI drives.

sdb: sdb1 sdb2 sdb3
sd 0:0:1:0: [sdb] Attached SCSI disk
sd 0:0:1:0: Attached scsi generic sg1 type 0

Note: All my partitions are set as primary partitions.

Colin:

Thanks, for the very very good tutorial.
Worked perfectly on my Fedora 10 box once I found out my Abit NF2 mobo had a Bios bug which kept corrupting a drive. Changing EXT-P2P's Discard Time = 1ms in the Bios fixed it.

Jack Crow:

Adam -

I have 5 SCSI disks installed (sda, sdb, sdc, sdd, sde).
I have /boot, /, and /usr partitions on sdc & sde (they are mirrored, Linux soft-raid1). /boot is the 1st partition, / is the 2nd partition, and /usr is the 3rd - all are primary. So /dev/md0 is /boot (sdc1 & sde1), /dev/md1 is / (sdc2 & sde2), and /dev/md2 is /usr (sdc3 & sde3). My device.map file has hd2 & hd4 listed as boot drives - which is correct, however, system fails to boot saying 'image not found' ... I have run GRUB's device/root/setup for hd2 & hd4 (partition 0) ...
sda - /home (not mirrored)
sdb - /var & /tmp ... mirrored to sdd ... /dev/md3,md4
sdc - /boot, /, /usr ... mirrored to sde, /dev/md0, md1, & md2

I am at a loss on to why it still won't boot - any ideas?
Could it be my grub.conf file? Image file and kernel exist in the /boot/grub directory like they should ...

sda may also have 'grub' on it ... could that be the problem?

adambengur:

Hi Jack Crow, your problem might be because your GRUB does not point to an existing kernel or initrd image. It depends on your active partition and boot drive. Make sure you setup GRUB on the correct drives, if I understand your configuration then you should install GRUB on sdc and sde (see step #3 above). Make sure sdc is set active.

Some mean people put this blog under a robot that submits spam comments, sometimes more than 100 spam posts a day. I am forced to disable commenting to this blog... Sorry guys...