I use the SATA ports on my motherboard and Linux software RAID (mdadm) for my main RAID5 storage array. The five drives in the array sit in a SATA hot-swap enclosure, which lets you power each drive on or off individually so that it can be swapped easily. What I wasn’t sure of was how to tell the OS that a drive is about to be pulled or has just been inserted. Here are the steps that must be performed:
- Remove the failed drive from your RAID array. For example, issue the command mdadm -r.
- Remove the failed drive from the SCSI subsystem. SATA drives are handled by the SCSI subsystem, which is why they show up as /dev/sdX. This is the step that I was unsure of; it is discussed in more detail below.
- Power the failed drive off.
- Physically remove the failed drive, and replace it with a new one.
- Power on the new drive.
- Insert the new drive into the SCSI subsystem.
- Recreate the partition setup that was on the failed drive.
- Add the drive back to your RAID array. For example, issue the command mdadm -a.
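As a rough sketch, the whole sequence can be written down as a dry-run helper that only prints the commands instead of running them. Everything specific here is an assumption for illustration: the function name print_swap_plan, the array /dev/md0, the failed disk /dev/sdh with partition 1, the healthy member /dev/sda used as a partition template, and the SCSI address 9 0 0 0. Copying the partition table with sfdisk -d piped into sfdisk is one common way to recreate the layout, not the only one, and the sketch assumes the replacement drive comes back under the same device name.

```shell
# Print (do not run) the command sequence for swapping one RAID member.
# All values passed in the example call below are hypothetical.
print_swap_plan() {
    array=$1      # md array, e.g. /dev/md0
    disk=$2       # whole-disk node of the failed drive, e.g. /dev/sdh
    part=$3       # partition number used by the array, e.g. 1
    template=$4   # healthy member whose partition layout is copied
    host=$5; channel=$6; id=$7; lun=$8   # SCSI address of the failed drive

    printf '%s\n' "mdadm $array -r $disk$part"
    printf '%s\n' "scsiadd -r $host $channel $id $lun"
    printf '%s\n' "# power off the bay, swap the drive, power the bay back on"
    printf '%s\n' "scsiadd -a $host $channel $id $lun"
    printf '%s\n' "scsiadd -s"
    printf '%s\n' "sfdisk -d $template | sfdisk $disk"
    printf '%s\n' "mdadm $array -a $disk$part"
}

# Example call: prints the plan for review, nothing is executed.
print_swap_plan /dev/md0 /dev/sdh 1 /dev/sda 9 0 0 0
```

Printing the plan first makes it easy to double-check device names before touching a degraded array. Note that mdadm -r only removes a member that md already considers faulty; a drive that has not been marked as failed has to be failed first with mdadm -f.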
Removing a drive from the SCSI subsystem and inserting a new one into it were the steps that I was unsure of (steps 2 and 7 above). It turns out that the scsiadd command, which is conveniently available in the Ubuntu/Debian scsiadd package, is just what the doctor ordered. First, issue scsiadd -p to print out the attached devices. Here is a sample of the output from my server:
user@localhost:~$ scsiadd -p
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3750330AS Rev: SD15
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3750330AS Rev: SD15
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3750330AS Rev: SD15
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3750330AS Rev: SD04
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3750330AS Rev: SD15
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
Vendor: TSSTcorp Model: CDDVDW SH-S203B Rev: SB02
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi8 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD5000AAJS-2 Rev: 12.0
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi9 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD5000AAJS-0 Rev: 01.0
Type: Direct-Access ANSI SCSI revision: 05
To remove the last drive listed, I would issue the command scsiadd -r 9 0 0 0. The four numbers in the command correspond to the SCSI host, channel, ID, and LUN in the output, i.e., “Host: scsi9 Channel: 00 Id: 00 Lun: 00”. At this point the drive can be powered down and removed.
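Reading the four numbers off the listing by hand is easy to get wrong when several drives share a model number, so here is a small sketch that does the lookup with awk. The function name hcil_for_model is made up for this example, and it assumes the listing has the same "Host:" / "Vendor: ... Model:" layout as the sample output above. It reads a listing on stdin and prints one "host channel id lun" line for every device whose model starts with the given string.

```shell
# Print "host channel id lun" for each device in a scsiadd -p style listing
# (read from stdin) whose Model field starts with the string given in $1.
hcil_for_model() {
    awk -v model="$1" '
        $1 == "Host:" {
            host = $2
            sub(/^scsi/, "", host)            # "scsi9" -> "9"
            # Adding 0 turns "00" into 0, matching the form scsiadd takes.
            addr = host " " ($4 + 0) " " ($6 + 0) " " ($8 + 0)
        }
        $1 == "Vendor:" && index($0, "Model: " model) { print addr }
    '
}

# Demonstration on two entries copied from the sample output.
hcil_for_model 'WDC WD5000AAJS-0' <<'EOF'
Host: scsi8 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD5000AAJS-2 Rev: 12.0
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi9 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD5000AAJS-0 Rev: 01.0
Type: Direct-Access ANSI SCSI revision: 05
EOF
```

For the two entries above this prints 9 0 0 0, which is exactly the argument list used with scsiadd -r. On a real machine the input would come from scsiadd -p instead of the here-document. A model string shared by several drives yields several lines, so the result still needs a human check before anything is removed.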
After replacing the drive and powering it on, you must issue two commands to insert it into the SCSI subsystem. First, issue scsiadd -a 9 0 0 0. Second, issue scsiadd -s. At this point you should be able to edit the partition table and add the drive back to your RAID array. I recommend trying this procedure out when you first assemble a machine, so that you don’t have to learn it on the fly when a failure occurs.
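Before partitioning the replacement, it is worth confirming that the host actually shows up in the device listing again. The following is a minimal sketch of such a check; host_present is a name invented for this example, and it assumes the "Host: scsiN" lines look like the sample output shown earlier.

```shell
# Return success if the given SCSI host number appears in a device listing
# read from stdin (the same format that scsiadd -p prints).
host_present() {
    awk -v h="scsi$1" '
        $1 == "Host:" && $2 == h { found = 1 }
        END { exit !found }
    '
}

# Demonstration on a canned listing; on a real machine the input would be
# the output of scsiadd -p instead of this here-document.
if host_present 9 <<'EOF'
Host: scsi8 Channel: 00 Id: 00 Lun: 00
Host: scsi9 Channel: 00 Id: 00 Lun: 00
EOF
then
    echo "host 9 is attached"
else
    echo "host 9 is missing"
fi
```

The comparison is on the whole field, so asking for host 1 does not accidentally match scsi10.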