SATA Hot-Swap in Linux

I use the SATA ports on my motherboard and Linux software RAID (mdadm) for my main RAID5 storage array.  The five drives in the array sit in a SATA hot-swap enclosure, which lets you power each drive on or off individually so that it can be swapped easily.  I wasn’t sure how to tell the OS to do the same thing.  Here are the steps that must be performed:

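  1. Remove the failed drive from your RAID array.  For example, issue the command mdadm -r.  If mdadm has not already marked the drive as failed, mark it first with mdadm -f; an active member cannot be removed.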
  2. Remove the failed drive from the SCSI subsystem.  SATA drives are handled by the SCSI subsystem, which is why they show up as /dev/sdxx.  This is the step I was unsure of; it is discussed in more detail below.
  3. Power the failed drive off.
  4. Physically remove the failed drive, and replace it with a new one.
  5. Power on the new drive.
  6. Insert the new drive into the SCSI subsystem.
  7. Recreate the partition setup that was on the failed drive.
  8. Add the new drive to your RAID array.  For example, issue the command mdadm -a.

Removing a drive from the SCSI subsystem and inserting one into it (steps 2 and 6 above) were the steps that I was unsure of.  It turns out that the scsiadd command, which is conveniently available in the Ubuntu/Debian scsiadd package, is just what the doctor ordered.  First, issue scsiadd -p to print out the attached devices.  Here is a sample of the output from my server:

user@localhost:~$ scsiadd -p
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: ST3750330AS      Rev: SD15
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: ST3750330AS      Rev: SD15
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: ST3750330AS      Rev: SD15
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: ST3750330AS      Rev: SD04
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: ST3750330AS      Rev: SD15
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
Vendor: TSSTcorp Model: CDDVDW SH-S203B  Rev: SB02
Type:   CD-ROM                           ANSI  SCSI revision: 05
Host: scsi8 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: WDC WD5000AAJS-2 Rev: 12.0
Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi9 Channel: 00 Id: 00 Lun: 00
Vendor: ATA      Model: WDC WD5000AAJS-0 Rev: 01.0
Type:   Direct-Access                    ANSI  SCSI revision: 05

To remove the last drive listed, I would issue the command scsiadd -r 9 0 0 0.  The four numbers in the command are the SCSI host, channel, id, and lun from the output, i.e., “Host: scsi9 Channel: 00 Id: 00 Lun: 00”.  At this point the drive can be powered down and removed.
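To avoid copying the four numbers by hand, the "Host:" line can be turned into scsiadd arguments with a small shell helper.  This is only a sketch: the function name scsi_addr is my own invention, and it assumes the line has exactly the layout shown in the output above.

```shell
# Hypothetical helper: convert one "Host:" line from `scsiadd -p`
# into the "host channel id lun" arguments that scsiadd expects.
scsi_addr() {
    printf '%s\n' "$1" | awk '{
        sub(/^scsi/, "", $2)                  # "scsi9" -> "9"
        print $2 + 0, $4 + 0, $6 + 0, $8 + 0  # "+ 0" drops leading zeros
    }'
}

scsi_addr "Host: scsi9 Channel: 00 Id: 00 Lun: 00"   # prints: 9 0 0 0
```

The result could then be passed along unquoted, for example scsiadd -r $(scsi_addr "Host: scsi9 Channel: 00 Id: 00 Lun: 00"), so that the shell splits it into four separate arguments.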

After replacing the drive, you must issue two commands to insert it into the SCSI subsystem.  First, issue scsiadd -a 9 0 0 0.  Second, issue scsiadd -s.  At this point you should be able to edit the partition table and add the drive to your RAID array.  I recommend trying this procedure out when you first assemble a machine, so that you don’t have to learn on the fly when a failure occurs.
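For practice runs, the eight steps can be strung together in a script that only prints the commands by default.  Treat this as a rough sketch rather than a tested recovery tool: the device names (/dev/md0, /dev/sdf, /dev/sda) are placeholders I made up, the SCSI address is the one from the example above, and copying the partition table with sfdisk assumes that every member of the array uses the same layout.

```shell
#!/bin/sh
# Dry-run sketch of the hot-swap procedure.  All device names below are
# hypothetical placeholders; adjust them to your own system.
ARRAY=/dev/md0          # assumed md array
DISK=/dev/sdf           # assumed failed disk
PART=/dev/sdf1          # assumed failed member partition
GOOD=/dev/sda           # assumed healthy member with the same layout
ADDR="9 0 0 0"          # host channel id lun, taken from `scsiadd -p`
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

hot_swap() {
    run mdadm "$ARRAY" --fail "$PART"      # step 1: mark as failed
    run mdadm "$ARRAY" --remove "$PART"    # step 1: remove from the array
    run scsiadd -r $ADDR                   # step 2 ($ADDR unquoted on purpose)
    echo "# steps 3-5: power off, swap the drive, power on"
    [ "$DRY_RUN" = 1 ] || { printf 'Press Enter when done: '; read -r _; }
    run scsiadd -a $ADDR                   # step 6: insert the new drive
    run scsiadd -s                         # step 6: scan
    run sh -c "sfdisk -d $GOOD | sfdisk $DISK"   # step 7: copy partition table
    run mdadm "$ARRAY" --add "$PART"       # step 8: add to the array
}

hot_swap
```

Run as is, it lists each command prefixed with "+".  Setting DRY_RUN=0 executes them for real, so check first that the replacement drive actually came back under the same device name.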
