
ZFS Drive Replacement

My home storage server has been a source of annoyance for a few months now. I had upgraded it from an Intel Atom board to an E5500 and had some major stability issues involving bad RAM and a bad motherboard. After finally getting it stable, I discovered that one of the 2TB drives in my RAIDZ pool had started reporting a slightly smaller size, which left it unable to participate in the pool. Luckily, the drive was still under warranty, and replacing it in the pool is a stupidly easy process, which I've decided to document here.

[chip@sumo ~]$ zpool status
  pool: storage
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        storage             DEGRADED     0     0     0
          raidz1            DEGRADED     0     0     0
            label/2TBdisk1  UNAVAIL      0     0     0  cannot open
            label/2TBdisk2  ONLINE       0     0     0
            label/2TBdisk3  ONLINE       0     0     0

errors: No known data errors
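With only three disks the failed member is easy to spot, but in a larger pool it can help to pull it out of the status table mechanically. The following is a minimal sketch, assuming the column layout shown above (device name in the first column, state in the second); the function name `unavail_devices` is hypothetical and not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print the
# name of every vdev whose STATE column reads UNAVAIL.
# Assumes the NAME/STATE column layout shown in the output above.
unavail_devices() {
    awk '$2 == "UNAVAIL" { print $1 }'
}

# Example usage on a live system (requires ZFS):
#   zpool status storage | unavail_devices
```

Fed the status output above, this would print `label/2TBdisk1`. For a quick yes/no health check, `zpool status -x` only reports pools that have problems.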

As you can see, 2TBdisk1 has failed and is unavailable. Since these SATA controllers support hot-swap, I just hooked up the new drive while the box was running. FreeBSD didn't automatically detect the new drive, so I had to tell it to rescan the SATA bus. Normally I would expect atacontrol reinit to do this, but I ended up having to detach and reattach the appropriate ATA channel before it would see the new drive.

[chip@sumo ~]$ sudo atacontrol detach ata2
[chip@sumo ~]$ sudo atacontrol attach ata2
Master:  ad4 < SAMSUNG HD204UI/1AQ10001 > SATA revision 2.x
Slave:       no device present

With the new drive recognized, I applied a GEOM label to it and then replaced the failed drive in the ZFS pool.

[chip@sumo ~]$ sudo glabel label 2TBdisk4 /dev/ad4
[chip@sumo ~]$ sudo zpool replace storage label/2TBdisk1 label/2TBdisk4
[chip@sumo ~]$ zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.12% done, 4h40m to go
config:

        NAME                  STATE     READ WRITE CKSUM
        storage               DEGRADED     0     0     0
          raidz1              DEGRADED     0     0     0
            replacing         DEGRADED     0     0     0
              label/2TBdisk1  UNAVAIL      0     0     0  cannot open
              label/2TBdisk4  ONLINE       0     0     0  554M resilvered
            label/2TBdisk2    ONLINE       0     0     0
            label/2TBdisk3    ONLINE       0     0     0

errors: No known data errors
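A resilver of this size takes hours, so rather than re-running zpool status by hand you could poll it until it finishes. Below is a minimal sketch, assuming the status text contains the phrase `resilver in progress` for as long as the resilver runs, as in the output above; the function names and the 600-second default interval are arbitrary choices for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) if the `zpool status` text on stdin reports a
# resilver in progress; fails otherwise.
resilver_running() {
    grep -q "resilver in progress"
}

# Hypothetical helper: block until the named pool has finished
# resilvering, re-checking every $2 seconds (default 600).
wait_for_resilver() {
    pool="$1"
    interval="${2:-600}"
    while zpool status "$pool" | resilver_running; do
        sleep "$interval"
    done
    # Show the final state so the outcome of the resilver is visible.
    zpool status "$pool"
}

# Example usage on a live system (requires ZFS):
#   wait_for_resilver storage 600
```

If your ZFS version is a newer OpenZFS release, `zpool wait -t resilver` does the same job natively.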

Once the resilver finished, zpool status was back to normal:

[chip@sumo ~]$ zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 5h49m with 0 errors on Fri Jul  1 23:23:08 2011
config:

        NAME                STATE     READ WRITE CKSUM
        storage             ONLINE       0     0     0
          raidz1            ONLINE       0     0     0
            label/2TBdisk4  ONLINE       0     0     0  437G resilvered
            label/2TBdisk2  ONLINE       0     0     0
            label/2TBdisk3  ONLINE       0     0     0

errors: No known data errors

Overall, a remarkably painless process, and all without taking the machine offline!
