ZFS Drive Replacement
My home storage server has been a source of annoyance for a few months now. I had upgraded it from an Intel Atom board to an E5500 and ran into major stability issues caused by bad RAM and a bad motherboard. After finally getting it stable, I learned that one of the 2TB drives in my RAIDZ pool had started reporting a slightly smaller size, which made it unable to participate in the pool. Luckily, the drive was still under warranty, and replacing it is a stupidly easy process, which I’ve decided to document here.
[chip@sumo ~]$ zpool status
  pool: storage
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:
        NAME                STATE     READ WRITE CKSUM
        storage             DEGRADED     0     0     0
          raidz1            DEGRADED     0     0     0
            label/2TBdisk1  UNAVAIL      0     0     0  cannot open
            label/2TBdisk2  ONLINE       0     0     0
            label/2TBdisk3  ONLINE       0     0     0
errors: No known data errors
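If you would rather not eyeball the status table, the failed member can be picked out with a small filter. This is only a sketch: failed_devices is a name I made up, and it assumes the column layout shown in the output above (device name first, state second).

```shell
#!/bin/sh
# failed_devices (hypothetical helper): read `zpool status` output on stdin
# and print the name of every device whose STATE column marks it unusable.
# NF >= 5 skips the pool-level "state:" line, which has only two fields.
failed_devices() {
    awk 'NF >= 5 && $2 ~ /^(UNAVAIL|FAULTED|OFFLINE|REMOVED)$/ { print $1 }'
}
```

Fed the output above with `zpool status storage | failed_devices`, it should print label/2TBdisk1. For a quick yes/no check, `zpool status -x` only reports pools that have problems.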
As you can see, 2TBdisk1 has failed and is unavailable. Since these SATA controllers support hot-swap, I just hooked up the new drive while the box was running. FreeBSD didn’t automatically detect the new drive, so I had to tell it to rescan the SATA channel. Normally I would expect atacontrol reinit to do this, but I ended up having to detach and reattach the appropriate ATA channel (ata2 here) to get it to see the new drive.
[chip@sumo ~]$ sudo atacontrol detach ata2
[chip@sumo ~]$ sudo atacontrol attach ata2
Master: ad4 < SAMSUNG HD204UI/1AQ10001 > SATA revision 2.x
Slave: no device present
With the new drive recognized, I applied a GEOM label to it so the pool can refer to the disk by name rather than by device node, then replaced the failed drive in the ZFS pool.
[chip@sumo ~]$ sudo glabel label 2TBdisk4 /dev/ad4
[chip@sumo ~]$ sudo zpool replace storage label/2TBdisk1 label/2TBdisk4
[chip@sumo ~]$ zpool status
  pool: storage
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.12% done, 4h40m to go
config:
        NAME                  STATE     READ WRITE CKSUM
        storage               DEGRADED     0     0     0
          raidz1              DEGRADED     0     0     0
            replacing         DEGRADED     0     0     0
              label/2TBdisk1  UNAVAIL      0     0     0  cannot open
              label/2TBdisk4  ONLINE       0     0     0  554M resilvered
            label/2TBdisk2    ONLINE       0     0     0
            label/2TBdisk3    ONLINE       0     0     0
errors: No known data errors
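Resilvering took hours, so it can be handy to pull just the progress figure out of the status output. Here is a minimal sketch; resilver_percent is a hypothetical helper, and it assumes the single-line "scrub:" format this ZFS version prints. As far as I know, newer ZFS releases report progress on a "scan:" line with a different layout, so the pattern would need adjusting there.

```shell
#!/bin/sh
# resilver_percent (hypothetical helper): read `zpool status` output on stdin
# and print the completion percentage from a line such as
#   scrub: resilver in progress for 0h0m, 0.12% done, 4h40m to go
# Prints nothing when no resilver is in progress.
resilver_percent() {
    sed -n 's/.*resilver in progress.* \([0-9.][0-9.]*\)% done.*/\1/p'
}
```

Usage would look like `zpool status storage | resilver_percent`, perhaps inside a loop with a sleep between checks.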
Once the resilver finished, zpool status went back to normal:
[chip@sumo ~]$ zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 5h49m with 0 errors on Fri Jul 1 23:23:08 2011
config:
        NAME                STATE     READ WRITE CKSUM
        storage             ONLINE       0     0     0
          raidz1            ONLINE       0     0     0
            label/2TBdisk4  ONLINE       0     0     0  437G resilvered
            label/2TBdisk2  ONLINE       0     0     0
            label/2TBdisk3  ONLINE       0     0     0
errors: No known data errors
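For next time, the label-and-replace step could be wrapped in a small function. This is a sketch rather than something I ran as-is: replace_disk and the RUN variable are made-up names, and by default it only prints the two commands so they can be checked before anything touches the pool.

```shell
#!/bin/sh
# replace_disk POOL OLD_LABEL NEW_LABEL DEVICE   (hypothetical helper)
# Prints the glabel and zpool commands for swapping a failed pool member.
# Set RUN=1 to execute them instead of printing (needs root privileges).
replace_disk() {
    pool=$1 old=$2 new=$3 dev=$4
    if [ "${RUN:-0}" = 1 ]; then
        run=          # RUN=1: execute the commands for real
    else
        run=echo      # default: dry run, only print the commands
    fi
    # Label the new disk first; only replace if labeling succeeded.
    $run glabel label "$new" "$dev" &&
        $run zpool replace "$pool" "label/$old" "label/$new"
}
```

Calling `replace_disk storage 2TBdisk1 2TBdisk4 /dev/ad4` prints the same two commands used in this post.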
Overall, a remarkably painless process, and all without taking the machine offline!
