Pedro's blog: db2, mysql, php, linux and performance


Linux RAID1 faulty spare and a dead hard disk
21/04/2012, 9:09
Filed under: General, GNU/Linux, Hardware

Yesterday we had a surprise. Something happened, take a look: [image: Linux faulty spare RAID1]

This is the IO service time graph by day. Around nine in the morning you can see the yellow and orange lines (sdb write and read stats) split away from the blue and green lines (sda write and read stats). The hard disk at /dev/sdb failed around twenty past nine, and we did not find out until seven pm.

The mdadm command reported a "faulty spare". The output was (command: mdadm --misc --detail /dev/md2):

/dev/md2:
Version : 0.90
Creation Time : Thu Dec 16 11:16:09 2010
Raid Level : raid1
Array Size : 1454122944 (1386.76 GiB 1489.02 GB)
Used Dev Size : 1454122944 (1386.76 GiB 1489.02 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Thu Apr 19 20:13:58 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
UUID : 36c6fe3e:6fbcc6a0:a4d2adc2:26fd5302
Events : 0.108754
Number   Major   Minor   RaidDevice State
0       8        2        0      active sync   /dev/sda2
1       0        0        1      removed
2       8       18        -      faulty spare /dev/sdb2
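
Since it took us almost ten hours to notice, a periodic check would have helped. Here is a minimal sketch of one, assuming the /proc/mdstat layout shown later in this post, where a degraded array has a "_" inside its status brackets (like [U_]). The function name degraded_arrays is made up for this example, and the mail line in the comment is only a hypothetical alerting step. mdadm also has its own monitor mode (mdadm --monitor), which is probably the better choice in production.

```shell
# Print the name of every md array whose status brackets contain "_"
# (a missing member), for example [U_].
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                    # remember the current array
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }   # status line with a gap
    ' "${1:-/proc/mdstat}"
}

# Hypothetical use from cron:
# [ -n "$(degraded_arrays)" ] && degraded_arrays | mail -s "md array degraded" root
```

Run against the /proc/mdstat contents from the rebuild further down, it would list md2 and skip md1.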

So we opened a ticket with the hosting team, and they took about 10 minutes to change the disk. Awesome support! Then we copied the partition table to the new disk and rebuilt the RAID1 arrays:

# sfdisk -d /dev/sda | sfdisk -f /dev/sdb

# mdadm /dev/md1 --manage --add /dev/sdb1
mdadm: added /dev/sdb1

# mdadm /dev/md2 --manage --add /dev/sdb2
mdadm: added /dev/sdb2

# mdadm --misc --detail /dev/md1 | grep sdb
1       8       17        1      active sync   /dev/sdb1
# mdadm --misc --detail /dev/md2 | grep sdb
2       8       18        1      spare rebuilding   /dev/sdb2

Syncing:
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sdb1[1] sda1[0]
10485696 blocks [2/2] [UU]
md2 : active raid1 sdb2[2] sda2[0]
1454122944 blocks [2/1] [U_]
[>....................]  recovery =  0.1% (2138624/1454122944) finish=3913.2min speed=6183K/sec
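
The finish estimate is easy to sanity-check from the other numbers on that line: remaining blocks divided by the speed. A small sketch, assuming the blocks are 1K units and the speed is in K/sec as printed; resync_minutes is just a helper name invented here, not an mdadm tool.

```shell
# Estimate the remaining resync time in minutes from a /proc/mdstat
# recovery line: (total blocks - blocks done) / speed / 60.
resync_minutes() {
    # $1 = blocks done, $2 = total blocks, $3 = speed in K/sec
    awk -v fin="$1" -v total="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - fin) / speed / 60 }'
}

# Numbers taken from the recovery line above
resync_minutes 2138624 1454122944 6183
```

That comes out at roughly 3914 minutes, close to the 3913.2min the kernel reported (the kernel averages the speed a bit differently), so around 65 hours for this array.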

And then swap!
# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       526236  0       -1
# mkswap /dev/sdb3
Setting up swapspace version 1, size = 538865 kB
# swapon -a
# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       526236  0       -1
/dev/sdb3                               partition       526236  0       -2
[root@ns24862 ~]# free
total       used       free     shared    buffers     cached
Mem:      12318872   12069548     249324          0     507552    9107340
-/+ buffers/cache:    2454656    9864216
Swap:      1052472          0    1052472
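
One note on swapon -a: it only activates swap areas that are listed in /etc/fstab, so /dev/sdb3 showing up above means it was already listed there. A quick sketch to check that beforehand, assuming fstab names the partition by device path (if it uses UUID= or LABEL=, the entry would need updating after mkswap instead). The name has_swap_entry is made up for this example.

```shell
# Succeed if the given device has a swap entry in an fstab-style file.
has_swap_entry() {
    # $1 = device (for example /dev/sdb3), $2 = fstab path (default /etc/fstab)
    awk -v dev="$1" '
        $1 == dev && $3 == "swap" { found = 1 }   # field 3 is the fs type
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/fstab}"
}

# Hypothetical use:
# has_swap_entry /dev/sdb3 || echo "/dev/sdb3 is not in /etc/fstab"
```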

After a long while, everything was working again like a charm!