

HTPC / Backup Home Server Solution using Linux


 

RELIABILITY AND AVAILABILITY TESTS

Performance tests are one thing, but if a setup like the one we have here doesn't stay up and is prone to crashes and data loss, performance will mean little.  In this section, we'll put the array to the test by subjecting it to various scenarios that attempt to simulate real-world events that could result in data loss or destroy the array.

TEST 1

Fill the storage up to its maximum possible capacity.  For this, we will write a function to auto-fill the RAID6 array at its maximum speed.  At the end of this table is the script to do just that.

An alternate way to do the same is to copy a file repeatedly (be sure to copy the first sample, fill.dat, to the storage array first, as the loop assumes the file resides on the storage):

for (( fcp = 0; fcp < 5000; fcp++ )); do cp -i fill.dat fill.$fcp.dat; done

NOTE: Either way works.  The second method would not incur a read penalty, since the source file would be cached across the repeated copies anyway.
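For reference, a minimal sketch of such a fill loop (not the exact script, which appears at the end of this table; this one assumes dd from /dev/zero writing 4GB chunks until the FS fills and dd errors out):

fill=0
while dd if=/dev/zero of=fill.$fill.dat bs=1M count=4096 2>/dev/null; do
    fill=$((fill + 1))
done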

 

PASS: RAID6 array filled to capacity.  No errors.

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       1907288   1907288         1 100% /mnt/MBPCBackupx
#

 

# xfs_db -c frag -r /dev/MBPCStorage/MBPCBackup
actual 59209, ideal 29835, fragmentation factor 49.61%
#

smartctl quick check:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdf: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
#

 

While filling the FS using the AWK script, the task would run to 100% capacity, but the moment the "not enough disk space" message was encountered, usage would instantly revert to 85%.  This was puzzling; however, it may be fixed by adding this option to the XFS mount command:


# mount -o inode64 /dev/MBPCStorage/MBPCBackup /mnt/MBPCBackupx

This would allow XFS to allocate inodes beyond the first 1TB of the data space. (I have not tested this option, as filling the FS using the KSH script worked without that discrepancy.)
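If the option helps, it can be made persistent via /etc/fstab (a sketch, assuming the device and mount point used throughout this post):

/dev/MBPCStorage/MBPCBackup   /mnt/MBPCBackupx   xfs   defaults,inode64   0 0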

TEST 2

Delete all the test files created in TEST 1.

 

PASS: Delete was instantaneous.
TEST 3

STEP 1: Fill the array up with copies of a binary file.  (Linux ISOs, for instance, would be a good choice.)  Use the second script from TEST 1.

for (( fcp = 0; fcp < 10000; fcp++ )); do cp -i test.dat $fcp.dat; done

PASS:  The array filled with no errors or issues:

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       1907288   1907288         1 100% /mnt/MBPCBackupx
You have new mail in /var/spool/mail/root
#

 

STEP 2: Grow the LVM, and the XFS filesystem on top of it, to 3TB from the 2TB we set up above (or on the first page).

PASS:

Grow the LVM by 50% of the free space, to 3TB (I used a one-liner combining two commands):

#  lvm lvextend -L+$(lvm vgs --units S|awk '{ if ( $1 ~ /MBPCStorage/ ) print $7 / (100 / 50); }')S /dev/MBPCStorage/MBPCBackup
  Extending logical volume MBPCBackup to 2.73 TiB
  Logical volume MBPCBackup successfully resized
#

# lvm lvdisplay  /dev/MBPCStorage/MBPCBackup --units G
  --- Logical volume ---
  LV Name                /dev/MBPCStorage/MBPCBackup
  VG Name                MBPCStorage
  LV UUID                k6dLRW-BUht-tm0n-9GU8-e6ma-M0qW-wJ4TiD
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3000.41 GB
  Current LE             715353
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     32768
  Block device           253:6
  
#
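As an aside, LVM2 also accepts percentage arguments directly, which would reduce the one-liner above to something like the following (shown as an untested alternative, not what was run here):

# lvm lvextend -l +50%FREE /dev/MBPCStorage/MBPCBackup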

I then used this command to extend the XFS filesystem (the XFS FS is still at 100% usage at this point).  Note that an XFS filesystem is extended while mounted:

# xfs_growfs /mnt/MBPCBackupx/
meta-data=/dev/mapper/MBPCStorage-MBPCBackup isize=256    agcount=64, agsize=7629408 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=488282112, imaxpct=5
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 488282112 to 732521472
# cd MBPCBackupx/
# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   1907290    954059  67% /mnt/MBPCBackupx
# xfs_growfs -n /mnt/MBPCBackupx/
meta-data=/dev/mapper/MBPCStorage-MBPCBackup isize=256    agcount=97, agsize=7629408 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732521472, imaxpct=5
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
#

 

STEP 3: Refill the array using the method from TEST 1. (We will reuse the second script in this case.)

PASS

# cd /mnt/MBPCBackupx

Then start from the last file the system could write, 4838 (File # 4839):

# for (( fcp = 4839; fcp < 10000; fcp++ )); do cp -i test.dat $fcp.dat; done

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2861348         1 100% /mnt/MBPCBackupx
#

TEST 4

Remove two disks and reinsert them in swapped order (without writing, reading, or resyncing in between) while the array is quiet.  Here, we want to see the rebuild time and statistics in combination with the --bitmap=internal option.
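(For reference: our array already has an internal write-intent bitmap, but on an existing array without one, it can be added after the fact with mdadm's grow mode:)

# mdadm --grow --bitmap=internal /dev/md0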

PASS

For this test we'll select two disks arbitrarily, but first we get their serial numbers so we can identify which disks to pull from the outside (use hdparm -i /dev/rsda, for example, to get that):

/dev/rsda  (9VX0X9TA) – /dev/sdc
/dev/rsdd  (9VX0WJKA) – /dev/sdf

 

First we'll spin the platters down and effectively shut the disks down:

echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sdf/device/delete

Allow some time for the disks to spin down.  Check using the command below to see that the two disks are in fact now unavailable in the array:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[4] sdc[0](F) sde[1] sdd[2] sdf[3](F) sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#
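(Side note, an assumption on my part rather than something exercised in this test: a disk deleted this way can usually be brought back without reseating any cables by forcing a SCSI host rescan:)

# for host in /sys/class/scsi_host/host*; do echo "- - -" > $host/scan; done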

The array should still be functioning fine.  Delete one of the files that was copied and copy another in its place as a quick test while the disks are gone:

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2861348         1 100% /mnt/MBPCBackupx
#

# ls -al 99[8-9]*
-rwxr--r-x. 1 root root 413368320 Apr  9 00:50 998.dat
-rwxr--r-x. 1 root root 413368320 Apr 10 21:46 999.dat
#

Remove the drives that were turned off, plug them back in in each other's slots, then check the array:

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 10 21:54:39 2012
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3991

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       0        0        3      removed
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda

       0       8       32        -      faulty spare
       3       8       80        -      faulty spare
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i,j}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdc: No such file or directory
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdj: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
/dev/sdh: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdi: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
#

From the above we can see which ones are marked as faulty (F) and how many are active in the array (6/4).  The drives are back, but now as sdh and sdi.  However, our UDEV rules still create the original links pointing to the drives with the above serial numbers, irrespective of the names (sdi, sdh, sdf, sdc, etc.) given by the system:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdf -> sda
#
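(The actual rules live on the first page of this post; an illustrative sketch of such a rule, keyed on the serial number udev exposes as ID_SERIAL_SHORT, might look like this:)

# /etc/udev/rules.d/10-raid-disks.rules (sketch only, not the exact rules used here)
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="9VX0X9TA", SYMLINK+="rsda"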

Now that the drives are plugged in again, we need to tell the system to add them back to the array:

# mdadm --add /dev/raidmd0 /dev/rsda
mdadm: /dev/rsda reports being an active member for /dev/raidmd0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/rsda in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/rsda" first.
# mdadm --add /dev/raidmd0 /dev/rsdd
mdadm: /dev/rsdd reports being an active member for /dev/raidmd0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/rsdd in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/rsdd" first.
#

We can't use the following either, as the devices are no longer listed under /dev/:

# mdadm --manage /dev/raidmd0 --fail /dev/sdf
mdadm: cannot find /dev/sdf: No such file or directory
# mdadm --manage /dev/raidmd0 --fail /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory
# mdadm --manage /dev/raidmd0 --remove /dev/sdf
mdadm: cannot find /dev/sdf: No such file or directory
# mdadm --manage /dev/raidmd0 --remove /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory
#

Instead, we will use the same device names as above but without /dev/ in the name:

# mdadm --manage /dev/raidmd0 --remove sdf
mdadm: hot removed sdf from /dev/raidmd0
#

# mdadm --manage /dev/raidmd0 --remove sdc
mdadm: hot removed sdc from /dev/raidmd0
#

The result of the operation while checking that the data integrity was still fine:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
#

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 10 22:33:40 2012
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4317

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       0        0        3      removed
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#

Now it appears we can re-add the replugged devices.  The internal bitmap makes the recovery nearly instantaneous, even while the drive is 100% full with 3TB of data:

# mdadm --add /dev/raidmd0 /dev/rsda
mdadm: re-added /dev/rsda
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
      [=======>.............]  recovery = 35.3% (344989104/976761408) finish=183.8min speed=57259K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
# date
Tue Apr 10 22:38:15 EDT 2012
#

To put in perspective how quick that was, we look at the logs:

/var/log/messages
Apr 10 22:37:57 mbpc kernel: md: bind<sdh>
Apr 10 22:37:57 mbpc kernel: md: recovery of RAID array md0
Apr 10 22:37:57 mbpc kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr 10 22:37:57 mbpc kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 10 22:37:57 mbpc kernel: md: using 128k window, over a total of 976761408k.
Apr 10 22:38:06 mbpc kernel: md: md0: recovery done.
 

So, about 9 seconds.  Now we re-add the second disk, but this time we time the rebuild differently to show how quick it actually is:

# date; mdadm --add /dev/raidmd0 /dev/rsdd; sleep 5; cat /proc/mdstat; sleep 5; cat /proc/mdstat
Tue Apr 10 22:43:15 EDT 2012
mdadm: re-added /dev/rsdd
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
      [=====>..............]  recovery = 25.0% (244315840/976761408) finish=234.9min speed=51955K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
      [=======>............]  recovery = 36.0% (351886304/976761408) finish=215.9min speed=48227K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
#

Again, checking the logs we see how long the rebuild took:

/var/log/messages
Apr 10 22:43:15 mbpc kernel: md: bind<sdi>
Apr 10 22:43:15 mbpc kernel: md: recovery of RAID array md0
Apr 10 22:43:15 mbpc kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr 10 22:43:15 mbpc kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 10 22:43:15 mbpc kernel: md: using 128k window, over a total of 976761408k.
Apr 10 22:43:27 mbpc kernel: md: md0: recovery done.
 

Let's check if it's all done and if everything checks out fine:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i,j}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdc: No such file or directory
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdj: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
/dev/sdh: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdi: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 1/8 pages [4KB], 65536KB chunk

unused devices: <none>
#

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 10 22:53:02 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4414

    Number   Major   Minor   RaidDevice State
       0       8      112        0      active sync   /dev/sdh
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8      128        3      active sync   /dev/sdi
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#

And we are back to our normal operating mode making this a successful test.

TEST 5

Do the following in the order listed:

  • Start writing large, multi-GB files to the storage continuously. (A minimal sketch of such a writer follows these steps.)
  • Begin watching a video, or open a large file from another user. (Use switch user.)
  • Turn off two disk drives / spin down the platters. (See above or the first page for how to do that.)
  • Remove the stopped disks from the chassis.
  • Turn off the power at the power supply.

Upon startup / reboot:

  • Bring the array back in degraded mode.
  • Start watching a video again, or access the large file.
  • Begin writing data to the RAID6 storage again (while in degraded mode with two disks missing).
  • Reinsert the two disks, swapping each disk's location.
    (i.e. if disk 1 was taken out of slot 1 and disk 2 was taken out of slot 2, insert disk 1 into slot 2 and disk 2 into slot 1.)
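A minimal sketch of the continuous writer used in the first step, assuming a large source file such as the /home/mdadm.dat used below (the loop ends when a copy fails, e.g. on a full filesystem):

n=0
while cp -p /home/mdadm.dat /mnt/MBPCBackupx/raid6.$n.dat; do
    n=$((n + 1))
done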

PASS (FIRST TEST):

For this test, we will remove some files from the above tests to move back from 100% usage, so we can write large files to the array continuously:

Start using the disks (e.g. watching a video), then write files:

# cd /mnt/MBPCBackupx
# date +%s:%N; cp -p /home/mdadm.dat ./raid6.0.dat; cp raid6.0.dat raid6.1.dat; cp -p /home/mdadm.dat ./raid6.2.dat; date +%s:%N;

(Run it again if the above finishes too quickly.)  Get the disk serial numbers first so we know what to unplug:

# hdparm -i /dev/sdb
/dev/sdb:
 Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0WK55

# hdparm -i /dev/sdc
/dev/sdc:
 Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0X5KC

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr 11 00:22 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr 11 00:21 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdf -> sda
#

Then spin down the disks:

# echo 1 > /sys/block/sdc/device/delete
# echo 1 > /sys/block/sdb/device/delete

The result was:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdf -> sda
#

All while reading and writing again, as the extended device statistics show:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda            2168.50  3847.70   35.20   43.60  9091.60 16289.80   644.20     7.76   97.77   6.53  51.46
sdd            2173.70  3829.50   34.80   39.00  9116.40 16274.20   688.09     9.94  133.18   7.38  54.44
sdb            2169.80  3841.60   36.90   42.20  9110.40 16260.60   641.49    10.49  131.63   6.71  53.09
sdg               2.40     5.80  402.60    1.10 49859.20    27.20   247.15     2.45    6.07   2.39  96.54
dm-0              0.00     0.00   18.20    6.80   348.80    27.20    30.08     2.07   82.67  14.06  35.16
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00   25.70  927.60  1638.40 55020.50   118.87     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  386.80    0.00 49510.40     0.00   256.00     1.75    4.52   2.48  95.80
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00   25.70  927.40  1638.40 55014.10   118.88   591.18  655.58   0.63  60.52
sdh            2161.40  3841.10   42.50   48.90  9091.60 16160.60   552.56     8.67   94.48   5.72  52.26
sdi            2165.50  3824.70   44.20   47.10  9120.00 16155.00   553.67     9.44  103.01   5.41  49.38

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      bitmap: 4/8 pages [16KB], 65536KB chunk

unused devices: <none>
#

Then unplug the disks and remove them.  After that, check that the video is still playing and that the file is still being written to disk.  Once verified, we shut off the power to the system from the switch at the power supply for the server/HTPC+B.  Start the server up, bring back the array in the following manner, and ascertain the situation:

# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# ls -al /dev/raidmd0
ls: cannot access /dev/raidmd0: No such file or directory
# ls -al /dev/md*
brw-rw----. 1 root disk 9, 0 Apr 11 00:44 /dev/md0

/dev/md:
total 4
drwxr-xr-x.  2 root root   60 Apr 11 00:44 .
drwxr-xr-x. 22 root root 4400 Apr 11 00:45 ..
-rw-------.  1 root root   54 Apr 11 00:44 md-device-map
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdf -> sda
# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Apr 11 00:38:51 2012
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4823

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       0        0        1      removed
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       8        0        5      active sync   /dev/sda
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd[0] sdc[2] sdb[3] sda[5]
      4395422240 blocks super 1.2
      
unused devices: <none>
#

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdg: No such file or directory
ls: cannot access /dev/sdh: No such file or directory
ls: cannot access /dev/sdi: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
#

So at least we can see things: our array still has 4 disks, and none list as having bad blocks or any other problems that would kill this array.  So let's try to reassemble it:

# mdadm --assemble --scan
#

No result from trying to start the array.  Hmm.  So let's try with the verbose option:

# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/dm-5: Device or resource busy
mdadm: cannot open device /dev/dm-4: Device or resource busy
mdadm: cannot open device /dev/dm-3: Device or resource busy
mdadm: no RAID superblock on /dev/dm-2
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sde3: Device or resource busy
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: cannot open device /dev/sda: Device or resource busy

So let's examine this array further:

# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=2f36ac48:5e3e4c54:72177c53:bea3e41e name=mbpc:0
# cat /etc/mdadm.conf
ARRAY /dev/md/0 metadata=1.2 name=mbpc:0 UUID=2f36ac48:5e3e4c54:72177c53:bea3e41e
#
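(As an aside, that ARRAY line can be regenerated from the superblocks themselves should /etc/mdadm.conf ever be lost:)

# mdadm --examine --scan >> /etc/mdadm.conf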

# cat /sys/block/md0/md/array_state
inactive
#

Try to stop it:

# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
#

Now we're getting somewhere and this time we're getting more useful data:

# mdadm --assemble --scan
mdadm: /dev/md/0 assembled from 4 drives - not enough to start the array while not clean - consider --force.
#

More strangeness: now all the remaining devices are marked as spares (this is where I start to worry):

# mdadm --assemble --scan --force
#
# ls -al /dev/md
md/  md0 
# ls -al /dev/md0
brw-rw----. 1 root disk 9, 0 Apr 11 01:10 /dev/md0
# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd[0](S) sda[5](S) sdb[3](S) sdc[2](S)
      4395422240 blocks super 1.2
      
unused devices: <none>
#

Ok.  So let's try to be more explicit and forceful here.  After all, we want our data badly folks:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdf -> sda
#

# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
# mdadm -v --assemble --scan --force /dev/md0 /dev/rsda/ dev/rsdc /dev/rsdd /dev/rsdf
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/dm-5: Device or resource busy
mdadm: cannot open device /dev/dm-4: Device or resource busy
mdadm: cannot open device /dev/dm-3: Device or resource busy
mdadm: no RAID superblock on /dev/dm-2
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sde3: Device or resource busy
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sda is identified as a member of /dev/md0, slot 5.
mdadm: Marking array /dev/md0 as 'clean'
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdc to /dev/md0 as 2
mdadm: added /dev/sdb to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sda to /dev/md0 as 5
mdadm: added /dev/sdd to /dev/md0 as 0
mdadm: /dev/md0 has been started with 4 drives (out of 6).
mdadm: /dev/rsda/ not identified in config file.
mdadm: dev/rsdc not identified in config file.
mdadm: /dev/rsdd not identified in config file.
mdadm: /dev/rsdf not identified in config file.
#

Now let's check the array:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Apr 11 00:38:51 2012
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4823

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       0        0        1      removed
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       8        0        5      active sync   /dev/sda
#

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdg: No such file or directory
ls: cannot access /dev/sdh: No such file or directory
ls: cannot access /dev/sdi: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
#

Now we are back in the situation we were in at the start of reliability test 5 above.  Before we push the failed drives back in again, let's start writing again and using some files, like watching a home video:

# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# lvm vgchange -a y
  3 logical volume(s) in volume group "VGEntertain" now active
  3 logical volume(s) in volume group "mbpcvg" now active
  1 logical volume(s) in volume group "MBPCStorage" now active
# mount /mnt/MBPCBackupx/
# cd /mnt/MBPCBackupx/
# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2849239     12110 100% /mnt/MBPCBackupx
# ls -al |tail
-rwxr--r-x. 1 root      root       413368320 Apr 10 21:46 999.dat
-rwxr--r-x. 1 root      root       413368320 Apr  9 00:14 99.dat
-rwxr--r-x. 1 root      root       413368320 Apr  9 00:10 9.dat
-rwxr--r--. 1 root      root      4720553072 May  4  2011 raid6.0.dat
-rwxr--r--. 1 root      root      4720553072 Apr 10 23:57 raid6.1.dat
-rwxr--r--. 1 root      root      4720553072 May  4  2011 raid6.2.dat
-rwx------. 1 root      root      3757387776 Apr 11 00:38 raid6.3.dat
-rwxrwSrwx. 1 root      root       413368320 Jun 21  2008 test.dat
-rw-r--r--. 1 root      root         9984418 Apr  9 23:11 test.log
#

Again, file writing and reading were fine even with the two disks missing.  Now it's time to replug the drives we took out and have the array reassemble.  From here, the steps are identical to TEST 4 above, this time starting further down, as the devices have already been removed:

# mdadm --add /dev/raidmd0 /dev/rsdb;
# cat /proc/mdstat;
Wed Apr 11 02:16:40 EDT 2012
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      [>...................]  recovery =  0.0% (0/976761408) finish=1017459.8min speed=0K/sec
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#

This time the array did not recover instantly even with an internal bitmap (it took about 2 minutes), but that is still much quicker than the standard few hours:

# cat /proc/mdstat;
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      [=>..................]  recovery =  6.2% (61041460/976761408) finish=1529.0min speed=9980K/sec
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#

While this is going on, let's put back the second disk:

# mdadm --add /dev/raidmd0 /dev/rsde
mdadm: re-added /dev/rsde
# cat /proc/mdstat;
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[4](S) sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      [==>.................]  recovery = 11.8% (115617408/976761408) finish=7203.2min speed=1992K/sec
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#
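(The rebuild is throttled while regular I/O is in flight, per the kernel's minimum/maximum recovery speeds seen in the logs earlier.  If a rebuild crawls like this, the floor can be raised; an aside, not something we needed to do here:)

# cat /proc/sys/dev/raid/speed_limit_min
1000
# echo 50000 > /proc/sys/dev/raid/speed_limit_min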

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Apr 11 02:23:29 2012
          State : active, degraded, recovering
 Active Devices : 4
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 12% complete

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4882

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       96        1      spare rebuilding   /dev/sdg
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       8        0        5      active sync   /dev/sda

       4       8       80        –      spare   /dev/sdf
#

Once recovered, we'll check the parameters of the disks to ensure the settings are correct, and adjust them otherwise using the steps from PAGE 1 of this post:

# for dn in $(ls /dev/rsd*); do WCH=$(/sbin/hdparm -I $dn|grep -i "Write cache"); echo $dn": $WCH"; done
/dev/rsda:             Write cache
/dev/rsdb:        *    Write cache
/dev/rsdc:             Write cache
/dev/rsdd:             Write cache
/dev/rsde:        *    Write cache
/dev/rsdf:             Write cache
#

Disable the write cache on the two re-added disks (again, only if using XFS), as they will have it enabled by default when plugged back in:

# hdparm -W 0 /dev/rsdb;

/dev/rsdb:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
# hdparm -W 0 /dev/rsde;

/dev/rsde:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
#
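Since the drives revert to write caching whenever they power cycle, the setting could also be re-applied at boot, e.g. from rc.local (a sketch; it assumes the rsd* symlinks exist by the time it runs):

for dn in /dev/rsd?; do /sbin/hdparm -W 0 $dn; done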

 

TEST 6

This is a repeat of test #5 above, which I had actually performed earlier, prior to doing test 5.  It was also successful.  The setup is as follows:

Do the following, all in sequence and in quick succession:

  • Start writing large, multi-GB files repeatedly to the mount the RAID6 array is on.
  • Unplug the SATA cable from one drive.*
  • Unplug the SATA cable from a second drive.*
  • Begin accessing one of the files on the RAID6 array.
  • Shut the power supply down using the on/off switch on the power supply.
  • For this test, the 2TB RAID6 volume is less than ~2% full.

* Simulate a drive being physically pulled by cutting power to the device, like this: echo 1 > /sys/block/<DEVICE>/device/delete

PASS (SECOND TEST):

First, let's check that the array is fine and that no disks show errors or potential issues that could grow into problems for us:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdf: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       –       0
#

Again, check with:

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 27 00:30:49 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3063

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda

to ensure everything is in the expected state and the array shows active, with all disks participating in the RAID6 array listed as active sync. 
Next, spin down / cut power to two of the disks:

echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sda/device/delete

The result of the operation should show this:

# mdadm --detail /dev/raidmd0
.
.
          State : active, degraded
.
.
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       0        0        5      removed

Now we pull the plug while our write job is in progress, writing 4.7GB files.  After the reboot, the system came back up, but the array did not mount:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Mar 27 00:36:15 2012
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3132

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       0        0        5      removed
#

So let's try to reassemble the array:

# mdadm --assemble --scan
mdadm: /dev/md/0 is already in use.
#

So we stop the array as before:

# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
#

Now we're getting somewhere:

# mdadm --assemble --scan
mdadm: /dev/md/0 assembled from 4 drives - not enough to start the array while not clean - consider --force.
#

It looks like we may have to add the disks back first, as per the first suggestion, before we try to force it. (Relying on my experience here: the force option to anything can have unintended consequences, so I tend to use it as a method of last resort.)  To do this, we look at our UDEV rules to tell us which disks to add:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsda -> sdc
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdd -> sdf
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdf -> sda
#

But then I realize I can't tell which disks I pulled. (I can't tell from the outside either, because all 6 are still cabled the same; they just had their power cut / platters spun down from the OS, where they no longer appear as a result.)

No problem.  Let's try to start the array in degraded mode, which is the way I would choose anyway, as we're running a test here, folks:

# mdadm --assemble --scan --force
mdadm: Marking array /dev/md/0 as 'clean'
mdadm: /dev/md/0 has been started with 4 drives (out of 6).
#

And check again to make sure we are clean:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdf: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       –       0
#

Now, before we re-add anything, I want to see if we can mount our array:

# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# lvm vgchange -a y
  3 logical volume(s) in volume group "VGEntertain" now active
  3 logical volume(s) in volume group "mbpcvg" now active
  1 logical volume(s) in volume group "MBPCStorage" now active
#

And try mounting now.  Success.  Now we can see the files with the original dates applied to the completed ones:

# ls -al
total 17499768
drwxr-xr-x. 2 root root        108 Mar 27 00:34 .
drwxr-xr-x. 6 root root       4096 Mar 24 23:53 ..
-rwxr--r--. 1 root root 4720553072 May  4  2011 testX0.m2ts
-rwxr--r--. 1 root root 4720553072 May  4  2011 testX1.m2ts
-rwx------. 1 root root 3219345408 Mar 27 00:35 testX2.m2ts
-rwxr--r--. 1 root root 4720553072 May  4  2011 testXseq.m2ts
#

# xfs_db -c frag -r /dev/MBPCStorage/MBPCBackup
actual 4, ideal 4, fragmentation factor 0.00%
#

# xfs_check /dev/MBPCStorage/MBPCBackup
#
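(A side note: on newer xfsprogs, xfs_check is deprecated in favor of the equivalent read-only dry run of xfs_repair, so the same check would be:)

# xfs_repair -n /dev/MBPCStorage/MBPCBackup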

I'm going to try to access a file off this disk by switching users to my non-privileged videouser account.  I want to see that the file reads fine and is still workable before re-adding any disks.  This was a success, as the video played fine.  Now to re-add the disks:

# mdadm /dev/md0 -a /dev/rsda
mdadm: re-added /dev/rsda
#

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UUUU_]
      [=>..................]  recovery =  6.3% (61590408/976761408) finish=334.2min speed=45632K/sec
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>
#

Now, because of our internal bitmap, the array started to rebuild very quickly, which is good.  Now to add another drive while this is going on… too late… the resync was done before I could even blink:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 27 01:16:43 2012
          State : active, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3186

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       0        0        5      removed
#

# mdadm /dev/md0 -a /dev/rsdf
mdadm: re-added /dev/rsdf
#

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [=>..................]  recovery =  5.5% (54287616/976761408) finish=324.2min speed=47413K/sec
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>
#

And the rebuilding jumps by leaps and bounds and is done in seconds:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [====>...............]  recovery = 24.9% (244180084/976761408) finish=277.1min speed=44047K/sec
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>
#

And resync of the second disk is done before you know it:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#

And now our array is back up.  NOTE: We added the disks back, and the array rebuilt, all while it stayed mounted under /mnt/MBPCBackupx:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 27 01:20:05 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3210

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#

And this concludes successful AVAILABILITY TEST #6.

TEST 7

Unmount the array from /mnt/MBPCBackupx, stop it, spin down all disks, then reassemble the array in the same hard disk order and ensure you can still see the data.  Before adding each disk back, power it down, then bring it up using the commands above or on the first page of this post.

PASS:

For this test, we run the following commands to disassemble the array, then try to reassemble it (assuming the RAID6 is mounted on MBPCBackupx and lives under /dev/raidmd0 or /dev/md0):

# du -sh .
2.8T    .
# pwd
/mnt/MBPCBackupx
# cd ..
# umount MBPCBackupx
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi-a--   2.73t
.
.
.
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree 
  MBPCStorage   1   1   0 wz--n-   3.64t 931.70g
.
.
# ls -al /dev/MBPCStorage/MBPCBackup
lrwxrwxrwx. 1 root root 7 Apr 15 17:51 /dev/MBPCStorage/MBPCBackup -> ../dm-6
# lvm vgchange -a n MBPCStorage
  0 logical volume(s) in volume group "MBPCStorage" now active
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi---   2.73t
.
.
.
# ls -al /dev/MBPCStorage/MBPCBackup
ls: cannot access /dev/MBPCStorage/MBPCBackup: No such file or directory
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# ls -al /dev/rsd*
ls: cannot access /dev/rsd*: No such file or directory
# udevadm trigger 2>&1
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsda -> sdf
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdb -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdd -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsde -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdf -> sda
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# ls -al /dev/md0
ls: cannot access /dev/md0: No such file or directory
# ls -al /dev/raidmd0
ls: cannot access /dev/raidmd0: No such file or directory
#
# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
unused devices: <none>
#

And now we can power down / spin down each disk that corresponds to the actual rsdX number above:

echo 1 > /sys/block/sda/device/delete
echo 1 > /sys/block/sdb/device/delete
echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sdd/device/delete
echo 1 > /sys/block/sde/device/delete
echo 1 > /sys/block/sdf/device/delete
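Or, equivalently, as a loop over the six members:

for d in a b c d e f; do echo 1 > /sys/block/sd$d/device/delete; done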

(You should hear the clicking of the heads parking, as well as the subtle sound of the disks spinning down, much like when you shut the system down.)  Allow a minute for the disks to spin down.  At this point I'll unplug the SATA cables and replug them.  I expect that when I reassemble my array, all my files will be there. 

/var/log/messages
Apr 15 17:54:44 mbpc kernel: md0: detected capacity change from 4000814727168 to 0
Apr 15 17:54:44 mbpc kernel: md: md0 stopped.
Apr 15 17:54:44 mbpc kernel: md: unbind<sdd>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdd)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdb>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdb)
Apr 15 17:54:44 mbpc kernel: md: unbind<sde>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sde)
Apr 15 17:54:44 mbpc kernel: md: unbind<sda>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sda)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdc>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdc)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdf>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdf)
Apr 15 18:26:34 mbpc kernel: sd 0:0:0:0: [sda] Stopping disk
Apr 15 18:26:35 mbpc kernel: ata1.00: disabled
Apr 15 18:26:43 mbpc kernel: sd 1:0:0:0: [sdb] Stopping disk
Apr 15 18:26:44 mbpc kernel: ata2.00: disabled
Apr 15 18:27:14 mbpc kernel: sd 2:0:0:0: [sdc] Stopping disk
Apr 15 18:27:14 mbpc kernel: ata3.00: disabled
Apr 15 18:27:21 mbpc kernel: sd 3:0:0:0: [sdd] Stopping disk
Apr 15 18:27:21 mbpc kernel: ata4.00: disabled
Apr 15 18:27:27 mbpc kernel: sd 4:0:0:0: [sde] Stopping disk
Apr 15 18:27:28 mbpc kernel: ata5.00: disabled
Apr 15 18:27:33 mbpc kernel: sd 5:0:0:0: [sdf] Stopping disk
Apr 15 18:27:34 mbpc kernel: ata6.00: disabled
Apr 15 18:30:00 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:30:00 mbpc kernel: ata3: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:30:00 mbpc kernel: ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:30:00 mbpc kernel: ata3: hard resetting link
Apr 15 18:30:00 mbpc kernel: ata3: SATA link down (SStatus 0 SControl 300)
Apr 15 18:30:00 mbpc kernel: ata3: EH complete
Apr 15 18:32:01 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:01 mbpc kernel: ata5: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:01 mbpc kernel: ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:01 mbpc kernel: ata5: hard resetting link
Apr 15 18:32:02 mbpc kernel: ata5: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:02 mbpc kernel: ata5: EH complete
Apr 15 18:32:12 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:12 mbpc kernel: ata4: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:12 mbpc kernel: ata4: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:12 mbpc kernel: ata4: hard resetting link
Apr 15 18:32:12 mbpc kernel: ata4: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:12 mbpc kernel: ata4: EH complete
Apr 15 18:32:18 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:18 mbpc kernel: ata6: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:18 mbpc kernel: ata6: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:18 mbpc kernel: ata6: hard resetting link
Apr 15 18:32:19 mbpc kernel: ata6: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:19 mbpc kernel: ata6: EH complete
Apr 15 18:32:22 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:22 mbpc kernel: ata2: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:22 mbpc kernel: ata2: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:22 mbpc kernel: ata2: hard resetting link
Apr 15 18:32:23 mbpc kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:23 mbpc kernel: ata2: EH complete
Apr 15 18:32:26 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:26 mbpc kernel: ata1: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:26 mbpc kernel: ata1: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:26 mbpc kernel: ata1: hard resetting link
Apr 15 18:32:26 mbpc kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:26 mbpc kernel: ata1: EH complete

 

The smartd daemon might complain as well, but this is normal in our case:

/var/log/messages
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sda [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:25 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sdb [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:25 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sdc [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sdd [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sde [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sdf [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
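
Incidentally, if the alert mail is unwanted during planned maintenance like this, smartd can simply be stopped for the duration (a sketch, assuming the SysV-style init scripts this system uses):

# service smartd stop
(perform the maintenance)
# service smartd start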

 

Now let's plug everything back in.  So here we go, folks (hopefully you didn't forget the cable sequence).  The interesting thing was that once we replugged all the SATA cables, the array reassembled itself without any manual intervention:

/var/log/messages
Apr 15 18:36:19 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 18:36:19 mbpc kernel: ata6: irq_stat 0x00000040, connection status changed
Apr 15 18:36:19 mbpc kernel: ata6: SError: { CommWake 10B8B DevExch }
Apr 15 18:36:19 mbpc kernel: ata6: hard resetting link
Apr 15 18:36:20 mbpc kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:20 mbpc kernel: ata6.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:20 mbpc kernel: ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:20 mbpc kernel: ata6.00: configured for UDMA/133
Apr 15 18:36:20 mbpc kernel: ata6: EH complete
Apr 15 18:36:20 mbpc kernel: scsi 5:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] Write Protect is off
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: Attached scsi generic sg0 type 0
Apr 15 18:36:25 mbpc kernel: sda: unknown partition table
Apr 15 18:36:25 mbpc kernel: sd 5:0:0:0: [sda] Attached SCSI disk
Apr 15 18:36:25 mbpc kernel: md: bind<sda>
Apr 15 18:36:26 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 18:36:26 mbpc kernel: ata5: irq_stat 0x00000040, connection status changed
Apr 15 18:36:26 mbpc kernel: ata5: SError: { DevExch }
Apr 15 18:36:26 mbpc kernel: ata5: hard resetting link
Apr 15 18:36:27 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:27 mbpc kernel: ata5.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:27 mbpc kernel: ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:27 mbpc kernel: ata5.00: configured for UDMA/133
Apr 15 18:36:27 mbpc kernel: ata5: EH complete
Apr 15 18:36:27 mbpc kernel: scsi 4:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: Attached scsi generic sg1 type 0
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] Write Protect is off
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:32 mbpc kernel: sdb: unknown partition table
Apr 15 18:36:32 mbpc kernel: sd 4:0:0:0: [sdb] Attached SCSI disk
Apr 15 18:36:32 mbpc kernel: md: bind<sdb>
Apr 15 18:36:34 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 18:36:34 mbpc kernel: ata4: irq_stat 0x00000040, connection status changed
Apr 15 18:36:34 mbpc kernel: ata4: SError: { DevExch }
Apr 15 18:36:34 mbpc kernel: ata4: hard resetting link
Apr 15 18:36:35 mbpc kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:35 mbpc kernel: ata4.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:35 mbpc kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:35 mbpc kernel: ata4.00: configured for UDMA/133
Apr 15 18:36:35 mbpc kernel: ata4: EH complete
Apr 15 18:36:35 mbpc kernel: scsi 3:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] Write Protect is off
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:40 mbpc kernel: sdc: unknown partition table
Apr 15 18:36:40 mbpc kernel: sd 3:0:0:0: [sdc] Attached SCSI disk
Apr 15 18:36:40 mbpc kernel: md: bind<sdc>
Apr 15 18:36:42 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 18:36:42 mbpc kernel: ata3: irq_stat 0x00000040, connection status changed
Apr 15 18:36:42 mbpc kernel: ata3: SError: { CommWake 10B8B DevExch }
Apr 15 18:36:42 mbpc kernel: ata3: hard resetting link
Apr 15 18:36:43 mbpc kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:43 mbpc kernel: ata3.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:43 mbpc kernel: ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:43 mbpc kernel: ata3.00: configured for UDMA/133
Apr 15 18:36:43 mbpc kernel: ata3: EH complete
Apr 15 18:36:43 mbpc kernel: scsi 2:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: Attached scsi generic sg3 type 0
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] Write Protect is off
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:48 mbpc kernel: sdd: unknown partition table
Apr 15 18:36:48 mbpc kernel: sd 2:0:0:0: [sdd] Attached SCSI disk
Apr 15 18:36:48 mbpc kernel: md: bind<sdd>
Apr 15 18:36:48 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 18:36:48 mbpc kernel: ata2: irq_stat 0x00000040, connection status changed
Apr 15 18:36:48 mbpc kernel: ata2: SError: { 10B8B DevExch }
Apr 15 18:36:48 mbpc kernel: ata2: hard resetting link
Apr 15 18:36:49 mbpc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:49 mbpc kernel: ata2.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:49 mbpc kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:49 mbpc kernel: ata2.00: configured for UDMA/133
Apr 15 18:36:49 mbpc kernel: ata2: EH complete
Apr 15 18:36:49 mbpc kernel: scsi 1:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: Attached scsi generic sg4 type 0
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] Write Protect is off
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:54 mbpc kernel: sde: unknown partition table
Apr 15 18:36:54 mbpc kernel: sd 1:0:0:0: [sde] Attached SCSI disk
Apr 15 18:36:55 mbpc kernel: md: bind<sde>
Apr 15 18:36:56 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 18:36:56 mbpc kernel: ata1: irq_stat 0x00000040, connection status changed
Apr 15 18:36:56 mbpc kernel: ata1: SError: { 10B8B DevExch }
Apr 15 18:36:56 mbpc kernel: ata1: hard resetting link
Apr 15 18:36:57 mbpc kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 15 18:36:57 mbpc kernel: ata1.00: ATA-8: ST1500DL003-9VT16L, CC3C, max UDMA/133
Apr 15 18:36:57 mbpc kernel: ata1.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:57 mbpc kernel: ata1.00: configured for UDMA/133
Apr 15 18:36:57 mbpc kernel: ata1: EH complete
Apr 15 18:36:57 mbpc kernel: scsi 0:0:0:0: Direct-Access     ATA      ST1500DL003-9VT1 CC3C PQ: 0 ANSI: 5
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: Attached scsi generic sg5 type 0
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] 4096-byte physical blocks
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] Write Protect is off
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:37:04 mbpc kernel: sdf: unknown partition table
Apr 15 18:37:04 mbpc kernel: sd 0:0:0:0: [sdf] Attached SCSI disk
Apr 15 18:37:04 mbpc kernel: md: bind<sdf>
Apr 15 18:37:04 mbpc kernel: bio: create slab <bio-1> at 1
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdf operational as raid disk 5
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sde operational as raid disk 1
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdd operational as raid disk 0
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdb operational as raid disk 4
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sda operational as raid disk 3
Apr 15 18:37:04 mbpc kernel: md/raid:md0: allocated 6386kB
Apr 15 18:37:04 mbpc kernel: md/raid:md0: raid level 6 active with 6 out of 6 devices, algorithm 2
Apr 15 18:37:04 mbpc kernel: created bitmap (8 pages) for device md0
Apr 15 18:37:04 mbpc kernel: md0: bitmap initialized from disk: read 1/1 pages, set 0 of 14905 bits
Apr 15 18:37:04 mbpc kernel: md0: detected capacity change from 0 to 4000814727168
Apr 15 18:37:04 mbpc kernel: md0: unknown partition table

And let's check with the standard mdadm tools:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[5] sde[1] sdd[0] sdc[2] sdb[4] sda[3]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#

# ls -al /dev/raidmd0
lrwxrwxrwx. 1 root root 3 Apr 15 18:37 /dev/raidmd0 -> md0
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdd -> sda
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 18:37 /dev/rsdf -> sdf
#

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Apr 15 17:52:23 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 5171

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       32        2      active sync   /dev/sdc
       3       8        0        3      active sync   /dev/sda
       4       8       16        4      active sync   /dev/sdb
       5       8       80        5      active sync   /dev/sdf
#

Wow!  Now that, I have to say, I didn't expect to happen all on its own (more on why it did after this test).  Now let's set the VG active again, mount the array back up, and check the status of our files:

# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree 
  MBPCStorage   1   1   0 wz--n-   3.64t 931.70g
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi---   2.73t
# lvm vgchange -a y MBPCStorage
  1 logical volume(s) in volume group "MBPCStorage" now active
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi-a-   2.73t
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree 
  MBPCStorage   1   1   0 wz--n-   3.64t 931.70g
#
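
Note the single-argument mount coming up: it works because the filesystem is listed in /etc/fstab.  Judging by the options visible in the mount output below, the entry would look something like this (a reconstruction, not a copy of the actual file):

/dev/mapper/MBPCStorage-MBPCBackup  /mnt/MBPCBackupx  xfs  noatime,nodiratime,logbufs=8,allocsize=512m  0 0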

And now for the mount:

# mount /mnt/MBPCBackupx/
# mount|grep MBPC
/dev/mapper/MBPCStorage-MBPCBackup on /mnt/MBPCBackupx type xfs (rw,noatime,nodiratime,logbufs=8,allocsize=512m)
# cd /mnt/MBPCBackup
bash: cd: /mnt/MBPCBackup: No such file or directory
# cd /mnt/MBPCBackupx/
# du -sh .
2.8T    .
# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2849239     12110 100% /mnt/MBPCBackupx
#
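
du and df confirm the data is all there by size.  For extra confidence, a checksum spot check could be layered on top, assuming sums were recorded before the unplug (the list file name here is hypothetical):

Before the unplug:
# cd /mnt/MBPCBackupx && md5sum *.dat > /root/pre-test.md5

After the remount:
# md5sum -c /root/pre-test.md5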

This test is definitely a success!
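
A note on why this worked: this is mdadm's incremental assembly doing its job.  On most modern distributions, a udev rule hands each newly appearing RAID member to mdadm -I (incremental mode), and md activates the array the moment the last member shows up.  The rule is roughly of this form (a sketch; check your distribution's md/mdadm udev rules file for the exact wording):

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -I $env{DEVNAME}"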

8 In this test, we'll repeat TEST #7 but reconnect the SATA cables in a different order than they were originally attached.  The UDEV rules we created earlier will be key here (see the sketch below). 
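
As a refresher, those rules pin each physical drive to a stable /dev/rsdX symlink keyed on its serial number, so one of them looks roughly like the sketch below (the serial here is a placeholder, not one of the actual drives):

KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="9VX0XXXX", SYMLINK+="rsda"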

PASS:

Because the only difference is how we replug the SATA cables, everything proceeds as in AVAILABILITY TEST #7 above, except that we replug the cables in a random order rather than the original one.  The result:

/var/log/messages
Apr 15 22:33:02 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 22:33:02 mbpc kernel: ata6: irq_stat 0x00000040, connection status changed
Apr 15 22:33:02 mbpc kernel: ata6: SError: { 10B8B DevExch }
Apr 15 22:33:02 mbpc kernel: ata6: hard resetting link
Apr 15 22:33:03 mbpc kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:03 mbpc kernel: ata6.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:03 mbpc kernel: ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:03 mbpc kernel: ata6.00: configured for UDMA/133
Apr 15 22:33:03 mbpc kernel: ata6: EH complete
Apr 15 22:33:03 mbpc kernel: scsi 5:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: Attached scsi generic sg0 type 0
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] Write Protect is off
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:08 mbpc kernel: sda: unknown partition table
Apr 15 22:33:08 mbpc kernel: sd 5:0:0:0: [sda] Attached SCSI disk
Apr 15 22:33:08 mbpc kernel: md: bind<sda>
Apr 15 22:33:10 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:10 mbpc kernel: ata4: irq_stat 0x00000040, connection status changed
Apr 15 22:33:10 mbpc kernel: ata4: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:10 mbpc kernel: ata4: hard resetting link
Apr 15 22:33:10 mbpc kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:10 mbpc kernel: ata4.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:10 mbpc kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:10 mbpc kernel: ata4.00: configured for UDMA/133
Apr 15 22:33:10 mbpc kernel: ata4: EH complete
Apr 15 22:33:10 mbpc kernel: scsi 3:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: Attached scsi generic sg1 type 0
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] Write Protect is off
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:16 mbpc kernel: sdb: unknown partition table
Apr 15 22:33:16 mbpc kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Apr 15 22:33:16 mbpc kernel: md: bind<sdb>
Apr 15 22:33:20 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 22:33:20 mbpc kernel: ata2: irq_stat 0x00000040, connection status changed
Apr 15 22:33:20 mbpc kernel: ata2: SError: { DevExch }
Apr 15 22:33:20 mbpc kernel: ata2: hard resetting link
Apr 15 22:33:21 mbpc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:21 mbpc kernel: ata2.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:21 mbpc kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:21 mbpc kernel: ata2.00: configured for UDMA/133
Apr 15 22:33:21 mbpc kernel: ata2: EH complete
Apr 15 22:33:21 mbpc kernel: scsi 1:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: Attached scsi generic sg2 type 0
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] Write Protect is off
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdc [SAT], open() failed: Permission denied
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdd [SAT], open() failed: No such device
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sde [SAT], open() failed: No such device
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdf [SAT], open() failed: No such device
Apr 15 22:33:26 mbpc kernel: sdc: unknown partition table
Apr 15 22:33:26 mbpc kernel: sd 1:0:0:0: [sdc] Attached SCSI disk
Apr 15 22:33:27 mbpc kernel: md: bind<sdc>
Apr 15 22:33:44 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:44 mbpc kernel: ata5: irq_stat 0x00000040, connection status changed
Apr 15 22:33:44 mbpc kernel: ata5: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:44 mbpc kernel: ata5: hard resetting link
Apr 15 22:33:49 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:49 mbpc kernel: ata5: link online but 1 devices misclassified, retrying
Apr 15 22:33:49 mbpc kernel: ata5: reset failed (errno=-11), retrying in 5 secs
Apr 15 22:33:54 mbpc kernel: ata5: hard resetting link
Apr 15 22:33:54 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:54 mbpc kernel: ata5.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:54 mbpc kernel: ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:54 mbpc kernel: ata5.00: configured for UDMA/133
Apr 15 22:33:54 mbpc kernel: ata5: EH complete
Apr 15 22:33:54 mbpc kernel: scsi 4:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Write Protect is off
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:54 mbpc kernel: sdd: unknown partition table
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Attached SCSI disk
Apr 15 22:33:55 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:55 mbpc kernel: ata3: irq_stat 0x00000040, connection status changed
Apr 15 22:33:55 mbpc kernel: ata3: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:55 mbpc kernel: ata3: hard resetting link
Apr 15 22:33:55 mbpc kernel: md: bind<sdd>
Apr 15 22:33:55 mbpc kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:55 mbpc kernel: ata3.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:55 mbpc kernel: ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:55 mbpc kernel: ata3.00: configured for UDMA/133
Apr 15 22:33:55 mbpc kernel: ata3: EH complete
Apr 15 22:33:55 mbpc kernel: scsi 2:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] Write Protect is off
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: Attached scsi generic sg4 type 0
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:34:01 mbpc kernel: sde: unknown partition table
Apr 15 22:34:01 mbpc kernel: sd 2:0:0:0: [sde] Attached SCSI disk
Apr 15 22:34:01 mbpc kernel: md: bind<sde>
Apr 15 22:34:10 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 22:34:10 mbpc kernel: ata1: irq_stat 0x00000040, connection status changed
Apr 15 22:34:10 mbpc kernel: ata1: SError: { DevExch }
Apr 15 22:34:10 mbpc kernel: ata1: hard resetting link
Apr 15 22:34:11 mbpc kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 15 22:34:11 mbpc kernel: ata1.00: ATA-8: ST1500DL003-9VT16L, CC3C, max UDMA/133
Apr 15 22:34:11 mbpc kernel: ata1.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:34:11 mbpc kernel: ata1.00: configured for UDMA/133
Apr 15 22:34:11 mbpc kernel: ata1: EH complete
Apr 15 22:34:11 mbpc kernel: scsi 0:0:0:0: Direct-Access     ATA      ST1500DL003-9VT1 CC3C PQ: 0 ANSI: 5
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: Attached scsi generic sg5 type 0
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] 4096-byte physical blocks
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] Write Protect is off
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:34:18 mbpc kernel: sdf: unknown partition table
Apr 15 22:34:18 mbpc kernel: sd 0:0:0:0: [sdf] Attached SCSI disk
Apr 15 22:34:19 mbpc kernel: md: bind<sdf>
Apr 15 22:34:19 mbpc kernel: bio: create slab <bio-1> at 1
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdf operational as raid disk 5
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sde operational as raid disk 1
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdd operational as raid disk 0
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdb operational as raid disk 3
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sda operational as raid disk 4
Apr 15 22:34:19 mbpc kernel: md/raid:md0: allocated 6386kB
Apr 15 22:34:19 mbpc kernel: md/raid:md0: raid level 6 active with 6 out of 6 devices, algorithm 2
Apr 15 22:34:19 mbpc kernel: created bitmap (8 pages) for device md0
Apr 15 22:34:19 mbpc kernel: md0: bitmap initialized from disk: read 1/1 pages, set 0 of 14905 bits
Apr 15 22:34:19 mbpc kernel: md0: detected capacity change from 0 to 4000814727168
Apr 15 22:34:19 mbpc kernel: md0: unknown partition table

And through the standard utilities:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[5] sde[1] sdd[0] sdc[2] sdb[3] sda[4]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Apr 15 21:44:49 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 5171

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       8        0        4      active sync   /dev/sda
       5       8       80        5      active sync   /dev/sdf
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsde -> sda
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/rsdf -> sdf
# ls -al /dev/raidmd0
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/raidmd0 -> md0
#

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2849239     12110 100% /mnt/MBPCBackupx
#

Again, another successful test.

9 Repeat TEST #7 above, but instead of unplugging and plugging the drives back in, rescan the SCSI bus.  This is a lighter variation of tests #7 and #8 above.

The difference here lies in the recovery steps, where we trigger a rescan of the SCSI hosts to detect the drives again:

echo "0 0 0" >/sys/class/scsi_host/host0/scan
echo "0 0 0" >/sys/class/scsi_host/host1/scan
echo "0 0 0" >/sys/class/scsi_host/host2/scan
echo "0 0 0" >/sys/class/scsi_host/host3/scan
echo "0 0 0" >/sys/class/scsi_host/host4/scan
echo "0 0 0" >/sys/class/scsi_host/host5/scan

Following the above, the drives were again visible:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsda -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdc -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdd -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 22:49 /dev/rsde -> sdf
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdf -> sda
#

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[4] sde[0] sdd[3] sdc[1] sdb[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#

And this concludes the last test we'll do. 

 

TEST Script 1:

#!/usr/bin/awk -f

# ls -al /bin/gawk /usr/bin/awk /usr/bin/gawk /bin/awk
# lrwxrwxrwx. 1 root root      4 Sep 26  2011 /bin/awk -> gawk
# -rwxr-xr-x. 1 root root 382456 Nov 22  2010 /bin/gawk
# lrwxrwxrwx. 1 root root     14 Sep 26  2011 /usr/bin/awk -> ../../bin/gawk
# lrwxrwxrwx. 1 root root     14 Sep 26  2011 /usr/bin/gawk -> ../../bin/gawk

function random () {
    SOURCEC="strings /dev/urandom";
    SOURCEF="";

    pcnt=0;
    print "Creating the array…";
    print "            000—000—000—";
    cnt=0;
    LUMP="";
    LCN=0;
    while (1) {
        retv=((SOURCEC|getline)>0);
        if ( retv == 0 ) break;

        tcnt+=length($0);

        if ( LCN < 1000 ) {
            LUMP=LUMP""$0;
            LCN++;
        } else {
            WST[cnt++]=""LUMP;
            # Start the next lump with the current record so no input is dropped.
            LUMP=""$0;
            LCN=1;
        }

        pcnt++;
        if ( pcnt > 10000 ) {
            printf "BYTE Count: %18.0f %16.0f\r", tcnt, cnt;
            pcnt=0;
        }

        # Stop at 2MB sample.
        # if ( tcnt >= 2097152 )
        # Stop at 16MB sample.
        # if ( tcnt >= 16777216 )
        # Stop at 64MB sample.
        if ( tcnt >= 67108864 )
            break;
    }
    close(SOURCEC);

    # Max out filesystem.
    printf "Filling up the FS with files (fill.awk.NNNNNN.txt: "cnt") ….";
    MKD="mkdir ./fill."TNUM".awk.test";
    system(MKD);
    close(MKD);

    # Write 200,000 files of 64MB.  Change this to suit your test.
    for ( j = 0; j < 200000; j++ ) {
        FNM="./fill."TNUM".awk.test/fill.awk."j".txt";
        printf "File Saved: "FNM"\r";
        for ( i = 0; i < cnt; i++ ) {
            print WST[i] >> FNM;
        }
        close(FNM);
        i=0;
    }
   
}

BEGIN {
    # for (i=1; i < ARGC; ++i) {
    #    printf "ARGV [%d]=%s\n", i, ARGV[i];
    # }

    TNUM=0;
    random();

    # Exit explicitly so awk doesn't sit waiting for input on stdin afterwards.
    exit;
}
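
For completeness, here is how the script might be invoked (fill.awk is an assumed file name; save it as whatever you like).  It writes into the current directory, so cd onto the array first, and remember to remove the fill.*.awk.test directories afterwards:

# cd /mnt/MBPCBackupx
# chmod +x fill.awk
# ./fill.awk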

 

CONCLUSION

The availability tests were probably the most important here, followed by the performance tests.  The fact that the software MDADM RAID6 array survived the fairly brutal failures in the tests above was a genuine surprise to me, given my rather low expectations going in.  The opinion I was given before setting out on this RAID6 adventure was that I'd end up with either excruciatingly long rebuilds or frequent failures that would render the setup ineffective for my needs.  The results above show that the performance is there, alongside much better than expected reliability, with every one of the reliability tests passing.  At least in the tests done here and on the hardware specified above, the MDADM RAID6 array proved very robust.
 

Responses to “HTPC / Backup Home Server Solution using Linux”

  1. Hi,

    Great post. You don’t need to specify the parameters when creating the XFS file system, see http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E and http://www.spinics.net/lists/raid/msg38074.html . Of course, YMMV.

    Did you run those benchmarks while the array was resyncing?

  2. Hey Mathias,

    Thanks for posting. Just added the testing numbers so feel free to have a look and judge yourself.

    > logbsize and delaylog
    I ran another test with logbsize=128k (couldn’t find anything for delaylog in my mkfs.xfs man page so I’m not sure if that’ll do anything). Little to no difference in this case on first glance. Watch out for the results at some point for a closer look.

    One consideration here is that eventually I would grow the LVM and XFS to fill up to 4TB, which I’ll be doing soon. Potentially, in the future I may try to grow this array to something well over 8TB (yet to see how to do that). I’m not sure if XFS would auto-adjust to optimal values at those capacities, and the link didn’t touch on that topic.

    All in all, I can still run tests on this thing, recreating the FS if I need to, so feel free to suggest numbers you’d be interested to see. I might leave this topic open for a week or two to see if I can think of anything else or if I’m missing anything. For my setup, having anything > 125MB/s is a bonus, as the network is only 1Gb/s and that is its theoretical max.

    Cheers!
    TK

  3. Thank you for posting this blog.  I was getting desperate.  I could not figure out why I could not stop the RAID1 device, even from Ubuntu Rescue Remix.  The LVM group was being assembled from the failed RAID.  I removed the volume group and was finally able to gain exclusive access to the array to stop it, put in the new disk, and rebuild the array.
     
    Nice job.
    Best,
    Dave.




     