TEST #
|
TEST CONFIG
|
TEST RESULTS
|
1
|
Fill the storage to its maximum capacity. For this, we will attempt to write a function to auto-fill the RAID6 array at its maximum speed. At the end of this table is the script to do just that.
An alternate way to do the same is by copying a file (be sure to copy the first sample to the storage array beforehand, as the script assumes the file already resides on the storage):
for (( fcp = 0; fcp < 5000; fcp++ )); do cp -i fill.dat fill.$fcp.dat; done
NOTE: Either way would work. The second method would not be hit by a read penalty since the file would be cached for multiple copies anyway.
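The fill script itself appears at the end of this table; as a minimal sketch of the same idea (hypothetical target path and a small safety cap added for illustration; this is not the original AWK/KSH script), a dd loop that writes numbered files until the filesystem refuses the write looks like this:

```shell
# Minimal fill-loop sketch (hypothetical; not the original AWK/KSH script).
# TARGET and SIZE are assumptions -- point TARGET at the array's mount point
# and remove the safety cap to actually fill to capacity.
TARGET=${TARGET:-/tmp/filltest}
SIZE=${SIZE:-1}                       # per-file size in MB
mkdir -p "$TARGET"
i=0
# dd exits non-zero once the filesystem is full, which ends the loop
while dd if=/dev/zero of="$TARGET/fill.$i.dat" bs=1M count="$SIZE" 2>/dev/null; do
  i=$((i + 1))
  [ "$i" -ge 3 ] && break             # safety cap for this sketch only
done
echo "wrote $i files"
```

With the cap removed, the loop runs until dd hits ENOSPC, which is the same stopping condition the copy loop above relies on.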
|
PASS: RAID6 array filled to capacity. No errors.
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
1907288 1907288 1 100% /mnt/MBPCBackupx
#
# xfs_db -c frag -r /dev/MBPCStorage/MBPCBackup
actual 59209, ideal 29835, fragmentation factor 49.61%
#
smartctl quick check:
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdc: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdf: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdg: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
#
While filling the FS using the AWK script, the task would reach 100% capacity, but the moment the "not enough disk space" message was encountered, usage would revert to 85% in an instant. This was puzzling; it may be fixed by adding this option to the XFS mount command:
# mount -o inode64 /dev/MBPCStorage/MBPCBackup /mnt/MBPCBackupx
This allows XFS to place the inode table beyond the first TB of data space. (We have not tested this option, as filling the FS using the KSH script worked without that discrepancy.)
|
2
|
Delete all test files created in step 1.
|
PASS: Delete was instantaneous.
|
3
|
Fill the array up with copies of a binary file (e.g., Linux ISOs are a good choice). Use the second script from step 1.
for (( fcp = 0; fcp < 10000; fcp++ )); do cp -i test.dat $fcp.dat; done
|
PASS: Result showed no errors or issues in filling the array:
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
1907288 1907288 1 100% /mnt/MBPCBackupx
You have new mail in /var/spool/mail/root
#
|
2
|
Grow the LVM, and the XFS filesystem on top of it, from the 2TB we set above (or on the first page) to 3TB.
|
PASS:
Grow the LVM by 50% of the free space, to 3TB (I used a one-liner combining two commands):
# lvm lvextend -L+$(lvm vgs --units S|awk '{ if ( $1 ~ /MBPCStorage/ ) print $7 / (100 / 50); }')S /dev/MBPCStorage/MBPCBackup
Extending logical volume MBPCBackup to 2.73 TiB
Logical volume MBPCBackup successfully resized
#
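The arithmetic inside that one-liner (take the VG's free space in sectors, then divide by 100/50 to get 50% of it) can be sketched in isolation; the sector count below is a made-up stand-in for what `lvm vgs --units S` would report:

```shell
# 50%-of-free-space arithmetic from the lvextend one-liner, with a
# hypothetical free-space figure instead of live `lvm vgs --units S` output.
FREE_SECTORS=4294967296   # assumed VG free space, in 512-byte sectors
PCT=50
EXTEND=$((FREE_SECTORS * PCT / 100))
echo "lvm lvextend -L+${EXTEND}S /dev/MBPCStorage/MBPCBackup"
```

Doing the division in the shell (or AWK, as above) avoids having to copy sector counts around by hand.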
# lvm lvdisplay /dev/MBPCStorage/MBPCBackup --units G
--- Logical volume ---
LV Name /dev/MBPCStorage/MBPCBackup
VG Name MBPCStorage
LV UUID k6dLRW-BUht-tm0n-9GU8-e6ma-M0qW-wJ4TiD
LV Write Access read/write
LV Status available
# open 1
LV Size 3000.41 GB
Current LE 715353
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 32768
Block device 253:6
#
I then used this command to extend the XFS filesystem (the XFS FS is still at 100% usage at this point; an XFS filesystem is grown while mounted):
# xfs_growfs /mnt/MBPCBackupx/
meta-data=/dev/mapper/MBPCStorage-MBPCBackup isize=256 agcount=64, agsize=7629408 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=488282112, imaxpct=5
= sunit=16 swidth=64 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=16384, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
data blocks changed from 488282112 to 732521472
# cd MBPCBackupx/
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
2861348 1907290 954059 67% /mnt/MBPCBackupx
# xfs_growfs -n /mnt/MBPCBackupx/
meta-data=/dev/mapper/MBPCStorage-MBPCBackup isize=256 agcount=97, agsize=7629408 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=732521472, imaxpct=5
= sunit=16 swidth=64 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=16384, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
#
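As a sanity check, the new data-block count reported by `xfs_growfs -n` multiplied by the block size should land on the ~3TB target from the lvextend step (a quick back-of-the-envelope sketch):

```shell
# Multiply the xfs_growfs figures: 732521472 blocks x 4096 bytes ~= 3TB.
BSIZE=4096
BLOCKS=732521472
BYTES=$((BLOCKS * BSIZE))
GB=$((BYTES / 1000000000))            # decimal GB, as lvdisplay reports
echo "${GB} GB"
```

This agrees with the 3000.41 GB figure lvdisplay printed for the grown LV.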
|
3
|
Refill the array using the method in TEST 1. (We will reuse the second script in this case.)
|
PASS
# cd /mnt/MBPCBackupx
Then start from the last file the system could write, 4838 (File # 4839):
# for (( fcp = 4839; fcp < 10000; fcp++ )); do cp -i test.dat $fcp.dat; done
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
2861348 2861348 1 100% /mnt/MBPCBackupx
#
|
4
|
Remove two disks and reinsert them in swapped order (without writing or reading data, resyncing, etc.) while the array is quiet. Here, we want to see the rebuild time and statistics when used in combination with the --bitmap=internal option.
|
PASS
For this test we'll select two disks arbitrarily, but first we get their serial numbers so we can identify which disks to pull from the outside (use hdparm -i /dev/rsda, for example, to get that):
/dev/rsda (9VX0X9TA) - /dev/sdc
/dev/rsdd (9VX0WJKA) - /dev/sdf
First we'll spin the platters down and effectively shut the disks down:
echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sdf/device/delete
Allow an appropriate amount of time for the disks to spin down, then check with the command below that the two disks are in fact now unavailable in the array:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[4] sdc[0](F) sde[1] sdd[2] sdf[3](F) sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
#
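In that mdstat line, [6/4] means 4 of 6 member devices are up, and each position in [_UU_UU] maps to one raid slot (U = up, _ = down). Counting the up members programmatically is a one-liner; a small sketch:

```shell
# Count active members from an mdstat status string such as [_UU_UU].
STATUS="_UU_UU"
ACTIVE=$(printf %s "$STATUS" | tr -cd 'U' | wc -c)
ACTIVE=$((ACTIVE))                    # normalize any whitespace wc may emit
echo "active: $ACTIVE of ${#STATUS}"
```

Handy inside monitoring scripts that grep /proc/mdstat rather than parsing `mdadm --detail`.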
The array should still be functioning fine. As a quick test while the disks are gone, delete one of the files that were copied and copy another in its place:
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
2861348 2861348 1 100% /mnt/MBPCBackupx
#
# ls -al 99[8-9]*
-rwxr--r-x. 1 root root 413368320 Apr 9 00:50 998.dat
-rwxr--r-x. 1 root root 413368320 Apr 10 21:46 999.dat
#
Remove the drives that were turned off, plug them back in, in each other's slots, then check the array:
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Apr 10 21:54:39 2012
State : active, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 2
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 3991
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 0 0 3 removed
4 8 16 4 active sync /dev/sdb
5 8 0 5 active sync /dev/sda
0 8 32 - faulty spare
3 8 80 - faulty spare
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i,j}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdc: No such file or directory
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdj: No such file or directory
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdg: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
/dev/sdh: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdi: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
#
From the above we can see which disks are marked faulty (F) and how many remain active in the array (6/4). The drives are back, but now as sdh and sdi. However, our UDEV rules still create the original links pointing at the drives with the above serial numbers, irrespective of the names (sdi, sdh, sdf, sdc, etc.) assigned by the system:
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdf -> sda
#
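A rule pinning those names to drive serials might look like the following (a hypothetical sketch; the actual rules file is not shown in this post, and the `ID_SERIAL_SHORT` property depends on the udev version in use):

```
# Hypothetical /etc/udev/rules.d/ entries -- one per drive serial number:
KERNEL=="sd?", ENV{ID_SERIAL_SHORT}=="9VX0X9TA", SYMLINK+="rsda"
KERNEL=="sd?", ENV{ID_SERIAL_SHORT}=="9VX0WJKA", SYMLINK+="rsdd"
```

Because the match key is the serial and not the kernel name, the /dev/rsd* links survive the disks coming back as sdh and sdi.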
Now that the drives are plugged in again, we need to tell the system to add them to the array again:
# mdadm --add /dev/raidmd0 /dev/rsda
mdadm: /dev/rsda reports being an active member for /dev/raidmd0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/rsda in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/rsda" first.
# mdadm --add /dev/raidmd0 /dev/rsdd
mdadm: /dev/rsdd reports being an active member for /dev/raidmd0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/rsdd in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/rsdd" first.
#
Can't use the following either as the devices are no longer listed under /dev/:
# mdadm --manage /dev/raidmd0 --fail /dev/sdf
mdadm: cannot find /dev/sdf: No such file or directory
# mdadm --manage /dev/raidmd0 --fail /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory
# mdadm --manage /dev/raidmd0 --remove /dev/sdf
mdadm: cannot find /dev/sdf: No such file or directory
# mdadm --manage /dev/raidmd0 --remove /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory
#
Instead, we will use the same device names as above but without /dev/ in the name:
# mdadm --manage /dev/raidmd0 --remove sdf
mdadm: hot removed sdf from /dev/raidmd0
#
# mdadm --manage /dev/raidmd0 --remove sdc
mdadm: hot removed sdc from /dev/raidmd0
#
The result of the operation while checking that the data integrity was still fine:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[4] sde[1] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
#
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Apr 10 22:33:40 2012
State : active, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 4317
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 0 0 3 removed
4 8 16 4 active sync /dev/sdb
5 8 0 5 active sync /dev/sda
#
Now it appears we can re-add the replugged devices. The internal bitmap makes the recovery nearly instantaneous, all while the drive is 100% full with 3TB of data:
# mdadm --add /dev/raidmd0 /dev/rsda
mdadm: re-added /dev/rsda
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh[0] sdb[4] sde[1] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
[=======>............] recovery = 35.3% (344989104/976761408) finish=183.8min speed=57259K/sec
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh[0] sdb[4] sde[1] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
# date
Tue Apr 10 22:38:15 EDT 2012
#
To put into perspective how quick that was, we look at the logs:
/var/log/messages
Apr 10 22:37:57 mbpc kernel: md: bind<sdh>
Apr 10 22:37:57 mbpc kernel: md: recovery of RAID array md0
Apr 10 22:37:57 mbpc kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 10 22:37:57 mbpc kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 10 22:37:57 mbpc kernel: md: using 128k window, over a total of 976761408k.
Apr 10 22:38:06 mbpc kernel: md: md0: recovery done.
So about 9 seconds. Now we re-add the second disk, but this time we time the rebuild differently to show how long it actually takes:
# date; mdadm --add /dev/raidmd0 /dev/rsdd; sleep 5; cat /proc/mdstat; sleep 5; cat /proc/mdstat
Tue Apr 10 22:43:15 EDT 2012
mdadm: re-added /dev/rsdd
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
[=====>..............] recovery = 25.0% (244315840/976761408) finish=234.9min speed=51955K/sec
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
[=======>............] recovery = 36.0% (351886304/976761408) finish=215.9min speed=48227K/sec
bitmap: 3/8 pages [12KB], 65536KB chunk
unused devices: <none>
#
Again, checking the logs we see how long the rebuild took:
/var/log/messages
Apr 10 22:43:15 mbpc kernel: md: bind<sdi>
Apr 10 22:43:15 mbpc kernel: md: recovery of RAID array md0
Apr 10 22:43:15 mbpc kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 10 22:43:15 mbpc kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 10 22:43:15 mbpc kernel: md: using 128k window, over a total of 976761408k.
Apr 10 22:43:27 mbpc kernel: md: md0: recovery done.
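Subtracting the bind and "recovery done" timestamps gives the wall-clock rebuild time; a quick sketch using GNU date, with the timestamps from the log above hard-coded in ISO form:

```shell
# Rebuild duration from the two syslog timestamps above (GNU date required).
START="2012-04-10 22:43:15"
END="2012-04-10 22:43:27"
S=$(date -d "$START" +%s)
E=$(date -d "$END" +%s)
DUR=$((E - S))
echo "rebuild took $DUR seconds"
```

In practice the two timestamps would be grepped out of /var/log/messages rather than typed in.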
Let's check if it's all done and if everything checks out fine:
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i,j}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdc: No such file or directory
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdj: No such file or directory
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdg: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
/dev/sdh: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdi: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 1/8 pages [4KB], 65536KB chunk
unused devices: <none>
#
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Apr 10 22:53:02 2012
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 4414
Number Major Minor RaidDevice State
0 8 112 0 active sync /dev/sdh
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 128 3 active sync /dev/sdi
4 8 16 4 active sync /dev/sdb
5 8 0 5 active sync /dev/sda
#
And we are back to our normal operating mode making this a successful test.
|
5
|
Do the following in the order listed:
-
Start writing large, multi-GB files to the storage continuously.
-
Begin watching a video or open a large file as another user. (Use switch user.)
-
Turn off two disk drives / spin down the platters. (See above or the first page for how to do that.)
-
Remove the stopped disks from the chassis.
-
Turn off the power at the power supply.
Upon startup / reboot:
-
Bring back the array in degraded mode.
-
Start watching a video again, or accessing the large file.
-
Begin to write data to the RAID6 storage again. (While in degraded mode with two disks missing.)
-
Reinsert the two disks, swapping each disk's location.
(i.e., if disk 1 was taken out of slot 1 and disk 2 was taken out of slot 2, insert disk 1 into slot 2 and disk 2 into slot 1.)
|
PASS (FIRST TEST):
For this test, we will remove some files from the above tests to move back from 100% usage so we can write large files to the array continuously:
Start using the disks, e.g. by watching a video, then write files:
# cd /mnt/MBPCBackupx
# date +%s:%N; cp -p /home/mdadm.dat ./raid6.0.dat; cp raid6.0.dat raid6.1.dat; cp -p /home/mdadm.dat ./raid6.2.dat; date +%s:%N;
(Repeat the above if it finishes too quickly.) Get the disk serial numbers first so we know which drives to unplug:
# hdparm -i /dev/sdb
/dev/sdb:
Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0WK55
# hdparm -i /dev/sdc
/dev/sdc:
Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0X5KC
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr 11 00:22 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr 11 00:21 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdf -> sda
#
Then spin down the disks:
# echo 1 > /sys/block/sdc/device/delete
# echo 1 > /sys/block/sdb/device/delete
The result was:
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr 6 23:39 /dev/rsdf -> sda
#
All while reading and writing again:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 2168.50 3847.70 35.20 43.60 9091.60 16289.80 644.20 7.76 97.77 6.53 51.46
sdd 2173.70 3829.50 34.80 39.00 9116.40 16274.20 688.09 9.94 133.18 7.38 54.44
sdb 2169.80 3841.60 36.90 42.20 9110.40 16260.60 641.49 10.49 131.63 6.71 53.09
sdg 2.40 5.80 402.60 1.10 49859.20 27.20 247.15 2.45 6.07 2.39 96.54
dm-0 0.00 0.00 18.20 6.80 348.80 27.20 30.08 2.07 82.67 14.06 35.16
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 25.70 927.60 1638.40 55020.50 118.87 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 386.80 0.00 49510.40 0.00 256.00 1.75 4.52 2.48 95.80
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 25.70 927.40 1638.40 55014.10 118.88 591.18 655.58 0.63 60.52
sdh 2161.40 3841.10 42.50 48.90 9091.60 16160.60 552.56 8.67 94.48 5.72 52.26
sdi 2165.50 3824.70 44.20 47.10 9120.00 16155.00 553.67 9.44 103.01 5.41 49.38
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdd[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
bitmap: 4/8 pages [16KB], 65536KB chunk
unused devices: <none>
#
Then unplug the disks and remove them. After that, check that the video is still playing and that the file is still being written to disk. Once verified, we shut off power to the system from the switch at the power supply for the server/HTPC+B. Start the server back up, bring the array back in the following manner, and assess the situation:
# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# ls -al /dev/raidmd0
ls: cannot access /dev/raidmd0: No such file or directory
# ls -al /dev/md*
brw-rw—-. 1 root disk 9, 0 Apr 11 00:44 /dev/md0
/dev/md:
total 4
drwxr-xr-x. 2 root root 60 Apr 11 00:44 .
drwxr-xr-x. 22 root root 4400 Apr 11 00:45 ..
-rw——-. 1 root root 54 Apr 11 00:44 md-device-map
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdf -> sda
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Wed Apr 11 00:38:51 2012
State : active, degraded, Not Started
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 4823
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 0 0 1 removed
2 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
4 0 0 4 removed
5 8 0 5 active sync /dev/sda
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd[0] sdc[2] sdb[3] sda[5]
4395422240 blocks super 1.2
unused devices: <none>
#
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdg: No such file or directory
ls: cannot access /dev/sdh: No such file or directory
ls: cannot access /dev/sdi: No such file or directory
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdc: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
#
So at least we can see things: our array still has 4 disks, and none list as having bad blocks or any other problems that would kill this array. So let's try to reassemble it again:
# mdadm --assemble --scan
#
No result from trying to start the array. Hmm. So let's try with the verbose option:
# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/dm-5: Device or resource busy
mdadm: cannot open device /dev/dm-4: Device or resource busy
mdadm: cannot open device /dev/dm-3: Device or resource busy
mdadm: no RAID superblock on /dev/dm-2
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sde3: Device or resource busy
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: cannot open device /dev/sda: Device or resource busy
So let's examine this array further:
# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=2f36ac48:5e3e4c54:72177c53:bea3e41e name=mbpc:0
# cat /etc/mdadm.conf
ARRAY /dev/md/0 metadata=1.2 name=mbpc:0 UUID=2f36ac48:5e3e4c54:72177c53:bea3e41e
#
# cat /sys/block/md0/md/array_state
inactive
#
Try to stop it:
# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm –detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
#
Now we're getting somewhere; this time there's more useful data:
# mdadm --assemble --scan
mdadm: /dev/md/0 assembled from 4 drives - not enough to start the array while not clean - consider --force.
#
More strangeness: now all the devices in what's left are marked as spares (now I start to worry):
# mdadm --assemble --scan --force
#
# ls -al /dev/md
md/ md0
# ls -al /dev/md0
brw-rw—-. 1 root disk 9, 0 Apr 11 01:10 /dev/md0
# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd[0](S) sda[5](S) sdb[3](S) sdc[2](S)
4395422240 blocks super 1.2
unused devices: <none>
#
Ok. So let's try to be more explicit and forceful here. After all, we want our data back badly, folks:
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdf -> sda
#
# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
# mdadm -v --assemble --scan --force /dev/md0 /dev/rsda/ dev/rsdc /dev/rsdd /dev/rsdf
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/dm-5: Device or resource busy
mdadm: cannot open device /dev/dm-4: Device or resource busy
mdadm: cannot open device /dev/dm-3: Device or resource busy
mdadm: no RAID superblock on /dev/dm-2
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sde3: Device or resource busy
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sda is identified as a member of /dev/md0, slot 5.
mdadm: Marking array /dev/md0 as 'clean'
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdc to /dev/md0 as 2
mdadm: added /dev/sdb to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sda to /dev/md0 as 5
mdadm: added /dev/sdd to /dev/md0 as 0
mdadm: /dev/md0 has been started with 4 drives (out of 6).
mdadm: /dev/rsda/ not identified in config file.
mdadm: dev/rsdc not identified in config file.
mdadm: /dev/rsdd not identified in config file.
mdadm: /dev/rsdf not identified in config file.
#
Note the typo in that device list (a trailing slash on /dev/rsda/ and a missing leading slash on dev/rsdc), which produced the "not identified in config file" messages; the scan started the array regardless. Now let's check the array:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd[0] sda[5] sdb[3] sdc[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
bitmap: 5/8 pages [20KB], 65536KB chunk
unused devices: <none>
#
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Apr 11 00:38:51 2012
State : active, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 4823
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 0 0 1 removed
2 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
4 0 0 4 removed
5 8 0 5 active sync /dev/sda
#
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdg: No such file or directory
ls: cannot access /dev/sdh: No such file or directory
ls: cannot access /dev/sdi: No such file or directory
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdc: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
#
Now we are back in the situation we were in for reliability test step 5 above. Before we push the drives we failed back in again, let's start writing and using some files, e.g. watching a home video again:
# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# lvm vgchange -a y
3 logical volume(s) in volume group "VGEntertain" now active
3 logical volume(s) in volume group "mbpcvg" now active
1 logical volume(s) in volume group "MBPCStorage" now active
# mount /mnt/MBPCBackupx/
# cd /mnt/MBPCBackupx/
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
2861348 2849239 12110 100% /mnt/MBPCBackupx
# ls -al |tail
-rwxr--r-x. 1 root root 413368320 Apr 10 21:46 999.dat
-rwxr--r-x. 1 root root 413368320 Apr 9 00:14 99.dat
-rwxr--r-x. 1 root root 413368320 Apr 9 00:10 9.dat
-rwxr--r--. 1 root root 4720553072 May 4 2011 raid6.0.dat
-rwxr--r--. 1 root root 4720553072 Apr 10 23:57 raid6.1.dat
-rwxr--r--. 1 root root 4720553072 May 4 2011 raid6.2.dat
-rwx------. 1 root root 3757387776 Apr 11 00:38 raid6.3.dat
-rwxrwSrwx. 1 root root 413368320 Jun 21 2008 test.dat
-rw-r--r--. 1 root root 9984418 Apr 9 23:11 test.log
#
Again, file writing and reading was fine even with the two disks missing. Now it's time to replug the drives we took out and have the array reassemble. From here, the steps are identical to TEST 4 above, this time starting further down since the devices are already removed:
# mdadm --add /dev/raidmd0 /dev/rsdb;
# cat /proc/mdstat;
Wed Apr 11 02:16:40 EDT 2012
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
[>...................] recovery = 0.0% (0/976761408) finish=1017459.8min speed=0K/sec
bitmap: 5/8 pages [20KB], 65536KB chunk
unused devices: <none>
#
This time the array did not recover very quickly even with an internal bitmap (it finished after about 2 minutes), but still far quicker than the standard few hours:
# cat /proc/mdstat;
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
[=>..................] recovery = 6.2% (61041460/976761408) finish=1529.0min speed=9980K/sec
bitmap: 5/8 pages [20KB], 65536KB chunk
unused devices: <none>
#
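The finish estimate mdstat prints is simply the remaining 1K-blocks divided by the current speed; reproducing it from the figures above (a sketch with those numbers hard-coded):

```shell
# mdstat ETA arithmetic: remaining 1K-blocks divided by speed (K/sec),
# converted to minutes. Figures taken from the mdstat output above.
DONE=61041460
TOTAL=976761408
SPEED=9980                            # K/sec
MIN=$(( (TOTAL - DONE) / SPEED / 60 ))
echo "finish=${MIN}min"
```

This matches the finish=1529.0min figure in the transcript; with a write-intent bitmap the rebuild skips clean regions, so the real completion time comes in far under the estimate.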
While this is going on, let's put back the second disk:
# mdadm --add /dev/raidmd0 /dev/rsde
mdadm: re-added /dev/rsde
# cat /proc/mdstat;
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[4](S) sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
[==>.................] recovery = 11.8% (115617408/976761408) finish=7203.2min speed=1992K/sec
bitmap: 5/8 pages [20KB], 65536KB chunk
unused devices: <none>
#
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Apr 11 02:23:29 2012
State : active, degraded, recovering
Active Devices : 4
Working Devices : 6
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 12% complete
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 4882
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 96 1 spare rebuilding /dev/sdg
2 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
4 0 0 4 removed
5 8 0 5 active sync /dev/sda
4 8 80 - spare /dev/sdf
#
Once recovered, we'll check the parameters of the disks to ensure the settings are correct, adjusting them if needed using the steps from PAGE 1 of this post:
# for dn in $(ls /dev/rsd*); do WCH=$(/sbin/hdparm -I $dn|grep -i "Write cache"); echo $dn": $WCH"; done
/dev/rsda: Write cache
/dev/rsdb: * Write cache
/dev/rsdc: Write cache
/dev/rsdd: Write cache
/dev/rsde: * Write cache
/dev/rsdf: Write cache
#
Disable the cache on the two re-added disks (again, only if using XFS), as they will have it enabled by default when plugged back in:
# hdparm -W 0 /dev/rsdb;
/dev/rsdb:
setting drive write-caching to 0 (off)
write-caching = 0 (off)
# hdparm -W 0 /dev/rsde;
/dev/rsde:
setting drive write-caching to 0 (off)
write-caching = 0 (off)
#
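The two hdparm calls above can be done in one hedged loop (a sketch; the disable_write_cache name and the DRYRUN preview convention are my own):

```shell
# Sketch: disable the on-disk write cache for a list of drives in one go
# (again, only needed here because we run XFS without battery-backed cache).
# Set DRYRUN=echo to preview the hdparm calls instead of executing them.
disable_write_cache() {
  for dn in "$@"; do
    ${DRYRUN:-} hdparm -W 0 "$dn"
  done
}
```

For example: disable_write_cache /dev/rsdb /dev/rsde, or DRYRUN=echo disable_write_cache /dev/rsd* to preview.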
|
6
|
This is a repeat of test #5 above, which I had actually performed earlier, prior to doing test #5. It was also successful. The setup is as follows:
Do the following all in sequence and in quick succession of each other:
-
Start writing large GB files repeatedly to the mount on which the RAID6 array sits.
-
Unplug SATA cable from one drive.* # ( simulate )*
-
Unplug SATA cable from a second drive.* # ( simulate )*
-
Begin accessing one of the files on the RAID6 array.
-
Shut the Power Supply down using the on/off switch on the Power Supply.
-
For this test, the 2TB RAID6 array is less than ~2% full.
* Simulate a drive being physically pulled by essentially cutting power to the device like this: echo 1 > /sys/block/<DEVICE>/device/delete
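The simulated pull can be wrapped in a tiny helper so the sysfs path is never mistyped (a sketch; the pull_drive name and the DRYRUN preview variable are my own):

```shell
# Sketch: detach a disk as if its power were cut, via the sysfs delete node.
# Pass the kernel name (e.g. sdc), not the rsd* udev symlink. Set DRYRUN=echo
# to print the action instead of performing it.
pull_drive() {
  ${DRYRUN:-} sh -c "echo 1 > /sys/block/$1/device/delete"
}
```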
|
PASS (SECOND TEST):
First, let's check that the array is fine and no disks show errors or potential problems that could grow into problems for us:
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdc: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdf: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdg: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
#
Again check with
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Mar 27 00:30:49 2012
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 3063
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 80 3 active sync /dev/sdf
4 8 16 4 active sync /dev/sdb
5 8 0 5 active sync /dev/sda
to ensure everything is in the expected state: the array shows active, with all disks participating in the RAID6 array showing as active sync.
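This check can also be scripted rather than read by eye (a sketch; the array_healthy name is my own):

```shell
# Sketch: scan an "mdadm --detail" report for degraded members instead of
# reading the device table by eye. The array_healthy name is my own.
array_healthy() {
  # reads "mdadm --detail" output on stdin; prints OK when no member line
  # shows removed or faulty, DEGRADED otherwise
  if grep -Eq 'removed|faulty'; then
    echo "DEGRADED"
  else
    echo "OK"
  fi
}
```

Usage: mdadm --detail /dev/raidmd0 | array_healthy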
Next spin down / cut power to two of the disks:
echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sda/device/delete
The result of the operation should show this:
# mdadm –detail /dev/raidmd0
.
.
State : active, degraded
.
.
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 80 3 active sync /dev/sdf
4 8 16 4 active sync /dev/sdb
5 0 0 5 removed
Now we pull the plug while our write job is in progress, writing 4.7GB files to the array. After the reboot, the system came back up, but the filesystem did not mount:
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Tue Mar 27 00:36:15 2012
State : active, degraded, Not Started
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 3132
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 80 3 active sync /dev/sdf
4 8 16 4 active sync /dev/sdb
5 0 0 5 removed
#
So let's try to reassemble the array:
# mdadm --assemble --scan
mdadm: /dev/md/0 is already in use.
#
So we stop the array as before:
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
#
Now we're getting somewhere:
# mdadm --assemble --scan
mdadm: /dev/md/0 assembled from 4 drives - not enough to start the array while not clean - consider --force.
#
It looks like we may have to add the disks back first, as per the first suggestion, before we try to force (relying on my experience here: the force option to anything can have unintended consequences, so I tend to use it as a method of last resort). To do this, we look at our UDEV rules to tell us which disks to add:
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsda -> sdc
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdd -> sdf
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdf -> sda
#
But then I realize I can't tell which disks I pulled (I can't tell from the outside either, because all 6 are still cabled the same; they just have their power cut and platters spun down from the OS, where they no longer appear as a result).
No problem. Let's try to start the array in degraded mode, which is the way I would choose anyway since we're running a test here, folks:
# mdadm --assemble --scan --force
mdadm: Marking array /dev/md/0 as 'clean'
mdadm: /dev/md/0 has been started with 4 drives (out of 6).
#
And check again to make sure we are clean:
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdb: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdc: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdd: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sde: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdf: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
/dev/sdg: 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
#
Now before we reassemble, I want to see if we can mount our array:
# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# lvm vgchange -a y
3 logical volume(s) in volume group "VGEntertain" now active
3 logical volume(s) in volume group "mbpcvg" now active
1 logical volume(s) in volume group "MBPCStorage" now active
#
And try mounting now. Success. Now we can see the files with the original dates applied to the completed ones:
# ls -al
total 17499768
drwxr-xr-x. 2 root root 108 Mar 27 00:34 .
drwxr-xr-x. 6 root root 4096 Mar 24 23:53 ..
-rwxr--r--. 1 root root 4720553072 May 4 2011 testX0.m2ts
-rwxr--r--. 1 root root 4720553072 May 4 2011 testX1.m2ts
-rwx------. 1 root root 3219345408 Mar 27 00:35 testX2.m2ts
-rwxr--r--. 1 root root 4720553072 May 4 2011 testXseq.m2ts
#
# xfs_db -c frag -r /dev/MBPCStorage/MBPCBackup
actual 4, ideal 4, fragmentation factor 0.00%
#
# xfs_check /dev/MBPCStorage/MBPCBackup
#
I'm going to try to access a file off this array by switching to my non-privileged videouser account. I want to see that the file reads fine and is still workable before re-adding any disks. This was a success, as the video played fine. Now to re-add the disks:
# mdadm /dev/md0 -a /dev/rsda
mdadm: re-added /dev/rsda
#
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UUUU_]
[=>...................] recovery = 6.3% (61590408/976761408) finish=334.2min speed=45632K/sec
bitmap: 2/8 pages [8KB], 65536KB chunk
unused devices: <none>
#
Now, because of our internal bitmap, the array started to rebuild very quickly, which is good. Now to add another drive while this is going on... too late: the resync was done before I could even blink:
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 5
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Mar 27 01:16:43 2012
State : active, degraded
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 3186
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 80 3 active sync /dev/sdf
4 8 16 4 active sync /dev/sdb
5 0 0 5 removed
#
# mdadm /dev/md0 -a /dev/rsdf
mdadm: re-added /dev/rsdf
#
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUUUU_]
[=>...................] recovery = 5.5% (54287616/976761408) finish=324.2min speed=47413K/sec
bitmap: 2/8 pages [8KB], 65536KB chunk
unused devices: <none>
#
And the rebuilding jumps by leaps and bounds and is done in seconds:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUUUU_]
[====>................] recovery = 24.9% (244180084/976761408) finish=277.1min speed=44047K/sec
bitmap: 2/8 pages [8KB], 65536KB chunk
unused devices: <none>
#
And resync of the second disk is done before you know it:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
#
And now our array is back up. NOTE: We added the disks, and the rebuild ran, all while the array was mounted under /mnt/MBPCBackupx:
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Mar 27 01:20:05 2012
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 3210
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 80 3 active sync /dev/sdf
4 8 16 4 active sync /dev/sdb
5 8 0 5 active sync /dev/sda
#
And this concludes successful AVAILABILITY TEST #6.
|
7
|
Unmount the array from /mnt/MBPCBackupx, stop it, and spin down all disks, then reassemble the array in the same hard disk order and ensure you can still see the data. Before adding each disk back, power it down, then bring it up using the commands above or on the first page of this post.
|
PASS:
For this test, we run the following commands to disassemble the array and then try to reassemble it (assuming the RAID6 is mounted on MBPCBackupx and sits under /dev/raidmd0 or /dev/md0):
# du -sh .
2.8T .
# pwd
/mnt/MBPCBackupx
# cd ..
# umount MBPCBackupx
# lvm lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
MBPCBackup MBPCStorage -wi-a- 2.73t
.
.
.
# lvm vgs
VG #PV #LV #SN Attr VSize VFree
MBPCStorage 1 1 0 wz--n- 3.64t 931.70g
.
.
# ls -al /dev/MBPCStorage/MBPCBackup
lrwxrwxrwx. 1 root root 7 Apr 15 17:51 /dev/MBPCStorage/MBPCBackup -> ../dm-6
# lvm vgchange -a n MBPCStorage
0 logical volume(s) in volume group "MBPCStorage" now active
# lvm lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
MBPCBackup MBPCStorage -wi--- 2.73t
.
.
.
# ls -al /dev/MBPCStorage/MBPCBackup
ls: cannot access /dev/MBPCStorage/MBPCBackup: No such file or directory
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# ls -al /dev/rsd*
ls: cannot access /dev/rsd*: No such file or directory
# udevadm trigger 2>&1
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsda -> sdf
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdb -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdd -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsde -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdf -> sda
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# ls -al /dev/md0
ls: cannot access /dev/md0: No such file or directory
# ls -al /dev/raidmd0
ls: cannot access /dev/raidmd0: No such file or directory
#
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
unused devices: <none>
#
And now we can power down / spin down each disk that corresponds to the actual rsdX number above:
echo 1 > /sys/block/sda/device/delete
echo 1 > /sys/block/sdb/device/delete
echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sdd/device/delete
echo 1 > /sys/block/sde/device/delete
echo 1 > /sys/block/sdf/device/delete
(You should hear the clicking sound of the heads returning to their parking spot, as well as the subtle sound of the disks spinning down, much like when you shut down the system.) Allow a minute for the disks to spin down. At this point I'll unplug the SATA cables and plug them back in. I expect that when I reassemble my array, all my files will still be there.
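After the deletes, it's worth confirming that the disks are truly gone before touching any cables. This is a sketch of such a check; the remaining_disks name is my own, and the directory argument exists only to make it testable (it defaults to the real /sys/block):

```shell
# Sketch: list any sd* block devices still known to the kernel.
# $1 = sysfs block directory (default /sys/block); prints the remaining
# sd* entries, or "none" when all disks have detached.
remaining_disks() {
  ls "${1:-/sys/block}" 2>/dev/null | grep '^sd' || echo "none"
}
```

An output of "none" means every disk detached as expected.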
/var/log/messages
Apr 15 17:54:44 mbpc kernel: md0: detected capacity change from 4000814727168 to 0
Apr 15 17:54:44 mbpc kernel: md: md0 stopped.
Apr 15 17:54:44 mbpc kernel: md: unbind<sdd>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdd)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdb>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdb)
Apr 15 17:54:44 mbpc kernel: md: unbind<sde>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sde)
Apr 15 17:54:44 mbpc kernel: md: unbind<sda>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sda)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdc>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdc)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdf>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdf)
Apr 15 18:26:34 mbpc kernel: sd 0:0:0:0: [sda] Stopping disk
Apr 15 18:26:35 mbpc kernel: ata1.00: disabled
Apr 15 18:26:43 mbpc kernel: sd 1:0:0:0: [sdb] Stopping disk
Apr 15 18:26:44 mbpc kernel: ata2.00: disabled
Apr 15 18:27:14 mbpc kernel: sd 2:0:0:0: [sdc] Stopping disk
Apr 15 18:27:14 mbpc kernel: ata3.00: disabled
Apr 15 18:27:21 mbpc kernel: sd 3:0:0:0: [sdd] Stopping disk
Apr 15 18:27:21 mbpc kernel: ata4.00: disabled
Apr 15 18:27:27 mbpc kernel: sd 4:0:0:0: [sde] Stopping disk
Apr 15 18:27:28 mbpc kernel: ata5.00: disabled
Apr 15 18:27:33 mbpc kernel: sd 5:0:0:0: [sdf] Stopping disk
Apr 15 18:27:34 mbpc kernel: ata6.00: disabled
Apr 15 18:30:00 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:30:00 mbpc kernel: ata3: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:30:00 mbpc kernel: ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:30:00 mbpc kernel: ata3: hard resetting link
Apr 15 18:30:00 mbpc kernel: ata3: SATA link down (SStatus 0 SControl 300)
Apr 15 18:30:00 mbpc kernel: ata3: EH complete
Apr 15 18:32:01 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:01 mbpc kernel: ata5: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:01 mbpc kernel: ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:01 mbpc kernel: ata5: hard resetting link
Apr 15 18:32:02 mbpc kernel: ata5: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:02 mbpc kernel: ata5: EH complete
Apr 15 18:32:12 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:12 mbpc kernel: ata4: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:12 mbpc kernel: ata4: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:12 mbpc kernel: ata4: hard resetting link
Apr 15 18:32:12 mbpc kernel: ata4: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:12 mbpc kernel: ata4: EH complete
Apr 15 18:32:18 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:18 mbpc kernel: ata6: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:18 mbpc kernel: ata6: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:18 mbpc kernel: ata6: hard resetting link
Apr 15 18:32:19 mbpc kernel: ata6: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:19 mbpc kernel: ata6: EH complete
Apr 15 18:32:22 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:22 mbpc kernel: ata2: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:22 mbpc kernel: ata2: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:22 mbpc kernel: ata2: hard resetting link
Apr 15 18:32:23 mbpc kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:23 mbpc kernel: ata2: EH complete
Apr 15 18:32:26 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:26 mbpc kernel: ata1: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:26 mbpc kernel: ata1: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:26 mbpc kernel: ata1: hard resetting link
Apr 15 18:32:26 mbpc kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:26 mbpc kernel: ata1: EH complete
The smartd daemon might complain as well, but this is normal in our case:
/var/log/messages
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sda [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root ...
Apr 15 18:33:25 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sdb [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root ...
Apr 15 18:33:25 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sdc [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root ...
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sdd [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root ...
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sde [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root ...
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sdf [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root ...
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Now let's plug everything back in again. So here we go, folks (hopefully you didn't forget the cable sequence). The interesting thing was that once we replugged all the SATA cables, the array reassembled itself without us having to do so manually:
/var/log/messages
Apr 15 18:36:19 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 18:36:19 mbpc kernel: ata6: irq_stat 0x00000040, connection status changed
Apr 15 18:36:19 mbpc kernel: ata6: SError: { CommWake 10B8B DevExch }
Apr 15 18:36:19 mbpc kernel: ata6: hard resetting link
Apr 15 18:36:20 mbpc kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:20 mbpc kernel: ata6.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:20 mbpc kernel: ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:20 mbpc kernel: ata6.00: configured for UDMA/133
Apr 15 18:36:20 mbpc kernel: ata6: EH complete
Apr 15 18:36:20 mbpc kernel: scsi 5:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] Write Protect is off
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: Attached scsi generic sg0 type 0
Apr 15 18:36:25 mbpc kernel: sda: unknown partition table
Apr 15 18:36:25 mbpc kernel: sd 5:0:0:0: [sda] Attached SCSI disk
Apr 15 18:36:25 mbpc kernel: md: bind<sda>
Apr 15 18:36:26 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 18:36:26 mbpc kernel: ata5: irq_stat 0x00000040, connection status changed
Apr 15 18:36:26 mbpc kernel: ata5: SError: { DevExch }
Apr 15 18:36:26 mbpc kernel: ata5: hard resetting link
Apr 15 18:36:27 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:27 mbpc kernel: ata5.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:27 mbpc kernel: ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:27 mbpc kernel: ata5.00: configured for UDMA/133
Apr 15 18:36:27 mbpc kernel: ata5: EH complete
Apr 15 18:36:27 mbpc kernel: scsi 4:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: Attached scsi generic sg1 type 0
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] Write Protect is off
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:32 mbpc kernel: sdb: unknown partition table
Apr 15 18:36:32 mbpc kernel: sd 4:0:0:0: [sdb] Attached SCSI disk
Apr 15 18:36:32 mbpc kernel: md: bind<sdb>
Apr 15 18:36:34 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 18:36:34 mbpc kernel: ata4: irq_stat 0x00000040, connection status changed
Apr 15 18:36:34 mbpc kernel: ata4: SError: { DevExch }
Apr 15 18:36:34 mbpc kernel: ata4: hard resetting link
Apr 15 18:36:35 mbpc kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:35 mbpc kernel: ata4.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:35 mbpc kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:35 mbpc kernel: ata4.00: configured for UDMA/133
Apr 15 18:36:35 mbpc kernel: ata4: EH complete
Apr 15 18:36:35 mbpc kernel: scsi 3:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] Write Protect is off
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:40 mbpc kernel: sdc: unknown partition table
Apr 15 18:36:40 mbpc kernel: sd 3:0:0:0: [sdc] Attached SCSI disk
Apr 15 18:36:40 mbpc kernel: md: bind<sdc>
Apr 15 18:36:42 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 18:36:42 mbpc kernel: ata3: irq_stat 0x00000040, connection status changed
Apr 15 18:36:42 mbpc kernel: ata3: SError: { CommWake 10B8B DevExch }
Apr 15 18:36:42 mbpc kernel: ata3: hard resetting link
Apr 15 18:36:43 mbpc kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:43 mbpc kernel: ata3.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:43 mbpc kernel: ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:43 mbpc kernel: ata3.00: configured for UDMA/133
Apr 15 18:36:43 mbpc kernel: ata3: EH complete
Apr 15 18:36:43 mbpc kernel: scsi 2:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: Attached scsi generic sg3 type 0
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] Write Protect is off
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:48 mbpc kernel: sdd: unknown partition table
Apr 15 18:36:48 mbpc kernel: sd 2:0:0:0: [sdd] Attached SCSI disk
Apr 15 18:36:48 mbpc kernel: md: bind<sdd>
Apr 15 18:36:48 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 18:36:48 mbpc kernel: ata2: irq_stat 0x00000040, connection status changed
Apr 15 18:36:48 mbpc kernel: ata2: SError: { 10B8B DevExch }
Apr 15 18:36:48 mbpc kernel: ata2: hard resetting link
Apr 15 18:36:49 mbpc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:49 mbpc kernel: ata2.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:49 mbpc kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:49 mbpc kernel: ata2.00: configured for UDMA/133
Apr 15 18:36:49 mbpc kernel: ata2: EH complete
Apr 15 18:36:49 mbpc kernel: scsi 1:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: Attached scsi generic sg4 type 0
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] Write Protect is off
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:54 mbpc kernel: sde: unknown partition table
Apr 15 18:36:54 mbpc kernel: sd 1:0:0:0: [sde] Attached SCSI disk
Apr 15 18:36:55 mbpc kernel: md: bind<sde>
Apr 15 18:36:56 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 18:36:56 mbpc kernel: ata1: irq_stat 0x00000040, connection status changed
Apr 15 18:36:56 mbpc kernel: ata1: SError: { 10B8B DevExch }
Apr 15 18:36:56 mbpc kernel: ata1: hard resetting link
Apr 15 18:36:57 mbpc kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 15 18:36:57 mbpc kernel: ata1.00: ATA-8: ST1500DL003-9VT16L, CC3C, max UDMA/133
Apr 15 18:36:57 mbpc kernel: ata1.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:57 mbpc kernel: ata1.00: configured for UDMA/133
Apr 15 18:36:57 mbpc kernel: ata1: EH complete
Apr 15 18:36:57 mbpc kernel: scsi 0:0:0:0: Direct-Access ATA ST1500DL003-9VT1 CC3C PQ: 0 ANSI: 5
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: Attached scsi generic sg5 type 0
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] 4096-byte physical blocks
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] Write Protect is off
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:37:04 mbpc kernel: sdf: unknown partition table
Apr 15 18:37:04 mbpc kernel: sd 0:0:0:0: [sdf] Attached SCSI disk
Apr 15 18:37:04 mbpc kernel: md: bind<sdf>
Apr 15 18:37:04 mbpc kernel: bio: create slab <bio-1> at 1
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdf operational as raid disk 5
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sde operational as raid disk 1
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdd operational as raid disk 0
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdb operational as raid disk 4
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sda operational as raid disk 3
Apr 15 18:37:04 mbpc kernel: md/raid:md0: allocated 6386kB
Apr 15 18:37:04 mbpc kernel: md/raid:md0: raid level 6 active with 6 out of 6 devices, algorithm 2
Apr 15 18:37:04 mbpc kernel: created bitmap (8 pages) for device md0
Apr 15 18:37:04 mbpc kernel: md0: bitmap initialized from disk: read 1/1 pages, set 0 of 14905 bits
Apr 15 18:37:04 mbpc kernel: md0: detected capacity change from 0 to 4000814727168
Apr 15 18:37:04 mbpc kernel: md0: unknown partition table
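In this run the kernel rediscovered every replugged drive on its own. Had it not, a rescan of each SATA/SCSI host usually brings the devices back. This is a sketch: the rescan_hosts name and DRYRUN preview variable are my own, and it assumes the standard libata/SCSI "scan" node under /sys/class/scsi_host:

```shell
# Sketch: ask every SCSI/SATA host to rescan its bus for new devices.
# Set DRYRUN=echo to preview the writes instead of performing them.
rescan_hosts() {
  for host in /sys/class/scsi_host/host*; do
    ${DRYRUN:-} sh -c "echo '- - -' > $host/scan"
  done
}
```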
And let's check with the standard mdadm tools:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[5] sde[1] sdd[0] sdc[2] sdb[4] sda[3]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
#
# ls -al /dev/raidmd0
lrwxrwxrwx. 1 root root 3 Apr 15 18:37 /dev/raidmd0 -> md0
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdd -> sda
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 18:37 /dev/rsdf -> sdf
#
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Apr 15 17:52:23 2012
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 5171
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 32 2 active sync /dev/sdc
3 8 0 3 active sync /dev/sda
4 8 16 4 active sync /dev/sdb
5 8 80 5 active sync /dev/sdf
#
Wow! Now that, I have to say, I didn't expect to happen all on its own like that. Now let's set the VG active again, mount the array back up, and check the status of our files:
# lvm vgs
VG #PV #LV #SN Attr VSize VFree
MBPCStorage 1 1 0 wz--n- 3.64t 931.70g
# lvm lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
MBPCBackup MBPCStorage -wi--- 2.73t
# lvm vgchange -a y MBPCStorage
1 logical volume(s) in volume group "MBPCStorage" now active
# lvm lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
MBPCBackup MBPCStorage -wi-a- 2.73t
# lvm vgs
VG #PV #LV #SN Attr VSize VFree
MBPCStorage 1 1 0 wz--n- 3.64t 931.70g
#
And now for the mount:
# mount /mnt/MBPCBackupx/
# mount|grep MBPC
/dev/mapper/MBPCStorage-MBPCBackup on /mnt/MBPCBackupx type xfs (rw,noatime,nodiratime,logbufs=8,allocsize=512m)
# cd /mnt/MBPCBackup
bash: cd: /mnt/MBPCBackup: No such file or directory
# cd /mnt/MBPCBackupx/
# du -sh .
2.8T .
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
2861348 2849239 12110 100% /mnt/MBPCBackupx
#
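The options shown by mount above correspond to an /etc/fstab line along these lines (a sketch reconstructed only from the mount output; verify it against your own fstab before relying on it):

```
/dev/MBPCStorage/MBPCBackup  /mnt/MBPCBackupx  xfs  rw,noatime,nodiratime,logbufs=8,allocsize=512m  0 0
```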
This test is definitely a success!
|
8
|
In this test, we'll repeat TEST #7, but reassemble in a different order than how it was originally assembled. The UDEV rules we've created earlier will be key here.
|
PASS:
Because the only thing that differs is how we replug the SATA cables, everything is the same as in AVAILABILITY TEST #7 above, except that we'll replug the SATA cables in a random order rather than the one we originally had. The result:
/var/log/messages
Apr 15 22:33:02 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 22:33:02 mbpc kernel: ata6: irq_stat 0x00000040, connection status changed
Apr 15 22:33:02 mbpc kernel: ata6: SError: { 10B8B DevExch }
Apr 15 22:33:02 mbpc kernel: ata6: hard resetting link
Apr 15 22:33:03 mbpc kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:03 mbpc kernel: ata6.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:03 mbpc kernel: ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:03 mbpc kernel: ata6.00: configured for UDMA/133
Apr 15 22:33:03 mbpc kernel: ata6: EH complete
Apr 15 22:33:03 mbpc kernel: scsi 5:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: Attached scsi generic sg0 type 0
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] Write Protect is off
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:08 mbpc kernel: sda: unknown partition table
Apr 15 22:33:08 mbpc kernel: sd 5:0:0:0: [sda] Attached SCSI disk
Apr 15 22:33:08 mbpc kernel: md: bind<sda>
Apr 15 22:33:10 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:10 mbpc kernel: ata4: irq_stat 0x00000040, connection status changed
Apr 15 22:33:10 mbpc kernel: ata4: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:10 mbpc kernel: ata4: hard resetting link
Apr 15 22:33:10 mbpc kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:10 mbpc kernel: ata4.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:10 mbpc kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:10 mbpc kernel: ata4.00: configured for UDMA/133
Apr 15 22:33:10 mbpc kernel: ata4: EH complete
Apr 15 22:33:10 mbpc kernel: scsi 3:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: Attached scsi generic sg1 type 0
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] Write Protect is off
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:16 mbpc kernel: sdb: unknown partition table
Apr 15 22:33:16 mbpc kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Apr 15 22:33:16 mbpc kernel: md: bind<sdb>
Apr 15 22:33:20 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 22:33:20 mbpc kernel: ata2: irq_stat 0x00000040, connection status changed
Apr 15 22:33:20 mbpc kernel: ata2: SError: { DevExch }
Apr 15 22:33:20 mbpc kernel: ata2: hard resetting link
Apr 15 22:33:21 mbpc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:21 mbpc kernel: ata2.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:21 mbpc kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:21 mbpc kernel: ata2.00: configured for UDMA/133
Apr 15 22:33:21 mbpc kernel: ata2: EH complete
Apr 15 22:33:21 mbpc kernel: scsi 1:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: Attached scsi generic sg2 type 0
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] Write Protect is off
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdc [SAT], open() failed: Permission denied
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdd [SAT], open() failed: No such device
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sde [SAT], open() failed: No such device
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdf [SAT], open() failed: No such device
Apr 15 22:33:26 mbpc kernel: sdc: unknown partition table
Apr 15 22:33:26 mbpc kernel: sd 1:0:0:0: [sdc] Attached SCSI disk
Apr 15 22:33:27 mbpc kernel: md: bind<sdc>
Apr 15 22:33:44 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:44 mbpc kernel: ata5: irq_stat 0x00000040, connection status changed
Apr 15 22:33:44 mbpc kernel: ata5: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:44 mbpc kernel: ata5: hard resetting link
Apr 15 22:33:49 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:49 mbpc kernel: ata5: link online but 1 devices misclassified, retrying
Apr 15 22:33:49 mbpc kernel: ata5: reset failed (errno=-11), retrying in 5 secs
Apr 15 22:33:54 mbpc kernel: ata5: hard resetting link
Apr 15 22:33:54 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:54 mbpc kernel: ata5.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:54 mbpc kernel: ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:54 mbpc kernel: ata5.00: configured for UDMA/133
Apr 15 22:33:54 mbpc kernel: ata5: EH complete
Apr 15 22:33:54 mbpc kernel: scsi 4:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Write Protect is off
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:54 mbpc kernel: sdd: unknown partition table
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Attached SCSI disk
Apr 15 22:33:55 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:55 mbpc kernel: ata3: irq_stat 0x00000040, connection status changed
Apr 15 22:33:55 mbpc kernel: ata3: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:55 mbpc kernel: ata3: hard resetting link
Apr 15 22:33:55 mbpc kernel: md: bind<sdd>
Apr 15 22:33:55 mbpc kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:55 mbpc kernel: ata3.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:55 mbpc kernel: ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:55 mbpc kernel: ata3.00: configured for UDMA/133
Apr 15 22:33:55 mbpc kernel: ata3: EH complete
Apr 15 22:33:55 mbpc kernel: scsi 2:0:0:0: Direct-Access ATA ST31000520AS CC32 PQ: 0 ANSI: 5
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] Write Protect is off
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: Attached scsi generic sg4 type 0
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:34:01 mbpc kernel: sde: unknown partition table
Apr 15 22:34:01 mbpc kernel: sd 2:0:0:0: [sde] Attached SCSI disk
Apr 15 22:34:01 mbpc kernel: md: bind<sde>
Apr 15 22:34:10 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 22:34:10 mbpc kernel: ata1: irq_stat 0x00000040, connection status changed
Apr 15 22:34:10 mbpc kernel: ata1: SError: { DevExch }
Apr 15 22:34:10 mbpc kernel: ata1: hard resetting link
Apr 15 22:34:11 mbpc kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 15 22:34:11 mbpc kernel: ata1.00: ATA-8: ST1500DL003-9VT16L, CC3C, max UDMA/133
Apr 15 22:34:11 mbpc kernel: ata1.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:34:11 mbpc kernel: ata1.00: configured for UDMA/133
Apr 15 22:34:11 mbpc kernel: ata1: EH complete
Apr 15 22:34:11 mbpc kernel: scsi 0:0:0:0: Direct-Access ATA ST1500DL003-9VT1 CC3C PQ: 0 ANSI: 5
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: Attached scsi generic sg5 type 0
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] 4096-byte physical blocks
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] Write Protect is off
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:34:18 mbpc kernel: sdf: unknown partition table
Apr 15 22:34:18 mbpc kernel: sd 0:0:0:0: [sdf] Attached SCSI disk
Apr 15 22:34:19 mbpc kernel: md: bind<sdf>
Apr 15 22:34:19 mbpc kernel: bio: create slab <bio-1> at 1
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdf operational as raid disk 5
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sde operational as raid disk 1
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdd operational as raid disk 0
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdb operational as raid disk 3
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sda operational as raid disk 4
Apr 15 22:34:19 mbpc kernel: md/raid:md0: allocated 6386kB
Apr 15 22:34:19 mbpc kernel: md/raid:md0: raid level 6 active with 6 out of 6 devices, algorithm 2
Apr 15 22:34:19 mbpc kernel: created bitmap (8 pages) for device md0
Apr 15 22:34:19 mbpc kernel: md0: bitmap initialized from disk: read 1/1 pages, set 0 of 14905 bits
Apr 15 22:34:19 mbpc kernel: md0: detected capacity change from 0 to 4000814727168
Apr 15 22:34:19 mbpc kernel: md0: unknown partition table
And through the standard utilities:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[5] sde[1] sdd[0] sdc[2] sdb[3] sda[4]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
Version : 1.2
Creation Time : Mon Mar 26 00:06:24 2012
Raid Level : raid6
Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Apr 15 21:44:49 2012
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : mbpc:0 (local to host mbpc)
UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
Events : 5171
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
4 8 0 4 active sync /dev/sda
5 8 80 5 active sync /dev/sdf
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsde -> sda
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/rsdf -> sdf
# ls -al /dev/raidmd0
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/raidmd0 -> md0
#
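The /dev/rsd* names below come from the udev rules written earlier in the post; a rule of that general shape keys a persistent name to a drive serial. This is illustrative only — the file name and serial here are placeholders, not the post's actual values:

```
# /etc/udev/rules.d/60-raid-disks.rules (illustrative; serial is a placeholder)
# Find the real serial with: udevadm info --query=property --name=/dev/sda
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="5VX0XXXX", SYMLINK+="rsda"
```

One rule per physical drive gives each disk a stable /dev/rsd* name no matter which SATA port it lands on after a replug.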
# df -m .
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
2861348 2849239 12110 100% /mnt/MBPCBackupx
#
Again, another successful test.
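The /proc/mdstat check above lends itself to scripting. Here's a minimal sketch (the function name is mine, not from the post) that classifies a status line by its bracketed member flags — all 'U' means every device is up, while a '_' marks a failed or missing member:

```shell
# Classify a /proc/mdstat status line as "healthy" or "degraded"
# based on the member flags, e.g. [UUUUUU] vs [UUUU_U].
raid_status() {
  local line="$1"
  local flags="${line##*\[}"   # keep text after the last '['
  flags="${flags%%]*}"         # drop the trailing ']'
  if [[ "$flags" == *_* ]]; then
    echo degraded
  else
    echo healthy
  fi
}

raid_status "3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]"
# → healthy
```

Feed it the second line of the md0 stanza from /proc/mdstat (e.g. via `grep -A1 '^md0' /proc/mdstat | tail -n1`) and it tells you at a glance whether the array survived intact.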
|
9
|
Repeat TEST #7 above, however instead of unplugging and replugging the drives, rescan the SCSI bus. This will be a lighter version of tests #7 and #8 above.
|
The difference here rests in the recovery steps where we trigger the rescan of the SCSI hosts to detect the drives again:
echo "0 0 0" >/sys/class/scsi_host/host0/scan
echo "0 0 0" >/sys/class/scsi_host/host1/scan
echo "0 0 0" >/sys/class/scsi_host/host2/scan
echo "0 0 0" >/sys/class/scsi_host/host3/scan
echo "0 0 0" >/sys/class/scsi_host/host4/scan
echo "0 0 0" >/sys/class/scsi_host/host5/scan
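The six writes above can be collapsed into a loop. This sketch is my own wrapper (the function name and its directory parameter are additions for testability, not from the post); it writes the same trigger to every host node it finds:

```shell
# Write a scan trigger to every SCSI host's scan node, replacing the
# six hand-typed echoes. "0 0 0" probes channel 0, target 0, LUN 0;
# "- - -" would wildcard all three instead.
rescan_scsi_hosts() {
  local base="${1:-/sys/class/scsi_host}"
  local node
  for node in "$base"/host*/scan; do
    [ -e "$node" ] || continue   # skip if the glob matched nothing
    echo "0 0 0" > "$node"
  done
}
# Usage (as root): rescan_scsi_hosts
```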
Following the above, the drives were again visible:
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsda -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdc -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdd -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 22:49 /dev/rsde -> sdf
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdf -> sda
#
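The symlink check above can be scripted as well. A small sketch (the function name and directory parameter are mine) that prints each persistent name alongside its current kernel target:

```shell
# Print each rsd* persistent-name symlink and the kernel device it
# currently points at, mirroring the "ls -al /dev/rsd*" check. The
# directory argument lets it run against /dev (default) or a test tree.
list_raid_links() {
  local dir="${1:-/dev}"
  local link
  for link in "$dir"/rsd*; do
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
  done
}
# Usage: list_raid_links
```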
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[4] sde[0] sdd[3] sdc[1] sdb[2] sda[5]
3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
#
And this concludes the last test we'll do.
|
Hi,
Great post. You don’t need to specify the parameters when creating the XFS file system, see http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E and http://www.spinics.net/lists/raid/msg38074.html . Of course, YMMV.
Did you run those benchmarks while the array was resyncing?
Hey Mathias,
Thanks for posting. Just added the testing numbers so feel free to have a look and judge yourself.
> logbsize and delaylog
I ran another test with logbsize=128k (I couldn't find anything for delaylog in my mkfs.xfs man page, so I'm not sure it'll do anything — it's a mount option rather than a mkfs one). Little to no difference in this case at first glance. Keep an eye out for the results at some point for a closer look.
One consideration here is that eventually I would grow the LVM and XFS to fill up to 4TB. I'll be doing this soon. Potentially in the future, I may try to grow this array to something well over 8TB (yet to see how to do that). I'm not sure whether XFS would auto-adjust to optimal values for those capacities, and the link didn't touch on that topic.
All in all, I can still run tests on this thing, recreating the FS if I need to, so feel free to suggest numbers you'd be interested to see. I might leave this topic open for a week or two to see if I can think of anything else or if I'm missing anything. For my setup, having anything > 125MB/s is a bonus, as the network is only 1Gbps, with a theoretical max of about 125MB/s (1000Mbps ÷ 8).
Cheers!
TK
Thank you for posting this blog. I was getting desperate. I could not figure out why I could not stop the RAID1 device, even from Ubuntu Rescue Remix. The LVM group was being assembled from the failed RAID. I removed the volume group and was finally able to gain exclusive access to the array to stop it, put in the new disk, and rebuild the array.
Nice job.
Best,
Dave.