

HTPC / Backup Home Server Solution using Linux


 

RELIABILITY AND AVAILABILITY TESTS

Performance tests are one thing, but if a setup like the one we have here doesn't stay up and is prone to crashes and data loss, performance will mean little.  In this section, we'll put the array to the test by subjecting it to various scenarios that attempt to simulate real-world events that could result in data loss or destroy the array.

TEST 1

Fill the storage up to its maximum possible capacity.  For this, we will write a function to auto-fill the RAID6 array at its maximum speed.  At the end of this table is the script to do just that.

An alternate way to do the same is to copy a file repeatedly (be sure to copy the first sample, fill.dat, to the storage array first, as the loop assumes the file resides on the storage):

for (( fcp = 0; fcp < 5000; fcp++ )); do cp -i fill.dat fill.$fcp.dat; done

NOTE: Either way works.  The second method would not incur a read penalty, since the source file would be cached across the repeated copies anyway.
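For reference, a minimal sketch of such a fill loop (not the exact script, which appears at the end of this table; this one assumes dd from /dev/zero writing 4GB chunks until the FS fills and dd errors out):

fill=0
while dd if=/dev/zero of=fill.$fill.dat bs=1M count=4096 2>/dev/null; do
    fill=$((fill + 1))
done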

 

PASS: RAID6 array filled to capacity.  No errors.

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       1907288   1907288         1 100% /mnt/MBPCBackupx
#

 

# xfs_db -c frag -r /dev/MBPCStorage/MBPCBackup
actual 59209, ideal 29835, fragmentation factor 49.61%
#

smartctl quick check:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdf: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
#

 

While filling the FS using the AWK script, the task would run to 100% capacity, but the moment the "not enough disk space" message was encountered, usage would instantly revert to 85%.  This was puzzling; however, it may be fixed by adding this option to the XFS mount command:


# mount -o inode64 /dev/MBPCStorage/MBPCBackup /mnt/MBPCBackupx

This would allow XFS to allocate inodes beyond the first 1TB of the data space. (I have not tested this option, as filling the FS using the KSH script worked without that discrepancy.)
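If the option helps, it can be made persistent via /etc/fstab (a sketch, assuming the device and mount point used throughout this post):

/dev/MBPCStorage/MBPCBackup   /mnt/MBPCBackupx   xfs   defaults,inode64   0 0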

TEST 2

Delete all the test files created in TEST 1.

 

PASS: Delete was instantaneous.
TEST 3

STEP 1: Fill the array up with copies of a binary file.  (Linux ISOs, for instance, would be a good choice.)  Use the second script from TEST 1.

for (( fcp = 0; fcp < 10000; fcp++ )); do cp -i test.dat $fcp.dat; done

PASS:  The array filled with no errors or issues:

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       1907288   1907288         1 100% /mnt/MBPCBackupx
You have new mail in /var/spool/mail/root
#

 

STEP 2: Grow the LVM, and the XFS filesystem on top of it, to 3TB from the 2TB we set up above (or on the first page).

PASS:

Grow the LVM by 50% of the free space, to 3TB (I used a one-liner combining two commands):

#  lvm lvextend -L+$(lvm vgs --units S|awk '{ if ( $1 ~ /MBPCStorage/ ) print $7 / (100 / 50); }')S /dev/MBPCStorage/MBPCBackup
  Extending logical volume MBPCBackup to 2.73 TiB
  Logical volume MBPCBackup successfully resized
#

# lvm lvdisplay  /dev/MBPCStorage/MBPCBackup --units G
  --- Logical volume ---
  LV Name                /dev/MBPCStorage/MBPCBackup
  VG Name                MBPCStorage
  LV UUID                k6dLRW-BUht-tm0n-9GU8-e6ma-M0qW-wJ4TiD
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3000.41 GB
  Current LE             715353
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     32768
  Block device           253:6
  
#
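As an aside, LVM2 also accepts percentage arguments directly, which would reduce the one-liner above to something like the following (shown as an untested alternative, not what was run here):

# lvm lvextend -l +50%FREE /dev/MBPCStorage/MBPCBackup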

I then used this command to extend the XFS filesystem (the XFS FS is still at 100% usage at this point).  Note that an XFS filesystem is extended while mounted:

# xfs_growfs /mnt/MBPCBackupx/
meta-data=/dev/mapper/MBPCStorage-MBPCBackup isize=256    agcount=64, agsize=7629408 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=488282112, imaxpct=5
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 488282112 to 732521472
# cd MBPCBackupx/
# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   1907290    954059  67% /mnt/MBPCBackupx
# xfs_growfs -n /mnt/MBPCBackupx/
meta-data=/dev/mapper/MBPCStorage-MBPCBackup isize=256    agcount=97, agsize=7629408 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732521472, imaxpct=5
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
#

 

STEP 3: Refill the array using the method from TEST 1. (We will reuse the second script in this case.)

PASS

# cd /mnt/MBPCBackupx

Then start from the last file the system could write, 4838 (File # 4839):

# for (( fcp = 4839; fcp < 10000; fcp++ )); do cp -i test.dat $fcp.dat; done

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2861348         1 100% /mnt/MBPCBackupx
#

TEST 4

Remove two disks and reinsert them in swapped order (without writing, reading, or resyncing in between) while the array is quiet.  Here, we want to see the rebuild time and statistics in combination with the --bitmap=internal option.
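(For reference: our array already has an internal write-intent bitmap, but on an existing array without one, it can be added after the fact with mdadm's grow mode:)

# mdadm --grow --bitmap=internal /dev/md0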

PASS

For this test we'll select two disks arbitrarily, but first we get their serial numbers so we can identify which disks to pull from the outside (use hdparm -i /dev/rsda, for example, to get that):

/dev/rsda  (9VX0X9TA) – /dev/sdc
/dev/rsdd  (9VX0WJKA) – /dev/sdf

 

First we'll spin the platters down and effectively shut the disks down:

echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sdf/device/delete

Allow some time for the disks to spin down.  Check using the command below to see that the two disks are in fact now unavailable in the array:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[4] sdc[0](F) sde[1] sdd[2] sdf[3](F) sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#
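(Side note, an assumption on my part rather than something exercised in this test: a disk deleted this way can usually be brought back without reseating any cables by forcing a SCSI host rescan:)

# for host in /sys/class/scsi_host/host*; do echo "- - -" > $host/scan; done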

The array should still be functioning fine.  Delete one of the files that was copied and copy another in its place as a quick test while the disks are gone:

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2861348         1 100% /mnt/MBPCBackupx
#

# ls -al 99[8-9]*
-rwxr--r-x. 1 root root 413368320 Apr  9 00:50 998.dat
-rwxr--r-x. 1 root root 413368320 Apr 10 21:46 999.dat
#

Remove the drives that were turned off, plug them back in in each other's slots, then check the array:

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 10 21:54:39 2012
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3991

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       0        0        3      removed
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda

       0       8       32        -      faulty spare
       3       8       80        -      faulty spare
# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i,j}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdc: No such file or directory
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdj: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
/dev/sdh: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdi: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
#

From the above we can see which ones are marked as faulty (F) and how many are active in the array (6/4).  The drives are back, but now as sdh and sdi.  However, our UDEV rules still create the original links pointing to the drives with the above serial numbers, irrespective of the names (sdi, sdh, sdf, sdc, etc.) given by the system:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdf -> sda
#
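(The actual rules live on the first page of this post; an illustrative sketch of such a rule, keyed on the serial number udev exposes as ID_SERIAL_SHORT, might look like this:)

# /etc/udev/rules.d/10-raid-disks.rules (sketch only, not the exact rules used here)
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="9VX0X9TA", SYMLINK+="rsda"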

Now that the drives are plugged in again, we need to tell the system to add them back to the array:

# mdadm --add /dev/raidmd0 /dev/rsda
mdadm: /dev/rsda reports being an active member for /dev/raidmd0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/rsda in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/rsda" first.
# mdadm --add /dev/raidmd0 /dev/rsdd
mdadm: /dev/rsdd reports being an active member for /dev/raidmd0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/rsdd in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/rsdd" first.
#

We can't use the following either, as the devices are no longer listed under /dev/:

# mdadm --manage /dev/raidmd0 --fail /dev/sdf
mdadm: cannot find /dev/sdf: No such file or directory
# mdadm --manage /dev/raidmd0 --fail /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory
# mdadm --manage /dev/raidmd0 --remove /dev/sdf
mdadm: cannot find /dev/sdf: No such file or directory
# mdadm --manage /dev/raidmd0 --remove /dev/sdc
mdadm: cannot find /dev/sdc: No such file or directory
#

Instead, we will use the same device names as above but without /dev/ in the name:

# mdadm --manage /dev/raidmd0 --remove sdf
mdadm: hot removed sdf from /dev/raidmd0
#

# mdadm --manage /dev/raidmd0 --remove sdc
mdadm: hot removed sdc from /dev/raidmd0
#

The result of the operation while checking that the data integrity was still fine:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
#

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 10 22:33:40 2012
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4317

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       0        0        3      removed
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#

Now it appears we can re-add the replugged devices.  The internal bitmap makes the recovery nearly instantaneous, even while the drive is 100% full with 3TB of data:

# mdadm --add /dev/raidmd0 /dev/rsda
mdadm: re-added /dev/rsda
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UU_UU]
      [=======>.............]  recovery = 35.3% (344989104/976761408) finish=183.8min speed=57259K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
# date
Tue Apr 10 22:38:15 EDT 2012
#

To put in perspective how quick that was, we look at the logs:

/var/log/messages
Apr 10 22:37:57 mbpc kernel: md: bind<sdh>
Apr 10 22:37:57 mbpc kernel: md: recovery of RAID array md0
Apr 10 22:37:57 mbpc kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr 10 22:37:57 mbpc kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 10 22:37:57 mbpc kernel: md: using 128k window, over a total of 976761408k.
Apr 10 22:38:06 mbpc kernel: md: md0: recovery done.
 

So, about 9 seconds.  Now we re-add the second disk, but this time we time the rebuild differently to show how quick it actually is:

# date; mdadm --add /dev/raidmd0 /dev/rsdd; sleep 5; cat /proc/mdstat; sleep 5; cat /proc/mdstat
Tue Apr 10 22:43:15 EDT 2012
mdadm: re-added /dev/rsdd
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
      [=====>..............]  recovery = 25.0% (244315840/976761408) finish=234.9min speed=51955K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUU_UU]
      [=======>............]  recovery = 36.0% (351886304/976761408) finish=215.9min speed=48227K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
#

Again, checking the logs we see how long the rebuild took:

/var/log/messages
Apr 10 22:43:15 mbpc kernel: md: bind<sdi>
Apr 10 22:43:15 mbpc kernel: md: recovery of RAID array md0
Apr 10 22:43:15 mbpc kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr 10 22:43:15 mbpc kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 10 22:43:15 mbpc kernel: md: using 128k window, over a total of 976761408k.
Apr 10 22:43:27 mbpc kernel: md: md0: recovery done.
 

Let's check if it's all done and if everything checks out fine:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i,j}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdc: No such file or directory
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdj: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
/dev/sdh: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdi: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdb[4] sde[1] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 1/8 pages [4KB], 65536KB chunk

unused devices: <none>
#

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 10 22:53:02 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4414

    Number   Major   Minor   RaidDevice State
       0       8      112        0      active sync   /dev/sdh
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8      128        3      active sync   /dev/sdi
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#

And we are back to our normal operating mode making this a successful test.

TEST 5

Do the following in the order listed:

  • Start writing large, multi-GB files to the storage continuously. (A minimal sketch of such a writer follows these steps.)
  • Begin watching a video, or open a large file from another user. (Use switch user.)
  • Turn off two disk drives / spin down the platters. (See above or the first page for how to do that.)
  • Remove the stopped disks from the chassis.
  • Turn off the power at the power supply.

Upon startup / reboot:

  • Bring the array back in degraded mode.
  • Start watching a video again, or access the large file.
  • Begin writing data to the RAID6 storage again (while in degraded mode with two disks missing).
  • Reinsert the two disks, swapping each disk's location.
    (i.e. if disk 1 was taken out of slot 1 and disk 2 was taken out of slot 2, insert disk 1 into slot 2 and disk 2 into slot 1.)
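A minimal sketch of the continuous writer used in the first step, assuming a large source file such as the /home/mdadm.dat used below (the loop ends when a copy fails, e.g. on a full filesystem):

n=0
while cp -p /home/mdadm.dat /mnt/MBPCBackupx/raid6.$n.dat; do
    n=$((n + 1))
done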

PASS (FIRST TEST):

For this test, we will remove some files from the above tests to move back from 100% usage, so we can write large files to the array continuously:

Start using the disks (e.g. watching a video), then write files:

# cd /mnt/MBPCBackupx
# date +%s:%N; cp -p /home/mdadm.dat ./raid6.0.dat; cp raid6.0.dat raid6.1.dat; cp -p /home/mdadm.dat ./raid6.2.dat; date +%s:%N;

(Run it again if the above finishes too quickly.)  Get the disk serial numbers first so we know what to unplug:

# hdparm -i /dev/sdb
/dev/sdb:
 Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0WK55

# hdparm -i /dev/sdc
/dev/sdc:
 Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0X5KC

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr 11 00:22 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr 11 00:21 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdf -> sda
#

Then spin down the disks:

# echo 1 > /sys/block/sdc/device/delete
# echo 1 > /sys/block/sdb/device/delete

The result was:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 10 21:52 /dev/rsda -> sdh
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 10 21:54 /dev/rsdd -> sdi
lrwxrwxrwx. 1 root root 3 Apr  6 23:39 /dev/rsdf -> sda
#

All while reading and writing again, as the extended device statistics show:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda            2168.50  3847.70   35.20   43.60  9091.60 16289.80   644.20     7.76   97.77   6.53  51.46
sdd            2173.70  3829.50   34.80   39.00  9116.40 16274.20   688.09     9.94  133.18   7.38  54.44
sdb            2169.80  3841.60   36.90   42.20  9110.40 16260.60   641.49    10.49  131.63   6.71  53.09
sdg               2.40     5.80  402.60    1.10 49859.20    27.20   247.15     2.45    6.07   2.39  96.54
dm-0              0.00     0.00   18.20    6.80   348.80    27.20    30.08     2.07   82.67  14.06  35.16
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00   25.70  927.60  1638.40 55020.50   118.87     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  386.80    0.00 49510.40     0.00   256.00     1.75    4.52   2.48  95.80
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00   25.70  927.40  1638.40 55014.10   118.88   591.18  655.58   0.63  60.52
sdh            2161.40  3841.10   42.50   48.90  9091.60 16160.60   552.56     8.67   94.48   5.72  52.26
sdi            2165.50  3824.70   44.20   47.10  9120.00 16155.00   553.67     9.44  103.01   5.41  49.38

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdi[3] sdh[0] sdd[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      bitmap: 4/8 pages [16KB], 65536KB chunk

unused devices: <none>
#

Then unplug the disks and remove them.  After that, check that the video is still playing and that the file is still being written to disk.  Once verified, we shut off the power to the system from the switch at the power supply for the server/HTPC+B.  Start the server up, bring back the array in the following manner, and ascertain the situation:

# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# ls -al /dev/raidmd0
ls: cannot access /dev/raidmd0: No such file or directory
# ls -al /dev/md*
brw-rw----. 1 root disk 9, 0 Apr 11 00:44 /dev/md0

/dev/md:
total 4
drwxr-xr-x.  2 root root   60 Apr 11 00:44 .
drwxr-xr-x. 22 root root 4400 Apr 11 00:45 ..
-rw-------.  1 root root   54 Apr 11 00:44 md-device-map
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 11 00:44 /dev/rsdf -> sda
# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Apr 11 00:38:51 2012
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4823

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       0        0        1      removed
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       8        0        5      active sync   /dev/sda
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd[0] sdc[2] sdb[3] sda[5]
      4395422240 blocks super 1.2
      
unused devices: <none>
#

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdg: No such file or directory
ls: cannot access /dev/sdh: No such file or directory
ls: cannot access /dev/sdi: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
#

So at least we can see things: our array still has 4 disks, and none list as having bad blocks or any other problems that would kill this array.  So let's try to reassemble it:

# mdadm --assemble --scan
#

No result from trying to start the array.  Hmm.  So let's try with the verbose option:

# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/dm-5: Device or resource busy
mdadm: cannot open device /dev/dm-4: Device or resource busy
mdadm: cannot open device /dev/dm-3: Device or resource busy
mdadm: no RAID superblock on /dev/dm-2
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sde3: Device or resource busy
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: cannot open device /dev/sda: Device or resource busy

So let's examine this array further:

# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=2f36ac48:5e3e4c54:72177c53:bea3e41e name=mbpc:0
# cat /etc/mdadm.conf
ARRAY /dev/md/0 metadata=1.2 name=mbpc:0 UUID=2f36ac48:5e3e4c54:72177c53:bea3e41e
#
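(As an aside, that ARRAY line can be regenerated from the superblocks themselves should /etc/mdadm.conf ever be lost:)

# mdadm --examine --scan >> /etc/mdadm.conf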

# cat /sys/block/md0/md/array_state
inactive
#

Try to stop it:

# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
#

Now we're getting somewhere and this time we're getting more useful data:

# mdadm --assemble --scan
mdadm: /dev/md/0 assembled from 4 drives - not enough to start the array while not clean - consider --force.
#

More strangeness: now all the remaining devices are marked as spares (this is where I start to worry):

# mdadm --assemble --scan --force
#
# ls -al /dev/md
md/  md0 
# ls -al /dev/md0
brw-rw----. 1 root disk 9, 0 Apr 11 01:10 /dev/md0
# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd[0](S) sda[5](S) sdb[3](S) sdc[2](S)
      4395422240 blocks super 1.2
      
unused devices: <none>
#

Ok.  So let's try to be more explicit and forceful here.  After all, we want our data badly folks:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 11 01:25 /dev/rsdf -> sda
#

# mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
# mdadm -v --assemble --scan --force /dev/md0 /dev/rsda/ dev/rsdc /dev/rsdd /dev/rsdf
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/dm-5: Device or resource busy
mdadm: cannot open device /dev/dm-4: Device or resource busy
mdadm: cannot open device /dev/dm-3: Device or resource busy
mdadm: no RAID superblock on /dev/dm-2
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sde3: Device or resource busy
mdadm: cannot open device /dev/sde2: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sda is identified as a member of /dev/md0, slot 5.
mdadm: Marking array /dev/md0 as 'clean'
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdc to /dev/md0 as 2
mdadm: added /dev/sdb to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sda to /dev/md0 as 5
mdadm: added /dev/sdd to /dev/md0 as 0
mdadm: /dev/md0 has been started with 4 drives (out of 6).
mdadm: /dev/rsda/ not identified in config file.
mdadm: dev/rsdc not identified in config file.
mdadm: /dev/rsdd not identified in config file.
mdadm: /dev/rsdf not identified in config file.
#

Now let's check the array:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Apr 11 00:38:51 2012
          State : active, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4823

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       0        0        1      removed
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       8        0        5      active sync   /dev/sda
#

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
ls: cannot access /dev/sdf: No such file or directory
ls: cannot access /dev/sdg: No such file or directory
ls: cannot access /dev/sdh: No such file or directory
ls: cannot access /dev/sdi: No such file or directory
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
/dev/sde: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
#

Now we are back in the situation we were in at the start of reliability test 5 above.  Before we push the failed drives back in again, let's start writing again and using some files, like watching a home video:

# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# lvm vgchange -a y
  3 logical volume(s) in volume group "VGEntertain" now active
  3 logical volume(s) in volume group "mbpcvg" now active
  1 logical volume(s) in volume group "MBPCStorage" now active
# mount /mnt/MBPCBackupx/
# cd /mnt/MBPCBackupx/
# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2849239     12110 100% /mnt/MBPCBackupx
# ls -al |tail
-rwxr--r-x. 1 root      root       413368320 Apr 10 21:46 999.dat
-rwxr--r-x. 1 root      root       413368320 Apr  9 00:14 99.dat
-rwxr--r-x. 1 root      root       413368320 Apr  9 00:10 9.dat
-rwxr--r--. 1 root      root      4720553072 May  4  2011 raid6.0.dat
-rwxr--r--. 1 root      root      4720553072 Apr 10 23:57 raid6.1.dat
-rwxr--r--. 1 root      root      4720553072 May  4  2011 raid6.2.dat
-rwx------. 1 root      root      3757387776 Apr 11 00:38 raid6.3.dat
-rwxrwSrwx. 1 root      root       413368320 Jun 21  2008 test.dat
-rw-r--r--. 1 root      root         9984418 Apr  9 23:11 test.log
#

Again, file writing and reading were fine even with the two disks missing.  Now it's time to replug the drives we took out and have the array reassemble.  From here, the steps are identical to TEST 4 above, this time starting further down, as the devices have already been removed:

# mdadm --add /dev/raidmd0 /dev/rsdb;
# cat /proc/mdstat;
Wed Apr 11 02:16:40 EDT 2012
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      [>...................]  recovery =  0.0% (0/976761408) finish=1017459.8min speed=0K/sec
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#

This time the array did not recover instantly even with an internal bitmap (it took about 2 minutes), but that is still much quicker than the standard few hours:

# cat /proc/mdstat;
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      [=>..................]  recovery =  6.2% (61041460/976761408) finish=1529.0min speed=9980K/sec
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#

While this is going on, let's put back the second disk:

# mdadm --add /dev/raidmd0 /dev/rsde
mdadm: re-added /dev/rsde
# cat /proc/mdstat;
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[4](S) sdg[1] sdd[0] sda[5] sdb[3] sdc[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [U_UU_U]
      [==>.................]  recovery = 11.8% (115617408/976761408) finish=7203.2min speed=1992K/sec
      bitmap: 5/8 pages [20KB], 65536KB chunk

unused devices: <none>
#
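(The rebuild is throttled while regular I/O is in flight, per the kernel's minimum/maximum recovery speeds seen in the logs earlier.  If a rebuild crawls like this, the floor can be raised; an aside, not something we needed to do here:)

# cat /proc/sys/dev/raid/speed_limit_min
1000
# echo 50000 > /proc/sys/dev/raid/speed_limit_min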

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Apr 11 02:23:29 2012
          State : active, degraded, recovering
 Active Devices : 4
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 12% complete

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 4882

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       96        1      spare rebuilding   /dev/sdg
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       0        0        4      removed
       5       8        0        5      active sync   /dev/sda

       4       8       80        –      spare   /dev/sdf
#

Once recovered, we'll check the parameters of the disks to ensure the settings are correct, and adjust them otherwise using the steps from PAGE 1 of this post:

# for dn in $(ls /dev/rsd*); do WCH=$(/sbin/hdparm -I $dn|grep -i "Write cache"); echo $dn": $WCH"; done
/dev/rsda:             Write cache
/dev/rsdb:        *    Write cache
/dev/rsdc:             Write cache
/dev/rsdd:             Write cache
/dev/rsde:        *    Write cache
/dev/rsdf:             Write cache
#

Disable the write cache on the two re-added disks (again, only if using XFS), as they will have it enabled by default when plugged back in:

# hdparm -W 0 /dev/rsdb;

/dev/rsdb:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
# hdparm -W 0 /dev/rsde;

/dev/rsde:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
#
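Since the drives revert to write caching whenever they power cycle, the setting could also be re-applied at boot, e.g. from rc.local (a sketch; it assumes the rsd* symlinks exist by the time it runs):

for dn in /dev/rsd?; do /sbin/hdparm -W 0 $dn; done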

 

TEST 6

This is a repeat of test #5 above, which I had actually performed earlier, prior to doing test 5.  It was also successful.  The setup is as follows:

Do the following, all in sequence and in quick succession:

  • Start writing large, multi-GB files repeatedly to the mount the RAID6 array is on.
  • Unplug the SATA cable from one drive.*
  • Unplug the SATA cable from a second drive.*
  • Begin accessing one of the files on the RAID6 array.
  • Shut the power supply down using the on/off switch on the power supply.
  • For this test, the 2TB RAID6 volume is less than ~2% full.

* Simulate a drive being physically pulled by cutting power to the device, like this: echo 1 > /sys/block/<DEVICE>/device/delete

PASS (SECOND TEST):

First, let's check that the array is fine and that no disks show errors or potential issues that could grow into problems for us:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdf: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       –       0
#

Again, check with:

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 27 00:30:49 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3063

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda

to ensure everything is in the expected state and the array shows active, with all disks participating in the RAID6 array listed as active sync. 
Next, spin down / cut power to two of the disks:

echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sda/device/delete

The result of the operation should show this:

# mdadm --detail /dev/raidmd0
.
.
          State : active, degraded
.
.
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       0        0        5      removed

Now we pull the plug while our write job is in progress, writing 4.7GB files.  After the reboot, the system came back up, but the array did not mount:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Mar 27 00:36:15 2012
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3132

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       0        0        5      removed
#

So let's try to reassemble the array:

# mdadm --assemble --scan
mdadm: /dev/md/0 is already in use.
#

So we stop the array as before:

# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
#

Now we're getting somewhere:

# mdadm --assemble --scan
mdadm: /dev/md/0 assembled from 4 drives - not enough to start the array while not clean - consider --force.
#

It looks like we may have to add the disks back first, as per the first suggestion, before we try to force it. (Relying on my experience here: the force option to anything can have unintended consequences, so I tend to use it as a method of last resort.)  To do this, we look at our UDEV rules to tell us which disks to add:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsda -> sdc
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdd -> sdf
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Mar 27 00:55 /dev/rsdf -> sda
#

But then I realize I can't tell which disks I pulled. (I can't tell from the outside either, because all 6 are still cabled the same; they just had their power cut / platters spun down from the OS, where they no longer appear as a result.)

No problem.  Let's try to start the array in degraded mode, which is the way I would choose anyway, as we're running a test here, folks:

# mdadm --assemble --scan --force
mdadm: Marking array /dev/md/0 as 'clean'
mdadm: /dev/md/0 has been started with 4 drives (out of 6).
#

And check again to make sure we are clean:

# for ddn in $(ls /dev/sd{a,b,c,d,e,f,g,h,i}); do DDN=$(smartctl -A $ddn|grep -i Current_Pending_Sector); echo $ddn": $DDN"; done
/dev/sda: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdb: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdc: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdd: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sde: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdf: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       –       0
/dev/sdg: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       –       0
#

Now, before we re-add anything, I want to see if we can mount our array:

# mount /mnt/MBPCBackupx/
mount: special device /dev/MBPCStorage/MBPCBackup does not exist
# lvm vgchange -a y
  3 logical volume(s) in volume group "VGEntertain" now active
  3 logical volume(s) in volume group "mbpcvg" now active
  1 logical volume(s) in volume group "MBPCStorage" now active
#

And try mounting now.  Success.  Now we can see the files with the original dates applied to the completed ones:

# ls -al
total 17499768
drwxr-xr-x. 2 root root        108 Mar 27 00:34 .
drwxr-xr-x. 6 root root       4096 Mar 24 23:53 ..
-rwxr--r--. 1 root root 4720553072 May  4  2011 testX0.m2ts
-rwxr--r--. 1 root root 4720553072 May  4  2011 testX1.m2ts
-rwx------. 1 root root 3219345408 Mar 27 00:35 testX2.m2ts
-rwxr--r--. 1 root root 4720553072 May  4  2011 testXseq.m2ts
#

# xfs_db -c frag -r /dev/MBPCStorage/MBPCBackup
actual 4, ideal 4, fragmentation factor 0.00%
#

# xfs_check /dev/MBPCStorage/MBPCBackup
#
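(A side note: on newer xfsprogs, xfs_check is deprecated in favor of the equivalent read-only dry run of xfs_repair, so the same check would be:)

# xfs_repair -n /dev/MBPCStorage/MBPCBackup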

I'm going to try to access a file off this disk by switching users to my non-privileged videouser account.  I want to see that the file reads fine and is still workable before re-adding any disks.  This was a success, as the video played fine.  Now to re-add the disks:

# mdadm /dev/md0 -a /dev/rsda
mdadm: re-added /dev/rsda
#

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/4] [_UUUU_]
      [=>..................]  recovery =  6.3% (61590408/976761408) finish=334.2min speed=45632K/sec
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>
#

Now, because of our internal bitmap, the array started to rebuild very quickly, which is good.  Now to add another drive while this is going on… too late… the resync was done before I could even blink:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 27 01:16:43 2012
          State : active, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3186

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       0        0        5      removed
#

# mdadm /dev/md0 -a /dev/rsdf
mdadm: re-added /dev/rsdf
#

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [=>..................]  recovery =  5.5% (54287616/976761408) finish=324.2min speed=47413K/sec
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>
#

And the rebuilding jumps by leaps and bounds and is done in seconds:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [====>...............]  recovery = 24.9% (244180084/976761408) finish=277.1min speed=44047K/sec
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>
#

And resync of the second disk is done before you know it:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[5] sdc[0] sde[1] sdb[4] sdf[3] sdd[2]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#

And now our array is back up.  NOTE: We added the disks back, and the array rebuilt, all while it stayed mounted under /mnt/MBPCBackupx:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 27 01:20:05 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 3210

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#

And this concludes successful AVAILABILITY TEST #6.

TEST 7

Unmount the array from /mnt/MBPCBackupx, stop it, spin down all disks, then reassemble the array in the same hard disk order and ensure you can still see the data.  Before adding each disk back, power it down, then bring it up using the commands above or on the first page of this post.

PASS:

For this test, we run the following commands to disassemble the array, then try to reassemble it (assuming the RAID6 is mounted on MBPCBackupx and lives under /dev/raidmd0 or /dev/md0):

# du -sh .
2.8T    .
# pwd
/mnt/MBPCBackupx
# cd ..
# umount MBPCBackupx
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi-a--   2.73t
.
.
.
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree 
  MBPCStorage   1   1   0 wz--n-   3.64t 931.70g
.
.
# ls -al /dev/MBPCStorage/MBPCBackup
lrwxrwxrwx. 1 root root 7 Apr 15 17:51 /dev/MBPCStorage/MBPCBackup -> ../dm-6
# lvm vgchange -a n MBPCStorage
  0 logical volume(s) in volume group "MBPCStorage" now active
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi---   2.73t
.
.
.
# ls -al /dev/MBPCStorage/MBPCBackup
ls: cannot access /dev/MBPCStorage/MBPCBackup: No such file or directory
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# ls -al /dev/rsd*
ls: cannot access /dev/rsd*: No such file or directory
# udevadm trigger 2>&1
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsda -> sdf
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdb -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdd -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsde -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 17:54 /dev/rsdf -> sda
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# ls -al /dev/md0
ls: cannot access /dev/md0: No such file or directory
# ls -al /dev/raidmd0
ls: cannot access /dev/raidmd0: No such file or directory
#
# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
unused devices: <none>
#

And now we can power down / spin down each disk that corresponds to the actual rsdX number above:

echo 1 > /sys/block/sda/device/delete
echo 1 > /sys/block/sdb/device/delete
echo 1 > /sys/block/sdc/device/delete
echo 1 > /sys/block/sdd/device/delete
echo 1 > /sys/block/sde/device/delete
echo 1 > /sys/block/sdf/device/delete
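Or, equivalently, as a loop over the six members:

for d in a b c d e f; do echo 1 > /sys/block/sd$d/device/delete; done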

(You should hear the clicking of the heads parking, as well as the subtle sound of the disks spinning down, much like when you shut the system down.)  Allow a minute for the disks to spin down.  At this point I'll unplug the SATA cables and replug them.  I expect that when I reassemble my array, all my files will be there. 

/var/log/messages
Apr 15 17:54:44 mbpc kernel: md0: detected capacity change from 4000814727168 to 0
Apr 15 17:54:44 mbpc kernel: md: md0 stopped.
Apr 15 17:54:44 mbpc kernel: md: unbind<sdd>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdd)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdb>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdb)
Apr 15 17:54:44 mbpc kernel: md: unbind<sde>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sde)
Apr 15 17:54:44 mbpc kernel: md: unbind<sda>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sda)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdc>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdc)
Apr 15 17:54:44 mbpc kernel: md: unbind<sdf>
Apr 15 17:54:44 mbpc kernel: md: export_rdev(sdf)
Apr 15 18:26:34 mbpc kernel: sd 0:0:0:0: [sda] Stopping disk
Apr 15 18:26:35 mbpc kernel: ata1.00: disabled
Apr 15 18:26:43 mbpc kernel: sd 1:0:0:0: [sdb] Stopping disk
Apr 15 18:26:44 mbpc kernel: ata2.00: disabled
Apr 15 18:27:14 mbpc kernel: sd 2:0:0:0: [sdc] Stopping disk
Apr 15 18:27:14 mbpc kernel: ata3.00: disabled
Apr 15 18:27:21 mbpc kernel: sd 3:0:0:0: [sdd] Stopping disk
Apr 15 18:27:21 mbpc kernel: ata4.00: disabled
Apr 15 18:27:27 mbpc kernel: sd 4:0:0:0: [sde] Stopping disk
Apr 15 18:27:28 mbpc kernel: ata5.00: disabled
Apr 15 18:27:33 mbpc kernel: sd 5:0:0:0: [sdf] Stopping disk
Apr 15 18:27:34 mbpc kernel: ata6.00: disabled
Apr 15 18:30:00 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:30:00 mbpc kernel: ata3: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:30:00 mbpc kernel: ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:30:00 mbpc kernel: ata3: hard resetting link
Apr 15 18:30:00 mbpc kernel: ata3: SATA link down (SStatus 0 SControl 300)
Apr 15 18:30:00 mbpc kernel: ata3: EH complete
Apr 15 18:32:01 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:01 mbpc kernel: ata5: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:01 mbpc kernel: ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:01 mbpc kernel: ata5: hard resetting link
Apr 15 18:32:02 mbpc kernel: ata5: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:02 mbpc kernel: ata5: EH complete
Apr 15 18:32:12 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:12 mbpc kernel: ata4: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:12 mbpc kernel: ata4: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:12 mbpc kernel: ata4: hard resetting link
Apr 15 18:32:12 mbpc kernel: ata4: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:12 mbpc kernel: ata4: EH complete
Apr 15 18:32:18 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:18 mbpc kernel: ata6: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:18 mbpc kernel: ata6: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:18 mbpc kernel: ata6: hard resetting link
Apr 15 18:32:19 mbpc kernel: ata6: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:19 mbpc kernel: ata6: EH complete
Apr 15 18:32:22 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:22 mbpc kernel: ata2: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:22 mbpc kernel: ata2: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:22 mbpc kernel: ata2: hard resetting link
Apr 15 18:32:23 mbpc kernel: ata2: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:23 mbpc kernel: ata2: EH complete
Apr 15 18:32:26 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Apr 15 18:32:26 mbpc kernel: ata1: irq_stat 0x00400000, PHY RDY changed
Apr 15 18:32:26 mbpc kernel: ata1: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 18:32:26 mbpc kernel: ata1: hard resetting link
Apr 15 18:32:26 mbpc kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 15 18:32:26 mbpc kernel: ata1: EH complete

 

The smartd daemon might complain as well, but this is normal in our case:

/var/log/messages
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sda [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:25 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sdb [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:25 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:25 mbpc smartd[2325]: Device: /dev/sdc [SAT], open() failed: No such device
Apr 15 18:33:25 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sdd [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sde [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
Apr 15 18:33:26 mbpc smartd[2325]: Device: /dev/sdf [SAT], open() failed: No such device
Apr 15 18:33:26 mbpc smartd[2325]: Sending warning via mail to root …
Apr 15 18:33:26 mbpc smartd[2325]: Warning via mail to root: successful
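
Incidentally, if the alert mail is unwanted during planned maintenance like this, smartd can simply be stopped for the duration (a sketch, assuming the SysV-style init scripts this system uses):

# service smartd stop
(perform the maintenance)
# service smartd start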

 

Now let's plug everything back in.  So here we go, folks (hopefully you didn't forget the cable sequence).  The interesting thing was that once we replugged all the SATA cables, the array reassembled itself without any manual intervention:

/var/log/messages
Apr 15 18:36:19 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 18:36:19 mbpc kernel: ata6: irq_stat 0x00000040, connection status changed
Apr 15 18:36:19 mbpc kernel: ata6: SError: { CommWake 10B8B DevExch }
Apr 15 18:36:19 mbpc kernel: ata6: hard resetting link
Apr 15 18:36:20 mbpc kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:20 mbpc kernel: ata6.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:20 mbpc kernel: ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:20 mbpc kernel: ata6.00: configured for UDMA/133
Apr 15 18:36:20 mbpc kernel: ata6: EH complete
Apr 15 18:36:20 mbpc kernel: scsi 5:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] Write Protect is off
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:20 mbpc kernel: sd 5:0:0:0: Attached scsi generic sg0 type 0
Apr 15 18:36:25 mbpc kernel: sda: unknown partition table
Apr 15 18:36:25 mbpc kernel: sd 5:0:0:0: [sda] Attached SCSI disk
Apr 15 18:36:25 mbpc kernel: md: bind<sda>
Apr 15 18:36:26 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 18:36:26 mbpc kernel: ata5: irq_stat 0x00000040, connection status changed
Apr 15 18:36:26 mbpc kernel: ata5: SError: { DevExch }
Apr 15 18:36:26 mbpc kernel: ata5: hard resetting link
Apr 15 18:36:27 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:27 mbpc kernel: ata5.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:27 mbpc kernel: ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:27 mbpc kernel: ata5.00: configured for UDMA/133
Apr 15 18:36:27 mbpc kernel: ata5: EH complete
Apr 15 18:36:27 mbpc kernel: scsi 4:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: Attached scsi generic sg1 type 0
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] Write Protect is off
Apr 15 18:36:27 mbpc kernel: sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:32 mbpc kernel: sdb: unknown partition table
Apr 15 18:36:32 mbpc kernel: sd 4:0:0:0: [sdb] Attached SCSI disk
Apr 15 18:36:32 mbpc kernel: md: bind<sdb>
Apr 15 18:36:34 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 18:36:34 mbpc kernel: ata4: irq_stat 0x00000040, connection status changed
Apr 15 18:36:34 mbpc kernel: ata4: SError: { DevExch }
Apr 15 18:36:34 mbpc kernel: ata4: hard resetting link
Apr 15 18:36:35 mbpc kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:35 mbpc kernel: ata4.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:35 mbpc kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:35 mbpc kernel: ata4.00: configured for UDMA/133
Apr 15 18:36:35 mbpc kernel: ata4: EH complete
Apr 15 18:36:35 mbpc kernel: scsi 3:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] Write Protect is off
Apr 15 18:36:35 mbpc kernel: sd 3:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:40 mbpc kernel: sdc: unknown partition table
Apr 15 18:36:40 mbpc kernel: sd 3:0:0:0: [sdc] Attached SCSI disk
Apr 15 18:36:40 mbpc kernel: md: bind<sdc>
Apr 15 18:36:42 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 18:36:42 mbpc kernel: ata3: irq_stat 0x00000040, connection status changed
Apr 15 18:36:42 mbpc kernel: ata3: SError: { CommWake 10B8B DevExch }
Apr 15 18:36:42 mbpc kernel: ata3: hard resetting link
Apr 15 18:36:43 mbpc kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:43 mbpc kernel: ata3.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:43 mbpc kernel: ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:43 mbpc kernel: ata3.00: configured for UDMA/133
Apr 15 18:36:43 mbpc kernel: ata3: EH complete
Apr 15 18:36:43 mbpc kernel: scsi 2:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: Attached scsi generic sg3 type 0
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] Write Protect is off
Apr 15 18:36:43 mbpc kernel: sd 2:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:48 mbpc kernel: sdd: unknown partition table
Apr 15 18:36:48 mbpc kernel: sd 2:0:0:0: [sdd] Attached SCSI disk
Apr 15 18:36:48 mbpc kernel: md: bind<sdd>
Apr 15 18:36:48 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 18:36:48 mbpc kernel: ata2: irq_stat 0x00000040, connection status changed
Apr 15 18:36:48 mbpc kernel: ata2: SError: { 10B8B DevExch }
Apr 15 18:36:48 mbpc kernel: ata2: hard resetting link
Apr 15 18:36:49 mbpc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 18:36:49 mbpc kernel: ata2.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 18:36:49 mbpc kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:49 mbpc kernel: ata2.00: configured for UDMA/133
Apr 15 18:36:49 mbpc kernel: ata2: EH complete
Apr 15 18:36:49 mbpc kernel: scsi 1:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: Attached scsi generic sg4 type 0
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] Write Protect is off
Apr 15 18:36:49 mbpc kernel: sd 1:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:36:54 mbpc kernel: sde: unknown partition table
Apr 15 18:36:54 mbpc kernel: sd 1:0:0:0: [sde] Attached SCSI disk
Apr 15 18:36:55 mbpc kernel: md: bind<sde>
Apr 15 18:36:56 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 18:36:56 mbpc kernel: ata1: irq_stat 0x00000040, connection status changed
Apr 15 18:36:56 mbpc kernel: ata1: SError: { 10B8B DevExch }
Apr 15 18:36:56 mbpc kernel: ata1: hard resetting link
Apr 15 18:36:57 mbpc kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 15 18:36:57 mbpc kernel: ata1.00: ATA-8: ST1500DL003-9VT16L, CC3C, max UDMA/133
Apr 15 18:36:57 mbpc kernel: ata1.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 18:36:57 mbpc kernel: ata1.00: configured for UDMA/133
Apr 15 18:36:57 mbpc kernel: ata1: EH complete
Apr 15 18:36:57 mbpc kernel: scsi 0:0:0:0: Direct-Access     ATA      ST1500DL003-9VT1 CC3C PQ: 0 ANSI: 5
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: Attached scsi generic sg5 type 0
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] 4096-byte physical blocks
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] Write Protect is off
Apr 15 18:36:57 mbpc kernel: sd 0:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 18:37:04 mbpc kernel: sdf: unknown partition table
Apr 15 18:37:04 mbpc kernel: sd 0:0:0:0: [sdf] Attached SCSI disk
Apr 15 18:37:04 mbpc kernel: md: bind<sdf>
Apr 15 18:37:04 mbpc kernel: bio: create slab <bio-1> at 1
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdf operational as raid disk 5
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sde operational as raid disk 1
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdd operational as raid disk 0
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sdb operational as raid disk 4
Apr 15 18:37:04 mbpc kernel: md/raid:md0: device sda operational as raid disk 3
Apr 15 18:37:04 mbpc kernel: md/raid:md0: allocated 6386kB
Apr 15 18:37:04 mbpc kernel: md/raid:md0: raid level 6 active with 6 out of 6 devices, algorithm 2
Apr 15 18:37:04 mbpc kernel: created bitmap (8 pages) for device md0
Apr 15 18:37:04 mbpc kernel: md0: bitmap initialized from disk: read 1/1 pages, set 0 of 14905 bits
Apr 15 18:37:04 mbpc kernel: md0: detected capacity change from 0 to 4000814727168
Apr 15 18:37:04 mbpc kernel: md0: unknown partition table

And let's check with the standard mdadm tools:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[5] sde[1] sdd[0] sdc[2] sdb[4] sda[3]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#

# ls -al /dev/raidmd0
lrwxrwxrwx. 1 root root 3 Apr 15 18:37 /dev/raidmd0 -> md0
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsdd -> sda
lrwxrwxrwx. 1 root root 3 Apr 15 18:36 /dev/rsde -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 18:37 /dev/rsdf -> sdf
#

# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Apr 15 17:52:23 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 5171

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       32        2      active sync   /dev/sdc
       3       8        0        3      active sync   /dev/sda
       4       8       16        4      active sync   /dev/sdb
       5       8       80        5      active sync   /dev/sdf
#

Wow!  Now that, I have to say, I didn't expect to happen all on its own (more on why it did after this test).  Now let's set the VG active again, mount the array back up, and check the status of our files:

# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree 
  MBPCStorage   1   1   0 wz--n-   3.64t 931.70g
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi---   2.73t
# lvm vgchange -a y MBPCStorage
  1 logical volume(s) in volume group "MBPCStorage" now active
# lvm lvs
  LV         VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  MBPCBackup MBPCStorage -wi-a-   2.73t
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree 
  MBPCStorage   1   1   0 wz--n-   3.64t 931.70g
#
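
Note the single-argument mount coming up: it works because the filesystem is listed in /etc/fstab.  Judging by the options visible in the mount output below, the entry would look something like this (a reconstruction, not a copy of the actual file):

/dev/mapper/MBPCStorage-MBPCBackup  /mnt/MBPCBackupx  xfs  noatime,nodiratime,logbufs=8,allocsize=512m  0 0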

And now for the mount:

# mount /mnt/MBPCBackupx/
# mount|grep MBPC
/dev/mapper/MBPCStorage-MBPCBackup on /mnt/MBPCBackupx type xfs (rw,noatime,nodiratime,logbufs=8,allocsize=512m)
# cd /mnt/MBPCBackup
bash: cd: /mnt/MBPCBackup: No such file or directory
# cd /mnt/MBPCBackupx/
# du -sh .
2.8T    .
# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2849239     12110 100% /mnt/MBPCBackupx
#
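
du and df confirm the data is all there by size.  For extra confidence, a checksum spot check could be layered on top, assuming sums were recorded before the unplug (the list file name here is hypothetical):

Before the unplug:
# cd /mnt/MBPCBackupx && md5sum *.dat > /root/pre-test.md5

After the remount:
# md5sum -c /root/pre-test.md5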

This test is definitely a success!
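
A note on why this worked: this is mdadm's incremental assembly doing its job.  On most modern distributions, a udev rule hands each newly appearing RAID member to mdadm -I (incremental mode), and md activates the array the moment the last member shows up.  The rule is roughly of this form (a sketch; check your distribution's md/mdadm udev rules file for the exact wording):

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -I $env{DEVNAME}"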

8 In this test, we'll repeat TEST #7 but reconnect the SATA cables in a different order than they were originally attached.  The UDEV rules we created earlier will be key here (see the sketch below). 
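
As a refresher, those rules pin each physical drive to a stable /dev/rsdX symlink keyed on its serial number, so one of them looks roughly like the sketch below (the serial here is a placeholder, not one of the actual drives):

KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="9VX0XXXX", SYMLINK+="rsda"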

PASS:

Because the only difference is how we replug the SATA cables, everything proceeds as in AVAILABILITY TEST #7 above, except that we replug the cables in a random order rather than the original one.  The result:

/var/log/messages
Apr 15 22:33:02 mbpc kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4080000 action 0xe frozen
Apr 15 22:33:02 mbpc kernel: ata6: irq_stat 0x00000040, connection status changed
Apr 15 22:33:02 mbpc kernel: ata6: SError: { 10B8B DevExch }
Apr 15 22:33:02 mbpc kernel: ata6: hard resetting link
Apr 15 22:33:03 mbpc kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:03 mbpc kernel: ata6.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:03 mbpc kernel: ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:03 mbpc kernel: ata6.00: configured for UDMA/133
Apr 15 22:33:03 mbpc kernel: ata6: EH complete
Apr 15 22:33:03 mbpc kernel: scsi 5:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: Attached scsi generic sg0 type 0
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] Write Protect is off
Apr 15 22:33:03 mbpc kernel: sd 5:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:08 mbpc kernel: sda: unknown partition table
Apr 15 22:33:08 mbpc kernel: sd 5:0:0:0: [sda] Attached SCSI disk
Apr 15 22:33:08 mbpc kernel: md: bind<sda>
Apr 15 22:33:10 mbpc kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:10 mbpc kernel: ata4: irq_stat 0x00000040, connection status changed
Apr 15 22:33:10 mbpc kernel: ata4: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:10 mbpc kernel: ata4: hard resetting link
Apr 15 22:33:10 mbpc kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:10 mbpc kernel: ata4.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:10 mbpc kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:10 mbpc kernel: ata4.00: configured for UDMA/133
Apr 15 22:33:10 mbpc kernel: ata4: EH complete
Apr 15 22:33:10 mbpc kernel: scsi 3:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: Attached scsi generic sg1 type 0
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] Write Protect is off
Apr 15 22:33:10 mbpc kernel: sd 3:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:16 mbpc kernel: sdb: unknown partition table
Apr 15 22:33:16 mbpc kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
Apr 15 22:33:16 mbpc kernel: md: bind<sdb>
Apr 15 22:33:20 mbpc kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 22:33:20 mbpc kernel: ata2: irq_stat 0x00000040, connection status changed
Apr 15 22:33:20 mbpc kernel: ata2: SError: { DevExch }
Apr 15 22:33:20 mbpc kernel: ata2: hard resetting link
Apr 15 22:33:21 mbpc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:21 mbpc kernel: ata2.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:21 mbpc kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:21 mbpc kernel: ata2.00: configured for UDMA/133
Apr 15 22:33:21 mbpc kernel: ata2: EH complete
Apr 15 22:33:21 mbpc kernel: scsi 1:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: Attached scsi generic sg2 type 0
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] Write Protect is off
Apr 15 22:33:21 mbpc kernel: sd 1:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdc [SAT], open() failed: Permission denied
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdd [SAT], open() failed: No such device
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sde [SAT], open() failed: No such device
Apr 15 22:33:25 mbpc smartd[2325]: Device: /dev/sdf [SAT], open() failed: No such device
Apr 15 22:33:26 mbpc kernel: sdc: unknown partition table
Apr 15 22:33:26 mbpc kernel: sd 1:0:0:0: [sdc] Attached SCSI disk
Apr 15 22:33:27 mbpc kernel: md: bind<sdc>
Apr 15 22:33:44 mbpc kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:44 mbpc kernel: ata5: irq_stat 0x00000040, connection status changed
Apr 15 22:33:44 mbpc kernel: ata5: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:44 mbpc kernel: ata5: hard resetting link
Apr 15 22:33:49 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:49 mbpc kernel: ata5: link online but 1 devices misclassified, retrying
Apr 15 22:33:49 mbpc kernel: ata5: reset failed (errno=-11), retrying in 5 secs
Apr 15 22:33:54 mbpc kernel: ata5: hard resetting link
Apr 15 22:33:54 mbpc kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:54 mbpc kernel: ata5.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:54 mbpc kernel: ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:54 mbpc kernel: ata5.00: configured for UDMA/133
Apr 15 22:33:54 mbpc kernel: ata5: EH complete
Apr 15 22:33:54 mbpc kernel: scsi 4:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Write Protect is off
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:33:54 mbpc kernel: sdd: unknown partition table
Apr 15 22:33:54 mbpc kernel: sd 4:0:0:0: [sdd] Attached SCSI disk
Apr 15 22:33:55 mbpc kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x40c0000 action 0xe frozen
Apr 15 22:33:55 mbpc kernel: ata3: irq_stat 0x00000040, connection status changed
Apr 15 22:33:55 mbpc kernel: ata3: SError: { CommWake 10B8B DevExch }
Apr 15 22:33:55 mbpc kernel: ata3: hard resetting link
Apr 15 22:33:55 mbpc kernel: md: bind<sdd>
Apr 15 22:33:55 mbpc kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 15 22:33:55 mbpc kernel: ata3.00: ATA-8: ST31000520AS, CC32, max UDMA/133
Apr 15 22:33:55 mbpc kernel: ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:33:55 mbpc kernel: ata3.00: configured for UDMA/133
Apr 15 22:33:55 mbpc kernel: ata3: EH complete
Apr 15 22:33:55 mbpc kernel: scsi 2:0:0:0: Direct-Access     ATA      ST31000520AS     CC32 PQ: 0 ANSI: 5
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] Write Protect is off
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: Attached scsi generic sg4 type 0
Apr 15 22:33:55 mbpc kernel: sd 2:0:0:0: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:34:01 mbpc kernel: sde: unknown partition table
Apr 15 22:34:01 mbpc kernel: sd 2:0:0:0: [sde] Attached SCSI disk
Apr 15 22:34:01 mbpc kernel: md: bind<sde>
Apr 15 22:34:10 mbpc kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 15 22:34:10 mbpc kernel: ata1: irq_stat 0x00000040, connection status changed
Apr 15 22:34:10 mbpc kernel: ata1: SError: { DevExch }
Apr 15 22:34:10 mbpc kernel: ata1: hard resetting link
Apr 15 22:34:11 mbpc kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 15 22:34:11 mbpc kernel: ata1.00: ATA-8: ST1500DL003-9VT16L, CC3C, max UDMA/133
Apr 15 22:34:11 mbpc kernel: ata1.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Apr 15 22:34:11 mbpc kernel: ata1.00: configured for UDMA/133
Apr 15 22:34:11 mbpc kernel: ata1: EH complete
Apr 15 22:34:11 mbpc kernel: scsi 0:0:0:0: Direct-Access     ATA      ST1500DL003-9VT1 CC3C PQ: 0 ANSI: 5
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: Attached scsi generic sg5 type 0
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] 4096-byte physical blocks
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] Write Protect is off
Apr 15 22:34:11 mbpc kernel: sd 0:0:0:0: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 15 22:34:18 mbpc kernel: sdf: unknown partition table
Apr 15 22:34:18 mbpc kernel: sd 0:0:0:0: [sdf] Attached SCSI disk
Apr 15 22:34:19 mbpc kernel: md: bind<sdf>
Apr 15 22:34:19 mbpc kernel: bio: create slab <bio-1> at 1
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdf operational as raid disk 5
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sde operational as raid disk 1
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdd operational as raid disk 0
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdc operational as raid disk 2
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sdb operational as raid disk 3
Apr 15 22:34:19 mbpc kernel: md/raid:md0: device sda operational as raid disk 4
Apr 15 22:34:19 mbpc kernel: md/raid:md0: allocated 6386kB
Apr 15 22:34:19 mbpc kernel: md/raid:md0: raid level 6 active with 6 out of 6 devices, algorithm 2
Apr 15 22:34:19 mbpc kernel: created bitmap (8 pages) for device md0
Apr 15 22:34:19 mbpc kernel: md0: bitmap initialized from disk: read 1/1 pages, set 0 of 14905 bits
Apr 15 22:34:19 mbpc kernel: md0: detected capacity change from 0 to 4000814727168
Apr 15 22:34:19 mbpc kernel: md0: unknown partition table

And through the standard utilities:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[5] sde[1] sdd[0] sdc[2] sdb[3] sda[4]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Mar 26 00:06:24 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Apr 15 21:44:49 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : mbpc:0  (local to host mbpc)
           UUID : 2f36ac48:5e3e4c54:72177c53:bea3e41e
         Events : 5171

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       64        1      active sync   /dev/sde
       2       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
       4       8        0        4      active sync   /dev/sda
       5       8       80        5      active sync   /dev/sdf
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsda -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/rsdb -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsdc -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsdd -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 22:33 /dev/rsde -> sda
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/rsdf -> sdf
# ls -al /dev/raidmd0
lrwxrwxrwx. 1 root root 3 Apr 15 22:34 /dev/raidmd0 -> md0
#

# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mapper/MBPCStorage-MBPCBackup
                       2861348   2849239     12110 100% /mnt/MBPCBackupx
#

Again, another successful test.

9 Repeat TEST #7 above, but instead of unplugging and plugging the drives back in, rescan the SCSI bus.  This is a lighter variation of tests #7 and #8 above.

The difference here lies in the recovery steps, where we trigger a rescan of the SCSI hosts to detect the drives again:

echo "0 0 0" >/sys/class/scsi_host/host0/scan
echo "0 0 0" >/sys/class/scsi_host/host1/scan
echo "0 0 0" >/sys/class/scsi_host/host2/scan
echo "0 0 0" >/sys/class/scsi_host/host3/scan
echo "0 0 0" >/sys/class/scsi_host/host4/scan
echo "0 0 0" >/sys/class/scsi_host/host5/scan

Following the above, the drives were again visible:

# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsda -> sde
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdc -> sdb
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdd -> sdd
lrwxrwxrwx. 1 root root 3 Apr 15 22:49 /dev/rsde -> sdf
lrwxrwxrwx. 1 root root 3 Apr 15 22:48 /dev/rsdf -> sda
#

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf[4] sde[0] sdd[3] sdc[1] sdb[2] sda[5]
      3907045632 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
#

And this concludes the last test we'll do. 

 

TEST Script 1:

#!/usr/bin/awk -f

# ls -al /bin/gawk /usr/bin/awk /usr/bin/gawk /bin/awk
# lrwxrwxrwx. 1 root root      4 Sep 26  2011 /bin/awk -> gawk
# -rwxr-xr-x. 1 root root 382456 Nov 22  2010 /bin/gawk
# lrwxrwxrwx. 1 root root     14 Sep 26  2011 /usr/bin/awk -> ../../bin/gawk
# lrwxrwxrwx. 1 root root     14 Sep 26  2011 /usr/bin/gawk -> ../../bin/gawk

function random () {
    SOURCEC="strings /dev/urandom";
    SOURCEF="";

    pcnt=0;
    print "Creating the array…";
    print "            000—000—000—";
    cnt=0;
    LUMP="";
    LCN=0;
    while (1) {
        retv=((SOURCEC|getline)>0);
        if ( retv == 0 ) break;

        tcnt+=length($0);

        if ( LCN < 1000 ) {
            LUMP=LUMP""$0;
            LCN++;
        } else {
            WST[cnt++]=""LUMP;
            # Start the next lump with the current record so no input is dropped.
            LUMP=""$0;
            LCN=1;
        }

        pcnt++;
        if ( pcnt > 10000 ) {
            printf "BYTE Count: %18.0f %16.0f\r", tcnt, cnt;
            pcnt=0;
        }

        # Stop at 2MB sample.
        # if ( tcnt >= 2097152 )
        # Stop at 16MB sample.
        # if ( tcnt >= 16777216 )
        # Stop at 64MB sample.
        if ( tcnt >= 67108864 )
            break;
    }
    close(SOURCEC);

    # Max out filesystem.
    printf "Filling up the FS with files (fill.awk.NNNNNN.txt: "cnt") ….";
    MKD="mkdir ./fill."TNUM".awk.test";
    system(MKD);
    close(MKD);

    # Write 200,000 files of 64MB.  Change this to suit your test.
    for ( j = 0; j < 200000; j++ ) {
        FNM="./fill."TNUM".awk.test/fill.awk."j".txt";
        printf "File Saved: "FNM"\r";
        for ( i = 0; i < cnt; i++ ) {
            print WST[i] >> FNM;
        }
        close(FNM);
        i=0;
    }
   
}

BEGIN {
    # for (i=1; i < ARGC; ++i) {
    #    printf "ARGV [%d]=%s\n", i, ARGV[i];
    # }

    TNUM=0;
    random();

    # Exit explicitly so awk doesn't sit waiting for input on stdin afterwards.
    exit;
}
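
For completeness, here is how the script might be invoked (fill.awk is an assumed file name; save it as whatever you like).  It writes into the current directory, so cd onto the array first, and remember to remove the fill.*.awk.test directories afterwards:

# cd /mnt/MBPCBackupx
# chmod +x fill.awk
# ./fill.awk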

 

CONCLUSION

The availability tests were probably the most important here, followed by the performance tests.  The fact that the software MDADM RAID6 array survived the fairly brutal failures in the tests above was a genuine surprise to me, given my rather low expectations going in.  The opinion I was given before setting out on this RAID6 adventure was that I'd end up with either excruciatingly long rebuilds or frequent failures that would render the setup ineffective for my needs.  The results above show that the performance is there, alongside much better than expected reliability, with every one of the reliability tests passing.  At least in the tests done here and on the hardware specified above, the MDADM RAID6 array proved very robust.
 

Responses to “HTPC / Backup Home Server Solution using Linux”

  1. Hi,

    Great post. You don’t need to specify the parameters when creating the XFS file system, see http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E and http://www.spinics.net/lists/raid/msg38074.html . Of course, YMMV.

    Did you run those benchmarks while the array was resyncing?

  2. Hey Mathias,

    Thanks for posting. Just added the testing numbers so feel free to have a look and judge yourself.

    > logbsize and delaylog
    I ran another test with logbsize=128k (couldn’t find anything for delaylog in my mkfs.xfs man page so I’m not sure if that’ll do anything). Little to no difference in this case on first glance. Watch out for the results at some point for a closer look.

    One consideration here is that eventually I would grow the LVM and XFS to fill up to 4TB, which I’ll be doing soon. Potentially, in the future I may try to grow this array to something well over 8TB (yet to see how to do that). I’m not sure if XFS would auto-adjust to optimal values at those capacities, and the link didn’t touch on that topic.

    All in all, I can still run tests on this thing, recreating the FS if I need to, so feel free to suggest numbers you’d be interested to see. I might leave this topic open for a week or two to see if I can think of anything else or if I’m missing anything. For my setup, having anything > 125MB/s is a bonus, as the network is only 1Gb/s and that is its theoretical max.

    Cheers!
    TK

  3. Thank you for posting this blog.  I was getting desperate.  I could not figure out why I could not stop the RAID1 device, even from Ubuntu Rescue Remix.  The LVM group was being assembled from the failed RAID.  I removed the volume group and was finally able to gain exclusive access to the array to stop it, put in the new disk, and rebuild the array.
     
    Nice job.
    Best,
    Dave.




     