

HTPC / Backup Home Server Solution using Linux


ZFS ATTEMPTS

Prior to using XFS, we gave ZFS a serious try here.  This came after some research online on the best options available.  There are other contenders, such as Btrfs (still in development) and ReiserFS, which is no longer developed.  Btrfs is intriguing, but its development state would prevent it from being used in most critical environments, and we also needed a good record of stability.  ZFS, being touted as the next best thing in filesystems, even the last word in filesystems, seemed like a great choice for this setup.

So naturally it was the first thing we tried.  It's important to mention one specific item about ZFS: we used a port of ZFS to Linux called zfs-fuse, because Sun Microsystems / Oracle licensing forbids the original code from being shipped in the Linux kernel.

ZFS was originally designed for the Solaris OS, and the source code is licensed to Oracle (formerly Sun Microsystems).  So, as with any port, we were hopeful that the means of porting the code over to Linux didn't affect performance.  We definitely expected a solid piece of software:
 

ITEM # DESCRIPTION COMMAND(S)
#1
 
Install the zfs-fuse RPM.
In this case, I opted to use the zfs-fuse package (mind you, this is April 2012):
 
# rpm -aq|grep -i zfs
zfs-fuse-0.6.9-6.20100709git.el6.x86_64
# which zfs
/usr/bin/zfs
# which zpool
/usr/bin/zpool
#
#2 Create the ZFS Filesystem.

We should now be ready to create our ZFS filesystem.  In this case the backing device (the LVM volume /dev/MBPCStorage/MBPCBackup, which sits on top of the /dev/raidmd0 RAID6 array) already provides redundancy, so there is only one device to use:

# zpool create MBPCBackupz raidz /dev/MBPCStorage/MBPCBackup

If you receive an error that the zfs-fuse daemon needs to be running, start it and try again:

# zpool create MBPCBackupz raidz /dev/MBPCStorage/MBPCBackup
connect: No such file or directory
Please make sure that the zfs-fuse daemon is running.
internal error: failed to initialize ZFS library
# setup
# service zfs-fuse restart
Starting zfs-fuse:                                         [  OK  ]
Immunizing zfs-fuse against OOM kills                      [  OK  ]
Mounting zfs partitions:                                   [  OK  ]
# ps -ef|grep -i zfs
root      3222     1  0 02:18 ?        00:00:00 /usr/bin/zfs-fuse -p /var/run/zfs-fuse.pid
root      3310 23918  0 02:18 pts/2    00:00:00 grep -i zfs
#
 
Trying again:
 
# zpool create MBPCBackupz raidz /dev/MBPCStorage/MBPCBackup
invalid vdev specification: raidz requires at least 2 devices
#
 
And finally again, after dropping the raidz option (raidz is roughly equivalent to RAID5, but we already have RAID6 underneath):
 
# zpool create MBPCBackupz /dev/MBPCStorage/MBPCBackup
#
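For reference, had we wanted ZFS itself to provide the parity instead of mdadm, raidz would need its own set of raw member devices.  A sketch under that assumption, with hypothetical disks sdw/sdx/sdy/sdz standing in (not our actual layout):

```shell
# Hypothetical: single-parity raidz (RAID5-like) over three raw disks,
# satisfying the "raidz requires at least 2 devices" constraint above
zpool create MBPCBackupz raidz /dev/sdx /dev/sdy /dev/sdz

# Or double-parity raidz2, the analogue of the RAID6 we built with mdadm
zpool create MBPCBackupz raidz2 /dev/sdw /dev/sdx /dev/sdy /dev/sdz
```

Layering raidz on top of an md RAID6 would just stack parity on parity, which is why the single-vdev pool is the right call here.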
 
#3

Check that your ZFS exists.  
Redo if changes are needed.

Checking your ZFS is fairly straightforward.  Apparently, thought has been given to the time and dollar costs of maintaining and supporting ZFS:
 
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
MBPCBackupz  73.5K  1.78T    21K  /MBPCBackupz
#
 
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
MBPCBackupz  1.81T    78K  1.81T     0%  1.00x  ONLINE  -
#
 
Obviously, I failed to specify a mount point, so the pool was mounted at /MBPCBackupz by default.  So let's try again:
 
# zpool destroy MBPCBackupz
# zpool list
no pools available
#
 
 
# zpool create MBPCBackupz /dev/MBPCStorage/MBPCBackup -m /mnt/MBPCBackupz
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
MBPCBackupz  1.81T    87K  1.81T     0%  1.00x  ONLINE  -
#
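Destroying and recreating works, but for the record ZFS can also move a mountpoint in place on a live pool, which is a non-destructive alternative to the destroy/create cycle above:

```shell
# Relocate the existing pool's filesystem; ZFS unmounts and
# remounts it at the new location automatically
zfs set mountpoint=/mnt/MBPCBackupz MBPCBackupz
zfs get mountpoint MBPCBackupz
```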
 
#4 Review the ZFS properties on our zpool.

The ZFS filesystem has quite the extensive array of utilities for all sorts of stats:

# zfs get all MBPCBackupz
NAME         PROPERTY              VALUE                  SOURCE
MBPCBackupz  type                  filesystem             -
MBPCBackupz  creation              Sun Feb 26  2:51 2012  -
MBPCBackupz  used                  701M                   -
MBPCBackupz  available             1.78T                  -
MBPCBackupz  referenced            701M                   -
MBPCBackupz  compressratio         1.00x                  -
MBPCBackupz  mounted               yes                    -
MBPCBackupz  quota                 none                   default
MBPCBackupz  reservation           none                   default
MBPCBackupz  recordsize            128K                   default
MBPCBackupz  mountpoint            /mnt/MBPCBackupz       local
MBPCBackupz  sharenfs              off                    default
MBPCBackupz  checksum              on                     default
MBPCBackupz  compression           off                    default
MBPCBackupz  atime                 on                     default
MBPCBackupz  devices               on                     default
MBPCBackupz  exec                  on                     default
MBPCBackupz  setuid                on                     default
MBPCBackupz  readonly              off                    default
MBPCBackupz  zoned                 off                    default
MBPCBackupz  snapdir               hidden                 default
MBPCBackupz  aclmode               groupmask              default
MBPCBackupz  aclinherit            restricted             default
MBPCBackupz  canmount              on                     default
MBPCBackupz  xattr                 on                     default
MBPCBackupz  copies                1                      default
MBPCBackupz  version               4                      -
MBPCBackupz  utf8only              off                    -
MBPCBackupz  normalization         none                   -
MBPCBackupz  casesensitivity       sensitive              -
MBPCBackupz  vscan                 off                    default
MBPCBackupz  nbmand                off                    default
MBPCBackupz  sharesmb              off                    default
MBPCBackupz  refquota              none                   default
MBPCBackupz  refreservation        none                   default
MBPCBackupz  primarycache          all                    default
MBPCBackupz  secondarycache        all                    default
MBPCBackupz  usedbysnapshots       0                      -
MBPCBackupz  usedbydataset         701M                   -
MBPCBackupz  usedbychildren        61.5K                  -
MBPCBackupz  usedbyrefreservation  0                      -
MBPCBackupz  logbias               latency                default
MBPCBackupz  dedup                 off                    default
MBPCBackupz  mlslabel              off                    -
#
 
#5 (Optional) Set the compression to allow for more space.
Let's enable compression to get a bit more space out of this combination:
 
# zfs set compression=on MBPCBackupz
#
 
After deleting a temp file we copied earlier, the space usage only drops after a few seconds:
 
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
MBPCBackupz   701M  1.78T   701M  /mnt/MBPCBackupz
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
MBPCBackupz  82.5K  1.78T    21K  /mnt/MBPCBackupz
#
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
MBPCBackupz  1.81T   122K  1.81T     0%  1.00x  ONLINE  -
#
 
NOTE: One thing we have to keep reminding ourselves of is that we are doing this customization and creating the ZFS filesystem while the RAID6 is still resyncing:
 
top - 03:08:40 up 11:08,  5 users,  load average: 1.76, 1.33, 1.22
Tasks: 244 total,   2 running, 241 sleeping,   0 stopped,   1 zombie
Cpu0  :  6.3%us, 36.3%sy,  0.0%ni, 45.9%id,  0.0%wa,  0.0%hi, 11.6%si,  0.0%st
Cpu1  : 11.3%us, 24.5%sy,  0.0%ni, 63.9%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   3920884k total,  2939120k used,   981764k free,    27644k buffers
Swap:  4194296k total,        8k used,  4194288k free,  1771964k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  893 root      20   0     0    0    0 S 47.0  0.0 218:28.05 md127_raid6
28276 root      20   0     0    0    0 D 17.9  0.0  16:55.23 md127_resync
 
 
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdi[3] sdh[0] sde[1] sdd[2] sdb[4] sda[5]
      3907045696 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/5] [UUU_UU]
      [======>..............]  recovery = 31.2% (305691368/976761424) finish=220.8min speed=50643K/sec
 
unused devices: <none>
#
 

#6 Rudimentary Performance Test
Now we will copy a large file (4.7GB) and time the operation from start to finish, while the RAID6 is still resyncing the last disk and with compression set to ON on the ZFS pool:
 
# date +%s:%N; cp -p /mnt/OD1.5TB/SampleBinary.dat .; date +%s:%N;
1330243909:207559045
1330244114:162053993
#
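The arithmetic on those two epoch stamps can be checked with plain shell arithmetic and awk (the 4.7GB figure is the file size mentioned above):

```shell
# Elapsed seconds between the two `date +%s` stamps printed above
start=1330243909
end=1330244114
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"    # prints "elapsed: 205s"

# Effective throughput for the 4.7GB copy (4.7 * 1024 MB)
awk -v s="$elapsed" 'BEGIN { printf "throughput: %.1f MB/s\n", 4.7 * 1024 / s }'
```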
 
So this took 1330244114 - 1330243909 = 205 seconds, or just under three and a half minutes.  It's hard to say whether the single 1.5TB source disk or the RAID6 array we created above is to blame for the slow speed.  We can retest this later.  Now let's check the compression ratio of our ZFS pool after the copy:
 
 
# zfs get compressratio MBPCBackupz
NAME         PROPERTY       VALUE  SOURCE
MBPCBackupz  compressratio  1.00x  -
# zpool status MBPCBackupz
  pool: MBPCBackupz
 state: ONLINE
 scrub: none requested
config:
 
NAME                      STATE     READ WRITE CKSUM
MBPCBackupz               ONLINE       0     0     0
 MBPCStorage/MBPCBackup  ONLINE       0     0     0
 
errors: No known data errors
#
 
Still 1.00x for compression, but no errors.  Not surprised; the file was already a compressed one. :)  Just for fun, and even though there are no errors, let's scrub the pool:
# zpool status MBPCBackupz
  pool: MBPCBackupz
 state: ONLINE
   see: http://www.sun.com/msg/ZFS-8000-EY
 scrub: scrub in progress for 0h0m, 23.54% done, 0h0m to go
config:
 
NAME                      STATE     READ WRITE CKSUM
MBPCBackupz               ONLINE       0     0     0
 MBPCStorage/MBPCBackup  ONLINE       0     0     0
 
errors: No known data errors
#
 
So the overall speed of copying and compressing about 4.7GB, even though little was actually compressed, was 23MB per second.  The array is degraded (still resyncing), but that's still not bad.  Checking with du -ah, we can see that the compression was only lightly effective on the large 4.7GB file but significant (from 127MB to 27MB) on the text file:
 
# du -ah
4.4G ./SampleBinary.dat
27M ./wpa_supplicant_watch.log-20111226
4.5G .
# du -a
4606003 ./SampleBinary.dat
27639 ./wpa_supplicant_watch.log-20111226
4633651 .
#
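Working out the per-file compression from the du output above (the log file was 127MB before the copy; the binary is 4.7GB logical versus the 4606003 KB du reports on disk):

```shell
# Text log: 127MB logical vs 27MB on disk
awk 'BEGIN { printf "log file:    %.1fx\n", 127 / 27 }'

# Binary: 4.7GB logical (converted to KB) vs 4606003 KB on disk
awk 'BEGIN { printf "binary file: %.2fx\n", 4.7 * 1024 * 1024 / 4606003 }'
```

So gzip got roughly 4.7x on the text log but barely 1.07x on the already-compressed binary, which matches the pool-wide compressratio staying near 1.00x.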
 
A very nice and successful setup, but only from an ease-of-configuration perspective:
 
RAID6 (mdadm)
LVM2
ZFS
gzip
 
Some static iostat statistics on our configuration:
 
# iostat
Linux 2.6.32-131.12.1.el6.x86_64 (mbpc) 02/26/2012 _x86_64_ (2 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.70    0.05   24.86    0.73    0.00   63.67
 
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             739.42     64390.64        66.14 2673766075    2746394
sdb             735.83     64390.75        66.15 2673770703    2746802
sdd             836.45     67273.85        66.13 2793488937    2746106
sde             722.54     64390.85        66.11 2673774628    2745154
sdg               7.87       337.86       340.54   14029176   14140448
dm-0             43.79        62.07       338.18    2577228   14042688
dm-1              0.01         0.09         0.00       3914          0
dm-2              0.01         0.09         0.00       3650          0
md127            28.93       223.94       258.45    9298765   10731930
dm-3              1.09       267.77         0.06   11118820       2632
dm-4              0.79         6.36         0.00     263956         64
dm-5              0.13         0.79         0.22      32660       9192
sdh             580.92     17345.18     47111.61  720243498 1956269152
sdi             105.80        38.88     17364.42    1614442  721042636
dm-6             28.85       223.27       258.45    9271035   10731850
 
#
 
# iostat -x -k -d
Linux 2.6.32-131.12.1.el6.x86_64 (mbpc) 02/26/2012 _x86_64_ (2 CPU)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda            7309.77     7.59  739.35    0.45 32206.22    33.05    87.16     0.86    1.17   0.26  19.47
sdb            7313.38     7.59  735.75    0.46 32206.26    33.05    87.58     0.77    1.04   0.25  18.46
sdd            7573.79     7.60  836.72    0.43 33652.00    33.05    80.47     1.76    2.11   0.30  25.10
sde            7326.68     7.59  722.47    0.45 32206.31    33.03    89.19     0.84    1.16   0.27  19.41
sdg               0.90    37.46    2.55    5.32   168.83   170.18    86.15     0.16   20.75   2.93   2.30
dm-0              0.00     0.00    1.44   42.33    31.01   169.00     9.14     3.58   81.86   0.46   2.02
dm-1              0.00     0.00    0.01    0.00     0.05     0.00     7.91     0.00    4.54   2.65   0.00
dm-2              0.00     0.00    0.01    0.00     0.04     0.00     7.90     0.00    1.20   1.18   0.00
md127             0.00     0.00   13.40   15.51   111.90   129.15    16.67     0.00    0.00   0.00   0.00
dm-3              0.00     0.00    1.08    0.01   133.80     0.03   245.79     0.01    4.82   2.63   0.29
dm-4              0.00     0.00    0.79    0.00     3.18     0.00     8.00     0.00    4.28   0.40   0.03
dm-5              0.00     0.00    0.10    0.03     0.39     0.11     7.99     0.00   14.05   0.52   0.01
sdh            1943.95  5507.00  228.05  353.33  8697.72 23541.54   110.90     3.15    5.41   0.80  46.23
sdi               2.70  2063.57    2.04  104.07    19.43  8707.34   164.47     0.56    5.25   1.13  11.98
dm-6              0.00     0.00   13.32   15.51   111.57   129.15    16.70     0.41   14.12   0.15   0.44
 
#
INTERESTING NOTE: After we copied both files above, the system appeared to still be compressing the files, as the ratio went from 1.01x to 1.02x after the copy was done for both files.  This would appear to be a nice feature; however, I'm not sure I would want lingering processes on the system when production jobs need the CPU:
 
# zfs get compressratio MBPCBackupz
NAME         PROPERTY       VALUE  SOURCE
MBPCBackupz  compressratio  1.02x  -
#
Full read/write from sdg to md127 (sda, sdb, sdd, sde, sdh, sdi) (nothing at 100%?):
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              15.33  1414.90   21.17   77.53   153.33  6109.10   126.90     0.43    4.13   2.40  23.69
sdb              15.03  1413.50   22.03   84.20   155.33  6124.30   118.22     0.45    4.10   2.25  23.91
sdd              15.87  1417.23   23.03   77.13   161.60  6116.30   125.35     0.46    4.38   2.47  24.75
sde              15.87  1415.67   21.47   76.00   155.47  6107.23   128.51     0.40    3.95   2.28  22.22
sdg               0.50     6.43  184.87    1.47 23531.07    31.20   252.90     0.79    4.24   2.28  42.47
dm-0              0.00     0.00    1.77    7.80    26.00    31.20    11.96     1.61  168.64  10.36   9.91
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.73 2871.47    27.73 23940.53    16.68     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  183.57    0.00 23496.53     0.00   256.00     0.66    3.62   2.28  41.76
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              15.57  1416.67   23.63   74.33   163.87  6102.43   127.93     0.45    4.35   2.50  24.53
sdi              15.67  1414.50   18.80   82.03   144.40  6121.50   124.28     0.42    3.99   2.38  24.02
dm-6              0.00     0.00    1.73 2871.47    27.73 23940.53    16.68    68.81   23.95   0.20  57.54
 
This copy took 197 seconds.  Time to check the CPU% to see if the process is limited by a single execution core:
 
top - 08:49:51 up 16:49,  8 users,  load average: 0.51, 0.21, 0.12
Tasks: 293 total,   2 running, 290 sleeping,   0 stopped,   1 zombie
Cpu0  : 14.6%us, 17.5%sy,  0.0%ni, 47.4%id, 20.2%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 30.6%us, 13.7%sy,  0.0%ni, 22.5%id, 32.6%wa,  0.3%hi,  0.3%si,  0.0%st
Mem:   3920884k total,  3769812k used,   151072k free,    64652k buffers
Swap:  4194296k total,      480k used,  4193816k free,  2403000k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                         
 3222 root      20   0 1417m 173m 1516 S   35  4.5   3:29.04 zfs-fuse
 3243 root      20   0 1104m  68m  15m R   13  1.8 123:47.75 npviewer.bin
32530 root      20   0  111m  972  708 D    9  0.0   0:06.75 cp
  893 root      20   0     0    0    0 S    7  0.0 332:40.52 md127_raid6
 3127 root      20   0 1099m 291m  21m S    4  7.6  62:17.18 firefox
 2244 root      20   0  187m  64m 9936 S    3  1.7  19:34.16 Xorg
   38 root      20   0     0    0    0 S    2  0.0   0:06.01 kswapd0
 2660 root      20   0  138m 3004 2380 S    1  0.1   0:04.70 gvfsd-trash
18330 videouse  20   0  138m 2856 2352 S    1  0.1   0:02.66 gvfsd-trash
 3169 root      20   0  292m  13m 9108 S    1  0.4   1:39.88 gnome-terminal
11967 root      20   0 15220 1352  904 R    1  0.0   1:41.24 top
   22 root      20   0     0    0    0 S    0  0.0   2:30.00 kblockd/0                        
 
which, at first glance, doesn't appear to be the case, because zfs-fuse is well under 50% in Irix mode; however, checking all the CPU counters shows the usage is in fact substantial (the ZFS daemon?).  Copying a file back yields this:
 
top - 08:54:26 up 16:53,  8 users,  load average: 1.60, 0.64, 0.30
Tasks: 294 total,   4 running, 289 sleeping,   0 stopped,   1 zombie
Cpu0  : 17.0%us, 28.3%sy,  0.0%ni,  6.7%id, 35.7%wa,  0.0%hi, 12.3%si,  0.0%st
Cpu1  : 15.8%us, 34.5%sy,  0.0%ni,  0.0%id, 47.7%wa,  0.3%hi,  1.6%si,  0.0%st
Mem:   3920884k total,  3776932k used,   143952k free,    64988k buffers
Swap:  4194296k total,      480k used,  4193816k free,  2371892k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                         
 3222 root      20   0 1417m 177m 1516 S   46  4.6   4:27.26 zfs-fuse
  893 root      20   0     0    0    0 S   16  0.0 332:53.63 md127_raid6
  770 root      20   0  111m  880  712 R   13  0.0   0:04.74 cp
 3243 root      20   0 1104m  68m  15m R   13  1.8 124:23.04 npviewer.bin
  771 root      20   0     0    0    0 D    5  0.0   0:01.43 flush-253:3
 3127 root      20   0 1099m 292m  21m S    4  7.6  62:28.77 firefox
 2244 root      20   0  187m  64m 9936 S    4  1.7  19:42.01 Xorg     
 
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              81.83     0.00 1308.63    0.00  9825.65     0.00    15.02     2.57    1.96   0.17  21.97
sdb              85.83     0.00 1304.10    0.00  9831.33     0.00    15.08     2.96    2.28   0.17  22.61
sdd              79.73     0.00 1309.20    0.00  9824.80     0.00    15.01     1.26    0.97   0.14  17.78
sde              82.33     0.00 1307.93    0.00  9823.03     0.00    15.02     1.17    0.90   0.14  17.76
sdg               0.00 11392.93    3.13   91.50    60.53 46054.93   974.61   142.69 1522.58  10.57 100.00
dm-0              0.00     0.00    3.10    5.03    60.40    20.00    19.77     2.79  342.85  40.18  32.68
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00 5803.60    0.00 48809.82     0.00    16.82     0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.03 11478.83     0.13 45915.33     8.00 18153.20 1596.93   0.09 100.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              84.97     0.00 1304.03    0.00  9823.73     0.00    15.07     3.09    2.37   0.18  23.29
sdi              84.03     0.00 1306.27    0.00  9825.83     0.00    15.04     1.61    1.23   0.14  18.81
dm-6              0.00     0.00 5803.60    0.00 48809.82     0.00    16.82    20.97    3.62   0.07  42.71
 
Which means reads are much faster out of the RAID6 /dev/raidmd0.  The bottleneck is clearly the target here, but it's not so clear where the bottleneck is with a single-drive-to-RAID6 /dev/raidmd0 copy.  So reading could theoretically go up to 115MB/s, but writes suffer at no higher than 25MB/s.  (This is very slow.)
 
Tweaking time:
 
cat /sys/block/md127/md/stripe_cache_size
 
OR
 
cat /sys/block/$(awk 'BEGIN { "ls -al /dev/raidmd0" | getline; print $NF }')/md/stripe_cache_size
 
and then setting it:
 
# echo "8192" > /sys/block/md127/md/stripe_cache_size
 
had absolutely no effect:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              20.60  1510.37   22.10   92.07   177.47  6580.90   118.39     0.51    4.23   2.21  25.28
sdb              18.70  1507.87   22.80   96.10   172.53  6580.37   113.59     0.45    3.66   2.04  24.28
sdd              17.90  1515.40   22.97   91.13   170.13  6595.83   118.60     0.51    4.21   2.31  26.34
sde              21.27  1513.87   22.27   93.63   180.93  6588.90   116.82     0.50    4.09   2.13  24.64
sdg               0.03   122.07  192.90    4.83 24229.33   506.93   250.20     0.77    3.89   2.23  44.09
dm-0              0.00     0.00    4.00  126.73    50.93   506.93     8.53     6.45   49.31   0.36   4.72
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.47 3193.80    23.47 25799.33    16.16     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  188.90    0.00 24174.13     0.00   255.95     0.62    3.28   2.12  39.95
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              19.70  1513.63   22.13   94.57   173.60  6601.83   116.12     0.51    4.13   2.24  26.19
sdi              19.77  1513.33   23.23   88.30   179.33  6577.30   121.16     0.50    4.21   2.35  26.21
dm-6              0.00     0.00    1.47 3193.80    23.47 25799.33    16.16    75.74   23.67   0.19  59.21
 
There's a script on this post that suggests a number of values you can use (http://ubuntuforums.org/showthread.php?t=1494846&page=3).  It only reports and doesn't set anything, so it's handy to experiment with for this purpose:
 
# ./tune.bash
check /tmp/tune_raid.log for messages in case of error.
suggested read ahead size per device: 768 blocks (384kb)
suggested read ahead size of array: 4608 blocks (2304kb)
RUN blockdev --setra 768 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
your current value for readahead is 256 256 256 256 256 256
RUN blockdev --setra 4608 /dev/md127
your current value for readahead is 256
suggested stripe cache size of devices: 96 pages (384kb)
RUN echo 96 > /sys/block/md127/md/stripe_cache_size
current value of /sys/block/md127/md/stripe_cache_size is 8192
setting max sectors kb to match chunk size
RUN echo 16 > /sys/block/sdi/queue/max_sectors_kb
current value of /sys/block/sdi/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sdh/queue/max_sectors_kb
current value of /sys/block/sdh/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sde/queue/max_sectors_kb
current value of /sys/block/sde/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sdd/queue/max_sectors_kb
current value of /sys/block/sdd/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sdb/queue/max_sectors_kb
current value of /sys/block/sdb/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sda/queue/max_sectors_kb
current value of /sys/block/sda/queue/max_sectors_kb is 512
setting NCQ queue depth to 1
RUN echo 1 > /sys/block/sdi/device/queue_depth
current value of /sys/block/sdi/device/queue_depth is 31
RUN echo 1 > /sys/block/sdh/device/queue_depth
current value of /sys/block/sdh/device/queue_depth is 31
RUN echo 1 > /sys/block/sde/device/queue_depth
current value of /sys/block/sde/device/queue_depth is 31
RUN echo 1 > /sys/block/sdd/device/queue_depth
current value of /sys/block/sdd/device/queue_depth is 31
RUN echo 1 > /sys/block/sdb/device/queue_depth
current value of /sys/block/sdb/device/queue_depth is 31
RUN echo 1 > /sys/block/sda/device/queue_depth
current value of /sys/block/sda/device/queue_depth is 31
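For reference on the script's units: blockdev --setra counts 512-byte sectors, and stripe_cache_size counts pages (4096 bytes on x86_64), so its suggestions convert to KB as follows:

```shell
# blockdev --setra units: 512-byte sectors
echo "per-device readahead: $((768 * 512 / 1024)) KB"
echo "array readahead:      $((4608 * 512 / 1024)) KB"
# stripe_cache_size units: pages (4096 bytes each)
echo "stripe cache:         $((96 * 4096 / 1024)) KB"
```

which matches the script's own "(384kb)" and "(2304kb)" annotations.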
 
After the above, the write took 261 seconds, up from the earlier 197 seconds.  So, degraded performance.  The difference is most visible in iostat:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              64.17   756.40   54.37  362.40   490.53  4502.53    23.96     4.27   10.23   1.08  45.19
sdb              56.87   756.23   53.23  364.53   458.40  4508.40    23.78     4.03    9.63   1.05  43.95
sdd              56.13   751.40   53.43  365.60   458.93  4493.87    23.64     3.63    8.65   0.99  41.37
sde              66.20   755.03   55.67  361.23   505.87  4490.80    23.97     4.06    9.70   1.05  43.74
sdg               1.03     6.57  139.27    2.40 17570.27    35.07   248.55     0.61    4.33   2.34  33.20
dm-0              0.00     0.00    3.37    8.83    42.80    35.07    12.77     1.59  130.33   8.47  10.33
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.33 2189.10    21.33 17552.62    16.05     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  137.00    0.00 17536.00     0.00   256.00     0.48    3.52   2.24  30.62
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              58.40   760.20   51.70  355.87   461.87  4498.53    24.34     4.41   10.82   1.13  46.16
sdi              63.23   755.70   55.20  362.87   489.20  4500.40    23.87     3.62    8.64   1.05  43.80
dm-6              0.00     0.00    1.33 2189.10    21.33 17552.62    16.05    79.42   36.30   0.31  68.95
So let's try my own numbers instead.
 
MY NUMBERS:
 
echo 4096 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 1024 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 16484 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 4096 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 256 > /sys/block/$qdepth/device/queue_depth; done
 
This brought it back to 196 seconds for a 4.7GB file.  Looking at the iostat -x -k -d 30 numbers, the individual RAID6 disks are nearly half as busy with these higher numbers, which is a good sign:
 
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              14.03  1415.77   18.70   78.37   135.60  6117.80   128.85     0.43    4.17   2.38  23.14
sdb              12.77  1417.63   19.03   80.47   131.33  6130.73   125.87     0.41    3.90   2.33  23.22
sdd              13.03  1416.23   19.70   79.77   137.07  6125.93   125.93     0.47    4.55   2.64  26.27
sde              13.37  1415.53   18.70   80.67   135.33  6123.53   125.98     0.41    3.99   2.33  23.14
sdg               0.17     5.73  187.47    1.40 23914.80    28.13   253.54     0.75    3.98   2.20  41.58
dm-0              0.00     0.00    0.97    7.03    17.20    28.13    11.33     0.56   70.47   3.61   2.89
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.47 2852.20    23.47 23960.73    16.81     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  186.70    0.00 23897.60     0.00   256.00     0.64    3.44   2.18  40.73
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              14.30  1414.70   20.00   80.23   143.07  6117.67   124.92     0.44    4.22   2.40  24.02
sdi              12.60  1416.63   19.00   81.60   131.87  6133.67   124.56     0.49    4.62   2.61  26.27
dm-6              0.00     0.00    1.47 2852.20    23.47 23960.73    16.81    67.94   23.80   0.21  58.66
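One thing to keep in mind before pushing stripe_cache_size higher: it is measured in pages per device, so the commonly cited estimate of the md layer's memory cost is stripe_cache_size * page size * number of member disks.  With our six disks and 4KB pages:

```shell
# Rough md stripe cache memory cost: pages * 4096-byte page * 6 member disks
for scs in 4096 8192 32768; do
  echo "stripe_cache_size=$scs -> ~$((scs * 4096 * 6 / 1024 / 1024)) MB"
done
```

At 32768 that approaches 768MB of the box's ~4GB of RAM, which would help explain why the largest value performs worse rather than better.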
 
So let's try some higher numbers by doubling them:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 2048 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 256 > /sys/block/$qdepth/device/queue_depth; done
NOTE: setting 256 failed this time.  The queue_depth couldn't be set higher than 31, which is interesting.  The above did give me the fastest result, at 192 seconds for 4.7GB:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              17.37  1416.83   21.73   84.60   163.48  6180.37   119.32     0.47    4.22   2.28  24.25
sdb              16.40  1414.93   20.87   84.57   156.53  6175.30   120.11     0.44    3.92   2.19  23.08
sdd              15.73  1416.70   20.10   85.73   148.80  6188.37   119.76     0.45    4.11   2.31  24.44
sde              17.00  1416.33   21.20   87.83   159.47  6191.03   116.49     0.44    3.88   2.11  23.01
sdg               0.17    63.83  191.50    2.80 24238.80   266.00   252.24     0.77    3.99   2.25  43.64
dm-0              0.00     0.00    2.70   66.50    51.07   266.00     9.16     4.78   69.05   0.69   4.76
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.67 2953.93    25.68 24204.58    16.40     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  188.97    0.00 24187.73     0.00   256.00     0.65    3.44   2.19  41.39
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              17.20  1419.53   19.60   81.17   153.07  6184.63   125.79     0.48    4.47   2.38  23.98
sdi              15.60  1415.27   21.30   87.73   155.07  6173.70   116.09     0.45    3.95   2.27  24.76
dm-6              0.00     0.00    1.67 2953.93    25.68 24204.58    16.40    71.91   24.30   0.20  57.94
Ok.  Let's try with these:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 8192 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 512 > /sys/block/$qdepth/device/queue_depth; done
This resulted in a sustained improvement from around 24MB/s even to 24.8MB/s even.  Not a big improvement, ok, a marginal improvement, but an improvement, and the write time was 191s:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              15.43  1466.03   20.00   80.93   149.33  6342.73   128.64     0.42    4.05   2.39  24.08
sdb              13.67  1462.10   20.17   87.33   141.33  6346.60   120.71     0.39    3.54   2.16  23.26
sdd              14.47  1465.80   21.13   79.03   149.20  6343.27   129.63     0.43    4.13   2.42  24.20
sde              15.37  1463.27   20.23   80.87   148.40  6327.67   128.11     0.41    3.92   2.23  22.59
sdg               0.37     8.10  188.40    2.47 23924.13    41.60   251.13     0.82    4.28   2.25  43.04
dm-0              0.00     0.00    2.13   10.40    35.07    41.60    12.23     1.24   98.82   4.91   6.16
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.60 3027.00    25.60 24818.23    16.41     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  186.63    0.00 23889.07     0.00   256.00     0.65    3.47   2.21  41.20
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              13.80  1463.50   20.00   81.77   142.00  6331.80   127.23     0.43    4.09   2.40  24.42
sdi              16.37  1460.53   20.67   84.03   156.40  6337.27   124.04     0.43    3.97   2.37  24.84
dm-6              0.00     0.00    1.60 3027.17    25.60 24820.50    16.41    72.43   23.92   0.19  58.04
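The MB/s figures quoted throughout this section come from timing a roughly 4.7GB copy onto the pool. A measurement along these lines reproduces that arithmetic; the SRC/DEST paths are placeholders, not the actual files used.

```shell
# mb_per_sec BYTES SECONDS -> whole MiB/s (the arithmetic behind the
# throughput numbers in this section).
mb_per_sec() { echo $(( $1 / $2 / 1024 / 1024 )); }

# Timed-copy recipe; paths are placeholders and the copy is guarded so
# the snippet is harmless when the test file doesn't exist.
SRC=/tmp/test.iso
DEST=/mnt/MBPCBackupz/test.iso
if [ -f "$SRC" ]; then
    sync; start=$(date +%s)
    cp "$SRC" "$DEST" && sync          # sync so buffered writes count
    secs=$(( $(date +%s) - start ))
    echo "$(mb_per_sec "$(stat -c %s "$SRC")" "$secs") MiB/s over ${secs}s"
fi
```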
Next, I'll tune stripe_cache_size to a higher number and see what happens:
echo 32768 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 32768 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 16384 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 512 > /sys/block/$qdepth/device/queue_depth; done
The results were slightly worse.  So let's try these numbers:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 4096 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 512 > /sys/block/$qdepth/device/queue_depth; done
A degradation from 24.8MB/s, though still an improvement over the earlier baseline, so I set everything back to the values that gave me 24.8MB/s:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 8192 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 31 > /sys/block/$qdepth/device/queue_depth; done
And we're back to where we started:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              14.17  1465.27   20.43   81.00   146.53  6348.47   128.06     0.41    3.93   2.31  23.42
sdb              16.63  1464.10   20.40   84.40   154.80  6356.73   124.27     0.43    3.95   2.23  23.33
sdd              15.17  1467.00   18.67   84.03   142.27  6368.20   126.79     0.42    3.91   2.39  24.52
sde              13.27  1467.40   19.03   84.90   136.13  6371.67   125.23     0.40    3.74   2.15  22.38
sdg               0.00     1.40  187.20    1.27 23895.73    10.40   253.69     0.85    4.51   2.14  40.42
dm-0              0.00     0.00    0.57    2.60     6.67    10.40    10.78     0.24   74.99   7.93   2.51
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.60 3120.27    25.60 24887.83    15.96     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  186.63    0.00 23889.07     0.00   256.00     0.62    3.32   2.12  39.64
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              15.77  1467.90   20.43   80.70   151.60  6360.33   128.78     0.45    4.29   2.40  24.28
sdi              14.13  1465.30   19.60   82.77   143.07  6367.27   127.20     0.44    4.14   2.51  25.70
dm-6              0.00     0.00    1.60 3120.27    25.60 24887.83    15.96    75.88   24.29   0.19  59.85
So the above peaked at 24.8MB/s.  Next I'll try to reset the chunk size from 16K to 128K (the man page recommends 512K):
 
# mdadm --grow /dev/raidmd0 --chunk-size=128
 
# mdadm --grow /dev/raidmd0 --chunk=32K
mdadm: New chunk size does not divide component size
#
And that's where we apparently run into BZ#723137 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=723137).  Looks like I have to destroy my ZFS pool and RAID6 array and start over with a larger chunk size.  So let's do that with a condensed set of steps:
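The "does not divide component size" failure can actually be predicted from the Used Dev Size that mdadm --detail reports (976761424 KiB for this array): a grow only works when the new chunk size divides that value evenly. A quick sanity check, using the figure from the detail output above:

```shell
# "Used Dev Size" from mdadm --detail, in KiB:
used_dev_kib=976761424
for chunk in 16 32 64 128 512; do
    if [ $(( used_dev_kib % chunk )) -eq 0 ]; then
        echo "chunk ${chunk}K: divides component size"
    else
        echo "chunk ${chunk}K: does NOT divide component size"
    fi
done
# For this array only 16K divides evenly, matching the failed
# --chunk=32K attempt above.
```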
 
# zpool destroy MBPCBackupz
#
# zpool list
no pools available
# zfs list
no datasets available
#
# lvm lvremove /dev/MBPCStorage/MBPCBackup 
Do you really want to remove active logical volume MBPCBackup? [y/n]: y
  Logical volume "MBPCBackup" successfully removed
# lvm lvs
  LV        VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  oLogVol02 VGEntertain -wi-ao 151.00g                                      
  olv_root  VGEntertain -wi-ao  32.00g                                      
  olv_swap  VGEntertain -wi-a-   4.00g                                      
  fmlv      mbpcvg      -wi-ao   1.15t                                      
  rootlv    mbpcvg      -wi-ao  31.25g                                      
  swaplv    mbpcvg      -wi-ao   4.00g                                      
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree
  MBPCStorage   1   0   0 wz--n-   3.64t 3.64t
  VGEntertain   1   3   0 wz--n- 187.00g    0 
  mbpcvg        1   3   0 wz--n-   1.18t    0 
#
 
 
# lvm vgremove MBPCStorage
  Volume group "MBPCStorage" successfully removed
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree
  VGEntertain   1   3   0 wz--n- 187.00g    0 
  mbpcvg        1   3   0 wz--n-   1.18t    0 
#
 
# lvm pvremove /dev/raidmd0
  Labels on physical volume "/dev/raidmd0" successfully wiped
[root@mbpc mnt]# lvm pvs
  PV         VG          Fmt  Attr PSize   PFree
  /dev/sdg2  mbpcvg      lvm2 a-     1.18t    0 
  /dev/sdg3  VGEntertain lvm2 a-   187.00g    0 
#
 
Next we stop our array:
 
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Jan 30 00:22:17 2012
     Raid Level : raid6
     Array Size : 3907045696 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761424 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent
 
    Update Time : Sun Feb 26 14:11:14 2012
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 16K
 
           Name : mbpc:0  (local to host mbpc)
           UUID : b9c13d43:a7a1d949:f20dd93a:cb41cc00
         Events : 312
 
    Number   Major   Minor   RaidDevice State
       0       8      112        0      active sync   /dev/sdh
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8      128        3      active sync   /dev/sdi
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#
# mdadm --stop /dev/raidmd0
# mdadm --detail /dev/raidmd0
mdadm: cannot open /dev/raidmd0: No such file or directory
# cat /proc/mdadm
cat: /proc/mdadm: No such file or directory
#
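The teardown above, condensed into one script. This sketch is destructive when armed, so by default it only prints each command for review; the names match this article's setup, and -f is added to lvremove to skip the interactive prompt.

```shell
#!/bin/sh
# Tear down the stack top-down: ZFS pool -> LV -> VG -> PV -> md array.
# DESTRUCTIVE when RUN=1; by default each command is only printed.
run() { if [ "$RUN" = "1" ]; then "$@"; else echo "$@"; fi; }

run zpool destroy MBPCBackupz                    # ZFS pool
run lvm lvremove -f /dev/MBPCStorage/MBPCBackup  # logical volume
run lvm vgremove MBPCStorage                     # volume group
run lvm pvremove /dev/raidmd0                    # PV label on the array
run mdadm --stop /dev/raidmd0                    # finally, the md array
```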
And now we recreate our array:
mdadm --create --verbose /dev/md0 --level=raid6 --chunk=64K --auto=p --raid-devices=6 --spare-devices=0 /dev/rsd{a,b,c,d,e,f}
lvm pvcreate /dev/raidmd0
lvm vgcreate MBPCStorage /dev/raidmd0
lvm lvcreate -L3906254360S -n MBPCBackup MBPCStorage
zpool create MBPCBackupz /dev/MBPCStorage/MBPCBackup -m /mnt/MBPCBackupz
zfs set compression=on MBPCBackupz
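A quick aside on the chunk choice: RAID6 writes smaller than a full stripe force a read-modify-write cycle, which is a likely contributor to the low sequential numbers seen here, so it's worth knowing the full-stripe width of the new layout. For 6 disks with double parity and the 64K chunk used above:

```shell
# RAID6 full-stripe width: (total disks - 2 parity) * chunk size.
disks=6; parity=2; chunk_kib=64
stripe_kib=$(( (disks - parity) * chunk_kib ))
echo "full stripe: ${stripe_kib}K"   # 4 data disks * 64K = 256K
```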
Speed was still abysmal, however, at 191 seconds for 4.7GB:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.80  191.30    1.17 24418.93     7.60   253.83     1.04    5.39   2.17  41.79
dm-0              0.00     0.00    0.63    1.90    13.60     7.60    16.74     0.43  169.89  16.88   4.28
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  190.73    0.00 24413.87     0.00   256.00     0.64    3.37   2.14  40.74
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb             429.73  1891.23   39.80   93.30  1890.53  7978.63   148.30     0.62    4.53   2.37  31.57
md0               0.00     0.00    0.43  479.03    27.73 24730.58   103.27     0.00    0.00   0.00   0.00
sdc             435.37  1883.03   40.37   91.07  1918.40  7942.77   150.06     0.55    4.05   2.15  28.25
sdd             456.23  1903.57   43.97   94.23  2010.53  8031.43   145.33     0.50    3.46   1.92  26.50
sde             449.67  1891.33   37.67   89.87  1962.80  7966.37   155.71     0.63    4.84   2.62  33.36
sdf             436.43  1896.80   39.03   96.00  1915.20  8008.23   146.98     0.53    3.82   2.09  28.25
sdg             425.77  1921.77   41.03   96.93  1882.67  8115.97   144.94     0.55    3.88   2.10  29.00
dm-6              0.00     0.00    0.43  479.03    27.73 24730.58   103.27    11.11   23.18   1.23  58.75
Time to tweak again:
cat /sys/block/md0/md/stripe_cache_size
blockdev --getra $(echo $(ls -al /dev/rsd*|awk '{ print "/dev/"$NF }'))
blockdev --getra /dev/md0 /dev/raidmd0
for mskb in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$qdepth/device/queue_depth; done
Verify our device mappings (created via udev rules) before changing parameters:
 
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsda -> sdb
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdd -> sde
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsde -> sdf
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdf -> sdg
#
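Those /dev/rsd* names come from custom udev rules that pin each member disk to a stable alias, so the tuning loops keep working even when the kernel reshuffles sdX names across reboots. A rule file along these lines would produce such symlinks; the ID_SERIAL values below are placeholders, not the actual drives' serials.

```
# /etc/udev/rules.d/60-raid-aliases.rules (sketch; serials are placeholders)
# Match each member disk by its unique serial and add a stable /dev/rsdX symlink.
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL}=="DISK_SERIAL_1", SYMLINK+="rsda"
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL}=="DISK_SERIAL_2", SYMLINK+="rsdb"
```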
And let's try with yet another combination of numbers:
 
echo 8192 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 8192 $(echo $(ls -al /dev/rsd*|awk '{ print "/dev/"$NF }'))
blockdev --setra 32768 /dev/md0
for mskb in $(ls -al /dev/rsd*|awk '{ print $NF }'); do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in $(ls -al /dev/rsd*|awk '{ print $NF }'); do echo 31 > /sys/block/$qdepth/device/queue_depth; done
Verify that things are actually set:
 
# cat /sys/block/md0/md/stripe_cache_size
256
# blockdev --getra $(echo $(ls -al /dev/rsd*|awk '{ print "/dev/"$NF }'))
256
256
256
256
256
256
# blockdev --getra /dev/md0 /dev/raidmd0
4096
4096
# ls -al /dev/rsd*|awk '{ print $NF }'
sdb
sdc
sdd
sde
sdf
sdg
# for mskb in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$mskb/queue/max_sectors_kb; done
512
512
512
512
512
512
# for qdepth in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$qdepth/device/queue_depth; done
31
31
31
31
31
31
#
So now we try to change the chunk size again on the recreated array.  This time we get a different error:
 
 
# mdadm --grow /dev/raidmd0 --chunk=128K
mdadm: /dev/raidmd0: Cannot grow - need backup-file
#
Hmm.  No luck, but that's OK.  So let's continue performance testing:
 
top – 04:19:08 up  5:36,  6 users,  load average: 0.52, 0.24, 0.09
Tasks: 221 total,   3 running, 217 sleeping,   0 stopped,   1 zombie
Cpu0  :  4.0%us, 14.3%sy,  0.0%ni, 25.6%id, 55.8%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  :  8.2%us, 14.1%sy,  0.0%ni, 67.0%id, 10.1%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:   3920768k total,  3776328k used,   144440k free,    59900k buffers
Swap:  4194296k total,      524k used,  4193772k free,  2872196k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                      
 
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              91.97  1576.17   20.47   88.93   462.93  6727.83   131.46     0.42    3.73   2.10  22.97
sdb              88.00  1573.27   21.50   85.60   445.33  6702.37   133.48     0.42    3.79   2.19  23.46
sdc              86.57  1578.97   20.73   86.33   434.80  6729.03   133.82     0.40    3.62   2.14  22.96
sdd              84.93  1570.67   21.93   80.53   434.13  6669.30   138.65     0.42    4.01   2.30  23.59
sde              85.47  1571.33   20.97   86.17   438.80  6688.37   133.05     0.41    3.69   2.19  23.46
sdf              89.47  1571.10   22.07   84.27   454.13  6677.83   134.14     0.40    3.67   2.21  23.45
sdg               0.33     4.20  195.33    1.03 24820.80    20.53   253.01     0.75    3.83   2.27  44.49
dm-0              0.00     0.00    2.03    5.13    35.73    20.53    15.70     0.21   29.92   5.68   4.07
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-4              0.00     0.00  193.57    0.00 24776.53     0.00   256.00     0.70    3.60   2.24  43.42
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.37  928.20    23.47 25264.65    54.47     0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.37  928.20    23.47 25264.65    54.47    21.50   23.14   0.61  56.25
The second copy took 188 seconds, so roughly 25.2MB/s.  We then decided to add a write-intent bitmap, as the array apparently didn't have one earlier.  A bitmap is generally a good thing for recovering an array:
 
# mdadm --grow /dev/md127 --bitmap=internal
#
 
We can reset to none after with this command:
# mdadm --grow /dev/md127 --bitmap=none
#
# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Sun Mar  4 23:11:42 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent
 
  Intent Bitmap : Internal
 
    Update Time : Sun Mar 18 18:30:21 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 64K
 
           Name : mbpc:0  (local to host mbpc)
           UUID : f1c5626d:cfd9d49e:41347e87:7b949c44
         Events : 20
 
    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#
One thing to note: even with an internal bitmap there is a slight write penalty, but there is a huge recovery benefit when a disk fails in the array, so it is better to keep the internal bitmap.  We also set the minimum array rebuild speed to 50MB/s, up from the default of 1000 KB/s, per http://www.ducea.com/2006/06/25/increase-the-speed-of-linux-software-raid-reconstruction/:
 
# echo 50000 >/proc/sys/dev/raid/speed_limit_min
 
This made little difference to write speed, however.
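Note that the echo into /proc only lasts until reboot. The same limit can be made persistent via sysctl; a sketch, where the min value matches the article's choice and the max is shown at its usual kernel default:

```
# /etc/sysctl.conf additions, applied with `sysctl -p`:
dev.raid.speed_limit_min = 50000    # KB/s floor for md resync (default 1000)
dev.raid.speed_limit_max = 200000   # KB/s ceiling (kernel default)
```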
 

11 Responses to “HTPC / Backup Home Server Solution using Linux”

  1. Hi,

    Great post. You don’t need to specify the parameters when creating the XFS file system, see http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E and http://www.spinics.net/lists/raid/msg38074.html . Of course, YMMV.

    Did you run those benchmarks while the array was resyncing?

  2. Hey Mathias,

    Thanks for posting. Just added the testing numbers so feel free to have a look and judge yourself.

    > logbsize and delaylog
    I ran another test with logbsize=128k (couldn’t find anything for delaylog in my mkfs.xfs man page so I’m not sure if that’ll do anything). Little to no difference in this case on first glance. Watch out for the results at some point for a closer look.

    One consideration here is that eventually I would grow the LVM and XFS to fill up to 4TB; I'll be doing this soon. Potentially in the future, I may try to grow this array as well to something well over 8TB (yet to see how to do that). I'm not sure if XFS would auto-adjust to optimal values for those capacities in such cases, and the link didn't touch on that topic.

    All in all, I can still run tests on this thing, recreating the FS if I need to, so feel free to suggest numbers you'd be interested to see. I might leave this topic open for a week or two to see if I can think of anything else or if I'm missing anything. For my setup, having anything > 125MB/s is a bonus, as the network is only 1Gbps with that theoretical max.

    Cheers!
    TK

  3. [...] could be done safely enough like this guy did and with RAID6 as well with SSD type R/W’s no less. Your size would be limited to the size of the [...]

  4. Thank you for posting this blog.  I was getting desperate.  I could not figure out why I could not stop the RAID1 device.  Even from Ubuntu Rescue Remix.  The LVM group was being assembled from the failed raid.  I removed the volume group and was finally able to gain exclusive access to the array to stop it, put in the new disk and rebuild the array.
     
    Nice job.
    Best,
    Dave.

  5. [...] we'll use for this is the APCUPSD daemon available in RPM format. We've set one up for our HTPCB server for a home redundancy / backup solution to protect against power surges and bridge the [...]

  6. [...] every time while transferring my files.  At the point, I not only lost connectivity with the HTPC+B but also my web access most of the time.  Here are the culprits and here's how we went [...]

  7. [...] removed the cable and the adapter and only used a 2 foot cable to my HTPC+B system I've just configured.  Voila!  Problem solved.  Ultimately, it's [...]

  8. [...] them from system to system to avoid choppy video / sound and also to accommodate the needs of our HTPC+B solution through file [...]

  9. [...] Linux Networking: Persistent naming rules based on MAC for eth0 and wlan0 Linux: HTPC / Home Backup: MDADM, RAID6, LVM, XFS, CIFS and NFS [...]

  10. [...] at this point and 4:15 minutes have passed).  While this was going on, we are referencing our HTPC page for [...]

  11. [...] HTPC, Backup & Storage [...]

  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License