

HTPC / Backup Home Server Solution using Linux


ZFS ATTEMPTS

Prior to using XFS, we gave ZFS a serious try here, after some research online into the best options available.  There are other contenders, such as Btrfs (still in development) and ReiserFS, which is no longer developed.  Btrfs is intriguing, but its development status would keep it out of most critical environments, and we also wanted a good track record of stability.  ZFS, touted as the next best thing in filesystems and even the last word in filesystems, seemed like a great choice for this setup.

So naturally it was the first thing we tried.  It's important to mention one specific item about ZFS: we used a port of ZFS to Linux called ZFS Fuse.  Sun Microsystems / Oracle licensing forbids the original code from being used.

ZFS was originally designed for the Solaris OS and the source code is licensed by Oracle (formerly Sun Microsystems).  So, as with any port, we were hopeful that the process of porting the code over to Linux didn't affect performance.  We definitely expected a solid piece of software:
 

ITEM # DESCRIPTION COMMAND(S)
#1
 
Install the ZFS Fuse RPM.
In this case, I opted to use the zfs-fuse package (mind you, this is April 2012):
 
# rpm -aq|grep -i zfs
zfs-fuse-0.6.9-6.20100709git.el6.x86_64
# which zfs
/usr/bin/zfs
# which zpool
/usr/bin/zpool
#
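For reference, a minimal install sketch (assuming zfs-fuse is available from a repository you already have configured; package and service names are the ones visible in the output above):

yum install zfs-fuse        # userspace FUSE implementation of ZFS
chkconfig zfs-fuse on       # start the daemon at boot
service zfs-fuse start      # start it now so zpool/zfs can talk to it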
#2 Create the ZFS filesystem.

We should now be ready to create our ZFS filesystem.  Since the underlying /dev/raidmd0 is already a RAID, there is only one device to hand to ZFS:

# zpool create MBPCBackupz raidz /dev/MBPCStorage/MBPCBackup

If you receive an error that the zfs-fuse daemon is required for this to start, issue the following:

# zpool create MBPCBackupz raidz /dev/MBPCStorage/MBPCBackup
connect: No such file or directory
Please make sure that the zfs-fuse daemon is running.
internal error: failed to initialize ZFS library
# setup
# service zfs-fuse restart
Starting zfs-fuse:                                         [  OK  ]
Immunizing zfs-fuse against OOM kills                      [  OK  ]
Mounting zfs partitions:                                   [  OK  ]
# ps -ef|grep -i zfs
root      3222     1  0 02:18 ?        00:00:00 /usr/bin/zfs-fuse -p /var/run/zfs-fuse.pid
root      3310 23918  0 02:18 pts/2    00:00:00 grep -i zfs
#
 
Trying again:
 
# zpool create MBPCBackupz raidz /dev/MBPCStorage/MBPCBackup
invalid vdev specification: raidz requires at least 2 devices
#
 
And finally, again, after changing raidz to a plain disk vdev (raidz is roughly equivalent to RAID5, but we already have RAID6 underneath):
 
# zpool create MBPCBackupz /dev/MBPCStorage/MBPCBackup
#
 
#3

Check that your ZFS exists.  
Redo if changes are needed.

Checking your ZFS is fairly straightforward.  Apparently, some thought has been given to keeping ZFS maintenance and support costs, in both time and dollars, down:
 
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
MBPCBackupz  73.5K  1.78T    21K  /MBPCBackupz
#
 
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
MBPCBackupz  1.81T    78K  1.81T     0%  1.00x  ONLINE  -
#
 
Obviously, I failed to realize that the mount point could make a difference.  So let's try again:
 
# zpool destroy MBPCBackupz
# zpool list
no pools available
#
 
 
# zpool create MBPCBackupz /dev/MBPCStorage/MBPCBackup -m /mnt/MBPCBackupz
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
MBPCBackupz  1.81T    87K  1.81T     0%  1.00x  ONLINE  -
#
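In hindsight, destroying and recreating the pool wasn't strictly necessary just to move the mountpoint; the mountpoint property can normally be changed in place (a sketch, untested against this particular zfs-fuse build):

# zfs set mountpoint=/mnt/MBPCBackupz MBPCBackupz
# zfs get mountpoint MBPCBackupz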
 
#4 Check the ZFS properties on our zpool.

The ZFS filesystem has quite an extensive array of utilities for all sorts of stats:

# zfs get all MBPCBackupz
NAME         PROPERTY              VALUE                  SOURCE
MBPCBackupz  type                  filesystem             -
MBPCBackupz  creation              Sun Feb 26  2:51 2012  -
MBPCBackupz  used                  701M                   -
MBPCBackupz  available             1.78T                  -
MBPCBackupz  referenced            701M                   -
MBPCBackupz  compressratio         1.00x                  -
MBPCBackupz  mounted               yes                    -
MBPCBackupz  quota                 none                   default
MBPCBackupz  reservation           none                   default
MBPCBackupz  recordsize            128K                   default
MBPCBackupz  mountpoint            /mnt/MBPCBackupz       local
MBPCBackupz  sharenfs              off                    default
MBPCBackupz  checksum              on                     default
MBPCBackupz  compression           off                    default
MBPCBackupz  atime                 on                     default
MBPCBackupz  devices               on                     default
MBPCBackupz  exec                  on                     default
MBPCBackupz  setuid                on                     default
MBPCBackupz  readonly              off                    default
MBPCBackupz  zoned                 off                    default
MBPCBackupz  snapdir               hidden                 default
MBPCBackupz  aclmode               groupmask              default
MBPCBackupz  aclinherit            restricted             default
MBPCBackupz  canmount              on                     default
MBPCBackupz  xattr                 on                     default
MBPCBackupz  copies                1                      default
MBPCBackupz  version               4                      -
MBPCBackupz  utf8only              off                    -
MBPCBackupz  normalization         none                   -
MBPCBackupz  casesensitivity       sensitive              -
MBPCBackupz  vscan                 off                    default
MBPCBackupz  nbmand                off                    default
MBPCBackupz  sharesmb              off                    default
MBPCBackupz  refquota              none                   default
MBPCBackupz  refreservation        none                   default
MBPCBackupz  primarycache          all                    default
MBPCBackupz  secondarycache        all                    default
MBPCBackupz  usedbysnapshots       0                      -
MBPCBackupz  usedbydataset         701M                   -
MBPCBackupz  usedbychildren        61.5K                  -
MBPCBackupz  usedbyrefreservation  0                      -
MBPCBackupz  logbias               latency                default
MBPCBackupz  dedup                 off                    default
MBPCBackupz  mlslabel              off                    -
#
 
#5 (Optional) Set the compression to allow for more space.
Let's enable compression to get a bit more space out of this combination:
 
# zfs set compression=on MBPCBackupz
#
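A quick way to confirm the property took (a sketch); note that compression only applies to data written after it's enabled, so files already on the pool stay uncompressed:

# zfs get compression,compressratio MBPCBackupz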
 
After deleting a temp file we copied earlier, space only drops after a few seconds:
 
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
MBPCBackupz   701M  1.78T   701M  /mnt/MBPCBackupz
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
MBPCBackupz  82.5K  1.78T    21K  /mnt/MBPCBackupz
#
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
MBPCBackupz  1.81T   122K  1.81T     0%  1.00x  ONLINE  -
#
 
NOTE: One thing to remind ourselves of is that we are doing this customization and creating the ZFS filesystem while the RAID6 is still resyncing:
 
top - 03:08:40 up 11:08,  5 users,  load average: 1.76, 1.33, 1.22
Tasks: 244 total,   2 running, 241 sleeping,   0 stopped,   1 zombie
Cpu0  :  6.3%us, 36.3%sy,  0.0%ni, 45.9%id,  0.0%wa,  0.0%hi, 11.6%si,  0.0%st
Cpu1  : 11.3%us, 24.5%sy,  0.0%ni, 63.9%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   3920884k total,  2939120k used,   981764k free,    27644k buffers
Swap:  4194296k total,        8k used,  4194288k free,  1771964k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  893 root      20   0     0    0    0 S 47.0  0.0 218:28.05 md127_raid6
28276 root      20   0     0    0    0 D 17.9  0.0  16:55.23 md127_resync
 
 
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdi[3] sdh[0] sde[1] sdd[2] sdb[4] sda[5]
      3907045696 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/5] [UUU_UU]
      [======>.............]  recovery = 31.2% (305691368/976761424) finish=220.8min speed=50643K/sec
 
unused devices: <none>
#
 

#6 Rudimentary Performance Test
Now we will copy a large (4.7GB) file and time the operation from start to finish, while the RAID6 is resyncing the last disk and with compression set to ON on the ZFS pool:
 
# date +%s:%N; cp -p /mnt/OD1.5TB/SampleBinary.dat .; date +%s:%N;
1330243909:207559045
1330244114:162053993
#
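Since we'll be repeating this copy test several times below, a small helper sketch wraps the timing and the MB/s arithmetic into one go (paths are the ones used above; the sync is there so cached writes don't flatter the number):

SRC=/mnt/OD1.5TB/SampleBinary.dat
DST=/mnt/MBPCBackupz/
START=$(date +%s)
cp -p "$SRC" "$DST" && sync
END=$(date +%s)
SIZE_MB=$(( $(stat -c %s "$SRC") / 1024 / 1024 ))
echo "Copied ${SIZE_MB}MB in $((END - START))s (~$(( SIZE_MB / (END - START) ))MB/s)"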
 
So this took 1330244114 - 1330243909 = 205 seconds, or just over 3 minutes.  It's hard to say whether the single 1.5TB source disk or the RAID6 array we created above is to blame for the slow speed.  We can retest this later.  Now let's check the compression ratio of our ZFS pool after the copy:
 
 
# zfs get compressratio MBPCBackupz
NAME         PROPERTY       VALUE  SOURCE
MBPCBackupz  compressratio  1.00x  -
# zpool status MBPCBackupz
  pool: MBPCBackupz
 state: ONLINE
 scrub: none requested
config:
 
NAME                      STATE     READ WRITE CKSUM
MBPCBackupz               ONLINE       0     0     0
 MBPCStorage/MBPCBackup  ONLINE       0     0     0
 
errors: No known data errors
#
 
Still 1.00x for compression, but no errors.  Not surprising; the file was already a compressed one. 🙂  Just for fun, and even though there are no errors, let's scrub the pool:
# zpool status MBPCBackupz
  pool: MBPCBackupz
 state: ONLINE
   see: http://www.sun.com/msg/ZFS-8000-EY
 scrub: scrub in progress for 0h0m, 23.54% done, 0h0m to go
config:
 
NAME                      STATE     READ WRITE CKSUM
MBPCBackupz               ONLINE       0     0     0
 MBPCStorage/MBPCBackup  ONLINE       0     0     0
 
errors: No known data errors
#
 
So the overall speed of copying and compressing about 4.7GB, even though little of it was compressible, was about 23MB per second.  Degraded, but still not bad.  Checking with du -ah, we can see that compression was only lightly effective on the large 4.7GB file but significant (from 127MB down to 27MB) on the text log file:
 
# du -ah
4.4G ./SampleBinary.dat
27M ./wpa_supplicant_watch.log-20111226
4.5G .
# du -a
4606003 ./SampleBinary.dat
27639 ./wpa_supplicant_watch.log-20111226
4633651 .
#
 
A very nice and successful setup, but only from an ease-of-configuration perspective, combining:
 
RAID6 (mdadm)
LVM2
ZFS
gzip
 
Some static iostat statistics on our configuration:
 
# iostat
Linux 2.6.32-131.12.1.el6.x86_64 (mbpc) 02/26/2012 _x86_64_ (2 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.70    0.05   24.86    0.73    0.00   63.67
 
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             739.42     64390.64        66.14 2673766075    2746394
sdb             735.83     64390.75        66.15 2673770703    2746802
sdd             836.45     67273.85        66.13 2793488937    2746106
sde             722.54     64390.85        66.11 2673774628    2745154
sdg               7.87       337.86       340.54   14029176   14140448
dm-0             43.79        62.07       338.18    2577228   14042688
dm-1              0.01         0.09         0.00       3914          0
dm-2              0.01         0.09         0.00       3650          0
md127            28.93       223.94       258.45    9298765   10731930
dm-3              1.09       267.77         0.06   11118820       2632
dm-4              0.79         6.36         0.00     263956         64
dm-5              0.13         0.79         0.22      32660       9192
sdh             580.92     17345.18     47111.61  720243498 1956269152
sdi             105.80        38.88     17364.42    1614442  721042636
dm-6             28.85       223.27       258.45    9271035   10731850
 
#
 
# iostat -x -k -d
Linux 2.6.32-131.12.1.el6.x86_64 (mbpc) 02/26/2012 _x86_64_ (2 CPU)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda            7309.77     7.59  739.35    0.45 32206.22    33.05    87.16     0.86    1.17   0.26  19.47
sdb            7313.38     7.59  735.75    0.46 32206.26    33.05    87.58     0.77    1.04   0.25  18.46
sdd            7573.79     7.60  836.72    0.43 33652.00    33.05    80.47     1.76    2.11   0.30  25.10
sde            7326.68     7.59  722.47    0.45 32206.31    33.03    89.19     0.84    1.16   0.27  19.41
sdg               0.90    37.46    2.55    5.32   168.83   170.18    86.15     0.16   20.75   2.93   2.30
dm-0              0.00     0.00    1.44   42.33    31.01   169.00     9.14     3.58   81.86   0.46   2.02
dm-1              0.00     0.00    0.01    0.00     0.05     0.00     7.91     0.00    4.54   2.65   0.00
dm-2              0.00     0.00    0.01    0.00     0.04     0.00     7.90     0.00    1.20   1.18   0.00
md127             0.00     0.00   13.40   15.51   111.90   129.15    16.67     0.00    0.00   0.00   0.00
dm-3              0.00     0.00    1.08    0.01   133.80     0.03   245.79     0.01    4.82   2.63   0.29
dm-4              0.00     0.00    0.79    0.00     3.18     0.00     8.00     0.00    4.28   0.40   0.03
dm-5              0.00     0.00    0.10    0.03     0.39     0.11     7.99     0.00   14.05   0.52   0.01
sdh            1943.95  5507.00  228.05  353.33  8697.72 23541.54   110.90     3.15    5.41   0.80  46.23
sdi               2.70  2063.57    2.04  104.07    19.43  8707.34   164.47     0.56    5.25   1.13  11.98
dm-6              0.00     0.00   13.32   15.51   111.57   129.15    16.70     0.41   14.12   0.15   0.44
 
#
INTERESTING NOTE: After we copied both files above, the system appeared to still be compressing them, as the ratio went from 1.01x to 1.02x after the copy had finished for both files.  This would appear to be a nice feature, however I'm not sure I would want lingering processes on the system when production jobs need the CPU:
 
# zfs get compressratio MBPCBackupz
NAME         PROPERTY       VALUE  SOURCE
MBPCBackupz  compressratio  1.02x  -
#
Full read/write from sdg to md127(sda, sdb, sdd, sde, sdh, sdi) (Nothing at 100%?)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              15.33  1414.90   21.17   77.53   153.33  6109.10   126.90     0.43    4.13   2.40  23.69
sdb              15.03  1413.50   22.03   84.20   155.33  6124.30   118.22     0.45    4.10   2.25  23.91
sdd              15.87  1417.23   23.03   77.13   161.60  6116.30   125.35     0.46    4.38   2.47  24.75
sde              15.87  1415.67   21.47   76.00   155.47  6107.23   128.51     0.40    3.95   2.28  22.22
sdg               0.50     6.43  184.87    1.47 23531.07    31.20   252.90     0.79    4.24   2.28  42.47
dm-0              0.00     0.00    1.77    7.80    26.00    31.20    11.96     1.61  168.64  10.36   9.91
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.73 2871.47    27.73 23940.53    16.68     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  183.57    0.00 23496.53     0.00   256.00     0.66    3.62   2.28  41.76
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              15.57  1416.67   23.63   74.33   163.87  6102.43   127.93     0.45    4.35   2.50  24.53
sdi              15.67  1414.50   18.80   82.03   144.40  6121.50   124.28     0.42    3.99   2.38  24.02
dm-6              0.00     0.00    1.73 2871.47    27.73 23940.53    16.68    68.81   23.95   0.20  57.54
 
and took 197 seconds.  Time to check the CPU% to see if the process is limited by a single execution core:
 
top - 08:49:51 up 16:49,  8 users,  load average: 0.51, 0.21, 0.12
Tasks: 293 total,   2 running, 290 sleeping,   0 stopped,   1 zombie
Cpu0  : 14.6%us, 17.5%sy,  0.0%ni, 47.4%id, 20.2%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 30.6%us, 13.7%sy,  0.0%ni, 22.5%id, 32.6%wa,  0.3%hi,  0.3%si,  0.0%st
Mem:   3920884k total,  3769812k used,   151072k free,    64652k buffers
Swap:  4194296k total,      480k used,  4193816k free,  2403000k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                         
 3222 root      20   0 1417m 173m 1516 S   35  4.5   3:29.04 zfs-fuse
 3243 root      20   0 1104m  68m  15m R   13  1.8 123:47.75 npviewer.bin
32530 root      20   0  111m  972  708 D    9  0.0   0:06.75 cp
  893 root      20   0     0    0    0 S    7  0.0 332:40.52 md127_raid6
 3127 root      20   0 1099m 291m  21m S    4  7.6  62:17.18 firefox
 2244 root      20   0  187m  64m 9936 S    3  1.7  19:34.16 Xorg
   38 root      20   0     0    0    0 S    2  0.0   0:06.01 kswapd0
 2660 root      20   0  138m 3004 2380 S    1  0.1   0:04.70 gvfsd-trash
18330 videouse  20   0  138m 2856 2352 S    1  0.1   0:02.66 gvfsd-trash
 3169 root      20   0  292m  13m 9108 S    1  0.4   1:39.88 gnome-terminal
11967 root      20   0 15220 1352  904 R    1  0.0   1:41.24 top
   22 root      20   0     0    0    0 S    0  0.0   2:30.00 kblockd/0                        
 
which, at first glance, doesn't appear to be the case, because zfs-fuse is well under 50% in Irix mode; however, when checking all the CPU counters, the usage is in fact substantial (the ZFS daemon?).  Copying a file back yields this:
 
top - 08:54:26 up 16:53,  8 users,  load average: 1.60, 0.64, 0.30
Tasks: 294 total,   4 running, 289 sleeping,   0 stopped,   1 zombie
Cpu0  : 17.0%us, 28.3%sy,  0.0%ni,  6.7%id, 35.7%wa,  0.0%hi, 12.3%si,  0.0%st
Cpu1  : 15.8%us, 34.5%sy,  0.0%ni,  0.0%id, 47.7%wa,  0.3%hi,  1.6%si,  0.0%st
Mem:   3920884k total,  3776932k used,   143952k free,    64988k buffers
Swap:  4194296k total,      480k used,  4193816k free,  2371892k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                         
 3222 root      20   0 1417m 177m 1516 S   46  4.6   4:27.26 zfs-fuse
  893 root      20   0     0    0    0 S   16  0.0 332:53.63 md127_raid6
  770 root      20   0  111m  880  712 R   13  0.0   0:04.74 cp
 3243 root      20   0 1104m  68m  15m R   13  1.8 124:23.04 npviewer.bin
  771 root      20   0     0    0    0 D    5  0.0   0:01.43 flush-253:3
 3127 root      20   0 1099m 292m  21m S    4  7.6  62:28.77 firefox
 2244 root      20   0  187m  64m 9936 S    4  1.7  19:42.01 Xorg     
 
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              81.83     0.00 1308.63    0.00  9825.65     0.00    15.02     2.57    1.96   0.17  21.97
sdb              85.83     0.00 1304.10    0.00  9831.33     0.00    15.08     2.96    2.28   0.17  22.61
sdd              79.73     0.00 1309.20    0.00  9824.80     0.00    15.01     1.26    0.97   0.14  17.78
sde              82.33     0.00 1307.93    0.00  9823.03     0.00    15.02     1.17    0.90   0.14  17.76
sdg               0.00 11392.93    3.13   91.50    60.53 46054.93   974.61   142.69 1522.58  10.57 100.00
dm-0              0.00     0.00    3.10    5.03    60.40    20.00    19.77     2.79  342.85  40.18  32.68
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00 5803.60    0.00 48809.82     0.00    16.82     0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.03 11478.83     0.13 45915.33     8.00 18153.20 1596.93   0.09 100.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              84.97     0.00 1304.03    0.00  9823.73     0.00    15.07     3.09    2.37   0.18  23.29
sdi              84.03     0.00 1306.27    0.00  9825.83     0.00    15.04     1.61    1.23   0.14  18.81
dm-6              0.00     0.00 5803.60    0.00 48809.82     0.00    16.82    20.97    3.62   0.07  42.71
 
Which means reads out of the RAID6 /dev/raidmd0 are much faster.  The bottleneck here is clearly the target, but it's not so clear where the bottleneck is on the single-drive to RAID6 /dev/raidmd0 copy.  So reads could theoretically go up to 115MB/s, but writes suffer at no higher than 25MB/s.  (This is very slow.)
 
Tweaking time:
 
cat /sys/block/md127/md/stripe_cache_size
 
OR
 
cat /sys/block/$(awk 'BEGIN { "ls -al /dev/raidmd0" | getline; print $NF }')/md/stripe_cache_size
 
and so applying this change:
 
# echo "8192" > /sys/block/md127/md/stripe_cache_size
 
had absolutely no effect:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              20.60  1510.37   22.10   92.07   177.47  6580.90   118.39     0.51    4.23   2.21  25.28
sdb              18.70  1507.87   22.80   96.10   172.53  6580.37   113.59     0.45    3.66   2.04  24.28
sdd              17.90  1515.40   22.97   91.13   170.13  6595.83   118.60     0.51    4.21   2.31  26.34
sde              21.27  1513.87   22.27   93.63   180.93  6588.90   116.82     0.50    4.09   2.13  24.64
sdg               0.03   122.07  192.90    4.83 24229.33   506.93   250.20     0.77    3.89   2.23  44.09
dm-0              0.00     0.00    4.00  126.73    50.93   506.93     8.53     6.45   49.31   0.36   4.72
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.47 3193.80    23.47 25799.33    16.16     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  188.90    0.00 24174.13     0.00   255.95     0.62    3.28   2.12  39.95
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              19.70  1513.63   22.13   94.57   173.60  6601.83   116.12     0.51    4.13   2.24  26.19
sdi              19.77  1513.33   23.23   88.30   179.33  6577.30   121.16     0.50    4.21   2.35  26.21
dm-6              0.00     0.00    1.47 3193.80    23.47 25799.33    16.16    75.74   23.67   0.19  59.21
 
There's a script on this post that suggests a number of values you can use (http://ubuntuforums.org/showthread.php?t=1494846&page=3).  It only reports and doesn't set anything, so it's handy for experimenting:
 
# ./tune.bash
check /tmp/tune_raid.log for messages in case of error.
suggested read ahead size per device: 768 blocks (384kb)
suggested read ahead size of array: 4608 blocks (2304kb)
RUN blockdev --setra 768 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
your current value for readahead is 256 256 256 256 256 256
RUN blockdev --setra 4608 /dev/md127
your current value for readahead is 256
suggested stripe cache size of devices: 96 pages (384kb)
RUN echo 96 > /sys/block/md127/md/stripe_cache_size
current value of /sys/block/md127/md/stripe_cache_size is 8192
setting max sectors kb to match chunk size
RUN echo 16 > /sys/block/sdi/queue/max_sectors_kb
current value of /sys/block/sdi/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sdh/queue/max_sectors_kb
current value of /sys/block/sdh/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sde/queue/max_sectors_kb
current value of /sys/block/sde/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sdd/queue/max_sectors_kb
current value of /sys/block/sdd/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sdb/queue/max_sectors_kb
current value of /sys/block/sdb/queue/max_sectors_kb is 512
RUN echo 16 > /sys/block/sda/queue/max_sectors_kb
current value of /sys/block/sda/queue/max_sectors_kb is 512
setting NCQ queue depth to 1
RUN echo 1 > /sys/block/sdi/device/queue_depth
current value of /sys/block/sdi/device/queue_depth is 31
RUN echo 1 > /sys/block/sdh/device/queue_depth
current value of /sys/block/sdh/device/queue_depth is 31
RUN echo 1 > /sys/block/sde/device/queue_depth
current value of /sys/block/sde/device/queue_depth is 31
RUN echo 1 > /sys/block/sdd/device/queue_depth
current value of /sys/block/sdd/device/queue_depth is 31
RUN echo 1 > /sys/block/sdb/device/queue_depth
current value of /sys/block/sdb/device/queue_depth is 31
RUN echo 1 > /sys/block/sda/device/queue_depth
current value of /sys/block/sda/device/queue_depth is 31
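Before applying the script's suggestions wholesale, it's worth snapshotting the current values so they can be put back if things get worse (a sketch; the output file is arbitrary):

for d in sda sdb sdd sde sdh sdi; do
  echo "$d: max_sectors_kb=$(cat /sys/block/$d/queue/max_sectors_kb) queue_depth=$(cat /sys/block/$d/device/queue_depth)"
done > /root/md127-tunables.before
echo "stripe_cache_size=$(cat /sys/block/md127/md/stripe_cache_size)" >> /root/md127-tunables.before
blockdev --getra /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi /dev/md127 >> /root/md127-tunables.before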
 
After applying the script's suggestions, the write took 261 seconds versus the earlier 197 seconds, so performance actually degraded.  The difference is most visible in iostat:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              64.17   756.40   54.37  362.40   490.53  4502.53    23.96     4.27   10.23   1.08  45.19
sdb              56.87   756.23   53.23  364.53   458.40  4508.40    23.78     4.03    9.63   1.05  43.95
sdd              56.13   751.40   53.43  365.60   458.93  4493.87    23.64     3.63    8.65   0.99  41.37
sde              66.20   755.03   55.67  361.23   505.87  4490.80    23.97     4.06    9.70   1.05  43.74
sdg               1.03     6.57  139.27    2.40 17570.27    35.07   248.55     0.61    4.33   2.34  33.20
dm-0              0.00     0.00    3.37    8.83    42.80    35.07    12.77     1.59  130.33   8.47  10.33
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.33 2189.10    21.33 17552.62    16.05     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  137.00    0.00 17536.00     0.00   256.00     0.48    3.52   2.24  30.62
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              58.40   760.20   51.70  355.87   461.87  4498.53    24.34     4.41   10.82   1.13  46.16
sdi              63.23   755.70   55.20  362.87   489.20  4500.40    23.87     3.62    8.64   1.05  43.80
dm-6              0.00     0.00    1.33 2189.10    21.33 17552.62    16.05    79.42   36.30   0.31  68.95
So I'll try my own numbers instead.
 
MY NUMBERS:
 
echo 4096 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 1024 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 16484 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 4096 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 256 > /sys/block/$qdepth/device/queue_depth; done
 
This brought it back to 196 seconds for a 4.7GB file.  Looking at the iostat -x -k -d 30 numbers, the individual RAID6 disks are nearly half as busy with the higher numbers.  This is a good indication:
 
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              14.03  1415.77   18.70   78.37   135.60  6117.80   128.85     0.43    4.17   2.38  23.14
sdb              12.77  1417.63   19.03   80.47   131.33  6130.73   125.87     0.41    3.90   2.33  23.22
sdd              13.03  1416.23   19.70   79.77   137.07  6125.93   125.93     0.47    4.55   2.64  26.27
sde              13.37  1415.53   18.70   80.67   135.33  6123.53   125.98     0.41    3.99   2.33  23.14
sdg               0.17     5.73  187.47    1.40 23914.80    28.13   253.54     0.75    3.98   2.20  41.58
dm-0              0.00     0.00    0.97    7.03    17.20    28.13    11.33     0.56   70.47   3.61   2.89
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.47 2852.20    23.47 23960.73    16.81     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  186.70    0.00 23897.60     0.00   256.00     0.64    3.44   2.18  40.73
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              14.30  1414.70   20.00   80.23   143.07  6117.67   124.92     0.44    4.22   2.40  24.02
sdi              12.60  1416.63   19.00   81.60   131.87  6133.67   124.56     0.49    4.62   2.61  26.27
dm-6              0.00     0.00    1.47 2852.20    23.47 23960.73    16.81    67.94   23.80   0.21  58.66
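Since variations of these same five commands get repeated several times below, a small hypothetical helper keeps the sweeps consistent (same six member disks as above; this is a sketch, not something used in the original runs):

tune_md() {   # usage: tune_md <stripe_cache> <disk_ra> <md_ra> <max_sectors_kb> <queue_depth>
  echo "$1" > /sys/block/md127/md/stripe_cache_size
  blockdev --setra "$2" /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
  blockdev --setra "$3" /dev/md127
  for d in sda sdb sdd sde sdh sdi; do
    echo "$4" > /sys/block/$d/queue/max_sectors_kb
    echo "$5" > /sys/block/$d/device/queue_depth
  done
}
# The combination above would then be: tune_md 4096 1024 16384 4096 256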
 
So let's try some higher numbers by doubling them:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 2048 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 256 > /sys/block/$qdepth/device/queue_depth; done
NOTE: setting 256 failed this time; it couldn't be set higher than 31 for queue_depth, which is interesting.  The above did give me the fastest result yet at 192 seconds for 4.7GB:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              17.37  1416.83   21.73   84.60   163.48  6180.37   119.32     0.47    4.22   2.28  24.25
sdb              16.40  1414.93   20.87   84.57   156.53  6175.30   120.11     0.44    3.92   2.19  23.08
sdd              15.73  1416.70   20.10   85.73   148.80  6188.37   119.76     0.45    4.11   2.31  24.44
sde              17.00  1416.33   21.20   87.83   159.47  6191.03   116.49     0.44    3.88   2.11  23.01
sdg               0.17    63.83  191.50    2.80 24238.80   266.00   252.24     0.77    3.99   2.25  43.64
dm-0              0.00     0.00    2.70   66.50    51.07   266.00     9.16     4.78   69.05   0.69   4.76
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.67 2953.93    25.68 24204.58    16.40     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  188.97    0.00 24187.73     0.00   256.00     0.65    3.44   2.19  41.39
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              17.20  1419.53   19.60   81.17   153.07  6184.63   125.79     0.48    4.47   2.38  23.98
sdi              15.60  1415.27   21.30   87.73   155.07  6173.70   116.09     0.45    3.95   2.27  24.76
dm-6              0.00     0.00    1.67 2953.93    25.68 24204.58    16.40    71.91   24.30   0.20  57.94
Ok.  Let's try with these:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 8192 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 512 > /sys/block/$qdepth/device/queue_depth; done
This resulted in a sustained improvement from around 24MB/s to 24.8MB/s.  Not a big improvement (ok, an abysmal one), but an improvement nonetheless, and the write time was 191s:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              15.43  1466.03   20.00   80.93   149.33  6342.73   128.64     0.42    4.05   2.39  24.08
sdb              13.67  1462.10   20.17   87.33   141.33  6346.60   120.71     0.39    3.54   2.16  23.26
sdd              14.47  1465.80   21.13   79.03   149.20  6343.27   129.63     0.43    4.13   2.42  24.20
sde              15.37  1463.27   20.23   80.87   148.40  6327.67   128.11     0.41    3.92   2.23  22.59
sdg               0.37     8.10  188.40    2.47 23924.13    41.60   251.13     0.82    4.28   2.25  43.04
dm-0              0.00     0.00    2.13   10.40    35.07    41.60    12.23     1.24   98.82   4.91   6.16
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.60 3027.00    25.60 24818.23    16.41     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  186.63    0.00 23889.07     0.00   256.00     0.65    3.47   2.21  41.20
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              13.80  1463.50   20.00   81.77   142.00  6331.80   127.23     0.43    4.09   2.40  24.42
sdi              16.37  1460.53   20.67   84.03   156.40  6337.27   124.04     0.43    3.97   2.37  24.84
dm-6              0.00     0.00    1.60 3027.17    25.60 24820.50    16.41    72.43   23.92   0.19  58.04
Next I will tune the stripe_cache_size to a higher number and see:
echo 32768 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 32768 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 16384 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 512 > /sys/block/$qdepth/device/queue_depth; done
The results were slightly worse.  So let's try these numbers:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 4096 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 512 > /sys/block/$qdepth/device/queue_depth; done
A degradation from 24.8MB/s, though still an improvement over the earlier numbers, so I set things back to the combination that gave me 24.8MB/s:
 
echo 8192 > /sys/block/md127/md/stripe_cache_size
blockdev --setra 8192 /dev/sda /dev/sdb /dev/sdd /dev/sde /dev/sdh /dev/sdi
blockdev --setra 32768 /dev/md127
for mskb in sdi sdh sde sdd sdb sda; do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in sdi sdh sde sdd sdb sda; do echo 31 > /sys/block/$qdepth/device/queue_depth; done
And we're back to where we started:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              14.17  1465.27   20.43   81.00   146.53  6348.47   128.06     0.41    3.93   2.31  23.42
sdb              16.63  1464.10   20.40   84.40   154.80  6356.73   124.27     0.43    3.95   2.23  23.33
sdd              15.17  1467.00   18.67   84.03   142.27  6368.20   126.79     0.42    3.91   2.39  24.52
sde              13.27  1467.40   19.03   84.90   136.13  6371.67   125.23     0.40    3.74   2.15  22.38
sdg               0.00     1.40  187.20    1.27 23895.73    10.40   253.69     0.85    4.51   2.14  40.42
dm-0              0.00     0.00    0.57    2.60     6.67    10.40    10.78     0.24   74.99   7.93   2.51
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md127             0.00     0.00    1.60 3120.27    25.60 24887.83    15.96     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  186.63    0.00 23889.07     0.00   256.00     0.62    3.32   2.12  39.64
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh              15.77  1467.90   20.43   80.70   151.60  6360.33   128.78     0.45    4.29   2.40  24.28
sdi              14.13  1465.30   19.60   82.77   143.07  6367.27   127.20     0.44    4.14   2.51  25.70
dm-6              0.00     0.00    1.60 3120.27    25.60 24887.83    15.96    75.88   24.29   0.19  59.85
So the above peaked at 24.8MB/s.  Next I'll try to reset the chunk size from 16K to 128K (the man page recommends 512K):
 
# mdadm --grow /dev/raidmd0 --chunk-size=128
 
# mdadm --grow /dev/raidmd0 --chunk=32K
mdadm: New chunk size does not divide component size
#
And that's where we apparently run into BZ#723137 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=723137).  Looks like I have to destroy my ZFS and RAID6 and start all over with a larger chunk size.  So let's do that with a condensed set of steps:
 
# zpool destroy MBPCBackupz
#
# zpool list
no pools available
# zfs list
no datasets available
#
# lvm lvremove /dev/MBPCStorage/MBPCBackup 
Do you really want to remove active logical volume MBPCBackup? [y/n]: y
  Logical volume "MBPCBackup" successfully removed
# lvm lvs
  LV        VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  oLogVol02 VGEntertain -wi-ao 151.00g                                      
  olv_root  VGEntertain -wi-ao  32.00g                                      
  olv_swap  VGEntertain -wi-a-   4.00g                                      
  fmlv      mbpcvg      -wi-ao   1.15t                                      
  rootlv    mbpcvg      -wi-ao  31.25g                                      
  swaplv    mbpcvg      -wi-ao   4.00g                                      
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree
  MBPCStorage   1   0   0 wz--n-   3.64t 3.64t
  VGEntertain   1   3   0 wz--n- 187.00g    0 
  mbpcvg        1   3   0 wz--n-   1.18t    0 
#
 
 
# lvm vgremove MBPCStorage
  Volume group "MBPCStorage" successfully removed
# lvm vgs
  VG          #PV #LV #SN Attr   VSize   VFree
  VGEntertain   1   3   0 wz--n- 187.00g    0 
  mbpcvg        1   3   0 wz--n-   1.18t    0 
#
 
# lvm pvremove /dev/raidmd0
  Labels on physical volume "/dev/raidmd0" successfully wiped
[root@mbpc mnt]# lvm pvs
  PV         VG          Fmt  Attr PSize   PFree
  /dev/sdg2  mbpcvg      lvm2 a-     1.18t    0 
  /dev/sdg3  VGEntertain lvm2 a-   187.00g    0 
#
 
Next we stop our array:
 
# mdadm --detail /dev/raidmd0
/dev/raidmd0:
        Version : 1.2
  Creation Time : Mon Jan 30 00:22:17 2012
     Raid Level : raid6
     Array Size : 3907045696 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761424 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent
 
    Update Time : Sun Feb 26 14:11:14 2012
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 16K
 
           Name : mbpc:0  (local to host mbpc)
           UUID : b9c13d43:a7a1d949:f20dd93a:cb41cc00
         Events : 312
 
    Number   Major   Minor   RaidDevice State
       0       8      112        0      active sync   /dev/sdh
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8      128        3      active sync   /dev/sdi
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#
# mdadm --stop /dev/raidmd0
# mdadm --detail /dev/raidmd0
mdadm: cannot open /dev/raidmd0: No such file or directory
# cat /proc/mdadm
cat: /proc/mdadm: No such file or directory
#
And now we recreate our array:
mdadm --create --verbose  /dev/md0 --level=raid6 --chunk=64K --auto=p --raid-devices=6  --spare-devices=0  /dev/rsd{a,b,c,d,e,f}
lvm pvcreate /dev/raidmd0
lvm vgcreate MBPCStorage /dev/raidmd0
lvm lvcreate -L3906254360S -n MBPCBackup MBPCStorage
zpool create MBPCBackupz /dev/MBPCStorage/MBPCBackup -m /mnt/MBPCBackupz
zfs set compression=on MBPCBackupz
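Before rerunning the copy test, a quick sanity check (just a sketch) that the new chunk size and array actually came up as intended:

# mdadm --detail /dev/md0 | grep -i chunk
# cat /proc/mdstat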
Speed was still abysmal however, at 191 seconds for 4.7GB:
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.80  191.30    1.17 24418.93     7.60   253.83     1.04    5.39   2.17  41.79
dm-0              0.00     0.00    0.63    1.90    13.60     7.60    16.74     0.43  169.89  16.88   4.28
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-3              0.00     0.00  190.73    0.00 24413.87     0.00   256.00     0.64    3.37   2.14  40.74
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb             429.73  1891.23   39.80   93.30  1890.53  7978.63   148.30     0.62    4.53   2.37  31.57
md0               0.00     0.00    0.43  479.03    27.73 24730.58   103.27     0.00    0.00   0.00   0.00
sdc             435.37  1883.03   40.37   91.07  1918.40  7942.77   150.06     0.55    4.05   2.15  28.25
sdd             456.23  1903.57   43.97   94.23  2010.53  8031.43   145.33     0.50    3.46   1.92  26.50
sde             449.67  1891.33   37.67   89.87  1962.80  7966.37   155.71     0.63    4.84   2.62  33.36
sdf             436.43  1896.80   39.03   96.00  1915.20  8008.23   146.98     0.53    3.82   2.09  28.25
sdg             425.77  1921.77   41.03   96.93  1882.67  8115.97   144.94     0.55    3.88   2.10  29.00
dm-6              0.00     0.00    0.43  479.03    27.73 24730.58   103.27    11.11   23.18   1.23  58.75
Time to tweak again:
cat /sys/block/md0/md/stripe_cache_size
blockdev --getra $(echo $(ls -al /dev/rsd*|awk '{ print "/dev/"$NF }'))
blockdev --getra /dev/md0 /dev/raidmd0
for mskb in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$qdepth/device/queue_depth; done
Verify our mappings for changing parameters (UDEV Rules):
 
# ls -al /dev/rsd*
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsda -> sdb
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdb -> sdc
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdc -> sdd
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdd -> sde
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsde -> sdf
lrwxrwxrwx. 1 root root 3 Feb 27 13:41 /dev/rsdf -> sdg
#
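For context, symlinks like these are normally generated by udev rules keyed on something stable such as the drive serial, so the rsd* names survive the kernel reshuffling sdX assignments.  A hedged sketch only (placeholder serials and an assumed rules file name; the actual rules on this box may differ):

# /etc/udev/rules.d/60-raid-disks.rules (hypothetical)
KERNEL=="sd?", ENV{ID_SERIAL}=="DISK-SERIAL-1", SYMLINK+="rsda"
KERNEL=="sd?", ENV{ID_SERIAL}=="DISK-SERIAL-2", SYMLINK+="rsdb"
# ...one line per member disk, then run: udevadm trigger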
And let's try with yet another combination of numbers:
 
echo 8192 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 8192 $(echo $(ls -al /dev/rsd*|awk '{ print "/dev/"$NF }'))
blockdev --setra 32768 /dev/md127
for mskb in $(ls -al /dev/rsd*|awk '{ print $NF }'); do echo 8192 > /sys/block/$mskb/queue/max_sectors_kb; done
for qdepth in $(ls -al /dev/rsd*|awk '{ print $NF }'); do echo 31 > /sys/block/$qdepth/device/queue_depth; done
Verify that things are actually set:
 
# cat /sys/block/md0/md/stripe_cache_size
256
# blockdev --getra $(echo $(ls -al /dev/rsd*|awk '{ print "/dev/"$NF }'))
256
256
256
256
256
256
# blockdev --getra /dev/md0 /dev/raidmd0
4096
4096
# ls -al /dev/rsd*|awk '{ print $NF }'
sdb
sdc
sdd
sde
sdf
sdg
# for mskb in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$mskb/queue/max_sectors_kb; done
512
512
512
512
512
512
# for qdepth in $(ls -al /dev/rsd*|awk '{ print $NF }'); do cat /sys/block/$qdepth/device/queue_depth; done
31
31
31
31
31
31
#
So now we try to reset the chunk size again; this time we started with a larger chunk (64K instead of 16K), so hopefully we'll have better luck:
 
 
# mdadm --grow /dev/raidmd0 --chunk=128K
mdadm: /dev/raidmd0: Cannot grow - need backup-file
#
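One possible way around this, not attempted here, is to give mdadm scratch space for the reshape with --backup-file (a sketch; the backup path is arbitrary and must live outside the array):

# mdadm --grow /dev/raidmd0 --chunk=128 --backup-file=/root/md0-reshape.bak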
Hmm.  No luck, but that's OK; we didn't chase it further.  So now let's continue performance testing:
 
top - 04:19:08 up  5:36,  6 users,  load average: 0.52, 0.24, 0.09
Tasks: 221 total,   3 running, 217 sleeping,   0 stopped,   1 zombie
Cpu0  :  4.0%us, 14.3%sy,  0.0%ni, 25.6%id, 55.8%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  :  8.2%us, 14.1%sy,  0.0%ni, 67.0%id, 10.1%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:   3920768k total,  3776328k used,   144440k free,    59900k buffers
Swap:  4194296k total,      524k used,  4193772k free,  2872196k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                      
 
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              91.97  1576.17   20.47   88.93   462.93  6727.83   131.46     0.42    3.73   2.10  22.97
sdb              88.00  1573.27   21.50   85.60   445.33  6702.37   133.48     0.42    3.79   2.19  23.46
sdc              86.57  1578.97   20.73   86.33   434.80  6729.03   133.82     0.40    3.62   2.14  22.96
sdd              84.93  1570.67   21.93   80.53   434.13  6669.30   138.65     0.42    4.01   2.30  23.59
sde              85.47  1571.33   20.97   86.17   438.80  6688.37   133.05     0.41    3.69   2.19  23.46
sdf              89.47  1571.10   22.07   84.27   454.13  6677.83   134.14     0.40    3.67   2.21  23.45
sdg               0.33     4.20  195.33    1.03 24820.80    20.53   253.01     0.75    3.83   2.27  44.49
dm-0              0.00     0.00    2.03    5.13    35.73    20.53    15.70     0.21   29.92   5.68   4.07
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-4              0.00     0.00  193.57    0.00 24776.53     0.00   256.00     0.70    3.60   2.24  43.42
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.37  928.20    23.47 25264.65    54.47     0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.37  928.20    23.47 25264.65    54.47    21.50   23.14   0.61  56.25
The second copy gave me 188 seconds, so roughly 25.2MB/s.  We then decided to add a bitmap, as apparently the array didn't have one earlier.  This is generally a good thing for recovering an array:
 
# mdadm --grow /dev/md127 --bitmap=internal
#
 
We can reset it back to none afterwards with this command:
# mdadm --grow /dev/md127 --bitmap=none
#
# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Sun Mar  4 23:11:42 2012
     Raid Level : raid6
     Array Size : 3907045632 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 976761408 (931.51 GiB 1000.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent
 
  Intent Bitmap : Internal
 
    Update Time : Sun Mar 18 18:30:21 2012
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
 
         Layout : left-symmetric
     Chunk Size : 64K
 
           Name : mbpc:0  (local to host mbpc)
           UUID : f1c5626d:cfd9d49e:41347e87:7b949c44
         Events : 20
 
    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       80        3      active sync   /dev/sdf
       4       8       16        4      active sync   /dev/sdb
       5       8        0        5      active sync   /dev/sda
#
One thing to note is that even with internal bitmaps there will be a slight write penalty, but a huge recovery benefit when a disk (or disks) fails in an array.  So it is better to keep internal bitmaps.  Also, raise the minimum speed of the array rebuild to 50MB/s (from the default of 1000KB/s), per http://www.ducea.com/2006/06/25/increase-the-speed-of-linux-software-raid-reconstruction/:
 
# echo 50000 >/proc/sys/dev/raid/speed_limit_min
 
but this made little difference to write speed.
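To make the rebuild-speed floor stick across reboots, the same knobs are exposed as sysctls (a sketch; the max value here is just an example):

echo 'dev.raid.speed_limit_min = 50000' >> /etc/sysctl.conf
echo 'dev.raid.speed_limit_max = 200000' >> /etc/sysctl.conf
sysctl -p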
 


11 Responses to “HTPC / Backup Home Server Solution using Linux”

  1. Hi,

    Great post. You don’t need to specify the parameters when creating the XFS file system, see http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E and http://www.spinics.net/lists/raid/msg38074.html . Of course, YMMV.

    Did you run those benchmarks while the array was resyncing?

  2. Hey Mathias,

    Thanks for posting. Just added the testing numbers so feel free to have a look and judge yourself.

    > logbsize and delaylog
    I ran another test with logbsize=128k (couldn’t find anything for delaylog in my mkfs.xfs man page so I’m not sure if that’ll do anything). Little to no difference in this case on first glance. Watch out for the results at some point for a closer look.

    One consideration here is that eventually I would grow the LVM and XFS to fill up to 4TB; I'll be doing this soon.  Potentially, in the future, I may try to grow this array as well to something well over 8TB (yet to see how to do that).  I'm not sure if XFS would auto-adjust to optimal values for those capacities, and the link didn't touch on that topic.

    All in all, I can still run tests on this thing recreating the FS if I need to so feel free to suggest numbers you’d be interested to see. I might leave this topic open for a week or two to see if I can think of anything else or if I’m missing anything. For my setup, having anything > 125MB/s is a bonus as the network is only 1GB/s with that theoretical max.

    Cheers!
    TK

  3. […] could be done safely enough like this guy did and with RAID6 as well with SSD type R/W’s no less. Your size would be limited to the size of the […]

  4. Thank you for posting this blog.  I was getting desperate.  I could not figure out why I could not stop the RAID1 device.  Even from Ubuntu Rescue Remix.  The LVM group was being assembled from the failed raid.  I removed the volume group and was finally able to gain exclusive access to the array to stop it, put in the new disk and rebuild the array.
     
    Nice job.
    Best,
    Dave.

  5. […] we'll use for this is the APCUPSD daemon available in RPM format. We've set one up for our HTPCB server for a home redundancy / backup solution to protect against power surges and bridge the […]

  6. […] every time while transferring my files.  At the point, I not only lost connectivity with the HTPC+B but also my web access most of the time.  Here are the culprits and here's how we went […]

  7. […] removed the cable and the adapter and only used a 2 foot cable to my HTPC+B system I've just configured.  Voila!  Problem solved.  Ultimately, it's […]

  8. […] them from system to system to avoid choppy video / sound and also to accommodate the needs of our HTPC+B solution through file […]

  9. […] Linux Networking: Persistent naming rules based on MAC for eth0 and wlan0 Linux: HTPC / Home Backup: MDADM, RAID6, LVM, XFS, CIFS and NFS […]

  10. […] at this point and 4:15 minutes have passed).  While this was going on, we are referencing our HTPC page for […]

  11. […] HTPC, Backup & Storage […]



     