Linux LVM: Recovering a lost volume.
You may need to recover or rebuild an LVM setup when the backup in /etc/lvm/backup/<VOLGROUP> is unavailable or lost, or when some other condition causes loss of system volume data. Even when volumes appear destroyed, deleted or inaccessible, LVM keeps a few backup copies of its own for use in a disaster. Here are the steps I took to recover previously lost data without a backup file, using only a few open source tools:
NOTE: Do not attempt these steps unless you have a firm understanding of UNIX / LINUX. The concepts here are relatively advanced. Also ensure you have a backup of the data that will be affected by your actions below. Also take note that your mileage may vary and that this may or may not necessarily work for you. Use with caution.
Before we proceed any further, the LVM version used here is:
LVM version: 2.02.83(2)-RHEL6 (2011-03-18)
Library version: 1.02.62-RHEL6 (2011-03-18)
Driver version: 4.20.6
If your version differs, the below steps may not work for you.
The first step is to grab the starting sectors of the partitions that used to hold your LVM. LVM keeps the equivalent of /etc/lvm/backup/<VOLGROUP> there in case of loss, to allow for recovery later, as we will be doing here. To do this, invoke dd and grab 255 sectors of 512 bytes each from the start of, in this case, /dev/sda2 (skip=1 skips the very first sector):
# dd if=/dev/sda2 bs=512 count=255 skip=1 of=./sda2.lvm
This creates a binary file that I found best viewed with a hex editor. I used Okteta to look at the file in hex notation, simply because it was installed on my secondary system at the time. I used it to identify the start and end of each block so that I could extract the correct block from the file. The start and end of a block are marked by the ASCII NUL character: 00. Here is how the file looked in Okteta, showing the start and end markers of the definition block for the LVM PVs, VGs, and LVs:
Okteta Linux LVM – START of LVM Definition.
Okteta Linux LVM – END of LVM Definition.
Okteta Linux LVM – END of LVM Definition (alternate example).
Please note that the end point is actually marked by the green in the second image, NOT the red. The markers sit on a 00 byte boundary. I did not find the START and END as easily discernible in vi. I expect Okteta, or any other hex editor for that matter, to indicate the starting and ending points better. Your mileage may vary. The recovered file looked like this (sda2-lvm-final.lvm):
VolGroup {
id = "l6D50I-m8tf-2iPJ-ef1R-O9kf-kH2v-7SslkB"
seqno = 51
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "AR2cbu-Zlxw-8nR0-423o-20pa-3g3l-Te3gcI"
device = "/dev/sda2"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 384
pe_count = 47871
}
pv1 {
id = "1tycCU-hDC0-USvN-fuS8-gdgd-QIHq-UHcb0Y"
device = "/dev/sda3"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 2048
pe_count = 47871
}
}
logical_volumes {
lv_root {
id = "b3XsIw-JC3l-cIEU-njSP-1NbY-kPaO-gvMJda"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 8192
type = "striped"
stripe_count = 1 # linear
stripes = ["pv1", 0]
}
}
lv_swap {
id = "KBdB2L-j1ZS-8iA6-Uilh-43G1-NKxX-eE06FW"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 1024
type = "striped"
stripe_count = 1 # linear
stripes = ["pv1", 8192]
}
}
LogVol02 {
id = "beOzB5-znIO-SD44-bkgr-aT9F-gb6N-9Ayyia"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 38655
type = "striped"
stripe_count = 1 # linear
stripes = ["pv1", 9216]
}
}
}
}
# Generated by LVM2 version 2.02.83(2)-RHEL6 (2011-03-18): Sun Sep 11 05:51:56 2011
contents = "Text Format Volume Group"
version = 1
description = ""
creation_host = "localhost.localdomain" # Linux localhost.localdomain 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat May 21 10:27:57 CDT 2011 x86_64
creation_time = 1315720316 # Sun Sep 11 05:51:56 2011
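Picking the block boundaries out by eye in a hex editor works, but the same carving can be sketched in shell. The helper below is hypothetical (extract_by_seqno is my own name, not an LVM command) and assumes, as described above, that each metadata copy in the dump is plain text ending in a 00 byte. It prints the first such chunk that contains a given seqno line:

```shell
# extract_by_seqno DUMP SEQNO
# Print the first NUL-delimited chunk of DUMP that contains the line
# "seqno = SEQNO". Hypothetical helper, not part of LVM.
extract_by_seqno() {
    # awk cannot portably split records on NUL, so map NUL to \001 first.
    tr '\000' '\001' < "$1" |
    LC_ALL=C awk -v want="$2" '
        BEGIN { RS = "\001" }
        # Prepend a newline so the seqno line matches even as the first line.
        ("\n" $0) ~ ("\n[ \t]*seqno = " want "\n") { printf "%s", $0; exit }
    '
}

# Possible usage (file names as in the post):
#   extract_by_seqno sda2.lvm 51 > sda2-lvm-final.lvm
```

Older copies in the dump may be truncated or partly overwritten, so whatever this prints should still be read through before it is handed to vgcfgrestore.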
After trying to apply the restored VolGroup file sda2-lvm-final.lvm, I received this message:
# lvm vgcfgrestore -f sda2-lvm-final.lvm VolGroup
Couldn't find device with uuid AR2cbu-Zlxw-8nR0-423o-3g3l-Te3gcI.
Couldn't find device with uuid 1tycCu-hDC0-USvN-fuS8-gdgd-QIHq-UHcb0Y.
Cannot restore Volume Group VolGroup with 2 PVs marked as missing.
Restore failed.
#
This is because, in order to restore, the vgcfgrestore command needs to see the same PVs (with matching UUIDs) on the system as existed before. At this point, I hadn't created them yet.
We'll need to run pvcreate on /dev/sda2 and /dev/sda3 as per the recovered VolGroup file above and assign the same UUIDs as in our restore file. (The alternative might be to edit the recovered VolGroup file and put the new PV UUIDs into it before applying it. We will not try that here, however, so I cannot vouch for the success of this method.)
Before we can restore the VolGroup, PVs with the same UUIDs must exist. So we create the PVs first:
# lvm pvcreate -u $(cat sda2-lvm-final.lvm |grep 1tycCU|awk '{ gsub(/"/, "", $0) } { print $3 }') /dev/sda3 --restorefile sda2-lvm-final.lvm
# lvm pvcreate -u $(cat sda2-lvm-final.lvm |grep AR2cbu|awk '{ gsub(/"/, "", $0) } { print $3 }') /dev/sda2 --restorefile sda2-lvm-final.lvm
Followed by:
# lvm vgcfgrestore -f sda2-lvm-final.lvm VolGroup
Restored volume group VolGroup
#
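The pvcreate commands above find each UUID by grepping for its first few characters, which means you already have to know the UUID. As a sketch of another way to do it, the hypothetical helper below (pv_uuid_for is my own name) looks the UUID up by device path instead. It assumes the layout seen in the recovered file, where the id line comes before the device line inside each pv block:

```shell
# pv_uuid_for FILE DEVICE
# Print the id of the PV whose "device" line in FILE names DEVICE.
# Hypothetical helper; assumes "id" precedes "device" in each pv block.
pv_uuid_for() {
    awk -v dev="$2" '
        $1 == "id"     { gsub(/"/, "", $3); id = $3 }
        $1 == "device" { gsub(/"/, "", $3); if ($3 == dev) { print id; exit } }
    ' "$1"
}

# Possible usage (file and device names as in the post):
#   lvm pvcreate -u "$(pv_uuid_for sda2-lvm-final.lvm /dev/sda3)" \
#       --restorefile sda2-lvm-final.lvm /dev/sda3
```

If the device is not listed in the file, the function prints nothing, so it is worth checking the output before passing it to pvcreate.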
Unfortunately, in my case, this didn't work. I can't say that was entirely unexpected, because I had earlier done an lvm pvmove from /dev/sda2 to /dev/sda3. Next I tried the same thing, but this time attempting to recover the volume from /dev/sda3. I had less hope for this one because I had previously tried, unsuccessfully, to recreate on /dev/sda3 the LVM structure that I thought I had lost. Unfortunately, I did NOT know about this type of recovery process at the time. But I digress; on with the show.
Below is the VolGroup definition from my former installation that I could derive from the beginning of the partition, using the means already explained above:
VolGroup {
id = "l6D50I-m8tf-2iPJ-ef1R-O9kf-kH2v-7SslkB"
seqno = 5
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "x0O0xW-TAXH-Vrz2-g299-pJm6-Mj5g-srkOBH"
device = "/dev/sda3"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 2048
pe_count = 47871
}
}
logical_volumes {
lv_root {
id = "b3XsIw-JC3l-cIEU-njSP-1NbY-kPaO-gvMJda"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 8192
type = "striped"
stripe_count = 1 # linear
stripes = ["pv0", 0]
}
}
lv_swap {
id = "KBdB2L-j1ZS-8iA6-Uilh-43G1-NKxX-eE06FW"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 1024
type = "striped"
stripe_count = 1 # linear
stripes = ["pv0", 8192]
}
}
LogVol02 {
id = "beOzB5-znIO-SD44-bkgr-aT9F-gb6N-9Ayyia"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 38655
type = "striped"
stripe_count = 1 # linear
stripes = ["pv0", 9216]
}
}
}
}
# Generated by LVM2 version 2.02.83(2)-RHEL6 (2011-03-18): Mon Sep 12 05:04:15 2011
contents = "Text Format Volume Group"
version = 1
description = ""
creation_host = "localhost.localdomain" # Linux localhost.localdomain 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat May 21 10:27:57 CDT 2011 x86_64
creation_time = 1315803855 # Mon Sep 12 05:04:15 2011
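With more than one recovered definition on hand, it helps to see at a glance which PV each LV segment points at (pv1 in the first file, pv0 in this one). The following is a minimal sketch for that, assuming the simple one-block-per-line layout shown above. The function name lv_to_pv is my own, not something LVM provides:

```shell
# lv_to_pv FILE
# For every "stripes" line in FILE, print: LV-name PV-name start-extent.
# Hypothetical helper; assumes each block opens with "name {" on its own line.
lv_to_pv() {
    awk '
        # Remember the most recent block name that is not a segmentN block.
        $2 == "{" && $1 !~ /^segment[0-9]+$/ { block = $1 }
        $1 == "stripes" {
            pv = $3;  gsub(/[][",]/, "", pv)   # ["pv1",  ->  pv1
            off = $4; gsub(/\]/, "", off)      # 8192]    ->  8192
            print block, pv, off
        }
    ' "$1"
}

# Possible usage:
#   lv_to_pv sda3-lvm-final-3.lvm
```

This only reads the text file, so it is safe to run on any candidate definition before deciding which one to restore.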
Again we mimic the steps taken above to try to recover the lost LVM:
- lvm pvcreate -u $(cat sda3-lvm-final-3.lvm |awk '{ if ( $0 ~ /x0O0xW/ ) { gsub(/"/,"",$3); print $3 } }') /dev/sda3 --restorefile sda3-lvm-final-3.lvm
- lvm vgcfgrestore -f sda3-lvm-final-3.lvm VolGroup
- lvm vgscan
- lvm pvscan
- lvm vgchange VolGroup -a y
- lvm pvs
- lvm vgs
- lvm lvs
The last three commands are the verification. All of them showed the LVs, PVs and VGs listed. Now to mount the volumes to test. Unfortunately:
# mount /dev/VolGroup/lv_root /mnt/mr
mount: you must specify the filesystem type
#
Adding -vvvvv to the above command reveals that mount, it appears, couldn't detect a filesystem type at the head of the volume:
mount: types: "(null)"
mount: opts: "(null)"
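One quick way to double check what mount is complaining about is to look for a filesystem signature directly. The sketch below assumes the volume held an ext2/3/4 filesystem, which the post does not actually state, and relies on the ext superblock magic 0xEF53 being stored little-endian at byte offset 1080. The function name has_ext_magic is hypothetical:

```shell
# has_ext_magic DEVICE_OR_FILE
# Succeeds if the two bytes at offset 1080 are 53 ef, the ext2/3/4
# superblock magic. Assumption: the filesystem was ext2/3/4.
has_ext_magic() {
    magic=$(dd if="$1" bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}

# Possible usage:
#   has_ext_magic /dev/VolGroup/lv_root && echo "looks like ext2/3/4"
```

A miss here only says that no ext signature sits where this LV mapping starts. The data may still exist elsewhere on disk, which is what turned out to be the case below.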
At this point it looked like I was really in trouble. So I decided to see where my data was really located. For this I used dd:
# dd if=/dev/sda2 bs=512 count=100000 skip=1 of=./sda2.dd.lvm
# dd if=/dev/sda3 bs=512 count=100000 skip=1 of=./sda3.dd.lvm
Then I scp'd the files to my server and analyzed them using:
# strings sda2.dd.lvm | more
# strings sda3.dd.lvm | more
This quickly revealed where the problem appeared to be. I could see file names from my old partition on sda2 but NOT on sda3. I then remembered what I had done that led to my predicament, which was:
# pvcreate /dev/sda3
# vgextend VolGroup /dev/sda3
# pvmove /dev/sda2 /dev/sda3
# vgreduce VolGroup
# pvremove /dev/sda2
There is also free space between /dev/sda2 and /dev/sda3, about 1TB of it if not more. I had hoped pvmove would move everything between partitions as advertised but, unfortunately, it did not. At least now I could confirm that my data was still on /dev/sda2 and that nothing was on /dev/sda3. So I had to work with whatever LVM configs remained on /dev/sda2 if I was to have any hope of recovering my data. I went back to the smaller dd output taken earlier from sda2, recreated every definition I could find in it, and started reimplementing them from the earliest, moving backwards.
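To get an overview of which definitions a dump still holds before cutting any of them out, something like the following sketch can list the seqno values it contains. It assumes the dump is the file produced by the dd command earlier and that the seqno lines survive intact. list_seqnos is a made-up name:

```shell
# list_seqnos DUMP
# Print the distinct seqno values found in a raw metadata dump, ascending.
# Hypothetical helper; NUL bytes are turned into newlines so grep sees text.
list_seqnos() {
    tr '\000' '\n' < "$1" |
    LC_ALL=C grep -a -E '^[[:space:]]*seqno = [0-9]+[[:space:]]*$' |
    awk '{ print $3 }' | sort -n -u
}

# Possible usage:
#   list_seqnos sda2.lvm
```

Each number corresponds to one revision of the VolGroup definition, so the list gives a rough idea of how far back the dump reaches.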
And this time, mount worked on /dev/VolGroup/LogVol02 and I could see my files:
# mount /dev/VolGroup/LogVol02 /mnt/mr
# du -sh /mnt/mr
119G /mnt/mr
#
But I wanted /dev/VolGroup/lv_root to be visible as well. So, inspired by my success above, off I went to the next available definition and tried that. This time I used the third-to-last one:
VolGroup {
id = "l6D50I-m8tf-2iPJ-ef1R-O9kf-kH2v-7SslkB"
seqno = 47
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "AR2cbu-Zlxw-8nR0-423o-20pa-3g3l-Te3gcI"
device = "/dev/sda2"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 384
pe_count = 47871
}
pv1 {
id = "1tycCU-hDC0-USvN-fuS8-gdgd-QIHq-UHcb0Y"
device = "/dev/sda3"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 2048
pe_count = 47871
}
}
logical_volumes {
lv_root {
id = "b3XsIw-JC3l-cIEU-njSP-1NbY-kPaO-gvMJda"
status = ["READ", "WRITE", "VISIBLE", "LOCKED"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 8192
type = "striped"
stripe_count = 1 # linear
stripes = ["pvmove0", 0]
}
}
lv_swap {
id = "KBdB2L-j1ZS-8iA6-Uilh-43G1-NKxX-eE06FW"
status = ["READ", "WRITE", "VISIBLE", "LOCKED"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 1024
type = "striped"
stripe_count = 1 # linear
stripes = ["pvmove0", 8192]
}
}
LogVol02 {
id = "beOzB5-znIO-SD44-bkgr-aT9F-gb6N-9Ayyia"
status = ["READ", "WRITE", "VISIBLE", "LOCKED"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 38655
type = "striped"
stripe_count = 1 # linear
stripes = ["pvmove0", 9216]
}
}
pvmove0 {
id = "2D2d4C-tTFL-IJLu-CDX8-Ud6l-juav-7Xh9on"
status = ["READ", "WRITE", "PVMOVE", "LOCKED"]
flags = []
allocation_policy = "contiguous"
segment_count = 3
segment1 {
start_extent = 0
extent_count = 8192
type = "mirror"
mirror_count = 2
extents_moved = 0
mirrors = ["pv0", 0,"pv1", 0]
}
segment2 {
start_extent = 8192
extent_count = 1024
type = "mirror"
mirror_count = 2
extents_moved = 0
mirrors = ["pv0", 8192,"pv1", 8192]
}
segment3 {
start_extent = 9216
extent_count = 38655
type = "mirror"
mirror_count = 2
extents_moved = 0
mirrors = ["pv0", 9216,"pv1", 9216]
}
}
}
}
# Generated by LVM2 version 2.02.83(2)-RHEL6 (2011-03-18): Sun Sep 11 02:55:09 2011
contents = "Text Format Volume Group"
version = 1
description = ""
creation_host = "localhost.localdomain" # Linux localhost.localdomain 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat May 21 10:27:57 CDT 2011 x86_64
creation_time = 1315709709 # Sun Sep 11 02:55:09 2011
This time, I could see and mount all my partitions again:
# mount /dev/VolGroup/LogVol02 /mnt/mr
# mount /dev/VolGroup/lv_root /mnt/lv_root
# du -sh /mnt/mr
119G /mnt/mr
# du -sh /mnt/lv_root
13G /mnt/lv_root
#
And that is how I got my partitions back and working on /dev/sda2. I did notice one thing, however. As soon as I restored the above definition, the hard drive light on the target server stayed on as if something were being copied. Looking closer, I noticed kblockd using some CPU while the hard drive light was on. So I checked what was doing disk I/O using this command:
# for ikh in $(ls /proc/*/io); do echo "[$ikh]: "$(cat $ikh 2>/dev/null)|awk '{ if ( $0 ~ /rchar/ && $3 != 0 ) print $1" "$3; }'; done
Which, when run twice, reveals which processes show a delta in disk I/O:
[/proc/1/io]: 388814
[/proc/1000/io]: 24292
[/proc/1004/io]: 673413398
[/proc/12569/io]: 49573
[/proc/12583/io]: 30445
[/proc/12592/io]: 170451
[/proc/67/io]: 107712397
[/proc/773/io]: 7780
[/proc/774/io]: 261734
[/proc/775/io]: 2412006
[/proc/776/io]: 26700
[/proc/778/io]: 1777975
[/proc/78/io]: 115248
[/proc/794/io]: 440762
[/proc/807/io]: 16012
[/proc/836/io]: 7767303
[/proc/84/io]: 58436892
[/proc/93/io]: 4478177
[/proc/self/io]: 1956
Checking the process list quickly revealed the processes:
root 1004 0.0 0.0 108524 2120 tty1 S Sep17 0:08 bash
root 23238 0.0 0.0 20204 1252 ? S 02:32 0:00 hald-addon-input: Listening on /dev/input/event3
With 1004 belonging to my current shell. So the only thing I could deduce was that, when I restored the LVM config, the pvmove defined in it retriggered an lvm pvmove /dev/sda2 /dev/sda3. I will let this one finish, but I won't vgreduce once it's done. Instead I will check with dd exactly where it moved things before I reduce and shrink my partition.
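Comparing two runs of the /proc loop by eye gets tedious. As a rough sketch, the two hypothetical functions below (io_snapshot and io_delta are my own names) save a snapshot of the rchar counter per PID and then print only the PIDs whose counter grew between two snapshots:

```shell
# io_snapshot: print "PID RCHAR" for every process whose /proc/PID/io is readable.
io_snapshot() {
    for f in /proc/[0-9]*/io; do
        pid=${f#/proc/}
        pid=${pid%/io}
        rchar=$(awk '/^rchar:/ { print $2 }' "$f" 2>/dev/null) || continue
        if [ -n "$rchar" ]; then
            echo "$pid $rchar"
        fi
    done
}

# io_delta BEFORE AFTER: print "PID BYTES" for PIDs whose rchar grew.
io_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && ($2 + 0) > (before[$1] + 0) {
             printf "%s %.0f\n", $1, $2 - before[$1]
         }' "$1" "$2"
}

# Possible usage (as root, so that other processes are readable):
#   io_snapshot > /tmp/io.1; sleep 10; io_snapshot > /tmp/io.2
#   io_delta /tmp/io.1 /tmp/io.2
```

This keeps rchar to stay in line with the command above. Note that rchar counts all bytes read through system calls, so the read_bytes field of the same file may be the better indicator of real disk activity.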
Having let the activity run for a while, I noted that even when it was done, I still could not reduce the VolGroup by /dev/sda3 at all:
# lvm vgreduce VolGroup /dev/sda3
Physical volume /dev/sda3 is still in use.
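Since restoring a definition that still carried a pvmove0 volume seems to have restarted the move, it may be worth checking a candidate file for a pending move before applying it. This is a minimal sketch that simply looks for the PVMOVE status flag seen in the definition above. has_pvmove is a hypothetical name:

```shell
# has_pvmove FILE
# Succeeds if FILE contains a volume whose status list includes "PVMOVE".
# Hypothetical helper; it only inspects the text, it changes nothing.
has_pvmove() {
    grep -q '"PVMOVE"' "$1"
}

# Possible usage:
#   if has_pvmove sda2-lvm-final.lvm; then
#       echo "definition contains a pending pvmove; restoring it may resume the move"
#   fi
```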
So I decided to tweak the last VolGroup definition by removing or amending the move blocks, editing out everything marked in red (this should work, since we verified above that /dev/sda3 has NO data):
VolGroup {
id = "l6D50I-m8tf-2iPJ-ef1R-O9kf-kH2v-7SslkB"
seqno = 47
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "AR2cbu-Zlxw-8nR0-423o-20pa-3g3l-Te3gcI"
device = "/dev/sda2"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 384
pe_count = 47871
}
pv1 {
id = "1tycCU-hDC0-USvN-fuS8-gdgd-QIHq-UHcb0Y"
device = "/dev/sda3"
status = ["ALLOCATABLE"]
flags = []
dev_size = 392161281
pe_start = 2048
pe_count = 47871
}
}
logical_volumes {
lv_root {
id = "b3XsIw-JC3l-cIEU-njSP-1NbY-kPaO-gvMJda"
status = ["READ", "WRITE", "VISIBLE", "LOCKED"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 8192
type = "striped"
stripe_count = 1 # linear
stripes = ["pvmove0", 0]
}
}
lv_swap {
id = "KBdB2L-j1ZS-8iA6-Uilh-43G1-NKxX-eE06FW"
status = ["READ", "WRITE", "VISIBLE", "LOCKED"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 1024
type = "striped"
stripe_count = 1 # linear
stripes = ["pvmove0", 8192]
}
}
LogVol02 {
id = "beOzB5-znIO-SD44-bkgr-aT9F-gb6N-9Ayyia"
status = ["READ", "WRITE", "VISIBLE", "LOCKED"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 38655
type = "striped"
stripe_count = 1 # linear
stripes = ["pvmove0", 9216]
}
}
pvmove0 {
id = "2D2d4C-tTFL-IJLu-CDX8-Ud6l-juav-7Xh9on"
status = ["READ", "WRITE", "PVMOVE", "LOCKED"]
flags = []
allocation_policy = "contiguous"
segment_count = 3
segment1 {
start_extent = 0
extent_count = 8192
type = "mirror"
mirror_count = 2
extents_moved = 0
mirrors = ["pv0", 0,"pv1", 0]
}
segment2 {
start_extent = 8192
extent_count = 1024
type = "mirror"
mirror_count = 2
extents_moved = 0
mirrors = ["pv0", 8192,"pv1", 8192]
}
segment3 {
start_extent = 9216
extent_count = 38655
type = "mirror"
mirror_count = 2
extents_moved = 0
mirrors = ["pv0", 9216,"pv1", 9216]
}
}
}
}
# Generated by LVM2 version 2.02.83(2)-RHEL6 (2011-03-18): Sun Sep 11 02:55:09 2011
contents = "Text Format Volume Group"
version = 1
description = ""
creation_host = "localhost.localdomain" # Linux localhost.localdomain 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat May 21 10:27:57 CDT 2011 x86_64
creation_time = 1315709709 # Sun Sep 11 02:55:09 2011
Sure enough, when the above edited file was applied with vgcfgrestore, the volume was mountable, and this time /dev/sda3 showed enough free space for the volume group to be reduced by it:
PV /dev/sda2 VG VolGroup lvm2 [187.00 GiB / 0 free]
PV /dev/sda3 VG VolGroup lvm2 [187.00 GiB / 187.00 GiB free]
Total: 2 [373.99 GiB] / in use: 2 [373.99 GiB] / in no VG: 0 [0 ]
And this time vgreduce worked successfully, and I was back on /dev/sda2, about to attempt a mirror instead, followed by a vgreduce and pvremove.
MORAL OF THE STORY:
Consider using rsync or scp between volumes, and define the new ones manually before you copy over. You may get better mileage that way. Also, most importantly, take a backup of the LVM config before you reboot. If that fails and you still get yourself into such a mess, then at least bookmark this site.
Cheers!
TK
Thanks for your post, it helped me to recover the config file although it wasn’t as difficult as your case.
Thank you! I am not sure where to start with my issue though. I see the VG but that is all. I am new to LVM and I hope to be proficient in it in the coming months.
Thank you, this helped me a lot. Had no lvm backup files and your hint how to recover them saved me a lot of time. I had to increase the count of sectors to recover.