command ConnectStoragePoolVDS failed: Cannot find master domain:
We receive the following error from oVirt:
VDSM mdskvm-p01.mds.xyz command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=87ec67c6-8da8-4161-afdf-180778a4b595, msdUUID=73fa156c-f085-466f-b409-130a9795a667'
So we dig in a bit deeper on the host to see what's going on:
[root@mdskvm-p01 log]# systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2018-03-30 23:18:02 EDT; 23h ago
Process: 2787 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 2875 (vdsmd)
CGroup: /system.slice/vdsmd.service
├─ 2875 /usr/bin/python2 /usr/share/vdsm/vdsmd
└─16845 /usr/libexec/ioprocess --read-pipe-fd 51 --write-pipe-fd 50 --max-threads 10 --max-queued-requests 10
Mar 31 00:39:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd…001d610>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task…
Mar 31 00:40:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd…c0b96d0>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task…
Mar 31 00:40:23 mdskvm-p01.mds.xyz vdsm[2875]: WARN unhandled close event
Mar 31 00:40:35 mdskvm-p01.mds.xyz fence_ilo[20843]: Unable to connect/login to fencing device
Mar 31 00:40:37 mdskvm-p01.mds.xyz fence_ilo[20889]: Unable to connect/login to fencing device
Mar 31 00:41:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd…009b650>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task…
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agen… removed
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_a… removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agen… removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_a… removed
Hint: Some lines were ellipsized, use -l to show in full.
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]# systemctl status vdsmd.service -l
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2018-03-30 23:18:02 EDT; 23h ago
Process: 2787 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 2875 (vdsmd)
CGroup: /system.slice/vdsmd.service
├─ 2875 /usr/bin/python2 /usr/share/vdsm/vdsmd
└─16845 /usr/libexec/ioprocess --read-pipe-fd 51 --write-pipe-fd 50 --max-threads 10 --max-queued-requests 10
Mar 31 00:39:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd92bdf1 at 0x7fcd3c0b9950> timeout=30.0, duration=0 at 0x7fcd4001d610>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
task()
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
self._callable()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 349, in __call__
self._execute()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 391, in _execute
self._vm.updateDriveVolume(drive)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4209, in updateDriveVolume
vmDrive.volumeID)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6119, in _getVolumeSize
(domainID, volumeID))
StorageUnavailableError: Unable to get volume size for domain 73fa156c-f085-466f-b409-130a9795a667 volume 81186557-9080-42d1-ba6a-633fb8b805e5
Mar 31 00:40:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd92bdf1 at 0x7fcd5805cd90> timeout=30.0, duration=0 at 0x7fcd3c0b96d0>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
task()
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
self._callable()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 349, in __call__
self._execute()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 391, in _execute
self._vm.updateDriveVolume(drive)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4209, in updateDriveVolume
vmDrive.volumeID)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6119, in _getVolumeSize
(domainID, volumeID))
StorageUnavailableError: Unable to get volume size for domain 73fa156c-f085-466f-b409-130a9795a667 volume 81186557-9080-42d1-ba6a-633fb8b805e5
Mar 31 00:40:23 mdskvm-p01.mds.xyz vdsm[2875]: WARN unhandled close event
Mar 31 00:40:35 mdskvm-p01.mds.xyz fence_ilo[20843]: Unable to connect/login to fencing device
Mar 31 00:40:37 mdskvm-p01.mds.xyz fence_ilo[20889]: Unable to connect/login to fencing device
Mar 31 00:41:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd92bdf1 at 0x3adeb90> timeout=30.0, duration=0 at 0x7fcd2009b650>
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
task()
File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
self._callable()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 349, in __call__
self._execute()
File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 391, in _execute
self._vm.updateDriveVolume(drive)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4209, in updateDriveVolume
vmDrive.volumeID)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6119, in _getVolumeSize
(domainID, volumeID))
StorageUnavailableError: Unable to get volume size for domain 73fa156c-f085-466f-b409-130a9795a667 volume 81186557-9080-42d1-ba6a-633fb8b805e5
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agent.0 already removed
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_agent.0 already removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agent.0 already removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_agent.0 already removed
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]# systemctl restart vdsmd.service -l
[root@mdskvm-p01 log]# systemctl status vdsmd.service -l
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2018-04-01 00:04:52 EDT; 2s ago
Process: 22701 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Process: 22705 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 22783 (vdsmd)
CGroup: /system.slice/vdsmd.service
└─22783 /usr/bin/python2 /usr/share/vdsm/vdsmd
Apr 01 00:04:50 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running prepare_transient_repository
Apr 01 00:04:51 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running syslog_available
Apr 01 00:04:51 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running nwfilter
Apr 01 00:04:51 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running dummybr
Apr 01 00:04:52 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running tune_system
Apr 01 00:04:52 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running test_space
Apr 01 00:04:52 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running test_lo
Apr 01 00:04:52 mdskvm-p01.mds.xyz systemd[1]: Started Virtual Desktop Server Manager.
Apr 01 00:04:53 mdskvm-p01.mds.xyz vdsm[22783]: WARN MOM not available.
Apr 01 00:04:53 mdskvm-p01.mds.xyz vdsm[22783]: WARN MOM not available, KSM stats will be missing.
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
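Note that the domain named in the StorageUnavailableError lines (73fa156c-f085-466f-b409-130a9795a667) is the same UUID oVirt reported as msdUUID, so the host cannot read its master storage domain. As a rough sketch, a small helper can pull that UUID out of the journal and count how often it fails. The function name failing_domains is made up for this example; it only reads log text on stdin and assumes the message wording shown above.

```shell
# Count "Unable to get volume size" errors per storage domain UUID.
# Reads log text on stdin; failing_domains is a hypothetical helper name.
failing_domains() {
    # Keep only the part of each line that names the domain ...
    grep -o 'Unable to get volume size for domain [0-9a-f-]*' \
        | awk '{ count[$NF]++ }                        # ... tally by UUID (last field)
               END { for (d in count) print count[d], d }'
}
```

It could be fed from the journal with something like: journalctl -u vdsmd --no-pager | failing_domains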
XFS metadata corruption shows up in /var/log/messages:
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xebffc502
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xebffc502 ("xfs_trans_read_buf_map") error 117 numblks 1
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xefffc402
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xefffc402 ("xfs_trans_read_buf_map") error 117 numblks 1
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xf3ffc302
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xf3ffc302 ("xfs_trans_read_buf_map") error 117 numblks 1
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xf7ffc202
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xf7ffc202 ("xfs_trans_read_buf_map") error 117 numblks 1
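With this much noise in the log, it can help to boil the kernel messages down to one line per corrupted block. The following is only a sketch: xfs_corruption_summary is a name invented here, and the sed pattern assumes the exact "Metadata corruption detected at ..." wording seen in the messages above.

```shell
# List each unique (device, verifier, block) reported as corrupted by XFS.
# xfs_corruption_summary is a hypothetical helper; the default path is the
# log file used in this post.
xfs_corruption_summary() {
    logfile="${1:-/var/log/messages}"
    # Print only matching lines, reduced to: device verifier block
    sed -n -E 's/.*XFS \(([^)]+)\): Metadata corruption detected at ([a-z_]+).* block (0x[0-9a-f]+).*/\1 \2 \3/p' "$logfile" \
        | sort -u
}
```

Run it as xfs_corruption_summary /var/log/messages to see which device-mapper device (dm-3 here) and which blocks are affected before deciding what to repair.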
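To fix it, we go into runlevel 1 and unmount the volume. Failing that, use a boot ISO to boot into the environment and perform the same tasks from there. Start with a check in no-modify mode (-n only reports what would be repaired, so rerun the command without -n to actually apply the repairs):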
xfs_repair -n /dev/mdskvmsanvg/mdskvmsanlv 2>&1 | more
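The check-then-repair sequence can be sketched as a small function. This is an illustration rather than a tested recovery procedure: xfs_check_then_repair is a made-up name, the volume must already be unmounted, and it relies on xfs_repair -n exiting non-zero when it finds corruption, as described in its man page.

```shell
# Dry-run first, then repair only if the dry run reports problems.
# xfs_check_then_repair is a hypothetical helper; pass it the unmounted
# device, e.g. /dev/mdskvmsanvg/mdskvmsanlv from this post.
xfs_check_then_repair() {
    dev="$1"
    # -n is no-modify mode: scan and report, change nothing on disk.
    if xfs_repair -n "$dev"; then
        echo "No corruption reported on $dev; nothing to repair."
        return 0
    fi
    # The dry run exited non-zero, so run the real repair (no -n).
    xfs_repair "$dev"
}
```

If the dry run fails for another reason, for example because the filesystem is still mounted, the second xfs_repair call should refuse to run for the same reason, so nothing is modified in that case either.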
Cheers,
TK