command ConnectStoragePoolVDS failed: Cannot find master domain:

So we receive the following error from oVirt:

VDSM mdskvm-p01.mds.xyz command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=87ec67c6-8da8-4161-afdf-180778a4b595, msdUUID=73fa156c-f085-466f-b409-130a9795a667'

Digging in a bit deeper to see what's going on:

[root@mdskvm-p01 log]# systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-03-30 23:18:02 EDT; 23h ago
  Process: 2787 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 2875 (vdsmd)
   CGroup: /system.slice/vdsmd.service
           ├─ 2875 /usr/bin/python2 /usr/share/vdsm/vdsmd
           └─16845 /usr/libexec/ioprocess --read-pipe-fd 51 --write-pipe-fd 50 --max-threads 10 --max-queued-requests 10

Mar 31 00:39:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd…001d610>
                                               Traceback (most recent call last):
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task…
Mar 31 00:40:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd…c0b96d0>
                                               Traceback (most recent call last):
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task…
Mar 31 00:40:23 mdskvm-p01.mds.xyz vdsm[2875]: WARN unhandled close event
Mar 31 00:40:35 mdskvm-p01.mds.xyz fence_ilo[20843]: Unable to connect/login to fencing device
Mar 31 00:40:37 mdskvm-p01.mds.xyz fence_ilo[20889]: Unable to connect/login to fencing device
Mar 31 00:41:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd…009b650>
                                               Traceback (most recent call last):
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task…
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agen… removed
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_a… removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agen… removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_a… removed
Hint: Some lines were ellipsized, use -l to show in full.
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]# systemctl status vdsmd.service -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-03-30 23:18:02 EDT; 23h ago
  Process: 2787 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 2875 (vdsmd)
   CGroup: /system.slice/vdsmd.service
           ├─ 2875 /usr/bin/python2 /usr/share/vdsm/vdsmd
           └─16845 /usr/libexec/ioprocess --read-pipe-fd 51 --write-pipe-fd 50 --max-threads 10 --max-queued-requests 10

Mar 31 00:39:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd92bdf1 at 0x7fcd3c0b9950> timeout=30.0, duration=0 at 0x7fcd4001d610>
                                               Traceback (most recent call last):
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
                                                   task()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
                                                   self._callable()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 349, in __call__
                                                   self._execute()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 391, in _execute
                                                   self._vm.updateDriveVolume(drive)
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4209, in updateDriveVolume
                                                   vmDrive.volumeID)
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6119, in _getVolumeSize
                                                   (domainID, volumeID))
                                               StorageUnavailableError: Unable to get volume size for domain 73fa156c-f085-466f-b409-130a9795a667 volume 81186557-9080-42d1-ba6a-633fb8b805e5
Mar 31 00:40:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd92bdf1 at 0x7fcd5805cd90> timeout=30.0, duration=0 at 0x7fcd3c0b96d0>
                                               Traceback (most recent call last):
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
                                                   task()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
                                                   self._callable()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 349, in __call__
                                                   self._execute()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 391, in _execute
                                                   self._vm.updateDriveVolume(drive)
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4209, in updateDriveVolume
                                                   vmDrive.volumeID)
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6119, in _getVolumeSize
                                                   (domainID, volumeID))
                                               StorageUnavailableError: Unable to get volume size for domain 73fa156c-f085-466f-b409-130a9795a667 volume 81186557-9080-42d1-ba6a-633fb8b805e5
Mar 31 00:40:23 mdskvm-p01.mds.xyz vdsm[2875]: WARN unhandled close event
Mar 31 00:40:35 mdskvm-p01.mds.xyz fence_ilo[20843]: Unable to connect/login to fencing device
Mar 31 00:40:37 mdskvm-p01.mds.xyz fence_ilo[20889]: Unable to connect/login to fencing device
Mar 31 00:41:03 mdskvm-p01.mds.xyz vdsm[2875]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=d8dfd596-1e87-4e98-87ff-269edd92bdf1 at 0x3adeb90> timeout=30.0, duration=0 at 0x7fcd2009b650>
                                               Traceback (most recent call last):
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
                                                   task()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
                                                   self._callable()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 349, in __call__
                                                   self._execute()
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 391, in _execute
                                                   self._vm.updateDriveVolume(drive)
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4209, in updateDriveVolume
                                                   vmDrive.volumeID)
                                                 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6119, in _getVolumeSize
                                                   (domainID, volumeID))
                                               StorageUnavailableError: Unable to get volume size for domain 73fa156c-f085-466f-b409-130a9795a667 volume 81186557-9080-42d1-ba6a-633fb8b805e5
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agent.0 already removed
Mar 31 00:41:57 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_agent.0 already removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.ovirt-guest-agent.0 already removed
Mar 31 00:43:29 mdskvm-p01.mds.xyz vdsm[2875]: WARN File: /var/lib/libvirt/qemu/channels/d8dfd596-1e87-4e98-87ff-269edd92bdf1.org.qemu.guest_agent.0 already removed
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]# systemctl restart vdsmd.service -l
[root@mdskvm-p01 log]# systemctl status vdsmd.service -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2018-04-01 00:04:52 EDT; 2s ago
  Process: 22701 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 22705 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 22783 (vdsmd)
   CGroup: /system.slice/vdsmd.service
           └─22783 /usr/bin/python2 /usr/share/vdsm/vdsmd

Apr 01 00:04:50 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running prepare_transient_repository
Apr 01 00:04:51 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running syslog_available
Apr 01 00:04:51 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running nwfilter
Apr 01 00:04:51 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running dummybr
Apr 01 00:04:52 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running tune_system
Apr 01 00:04:52 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running test_space
Apr 01 00:04:52 mdskvm-p01.mds.xyz vdsmd_init_common.sh[22705]: vdsm: Running test_lo
Apr 01 00:04:52 mdskvm-p01.mds.xyz systemd[1]: Started Virtual Desktop Server Manager.
Apr 01 00:04:53 mdskvm-p01.mds.xyz vdsm[22783]: WARN MOM not available.
Apr 01 00:04:53 mdskvm-p01.mds.xyz vdsm[22783]: WARN MOM not available, KSM stats will be missing.
[root@mdskvm-p01 log]#
[root@mdskvm-p01 log]#
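
The tracebacks above all end in the same StorageUnavailableError, so it can help to boil the journal down to the distinct domain and volume IDs involved. A minimal sketch, assuming the messages keep the wording shown above; the helper name unavailable_volumes is made up for illustration:

```shell
#!/bin/sh
# Print each distinct "domain volume" pair named in VDSM's
# StorageUnavailableError lines, reading log text from stdin.
# (unavailable_volumes is a hypothetical helper name.)
unavailable_volumes() {
    grep 'StorageUnavailableError' |
        sed -n 's/.*for domain \([0-9a-f-]*\) volume \([0-9a-f-]*\).*/\1 \2/p' |
        sort -u
}

# Typical use on the host (assumes the journal still holds the entries):
#   journalctl -u vdsmd.service --no-pager | unavailable_volumes
```

In this case the domain ID it reports (73fa156c-f085-466f-b409-130a9795a667) is the same as the msdUUID in the oVirt error, which points at the storage under the master domain rather than at vdsmd itself.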

XFS metadata corruption shows up in /var/log/messages:

Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xebffc502
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xebffc502 ("xfs_trans_read_buf_map") error 117 numblks 1
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xefffc402
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811e7aa1230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xefffc402 ("xfs_trans_read_buf_map") error 117 numblks 1
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xf3ffc302
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8811f8ba2230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xf3ffc302 ("xfs_trans_read_buf_map") error 117 numblks 1
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Metadata corruption detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0xf7ffc202
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): Unmount and run xfs_repair
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): First 64 bytes of corrupted metadata buffer:
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: ffff8808e5335c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Mar 29 09:37:55 mdskvm-p01 kernel: XFS (dm-3): metadata I/O error: block 0xf7ffc202 ("xfs_trans_read_buf_map") error 117 numblks 1
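
The kernel names the device only as dm-3, so before repairing anything it is worth confirming which devices are affected and which logical volume sits behind them. A rough sketch, assuming the log lines are formatted as above; xfs_corrupt_devices is a hypothetical helper name:

```shell
#!/bin/sh
# List every device for which the kernel logged XFS metadata corruption,
# followed by the number of reports, reading log text from stdin.
# (xfs_corrupt_devices is a hypothetical helper name.)
xfs_corrupt_devices() {
    sed -n 's/.*XFS (\([^)]*\)): Metadata corruption detected.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use on the host:
#   xfs_corrupt_devices < /var/log/messages
# A dm-N name can then be matched to its logical volume, for example with
#   lsblk -o NAME,KNAME,TYPE,MOUNTPOINT
# where the KNAME column shows the kernel name (dm-3 here).
```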

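To fix it, go into runlevel 1 and unmount the volume. Failing that, boot from an ISO and perform these tasks from that environment. Start with a no-modify check (-n only reports problems and changes nothing), then rerun the same command without -n to perform the actual repair: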

xfs_repair -n /dev/mdskvmsanvg/mdskvmsanlv 2>&1 | more
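
The check-then-repair sequence can be sketched as a small function. This is only an illustration, not the exact procedure used here: the function name check_then_repair and the XFS_REPAIR override are assumptions, and it relies on xfs_repair -n returning a non-zero status when it detects corruption, as its man page describes. The volume must already be unmounted.

```shell
#!/bin/sh
# Check-then-repair sketch for an unmounted XFS volume.
# XFS_REPAIR can be overridden, e.g. XFS_REPAIR="echo xfs_repair" to see
# which commands would run. (Both names here are illustrative.)
XFS_REPAIR=${XFS_REPAIR:-xfs_repair}

check_then_repair() {
    dev=$1
    # -n: no-modify mode; exit status 0 means no corruption was found.
    if $XFS_REPAIR -n "$dev"; then
        echo "no corruption reported on $dev"
        return 0
    fi
    echo "corruption reported on $dev, repairing"
    # Without -n, xfs_repair actually rewrites the damaged metadata.
    $XFS_REPAIR "$dev"
}

# Typical use, as root, with the volume unmounted:
#   check_then_repair /dev/mdskvmsanvg/mdskvmsanlv
```

Reading the -n report before letting the real run proceed is still a good idea, since a repair can move damaged files into lost+found.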

Cheers,
TK

  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

This work is licensed under a Creative Commons Attribution 3.0 Unported License