Symptoms
The dispatcher (prl-disp) cannot start a VM: prlctl start VMID -v 10
virsh cannot list domains: virsh list --all
[root@fractus01 ~]# virsh list --all
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to list domains
error: internal error: connection closed due to keepalive timeout
Diagnostic steps
Check the journal of both services for failures:
journalctl -xe -u libvirtd
journalctl -xe -u prl-disp
Look for zombie (state Z) or uninterruptible-sleep (state D) processes:
# ps aux | awk '$8~/(D|Z)/'
root 7307 0.0 0.0 0 0 ? Zsl 11:34 0:07 [libvirtd] <defunct>
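The ps aux output above does not show the parent PID, which is needed later to find out who should reap the zombie. A minimal sketch of a filter that keeps the parent PID in view follows; the function name filter_stuck is hypothetical, and it assumes input in the "PID PPID STAT COMMAND" column order.

```shell
# Keep only processes whose state starts with D (uninterruptible sleep)
# or Z (zombie). Expects "PID PPID STAT COMMAND" lines on stdin.
# filter_stuck is a hypothetical helper name, not part of any package.
#
# Usage on a live host (procps ps):
#   ps -eo pid=,ppid=,stat=,comm= | filter_stuck
filter_stuck() {
    awk '$3 ~ /^[DZ]/ { print }'
}
```

The second column of each printed line is the parent PID; for a defunct libvirtd or prl-disp it is expected to be 1.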
Check whether the prl-disp and libvirtd pid files exist and how old they are:
# ls -lrt /var/run/ | grep -E prl_disp_service.pid\|libvirtd.pid
-rw-r--r-- 1 root root 6 Nov 11 18:41 libvirtd.pid
-rw-r--r-- 1 root root 7 Nov 11 18:48 prl_disp_service.pid
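The timestamps alone do not tell whether the PID stored in each file still belongs to a process. A minimal sketch that reads the PID and looks it up under /proc follows; check_pidfile is a hypothetical helper name, and it assumes a Linux /proc filesystem and a pid file containing a single number.

```shell
# Report whether the PID recorded in a pid file maps to an existing
# process, and in which state. check_pidfile is a hypothetical helper.
check_pidfile() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing: $pidfile"
        return 0
    fi
    # Strip the trailing newline and any stray whitespace.
    pid=$(tr -d '[:space:]' < "$pidfile")
    if [ -n "$pid" ] && [ -r "/proc/$pid/status" ]; then
        # The State: line looks like "State:  Z (zombie)".
        state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status")
        echo "present: $pidfile -> pid $pid, state $state"
    else
        echo "stale: $pidfile -> pid ${pid:-unknown} is not running"
    fi
}

for f in /var/run/libvirtd.pid /var/run/prl_disp_service.pid; do
    check_pidfile "$f"
done
```

A state of Z means the pid file points at a zombie, which matches the symptom described above.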
Try to restart libvirtd and prl-disp. If the restart hangs, their pid files need to be removed:
systemctl restart libvirtd
systemctl restart prl-disp
The services will most likely hang on restart. In that case kill their processes manually
and find the parent PID of the resulting zombies:
ps -ef | grep -E libvirtd\|prl-disp
It is also a good idea to check whether the VM disk holds a storage lease:
vstorage -c ID file-info /vz/vmprivate/UUID/harddisk.hdd
Also check whether the storage cluster is running different vstorage versions at the same time:
vstorage -c ID stat 2>/dev/null | grep -oE '[0-9.-]+\.v[a-z][0-9]' | sort | uniq
7.14.109.1-3.vz7
7.19.108-6.vz7
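More than one line in that output, as in the sample above, means the cluster is running mixed versions. A minimal sketch that turns the same pattern into a one-line verdict follows; check_versions is a hypothetical helper name, and it only assumes that the version strings appear somewhere in the text on stdin.

```shell
# Count the distinct version strings found on stdin, using the same
# pattern as the command above, and print a verdict.
# check_versions is a hypothetical helper name.
#
# Usage on a live host:
#   vstorage -c ID stat 2>/dev/null | check_versions
check_versions() {
    n=$(grep -oE '[0-9.-]+\.v[a-z][0-9]' | sort -u | awk 'END { print NR }')
    if [ "$n" -gt 1 ]; then
        echo "mixed: $n distinct versions"
    else
        echo "ok: $n distinct version(s)"
    fi
}
```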
Cause
It looks like the processes cannot update their pid files in the /var/run directory.
As a result, systemd cannot restart the services.
If the storage cluster runs two or three different vstorage versions at the same time, that may contribute to the problem and has to be addressed as well.
Resolution
rm -f /var/run/libvirtd.pid
rm -f /var/run/prl_disp_service.pid
# the parent process of libvirtd and prl-disp
# is expected to be the init process (PID 1);
# ask it to reap the children that became zombies
kill -SIGCHLD 1
# revoke the lease on the VM disk, if one exists;
# this should help make virsh operable again
vstorage -c ID revoke /vz/vmprivate/UUID/harddisk.hdd
# restart the services
systemctl restart libvirtd
systemctl restart prl-disp
If the storage reports several versions, ask the customer to update all nodes to the latest version so that different vstorage versions do not run at the same time.