Symptoms
You configure the compute cluster for GPU passthrough and create a flavor with a PCI alias:
# openstack --insecure flavor create --ram 16384 --vcpus 8 --property "pci_passthrough:alias"="gpu:1" --public gpu-flavor
You then create a VM without a GPU, shut it down, and perform an offline resize to the flavor with the GPU alias. The VM goes into the 'Error' state with the error 'PCI alias gpu is not defined'.
The nova-scheduler log contains the following error:
"Exhausted all hosts available for retrying build failures for instance "
Cause
This behavior is a consequence of VSTOR-47788. The alias is missing from the nova-compute configuration on the compute node (the node for which the alias is defined in the backend), but it is present in the nova-compute and nova-api configuration on the controller node.
Compute node
# grep alias /etc/kolla/nova-compute/nova.conf
#
Backend (controller) nodes
# grep alias /etc/kolla/nova-api/nova.conf
alias = {"vendor_id": "10de", "product_id": "1eb8", "device_type": "ANY", "name": "gpu"}
# grep alias /etc/kolla/nova-compute/nova.conf
alias = {"vendor_id": "10de", "product_id": "1eb8", "device_type": "ANY", "name": "gpu"}
To resize a guest with a PCI device, the PCI alias must also be configured on the compute node.
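The grep check above can be wrapped in a small helper when several nodes need to be inspected. This is only a minimal sketch: the function name has_pci_alias is hypothetical and is not part of nova or vinfra, and it assumes the alias is written on a single line in the JSON form shown in the output above.

```shell
# Hypothetical helper, not part of nova or vinfra.
# Returns 0 if the given nova.conf has an "alias = {...}" line whose
# "name" field equals the requested alias, 1 if not, 2 if the file
# cannot be read.
has_pci_alias() {
    conf="$1"
    name="$2"
    [ -r "$conf" ] || return 2
    # Keep only alias assignments, then look for the requested name.
    grep -E '^[[:space:]]*alias[[:space:]]*=' "$conf" \
        | grep -q "\"name\"[[:space:]]*:[[:space:]]*\"$name\""
}
```

On a compute node it could be used, for example, as: has_pci_alias /etc/kolla/nova-compute/nova.conf gpu || echo "alias gpu is missing". A non-zero result on the compute node together with a zero result on the controller would match the situation described in this article.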
Resolution
If you use PCI passthrough with generic devices, manually run the following command after upgrading the cluster to 4.7.1:
# vinfra service compute set --pci-passthrough-config your_config.yaml