Symptoms
You tried to create or delete a VM or volume, and the resource remains stuck in a spinning transitional status, for example Creating. The spinner never clears on its own, and you only see that the operation actually completed after refreshing the browser.
Diagnostic steps
In the beholder log file on the primary node, you notice errors such as "Failed to notify Backend". Example:
# tail -f /var/log/hci/beholder/beholder.log
2024-09-21 13:46:39.203 INFO [-] 847700ad-514d-43ca-a22b-785db375d6f6 compute.hv3125dev.vstoragedomain compute.instance.update (info)
2024-09-21 13:46:39.204 INFO [-] {"timestamp": "2024-09-21 04:46:39.197865", "event_type": "compute.instance.update", "instance_uuid": "847700ad-514d-43ca-a22b-785db375d6f6", "instance_name": "vztest-428330-1", "state": "building", "host": "hv3125dev.vstoragedomain", "cpus": 24, "tags": []}
2024-09-21 13:46:39.205 INFO [-] 847700ad-514d-43ca-a22b-785db375d6f6 'updated' event recorded
2024-09-21 13:46:39.274 INFO [-] 847700ad-514d-43ca-a22b-785db375d6f6 compute.vztest1001.vstoragedomain compute.instance.update (info)
2024-09-21 13:46:39.274 INFO [-] {"timestamp": "2024-09-21 04:46:39.269962", "event_type": "compute.instance.update", "instance_uuid": "847700ad-514d-43ca-a22b-785db375d6f6", "instance_name": "vztest-428330-1", "state": "building", "host": "hv3125dev.vstoragedomain", "cpus": 24, "tags": []}
2024-09-21 13:46:39.275 INFO [-] 847700ad-514d-43ca-a22b-785db375d6f6 'updated' event recorded
2024-09-21 13:46:39.407 INFO [-] 847700ad-514d-43ca-a22b-785db375d6f6 compute.hv3119dev.vstoragedomain compute.instance.update (info)
2024-09-21 13:46:39.407 INFO [-] {"timestamp": "2024-09-21 04:46:39.393331", "event_type": "compute.instance.update", "instance_uuid": "847700ad-514d-43ca-a22b-785db375d6f6", "instance_name": "vztest-428330-1", "state": "building", "host": "hv3125dev.vstoragedomain", "cpus": 24, "tags": []}
2024-09-21 13:46:39.408 INFO [-] 847700ad-514d-43ca-a22b-785db375d6f6 'updated' event recorded
2024-09-21 13:46:39.750 INFO [-] network.hv3125dev.vstoragedomain port.create.start (info)
2024-09-21 13:46:40.023 ERROR [-] Failed to notify Backend: 403
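To quickly confirm that the failure is recurring rather than a one-off, you can also count how many times the error appears in the same log (a suggested check, not part of the original diagnostics; it uses only the log path shown above):
# grep -c "Failed to notify Backend" /var/log/hci/beholder/beholder.log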
Cause
The issue is caused by a known bug in version 6.2.1 (51), tracked as Bug ID VSTOR-89213.
Resolution
The bug is expected to be fixed in version 6.3. Until then, the following workaround is available. If you are unsure about any step, consult support before proceeding:
Step 1: Identify the controller nodes
[root@vhinode1 ~]# vinfra service compute node list -c id -c host -c state -c roles | grep controller
| 69e787fa-2d12-43ce-9e2b-321d9628f8b8 | vhinode1.vstoragedomain | healthy | - controller |
| 64a8f673-9a89-445b-9276-202e63a215a0 | vhinode2.vstoragedomain | healthy | - controller |
| 0ccfd274-7a83-4c46-83dc-2fd43573f6d6 | vhinode3.vstoragedomain | healthy | - controller |
Step 2: Copy the two files that need to be synchronized to all controller nodes
/usr/libexec/vstorage-ui-backend/etc/backend.local.cfg
/usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf
Before copying, ensure that the value of "notifier_hmac_key" in keystone-hci.conf is the same as the value in backend.local.cfg:
[root@vhinode1 ~]# cat /usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf
[hci]
notifier_address = 127.0.0.1
notifier_port = 39737
notifier_api_prefix = notifier/api/v1
notifier_hmac_key = '.......'
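A quick way to compare the key in both files on the same node is to grep them together (a suggested check using only the paths already listed above); the two printed values must be identical:
[root@vhinode1 ~]# grep notifier_hmac_key /usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf /usr/libexec/vstorage-ui-backend/etc/backend.local.cfg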
Step 2.1: Copy backend.local.cfg from the first controller node (normally the primary node) to the other controller nodes:
scp -i /etc/kolla/id_rsa /usr/libexec/vstorage-ui-backend/etc/backend.local.cfg vhinode2.vstoragedomain:/usr/libexec/vstorage-ui-backend/etc/backend.local.cfg
Step 2.2: Copy keystone-hci.conf to the other controller nodes:
scp -i /etc/kolla/id_rsa /usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf vhinode2.vstoragedomain:/usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf
Repeat these steps until all controller nodes are covered; a verification check is shown below.
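As a sanity check (not part of the original procedure, and using the example hostnames from Step 1), you can confirm that both files are now identical on every controller node by comparing their checksums; the checksum reported for each file should match across all nodes:
[root@vhinode1 ~]# md5sum /usr/libexec/vstorage-ui-backend/etc/backend.local.cfg /usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf
[root@vhinode1 ~]# for n in vhinode2.vstoragedomain vhinode3.vstoragedomain; do echo "=== $n ==="; ssh -i /etc/kolla/id_rsa $n "md5sum /usr/libexec/vstorage-ui-backend/etc/backend.local.cfg /usr/libexec/vstorage-ui-backend/etc/keystone/keystone-hci.conf"; done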
Step 3: Check which node the keystone public service is running on, and restart the service on that particular node
[root@vhinode1~]# for n in $(vinfra node list -c host -f value | sort -V); do echo "===$n==="; ssh -i /etc/kolla/id_rsa $n "systemctl status vstorage-ui-keystone-public.service | grep running"; done
===vhinode.vstoragedomain===
===vhinode3.vstoragedomain===
===vhinode2.vstoragedomain===
===vhinode1.vstoragedomain===
Active: active (running) since Fri 2024-09-13 07:37:00 JST; 1 week 5 days ago
Step 3.1: On that node, restart the keystone public service:
systemctl reload vstorage-ui-keystone-public.service
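After the reload, it may be worth confirming that the service is still active on that node (an optional check, not from the original article):
systemctl status vstorage-ui-keystone-public.service | grep Active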
Step 4: Redeploy beholder
Note: Ensure you are on the primary node before executing.
[root@vhi-node1 ~]# su - vstoradmin
Last login: Wed Sep 25 03:00:07 +08 2024
[vstoradmin@vhi-node1 ~]$ kolla-ansible deploy-beholder
The command above produces very long output. A successful run ends like the following. If you are unsure, please reach out to support for assistance:
RUNNING HANDLER [beholder : Restart beholder container] *************************************************************************************************************************************************************
changed: [5046e8c9-4881-9c10-6233-e12cf94ccb9a]
changed: [4f3847e9-0de2-d696-f467-3fd471d6f855]
changed: [2785a030-db0c-c506-385d-b6b7ff9270c2]
PLAY RECAP **********************************************************************************************************************************************************************************************************
5046e8c9-4881-9c10-6233-e12cf94ccb9a : ok=11 changed=4 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
4f3847e9-0de2-d696-f467-3fd471d6f855 : ok=9 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
2785a030-db0c-c506-385d-b6b7ff9270c2 : ok=9 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
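Finally, you can verify the fix by tailing the beholder log again (same path as in the diagnostic steps) while creating or deleting a test VM or volume; no new "Failed to notify Backend: 403" lines should appear, and the status in the web panel should update without a browser refresh:
# tail -f /var/log/hci/beholder/beholder.log | grep -i "failed to notify"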