Hello,

I use Masakari to migrate servers when a compute node goes down. I installed these containers on the region: masakari-monitors, masakari-engine, masakari-api, hacluster-pacemaker, and hacluster-corosync. On the compute servers I installed "hacluster-pacemaker-remote". In OpenStack I created a segment and added 2 hosts to it (in this case the hosts are R3SG5 and R3SG12). To test the evacuate function, I shut off one of the compute hosts in the segment (in this case R3SG5). This is the log I found in /var/log/kolla/masakari/masakari-hostmonitor.log:

*********
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'online' (current: 'online').
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
INFO masakarimonitors.ha.masakari [-] Send a notification. {'notification': {'type': 'COMPUTE_HOST', 'hostname': 'R3SG5', 'generated_time': datetime.datetime(2022, 6, 14, 7, 6, 46, 138867), 'payload': {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'}}}
INFO masakarimonitors.ha.masakari [-] Response: openstack.instance_ha.v1.notification.Notification(type=COMPUTE_HOST, hostname=R3SG5, generated_time=2022-06-14T07:06:46.138867, payload={'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'}, id=105, notification_uuid=a7364095-cc7d-48f8-b963-c64ba147897c, source_host_uuid=6328f08c-c752-43d5-4689-801d91dd67ec, status=new, created_at=2022-06-14T07:06:47.000000, updated_at=None, location=Munch({'cloud': 'controller', 'region_name': 'RegionThree', 'zone': None, 'project': Munch({'id': 'a75a951b4537478e8cea39a932f830da', 'name': None, 'domain_id': None, 'domain_name': None})}))
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
*****************

I also checked the nova_scheduler logs and found this error there:

***
ERROR oslo_messaging.rpc.server nova.exception.NoValidHost: No valid host was found. There are not enough hosts available.
*******

Finally, in the OpenStack dashboard, the status in the notification section changed from running to failed. After the notification went to the failed state, my VM that was on R3SG5 went into the ERROR state. The VM still exists on R3SG5 and has not been migrated to R3SG12.

Could you please help me understand why the evacuate function doesn't work correctly?

Hi, I haven't used Masakari yet and I don't use kolla, but this message indicates that your pacemaker communication is not set up properly:
ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
I would start by checking whether /etc/corosync/corosync.conf (or the respective file in a kolla/masakari deployment) matches your actual network setup. We use a designated network for the corosync traffic in our environment. If there aren't any other errors, I would start with this communication error and see how far you get.
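The corosync.conf check suggested in the reply can be roughly sketched in Python. This is only a minimal illustration, not something taken from Masakari or kolla: the sample config text, the 10.0.0.0/24 network, and the function name addresses_outside_network are all assumptions made up for the example. It looks for bindnetaddr and ringN_addr entries and reports those that do not fall inside the network you expect corosync to use.

```python
import ipaddress
import re

# Hypothetical corosync.conf excerpt (assumption); replace it with the
# contents of your real file, e.g. open("/etc/corosync/corosync.conf").read().
SAMPLE_CONF = """
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
    }
}
nodelist {
    node {
        ring0_addr: 10.0.0.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.5.12
        nodeid: 2
    }
}
"""

# Matches lines such as "bindnetaddr: 10.0.0.0" or "ring0_addr: 10.0.0.11".
ADDR_KEYS = re.compile(
    r"^\s*(bindnetaddr|ring\d+_addr)\s*:\s*(\S+)\s*$", re.MULTILINE
)


def addresses_outside_network(conf_text, network):
    """Return (key, value) pairs whose IP address is not inside `network`."""
    net = ipaddress.ip_network(network)
    outside = []
    for key, value in ADDR_KEYS.findall(conf_text):
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            # Ring addresses may also be hostnames; this sketch skips them.
            continue
        if addr not in net:
            outside.append((key, value))
    return outside


if __name__ == "__main__":
    # 10.0.0.0/24 is an assumed corosync network, used only for illustration.
    for key, value in addresses_outside_network(SAMPLE_CONF, "10.0.0.0/24"):
        print(f"{key} {value} is not in 10.0.0.0/24")
```

Any address it reports is a candidate for a mismatch between the cluster configuration and the network the nodes actually share. It does not test connectivity itself, so it only narrows down where to look.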
Participants (2): Eugen Block, fereshteh loghmani