[kolla-ansible] Default masakari configuration should evacuate instances across nodes in wallaby?
I'm upgrading an OpenStack farm from Victoria to Wallaby. After the upgrade succeeded, I wanted to enable hacluster and Masakari to test HA when a compute node fails. Everything seems to be running (Pacemaker shows the remote nodes as OK, Corosync reports no errors, and masakari-monitors detects both compute nodes as online). However, when I simulate a node failure with a shutdown, the failure and the notification appear in the hostmonitor log, but the instances that were on the failed node are not evacuated. I couldn't find any documentation that explains whether this needs specific configuration, or how to do it. Does anyone have any ideas?
What do the Masakari API and Masakari Engine logs show?
-yoctozepto

On Fri, 3 Dec 2021 at 19:57, Rodrigo Lima <rodrigo.lima@o2sistemas.com> wrote:
Hi!
It's a lot of information. From the engine:

2021-12-06 20:02:33.502 7 INFO masakari.engine.manager [req-034e0a9e-1b34-4db3-a2e0-6d61a369ba6c 6c128366b66346d38fb5493adf0cf666 e39a7ea7d17046b5b97b2253bf8195bc - - -] Processing notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d of type: COMPUTE_HOST
2021-12-06 20:02:34.052 7 INFO masakari.compute.nova [req-882d4151-0549-4cb2-acbb-90509073179c nova - - - -] Disable nova-compute on ctl02-hml.amt.net.br
2021-12-06 20:02:34.113 7 INFO masakari.engine.drivers.taskflow.host_failure [req-882d4151-0549-4cb2-acbb-90509073179c nova - - - -] Sleeping 180 sec before starting recovery thread until nova recognizes the node down.
2021-12-06 20:05:34.130 7 INFO masakari.compute.nova [req-c570bdef-0786-47bd-94fc-e6c1396eabb1 nova - - - -] Fetch Server list on ctl02-hml.amt.net.br
2021-12-06 20:05:35.166 7 INFO masakari.compute.nova [req-adec6e54-15ff-4a28-98c3-01070ba2cf87 nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:35.949 7 INFO masakari.compute.nova [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:35.955 7 INFO masakari.compute.nova [req-e548525d-74f7-42c4-8045-2f63d645f76f nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:36.739 7 INFO masakari.compute.nova [req-9cea2ff2-0d32-42a1-8605-b889d95cadc0 nova - - - -] Call lock server command for instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:36.747 7 INFO masakari.compute.nova [req-bc3bdea3-e923-4c08-bf84-bea711465dce nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:37.391 7 INFO masakari.compute.nova [req-9cea2ff2-0d32-42a1-8605-b889d95cadc0 nova - - - -] Call evacuate command for instance 618e44e8-248f-4f50-a760-581972352af8 on host None
2021-12-06 20:05:37.478 7 INFO masakari.compute.nova [req-84d38844-ae87-4767-b822-0099bc40aa16 nova - - - -] Call lock server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:38.059 7 INFO masakari.compute.nova [req-d57bd685-2e42-48e0-bf7d-9978ac516451 nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:38.102 7 INFO masakari.compute.nova [req-84d38844-ae87-4767-b822-0099bc40aa16 nova - - - -] Call evacuate command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 on host None
2021-12-06 20:05:38.734 7 INFO masakari.compute.nova [req-25b4bb9c-2e05-409a-bae8-fd7c8348ba5b nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:39.062 7 INFO masakari.compute.nova [req-3d69363b-aa6c-4752-be52-285bc36f25c2 nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:39.732 7 INFO masakari.compute.nova [req-425752db-0599-42a8-bf2a-0229148410cc nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall [req-3d69363b-aa6c-4752-be52-285bc36f25c2 nova - - - -] Fixed interval looping call 'masakari.engine.drivers.taskflow.host_failure.EvacuateInstancesTask._evacuate_and_confirm.<locals>._wait_for_evacuation_confirmation' failed: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall Traceback (most recent call last):
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall   File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 207, in _wait_for_evacuation_confirmation
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall     raise exception.InstanceEvacuateFailed(
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall
2021-12-06 20:05:39.790 7 WARNING masakari.engine.drivers.taskflow.host_failure [req-d537ed5e-f5c8-4ba5-b559-0156883ca02b nova - - - -] Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:39.793 7 INFO masakari.compute.nova [req-c02a47f3-fc23-4d5b-9167-cf21bf8923db nova - - - -] Call unlock server command for instance 618e44e8-248f-4f50-a760-581972352af8
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall [req-425752db-0599-42a8-bf2a-0229148410cc nova - - - -] Fixed interval looping call 'masakari.engine.drivers.taskflow.host_failure.EvacuateInstancesTask._evacuate_and_confirm.<locals>._wait_for_evacuation_confirmation' failed: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall Traceback (most recent call last):
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall   File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 207, in _wait_for_evacuation_confirmation
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall     raise exception.InstanceEvacuateFailed(
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall
2021-12-06 20:05:40.423 7 WARNING masakari.engine.drivers.taskflow.host_failure [req-f8061ba7-353b-487c-8813-30f85216335f nova - - - -] Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:40.425 7 INFO masakari.compute.nova [req-7f0c16fe-8eaa-485b-a414-945fcd84a3c6 nova - - - -] Call unlock server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:41.080 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'EvacuateInstancesTask' (e49cc088-a9d7-4dd8-a6db-75339c6eaa4d) transitioned into state 'FAILURE' from state 'RUNNING'
4 predecessors (most recent first):
  Flow 'post_tasks'
  |__Flow 'main_tasks'
     |__Flow 'pre_tasks'
        |__Flow 'instance_evacuate_engine': masakari.exception.HostRecoveryFailureException: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br'
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver Traceback (most recent call last):
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver     result = task.execute(**arguments)
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 396, in execute
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver     _do_evacuate(self.context, host_name, instance_list)
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 376, in _do_evacuate
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver     raise exception.HostRecoveryFailureException(
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.HostRecoveryFailureException: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br'
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver
2021-12-06 20:05:41.088 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'EvacuateInstancesTask' (e49cc088-a9d7-4dd8-a6db-75339c6eaa4d) transitioned into state 'REVERTED' from state 'REVERTING'
2021-12-06 20:05:41.091 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'PrepareHAEnabledInstancesTask' (21c32e80-7521-44bb-bd28-53f88c3d13da) transitioned into state 'REVERTED' from state 'REVERTING'
2021-12-06 20:05:41.093 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'DisableComputeServiceTask' (b3522750-c7a6-4f71-ad8c-e2a61a40e2b8) transitioned into state 'REVERTED' from state 'REVERTING'
2021-12-06 20:05:41.095 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Flow 'instance_evacuate_engine' (3c8c5def-39b8-4956-8b2c-62a218e612ee) transitioned into state 'REVERTED' from state 'RUNNING'
2021-12-06 20:05:41.096 7 ERROR masakari.engine.manager [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Failed to process notification '2468563d-70c5-41ef-ad04-a51b4ad3dd4d'. Reason: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br': masakari.exception.HostRecoveryFailureException: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br'
2021-12-06 20:05:41.099 7 INFO masakari.engine.manager [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d exits with status: error.
2021-12-06 20:07:08.803 7 INFO masakari.engine.manager [req-d2c18313-253f-4bb2-8abb-5ca5d8ac248a nova - - - -] Processing notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d of type: COMPUTE_HOST
2021-12-06 20:07:09.398 7 INFO masakari.compute.nova [req-5b595769-209f-422b-a71f-ef4507187a61 nova - - - -] Disable nova-compute on ctl02-hml.amt.net.br
2021-12-06 20:07:09.467 7 INFO masakari.engine.drivers.taskflow.host_failure [req-5b595769-209f-422b-a71f-ef4507187a61 nova - - - -] Sleeping 180 sec before starting recovery thread until nova recognizes the node down.

I also tested by forcing a kernel panic on the target node.

On Sat, 4 Dec 2021 at 08:48, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
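The engine log shows Masakari going through the same steps for each instance (get server, lock, evacuate with no target host, poll, unlock) before the whole flow is reverted. In this run, each evacuation is reported as failed only about two seconds after the evacuate call, so the underlying reason probably has to be looked for on the Nova side. To make such logs easier to read, here is a minimal sketch that groups the masakari-engine entries by instance UUID. It assumes the oslo.log line layout seen above (timestamp, PID, level, logger, optional [request context], message); the helper name `instance_timeline` is hypothetical and is not part of Masakari.

```python
import re
from collections import defaultdict

# Assumed oslo.log layout, as in the masakari-engine output above:
# "<date> <time> <pid> <LEVEL> <logger> [<request context>] <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?:\[(?P<ctx>[^\]]*)\] )?(?P<msg>.*)$"
)
# "instance <uuid>" as it appears in the messages above.
INSTANCE_RE = re.compile(
    r"\binstance ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def instance_timeline(lines):
    """Hypothetical helper: map instance UUID -> [(timestamp, level, message)]."""
    timeline = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line.strip())
        # Skip unparsable lines and traceback continuation lines,
        # which carry no [request context] block.
        if m is None or m.group("ctx") is None:
            continue
        inst = INSTANCE_RE.search(m.group("msg"))
        if inst:
            timeline[inst.group(1)].append(
                (m.group("ts"), m.group("level"), m.group("msg"))
            )
    return dict(timeline)


if __name__ == "__main__":
    # Entries copied from the log above (the second one is a traceback line).
    sample = [
        "2021-12-06 20:05:37.391 7 INFO masakari.compute.nova "
        "[req-9cea2ff2-0d32-42a1-8605-b889d95cadc0 nova - - - -] "
        "Call evacuate command for instance "
        "618e44e8-248f-4f50-a760-581972352af8 on host None",
        "2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall "
        "Traceback (most recent call last):",
        "2021-12-06 20:05:39.790 7 WARNING "
        "masakari.engine.drivers.taskflow.host_failure "
        "[req-d537ed5e-f5c8-4ba5-b559-0156883ca02b nova - - - -] "
        "Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8: "
        "masakari.exception.InstanceEvacuateFailed: Failed to evacuate "
        "instance 618e44e8-248f-4f50-a760-581972352af8",
    ]
    for uuid, events in instance_timeline(sample).items():
        for ts, level, msg in events:
            print(uuid, ts, level, msg)
```

Feeding it the full masakari-engine.log (one entry per line) gives, per instance, the evacuate call and the time at which Masakari gave up on it, which are the timestamps worth matching against the Nova logs.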
Then I suggest looking at the nova-api and nova-compute logs around these ERROR timestamps.
-yoctozepto

On Mon, 6 Dec 2021 at 21:11, Rodrigo Lima <rodrigo.lima@o2sistemas.com> wrote:
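One rough way to follow the suggestion to correlate the Nova logs with the ERROR timestamps: collect the timestamps of the ERROR entries from masakari-engine, then print every line from the Nova logs that falls within a few seconds of one of them. This is a sketch, not an existing tool. The script name, the 5-second window and the log paths you pass in are assumptions (in kolla-ansible deployments the service logs usually live under /var/log/kolla/<service>/ on each host), and it assumes all hosts have synchronized clocks and log in the same time zone.

```python
import re
import sys
from datetime import datetime, timedelta

# Leading "<date> <time> <pid> <LEVEL> " of an oslo.log line (assumed layout).
HEAD_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ ([A-Z]+) ")
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_head(line):
    """Return (datetime, level) for an oslo.log line, or None."""
    m = HEAD_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), TS_FMT), m.group(2)


def error_times(lines):
    """Sorted, de-duplicated timestamps of ERROR entries."""
    times = set()
    for line in lines:
        head = parse_head(line)
        if head is not None and head[1] == "ERROR":
            times.add(head[0])
    return sorted(times)


def correlate(error_ts, lines, window=timedelta(seconds=5)):
    """Lines whose timestamp lies within `window` of any ERROR timestamp."""
    hits = []
    for line in lines:
        head = parse_head(line)
        if head is not None and any(abs(head[0] - t) <= window for t in error_ts):
            hits.append(line.rstrip("\n"))
    return hits


def main(argv):
    if len(argv) < 3:
        print("usage: correlate.py MASAKARI_ENGINE_LOG NOVA_LOG [NOVA_LOG ...]")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        errors = error_times(f)
    for path in argv[2:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in correlate(errors, f):
                print(f"{path}: {line}")


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python3 correlate.py masakari-engine.log nova-api.log nova-compute.log`, using copies of the logs collected from the controller and from the surviving compute node. Since Masakari called evacuate "on host None", Nova's scheduler picks the destination, so the nova-conductor and nova-scheduler logs may be worth passing in as well.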
Hi!
It´s a lot of information. From engine: 2021-12-06 20:02:33.502 7 INFO masakari.engine.manager [req-034e0a9e-1b34-4db3-a2e0-6d61a369ba6c 6c128366b66346d38fb5493adf0cf666 e39a7ea7d17046b5b97b2253bf8195bc - - -] Processing notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d of type: COMPUTE_HOST 2021-12-06 20:02:34.052 7 INFO masakari.compute.nova [req-882d4151-0549-4cb2-acbb-90509073179c nova - - - -] Disable nova-compute on ctl02-hml.amt.net.br 2021-12-06 20:02:34.113 7 INFO masakari.engine.drivers.taskflow.host_failure [req-882d4151-0549-4cb2-acbb-90509073179c nova - - - -] Sleeping 180 sec before starting recovery thread until nova recognizes the node down. 2021-12-06 20:05:34.130 7 INFO masakari.compute.nova [req-c570bdef-0786-47bd-94fc-e6c1396eabb1 nova - - - -] Fetch Server list on ctl02-hml.amt.net.br 2021-12-06 20:05:35.166 7 INFO masakari.compute.nova [req-adec6e54-15ff-4a28-98c3-01070ba2cf87 nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:35.949 7 INFO masakari.compute.nova [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 2021-12-06 20:05:35.955 7 INFO masakari.compute.nova [req-e548525d-74f7-42c4-8045-2f63d645f76f nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:36.739 7 INFO masakari.compute.nova [req-9cea2ff2-0d32-42a1-8605-b889d95cadc0 nova - - - -] Call lock server command for instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:36.747 7 INFO masakari.compute.nova [req-bc3bdea3-e923-4c08-bf84-bea711465dce nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 2021-12-06 20:05:37.391 7 INFO masakari.compute.nova [req-9cea2ff2-0d32-42a1-8605-b889d95cadc0 nova - - - -] Call evacuate command for instance 618e44e8-248f-4f50-a760-581972352af8 on host None 2021-12-06 20:05:37.478 7 INFO masakari.compute.nova 
[req-84d38844-ae87-4767-b822-0099bc40aa16 nova - - - -] Call lock server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 2021-12-06 20:05:38.059 7 INFO masakari.compute.nova [req-d57bd685-2e42-48e0-bf7d-9978ac516451 nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:38.102 7 INFO masakari.compute.nova [req-84d38844-ae87-4767-b822-0099bc40aa16 nova - - - -] Call evacuate command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 on host None 2021-12-06 20:05:38.734 7 INFO masakari.compute.nova [req-25b4bb9c-2e05-409a-bae8-fd7c8348ba5b nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 2021-12-06 20:05:39.062 7 INFO masakari.compute.nova [req-3d69363b-aa6c-4752-be52-285bc36f25c2 nova - - - -] Call get server command for instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:39.732 7 INFO masakari.compute.nova [req-425752db-0599-42a8-bf2a-0229148410cc nova - - - -] Call get server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall [req-3d69363b-aa6c-4752-be52-285bc36f25c2 nova - - - -] Fixed interval looping call 'masakari.engine.drivers.taskflow.host_failure.EvacuateInstancesTask._evacuate_and_confirm.<locals>._wait_for_evacuation_confirmation' failed: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall Traceback (most recent call last): 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw) 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 207, in 
_wait_for_evacuation_confirmation 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall raise exception.InstanceEvacuateFailed( 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:39.778 7 ERROR oslo.service.loopingcall 2021-12-06 20:05:39.790 7 WARNING masakari.engine.drivers.taskflow.host_failure [req-d537ed5e-f5c8-4ba5-b559-0156883ca02b nova - - - -] Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:39.793 7 INFO masakari.compute.nova [req-c02a47f3-fc23-4d5b-9167-cf21bf8923db nova - - - -] Call unlock server command for instance 618e44e8-248f-4f50-a760-581972352af8 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall [req-425752db-0599-42a8-bf2a-0229148410cc nova - - - -] Fixed interval looping call 'masakari.engine.drivers.taskflow.host_failure.EvacuateInstancesTask._evacuate_and_confirm.<locals>._wait_for_evacuation_confirmation' failed: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall Traceback (most recent call last): 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw) 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 207, in _wait_for_evacuation_confirmation 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall raise exception.InstanceEvacuateFailed( 2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall masakari.exception.InstanceEvacuateFailed: 
Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:40.422 7 ERROR oslo.service.loopingcall
2021-12-06 20:05:40.423 7 WARNING masakari.engine.drivers.taskflow.host_failure [req-f8061ba7-353b-487c-8813-30f85216335f nova - - - -] Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887: masakari.exception.InstanceEvacuateFailed: Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:40.425 7 INFO masakari.compute.nova [req-7f0c16fe-8eaa-485b-a414-945fcd84a3c6 nova - - - -] Call unlock server command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887
2021-12-06 20:05:41.080 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'EvacuateInstancesTask' (e49cc088-a9d7-4dd8-a6db-75339c6eaa4d) transitioned into state 'FAILURE' from state 'RUNNING' 4 predecessors (most recent first): Flow 'post_tasks' |__Flow 'main_tasks' |__Flow 'pre_tasks' |__Flow 'instance_evacuate_engine': masakari.exception.HostRecoveryFailureException: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br'
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver Traceback (most recent call last):
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver result = task.execute(**arguments)
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 396, in execute
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver _do_evacuate(self.context, host_name, instance_list)
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/engine/drivers/taskflow/host_failure.py", line 376, in _do_evacuate
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver raise exception.HostRecoveryFailureException(
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.HostRecoveryFailureException: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br'
2021-12-06 20:05:41.080 7 ERROR masakari.engine.drivers.taskflow.driver
2021-12-06 20:05:41.088 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'EvacuateInstancesTask' (e49cc088-a9d7-4dd8-a6db-75339c6eaa4d) transitioned into state 'REVERTED' from state 'REVERTING'
2021-12-06 20:05:41.091 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'PrepareHAEnabledInstancesTask' (21c32e80-7521-44bb-bd28-53f88c3d13da) transitioned into state 'REVERTED' from state 'REVERTING'
2021-12-06 20:05:41.093 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Task 'DisableComputeServiceTask' (b3522750-c7a6-4f71-ad8c-e2a61a40e2b8) transitioned into state 'REVERTED' from state 'REVERTING'
2021-12-06 20:05:41.095 7 WARNING masakari.engine.drivers.taskflow.driver [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Flow 'instance_evacuate_engine' (3c8c5def-39b8-4956-8b2c-62a218e612ee) transitioned into state 'REVERTED' from state 'RUNNING'
2021-12-06 20:05:41.096 7 ERROR masakari.engine.manager [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Failed to process notification '2468563d-70c5-41ef-ad04-a51b4ad3dd4d'.
Reason: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br': masakari.exception.HostRecoveryFailureException: Failed to evacuate instances '618e44e8-248f-4f50-a760-581972352af8,746178b2-14ce-4ce2-83f6-2bf9d613a887' from host 'ctl02-hml.amt.net.br'
2021-12-06 20:05:41.099 7 INFO masakari.engine.manager [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] Notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d exits with status: error.
2021-12-06 20:07:08.803 7 INFO masakari.engine.manager [req-d2c18313-253f-4bb2-8abb-5ca5d8ac248a nova - - - -] Processing notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d of type: COMPUTE_HOST
2021-12-06 20:07:09.398 7 INFO masakari.compute.nova [req-5b595769-209f-422b-a71f-ef4507187a61 nova - - - -] Disable nova-compute on ctl02-hml.amt.net.br
2021-12-06 20:07:09.467 7 INFO masakari.engine.drivers.taskflow.host_failure [req-5b595769-209f-422b-a71f-ef4507187a61 nova - - - -] Sleeping 180 sec before starting recovery thread until nova recognizes the node down.
I also tested by forcing a kernel panic on the target node.
On Sat, 4 Dec 2021 at 08:48, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
What do the Masakari API and Masakari Engine logs show?
-yoctozepto
On Fri, 3 Dec 2021 at 19:57, Rodrigo Lima <rodrigo.lima@o2sistemas.com> wrote:
I'm working on upgrading an OpenStack farm from Victoria to Wallaby. After a successful upgrade, I would like to enable hacluster and Masakari to test HA between failing compute nodes. Everything seems to be running (Pacemaker with the remote nodes OK, Corosync without errors, masakari-monitors detects the 2 compute nodes online), but... when I simulate the failure of a node with a shutdown, the failure and notification appear in the hostmonitor log, but the instances that were on the failed node don't evacuate, and I couldn't find documentation that explains how to do this specific configuration, if necessary. Does anyone have any ideas?
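Because the archive flattens the engine log into one run of text, it can help to pull out just the evacuation attempts and their outcomes. The Python sketch below is a stdlib-only helper written for this thread, not part of Masakari or kolla-ansible; its regular expressions assume the exact message wording seen in this log (other releases may word things differently), and `summarize()` and the trimmed `SAMPLE` excerpt are illustrative names.

```python
import re
from datetime import datetime

# Each masakari-engine entry starts "YYYY-MM-DD HH:MM:SS.mmm <pid> <LEVEL> <logger>".
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}"
ENTRY_SPLIT = re.compile(rf"(?={TS} )")  # zero-width split before every timestamp
HEADER = re.compile(rf"^({TS}) (\d+) ([A-Z]+) (\S+) (.*)$", re.S)

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# Message formats copied from the log in this thread (assumption: stable wording).
EVACUATE = re.compile(rf"Call evacuate command for instance ({UUID}) on host (\S+)")
FAILED = re.compile(rf"Failed to evacuate instance ({UUID})")
EXIT = re.compile(rf"Notification ({UUID}) exits with status: (\w+)")


def parse_entries(text):
    """Yield (timestamp, level, logger, message) for every entry in flattened log text."""
    for chunk in ENTRY_SPLIT.split(text):
        match = HEADER.match(chunk.strip())
        if match:
            ts, _pid, level, logger, message = match.groups()
            yield datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), level, logger, message


def summarize(text):
    """Return ({instance: evacuation info}, {notification: final status})."""
    instances, statuses = {}, {}
    for ts, _level, _logger, message in parse_entries(text):
        evac = EVACUATE.search(message)
        fail = FAILED.search(message)
        done = EXIT.search(message)
        if evac:
            instances[evac.group(1)] = {"target": evac.group(2), "evacuate_at": ts, "failed_at": None}
        elif fail and fail.group(1) in instances and instances[fail.group(1)]["failed_at"] is None:
            # Keep only the first failure entry logged after the evacuate call.
            instances[fail.group(1)]["failed_at"] = ts
        elif done:
            statuses[done.group(1)] = done.group(2)
    return instances, statuses


# Trimmed excerpt of the engine log in this thread, flattened the same way.
SAMPLE = (
    "2021-12-06 20:05:37.391 7 INFO masakari.compute.nova [req-9cea2ff2-0d32-42a1-8605-b889d95cadc0 nova - - - -] "
    "Call evacuate command for instance 618e44e8-248f-4f50-a760-581972352af8 on host None "
    "2021-12-06 20:05:38.102 7 INFO masakari.compute.nova [req-84d38844-ae87-4767-b822-0099bc40aa16 nova - - - -] "
    "Call evacuate command for instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 on host None "
    "2021-12-06 20:05:39.790 7 WARNING masakari.engine.drivers.taskflow.host_failure "
    "[req-d537ed5e-f5c8-4ba5-b559-0156883ca02b nova - - - -] "
    "Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8: masakari.exception.InstanceEvacuateFailed: "
    "Failed to evacuate instance 618e44e8-248f-4f50-a760-581972352af8 "
    "2021-12-06 20:05:40.423 7 WARNING masakari.engine.drivers.taskflow.host_failure "
    "[req-f8061ba7-353b-487c-8813-30f85216335f nova - - - -] "
    "Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887: masakari.exception.InstanceEvacuateFailed: "
    "Failed to evacuate instance 746178b2-14ce-4ce2-83f6-2bf9d613a887 "
    "2021-12-06 20:05:41.099 7 INFO masakari.engine.manager [req-49414eed-fdcf-4fd0-b803-e5cc5f2f9227 nova - - - -] "
    "Notification 2468563d-70c5-41ef-ad04-a51b4ad3dd4d exits with status: error."
)

if __name__ == "__main__":
    found, final = summarize(SAMPLE)
    for uuid, info in found.items():
        if info["failed_at"] is not None:
            delay = (info["failed_at"] - info["evacuate_at"]).total_seconds()
            print(f"{uuid}: target={info['target']}, failed {delay:.3f}s after the evacuate call")
        else:
            print(f"{uuid}: target={info['target']}, no failure logged")
    print("notification statuses:", final)
```

On the excerpt, both instances show target `None` (no explicit host was passed, which as far as we can tell leaves the choice to Nova's scheduler), each fails roughly 2.3 to 2.4 seconds after its evacuate call, and the notification ends in `error`.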
Hi, Rodrigo Lima:

Have you found the failure notification in the Masakari dashboard or in the database (the notifications table)? If so, the failure notification triggers the recovery workflow, and you need to check the masakari-api and masakari-engine logs for details. If not, it could be a deployment problem, and you need to check the connectivity between masakari-api and masakari-hostmonitor.

From: Rodrigo Lima [mailto:rodrigo.lima@o2sistemas.com]
Sent: 4 December 2021 2:56
To: openstack-discuss@lists.openstack.org
Subject: [sent via lists.openstack.org][kolla-ansible] Default masakari configuration should evacuate instances across nodes in wallaby?

I'm working on upgrading an OpenStack farm from Victoria to Wallaby. After a successful upgrade, I would like to enable hacluster and Masakari to test HA between failing compute nodes. Everything seems to be running (Pacemaker with the remote nodes OK, Corosync without errors, masakari-monitors detects the 2 compute nodes online), but... when I simulate the failure of a node with a shutdown, the failure and notification appear in the hostmonitor log, but the instances that were on the failed node don't evacuate, and I couldn't find documentation that explains how to do this specific configuration, if necessary. Does anyone have any ideas?
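Sam Su's triage can be written as a tiny decision function. This Python sketch is illustrative only: `next_check()` is a hypothetical name, and the set of status strings is our understanding of Masakari's notification statuses (the engine log in this thread shows `error`), so verify it against your release.

```python
from typing import Optional

# Notification status values as we understand Masakari defines them; verify
# against your release. The engine log in this thread shows "error".
KNOWN_STATUSES = {"new", "running", "error", "failed", "ignored", "finished"}


def next_check(status: Optional[str]) -> str:
    """Hypothetical helper: where to look next, following the triage above.

    `status` is the notification's status if one was found in the dashboard
    or the notifications table, or None if no notification exists.
    """
    if status is None:
        # Nothing reached Masakari: likely a deployment problem.
        return "check connectivity between masakari-hostmonitor and masakari-api"
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected notification status: {status!r}")
    # A notification exists, so the recovery workflow was triggered (or is queued).
    return "check the masakari-api and masakari-engine logs"


if __name__ == "__main__":
    print(next_check(None))
    print(next_check("error"))
```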
Hi, Rodrigo Lima:

Have you created the failover segment and added the hosts to it? What is the segment's 'enabled' flag set to?

From: Sam Su (苏正伟)
Sent: 5 December 2021 16:05
To: 'rodrigo.lima@o2sistemas.com' <rodrigo.lima@o2sistemas.com>
Cc: 'openstack-discuss@lists.openstack.org' <openstack-discuss@lists.openstack.org>
Subject: RE: [sent via lists.openstack.org][kolla-ansible] Default masakari configuration should evacuate instances across nodes in wallaby?

Hi, Rodrigo Lima:

Have you found the failure notification in the Masakari dashboard or in the database (the notifications table)? If so, the failure notification triggers the recovery workflow, and you need to check the masakari-api and masakari-engine logs for details. If not, it could be a deployment problem, and you need to check the connectivity between masakari-api and masakari-hostmonitor.

From: Rodrigo Lima [mailto:rodrigo.lima@o2sistemas.com]
Sent: 4 December 2021 2:56
To: openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org>
Subject: [sent via lists.openstack.org][kolla-ansible] Default masakari configuration should evacuate instances across nodes in wallaby?

I'm working on upgrading an OpenStack farm from Victoria to Wallaby. After a successful upgrade, I would like to enable hacluster and Masakari to test HA between failing compute nodes. Everything seems to be running (Pacemaker with the remote nodes OK, Corosync without errors, masakari-monitors detects the 2 compute nodes online), but... when I simulate the failure of a node with a shutdown, the failure and notification appear in the hostmonitor log, but the instances that were on the failed node don't evacuate, and I couldn't find documentation that explains how to do this specific configuration, if necessary. Does anyone have any ideas?
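The segment questions amount to a couple of preconditions that can be checked mechanically. This Python sketch uses hypothetical `Segment`/`Host` dataclasses and a hypothetical `check_segments()` helper rather than Masakari's own objects or client, and it assumes, as the question implies, that the failed host must belong to a failover segment and that the segment must be enabled for recovery to proceed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    # Hypothetical stand-in for a Masakari host record; the name is assumed to
    # match the compute node's hostname as Nova reports it.
    name: str


@dataclass
class Segment:
    # Hypothetical stand-in for a failover segment: its name, the 'enabled'
    # flag asked about above, its recovery method, and its member hosts.
    name: str
    enabled: bool
    recovery_method: str
    hosts: List[Host] = field(default_factory=list)


def check_segments(segments: List[Segment], failed_host: str) -> List[str]:
    """Return the problems that, on this model, would stop recovery for failed_host."""
    owning = [s for s in segments if any(h.name == failed_host for h in s.hosts)]
    if not owning:
        return [f"{failed_host} is not a host in any failover segment"]
    return [f"segment {s.name} is disabled" for s in owning if not s.enabled]


if __name__ == "__main__":
    # "compute-seg" is a made-up segment name; the host name comes from the log.
    demo = [Segment("compute-seg", True, "auto", [Host("ctl02-hml.amt.net.br")])]
    print(check_segments(demo, "ctl02-hml.amt.net.br"))
```

An empty list means neither precondition is violated; the real segment and host records would still need to be inspected through the Masakari API or dashboard.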
Participants (3)
- Radosław Piliszek
- Rodrigo Lima
- Sam Su (苏正伟)