evacuate problem

Eugen Block eblock at nde.ag
Thu Jun 16 08:48:23 UTC 2022


Hi,

I haven't used Masakari yet and I don't use kolla, but this message  
indicates that your pacemaker communication is not set up properly:

>  ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync
> communication is failed.

I would start by checking /etc/corosync/corosync.conf (or the respective
file in a kolla/masakari deployment) to see whether it matches your actual
network setup. We use a designated network for the corosync traffic in our
environment. If there aren't any other errors, I would fix this
communication error first and see how far you get.
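
As a rough illustration of that check, the Python sketch below pulls the
bindnetaddr and ring0_addr values out of a corosync.conf and reports node
addresses that are not on the network you expect corosync to use. The sample
config, the addresses, the 10.0.0.0/24 network and the function names are all
made up for the example. Only the bindnetaddr and ring0_addr directive names
come from corosync itself. A kolla deployment may keep the file somewhere
else, so treat this as a starting point rather than a definitive tool.

```python
import ipaddress
import re

# Hypothetical sample; a real deployment would read its own corosync.conf.
SAMPLE_CONF = """
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastport: 5405
    }
}
nodelist {
    node {
        ring0_addr: 10.0.0.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.12
        nodeid: 2
    }
}
"""


def corosync_addresses(conf_text):
    """Return (bindnetaddr values, ring0_addr values) found in the text."""
    bind = re.findall(r"^\s*bindnetaddr:\s*(\S+)", conf_text, flags=re.MULTILINE)
    ring0 = re.findall(r"^\s*ring0_addr:\s*(\S+)", conf_text, flags=re.MULTILINE)
    return bind, ring0


def addresses_outside_network(addresses, network):
    """Return the IP addresses that are not inside the given network.

    Entries that are not IP literals (for example hostnames) are skipped,
    because they cannot be checked without name resolution.
    """
    net = ipaddress.ip_network(network, strict=False)
    outside = []
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue
        if ip not in net:
            outside.append(addr)
    return outside


if __name__ == "__main__":
    bind, ring0 = corosync_addresses(SAMPLE_CONF)
    print("bindnetaddr:", bind)
    print("ring0_addr:", ring0)
    # 10.0.0.0/24 stands in for the designated corosync network.
    print("outside corosync network:",
          addresses_outside_network(ring0, "10.0.0.0/24"))
```

To use it against a real file, read the file contents and pass them to
corosync_addresses() instead of SAMPLE_CONF, then compare the result with the
network the interface named in the log ('eth0' here) is actually attached to.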


Quoting fereshteh loghmani <fereshtehloghmani at gmail.com>:

> Hello,
> I use Masakari to migrate servers when a compute node goes down.
> I installed these containers in the region:
> masakari-monitors
> masakari-engine
> masakari-api
> hacluster-pacemaker
> hacluster-corosync
> and on the compute server I installed "hacluster-pacemaker-remote".
> In OpenStack I created a segment and added 2 hosts (in this case the
> names of the 2 hosts are R3SG5 and R3SG12).
> To test the evacuate function I shut off one of the compute hosts that I
> added to the segment (in this case I shut off R3SG5).
> I attached the log that I found in this file:
> /var/log/kolla/masakari/masakari-hostmonitor.log
>
> *********
>  INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is
> 'online' (current: 'online').
>  INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is
> 'online' (current: 'online').
>  WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync
> communication using 'eth0' is failed.:
> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while
> running command.
>  ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync
> communication is failed.
>  INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is
> 'offline' (current: 'offline').
>  INFO masakarimonitors.ha.masakari [-] Send a notification.
> {'notification': {'type': 'COMPUTE_HOST', 'hostname': 'R3SG5',
> 'generated_time': datetime.datetime(2022, 6, 14, 7, 6, 46, 138867),
> 'payload': {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status':
> 'NORMAL'}}}
>  INFO masakarimonitors.ha.masakari [-] Response:
> openstack.instance_ha.v1.notification.Notification(type=COMPUTE_HOST,
> hostname=R3SG5, generated_time=2022-06-14T07:06:46.138867,
> payload={'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status':
> 'NORMAL'}, id=105, notification_uuid=a7364095-cc7d-48f8-b963-c64ba147897c,
> source_host_uuid=6328f08c-c752-43d5-4689-801d91dd67ec, status=new,
> created_at=2022-06-14T07:06:47.000000, updated_at=None,
> location=Munch({'cloud': 'controller', 'region_name': 'RegionThree',
> 'zone': None, 'project': Munch({'id': 'a75a951b4537478e8cea39a932f830da',
> 'name': None, 'domain_id': None, 'domain_name': None})}))
>  INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is
> 'online' (current: 'online').
>  WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync
> communication using 'eth0' is failed.:
> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while
> running command.
>  ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync
> communication is failed.
>  INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is
> 'offline' (current: 'offline').
>  INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is
> 'online' (current: 'online').
>  WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync
> communication using 'eth0' is failed.:
> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while
> running command.
> *****************
>
>
>
> I also checked the nova_scheduler logs, and there I see this error:
>
> ***
> ERROR oslo_messaging.rpc.server nova.exception.NoValidHost: No valid host
> was found. There are not enough hosts available.
> *******
>
> Finally, in the OpenStack dashboard, the status in the notification
> section changed from running to failed. After that failure appeared in
> the notification section, my VM that was on R3SG5 went into the ERROR
> state; the VM still exists on R3SG5 and has not been migrated to R3SG12.
>
> Could you please help me understand why the evacuate function doesn't
> work correctly?





