Hi, I think this is a hostname issue. The nova-compute service uses an FQDN hostname such as 'lab-11053-26006-cmp-1.cluster.local', but masakari-monitors uses the short hostname 'lab-11053-26006-cmp-1'.
You can configure an FQDN hostname in the masakari-monitors configuration as below:
[DEFAULT]
# Example value; use the FQDN of the compute host.
hostname = lab-11053-26006-cmp-1.cluster.local
Otherwise it uses the short hostname, because the option defaults to socket.gethostname():
masakarimonitors/conf/service.py
service_opts = [
    cfg.StrOpt('hostname',
               default=socket.gethostname(),
               deprecated_name="host",
               help='''
Hostname, FQDN or IP address of this host. Must be valid within AMQP key.
Possible values:
* String with hostname, FQDN or IP address. Default is hostname of this host.
'''),
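
To see which name the monitor would pick up by default on a compute node, something like the following can be run there. It is only a minimal sketch: is_fqdn and describe_local_names are hypothetical helper names, not part of masakari-monitors, and the "contains a dot" test is a rough heuristic (an IP address would also pass it).

```python
import socket


def is_fqdn(name: str) -> bool:
    """Rough check: treat a name containing a dot as fully qualified."""
    return "." in name.strip(".")


def describe_local_names() -> dict:
    """Collect the names this machine reports for itself."""
    short = socket.gethostname()  # the default value of the `hostname` option
    fqdn = socket.getfqdn()       # the full name as reported by the resolver
    return {
        "gethostname": short,
        "getfqdn": fqdn,
        "default_is_fqdn": is_fqdn(short),
    }


if __name__ == "__main__":
    for key, value in describe_local_names().items():
        print(f"{key}: {value}")
```

If default_is_fqdn is False while nova-compute reports the host under its FQDN, setting hostname explicitly as shown above is probably needed.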
From: Shubham Kumar Yadav <shubham.kumar.yadav369@gmail.com>
Sent: January 4, 2024 13:20
To: openstack-discuss@lists.openstack.org
Subject: beginning with [masakari]
Hi, I have recently started working with Masakari (Kubernetes in OpenStack) and would appreciate some help with it.
I am trying to test masakari-instance-monitor and masakari-host-monitor but am having trouble.
I have created a Pacemaker remote cluster (Pacemaker and Corosync on the controller nodes, Pacemaker Remote on the compute nodes).
pcs status
Cluster name: lab-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: lab-11053-26006-ceph-2 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Dec 14 08:41:41 2023
  * Last change: Thu Dec 14 08:41:22 2023 by root via cibadmin on lab-11053-26006-ceph-1
  * 6 nodes configured
  * 3 resource instances configured
Node List:
  * Online: [ lab-11053-26006-ceph-1 lab-11053-26006-ceph-2 lab-11053-26006-ceph-3 ]
  * RemoteOnline: [ lab-11053-26006-comp-1.cluster.local lab-11053-26006-comp-2.cluster.local lab-11053-26006-comp-3.cluster.local ]
Full List of Resources:
  * lab-11053-26006-comp-1.cluster.local (ocf:pacemaker:remote): Started lab-11053-26006-ceph-1
  * lab-11053-26006-comp-2.cluster.local (ocf:pacemaker:remote): Started lab-11053-26006-ceph-2
  * lab-11053-26006-comp-3.cluster.local (ocf:pacemaker:remote): Started lab-11053-26006-ceph-3
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
I have also created a segment and added the compute nodes to it as segment hosts.
When I test the Masakari host monitor by manually powering off a compute node, Masakari should move the VMs from the powered-off compute to a running one, but the VMs remain on the powered-off compute.
Am I missing some configuration? Do I also need to add any configuration in Nova or Keystone?
In the masakari-engine logs I see:
host_failure.evacuate_all_instances = True
instance_failure.process_all_instances = False
host_failure.add_reserved_host_to_aggregate = False
host_failure.ha_enabled_instance_metadata_key = HA_Enabled log_opt_values
host_failure.ignore_instances_in_error_state = False
host_failure.service_disable_reason = Masakari detected host failed.
I also get the error below from masakari-instance-monitor when I power off an instance on the compute node:
2024-01-03 13:01:15.574 1 DEBUG openstack.resource [-] Attribute [] not found in [<openstack.resource._ComponentManager object at 0x7fa051f48730>]: ''. __getattribute__ /var/lib/openstack/lib/python3.8/site-packages/openstack/resource.py:622
2024-01-03 13:01:15.595 1 WARNING masakarimonitors.ha.masakari [-] Retry sending a notification. (BadRequestException: 400: Client Error for url: http://masakari-api.openstack.svc.cluster.local:15868/v1/9db00e10af034fc6a4d50813131ae3a6/notifications, Host with name lab-11053-26006-cmp-1 could not be found.): openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://masakari-api.openstack.svc.cluster.local:15868/v1/9db00e10af034fc6a4d50813131ae3a6/notifications, Host with name lab-11053-26006-cmp-1 could not be found.
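
The 400 response above says the API knows no host named 'lab-11053-26006-cmp-1'. One way to check for a short-name versus FQDN mismatch is to compare the reported name against the names registered in the segment. The sketch below is only an illustration: find_registered_name is a hypothetical helper, and the registered list is an assumed example rather than the actual segment contents.

```python
from typing import List, Optional


def find_registered_name(reported: str, registered: List[str]) -> Optional[str]:
    """Return the registered host name that matches `reported`, or None.

    An exact match wins. Otherwise, if exactly one registered name shares
    the same first DNS label, it is returned as the likely intended host.
    """
    if reported in registered:
        return reported
    short = reported.split(".")[0]
    candidates = [name for name in registered if name.split(".")[0] == short]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    # Assumed example: segment hosts registered under their FQDNs.
    registered = [
        "lab-11053-26006-cmp-1.cluster.local",
        "lab-11053-26006-cmp-2.cluster.local",
    ]
    print(find_registered_name("lab-11053-26006-cmp-1", registered))
```

If the helper returns a name that differs from the reported one, the monitor is sending a name the API does not recognise, and the monitor's hostname setting is the first thing to look at.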
(Do I need to create Masakari notifications myself for this to work in OpenStack? If so, how can I do that?)
Below I have attached the logs from masakari-hostmonitor and masakari-engine, along with some of the Masakari resources I created in OpenStack.
If you have any suggestions, please let me know.
Thank you