Hi, sorry for the late response.

 

1. The "host with name lab-11053-26006-cmp-1 could not be found" issue.

    cfg.StrOpt('hostname',
               default=socket.gethostname(),
               deprecated_name="host",
               help='''
Hostname, FQDN or IP address of this host. Must be valid within AMQP key.

Possible values:

* String with hostname, FQDN or IP address. Default is hostname of this host.
'''),

This option is the hostname of this host, a single string.

You configured it with a list of hostnames, which is wrong.
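
To illustrate the point, here is a minimal Python sketch that checks whether a configured value looks like one hostname rather than a list. The function name is_single_hostname and the set of "list marker" characters are my own assumptions for illustration; they are not part of masakari-monitors.

```python
import re
import socket

# Characters that suggest a list was pasted into a single-string option
# (assumption: commas, whitespace and square brackets).
_LIST_MARKERS = re.compile(r"[,\s\[\]]")


def is_single_hostname(value: str) -> bool:
    """Return True if value looks like one hostname, FQDN or IP address."""
    value = value.strip()
    return bool(value) and _LIST_MARKERS.search(value) is None


if __name__ == "__main__":
    # socket.gethostname() is the default the option uses when it is unset.
    local_name = socket.gethostname()
    print(local_name, is_single_hostname(local_name))
```

A value such as "comp-1,comp-2,comp-3" fails this check, while a single FQDN passes.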

 

2. Notifications are triggered automatically by masakari-monitors when an instance or compute node fails.

If notifications are not triggered automatically, maybe something is wrong with masakari-monitors. You can also create a notification manually to test the recovery workflow for these failures.
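
As a rough sketch of what such a manual notification could contain, the Python below builds a request body with the four fields the notification create command takes (type, hostname, generated_time, payload). The payload values (STOPPED, NORMAL, OFFLINE) are an assumption based on what the hostmonitor is understood to send for a host failure; please check them against the Masakari API reference before using them. The function name build_host_notification is only for illustration.

```python
import json
from datetime import datetime, timezone


def build_host_notification(hostname: str) -> dict:
    """Build a COMPUTE_HOST failure notification body for the given host."""
    generated_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "notification": {
            "type": "COMPUTE_HOST",
            "hostname": hostname,
            "generated_time": generated_time,
            # Assumed payload for a host failure; verify with the API reference.
            "payload": {
                "event": "STOPPED",
                "host_status": "NORMAL",
                "cluster_status": "OFFLINE",
            },
        }
    }


if __name__ == "__main__":
    body = build_host_notification("lab-11053-26006-comp-1.cluster.local")
    print(json.dumps(body, indent=2))
```

The hostname must match a host that is registered in a segment, otherwise the notification is rejected.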

 

3. The hostmonitor based on pacemaker+corosync does not work. Can you give more details?

What is the output of the following command before and after one compute node is powered off?

#cibadmin --query
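
If the two outputs are large, a small Python sketch like the one below may help to compare them. It assumes each output is saved to a file first (for example before.xml and after.xml, names chosen here only for illustration) and that node status appears in node_state elements of the CIB XML. The helper names node_states and changed_nodes are hypothetical.

```python
import xml.etree.ElementTree as ET


def node_states(cib_xml: str) -> dict:
    """Map each node name to the attributes of its <node_state> element."""
    root = ET.fromstring(cib_xml)
    states = {}
    for elem in root.iter("node_state"):
        # Fall back to the id if a node_state has no uname attribute.
        name = elem.get("uname") or elem.get("id")
        states[name] = dict(elem.attrib)
    return states


def changed_nodes(before_xml: str, after_xml: str) -> dict:
    """Return nodes whose node_state attributes differ between two dumps."""
    before = node_states(before_xml)
    after = node_states(after_xml)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    with open("before.xml") as f_before, open("after.xml") as f_after:
        for name, (old, new) in changed_nodes(f_before.read(), f_after.read()).items():
            print(name, old, "->", new)
```

If the powered-off compute node does not show up as changed, the cluster itself has not noticed the failure, and the hostmonitor has nothing to report.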

 

Can you share some logs from the masakari-hostmonitor service?

 

From: Shubham Kumar Yadav <shubham.kumar.yadav369@gmail.com>
Sent: 2024111 17:26
To: openstack-discuss@lists.openstack.org
Subject: beginning with [masakari]

 

Hi, I have recently started working on Masakari (Kubernetes in OpenStack) and would like some help with it, please.
I was trying to test masakari-instance-monitor and masakari-host-monitor but am having trouble with them.
I have created a Pacemaker remote cluster (with pacemaker and corosync on the controller nodes and pacemaker remote on the compute nodes).

pcs status
Cluster name: lab-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: lab-11053-26006-ceph-2 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Dec 14 08:41:41 2023
  * Last change:  Thu Dec 14 08:41:22 2023 by root via cibadmin on lab-11053-26006-ceph-1
  * 6 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ lab-11053-26006-ceph-1 lab-11053-26006-ceph-2 lab-11053-26006-ceph-3 ]
  * RemoteOnline: [ lab-11053-26006-comp-1.cluster.local lab-11053-26006-comp-2.cluster.local lab-11053-26006-comp-3.cluster.local ]

Full List of Resources:
  * lab-11053-26006-comp-1.cluster.local        (ocf:pacemaker:remote):  Started lab-11053-26006-ceph-1
  * lab-11053-26006-comp-2.cluster.local        (ocf:pacemaker:remote):  Started lab-11053-26006-ceph-2
  * lab-11053-26006-comp-3.cluster.local        (ocf:pacemaker:remote):  Started lab-11053-26006-ceph-3

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


I have also created a segment and added the segment hosts (computes) to it.

When I tested the masakari host monitor by manually powering off a compute node, I expected Masakari to move the VMs from the powered-off compute node to a running one, but after the power-off the VMs still remain on the failed node.
Am I missing some configuration? Do I also need to add configuration in Nova and Keystone?

In the masakari-engine logs I see:
host_failure.evacuate_all_instances = True
instance_failure.process_all_instances = False
host_failure.add_reserved_host_to_aggregate = False
host_failure.ha_enabled_instance_metadata_key = HA_Enabled log_opt_values
host_failure.ignore_instances_in_error_state = False
host_failure.service_disable_reason = Masakari detected host failed.

Second question: do I have to create notifications and vmoves myself?

openstack notification create
    <type>
    <hostname>
    <generated_time>
    <payload>

If yes, what are notifications and vmoves used for?

I have attached a file to this email that lists the components I created for Masakari (segments and hosts in segments).