Hi,

I have recently started working on Masakari (Kubernetes in OpenStack) and would like some help with it. I am trying to test masakari-instance-monitor and masakari-host-monitor but am having trouble. I have created a Pacemaker remote cluster (Pacemaker and Corosync on the controller nodes, Pacemaker Remote on the compute nodes).

pcs status
Cluster name: lab-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: lab-11053-26006-ceph-2 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Dec 14 08:41:41 2023
  * Last change: Thu Dec 14 08:41:22 2023 by root via cibadmin on lab-11053-26006-ceph-1
  * 6 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ lab-11053-26006-ceph-1 lab-11053-26006-ceph-2 lab-11053-26006-ceph-3 ]
  * RemoteOnline: [ lab-11053-26006-comp-1.cluster.local lab-11053-26006-comp-2.cluster.local lab-11053-26006-comp-3.cluster.local ]

Full List of Resources:
  * lab-11053-26006-comp-1.cluster.local (ocf:pacemaker:remote): Started lab-11053-26006-ceph-1
  * lab-11053-26006-comp-2.cluster.local (ocf:pacemaker:remote): Started lab-11053-26006-ceph-2
  * lab-11053-26006-comp-3.cluster.local (ocf:pacemaker:remote): Started lab-11053-26006-ceph-3

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I have also created a segment and added the compute hosts to it. When I test the Masakari host monitor by manually powering off a compute node, Masakari should move the VMs from the powered-off compute node to a running one, but after the compute node is powered off the VMs still remain on it. Am I missing some configuration? Is there any need to add configuration in Nova and Keystone too?
In the masakari-engine logs I see:

host_failure.evacuate_all_instances = True
instance_failure.process_all_instances = False
host_failure.add_reserved_host_to_aggregate = False
host_failure.ha_enabled_instance_metadata_key = HA_Enabled log_opt_values
host_failure.ignore_instances_in_error_state = False
host_failure.service_disable_reason = Masakari detected host failed.

Question 2: Do I have to create notifications and vmoves myself?

openstack notification create <type> <hostname> <generated_time> <payload>

If yes, what are notifications and vmoves used for? I have attached a file to this email that lists the components I created for Masakari (segments and hosts in segments).
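The `openstack notification create <type> <hostname> <generated_time> <payload>` syntax quoted above takes four positional arguments. As a rough sketch only, the snippet below assembles such a command line for a compute host failure. The payload keys (`event`, `host_status`, `cluster_status`), their values, and the timestamp format are assumptions based on my recollection of the Masakari API reference and should be checked against the documentation for your release. The helper name and the hostname passed in are hypothetical.

```python
import json
import shlex
from datetime import datetime, timezone


def build_host_failure_notification(hostname, when=None):
    """Build an argv list for a manual COMPUTE_HOST notification.

    Assumption: the payload keys and values below follow the Masakari
    API reference; verify them for your release before relying on them.
    """
    when = when or datetime.now(timezone.utc)
    payload = {
        "event": "STOPPED",           # assumed event name for a host failure
        "host_status": "NORMAL",      # assumed value
        "cluster_status": "OFFLINE",  # assumed value
    }
    return [
        "openstack", "notification", "create",
        "COMPUTE_HOST",                       # <type>
        hostname,                             # <hostname>
        when.strftime("%Y-%m-%dT%H:%M:%S"),   # <generated_time>, assumed ISO 8601
        json.dumps(payload),                  # <payload> as a JSON string
    ]


# Hypothetical example: the hostname must match a host registered in the segment.
argv = build_host_failure_notification(
    "lab-11053-26006-comp-1.cluster.local",
    datetime(2023, 12, 14, 8, 41, 41),
)
print(shlex.join(argv))
```

The printed line can be pasted into a shell on a node with the Masakari client plugin installed. Building the argument list first keeps the JSON payload correctly quoted.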
Hi,

Sorry for the late response.

1. The "host with name lab-11053-26006-cmp-1 could not be found" issue.

    cfg.StrOpt('hostname',
               default=socket.gethostname(),
               deprecated_name="host",
               help='''
    Hostname, FQDN or IP address of this host. Must be valid within AMQP key.

    Possible values:

    * String with hostname, FQDN or IP address. Default is hostname of this host.
    '''),

   This option is the hostname of this host and is a single string. You configured it with a list of hostnames, which is wrong.

2. Notifications are triggered automatically by masakari-monitors when an instance or a compute node fails. If notifications are not triggered automatically, something may be wrong with masakari-monitors; you can create a notification manually to test the recovery workflow for those failures.

3. The host monitor based on Pacemaker and Corosync is not working. Can you give more clues? What is the output of the following command before and after one compute node is powered off?

    # cibadmin --query

   Can you also provide some logs from the masakari-hostmonitor service?

From: Shubham Kumar Yadav <shubham.kumar.yadav369@gmail.com>
Sent: January 11, 2024 17:26
To: openstack-discuss@lists.openstack.org
Subject: beginning with [masakari]

Hi, I have recently started working on Masakari (Kubernetes in OpenStack) and would like some help with it. I am trying to test masakari-instance-monitor and masakari-host-monitor but am having trouble. I have created a Pacemaker remote cluster (Pacemaker and Corosync on the controller nodes, Pacemaker Remote on the compute nodes).
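Regarding point 1 of the reply (the `hostname` option is a single string, not a list), a quick check along the following lines might catch the misconfiguration before the monitor starts. This is only a sketch: the `[DEFAULT]` section, the sample value, and the helper name are assumptions for illustration and are not taken from the poster's actual configuration file.

```python
import configparser
import socket


def single_hostname(value):
    """Return True if value looks like one hostname, FQDN, or IP address.

    Commas, spaces, brackets, or quotes suggest a list was configured.
    """
    value = value.strip()
    return bool(value) and not any(ch in value for ch in ", []'\"")


# Hypothetical config fragment showing the mistake described in the reply.
sample = """
[DEFAULT]
hostname = lab-11053-26006-comp-1, lab-11053-26006-comp-2
"""

parser = configparser.ConfigParser()
parser.read_string(sample)

# Fall back to this machine's hostname, mirroring the option's default.
configured = parser.get("DEFAULT", "hostname", fallback=socket.gethostname())

if not single_hostname(configured):
    print(f"hostname must be a single string, got: {configured!r}")
```

Each compute node runs its own monitor, so each node's configuration should carry only that node's name, matching the name registered in the Masakari segment.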
participants (2)
- Sam Su (苏正伟)
- Shubham Kumar Yadav