On Thu, Sep 3, 2020 at 12:31 AM Satish Patel <satish.txt@gmail.com> wrote:

Mohammed,

Disregard my earlier emails. I found that Senlin does auto-healing: you
need to create a health policy and attach it to your cluster.

Here is the policy I created to monitor node health. If a node dies or
crashes for some reason, Senlin will automatically recreate that instance
to bring the cluster back to its desired capacity.

type: senlin.policy.health
version: 1.1
description: A policy for maintaining node health from a cluster.
properties:
  detection:
    # Number of seconds between two adjacent checking
    interval: 60
    detection_modes:
      # Type for health checking, valid values include:
      # NODE_STATUS_POLLING, NODE_STATUS_POLL_URL, LIFECYCLE_EVENTS
      - type: NODE_STATUS_POLLING
  recovery:
    # Action that can be retried on a failed node, will improve to
    # support multiple actions in the future. Valid values include:
    # REBOOT, REBUILD, RECREATE
    actions:
      - name: RECREATE
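A minimal sketch of the create-and-attach step, assuming the spec above is saved as health_policy.yaml. The file name, the policy name my-health-policy, and the helper function name are placeholders chosen for illustration; my-asg is the cluster name from the original post.

```shell
# Create a Senlin policy from a spec file and attach it to a cluster.
# Arguments: spec file, policy name, cluster name.
# The attach step runs only if the create step succeeds.
attach_health_policy() {
    spec_file="$1"
    policy_name="$2"
    cluster="$3"
    openstack cluster policy create --spec-file "$spec_file" "$policy_name" &&
        openstack cluster policy attach --policy "$policy_name" "$cluster"
}

# Example call (hypothetical names, not run here):
#   attach_health_policy health_policy.yaml my-health-policy my-asg
```

Afterwards, "openstack cluster policy binding list my-asg" should show whether the policy is attached and enabled.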

** Here is the POC

[root@os-infra-1-utility-container-e139058e ~]# nova list
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| ID                                   | Name          | Status | Task State | Power State | Networks          |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| 38ba7f7c-2f5f-4502-a5d0-6c4841d6d145 | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.26 |
| ba55deb6-9488-4455-a472-a0a957cb388a | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.14 |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+

** Let's delete one of the nodes.

[root@os-infra-1-utility-container-e139058e ~]# nova delete ba55deb6-9488-4455-a472-a0a957cb388a

Request to delete server ba55deb6-9488-4455-a472-a0a957cb388a has been accepted.

** After a few minutes I can see the deleted node in RECOVERING status.

[root@os-infra-1-utility-container-e139058e ~]# openstack cluster node list
+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+
| id       | name          | index | status     | cluster_id | physical_id | profile_name | created_at           | updated_at           | tainted |
+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+
| d4a8f219 | node-YPsjB6bV |     6 | RECOVERING | 091fbd52   | ba55deb6    | myserver     | 2020-09-02T21:01:47Z | 2020-09-03T04:01:58Z | False   |
| bc50c0b9 | node-hoiHkRcS |     7 | ACTIVE     | 091fbd52   | 38ba7f7c    | myserver     | 2020-09-03T03:40:29Z | 2020-09-03T03:57:58Z | False   |
+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+
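Rather than re-running the node list by hand, the wait can be scripted. The following is only a sketch: the function name and the default 10-second poll interval are assumptions for illustration. It also only makes sense once the health check has already flagged the node, since with a 60-second detection interval the node may still show ACTIVE right after the delete.

```shell
# Poll until no node in the given cluster reports RECOVERING.
# Arguments: cluster name, poll interval in seconds (default 10).
wait_until_recovered() {
    cluster="$1"
    interval="${2:-10}"
    # "-f value -c status" prints only the status column, one node per line.
    while openstack cluster node list --cluster "$cluster" -f value -c status \
            | grep -q RECOVERING; do
        sleep "$interval"
    done
    echo "no nodes in RECOVERING state"
}

# Example call (not run here):
#   wait_until_recovered my-asg 10
```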

** Finally it's up and running with a new ip address.

[root@os-infra-1-utility-container-e139058e ~]# nova list
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| ID                                   | Name          | Status | Task State | Power State | Networks          |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| 38ba7f7c-2f5f-4502-a5d0-6c4841d6d145 | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.26 |
| 73a658cd-c40a-45d8-9b57-cc9e6c2b4dc1 | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.17 |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+

On Tue, Sep 1, 2020 at 8:51 AM Mohammed Naser <mnaser@vexxhost.com> wrote:

>

> Hi Satish,

>

> I'm interested in this. Did you end up finding a solution?

>

> Thanks,

> Mohammed

>

> On Thu, Aug 27, 2020 at 1:54 PM Satish Patel <satish.txt@gmail.com> wrote:

> >

> > Folks,

> >

> > I have created a very simple cluster using the following command:
> >
> > openstack cluster create --profile myserver --desired-capacity 2 --min-size 2 --max-size 3 --strict my-asg
> >
> > It spun up 2 VMs immediately. Because the desired capacity is 2, I am
> > assuming that if any node in the cluster dies, it should spin up a new
> > node to bring the count back to 2, right?
> >
> > So I killed one of the nodes with "nova delete <instance-foo-1>", but
> > Senlin didn't automatically create a node to restore the desired
> > capacity of 2. (In AWS, when you kill a node in an ASG, it creates a
> > new node, so is Senlin different from AWS?)

> >

>

>

> --

> Mohammed Naser

> VEXXHOST, Inc.