That’s awesome. Thank you. 

On Thu, Sep 3, 2020 at 12:31 AM Satish Patel <satish.txt@gmail.com> wrote:
Mohammed,



Disregard my earlier emails. I found that senlin does auto-healing: you
need to create a health policy and attach it to your cluster.



Here is the policy I created to monitor the nodes' health. If a node
dies or crashes for some reason, senlin will automatically recreate
that instance to bring the cluster back to its desired capacity.



type: senlin.policy.health
version: 1.1
description: A policy for maintaining node health from a cluster.
properties:
  detection:
    # Number of seconds between two adjacent checking
    interval: 60

    detection_modes:
      # Type for health checking, valid values include:
      # NODE_STATUS_POLLING, NODE_STATUS_POLL_URL, LIFECYCLE_EVENTS
      - type: NODE_STATUS_POLLING

  recovery:
    # Action that can be retried on a failed node, will improve to
    # support multiple actions in the future. Valid values include:
    # REBOOT, REBUILD, RECREATE
    actions:
      - name: RECREATE
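
The "create a health policy and attach it to your cluster" step can be sketched roughly as below. This is a minimal sketch, assuming the senlin plugin for the openstack CLI (python-senlinclient) is installed. The file name health_policy.yaml and the policy name my-health-policy are placeholders I made up; my-asg is the cluster name from the original question. The helper only builds the two command lines and does not run anything.

```python
import shlex


def policy_commands(spec_file, policy_name, cluster_name):
    """Return the create and attach invocations as argv lists (nothing is executed)."""
    # Create the policy object from the YAML spec file.
    create = ["openstack", "cluster", "policy", "create",
              "--spec-file", spec_file, policy_name]
    # Attach the policy to the cluster so health checks start for its nodes.
    attach = ["openstack", "cluster", "policy", "attach",
              "--policy", policy_name, cluster_name]
    return [create, attach]


if __name__ == "__main__":
    # Placeholder names: health_policy.yaml and my-health-policy are assumptions.
    for argv in policy_commands("health_policy.yaml", "my-health-policy", "my-asg"):
        print(shlex.join(argv))
```

Each list can be passed to subprocess.run() as-is, or the printed lines can be pasted into a shell.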





** Here is the POC



[root@os-infra-1-utility-container-e139058e ~]# nova list
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| ID                                   | Name          | Status | Task State | Power State | Networks          |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| 38ba7f7c-2f5f-4502-a5d0-6c4841d6d145 | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.26 |
| ba55deb6-9488-4455-a472-a0a957cb388a | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.14 |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+



** Let's delete one of the nodes.



[root@os-infra-1-utility-container-e139058e ~]# nova delete ba55deb6-9488-4455-a472-a0a957cb388a

Request to delete server ba55deb6-9488-4455-a472-a0a957cb388a has been accepted.



** After a few minutes I can see the deleted node in RECOVERING status.



[root@os-infra-1-utility-container-e139058e ~]# openstack cluster node list
+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+
| id       | name          | index | status     | cluster_id | physical_id | profile_name | created_at           | updated_at           | tainted |
+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+
| d4a8f219 | node-YPsjB6bV |     6 | RECOVERING | 091fbd52   | ba55deb6    | myserver     | 2020-09-02T21:01:47Z | 2020-09-03T04:01:58Z | False   |
| bc50c0b9 | node-hoiHkRcS |     7 | ACTIVE     | 091fbd52   | 38ba7f7c    | myserver     | 2020-09-03T03:40:29Z | 2020-09-03T03:57:58Z | False   |
+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+
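
If you would rather not eyeball the table, a small script can pick out the nodes that are not ACTIVE. This is only a sketch, and it assumes the node list is fetched in JSON form (for example with the CLI's "-f json" output formatter) and that each entry carries "name" and "status" fields matching the column headers above. The sample data is copied from the table; the function name is my own.

```python
import json


def nodes_not_active(node_list_json):
    """Return (name, status) pairs for nodes whose status is not ACTIVE."""
    nodes = json.loads(node_list_json)
    return [(n["name"], n["status"]) for n in nodes if n["status"] != "ACTIVE"]


# Sample shaped like the table above; field names are assumed to mirror
# the column headers of the node list output.
sample = json.dumps([
    {"id": "d4a8f219", "name": "node-YPsjB6bV", "status": "RECOVERING"},
    {"id": "bc50c0b9", "name": "node-hoiHkRcS", "status": "ACTIVE"},
])

print(nodes_not_active(sample))  # [('node-YPsjB6bV', 'RECOVERING')]
```

Once the recovery finishes and every node is ACTIVE again, the function returns an empty list.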



** Finally, the cluster is back to two running nodes: the replacement instance has a new ID and a new IP address.



[root@os-infra-1-utility-container-e139058e ~]# nova list
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| ID                                   | Name          | Status | Task State | Power State | Networks          |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+
| 38ba7f7c-2f5f-4502-a5d0-6c4841d6d145 | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.26 |
| 73a658cd-c40a-45d8-9b57-cc9e6c2b4dc1 | cirros_server | ACTIVE | -          | Running     | net1=192.168.1.17 |
+--------------------------------------+---------------+--------+------------+-------------+-------------------+



On Tue, Sep 1, 2020 at 8:51 AM Mohammed Naser <mnaser@vexxhost.com> wrote:
>
> Hi Satish,
>
> I'm interested in this. Did you end up finding a solution?
>
> Thanks,
> Mohammed
>
> On Thu, Aug 27, 2020 at 1:54 PM Satish Patel <satish.txt@gmail.com> wrote:
> >
> > Folks,
> >
> > I have created a very simple cluster using the following command:
> >
> > openstack cluster create --profile myserver --desired-capacity 2 --min-size 2 --max-size 3 --strict my-asg
> >
> > It spun up 2 VMs immediately. Because the desired capacity is 2, I
> > am assuming that if any node in the cluster dies, senlin should spin
> > up a new node to bring the count back to 2, right?
> >
> > So I killed one of the nodes with "nova delete <instance-foo-1>", but
> > senlin didn't automatically create a node to restore the desired
> > capacity of 2. (In AWS, when you kill a node in an ASG, it creates a
> > new node. Is senlin different from AWS in this respect?)
> >
>
>
> --
> Mohammed Naser
> VEXXHOST, Inc.

--
Mohammed Naser
VEXXHOST, Inc.