<div dir="auto">That’s awesome. Thank you. </div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 3, 2020 at 12:31 AM Satish Patel &lt;<a href="mailto:satish.txt@gmail.com">satish.txt@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">Mohammed,<br><br><br><br>Disregard my earlier emails. I found that Senlin does auto-healing: you<br><br>need to create a health policy and attach it to your cluster.<br><br><br><br>Here is the policy I created to monitor node health; if a node dies or<br><br>crashes for some reason, Senlin will automatically recreate that instance to<br><br>restore the desired capacity.<br><br><br><br>type: senlin.policy.health<br><br>version: 1.1<br><br>description: A policy for maintaining node health from a cluster.<br><br>properties:<br><br>  detection:<br><br>    # Number of seconds between two adjacent checks<br><br>    interval: 60<br><br><br><br>    detection_modes:<br><br>      # Type for health checking, valid values include:<br><br>      # NODE_STATUS_POLLING, NODE_STATUS_POLL_URL, LIFECYCLE_EVENTS<br><br>      - type: NODE_STATUS_POLLING<br><br><br><br>  recovery:<br><br>    # Action that can be retried on a failed node, will improve to<br><br>    # support multiple actions in the future. 
Valid values include:<br><br>    # REBOOT, REBUILD, RECREATE<br><br>    actions:<br><br>      - name: RECREATE<br><br><br><br><br><br>** Here is the POC<br><br><br><br>[root@os-infra-1-utility-container-e139058e ~]# nova list<br><br>+--------------------------------------+---------------+--------+------------+-------------+-------------------+<br><br>| ID                                   | Name          | Status | Task<br><br>State | Power State | Networks          |<br><br>+--------------------------------------+---------------+--------+------------+-------------+-------------------+<br><br>| 38ba7f7c-2f5f-4502-a5d0-6c4841d6d145 | cirros_server | ACTIVE | -<br><br>      | Running     | net1=192.168.1.26 |<br><br>| ba55deb6-9488-4455-a472-a0a957cb388a | cirros_server | ACTIVE | -<br><br>      | Running     | net1=192.168.1.14 |<br><br>+--------------------------------------+---------------+--------+------------+-------------+-------------------+<br><br><br><br>** Let's delete one of the nodes.<br><br><br><br>[root@os-infra-1-utility-container-e139058e ~]# nova delete<br><br>ba55deb6-9488-4455-a472-a0a957cb388a<br><br>Request to delete server ba55deb6-9488-4455-a472-a0a957cb388a has been accepted.<br><br><br><br>** After a few minutes I can see a node in RECOVERING status.<br><br><br><br>[root@os-infra-1-utility-container-e139058e ~]# openstack cluster node list<br><br>+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+<br><br>| id       | name          | index | status     | cluster_id |<br><br>physical_id | profile_name | created_at           | updated_at<br><br>  | tainted |<br><br>+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+<br><br>| d4a8f219 | node-YPsjB6bV |     6 | RECOVERING | 091fbd52   |<br><br>ba55deb6    | myserver     | 2020-09-02T21:01:47Z |<br><br>2020-09-03T04:01:58Z | 
False   |<br><br>| bc50c0b9 | node-hoiHkRcS |     7 | ACTIVE     | 091fbd52   |<br><br>38ba7f7c    | myserver     | 2020-09-03T03:40:29Z |<br><br>2020-09-03T03:57:58Z | False   |<br><br>+----------+---------------+-------+------------+------------+-------------+--------------+----------------------+----------------------+---------+<br><br><br><br>** Finally it's up and running with a new IP address.<br><br><br><br>[root@os-infra-1-utility-container-e139058e ~]# nova list<br><br>+--------------------------------------+---------------+--------+------------+-------------+-------------------+<br><br>| ID                                   | Name          | Status | Task<br><br>State | Power State | Networks          |<br><br>+--------------------------------------+---------------+--------+------------+-------------+-------------------+<br><br>| 38ba7f7c-2f5f-4502-a5d0-6c4841d6d145 | cirros_server | ACTIVE | -<br><br>      | Running     | net1=192.168.1.26 |<br><br>| 73a658cd-c40a-45d8-9b57-cc9e6c2b4dc1 | cirros_server | ACTIVE | -<br><br>      | Running     | net1=192.168.1.17 |<br><br>+--------------------------------------+---------------+--------+------------+-------------+-------------------+<br><br><br><br>On Tue, Sep 1, 2020 at 8:51 AM Mohammed Naser &lt;<a href="mailto:mnaser@vexxhost.com" target="_blank">mnaser@vexxhost.com</a>&gt; wrote:<br><br>><br><br>> Hi Satish,<br><br>><br><br>> I'm interested in this. Did you end up finding a solution?<br><br>><br><br>> Thanks,<br><br>> Mohammed<br><br>><br><br>> On Thu, Aug 27, 2020 at 1:54 PM Satish Patel &lt;<a href="mailto:satish.txt@gmail.com" target="_blank">satish.txt@gmail.com</a>&gt; wrote:<br><br>> ><br><br>> > Folks,<br><br>> ><br><br>> > I have created a very simple cluster using the following command:<br><br>> ><br><br>> > openstack cluster create --profile myserver --desired-capacity 2<br><br>> > --min-size 2 --max-size 3 --strict my-asg<br><br>> ><br><br>> > It spun up 2 VMs immediately. Now, because the desired 
capacity is 2, I<br><br>> > am assuming that if any node dies in the cluster it should spin up a node to<br><br>> > bring the count back to 2, right?<br><br>> ><br><br>> > So I killed one of the nodes with "nova delete &lt;instance-foo-1&gt;", but<br><br>> > Senlin didn't create a node automatically to restore the desired capacity of 2. (In<br><br>> > AWS, when you kill a node in an ASG it creates a new node, so is<br><br>> > Senlin different from AWS?)<br><br>> ><br><br>><br><br>><br><br>> --<br><br>> Mohammed Naser<br><br>> VEXXHOST, Inc.<br><br></blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Mohammed Naser<br>VEXXHOST, Inc.</div>