[tripleo] Scale up/down Ansible tasks
Hi folks,

Today I spent a bit of time on:
https://blueprints.launchpad.net/tripleo/+spec/scale-down-tasks

This basically adds the capability of running Ansible tasks before a node is removed during a scale-down, or after a scale-up. I'm focusing on scale-down right now, as I know it's something people have been waiting for (e.g. RHSM unsubscribe, Ceph OSD teardown, Nova Compute, etc.).

I need input from folks now on what kinds of tasks would be needed; I will test them and make sure the interface we provide is sufficient. John, Ollie, and Martin (in copy) may have some ideas. Please send me some examples of Ansible tasks that you want to run before a node is deleted in Ironic.

Prototype: https://review.openstack.org/#/q/topic:bp/scale-down-tasks+(status:open+OR+s...)

Thanks a lot,
-- Emilien Macchi
On Wed, Apr 10, 2019 at 5:58 PM Emilien Macchi <emilien@redhat.com> wrote:
Hi folks,
Today I spent a bit of time on: https://blueprints.launchpad.net/tripleo/+spec/scale-down-tasks
This basically adds the capability of running Ansible tasks before a node is removed during a scale-down, or after a scale-up. I'm focusing on scale-down right now, as I know it's something people have been waiting for (e.g. RHSM unsubscribe, Ceph OSD teardown, Nova Compute, etc.).
I need input from folks now on what kinds of tasks would be needed; I will test them and make sure the interface we provide is sufficient. John, Ollie, and Martin (in copy) may have some ideas. Please send me some examples of Ansible tasks that you want to run before a node is deleted in Ironic.
In the Ceph case, ceph-ansible has playbooks to correctly handle the scale-down of different Ceph services, e.g. deleting a monitor [1] or deleting an OSD [2]. The process would be to generate a ceph-ansible inventory, e.g. the same way we do when we scale up [3], and then execute one of those playbooks with that inventory. Examples of running these playbooks are in Seb's blog [4].

This would be a great feature to have, because if you just delete the node without telling the Ceph cluster that it is no longer part of the cluster, Ceph will keep looking for it and will not be happy. It's better to tell the Ceph cluster not to worry about a particular node anymore by running one of these playbooks before the node is deleted.

Thanks,
John

[1] https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/sh...
[2] https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/sh...
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/c...
[4] https://www.sebastien-han.fr/blog/2016/08/16/Ceph-ansible-can-now-shrink-you...
On Wed, Apr 10, 2019 at 6:30 PM John Fulton <johfulto@redhat.com> wrote:
In the Ceph case, ceph-ansible has playbooks to correctly handle the scale-down of different Ceph services, e.g. deleting a monitor [1] or deleting an OSD [2]. The process would be to generate a ceph-ansible inventory, e.g. the same way we do when we scale up [3], and then execute one of those playbooks with that inventory. Examples of running these playbooks are in Seb's blog [4].
This would be a great feature to have, because if you just delete the node without telling the Ceph cluster that it is no longer part of the cluster, Ceph will keep looking for it and will not be happy. It's better to tell the Ceph cluster not to worry about a particular node anymore by running one of these playbooks before the node is deleted.
So if I'm not mistaken, these tasks need to run within the mistral_executor on the Undercloud against a generated ceph-ansible inventory, which means no tasks are run locally on the hosts themselves. Let me know if I'm wrong; I'll make sure this works for the scale tasks.
-- Emilien Macchi
For nova/neutron it would be to disable the service/agent:

(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service set [hostname] nova-compute --disable
(overcloud) $ openstack network agent list
(overcloud) $ openstack network agent set --disable [openvswitch-agent-id]

After the service is stopped or the host is deleted:

(overcloud) $ openstack compute service delete [service-id]
(overcloud) $ openstack network agent delete [openvswitch-agent-id]

Regards,
Martin

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/...
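As a rough illustration, the manual steps above could be wrapped in two small shell functions that a scale-down task might call from the Undercloud. This is only a sketch, not the interface from the blueprint: the function names and the OPENSTACK override variable are hypothetical, and it assumes overcloud credentials are already loaded in the environment.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes overcloud credentials are already loaded.
# OPENSTACK is a hypothetical override: set OPENSTACK=echo to preview
# the commands instead of running them.
OPENSTACK="${OPENSTACK:-openstack}"

# Step 1: disable nova-compute and the Open vSwitch agent on the node.
# Arguments: compute hostname, openvswitch agent ID.
disable_node() {
    local host="$1" agent_id="$2"
    "$OPENSTACK" compute service set "$host" nova-compute --disable
    "$OPENSTACK" network agent set --disable "$agent_id"
}

# Step 2: once the service is stopped or the host is deleted,
# remove the leftover service and agent records.
# Arguments: compute service ID, openvswitch agent ID.
delete_node_records() {
    local service_id="$1" agent_id="$2"
    "$OPENSTACK" compute service delete "$service_id"
    "$OPENSTACK" network agent delete "$agent_id"
}
```

Running with OPENSTACK=echo prints the commands that would be issued, which is a cheap way to check the IDs before touching the APIs.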
On Thu, 11 Apr 2019 at 08:23, Martin Schuppert <mschuppert@redhat.com> wrote:
For nova/neutron it would be to disable the service/agent:
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service set [hostname] nova-compute --disable
(overcloud) $ openstack network agent list
(overcloud) $ openstack network agent set --disable [openvswitch-agent-id]

After the service is stopped or the host is deleted:

(overcloud) $ openstack compute service delete [service-id]
(overcloud) $ openstack network agent delete [openvswitch-agent-id]
Regards, Martin
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/...
It might be worth confirming that there are no instances running on the node too.

Cheers,
Ollie
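One possible way to express that check, as a sketch: list the instances scheduled on the host and refuse to continue if any are found. The function name and the OPENSTACK override variable are hypothetical, and filtering by host is assumed to be run with admin credentials on the overcloud.

```shell
#!/usr/bin/env bash
# Sketch only. OPENSTACK is a hypothetical override used for testing.
OPENSTACK="${OPENSTACK:-openstack}"

# Returns 0 when no instance is found on the given host,
# 1 when at least one instance is found, 2 when the listing itself fails.
node_is_empty() {
    local host="$1" ids
    ids="$("$OPENSTACK" server list --all-projects --host "$host" -f value -c ID)" || return 2
    [ -z "$ids" ]
}
```

A scale-down task could call `node_is_empty [hostname]` first and stop with an error if it fails, so that instances are migrated or deleted before the node is removed.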
OK, so these commands would need to be executed from the Undercloud in the mistral_executor container, since they do nothing on the local nodes and only call the APIs through the CLI. I've taken note and will make sure that's possible.

Do you folks already have playbooks that do these things, or should we start from scratch?
-- Emilien Macchi
On Thu, Apr 11, 2019 at 2:20 PM Emilien Macchi <emilien@redhat.com> wrote:
OK, so these commands would need to be executed from the Undercloud in the mistral_executor container, since they do nothing on the local nodes and only call the APIs through the CLI. I've taken note and will make sure that's possible.
Do you folks already have playbooks that do these things, or should we start from scratch?
No, right now those are manual tasks, but Rajesh could help on this.

Regards,
Martin
On Thu, Apr 11, 2019 at 8:28 AM Martin Schuppert <mschuppert@redhat.com> wrote:
No, right now those are manual tasks, but Rajesh could help on this.
I'm prototyping it in https://review.openstack.org/#/c/653893/. I'll iterate on that patch, and once it works I'll tackle Neutron. Feedback is welcome!
-- Emilien Macchi
Participants (4)
- Emilien Macchi
- John Fulton
- Martin Schuppert
- Oliver Walsh