[openstack-dev] [TripleO] config-download/ansible next steps

Dmitry Tantsur dtantsur at redhat.com
Mon Jun 18 12:51:01 UTC 2018


On 06/13/2018 03:17 PM, James Slagle wrote:
> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur <dtantsur at redhat.com> wrote:
>> Slightly hijacking the thread to provide a status update on one of the items
>> :)
> 
> Thanks for jumping in.
> 
> 
>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
>> repositories, then start experimenting. I need to find a way to
>> 1. make creating nova instances no-op
>> 2. collect the required information from the created stack (I need networks,
>> ports, hostnames, initial SSH keys, capabilities, images)
>> 3. update the config-download code to optionally include the role [2]
>> I'm not entirely sure where to start, so any hints are welcome.
> 
> Here are a couple of possibilities.
> 
> We could reuse the OS::TripleO::{{role.name}}Server mappings that we
> already have in place for pre-provisioned nodes (deployed-server).
> This could be mapped to a template that exposes some Ansible tasks as
> outputs that drives metalsmith to do the deployment. When
> config-download runs, it would execute these ansible tasks to
> provision the nodes with Ironic. This has the advantage of maintaining
> compatibility with our existing Heat parameter interfaces. It removes
> Nova from the deployment so that from the undercloud perspective you'd
> roughly have:
> 
> Mistral -> Heat -> config-download -> Ironic (driven via ansible/metalsmith)

One thing that came to mind while planning this work is that I'd prefer all 
nodes to be processed in one step. This will help avoid some of the issues we 
have now. For example, the following does not work reliably:

  compute-0: any node with profile:compute
  compute-1: exactly node abcd
  control-0: any node

This has two issues that will pop up randomly:
1. compute-0 can pick node abcd designated for compute-1
2. control-0 can pick a compute node, failing either compute-0 or compute-1

This problem is hard to fix if all deployment requests are processed separately, 
but it becomes quite trivial if the decision is made based on the whole 
deployment plan. I'm going to work on a bulk scheduler like that in metalsmith.
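
To illustrate why looking at the whole plan helps, here is a rough sketch of 
such a bulk decision. It is not metalsmith code: the Node and Request classes 
and the schedule() function are hypothetical names made up for this example, 
and it uses a textbook augmenting-path bipartite matching, which may well 
differ from what metalsmith ends up doing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a bare metal node."""
    name: str
    profile: Optional[str] = None


@dataclass(frozen=True)
class Request:
    """Hypothetical request for one instance of the deployment plan."""
    hostname: str
    node: Optional[str] = None      # exact node name, if pinned
    profile: Optional[str] = None   # required profile, if any


def candidates(request: Request, nodes: List[Node]) -> List[int]:
    """Return indexes of the nodes that satisfy a single request."""
    return [
        i for i, n in enumerate(nodes)
        if (request.node is None or n.name == request.node)
        and (request.profile is None or n.profile == request.profile)
    ]


def schedule(requests: List[Request], nodes: List[Node]) -> Dict[str, str]:
    """Give every request a distinct node or raise if that is impossible."""
    owner: Dict[int, int] = {}  # node index -> request index

    def place(r: int, seen: set) -> bool:
        for n in candidates(requests[r], nodes):
            if n in seen:
                continue
            seen.add(n)
            # Take a free node, or move its current owner somewhere else.
            if n not in owner or place(owner[n], seen):
                owner[n] = r
                return True
        return False

    for r, request in enumerate(requests):
        if not place(r, set()):
            raise RuntimeError("cannot satisfy %s" % request.hostname)
    return {requests[r].hostname: nodes[n].name for n, r in owner.items()}


nodes = [Node("abcd", "compute"), Node("efgh", "compute"), Node("ijkl")]
plan = [
    Request("compute-0", profile="compute"),
    Request("compute-1", node="abcd"),
    Request("control-0"),
]
result = schedule(plan, nodes)
```

Here compute-0 grabs abcd first, but is moved to efgh as soon as compute-1 
asks for abcd explicitly, and control-0 ends up on the only node left. With 
separate requests neither correction is possible.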

> 
> A further (or completely different) iteration might look like:
> 
> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
> Step 2: Heat -> config-download

Will Step 1 still use the provided environment to figure out the count of nodes 
for each role, their images, capabilities and (optionally) precise node 
scheduling? I'm a bit worried about the last bit: IIRC we currently rely on 
Heat's %index% variable. We can, of course, ask people to replace it with 
something more explicit on upgrade.
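
As a rough sketch of what "more explicit" could mean, the snippet below 
expands %index% in scheduler hints into one set of hints per instance. The 
expand_index() helper is a hypothetical name and the hint value is only 
illustrative; this is an assumption about how a migration could look, not 
existing TripleO or Heat code.

```python
from typing import Dict, List


def expand_index(hints: Dict[str, str], count: int) -> List[Dict[str, str]]:
    """Return one copy of the hints per instance with %index% substituted."""
    return [
        {key: value.replace("%index%", str(i)) for key, value in hints.items()}
        for i in range(count)
    ]


# Illustrative input: hints for a role with two nodes.
explicit = expand_index({"capabilities:node": "controller-%index%"}, 2)
```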

> 
> Step 2 would use the pre-provisioned node (deployed-server)  feature
> already existing in TripleO and treat the just provisioned by Ironic
> nodes, as pre-provisioned from the Heat stack perspective. Step 1 and
> Step 2 would also probably be driven by a higher level Mistral
> workflow. This has the advantage of minimal impact to
> tripleo-heat-templates, and also removes Heat from the baremetal
> provisioning step. However, we'd likely need some python compatibility
> libraries that could translate Heat parameter values such as
> HostnameMap to ansible vars for some basic backwards compatibility.

Overall, I like this option better. It will allow an operator to isolate the 
bare metal provisioning step from everything else.
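
The compatibility layer mentioned above could start as something like the 
sketch below, assuming HostnameMap keeps its current meaning (default hostname 
-> custom hostname). The hostname_vars() helper and the "hostname" variable 
name are hypothetical and chosen only for this example.

```python
from typing import Any, Dict, List


def hostname_vars(parameter_defaults: Dict[str, Any],
                  default_hostnames: List[str]) -> Dict[str, Dict[str, str]]:
    """Build per-node vars, applying HostnameMap overrides where present."""
    hostname_map = parameter_defaults.get("HostnameMap") or {}
    return {
        name: {"hostname": hostname_map.get(name, name)}
        for name in default_hostnames
    }


# Illustrative input, as it would look after parsing an environment file.
env = {"HostnameMap": {"overcloud-controller-0": "ctl-0"}}
host_vars = hostname_vars(
    env, ["overcloud-controller-0", "overcloud-novacompute-0"])
```

Nodes without an entry in HostnameMap simply keep their default hostname.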

> 
>>
>> [1] https://github.com/openstack/metalsmith
>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>>
>>>
>>> Obviously we have things to consider here such as backwards compatibility
>>> and
>>> upgrades, but overall, I think this would be a great simplification to our
>>> overall deployment workflow.
>>>
>>
>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances? Maybe
>> by re-defining them to OS::Heat::None?
> 
> Not exactly, as Heat would delete the previous versions of the
> resources. We'd need some special migrations, or could support the
> existing method forever for upgrades, and only deprecate it for new
> deployments.

Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be 
OS::Heat::None, Heat will delete the old {{role.name}}Server instances on the 
next update? This is sad.

I'd prefer not to keep Nova support forever: it is going to be hard to 
maintain and to cover in the CI. Should we extend Heat to support "forgetting" 
resources? I think it may have a use case outside of TripleO.

> 
> I'd like to help with this work. I'll start by taking a look at what
> you've got so far. Feel free to reach out if you'd like some
> additional dev assistance or testing.
> 

Thanks!


