[openstack-dev] [TripleO] config-download/ansible next steps

Steven Hardy shardy at redhat.com
Mon Jun 18 13:43:27 UTC 2018


On Mon, Jun 18, 2018 at 1:51 PM, Dmitry Tantsur <dtantsur at redhat.com> wrote:
> On 06/13/2018 03:17 PM, James Slagle wrote:
>>
>> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur <dtantsur at redhat.com>
>> wrote:
>>>
>>> Slightly hijacking the thread to provide a status update on one of the
>>> items :)
>>
>>
>> Thanks for jumping in.
>>
>>
>>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
>>> repositories, then start experimenting. I need to find a way to
>>> 1. make creating nova instances no-op
>>> 2. collect the required information from the created stack (I need
>>> networks, ports, hostnames, initial SSH keys, capabilities, images)
>>> 3. update the config-download code to optionally include the role [2]
>>> I'm not entirely sure where to start, so any hints are welcome.
>>
>>
>> Here are a couple of possibilities.
>>
>> We could reuse the OS::TripleO::{{role.name}}Server mappings that we
>> already have in place for pre-provisioned nodes (deployed-server).
>> This could be mapped to a template that exposes some Ansible tasks as
>> outputs that drive metalsmith to do the deployment. When
>> config-download runs, it would execute these ansible tasks to
>> provision the nodes with Ironic. This has the advantage of maintaining
>> compatibility with our existing Heat parameter interfaces. It removes
>> Nova from the deployment so that from the undercloud perspective you'd
>> roughly have:
>>
>> Mistral -> Heat -> config-download -> Ironic (driven via
>> ansible/metalsmith)
>
>
> One thing that came to my mind while planning this work is that I'd prefer
> all nodes to be processed in one step. This will help avoid some issues
> that we have now. For example, the following does not work reliably:
>
>  compute-0: just any profile:compute
>  compute-1: precise node=abcd
>  control-0: any node
>
> This has two issues that will pop up randomly:
> 1. compute-0 can pick node abcd designated for compute-1
> 2. control-0 can pick a compute node, failing either compute-0 or compute-1
>
> This problem is hard to fix if all deployment requests are processed
> separately, but is quite trivial if the decision is made based on the whole
> deployment plan. I'm going to work on a bulk scheduler like that in
> metalsmith.
>
>>
>> A further (or completely different) iteration might look like:
>>
>> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
>> Step 2: Heat -> config-download
>
>
> Step 1 will still use the provided environment to figure out the count of nodes
> for each role, their images, capabilities and (optionally) precise node
> scheduling?
> I'm a bit worried about the last bit: IIRC we rely on Heat's %index%
> variable currently. We can, of course, ask people to replace it with
> something more explicit on upgrade.
>
>>
>> Step 2 would use the pre-provisioned node (deployed-server) feature
>> already existing in TripleO and treat the nodes just provisioned by
>> Ironic as pre-provisioned from the Heat stack perspective. Step 1 and
>> Step 2 would also probably be driven by a higher level Mistral
>> workflow. This has the advantage of minimal impact to
>> tripleo-heat-templates, and also removes Heat from the baremetal
>> provisioning step. However, we'd likely need some python compatibility
>> libraries that could translate Heat parameter values such as
>> HostnameMap to ansible vars for some basic backwards compatibility.
>
>
> Overall, I like this option better. It will allow an operator to isolate the
> bare metal provisioning step from everything else.
>
>>
>>>
>>> [1] https://github.com/openstack/metalsmith
>>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>>>
>>>>
>>>> Obviously we have things to consider here such as backwards
>>>> compatibility and upgrades, but overall, I think this would be a great
>>>> simplification to our overall deployment workflow.
>>>>
>>>
>>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances?
>>> Maybe by re-defining them to OS::Heat::None?
>>
>>
>> Not exactly, as Heat would delete the previous versions of the
>> resources. We'd need some special migrations, or could support the
>> existing method forever for upgrades, and only deprecate it for new
>> deployments.
>
>
> Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be
> OS::Heat::None, Heat will delete the old {{role.name}}Server instances on
> the next update? This is sad..
>
> I'd prefer not to keep Nova support forever; this is going to be hard to
> maintain and cover in the CI. Should we extend Heat to support "forgetting"
> resources? I think it may have a use case outside of TripleO.

This is already supported; it's just not the default:

https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#resources-section

You can use e.g. deletion_policy: retain to skip the deletion of the
underlying Heat-managed resource.
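
For illustration, a minimal sketch of how that might look in a HOT
template. The resource name and the image/flavor values are hypothetical
placeholders, not taken from tripleo-heat-templates. The lowercase
"retain" spelling should be accepted from heat_template_version
2016-10-14 onwards; older template versions would need "Retain".

```yaml
heat_template_version: 2016-10-14

resources:
  # Hypothetical resource name, for illustration only.
  example_server:
    type: OS::Nova::Server
    # Keep the underlying Nova instance when Heat deletes this resource.
    deletion_policy: retain
    properties:
      # Placeholder values; assumed to exist in the target cloud.
      image: example-image
      flavor: example-flavor
```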

Steve


