[TripleO] Scaling node counts with only Ansible (N=1)
James Slagle
james.slagle at gmail.com
Tue Jul 16 12:15:02 UTC 2019
On Mon, Jul 15, 2019 at 2:25 PM Dan Sneddon <dsneddon at redhat.com> wrote:
>
>
>
> On Mon, Jul 15, 2019 at 2:13 AM Harald Jensås <hjensas at redhat.com> wrote:
>>
>> On Sat, 2019-07-13 at 16:19 -0400, James Slagle wrote:
>> > On Fri, Jul 12, 2019 at 3:59 PM Harald Jensås <hjensas at redhat.com>
>> > wrote:
>> > > I've said this before, but I think we should turn this nova-less
>> > > around. Now with nova-less we create a bunch of servers, and write
>> > > up the parameters file to use the deployed-server approach.
>> > > Effectively we still need to have the resource group in heat making
>> > > a server resource for every server. Creating the fake server
>> > > resource is fast, because Heat doesn't call Nova/Ironic to create
>> > > any resources. But the stack is equally big, with a stack for every
>> > > node, i.e. not N=1.
>> > >
>> > > What you are doing here, is essentially to say we don't create a
>> > > resource group that then creates N number of role stacks, one for
>> > > each overcloud node. You are creating a single generic "server"
>> > > definition per Role. So we drop the resource group and create
>> > > OS::TripleO::{{Role}}.Server once. To me it's backwards to push a
>> > > large struct with properties for N=many nodes into the creation of
>> > > that stack.
>> >
>> > I'm not entirely following what you're saying is backwards. What I've
>> > proposed is that we *don't* have any node specific data in the stack.
>> > It sounds like you're saying the way we do it today is backwards.
>> >
>>
>> What I mean to say is that the way we are integrating nova-less
>> today, by first deploying the servers and then providing the data to
>> Heat to create the resource groups, becomes backwards once your work
>> on N=1 is introduced.
>>
>>
>> > It's correct that what's been proposed with metalsmith currently
>> > still
>> > requires the full ResourceGroup with a member for each node. With the
>> > template changes I'm proposing, that wouldn't be required, so we
>> > could
>> > actually do the Heat stack first, then metalsmith.
>> >
>>
>> Yes, this is what I think we should do. Especially if your changes
>> here remove the resource group entirely. It makes more sense to
>> create the stack, and once that is created we can do deployment,
>> scaling, etc. without updating the stack again.
>>
>> > > Currently the puppet/role-role.yaml creates all the network ports
>> > > etc.
>> > > As you only want to create it once, it instead could simply output
>> > > the
>> > > UUID of the networks+subnets. These are identical for all servers
>> > > in
>> > > the role. So we end up with a small heat stack.
>> > >
>> > > Once the stack is created we could use that generic "server" role
>> > > data to feed into something (ansible?, python?, mistral?) that
>> > > calls metalsmith to build the servers, then create ports for each
>> > > server in neutron, one port for each network+subnet defined in the
>> > > role. Then feed that output into the json (hieradata) that is
>> > > pushed to each node and used during service configuration, all the
>> > > things we need to configure network interfaces, /etc/hosts and so
>> > > on. We need a way to keep track of which ports belong to which
>> > > node, but I guess something simple like using the node's ironic
>> > > UUID in either the name, description or tag field of the neutron
>> > > port will work. There is also the extra field in Ironic, which is
>> > > of json type, so we could place a map of network->port_uuid in
>> > > there as well.
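
For illustration, the per-node port creation and tag-based tracking
described above could look roughly like the following with openstacksdk.
The network names, the node UUID, and the tag filtering are just
assumptions on my part, not a worked-out interface:

    # Rough sketch: create one Neutron port per composable network for a
    # node and tag it with the node's Ironic UUID so the node -> port
    # mapping can be recovered later. Names and UUIDs are placeholders.
    import openstack

    conn = openstack.connect(cloud='undercloud')
    node_uuid = '6d85703d-aaaa-bbbb-cccc-000000000001'  # example Ironic node UUID

    for net_name in ('internal_api', 'storage', 'tenant'):
        network = conn.network.find_network(net_name)
        port = conn.network.create_port(
            network_id=network.id,
            name='%s-%s' % (net_name, node_uuid),
        )
        # Record the node -> port relationship on the port itself.
        conn.network.set_tags(port, [node_uuid])

    # Later, recover the node's ports (one per network) by tag.
    node_ports = list(conn.network.ports(tags=node_uuid))
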
>> >
>> > It won't matter whether we do baremetal provisioning before or after
>> > the Heat stack. Heat won't care, as it won't have any expectation to
>> > create any servers or that they are already created. We can define
>> > where we end up calling the metalsmith piece as it should be
>> > independent of the Heat stack if we make these template changes.
>> >
>>
>> This is true. But, in your previous mail in this thread you wrote:
>>
>> """
>> Other points:
>>
>> - Baremetal provisioning and port creation are presently handled by
>> Heat. With the ongoing efforts to migrate baremetal provisioning out
>> of Heat (nova-less deploy), I think these efforts are very
>> complementary. Eventually, we get to a point where Heat is not
>> actually creating any other OpenStack API resources. For now, the
>> patches only work when using pre-provisioned nodes.
>> """
>>
>> IMO "baremetal provisioning and port creation" fit together. (I read
>> the above statement that way as well.) Currently nova-less creates
>> the ctlplane port and provisions the baremetal node. If we want to do
>> both baremetal provisioning and port creation together (I think this
>> makes sense), we have to do it after the stack has created the
>> networks.
>>
>> What I envision is to have one method that creates all the ports,
>> ctlplane + composable networks, in a unified way. Today these are
>> created differently: the ctlplane port is part of the server resource
>> (or created by metalsmith in the nova-less case) and the other ports
>> are created by heat.
>
>
> This is my main question about this proposal. When TripleO was in its infancy, there wasn't a mechanism to create Neutron ports separately from the server, so we created a Nova Server resource that specified which network the port was on (originally there was only one port created, now we create additional ports in Neutron). This can be seen in the puppet/<role>-role.yaml file, for example:
>
> resources:
>   Controller:
>     type: OS::TripleO::ControllerServer
>     deletion_policy: {get_param: ServerDeletionPolicy}
>     metadata:
>       os-collect-config:
>         command: {get_param: ConfigCommand}
>         splay: {get_param: ConfigCollectSplay}
>     properties:
>       [...]
>       networks:
>         - if:
>           - ctlplane_fixed_ip_set
>           - network: ctlplane
>             subnet: {get_param: ControllerControlPlaneSubnet}
>             fixed_ip:
>               yaql:
>                 expression: $.data.where(not isEmpty($)).first()
>                 data:
>                   - get_param: [ControllerIPs, 'ctlplane', {get_param: NodeIndex}]
>           - network: ctlplane
>             subnet: {get_param: ControllerControlPlaneSubnet}
>
> This has the side-effect that the ports are created by Nova calling Neutron rather than by Heat calling Neutron for port creation. We have maintained this mechanism even in the latest versions of THT for backwards compatibility. This would all be easier if we were creating the Neutron ctlplane port and then assigning it to the server, but that breaks backwards-compatibility.
>
> How would the creation of the ctlplane port be handled in this proposal? If metalsmith is creating the ctlplane port, do we still need a separate Server resource for every node? If so, I imagine it would have a much smaller stack than what we currently create for each server. If not, would metalsmith create a port on the ctlplane as part of the provisioning steps, and then pass this port back? We still need to be able to support fixed IPs for ctlplane ports, so we need to be able to pass a specific IP to metalsmith.
I think most of your questions pertain to defining the right interface
for baremetal provisioning with metalsmith. We more or less have a
clean slate there in terms of how we want that to look going forward.
Given that it won't use Nova, my understanding is that the port(s)
will be created via Neutron directly.
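As a very rough sketch (not a committed interface), that could look
something like pre-creating the ctlplane port with the operator's fixed
IP and handing it to metalsmith. The cloud name, network and subnet
names, image, and exact metalsmith arguments below are assumptions on
my part:

    # Sketch only: pre-create the ctlplane port with a fixed IP in Neutron,
    # then pass it to metalsmith instead of letting provisioning pick an
    # address. All names and values here are examples.
    import openstack
    from metalsmith import Provisioner

    conn = openstack.connect(cloud='undercloud')
    ctlplane = conn.network.find_network('ctlplane')
    subnet = conn.network.find_subnet('ctlplane-subnet')

    port = conn.network.create_port(
        network_id=ctlplane.id,
        fixed_ips=[{'subnet_id': subnet.id, 'ip_address': '192.168.24.10'}],
    )

    provisioner = Provisioner(
        cloud_region=openstack.config.get_cloud_region(cloud='undercloud'))
    node = provisioner.reserve_node(resource_class='baremetal')
    provisioner.provision_node(
        node,
        image='overcloud-full',
        nics=[{'port': port.id}],  # attach the pre-created ctlplane port
        hostname='overcloud-controller-0',
    )
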
We won't need separate server resources in the stack for every node
once provisioning is not part of the stack. We will, however, need to
look at how we are creating the other network isolation ports per
server, and decide whether we want to keep using Neutron just for
IPAM. It seems a little wasteful to me, but perhaps it's not an issue
even with thousands of ports.
Initially, you'd be able to scale with just Ansible as long as the
operator does not mistakenly use overlapping IPs. We could also add
ansible tasks that create the ports in Neutron (or verify they were
already created) so that the actual IPAM usage is properly reflected
in Neutron.
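
Sketched in Python with openstacksdk (rather than as the actual
Ansible task we'd really add), and with made-up names, the "create or
verify" part would be roughly:

    # Illustrative only: make sure a port exists in Neutron for an address
    # Ansible is already configuring on a node, so Neutron's IPAM reflects
    # what is actually in use. An Ansible task/module would wrap this logic.
    import openstack

    def ensure_port(conn, network_name, subnet_name, ip_address, port_name):
        network = conn.network.find_network(network_name)
        subnet = conn.network.find_subnet(subnet_name)
        # Verify: is this IP already represented by a port on the network?
        for port in conn.network.ports(network_id=network.id):
            if any(ip.get('ip_address') == ip_address
                   for ip in port.fixed_ips):
                return port
        # Create it if not, so the IPAM usage is properly reflected.
        return conn.network.create_port(
            network_id=network.id,
            name=port_name,
            fixed_ips=[{'subnet_id': subnet.id, 'ip_address': ip_address}],
        )

    conn = openstack.connect(cloud='undercloud')
    ensure_port(conn, 'internal_api', 'internal_api_subnet',
                '172.16.2.10', 'overcloud-controller-0-internal_api')
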
--
-- James Slagle