[openstack-dev] [TripleO] containerized undercloud in Queens

Alex Schultz aschultz at redhat.com
Tue Oct 3 19:50:50 UTC 2017

On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince <dprince at redhat.com> wrote:
> On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:
>> Hey Dan,
>> Thanks for sending out a note about this. I have a few questions
>> inline.
>> On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince <dprince at redhat.com>
>> wrote:
>> > One of the things the TripleO containers team is planning on
>> > tackling in Queens is fully containerizing the undercloud. At the
>> > PTG we created an etherpad [1] that contains a list of features that
>> > need to be implemented to fully replace instack-undercloud.
>> >
>> I know we talked about this at the PTG and I was skeptical that this
>> will land in Queens. With the exception of the Containers team
>> wanting this, I'm not sure there is an actual end user who is looking
>> for the feature so I want to make sure we're not just doing more work
>> because we as developers think it's a good idea.
> I've heard from several operators that they were actually surprised we
> implemented containers in the Overcloud first. Validating a new
> deployment framework on a single node Undercloud (for operators) before
> overtaking their entire cloud deployment has a lot of merit to it IMO.
> When you share the same deployment architecture across the
> overcloud/undercloud it puts us in a better position to decide where to
> expose new features to operators first (when creating the undercloud or
> overcloud for example).
> Also, if you read my email again I've explicitly listed the
> "Containers" benefit last. While I think moving the undercloud to
> containers is a great benefit all by itself this is more of a
> "framework alignment" in TripleO and gets us out of maintaining huge
> amounts of technical debt. Re-using the same framework for the
> undercloud and overcloud has a lot of merit. It effectively streamlines
> the development process for service developers, and 3rd parties wishing
> to integrate some of their components on a single node. Why be forced
> to create a multi-node dev environment if you don't have to (aren't
> using HA for example).
> Let's be honest. While instack-undercloud helped solve the old "seed" VM
> issue it was outdated the day it landed upstream. The entire premise of
> the tool is that it uses old style "elements" to create the undercloud
> and we moved away from those as the primary means driving the creation
> of the Overcloud years ago at this point. The new 'undercloud_deploy'
> installer gets us back to our roots by once again sharing the same
> architecture to create the over and underclouds. A demo from long ago
> expands on this idea a bit:
> https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s
> In short, we aren't just doing more work because developers think it is
> a good idea. This has potential to be one of the most useful
> architectural changes in TripleO that we've made in years. It could
> significantly decrease our CI resources if we use it to replace the
> existing scenario jobs which take multiple VMs per job. It is a building
> block we could use for other features like an HA undercloud. And yes,
> it does also have a huge impact on developer velocity in that many of
> us already prefer to use the tool as a means of streamlining our
> dev/test cycles to minutes instead of hours. Why spend hours running
> quickstart Ansible scripts when in many cases you can just doit.sh.
> https://github.com/dprince/undercloud_containers/blob/master/doit.sh

So like I've repeatedly said, I'm not completely against it as I agree
what we have is not ideal.  I'm not -2, I'm -1 pending additional
information. I'm trying to be realistic and reduce our risk for this
cycle.   IMHO doit.sh is not acceptable as an undercloud installer and
this is what I've been trying to point out as the actual impact to the
end user who has to use this thing. We have an established
installation method for the undercloud that, while it isn't great, isn't
a bash script with git fetches, etc.  So as for the implementation,
this is what I want to see properly fleshed out prior to accepting
this feature as complete for Queens (and the new default).  I would
like to see a plan of what features need to be added (eg. the stuff on
the etherpad), folks assigned to do this work, and estimated
timelines.  Given that we shouldn't be making major feature changes
after M2 (~9 weeks), I want to get an understanding of what is
realistically going to make it.  If after reviewing the initial
details we find that it's not actually going to make M2, then let's
agree to this now rather than trying to force it in at the end.

I know you've been a great proponent of the containerized undercloud
and I agree it offers a lot more for development efforts. But I just
want to make sure that we are getting all the feedback we can before
continuing down this path.  Since, as you point out, a bunch of this
work is already available for consumption by developers, I don't see
making it the new default as a requirement for Queens unless it's
fully implemented and tested.  There's nothing stopping folks from
using it now and making incremental improvements during Queens and we
commit to making it the new default for Rocky.

The point of this cycle was supposed to be more stabilization/getting
all the containers in place. Doing something like this seems to go
against what we were actually trying to achieve.  I'd rather make
smaller incremental progress with your proposal being the end goal and
agreeing that perhaps Rocky is more realistic for the default cut.

> Lastly, this isn't just a containers team thing. We've been using the
> undercloud_deploy architecture across many teams to help develop for
> almost an entire cycle now. Huge benefits. I would go as far as saying
> that undercloud_deploy was *the* biggest feature in Pike that enabled
> us to bang out a majority of the docker/service templates in
> tripleo-heat-templates.
>> Given that the etherpad appears to contain a pretty big list of
>> features, are we going to be
>> able to land all of them by M2?  Would it be beneficial to craft a
>> basic spec related to this to ensure we are not missing additional
>> things?
> I'm not sure there is a lot of value in creating a spec at this point.
> We've already got an approved blueprint for the feature in Pike here:
> https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud
> I think we might get more velocity out of grooming the etherpad and
> perhaps dividing this work among the appropriate teams.

That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.

>> > Benefits of this work:
>> >
>> >  -Alignment: aligning the undercloud and overcloud installers gets
>> > rid of dual maintenance of services.
>> >
>> I like reusing existing stuff. +1
>> >  -Composability: tripleo-heat-templates and our new Ansible
>> > architecture around it are composable. This means any set of
>> > services can be used to build up your own undercloud. In other
>> > words the framework here isn't just useful for "underclouds". It is
>> > really the ability to deploy TripleO on a single node with no
>> > external dependencies. Single node TripleO installer. The containers
>> > team has already been leveraging the existing (experimental)
>> > undercloud_deploy installer to develop services for Pike.
>> >
>> Is this something that is actually being asked for or is this just an
>> added bonus because it allows developers to reduce what is actually
>> being deployed for testing?
> There is an implied ask for this feature when a new developer starts to
> use TripleO. Right now the resource bar is quite high for TripleO. You
> have to have a multi-node development environment at the very least (one
> undercloud node, and one overcloud node). The ideas we are talking
> about here short-circuit this in many cases... where if you aren't
> testing HA services or Ironic you could simply use undercloud_deploy to
> test tripleo-heat-template changes on a single VM. Less resources, and
> much less time spent learning and waiting.

IMHO I don't think the undercloud install is the limiting factor for
new developers and I'm not sure this is actually reducing that
complexity.  It does reduce the amount of hardware needed to develop
some items, but there's a cost in complexity by moving the
configuration to THT which is already where many people struggle.  As
I previously mentioned, there's nothing stopping us from promoting the
containerized undercloud as a development tool and ensuring it's fully
featured before switching to it as the default at a later date.

>> >  -Development: The containerized undercloud is a great development
>> > tool. It utilizes the same framework as the full overcloud
>> > deployment but takes about 20 minutes to deploy.  This means faster
>> > iterations, less waiting, and more testing.  Having this be a first
>> > class citizen in the ecosystem will ensure this platform is
>> > functioning for developers to use all the time.
>> >
>> Seems to go with the previous question about the re-usability for
>> people who are not developers.  Has everyone (including non-container
>> folks) tried this out and attested that it's a better workflow for
>> them? Are there use cases that are made worse by switching?
> I would let others chime in but the feedback I've gotten has mostly been
> that it improves the dev/test cycle greatly.
>> >  -CI resources: better use of CI resources. At the PTG we received
>> > feedback from the OpenStack infrastructure team that our upstream
>> > CI resource usage is quite high at times (even as high as 50% of
>> > the total). Because of the shared framework and single node
>> > capabilities we can re-architect much of our upstream CI matrix
>> > around single node. We no longer require multinode jobs to be able
>> > to test many of the services in tripleo-heat-templates... we can
>> > just use a single cloud VM instead. We'll still want multinode
>> > undercloud -> overcloud jobs for testing things like HA and
>> > baremetal provisioning. But we can cover a large set of the
>> > services (in particular many of the new scenario jobs we added in
>> > Pike) with single node CI test runs in much less time.
>> >
>> I like this idea but would like to see more details around this.
>> Since this is a new feature we need to make sure that we are properly
>> covering the containerized undercloud with CI as well.  I think we
>> need 3 jobs to properly cover this feature before marking it done. I
>> added them to the etherpad but I think we need to ensure the
>> following 3 jobs are defined and voting by M2 to consider actually
>> switching from the current instack-undercloud installation to the
>> containerized version.
>> 1) undercloud-containers - a containerized install, should be voting
>>    by m1
>> 2) undercloud-containers-update - minor updates run on containerized
>>    underclouds, should be voting by m2
>> 3) undercloud-containers-upgrade - major upgrade from
>>    non-containerized to containerized undercloud, should be voting by
>>    m2.
>> If we have these jobs, is there anything we can drop or mark as
>> covered that is currently being covered by an overcloud job?

Can you please comment on these expectations as being achievable?  If
they are not achievable, I don't think we can agree to switch the
default for Queens.  As we shipped the 'undercloud deploy' as
experimental for Pike, it's well within reason to continue to do so
for Queens. Perhaps we change the labeling to beta or work it into
a --containerized option for 'undercloud install'.

I think my ask for the undercloud-containers job as non-voting by m1
is achievable today because it's currently green (pending any zuul
freezes). My concern is really that minor updates and upgrades need to
be understood and accounted for ASAP.  If we're truly able to reuse some
of the work we did for O->P upgrades, then these should be fairly
straightforward things to accomplish and there would be fewer
blockers to make the switch.

>> >  -Containers: There are no plans to containerize the existing
>> > instack-undercloud work. By moving our undercloud installer to a
>> > tripleo-heat-templates and Ansible architecture we can leverage
>> > containers. Interestingly, the same installer also supports
>> > baremetal (package) installation as well at this point. Like the
>> > overcloud, however, I think making containers our undercloud
>> > default would better align the TripleO tooling.
>> >
>> > We are actively working through a few issues with the deployment
>> > framework Ansible effort to fully integrate that into the
>> > undercloud installer. We are also reaching out to other teams like
>> > the UI and Security folks to coordinate the efforts around those
>> > components. If there are any questions about the effort or you'd
>> > like to be involved in the implementation let us know. Stay tuned
>> > for more specific updates as we organize to get as much of this in
>> > M1 and M2 as possible.
>> >
>> I would like to see weekly updates on this effort during the IRC
>> meeting. As previously mentioned around squad status, I'll be asking
>> for them during the meeting so it would be nice to get an update on
>> this on a weekly basis so we can make sure that we'll be OK to cut over.
>> Also what does the cut over plan look like?  This is something that
>> might be beneficial to have in a spec. IMHO, I'm ok to continue
>> pushing the container effort using the openstack undercloud deploy
>> method for now. Once we have voting CI jobs and the feature list has
>> been covered then we can evaluate if we've made the M2 time frame for
>> switching openstack undercloud deploy to be the new undercloud
>> install.  I want to make sure we don't introduce regressions and are
>> doing things in a user-friendly fashion since the undercloud is the
>> first intro an end user gets to tripleo. It would be a good idea to
>> review what the new install process looks like and make sure it "just
>> works" given that the current process[0] (with all its flaws) is
>> fairly trivial to perform.

Basically what I would like to see before making this the new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) Documentation complete
4) no UX impact for installation (eg. how they installed it before is
the same as how they install it now for containers)

If these are accounted for and completed before M2 then I would be +2
on the switch.

>> Thanks,
>> -Alex
>> [0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud
>> > On behalf of the containers team,
>> >
>> > Dan
>> >
>> > [1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers
>> >
>> > __________________________________________________________________________
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
