[openstack-dev] [TripleO] containerized undercloud in Queens

Alex Schultz aschultz at redhat.com
Tue Oct 3 22:03:34 UTC 2017


On Tue, Oct 3, 2017 at 2:46 PM, Dan Prince <dprince at redhat.com> wrote:
>
>
> On Tue, Oct 3, 2017 at 3:50 PM, Alex Schultz <aschultz at redhat.com> wrote:
>>
>> On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince <dprince at redhat.com> wrote:
>> > On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:
>> >> Hey Dan,
>> >>
>> >> Thanks for sending out a note about this. I have a few questions
>> >> inline.
>> >>
>> >> On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince <dprince at redhat.com>
>> >> wrote:
>> >> > One of the things the TripleO containers team is planning on
>> >> > tackling
>> >> > in Queens is fully containerizing the undercloud. At the PTG we
>> >> > created
>> >> > an etherpad [1] that contains a list of features that need to be
>> >> > implemented to fully replace instack-undercloud.
>> >> >
>> >>
>> >> I know we talked about this at the PTG, and I was skeptical that this
>> >> would land in Queens. With the exception of the Containers team
>> >> wanting this, I'm not sure there is an actual end user looking for
>> >> the feature, so I want to make sure we're not just doing more work
>> >> because we as developers think it's a good idea.
>> >
>> > I've heard from several operators that they were actually surprised we
>> > implemented containers in the Overcloud first. Validating a new
>> > deployment framework on a single-node Undercloud (for operators) before
>> > it takes over their entire cloud deployment has a lot of merit IMO.
>> > When you share the same deployment architecture across the
>> > overcloud/undercloud it puts us in a better position to decide where to
>> > expose new features to operators first (when creating the undercloud or
>> > overcloud for example).
>> >
>> > Also, if you read my email again, I've explicitly listed the
>> > "Containers" benefit last. While I think moving the undercloud to
>> > containers is a great benefit all by itself, this is more of a
>> > "framework alignment" in TripleO and gets us out of maintaining huge
>> > amounts of technical debt. Re-using the same framework for the
>> > undercloud and overcloud has a lot of merit. It effectively streamlines
>> > the development process for service developers and 3rd parties wishing
>> > to integrate some of their components on a single node. Why be forced
>> > to create a multi-node dev environment if you don't have to (if you
>> > aren't using HA, for example)?
>> >
>> > Let's be honest. While instack-undercloud helped solve the old "seed" VM
>> > issue, it was outdated the day it landed upstream. The entire premise of
>> > the tool is that it uses old-style "elements" to create the undercloud,
>> > and we moved away from those as the primary means of driving the creation
>> > of the Overcloud years ago at this point. The new 'undercloud_deploy'
>> > installer gets us back to our roots by once again sharing the same
>> > architecture to create the overcloud and undercloud. A demo from long ago
>> > expands on this idea a bit:
>> > https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s
>> >
>> > In short, we aren't just doing more work because developers think it is
>> > a good idea. This has the potential to be one of the most useful
>> > architectural changes in TripleO that we've made in years. It could
>> > significantly decrease our CI resources if we use it to replace the
>> > existing scenario jobs, which take multiple VMs per job. It is a building
>> > block we could use for other features like an HA undercloud. And yes,
>> > it does also have a huge impact on developer velocity in that many of
>> > us already prefer to use the tool as a means of streamlining our
>> > dev/test cycles to minutes instead of hours. Why spend hours running
>> > quickstart Ansible scripts when in many cases you can just doit.sh:
>> > https://github.com/dprince/undercloud_containers/blob/master/doit.sh
>> >
>>
>> So, as I've repeatedly said, I'm not completely against it, as I agree
>> that what we have is not ideal. I'm not -2; I'm -1 pending additional
>> information. I'm trying to be realistic and reduce our risk for this
>> cycle.
>
>
> I think this greatly reduces our complexity, in that once it is completed it
> will allow us to eliminate two projects (instack and instack-undercloud) and
> the maintenance thereof. Furthermore, this dovetails nicely with the Ansible
> effort.
>

I agree. I think there are some misconceptions here about my thoughts
on this effort. I am not against it; I am for it and wish to see more
of it. I want to see the effort communicated publicly via the ML and
IRC meetings. What I am against is switching the default undercloud
installation method before the containerized undercloud has the
appropriate test coverage and documentation to ensure it is on par
with what it is replacing. Does this make sense?

>>
>>  IMHO doit.sh is not acceptable as an undercloud installer, and
>> this is what I've been trying to point out as the actual impact on the
>> end user who has to use this thing.
>
>
> doit.sh is an example of where the effort is today. It is essentially the
> same stuff we document online here:
> http://tripleo.org/install/containers_deployment/undercloud.html.
>
> Similar to quickstart, it is just something meant to help you set up a dev
> environment.
>

Right, providing something that the non-developer uses and providing
something for hacking are two separate things. Making it consumable by
the end user (not the developer) is what I'm pointing out needs to be
accounted for. This is a recurring theme I have pushed for in
OpenStack: ensuring the operator (the actual end user) is accounted
for when making decisions. TripleO has not done a good job of this
either. Sure, the referenced documentation works for the dev case, but
probably not the actual deployer/operator case. There needs to be a
migration guide, or documentation mapping the old configuration to the
new configuration, for people who are familiar with the
non-containerized undercloud versus the containerized one (see the
sketch below). Do we have all the use cases accounted for, etc.? This
is the part I don't think we have figured out, and it is what I'm
asking that we account for with this effort.
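
To make the operator-versus-developer gap concrete, here is a minimal
sketch of the two entry points being discussed, assuming the Pike-era
CLI; the flags and file locations shown are assumptions for
illustration, not a documented procedure:

    # Classic (instack-undercloud) flow that operators know today:
    # edit undercloud.conf, then run the installer.
    cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf
    vi ~/undercloud.conf              # networks, local_ip, DHCP ranges, ...
    openstack undercloud install

    # Experimental containerized flow (roughly what doit.sh wraps today):
    # configuration is driven by tripleo-heat-templates environments rather
    # than undercloud.conf, which is exactly the old-config -> new-config
    # mapping a migration guide would have to cover.
    openstack undercloud deploy --templates /usr/share/openstack-tripleo-heat-templates

The gap is the point: the first flow is what existing deployers are
familiar with; the second currently assumes familiarity with t-h-t
environments and developer tooling.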

>>
>> We have an established
>> installation method for the undercloud, that while isn't great, isn't
>> a bash script with git fetches, etc.  So as for the implementation,
>> this is what I want to see properly fleshed out prior to accepting
>> this feature as complete for Queens (and the new default).
>
>
> Of course the feature would need to prove itself before it becomes the new
> default Undercloud. I'm trying to build consensus and get the team focused
> on these things.
>
> What strikes me as odd is your earlier comment about " I want to make sure
> we're not just doing more work because we as developers think it's a good
> idea." I'm a developer and I do think this is a good idea. Please don't try
> to de-motivate this effort just because you happen to believe this. It was
> accepted for Pike, but unfortunately we didn't get enough buy-in early enough
> to get focus on it. Now that is starting to change, and just as it is, you are
> suggesting we not keep it a priority?
>

Once again, I agree, and I am on board with the end goal I think this
effort is trying to achieve. What I am currently not on board with is
the Queens time frame, based on the concerns previously mentioned.
This is not about trying to demotivate the effort. It's about ensuring
quality and something that is consumable by an additional set of end
users of the software (the operator/deployer, not the developer).
Given that we have not finished the overcloud deployment work and are
still fixing items found for that, I personally feel it's a bit early
to consider switching the default undercloud install to a
containerized method. That being said, I have repeatedly stated that
if we account for updates, upgrades, docs and the operator UX, there
are no problems with this effort. I just don't think it's realistic
given the current timelines (~9 weeks). Please feel free to provide
information/patches to the contrary. I have not said don't work on it.
I just want to make sure we have all the pieces in place needed to
consider it a proper replacement for the existing undercloud
installation (by M2). If anything, there's probably more work that
needs to be done, and if we want to make it a priority, then it needs
to be documented and communicated so folks can assist as they have
cycles.

>
>>
>> I would
>> like to see a plan of what features need to be added (eg. the stuff on
>> the etherpad), folks assigned to do this work, and estimated
>> timelines.  Given that we shouldn't be making major feature changes
>> after M2 (~9 weeks), I want to get an understanding of what is
>> realistically going to make it.  If after reviewing the initial
>> details we find that it's not actually going to make M2, then let's
>> agree to this now rather than trying to force it in at the end.
>
>
> All of this is forthcoming. Those details will come in time.
>
>>
>>
>> I know you've been a great proponent of the containerized undercloud
>> and I agree it offers a lot more for development efforts. But I just
>> want to make sure that we are getting all the feedback we can before
>> continuing down this path.  Since, as you point out, a bunch of this
>> work is already available for consumption by developers, I don't see
>> making it the new default as a requirement for Queens unless it's a
>> fully implemented and tested.  There's nothing stopping folks from
>> using it now and making incremental improvements during Queens and we
>> commit to making it the new default for Rocky.
>>
>> The point of this cycle was supposed to be more stabilization/getting
>> all the containers in place. Doing something like this seems to go
>> against what we were actually trying to achieve. I'd rather make
>> smaller incremental progress, with your proposal being the end goal,
>> and agree that perhaps Rocky is more realistic for the default
>> cutover.
>
>
> I thought the point of this release was full containerization? And part of
> that is containerizing the undercloud too right?
>

Not that I was aware of. Others have asked because they were not aware
that it included the undercloud. Given that we eventually want to look
at Kubernetes, maybe we don't need to containerize the undercloud at
all, as it might be discarded with that switch. That's probably a
longer discussion and might need to be researched, which is why it's
important to understand why we're doing the containerization effort
and what exactly it entails. Given that I don't think we're looking to
deploy Kubernetes via THT/tripleo-puppet/containers, I wonder what
impact that would have on this effort. That's probably a conversation
for another thread.

>>
>>
>> > Lastly, this isn't just a containers team thing. We've been using the
>> > undercloud_deploy architecture across many teams to help develop for
>> > almost an entire cycle now. Huge benefits. I would go as far as saying
>> > that undercloud_deploy was *the* biggest feature in Pike that enabled
>> > us to bang out a majority of the docker/service templates in tripleo-
>> > heat-templates.
>> >
>> >>  Given that the etherpad
>> >> appears to contain a pretty big list of features, are we going to be
>> >> able to land all of them by M2?  Would it be beneficial to craft a
>> >> basic spec related to this to ensure we are not missing additional
>> >> things?
>> >
>> > I'm not sure there is a lot of value in creating a spec at this point.
>> > We've already got an approved blueprint for the feature in Pike here:
>> > https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud
>> >
>> > I think we might get more velocity out of grooming the etherpad and
>> > perhaps dividing this work among the appropriate teams.
>> >
>>
>> That's fine, but I would like to see additional efforts made to
>> organize this work, assign folks and add proper timelines.
>>
>> >>
>> >> > Benefits of this work:
>> >> >
>> >> >  -Alignment: aligning the undercloud and overcloud installers gets
>> >> > rid
>> >> > of dual maintenance of services.
>> >> >
>> >>
>> >> I like reusing existing stuff. +1
>> >>
>> >> >  -Composability: tripleo-heat-templates and our new Ansible
>> >> > architecture around it are composable. This means any set of
>> >> > services
>> >> > can be used to build up your own undercloud. In other words the
>> >> > framework here isn't just useful for "underclouds". It is really the
>> >> > ability to deploy TripleO on a single node with no external
>> >> > dependencies: a single-node TripleO installer. The containers team has
>> >> > already been leveraging the existing (experimental) undercloud_deploy
>> >> > installer to develop services for Pike.
>> >> >
>> >>
>> >> Is this something that is actually being asked for or is this just an
>> >> added bonus because it allows developers to reduce what is actually
>> >> being deployed for testing?
>> >
>> > There is an implied ask for this feature when a new developer starts to
>> > use TripleO. Right now the resource bar is quite high for TripleO. You have
>> > to have a multi-node development environment at the very least (one
>> > undercloud node and one overcloud node). The ideas we are talking
>> > about here short-circuit this in many cases... where if you aren't
>> > testing HA services or Ironic you could simply use undercloud_deploy to
>> > test tripleo-heat-templates changes on a single VM. Fewer resources, and
>> > much less time spent learning and waiting.
>> >
>>
>> IMHO I don't think the undercloud install is the limiting factor for
>> new developers, and I'm not sure this is actually reducing that
>> complexity. It does reduce the amount of hardware needed to develop
>> some items, but there's a cost in complexity in moving the
>> configuration to THT, which is already where many people struggle. As
>> I previously mentioned, there's nothing stopping us from promoting the
>> containerized undercloud as a development tool and ensuring it's fully
>> featured before switching to it as the default at a later date.
>
>
> Because the new undercloud_deploy installer uses t-h-t we get containers for
> free. Additionally as we convert over to Ansible instead of Heat software
> deployments we also get better operator feedback there as well. Wouldn't it
> be nice to have an Undercloud installer driven by Ansible instead of Python
> and tripleo-image-elements?

Yup, and once again I recognize this as a benefit.

>
> The reason I linked doit.sh above is that (if you actually go and look at the
> recent patches) we are already wiring these things up right now (before M1!)
> and it looks really nice. As we eventually move away from Puppet for
> configuration, that too goes away. So I think the idea here is a
> net reduction in complexity, because we no longer have to maintain
> instack-undercloud, puppet modules, and elements.
>
> It isn't that the undercloud install is a limiting factor. It is that the
> set of services making up your "Undercloud" can be anything you want because
> t-h-t supports all of our services. Anything you want with minimal t-h-t,
> Ansible, and containers. This means you can effectively develop on a single
> node for many cases and it will just work in a multi-node Overcloud setup
> too because we have the same architecture.
>

My concern is making sure we aren't moving too fast and introducing
more regressions/bugs/missing use cases/etc. My hope is that by
documenting all of this, ensuring we have proper expectations around a
definition of done (and time frames), and allowing for additional
review, we will reduce the risk introduced by this switch. These
things align with what we talked about at the PTG during the retro[0]
(see: start defining a definition of done, start status reporting on
the ML, stop over-committing, stop making big changes without tests,
less complexity, etc.). This stuff is complicated; let's make sure we
do it right.

Thanks,
-Alex

[0] http://people.redhat.com/aschultz/denver-ptg/tripleo-ptg-retro.jpg

> Dan
>
>> >>
>> >> >  -Development: The containerized undercloud is a great development
>> >> > tool. It utilizes the same framework as the full overcloud
>> >> > deployment
>> >> > but takes about 20 minutes to deploy.  This means faster
>> >> > iterations,
>> >> > less waiting, and more testing. Having this be a first-class
>> >> > citizen in the ecosystem will ensure this platform is functioning for
>> >> > developers to use all the time.
>> >> >
>> >>
>> >> This seems to go with the previous question about the re-usability for
>> >> people who are not developers. Has everyone (including non-container
>> >> folks) tried this out and attested that it's a better workflow for
>> >> them? Are there use cases that are made worse by switching?
>> >
>> > I would let others chime in, but the feedback I've gotten has mostly been
>> > that it improves the dev/test cycle greatly.
>> >
>> >>
>> >> >  -CI resources: better use of CI resources. At the PTG we received
>> >> > feedback from the OpenStack infrastructure team that our upstream
>> >> > CI
>> >> > resource usage is quite high at times (even as high as 50% of the
>> >> > total). Because of the shared framework and single-node capabilities,
>> >> > we can re-architect much of our upstream CI matrix around a single
>> >> > node.
>> >> > We no longer require multinode jobs to be able to test many of the
>> >> > services in tripleo-heat-templates... we can just use a single
>> >> > cloud VM
>> >> > instead. We'll still want multinode undercloud -> overcloud jobs
>> >> > for
>> >> > testing things like HA and baremetal provisioning. But we can cover
>> >> > a
>> >> > large set of the services (in particular many of the new scenario
>> >> > jobs
>> >> > we added in Pike) with single node CI test runs in much less time.
>> >> >
>> >>
>> >> I like this idea but would like to see more details around this.
>> >> Since this is a new feature we need to make sure that we are properly
>> >> covering the containerized undercloud with CI as well.  I think we
>> >> need 3 jobs to properly cover this feature before marking it done. I
>> >> added them to the etherpad but I think we need to ensure the
>> >> following
>> >> 3 jobs are defined and voting by M2 to consider actually switching
>> >> from the current instack-undercloud installation to the containerized
>> >> version.
>> >>
>> >> 1) undercloud-containers - a containerized install, should be voting
>> >> by M1
>> >> 2) undercloud-containers-update - minor updates run on containerized
>> >> underclouds, should be voting by M2
>> >> 3) undercloud-containers-upgrade - major upgrade from a
>> >> non-containerized to a containerized undercloud, should be voting by
>> >> M2.
>> >>
>> >> If we have these jobs, is there anything we can drop or mark as
>> >> covered that is currently being covered by an overcloud job?
>> >>
>>
>> Can you please comment on whether these expectations are achievable? If
>> they are not, I don't think we can agree to switch the default for
>> Queens. As we shipped 'undercloud deploy' as experimental for Pike,
>> it's well within reason to continue to do so for Queens. Perhaps we
>> change the labeling to beta, or work it into a --containerized option
>> for 'undercloud install'.
>>
>> I think my ask for the undercloud-containers job as non-voting by M1
>> is achievable today because it's currently green (pending any zuul
>> freezes). My concern is really that minor updates and upgrades need to
>> be understood and accounted for ASAP. If we're truly able to reuse some
>> of the work we did for O->P upgrades, then these should be fairly
>> straightforward things to accomplish and there would be fewer
>> blockers to making the switch.
>>
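
For illustration, a minimal sketch of what the converged operator UX
suggested above could look like; the --containerized flag is
hypothetical (it is only a proposal in this thread, not an existing
option) and is shown solely to make the "no UX impact" expectation
concrete:

    # Today (instack-undercloud): the flow operators already know.
    openstack undercloud install

    # Proposed: same entry point, with containerization as an opt-in detail,
    # instead of asking operators to switch to 'openstack undercloud deploy'
    # and learn tripleo-heat-templates environments.
    openstack undercloud install --containerized

Either way, the operator-facing workflow (edit undercloud.conf, run one
command) stays the same, which is what the acceptance criteria further
down ask for.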
>> >> >  -Containers: There are no plans to containerize the existing
>> >> > instack-
>> >> > undercloud work. By moving our undercloud installer to a tripleo-
>> >> > heat-
>> >> > templates and Ansible architecture we can leverage containers.
>> >> > Interestingly, the same installer also supports baremetal (package)
>> >> > installation as well at this point. Like the overcloud, however, I
>> >> > think making containers our undercloud default would better align
>> >> > the TripleO tooling.
>> >> >
>> >> > We are actively working through a few issues with the deployment
>> >> > framework Ansible effort to fully integrate that into the
>> >> > undercloud
>> >> > installer. We are also reaching out to other teams like the UI and
>> >> > Security folks to coordinate the efforts around those components.
>> >> > If
>> >> > there are any questions about the effort or you'd like to be
>> >> > involved
>> >> > in the implementation let us know. Stay tuned for more specific
>> >> > updates
>> >> > as we organize to get as much of this in M1 and M2 as possible.
>> >> >
>> >>
>> >> I would like to see weekly updates on this effort during the IRC
>> >> meeting. As previously mentioned around squad status, I'll be asking
>> >> for them during the meeting, so it would be nice to get an update on
>> >> this on a weekly basis so we can make sure that we'll be OK to cut over.
>> >>
>> >> Also, what does the cut-over plan look like? This is something that
>> >> might be beneficial to have in a spec. IMHO, I'm OK to continue
>> >> pushing the container effort using the openstack undercloud deploy
>> >> method for now. Once we have voting CI jobs and the feature list has
>> >> been covered, we can evaluate whether we've made the M2 time frame for
>> >> switching openstack undercloud deploy to be the new undercloud
>> >> install. I want to make sure we don't introduce regressions and are
>> >> doing things in a user-friendly fashion, since the undercloud is the
>> >> first intro an end user gets to TripleO. It would be a good idea to
>> >> review what the new install process looks like and make sure it "just
>> >> works", given that the current process[0] (with all its flaws) is
>> >> fairly trivial to perform.
>> >>
>>
>> Basically what I would like to see before making this new default is:
>> 1) minor updates work (with CI)
>> 2) P->Q upgrades work (with CI)
>> 3) Documentation complete
>> 4) no UX impact for installation (e.g. how they installed it before is
>> the same as they install it now with containers)
>>
>> If these are accounted for and completed before M2 then I would be +2
>> on the switch.
>>
>> >> Thanks,
>> >> -Alex
>> >>
>> >> [0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud
>> >>
>> >> > On behalf of the containers team,
>> >> >
>> >> > Dan
>> >> >
>> >> > [1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers
>> >> >