[openstack-dev] [tripleo] let's talk (development) environment deployment tooling and workflows

Alex Schultz aschultz at redhat.com
Wed Sep 21 16:37:10 UTC 2016


On Wed, Sep 21, 2016 at 9:00 AM, John Trowbridge <trown at redhat.com> wrote:
>
>
>
> On 09/19/2016 01:21 PM, Steven Hardy wrote:
> > Hi Alex,
> >
> > Firstly, thanks for this detailed feedback - it's very helpful to have
> > someone with a fresh perspective look at the day-1 experience for TripleO,
> > and while some of what follows are "known issues", it's great to get some
> > perspective on them, as well as ideas re how we might improve things.
> >
> > On Thu, Sep 15, 2016 at 09:09:24AM -0600, Alex Schultz wrote:
> >> Hi all,
> >>
> >> I've recently started looking at the various methods for deploying and
> >> developing tripleo.  What I would like to bring up is the current
> >> combination of the tooling for managing the VM instances and the
> >> actual deployment method to launch the undercloud/overcloud
> >> installation.  While running through the various methods and reading
> >> up on the documentation, I'm concerned that they are not currently
> >> flexible enough for a developer (or operator for that matter) to be
> >> able to set up the various environment configurations for testing
> >> deployments and doing development.  Additionally, I ran into issues
> >> just trying to get them working at all, which probably doesn't help
> >> when trying to attract new contributors.  The focus of this email
> >> and of my experience relates to the workflow-simplification
> >> spec[0].  I would like to share my experiences with the various
> >> tooling available and raise some ideas.
> >>
> >> Example Situation:
> >>
> >> For example, I have a laptop with 16G of RAM and an SSD and I'd like
> >> to get started with tripleo.  How can I deploy tripleo?
> >
> > So, this is probably problem #1, because while I have managed to deploy a
> > minimal TripleO environment on a laptop with 16G of RAM, I think it's
> > pretty widely known that it's not really enough (certainly with our default
> > configuration, which has unfortunately grown over time as more and more
> > things got integrated).
> >
> > I see two options here:
> >
> > 1. Document the reality (which is that you really need a physical machine
> > with at least 32G of RAM unless you're prepared to deal with swapping).
> >
> > 2. Look at providing a "TripleO lite" install option, which disables some
> > services (both on the undercloud and default overcloud install).
> >
> > Either of these is definitely possible, but (2) seems like the best
> > long-term solution (although it probably means another CI job).
> >
> >> Tools:
> >>
> >> instack:
> >>
> >> I started with the tripleo docs[1] that reference using the instack
> >> tools for virtual environment creation while deploying tripleo.   The
> >> docs say you need at least 12G of RAM[2].  The docs lie (step 7[3]).
> >> So after basically shutting everything down and letting it deploy with
> >> all my RAM, the deployment fails because the undercloud runs out of
> >> RAM and the OOM killer kills off heat.  This was not because I had
> >> reduced the amount of RAM for the undercloud node or anything.  It was
> >> because, by default, the undercloud is configured with 6GB of RAM and
> >> no swap (not sure if this is a bug?).  So I added a swap file to the
> >> undercloud and continued.  My next adventure was having the overcloud
> >> deployment fail for lack of memory, as puppet fails trying to spawn a
> >> process and gets denied.  The instack method does not configure swap
> >> for the VMs that are deployed, and the deployment did not work with 5GB
> >> of RAM for each node.  So with a full 16GB I was unable to follow the
> >> documentation and use instack to deploy successfully.  At this point I
> >> switched over to trying tripleo-quickstart.  (I did eventually figure
> >> out a configuration with instack that deployed, once I worked out how
> >> to enable swap for the overcloud nodes.)
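
[Side note, in case anyone hits the same OOM: the swap workaround I used
on the undercloud was roughly the following.  The 4G size is just what
happened to fit for me, not a recommendation:

    # create and enable a swap file on the undercloud (sizes illustrative)
    sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # make it persistent across reboots
    echo '/swapfile none swap defaults 0 0' | sudo tee -a /etc/fstab

Something equivalent had to be arranged for the overcloud nodes as well
before the deployment would finish.]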
> >
> > Yeah, so this definitely exposes that we need to update the docs, and also
> > provide an easy install-time option to enable swap on all-the-things for
> > memory-constrained environments.
> >
> >> tripleo-quickstart:
> >>
> >> The next thing I attempted to use was the tripleo-quickstart[4].
> >> Following the directions I attempted to deploy against my localhost.
> >> It turns out that doesn't work as expected since ansible likes to do
> >> magic when dealing with localhost[5].  Ultimately I was unable to get
> >> it working against my laptop locally because I ran into some libvirt
> >> issues.  But I was able to get it to work when I pointed it at a
> >> separate machine.  It should be noted that tripleo-quickstart creates
> >> an undercloud with swap, which was nice because then it actually works,
> >> but it is an inconsistent experience depending on which tool you use
> >> for your deployment.
> >
> > Yeah, so while a lot of folks have good luck with tripleo-quickstart, it
> > has the disadvantage of not currently being the tool used in upstream
> > TripleO CI (which folks have looked at fixing, but it's not yet happened).
> >
> > The original plan was for tripleo-quickstart to completely replace the
> > instack-virt-setup workflow:
> >
> > https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart
> >
> > But for a variety of reasons, we never quite got to that - we may need a
> > summit discussion on the path forward here.
> >
> > For me (as an upstream developer) it really boils down to the CI usage
> > issue - at all times I want to use the tool which gets me closest to what
> > runs in upstream CI (where, although we actually use instack-virt-setup,
> > we otherwise follow the tripleo-docs procedure pretty closely, using a
> > helper script called tripleo.sh, which you can run locally):
> >
> > http://paste.fedoraproject.org/431073/30480114
> >
> This kind of feels like quickstart FUD to me.
>
> CI does not run tripleo.sh. So the paste above is not actually
> reproducing any tripleo-ci job. tripleo-ci runs a wrapper around
> tripleo.sh[1] with specific ENV variables set in a different script[2],
> and then some required steps to actually make that work in yet another
> script[3].
>
> Further, if you replaced the instack-virt-setup line in the above script
> with a run of tripleo-quickstart, the other steps could still be run on
> top of a quickstart undercloud.
>
> [1]
> https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/deploy.sh
> [2]
> https://github.com/openstack-infra/tripleo-ci/blob/master/toci_gate_test.sh#L107-L200
> [3]
> https://github.com/openstack-infra/tripleo-ci/blob/master/toci_instack.sh#L180-L182


Sounds like we need to document and communicate better what actually
constitutes reproducing a CI environment failure.  I'm somewhat aware
that there are ways to do this, but I'm not sure of the details.  Also,
if we wish to use the CI scripts as the standard for environment
deployment, those should be documented as the primary way to set up an
environment.  I don't think it really matters which tool(s) are under
the covers so long as everyone is doing things in a somewhat consistent
fashion.
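
To make the ask concrete, what I'd expect such documentation to boil down
to is a flow along these lines.  The stage names here are based on my
reading of tripleo.sh and may not match its current flags exactly, so
treat this as an assumption rather than a reference:

    # assumed flow -- verify the flag names against tripleo.sh in tripleo-ci
    git clone https://github.com/openstack-infra/tripleo-ci.git
    export TRIPLEO_ROOT=$HOME/tripleo
    ./tripleo-ci/scripts/tripleo.sh --repo-setup        # wire up package repos
    ./tripleo-ci/scripts/tripleo.sh --undercloud        # install the undercloud
    ./tripleo-ci/scripts/tripleo.sh --overcloud-images  # build/fetch images
    ./tripleo-ci/scripts/tripleo.sh --register-nodes    # load instackenv.json
    ./tripleo-ci/scripts/tripleo.sh --introspect-nodes
    ./tripleo-ci/scripts/tripleo.sh --overcloud-deploy

If that flow, plus whatever environment variables CI sets around it, were
written down in one place, "reproduce what CI does" would stop being
tribal knowledge.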

>
>
>
> >> Thoughts:
> >>
> >> What these two methods showed me is that the deployment of tripleo is
> >> not exactly a foolproof thing and that there are a lot of assumptions
> >> that are being handled by both of these tools.  My initial goal in
> >> starting this conversation around tooling and workflows was to raise
> >> the idea of separating the (virtual) environment configuration from
> >> the actual deployment of tripleo, as well as identifying places for
> >> improvement as a way to speed up development and deployment testing.
> >> I believe there are a few reasons why this can be beneficial.
> >
> > Yep, I think this goal is uncontentious, and it's pretty much the original
> > aim of tripleo-quickstart.
> >
> >> The first reason is that as a developer, I would like to simplify the
> >> development environment creation process and be able to draw the line
> >> between environment and actual deployment tool.  By developing and
> >> documenting a working development/deployment workflow, we can simplify
> >> the onboarding experience as well as possibly accelerating the
> >> existing development processes by reducing the time spent messing with
> >> creating environments.  Does tripleo need to manage creation of VMs to
> >> deploy on? The answer is probably no.  Since the end user will want to
> >> deploy tripleo on their own gear, the focus for tripleo should probably
> >> be on improving that process.  Now this doesn't mean that we can't
> >> write stuff to do this, as it's important for development and testing,
> >> but I'm not sure it is a core part of what should be 'tripleo'.
> >
> > Yeah, agreed - the automation around setting up the VMs is really just a
> > convenience, and it's not really a core part of TripleO - any tool could be
> > used provided the VMs end up configured in the way we require.
> >
> >> Another reason I think this is important is that, as we talk about
> >> creating different scenarios for CI[6] to improve testing, it would
> >> also be useful for a developer or QA engineer to be able to test
> >> different environment configurations that would be more representative of
> >> actual deployment scenarios without having to hunt down multiple
> >> machines or configure actual hardware networking.  For example,
> >> creating environments using multiple networks, changing NICs,
> >> providing different sized nodes based on roles, etc can all be done
> >> virtually.  While tripleo-quickstart has some of these options, it is
> >> mixed in with the tripleo deployment process and does not seem to
> >> align with being able to deploy tripleo in more real world networking
> >> or environmental scenarios.
> >
> > Yeah, so I think this is one reason why the tripleo-quickstart discussion
> > has sometimes proven tricky - the original spec was about replacing only
> > the virt-setup pieces, but there was subsequently some scope-creep.  I
> > think this is being addressed, but it'd be good to have folks working on
> > that chime in here.
> >
>
> Indeed, we have moved all of the ansible code for doing full deployments
> outside of tripleo-quickstart. There are still a bunch of CI helper
> scripts for RDO CI in tree, and a playbook that exercises the full
> deployment code, but I would like to move all of that out of the
> quickstart tree as well.
>
> We have also added the ability to consume the images produced by
> tripleo-ci rather than using images produced by RDO. This required
> adding the ability to use an overcloud-full image as an undercloud
> image, since tripleo-ci no longer produces one. I think this will
> actually allow us to stop producing an undercloud image downstream of
> TripleO as well, which means that those images could be produced using
> only methods from tripleo-docs.
>
> I also recently started to look at what it would be like to add a
> "virt-setup" option to tripleo.sh that runs quickstart[4]. It is
> actually pretty simple for doing just the basic instack-virt-setup part.
>
> However, what developers actually want is to reproduce exactly what runs
> in CI. With our current CI architecture it is a bit harder to do that
> cleanly in an external tool, but I actually was able to get a minimal
> POC of it working[5]. It requires quite a bit of hacky stuff to make
> tripleo-ci deploy.sh work[6], but I think it shines a light on what we
> need to improve to make tripleo-ci externally consumable. It does seem
> like we could get there in under 5 patches though, which is less than I
> thought when starting on the POC.
>
> The one piece not addressed in either POC patch is building changes
> under test using DLRN. There is a function in tripleo.sh that does this
> for us in CI, and it can be leveraged for the developer use case as
> well. There is also an ansible role in RDO[7] that could be used for
> that purpose. The ansible role has quite a few more features than
> tripleo.sh, including multi-gerrit support and the ability to build
> packaging changes along with the code changes. It also has a hook in the
> tripleo-quickstart code to inject the DLRN repo created into the
> overcloud image before we ever boot anything, so from a quickstart
> perspective it is quite a bit nicer.


To me, some of these 'features' aren't clearly explained in the
documentation, so a user picking up the tool isn't aware they exist.
My primary goal behind this email was to get a conversation started
around some of the deficiencies.  Working with DLRN/RDO is probably
necessary for working with tripleo, as they are the source for
packages, but I think they need to be captured and identified as
proper dependencies for working with/developing tripleo, and to help
people understand how one relies on the other.  The reason my POC
patch didn't include this is because, as someone starting with this
project, I'm not fully aware of this use case, nor have I had to do
it yet.
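
As a sketch of the sort of thing that needs spelling out: once DLRN has
built the change under test, the result is just a yum repo that has to
be enabled before anything is deployed.  The paths and repo name below
are made up for illustration, not what tripleo.sh or the ansible role
actually do:

    # hypothetical: enable a locally built DLRN repo on the undercloud
    # before deploying; paths are illustrative only
    sudo tee /etc/yum.repos.d/delorean-current.repo <<'EOF'
    [delorean-current]
    name=Locally built packages under test
    baseurl=file:///home/stack/DLRN/data/repos/current
    enabled=1
    gpgcheck=0
    priority=1
    EOF

Capturing that relationship (DLRN builds the packages, the deployment
consumes the repo) in the docs would make the dependency obvious.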

>
>
> [4] https://review.openstack.org/371587
> [5] https://review.openstack.org/374116
> [6]
> https://review.openstack.org/#/c/374116/1/roles/libvirt/setup/undercloud/templates/complete_deploy.sh.j2
> [7] https://github.com/redhat-openstack/ansible-role-tripleo-gate
>
> >> Since there are a bunch of assumptions baked into the existing
> >> development scripts, I would say the current approach is more 'it
> >> works in devstack' than 'it works for the end user'.  This is not to
> >> say the current tools don't have their uses, as they currently work
> >> for the existing CI setup and for many developers today.  I think we
> >> can do better if we draw clearer lines between what is tripleo and
> >> what is something that is environmental and a supporting tool.
> >
> > I'm not sure I agree here - after you have your virt stuff setup, the
> > TripleO pieces which do the deployment are identical to those used by the
> > end user (unlike Devstack where it's likely the entire environment has been
> > configured using a different tool to production deployments).
> >
> >> Ideas:
> >>
> >> As part of bringing something to get the conversation started and to
> >> better understand how things work, I spent about two days coming up
> >> with a PoC[7] for a workflow that splits the environment creation,
> >> configuration, and management out from the actual deployment of the
> >> undercloud/overcloud.  Having previously used other tools for managing
> >> environments for deploying openstack, I thought I'd try deploying
> >> tripleo using something I was familiar with, fuel-devops[8].  The
> >> whole point of fuel-devops is to be able to create and manage entire
> >> virtual environments (and their networking configuration) on a given
> >> host.  With this, I was able to create my environment setup in a yaml
> >> file[9] which could then be reproduced.  So with this tool, I'm able
> >> to create a number of nodes with a given memory, disk, and network
> >> configuration as part of an 'environment'.  Each environment is
> >> completely separate from any other, which means that, given a large
> >> enough virtual host, I could have multiple tripleo deployments
> >> occurring simultaneously on their own networks.  This is a nice
> >> feature, but just an added bonus to the tool (along with snapshotting
> >> and a few other nifty things).  The bigger feature here is that this
> >> is more representative of what someone using tripleo is going to
> >> experience. They are going to have their environment already
> >> configured and would like to deploy tripleo on it.  Once the
> >> environment was created, I started to understand what it would be like
> >> for an end user to take an undercloud image and deploy it.
> >> Fortunately because we're still dealing with VMs, you can just point
> >> the undercloud node at the undercloud image itself[10] for testing
> >> purposes.
>
> I think this could be knowing one tool better than another.
> tripleo-quickstart can be run in such a way that it only does the
> environment setup part. And it takes a yaml config file that describes
> that environment. Further, it deploys the VMs using qemu:///session
> inside a non-root user, so they would be totally segregated from VMs
> using qemu:///system or another non-root user. There may be some small
> amount of effort to actually use quickstart to deploy 2 environments on
> the same host. I have not tried that, as on a 32G virthost getting a
> single realistic deployment (ie HA w/ ceph) is challenging enough.
> However, I do run a couple of utility VMs (irc bouncer etc.) on the same
> virthost that I do quickstart deploys on, and quickstart does not ever
> interfere.
>

Yes, I believe I was able to do this because of my familiarity with
one tool over the other.  But as I've previously mentioned, some of
the features you're referring to are not really spelled out for an end
user to come in and leverage right at the start.  With the POC I was
trying to point out that separating environment creation from
deployment tooling opens up additional features and can help bring
clarity to what should and should not be a part of tripleo.

>
> I actually thought it would be neat to add an opposite feature, where a
> user could pair two smaller virthosts and deploy VMs across them. I
> haven't spent any time looking at how difficult that would be to
> implement though.


Yes, this would be useful for people with a bunch of older gear on
hand.  Of course, I think this is where OVB might be a better fit.

>
>
> I do think snapshotting would be a really great feature for
> tripleo-quickstart, and could be something worth looking at for Ocata.
> >
> > This looks very interesting, thanks for sharing! :)
> >
> > That said, my main concern if we go this way is we'd end up with three ways
> > to do a virt setup vs the current two ;)
> >
> > Definitely worthy of broader discussion though, particularly in the context
> > of this vs the ansible based tripleo-quickstart.
> >
> >> Once the environment exists, it starts exposing what exactly it means
> >> to deploy a tripleo undercloud/overcloud.  The majority of the effort
> >> I had to expend for this PoC was actually related to the construction
> >> of the instackenv.json to load the overcloud nodes into ironic.  As
> >> mentioned in the workflow-simplification spec[0], this is a known
> >> limitation with possible solutions, and I think this is an important
> >> part of the end user experience when trying to work with tripleo.
> >> It should be noted that I managed to get the undercloud and
> >> controller/compute deployed (but eating into VM swap space) in 12GB on
> >> my laptop.  This was something I was unable to do with either instack
> >> or tripleo-quickstart.
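
[To put the instackenv.json point in concrete terms: for a virtual setup
you end up hand-assembling an entry per node, roughly like the sketch
below.  The values are placeholders, and pxe_ssh is what the virt setups
I tried use; check the docs for whatever driver your release expects:

    cat > ~/instackenv.json <<'EOF'
    {
      "nodes": [
        {
          "pm_type": "pxe_ssh",
          "pm_addr": "192.168.122.1",
          "pm_user": "stack",
          "pm_password": "CONTENTS_OF_PRIVATE_SSH_KEY",
          "mac": ["52:54:00:aa:bb:cc"],
          "cpu": "2",
          "memory": "5120",
          "disk": "40",
          "arch": "x86_64"
        }
      ]
    }
    EOF

Doing that by hand for every VM, with the right MACs, is where most of
my PoC time went.]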
>
> Again, I think this could be knowing one tool better than another.
> tripleo-quickstart can tune CPU and memory for each VM[8]. Though it
> looks like we could document that better. What is in the defaults and
> config files is what we have found to work well on 32G virthosts.
>

So the issue I have with that configuration is that it bakes in
assumptions about roles and classifications when building the
environment.  If you look at the POC code, I'm creating a bunch of
nodes with really only two classifications (an undercloud and a bunch
of slave nodes).  Technically I shouldn't need to make that designation
either, but I did for the sake of this example.  The example I provided
could be tweaked to create as many nodes as I want, with whatever
memory configuration, independent of any tripleo designation.  The
point is that the environment tooling is unaware of what's actually
being deployed.  Environment creation is generic and not specifically
tied to tripleo.  This allows for future reuse by other projects, or as
tripleo evolves and needs different node types for containers or
whatever.  In the case of quickstart, that means I would now need to
modify it to understand what a container node looks like, as opposed
to just having a generic set of nodes that tripleo determines how and
where to work with.  This is the idea I'm really trying to convey:
less intertwined code and more generic building blocks that can be
combined into expandable solutions.  That way we, as well as the rest
of the openstack/open source community, can benefit from the code that
is being written.
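
Purely as an illustration of the shape of thing I'm arguing for (this is
not the actual fuel-devops schema or a quickstart config, just a made-up
example), a generic environment definition might look like:

    # hypothetical environment definition, with no tripleo role names baked in
    cat > env.yaml <<'EOF'
    environment:
      name: tripleo-dev
      networks:
        - name: provisioning
        - name: external
      nodes:
        - name: node-0          # whatever the deployment tool makes of it
          memory: 8192
          vcpus: 4
          disks: [50]
          networks: [provisioning, external]
        - name: node-1
          memory: 5120
          vcpus: 2
          disks: [40]
          networks: [provisioning]
    EOF

The tool that consumes this only knows about VMs, disks, and networks;
whether node-0 ends up as an undercloud, a controller, or a container
host is entirely up to whatever gets deployed on top.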


>
> I will say I would be pretty surprised if the "cloud in 12GB" could do
> any post-deploy validation without falling over.

Yes, but if I'm attempting a quick bug fix or small feature
development, I don't need a rally run or a full-blown tempest run.
From my viewpoint of working on the deployment process, a successful
deployment is my first goal.  Considering I'm starting to run into
problems just doing the deployment (with no validation at all), this
was an achievement.

Thanks,
-Alex

>
>
> [8]
> https://github.com/openstack/tripleo-quickstart/blob/master/roles/common/defaults/main.yml#L15-L56
>
> >
> > So, I'm a little unclear here, presumably the actual RAM usage was the
> > same, so is this just because you were able to easily configure swap on all
> > the VMs?
> >
> >> There are some shortcomings with this particular tool choice. My
> >> understanding is that fuel-devops is still limited to managing a
> >> single host.  So you can't use it against remote nodes, but it is good
> >> if you have a decently sized physical machine or want to work locally.
> >> I ran into issues with network configurations and pxe booting, but I
> >> have a feeling that's more of a bug in libvirt and my lack of time to
> >> devote to undercloud setup.  So it's not perfect, but it does show off
> >> the basics of the concept.  Overall, I think clearly separating the
> >> tripleo installation process from the environment configuration is an
> >> important step for end user usability and even developer workflows.
> >
> > I think the multi-node use-case is mostly handled via OVB[1] now which is
> > where you basically use an OpenStack cloud to host the VMs used for a
> > TripleO deployment (yes, that is OpenStack on OpenStack on OpenStack).
> >
> > We're using that in CI and it works pretty well, so I think the main gap is
> > a super-easy day-1 workflow that allows users/developers to get up and
> > running easily on a single node (as mentioned above though, quickstart
> > was aimed at closing this gap and has been working well for a lot of
> > folks).
> >
> > Thanks for the feedback - definitely more here we can discuss and hopefully
> > refine into actionable bugs/specs/patches! :)
> >
> > Steve
> >


