[openstack-dev] [TripleO] Austin summit - session recap/summary

Paul Belanger pabelanger at redhat.com
Thu May 19 16:34:42 UTC 2016


On Thu, May 19, 2016 at 03:50:15PM +0100, Derek Higgins wrote:
> On 18 May 2016 at 13:34, Paul Belanger <pabelanger at redhat.com> wrote:
> > On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
> >> On 6 May 2016 at 14:18, Paul Belanger <pabelanger at redhat.com> wrote:
> >> > On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
> >> >> Hi all,
> >> >>
> >> >> Some folks have requested a summary of our summit sessions, as has been
> >> >> provided for some other projects.
> >> >>
> >> >> I'll probably go into more detail on some of these topics either via
> >> >> subsequent more focussed threads and/or some blog posts, but what follows is
> >> >> an overview of our summit sessions[1] with notable actions or decisions
> >> >> highlighted.  I'm including some of my own thoughts and conclusions, folks
> >> >> are welcome/encouraged to follow up with their own clarifications or
> >> >> different perspectives :)
> >> >>
> >> >> TripleO had a total of 5 sessions in Austin; I'll cover them one-by-one:
> >> >>
> >> >> -------------------------------------
> >> >> Upgrades - current status and roadmap
> >> >> -------------------------------------
> >> >>
> >> >> In this session we discussed the current state of upgrades - initial
> >> >> support for full major version upgrades has been implemented, but the
> >> >> implementation is monolithic, highly coupled to pacemaker, and inflexible
> >> >> with regard to third-party extraconfig changes.
> >> >>
> >> >> The main outcomes were that we will add support for more granular
> >> >> definition of the upgrade lifecycle to the new composable services format,
> >> >> and that we will explore moving towards the proposed lightweight HA
> >> >> architecture to reduce the need for so much pacemaker specific logic.
> >> >>
> >> >> We also agreed that investigating use of mistral to drive upgrade workflows
> >> >> was a good idea - currently we have a mixture of scripts combined with Heat
> >> >> to drive the upgrade process, and some refactoring into discrete mistral
> >> >> workflows may provide a more maintainable solution.  Potential for using
> >> >> the existing SoftwareDeployment approach directly via mistral (outside of
> >> >> the heat templates) was also discussed as something to be further
> >> >> investigated and prototyped.
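> >> >>
> >> >> For illustration only, the kind of decomposition being discussed might
> >> >> look roughly like the plain-python sketch below (step and node names are
> >> >> hypothetical, and this is not Mistral syntax):
> >> >>
> >> >>     # Sketch: splitting the monolithic upgrade into discrete steps that a
> >> >>     # workflow engine such as Mistral could sequence, retry or replace
> >> >>     # independently.
> >> >>     def quiesce_services(node):
> >> >>         print("stopping services on %s" % node)
> >> >>
> >> >>     def upgrade_packages(node):
> >> >>         print("upgrading packages on %s" % node)
> >> >>
> >> >>     def restart_services(node):
> >> >>         print("restarting services on %s" % node)
> >> >>
> >> >>     UPGRADE_STEPS = [quiesce_services, upgrade_packages, restart_services]
> >> >>
> >> >>     for node in ["controller-0", "controller-1", "controller-2"]:
> >> >>         for step in UPGRADE_STEPS:
> >> >>             step(node)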
> >> >>
> >> >> We also touched on the CI implications of upgrades - we've got an upgrades
> >> >> job now, but we need to ensure coverage of full release-to-release upgrades
> >> >> (not just commit to commit).
> >> >>
> >> >> -------------------------------
> >> >> Containerization status/roadmap
> >> >> -------------------------------
> >> >>
> >> >> In this session we discussed the current status of containers in TripleO
> >> >> (which is to say, the container based compute node which deploys containers
> >> >> via Heat onto an Atomic host node that is also deployed via Heat), and
> >> >> what strategy is most appropriate to achieve a fully containerized TripleO
> >> >> deployment.
> >> >>
> >> >> Several folks from Kolla participated in the session, and there was
> >> >> significant focus on where work may happen such that further collaboration
> >> >> between communities is possible.  To some extent this discussion on where
> >> >> (as opposed to how) proved a distraction and prevented much discussion on
> >> >> supportable architectural implementation for TripleO, thus what follows is
> >> >> mostly my perspective on the issues that exist:
> >> >>
> >> >> Significant uncertainty exists wrt integration between Kolla and TripleO -
> >> >> there's largely consensus that we want to consume the container images
> >> >> defined by the Kolla community, but much less agreement that we can
> >> >> feasibly switch to the ansible-orchestrated deployment/config flow
> >> >> supported by Kolla without breaking many of our primary operator interfaces
> >> >> in a fundamentally unacceptable way, for example:
> >> >>
> >> >> - The Mistral based API is being implemented on the expectation that the
> >> >>   primary interface to TripleO deployments is a parameters schema exposed
> >> >>   by a series of Heat templates - this is no longer true in a "split stack"
> >> >>   model where we have to hand off to an alternate service orchestration tool.
> >> >>
> >> >> - The tripleo-ui (built on the Mistral based API) consumes heat parameter
> >> >>   schema to build its UI, and Ansible doesn't support the necessary
> >> >>   parameter schema definition (such as types and descriptions) to enable
> >> >>   this pattern to be replicated (a small illustrative sketch follows this
> >> >>   list).  Ansible also doesn't provide an HTTP API, so we'd still have to
> >> >>   maintain an API surface for the (non python) UI to consume.
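> >> >>
> >> >> As a rough illustration of what that parameter schema gives us, here is
> >> >> the sort of per-parameter metadata a Heat template exposes, shown as a
> >> >> python dict (the real templates are YAML, and the parameter names/values
> >> >> here are hypothetical):
> >> >>
> >> >>     # Each parameter carries a type, description, default and constraints;
> >> >>     # the UI can generate a form field from this schema alone, which plain
> >> >>     # Ansible variables don't provide.
> >> >>     parameters = {
> >> >>         "NtpServers": {
> >> >>             "type": "comma_delimited_list",
> >> >>             "description": "NTP servers to configure on overcloud nodes",
> >> >>             "default": ["pool.ntp.org"],
> >> >>         },
> >> >>         "ControllerCount": {
> >> >>             "type": "number",
> >> >>             "description": "Number of controller nodes to deploy",
> >> >>             "default": 1,
> >> >>             "constraints": [{"range": {"min": 1}}],
> >> >>         },
> >> >>     }
> >> >>
> >> >>     for name, schema in parameters.items():
> >> >>         print(name, "(%s):" % schema["type"], schema["description"])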
> >> >>
> >> >> We also discussed ideas around integration with kubernetes (a hot topic on
> >> >> the Kolla track this summit), but again this proved inconclusive beyond
> >> >> agreement that, yes, someone should develop a PoC to stimulate further
> >> >> discussion.  Again, significant challenges exist:
> >> >>
> >> >> - We still need to maintain the Heat parameter interfaces for the API/UI,
> >> >>   and there is also a strong preference to maintain puppet as a tool for
> >> >>   generating service configuration (so that existing operator integrations
> >> >>   via puppet continue to function) - this is a barrier to consuming the
> >> >>   kolla-kubernetes effort directly.
> >> >>
> >> >> - A COE layer like kubernetes is a poor fit for deployments where operators
> >> >>   require strict control of service placement (e.g. exactly which nodes a
> >> >>   service runs on, IP address assignments to specific nodes, etc.) - this is
> >> >>   already a strong requirement for TripleO users and we need to figure out
> >> >>   if/how it's possible to control container placement per node/namespace
> >> >>   (see the sketch after this list).
> >> >>
> >> >> - There are several uncertainties regarding the HA architecture, such as
> >> >>   how we achieve fencing for nodes (currently provided via pacemaker); in
> >> >>   particular, the HA model for real production deployments of stateful
> >> >>   services such as rabbit/galera via kubernetes is unclear.
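> >> >>
> >> >> For the placement question above, the controls kubernetes does offer look
> >> >> roughly like the following pod spec, shown as a python dict (field names
> >> >> are from the kubernetes API, values are hypothetical); whether they are
> >> >> sufficient for TripleO's needs is exactly what a PoC should establish:
> >> >>
> >> >>     pod = {
> >> >>         "apiVersion": "v1",
> >> >>         "kind": "Pod",
> >> >>         "metadata": {"name": "galera", "namespace": "tripleo"},
> >> >>         "spec": {
> >> >>             # Pin the pod to one specific node, bypassing the scheduler...
> >> >>             "nodeName": "overcloud-controller-0",
> >> >>             # ...or constrain it to labelled nodes via the scheduler.
> >> >>             "nodeSelector": {"tripleo-role": "controller"},
> >> >>             # Use the host network namespace so the service keeps the
> >> >>             # node's own IP addresses.
> >> >>             "hostNetwork": True,
> >> >>             "containers": [
> >> >>                 {"name": "galera", "image": "kolla/centos-binary-mariadb"},
> >> >>             ],
> >> >>         },
> >> >>     }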
> >> >>
> >> >> Overall a session with much discussion, but further prototyping and
> >> >> discussion is required before we can define a definitive implementation
> >> >> strategy (several folks are offering to be involved in this).
> >> >>
> >> >> ---------------------------------------------
> >> >> Work session (Composable Services and beyond)
> >> >> ---------------------------------------------
> >> >>
> >> >> In this session we discussed the status of the currently in-progress work
> >> >> to decompose our monolithic manifests into per-service profiles[3] in
> >> >> puppet-tripleo, then consume these profiles via per-service templates in
> >> >> tripleo-heat-templates[4][5], and potential further work to enable fully
> >> >> composable (including user defined) roles.
> >> >>
> >> >> Overall there was agreement that the composable services work and puppet
> >> >> refactoring are going well, but that we need to improve velocity and get
> >> >> more reviewers helping to land the changes.  There was also agreement that
> >> >> a sub-team should form temporarily to drive the remaining work[6], that
> >> >> we should not land any new features in the "old" template architecture and
> >> >> relatedly that tripleo cores should help rebase and convert currently
> >> >> under-review changes to the new format where needed to ease the transition.
> >> >>
> >> >> I described a possible approach to providing fully composable roles that
> >> >> uses some template pre-processing (via jinja2)[7]; a blueprint and initial
> >> >> implementation will be posted soon.  Overall the response was positive,
> >> >> and it may provide a workable path to fully composable roles that won't
> >> >> break upgrades of existing deployments.
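> >> >>
> >> >> A minimal sketch of the pre-processing idea (role names and the fragment
> >> >> itself are hypothetical; the actual proposal is in [7]):
> >> >>
> >> >>     # One template fragment is expanded once per role by jinja2 before
> >> >>     # Heat ever sees it, so adding a role doesn't require hand-written
> >> >>     # per-role templates.
> >> >>     from jinja2 import Template
> >> >>
> >> >>     fragment = Template("""
> >> >>     {%- for role in roles %}
> >> >>       {{ role }}Count:
> >> >>         type: number
> >> >>         default: 0
> >> >>         description: Number of {{ role }} nodes to deploy
> >> >>     {%- endfor %}
> >> >>     """)
> >> >>
> >> >>     print(fragment.render(roles=["Controller", "Compute", "CephStorage"]))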
> >> >>
> >> >> ---------------------------------
> >> >> Work session (API and TripleO UI)
> >> >> ---------------------------------
> >> >>
> >> >> In this session we discussed the current status of the TripleO UI, and the
> >> >> Mistral based API implementation it depends on.
> >> >>
> >> >> Overall it's clear there is a lot of good progress in this area, but there
> >> >> are some key areas which require focus and additional work to enable a
> >> >> fully functional upstream TripleO UI:
> >> >>
> >> >> - The undercloud requires some configuration changes to give the UI the
> >> >>   necessary access to the undercloud services
> >> >>
> >> >> - The UI currently depends on the previous prototype API implementation,
> >> >>   and must be converted to the new Mistral based API (in-progress)
> >> >>
> >> >> - We need to improve velocity of the Mistral based implementation (need
> >> >>   more testing and reviewing), such that we can land it and folks can start
> >> >>   integrating with it.
> >> >>
> >> >> - There was agreement that the previously proposed validation API can be
> >> >>   implemented as another Mistral action, which will provide a way to run
> >> >>   validation related to the undercloud configuration/state.
> >> >>
> >> >> - There are some features we could add to Heat which would make the
> >> >>   implementation cleaner (description/metadata in environment files, support
> >> >>   for multiple parameter groups).
> >> >>
> >> >> The session concluded with some discussion around the requirements related
> >> >> to network configuration.  Currently the templates offer considerable
> >> >> flexibility in this regard, and we need to decide how this is surfaced via
> >> >> the API such that it's easily consumable via TripleO Ux interfaces.
> >> >>
> >> >> -----------------------------------
> >> >> Work session (Reducing the CI pain)
> >> >> -----------------------------------
> >> >>
> >> >> This session covered a few topics, but mostly ended up focussed on the
> >> >> debate with infra regarding moving to 3rd party CI.  There are arguments on
> >> >> both sides here, and I'll perhaps let derekh or dprince reply with a more
> >> >> detailed discussion of them, but suffice to say there wasn't a clear
> >> >> conclusion, and discussion is ongoing.
> >> >>
> >> > It was mostly me pushing for tripleo to move to 3rd party CI.  I still think it
> >> > is the right place for tripleo; however, after hearing dprince's concerns I
> >> > think we have a compromise for the moment. I've gone ahead and done the work to
> >> > upgrade the tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1]
> >> > produced by openstack-infra. Please take a moment to review the patch as it exposed 3
> >> > issues.
> >> >
> >> > 1) CentOS 7 does not support nbd out of the box, and we can't compile a new
> >> > kernel ATM. So, I've worked around the problem by converting the qcow2 image to
> >> > raw format, updating instack, and reconverting it back to qcow2.  Ideally, if I
> >> > can find where the instack.qcow2 image is built, we could also produce a raw
> >> > format so we don't have to do this conversion in every gate job.
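> >> >
> >> > Roughly, the workaround amounts to the following (sketched here with
> >> > python's subprocess around qemu-img; paths are hypothetical and the actual
> >> > change is in [1] below):
> >> >
> >> >     import subprocess
> >> >
> >> >     # nbd isn't available on the CentOS 7 slave, so we can't map the qcow2
> >> >     # directly; convert it to raw first.
> >> >     subprocess.check_call(
> >> >         ["qemu-img", "convert", "-f", "qcow2", "-O", "raw",
> >> >          "instack.qcow2", "instack.raw"])
> >> >
> >> >     # ... update the instack image while it is in raw format (omitted) ...
> >> >
> >> >     # Convert it back to qcow2 for the testenv.
> >> >     subprocess.check_call(
> >> >         ["qemu-img", "convert", "-f", "raw", "-O", "qcow2",
> >> >          "instack.raw", "instack.qcow2"])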
> >>
> >> The conversion should be ok for the moment to allow us to make progress;
> >> longer term we'll probably need to change the libvirt domain definitions on
> >> the testenvs in order to be able to just generate and use a raw format.
> >>
> >> >
> >> > 2) The Jenkins slave needs more HDD space. Using centos-7 we now cache data on
> >> > the slave, mostly packages and git repos.  As a result the HDD starts out at
> >> > 7.5GB used, and because the current slaves only have 20GB we quickly run out of
> >> > space.  Ideally we need 80GB[2] of space to be consistent with the other cloud
> >> > providers we run jenkins slaves on.
> >>
> >> This is where we'll likely hit the biggest problems. In order to bump the
> >> disk space allocated to the jenkins slaves and simultaneously take advantage
> >> of the SSDs, we're going to have to look into using the SSDs as a cache for
> >> the spinning disks. I haven't done this before but I hope we can look into
> >> it soon.
> >>
> >> >
> >> > 3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
> >> > openstack-infra has an AFS mirroring infrastructure in place.  As a result,
> >> > we'll also need to launch one in tripleo-ci.  For the moment, I've disabled the
> >> > logic to configure the mirror.  Mirrors include pypi, npm, wheel, ubuntu trusty,
> >> > ubuntu precise, ceph.  We are bringing RPM mirrors online shortly.
> >>
> >> I'm not sure we'll get as much of a benefit from this as the devstack based
> >> jobs do; as it is, some of the mirrors you mention wouldn't be used at all,
> >> while we would only make very light use of others. Is it possible to
> >> selectively add mirrors to the AFS mirror, or to add additional things that
> >> tripleo would be interested in, e.g. an image cache?
> >>
> > I think you'll actually benefit from this, mostly because you no longer have to
> > run your own mirror / squid servers in tripleo.  The AFS mirrors work more
> > like a cache.
> >
> > Currently our AFS volumes in rax-dfw hold over 1TB of data, but since our
> > jobs only access a small fraction of that, most AFS mirror servers are only
> > using about 5GB of data locally.
> >
> > In the case of tripleo, it will be even less, since you are not running the
> > full suite of jobs in your cloud.
> >
> > Right now, nothing would need to change to selectively use mirrors, because
> > AFS will only cache what is used.  As for adding things specific to tripleo,
> > it could be possible; it is also likely that other jobs will need the same
> > bits too.
> >
> > I strongly encourage us to setup an AFS mirror.
> 
> Ok, I'm still a little skeptical because our biggest bandwidth hogs aren't
> mentioned in the list of things mirrored, but that's not a good reason to get
> in your way. If it proves to be a help then great; if not, at least we tried.
> So what do you need from me to try it out? If I create a d1.medium trusty
> instance with a floating IP, will that work for you? This should allow you to
> test things for now; longer term we're going to have the same problems we do
> with the larger jenkins instances, so until we solve this we won't be able to
> consider this a permanent part of the infrastructure.
> 
I just need to know the flavor we are using; I'll be using our
openstack-infra/system-config launch-node script to provision the server, since
we need to loop it into our ansible / puppet wheel.

If you are okay with d1.medium for now, I can start it.

> >
> >> >
> >> > I'd really like to get some feedback on these 3 issues; I know they might not be
> >> > solved today because of the hardware move.  However, I think we are pretty close
> >> > now to getting tripleo-ci more in line with some of the openstack-infra tooling.
> >> >
> >> > [1] https://review.openstack.org/#/c/312725/
> >> > [2] https://review.openstack.org/#/c/312992/
> >> > [3] https://review.openstack.org/#/c/312058/
> >> >
> >> >> The other output from this session was agreement that we'd move our jobs to
> >> >> a different cloud (managed by the RDO community) ahead of a planned
> >> >> relocation of our current hardware.  This has advantages in terms of
> >> >> maintenance overhead, and if it all goes well we can contribute our
> >> >> hardware to this cloud long term vs maintaining our own infrastructure.
> >> >>
> >> >>
> >> >> Overall it was an excellent week, and I thank all the session participants
> >> >> for their input and discussion.  Further notes can be found in the
> >> >> etherpads linked from [1] but feel free to reply if specific items require
> >> >> clarification (and/or I've missed anything!)
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Steve
> >> >>
> >> >> [1] https://wiki.openstack.org/wiki/Design_Summit/Newton/Etherpads#TripleO
> >> >> [2] https://review.openstack.org/#/c/299628/
> >> >> [3] https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests
> >> >> [4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
> >> >> [5] https://etherpad.openstack.org/p/tripleo-composable-roles-work
> >> >> [6] http://lists.openstack.org/pipermail/openstack-dev/2016-April/093533.html
> >> >> [7] http://paste.fedoraproject.org/360836/87416814/
> >> >>
> >> >> __________________________________________________________________________
> >> >> OpenStack Development Mailing List (not for usage questions)
> >> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> >


