[openstack-dev] [TripleO] Austin summit - session recap/summary

Paul Belanger pabelanger at redhat.com
Mon May 9 20:50:19 UTC 2016


On Fri, May 06, 2016 at 09:18:03AM -0400, Paul Belanger wrote:
> On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
> > Hi all,
> > 
> > Some folks have requested a summary of our summit sessions, as has been
> > provided for some other projects.
> > 
> > I'll probably go into more detail on some of these topics either via
> > subsequent more focussed threads and/or some blog posts, but what follows is
> > an overview of our summit sessions[1] with notable actions or decisions
> > highlighted.  I'm including some of my own thoughts and conclusions; folks
> > are welcome/encouraged to follow up with their own clarifications or
> > different perspectives :)
> > 
> > TripleO had a total of 5 sessions in Austin; I'll cover them one by one:
> > 
> > -------------------------------------
> > Upgrades - current status and roadmap
> > -------------------------------------
> > 
> > In this session we discussed the current state of upgrades - initial
> > support for full major version upgrades has been implemented, but the
> > implementation is monolithic, highly coupled to pacemaker, and inflexible
> > with regard to third-party extraconfig changes.
> > 
> > The main outcomes were that we will add support for more granular
> > definition of the upgrade lifecycle to the new composable services format,
> > and that we will explore moving towards the proposed lightweight HA
> > architecture to reduce the need for so much pacemaker specific logic.
> > 
> > We also agreed that investigating use of mistral to drive upgrade workflows
> > was a good idea - currently we have a mixture of scripts combined with Heat
> > to drive the upgrade process, and some refactoring into discrete mistral
> > workflows may provide a more maintainable solution.  Potential for using
> > the existing SoftwareDeployment approach directly via mistral (outside of
> > the heat templates) was also discussed as something to be further
> > investigated and prototyped.
> > 
> > We also touched on the CI implications of upgrades - we've got an upgrades
> > job now, but we need to ensure coverage of full release-to-release upgrades
> > (not just commit to commit).
> > 
> > -------------------------------
> > Containerization status/roadmap
> > -------------------------------
> > 
> > In this session we discussed the current status of containers in TripleO
> > (which is to say, the container based compute node which deploys containers
> > via Heat onto an Atomic host node that is also deployed via Heat), and
> > what strategy is most appropriate to achieve a fully containerized TripleO
> > deployment.
> > 
> > Several folks from Kolla participated in the session, and there was
> > significant focus on where work may happen such that further collaboration
> > between communities is possible.  To some extent this discussion on where
> > (as opposed to how) proved a distraction and prevented much discussion on
> > supportable architectural implementation for TripleO, thus what follows is
> > mostly my perspective on the issues that exist:
> > 
> > Significant uncertainty exists wrt integration between Kolla and TripleO -
> > there's largely consensus that we want to consume the container images
> > defined by the Kolla community, but much less agreement that we can
> > feasibly switch to the ansible-orchestrated deployment/config flow
> > supported by Kolla without breaking many of our primary operator interfaces
> > in a fundamentally unacceptable way, for example:
> > 
> > - The Mistral based API is being implemented on the expectation that the
> >   primary interface to TripleO deployments is a parameters schema exposed
> >   by a series of Heat templates - this is no longer true in a "split stack"
> >   model where we have to hand off to an alternate service orchestration tool.
> > 
> > - The tripleo-ui (based on the Mistral based API) consumes heat parameter
> >   schema to build its UI, and Ansible doesn't support the necessary
> >   parameter schema definition (such as types and descriptions) to enable
> >   this pattern to be replicated.  Ansible also doesn't provide an HTTP API,
> >   so we'd still have to maintain an API surface for the (non python) UI to
> >   consume.
> > 
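To make the schema-driven UI pattern from the second bullet concrete, here is
a minimal Python sketch.  It is not the tripleo-ui code: the parameter names,
the widget mapping and the build_form_fields helper are all hypothetical, and
the only assumption about Heat is that each parameter carries type,
description and default keys, as in the parameters section of a HOT template.

```python
# Hypothetical parameters section, shaped like the `parameters` block of a
# HOT template. The parameter names and values are illustrative only.
PARAMETERS = {
    "ControllerCount": {
        "type": "number",
        "description": "Number of controller nodes to deploy",
        "default": 1,
    },
    "NtpServer": {
        "type": "string",
        "description": "NTP server used by overcloud nodes",
        "default": "",
    },
    "EnableFencing": {
        "type": "boolean",
        "description": "Whether to enable fencing",
        "default": False,
    },
}

# Assumed mapping from HOT parameter types to generic form widgets.
WIDGETS = {
    "string": "text",
    "number": "number",
    "boolean": "checkbox",
    "comma_delimited_list": "text",
    "json": "textarea",
}


def build_form_fields(parameters):
    """Turn a parameter schema into a name-sorted list of field descriptors."""
    fields = []
    for name in sorted(parameters):
        schema = parameters[name]
        fields.append({
            "name": name,
            # Unknown or missing types fall back to a plain text input.
            "widget": WIDGETS.get(schema.get("type"), "text"),
            "help": schema.get("description", ""),
            "default": schema.get("default"),
        })
    return fields


if __name__ == "__main__":
    for field in build_form_fields(PARAMETERS):
        print(field)
```

The point of the sketch is that the form is derived entirely from the typed,
described schema; a tool without such a schema gives the UI nothing
equivalent to iterate over.
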
> > We also discussed ideas around integration with kubernetes (a hot topic on
> > the Kolla track this summit), but again this proved inconclusive beyond
> > agreeing that someone should try developing a PoC to stimulate further
> > discussion.  Again, significant challenges exist:
> > 
> > - We still need to maintain the Heat parameter interfaces for the API/UI,
> >   and there is also a strong preference to maintain puppet as a tool for
> >   generating service configuration (so that existing operator integrations
> >   via puppet continue to function) - this is a barrier to directly
> >   consuming the kolla-kubernetes effort.
> > 
> > - A COE layer like kubernetes is a poor fit for deployments where operators
> >   require strict control of service placement (e.g. exactly which nodes a
> >   service runs on, IP address assignments to specific nodes etc) - this is
> >   already a strong requirement for TripleO users and we need to figure out
> >   if/how it's possible to control container placement per node/namespace.
> > 
> > - There are several uncertainties regarding the HA architecture, such as
> >   how we achieve fencing for nodes (which is currently provided via
> >   pacemaker).  In particular, the HA model for real production deployments
> >   via kubernetes for stateful services such as rabbit/galera is unclear.
> > 
> > Overall a session with much discussion, but further prototyping and
> > discussion is required before we can settle on an implementation
> > strategy (several folks are offering to be involved in this).
> > 
> > ---------------------------------------------
> > Work session (Composable Services and beyond)
> > ---------------------------------------------
> > 
> > In this session we discussed the status of the currently in-progress work
> > to decompose our monolithic manifests into per-service profiles[3] in
> > puppet-tripleo, then consume these profiles via per-service templates in
> > tripleo-heat-templates[4][5], and potential further work to enable fully
> > composable (including user defined) roles.
> > 
> > Overall there was agreement that the composable services work and puppet
> > refactoring are going well, but that we need to improve velocity and get
> > more reviewers helping to land the changes.  There was also agreement that
> > a sub-team should form temporarily to drive the remaining work[6], that
> > we should not land any new features in the "old" template architecture and
> > relatedly that tripleo cores should help rebase and convert currently
> > under-review changes to the new format where needed to ease the transition.
> > 
> > I described a possible approach to providing fully composable roles that
> > uses some template pre-processing (via jinja2)[7].  A blueprint and initial
> > implementation will be posted soon; overall the response was positive,
> > and it may provide a workable path to fully composable roles that won't
> > break upgrades of existing deployments.
> > 
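As a rough illustration of the pre-processing idea (not the implementation in
[7]), the sketch below expands one resource group per role before the
template would reach Heat.  It deliberately uses Python's str.format from the
standard library as a stand-in for jinja2; the role list, the snippet layout,
the OS::TripleO::<role> type names and render_roles are all hypothetical.

```python
# Stand-in for template pre-processing: expand one snippet per role before
# the result is handed to Heat. Uses str.format instead of jinja2, so literal
# braces in the output are written as doubled braces here.
ROLE_SNIPPET = """\
  {role}:
    type: OS::Heat::ResourceGroup
    properties:
      count: {{get_param: {role}Count}}
      resource_def:
        type: OS::TripleO::{role}
"""


def render_roles(roles):
    """Return a resources section with one group per role, in input order."""
    body = "".join(ROLE_SNIPPET.format(role=role) for role in roles)
    return "resources:\n" + body


if __name__ == "__main__":
    # "CustomStorage" stands for a user-defined role; the name is made up.
    print(render_roles(["Controller", "Compute", "CustomStorage"]))
```

Because the expansion happens before Heat sees the template, adding a
user-defined role in this sketch is just one more entry in the list.
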
> > ---------------------------------
> > Work session (API and TripleO UI)
> > ---------------------------------
> > 
> > In this session we discussed the current status of the TripleO UI, and the
> > Mistral based API implementation it depends on.
> > 
> > Overall it's clear there is a lot of good progress in this area, but there
> > are some key areas which require focus and additional work to enable a
> > fully functional upstream TripleO UI:
> > 
> > - The undercloud requires some configuration changes to give the UI the
> >   necessary access to the undercloud services
> > 
> > - The UI currently depends on the previous prototype API implementation,
> >   and must be converted to the new Mistral based API (in-progress)
> > 
> > - We need to improve velocity of the Mistral based implementation (need
> >   more testing and reviewing), such that we can land it and folks can start
> >   integrating with it.
> > 
> > - There was agreement that the previously proposed validation API can be
> >   implemented as another Mistral action, which will provide a way to run
> >   validation related to the undercloud configuration/state.
> > 
> > - There are some features we could add to Heat which would make
> >   implementation cleaner (description/metadata in environment files, enable
> >   multiple parameter groups).
> > 
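To show what a validation exposed as an action might look like, here is a
small Python sketch.  It assumes only that an action is a class with a run()
method returning a result; the CheckDiskSpace name, its threshold and the
result keys are hypothetical, and a real action would subclass Mistral's
action base class rather than stand alone as it does here.

```python
import shutil


class CheckDiskSpace:
    """Hypothetical validation: is there enough free space under `path`?

    Stand-alone stand-in; it does not import or depend on Mistral.
    """

    def __init__(self, path="/", min_free_gb=10):
        self.path = path
        self.min_free_gb = min_free_gb

    def run(self):
        # shutil.disk_usage returns (total, used, free) in bytes.
        free_gb = shutil.disk_usage(self.path).free / 10 ** 9
        return {
            "name": "disk-space",
            "success": free_gb >= self.min_free_gb,
            "free_gb": round(free_gb, 1),
        }


if __name__ == "__main__":
    print(CheckDiskSpace(path="/", min_free_gb=10).run())
```

Returning a plain dictionary keeps the result easy to serialise for an API or
UI consumer; the exact result format is an open choice, not something agreed
in the session.
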
> > The session concluded with some discussion around the requirements related
> > to network configuration.  Currently the templates offer considerable
> > flexibility in this regard, and we need to decide how this is surfaced via
> > the API such that it's easily consumable via TripleO UX interfaces.
> > 
> > -----------------------------------
> > Work session (Reducing the CI pain)
> > -----------------------------------
> > 
> > This session covered a few topics, but mostly ended up focussed on the
> > debate with infra regarding moving to 3rd party CI.  There are arguments on
> > both sides here, and I'll perhaps let derekh or dprince reply with a more
> > detailed discussion of them, but suffice to say there wasn't a clear
> > conclusion, and discussion is ongoing.
> > 
> It was mostly me pushing for tripleo to move to 3rd party CI.  I still think it
> is the right place for tripleo; however, after hearing dprince's concerns I
> think we have a compromise for the moment. I've gone ahead and done the work to
> upgrade the tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1]
> produced by openstack-infra. Please take a moment to review the patch, as it
> exposed 3 issues.
> 
I could use another set of eyes on patch [1] below.  I've done a few rechecks
now and cannot get tripleo-ci to pass consistently.  It appears I'm running into
random timeouts.

> 1) CentOS 7 does not support nbd out of the box, and we can't compile a new
> kernel ATM. So, I've worked around the problem by converting the qcow2 image to
> raw format, updating instack, and converting it back to qcow2.  Ideally, if I
> can find where the instack.qcow2 image is built, we could also produce a raw
> format image so we don't have to do this in every gate job.
> 
> 2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
> now, mostly packages and git repos.  As a result HDD usage starts at 7.5GB and
> because the current slaves use 20GB we quickly run out of space.  Ideally we
> need 80GB[2] of space to be consistent with the other cloud providers we run
> jenkins slaves on.
> 
> 3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
> openstack-infra has an AFS mirroring infrastructure in place.  As a result,
> we'll also need to launch one in tripleo-ci.  For the moment, I've disabled the
> logic to configure the mirror.  Mirrors include pypi, npm, wheel, ubuntu trusty,
> ubuntu precise, ceph.  We are bringing RPM mirrors online shortly.
> 
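For issue 1 above, the qcow2/raw round trip can be sketched as follows.  This
is a guess at the shape of the workaround, not the code in patch [1]: it only
assumes the standard "qemu-img convert -f <fmt> -O <fmt>" invocation, and the
file names and the update_image hook are hypothetical.

```python
import subprocess


def convert_cmd(src, dst, src_fmt, dst_fmt):
    """Build a qemu-img command converting src (src_fmt) to dst (dst_fmt)."""
    return ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst]


def roundtrip_cmds(qcow2_path, raw_path):
    """Commands for qcow2 -> raw and, once the image is updated, raw -> qcow2."""
    to_raw = convert_cmd(qcow2_path, raw_path, "qcow2", "raw")
    to_qcow2 = convert_cmd(raw_path, qcow2_path, "raw", "qcow2")
    return to_raw, to_qcow2


def run_roundtrip(qcow2_path, raw_path, update_image):
    """Convert to raw, call update_image(raw_path), then convert back."""
    to_raw, to_qcow2 = roundtrip_cmds(qcow2_path, raw_path)
    subprocess.check_call(to_raw)
    update_image(raw_path)  # hypothetical hook: modify the raw image in place
    subprocess.check_call(to_qcow2)


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    for cmd in roundtrip_cmds("instack.qcow2", "instack.raw"):
        print(" ".join(cmd))
```

Shipping a raw image alongside the qcow2 one, as suggested in issue 1, would
remove the first conversion from every job.
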
I had a chance to look into this today. To move forward, we'd need a static
VM setup, with a public IPv4 address and about 100GB of HDD (the more the
better). We don't need much memory (2GB) or more than a single core, since we
are just serving HTTP traffic.  Lastly, it should be running ubuntu-trusty so
our puppet manifests in openstack-infra work correctly.

Is this something we could stand up this week?

> I'd really like to get some feedback on these 3 issues; I know they might not
> be solved today because of the hardware move.  However, I think we are pretty
> close now to getting tripleo-ci more in line with some of the openstack-infra
> tooling.
> 
> [1] https://review.openstack.org/#/c/312725/
> [2] https://review.openstack.org/#/c/312992/
> [3] https://review.openstack.org/#/c/312058/
> 
> > The other output from this session was agreement that we'd move our jobs to
> > a different cloud (managed by the RDO community) ahead of a planned
> > relocation of our current hardware.  This has advantages in terms of
> > maintenance overhead, and if it all goes well we can contribute our
> > hardware to this cloud long term vs maintaining our own infrastructure.
> > 
> > 
> > Overall it was an excellent week, and I thank all the session participants
> > for their input and discussion.  Further notes can be found in the
> > etherpads linked from [1] but feel free to reply if specific items require
> > clarification (and/or I've missed anything!)
> > 
> > Thanks,
> > 
> > Steve
> > 
> > [1] https://wiki.openstack.org/wiki/Design_Summit/Newton/Etherpads#TripleO
> > [2] https://review.openstack.org/#/c/299628/
> > [3] https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests
> > [4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
> > [5] https://etherpad.openstack.org/p/tripleo-composable-roles-work
> > [6] http://lists.openstack.org/pipermail/openstack-dev/2016-April/093533.html
> > [7] http://paste.fedoraproject.org/360836/87416814/
> > 
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


