[openstack-dev] [TripleO] Austin summit - session recap/summary

Derek Higgins derekh at redhat.com
Thu May 19 17:06:25 UTC 2016


On 19 May 2016 5:38 pm, "Paul Belanger" <pabelanger at redhat.com> wrote:
>
> On Thu, May 19, 2016 at 03:50:15PM +0100, Derek Higgins wrote:
> > On 18 May 2016 at 13:34, Paul Belanger <pabelanger at redhat.com> wrote:
> > > On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
> > >> On 6 May 2016 at 14:18, Paul Belanger <pabelanger at redhat.com> wrote:
> > >> > On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
> > >> >> Hi all,
> > >> >>
> > >> >> Some folks have requested a summary of our summit sessions, as has been
> > >> >> provided for some other projects.
> > >> >>
> > >> >> I'll probably go into more detail on some of these topics either via
> > >> >> subsequent more focussed threads and/or some blog posts, but what follows is
> > >> >> an overview of our summit sessions[1] with notable actions or decisions
> > >> >> highlighted.  I'm including some of my own thoughts and conclusions; folks
> > >> >> are welcome/encouraged to follow up with their own clarifications or
> > >> >> different perspectives :)
> > >> >>
> > >> >> TripleO had a total of 5 sessions in Austin; I'll cover them one by one:
> > >> >>
> > >> >> -------------------------------------
> > >> >> Upgrades - current status and roadmap
> > >> >> -------------------------------------
> > >> >>
> > >> >> In this session we discussed the current state of upgrades - initial
> > >> >> support for full major version upgrades has been implemented, but the
> > >> >> implementation is monolithic, highly coupled to pacemaker, and inflexible
> > >> >> with regard to third-party extraconfig changes.
> > >> >>
> > >> >> The main outcomes were that we will add support for more granular
> > >> >> definition of the upgrade lifecycle to the new composable services format,
> > >> >> and that we will explore moving towards the proposed lightweight HA
> > >> >> architecture to reduce the need for so much pacemaker specific logic.
> > >> >>
> > >> >> We also agreed that investigating use of mistral to drive upgrade workflows
> > >> >> was a good idea - currently we have a mixture of scripts combined with Heat
> > >> >> to drive the upgrade process, and some refactoring into discrete mistral
> > >> >> workflows may provide a more maintainable solution.  Potential for using
> > >> >> the existing SoftwareDeployment approach directly via mistral (outside of
> > >> >> the heat templates) was also discussed as something to be further
> > >> >> investigated and prototyped.
> > >> >>
> > >> >> We also touched on the CI implications of upgrades - we've got an upgrades
> > >> >> job now, but we need to ensure coverage of full release-to-release upgrades
> > >> >> (not just commit to commit).
> > >> >>
> > >> >> -------------------------------
> > >> >> Containerization status/roadmap
> > >> >> -------------------------------
> > >> >>
> > >> >> In this session we discussed the current status of containers in TripleO
> > >> >> (which is to say, the container based compute node which deploys containers
> > >> >> via Heat onto an Atomic host node that is also deployed via Heat), and
> > >> >> what strategy is most appropriate to achieve a fully containerized TripleO
> > >> >> deployment.
> > >> >>
> > >> >> Several folks from Kolla participated in the session, and there was
> > >> >> significant focus on where work may happen such that further collaboration
> > >> >> between communities is possible.  To some extent this discussion on where
> > >> >> (as opposed to how) proved a distraction and prevented much discussion on
> > >> >> supportable architectural implementation for TripleO, thus what follows is
> > >> >> mostly my perspective on the issues that exist:
> > >> >>
> > >> >> Significant uncertainty exists wrt integration between Kolla and TripleO -
> > >> >> there's largely consensus that we want to consume the container images
> > >> >> defined by the Kolla community, but much less agreement that we can
> > >> >> feasibly switch to the ansible-orchestrated deployment/config flow
> > >> >> supported by Kolla without breaking many of our primary operator interfaces
> > >> >> in a fundamentally unacceptable way, for example:
> > >> >>
> > >> >> - The Mistral based API is being implemented on the expectation that the
> > >> >>   primary interface to TripleO deployments is a parameters schema exposed
> > >> >>   by a series of Heat templates - this is no longer true in a "split stack"
> > >> >>   model where we have to hand off to an alternate service orchestration tool.
> > >> >>
> > >> >> - The tripleo-ui (based on the Mistral based API) consumes heat parameter
> > >> >>   schema to build its UI, and Ansible doesn't support the necessary
> > >> >>   parameter schema definition (such as types and descriptions) to enable
> > >> >>   this pattern to be replicated.  Ansible also doesn't provide an HTTP API,
> > >> >>   so we'd still have to maintain an API surface for the (non python) UI to
> > >> >>   consume.
> > >> >>
> > >> >> We also discussed ideas around integration with kubernetes (a hot topic on
> > >> >> the Kolla track this summit), but again this proved inconclusive beyond
> > >> >> that yes someone should try developing a PoC to stimulate further
> > >> >> discussion.  Again, significant challenges exist:
> > >> >>
> > >> >> - We still need to maintain the Heat parameter interfaces for the API/UI,
> > >> >>   and there is also a strong preference to maintain puppet as a tool for
> > >> >>   generating service configuration (so that existing operator integrations
> > >> >>   via puppet continue to function) - this is a barrier to directly
> > >> >>   consuming the kolla-kubernetes effort.
> > >> >>
> > >> >> - A COE layer like kubernetes is a poor fit for deployments where operators
> > >> >>   require strict control of service placement (e.g. exactly which nodes a service
> > >> >>   runs on, IP address assignments to specific nodes etc) - this is already
> > >> >>   a strong requirement for TripleO users and we need to figure out if/how
> > >> >>   it's possible to control container placement per node/namespace.
> > >> >>
> > >> >> - There are several uncertainties regarding the HA architecture, such as
> > >> >>   how do we achieve fencing for nodes (which is currently provided via
> > >> >>   pacemaker), in particular the HA model for real production deployments
> > >> >>   via kubernetes for stateful services such as rabbit/galera is unclear.
> > >> >>
> > >> >> Overall a session with much discussion, but further prototyping and
> > >> >> discussion is required before we can define a definitive implementation
> > >> >> strategy (several folks are offering to be involved in this).
> > >> >>
> > >> >> ---------------------------------------------
> > >> >> Work session (Composable Services and beyond)
> > >> >> ---------------------------------------------
> > >> >>
> > >> >> In this session we discussed the status of the currently in-progress work
> > >> >> to decompose our monolithic manifests into per-service profiles[3] in
> > >> >> puppet-tripleo, then consume these profiles via per-service templates in
> > >> >> tripleo-heat-templates[4][5], and potential further work to enable fully
> > >> >> composable (including user defined) roles.
> > >> >>
> > >> >> Overall there was agreement that the composable services work and puppet
> > >> >> refactoring are going well, but that we need to improve velocity and get
> > >> >> more reviewers helping to land the changes.  There was also agreement that
> > >> >> a sub-team should form temporarily to drive the remaining work[6], that
> > >> >> we should not land any new features in the "old" template architecture and
> > >> >> relatedly that tripleo cores should help rebase and convert currently
> > >> >> under-review changes to the new format where needed to ease the transition.
> > >> >>
> > >> >> I described a possible approach to providing fully composable roles that
> > >> >> uses some template pre-processing (via jinja2)[7], a blueprint and initial
> > >> >> implementation will be posted soon, but overall the response was positive,
> > >> >> and it may provide a workable path to fully composable roles that won't
> > >> >> break upgrades of existing deployments.
> > >> >>
> > >> >> ---------------------------------
> > >> >> Work session (API and TripleO UI)
> > >> >> ---------------------------------
> > >> >>
> > >> >> In this session we discussed the current status of the TripleO UI, and the
> > >> >> Mistral based API implementation it depends on.
> > >> >>
> > >> >> Overall it's clear there is a lot of good progress in this area, but there
> > >> >> are some key areas which require focus and additional work to enable a
> > >> >> fully functional upstream TripleO UI:
> > >> >>
> > >> >> - The undercloud requires some configuration changes to give the UI
> > >> >>   the necessary access to the undercloud services
> > >> >>
> > >> >> - The UI currently depends on the previous prototype API implementation,
> > >> >>   and must be converted to the new Mistral based API (in-progress)
> > >> >>
> > >> >> - We need to improve velocity of the Mistral based implementation (need
> > >> >>   more testing and reviewing), such that we can land it and folks can start
> > >> >>   integrating with it.
> > >> >>
> > >> >> - There was agreement that the previously proposed validation API can be
> > >> >>   implemented as another Mistral action, which will provide a way to run
> > >> >>   validation related to the undercloud configuration/state.
> > >> >>
> > >> >> - There are some features we could add to Heat which would make
> > >> >>   implementation cleaner (description/metadata in environment files, enable
> > >> >>   multiple parameter groups).
> > >> >>
> > >> >> The session concluded with some discussion around the requirements related
> > >> >> to network configuration.  Currently the templates offer considerable
> > >> >> flexibility in this regard, and we need to decide how this is surfaced via
> > >> >> the API such that it's easily consumable via TripleO UX interfaces.
> > >> >>
> > >> >> -----------------------------------
> > >> >> Work session (Reducing the CI pain)
> > >> >> -----------------------------------
> > >> >>
> > >> >> This session covered a few topics, but mostly ended up focussed on the
> > >> >> debate with infra regarding moving to 3rd party CI.  There are arguments on
> > >> >> both sides here, and I'll perhaps let derekh or dprince reply with a more
> > >> >> detailed discussion of them, but suffice to say there wasn't a clear
> > >> >> conclusion, and discussion is ongoing.
> > >> >>
> > >> > It was mostly me pushing for tripleo to move to 3rd party CI.  I still think it
> > >> > is the right place for tripleo; however, after hearing dprince's concerns I think
> > >> > we have a compromise for the moment. I've gone ahead and done the work to
> > >> > upgrade the tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1] produced by
> > >> > openstack-infra. Please take a moment to review the patch as it exposed 3
> > >> > issues.
> > >> >
> > >> > 1) CentOS 7 does not support nbd out of the box, and we can't compile a new
> > >> > kernel ATM. So, I've worked around the problem by converting the qcow2 image to
> > >> > raw format, updating instack, and converting it back to qcow2.  Ideally, if I can
> > >> > find where the instack.qcow2 image is built, we would also produce a raw format so we
> > >> > don't have to do this every gate job.
> > >>
> > >> The conversion should be ok for the moment to allow us to make
> > >> progress; longer term we'll probably need to change the libvirt
> > >> domain definitions on the testenvs in order to be able to just
> > >> generate and use a raw format.
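
(Aside: a minimal sketch of the round trip described in point 1, assuming
qemu-img is installed on the slave. The helper name and the file names are
hypothetical and are not taken from the tripleo-ci patch; the only real piece
is the `qemu-img convert -f <fmt> -O <fmt>` invocation. The step that actually
updates the raw image is left as a placeholder comment.)

```python
from pathlib import Path
from typing import List


def conversion_commands(qcow2_image: str) -> List[List[str]]:
    """Build the two qemu-img calls for a qcow2 -> raw -> qcow2 round trip.

    Hypothetical helper: it only constructs the argument lists, so they can
    be inspected or passed to subprocess.run(cmd, check=True) by the caller.
    """
    src = Path(qcow2_image)
    raw = src.with_suffix(".raw")

    # Step 1: convert to raw so the image can be modified without nbd.
    to_raw = ["qemu-img", "convert", "-f", "qcow2", "-O", "raw",
              str(src), str(raw)]

    # (The raw image would be updated here, e.g. via a loop mount.)

    # Step 2: convert the modified raw image back to qcow2.
    to_qcow2 = ["qemu-img", "convert", "-f", "raw", "-O", "qcow2",
                str(raw), str(src)]

    return [to_raw, to_qcow2]


if __name__ == "__main__":
    for cmd in conversion_commands("instack.qcow2"):
        print(" ".join(cmd))
```

Producing the raw image once at build time, as suggested above, would remove
both conversions from every gate job.
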
> > >>
> > >> >
> > >> > 2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
> > >> > now, mostly packages and git repos.  As a result disk usage starts at 7.5GB, and
> > >> > because the current slaves only have 20GB we quickly run out of space.  Ideally we
> > >> > need 80GB[2] of space to be consistent with the other cloud providers we run
> > >> > jenkins slaves on.
> > >>
> > >> This is where we'll likely hit the biggest problems. In order to bump
> > >> the disk space allocated to the jenkins slaves and to simultaneously
> > >> take advantage of the SSDs we're going to have to look into using the
> > >> SSDs as a cache for the spinning disks. I haven't done this before but
> > >> I hope we can look into it soon.
> > >>
> > >> >
> > >> > 3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
> > >> > openstack-infra has an AFS mirroring infrastructure in place.  As a result,
> > >> > we'll also need to launch one in tripleo-ci.  For the moment, I've disabled the
> > >> > logic to configure the mirror.  Mirrors include pypi, npm, wheel, ubuntu trusty,
> > >> > ubuntu precise, ceph.  We are bringing RPM mirrors online shortly.
> > >>
> > >> I'm not sure we'll get as much of a benefit from this as the devstack
> > >> based jobs do, as some of the mirrors you mention wouldn't be used
> > >> at all while others we would only make very light use of. Is it
> > >> possible to selectively add mirrors to the AFS mirror, or add
> > >> additional things that tripleo would be interested in? e.g. image
> > >> cache
> > >>
> > > I think you'll actually benefit from this, mostly because you no longer have to
> > > run your own mirror / squid servers in tripleo.  The way AFS mirrors work is
> > > more like a cache.
> > >
> > > Our AFS volumes in rax-dfw are over 1TB of data now, but since our
> > > jobs only access a small fraction of the data, most mirror AFS servers are only
> > > using about 5GB of data locally.
> > >
> > > In the case of tripleo, it will be even less since you are not running the full
> > > suite of jobs in your cloud.
> > >
> > > Right now, nothing would need to change to selectively use mirrors, because
> > > AFS will only cache what is used.  As for adding things specific to tripleo, it
> > > could be possible; it is also possible other jobs will need the same bits
> > > too.
> > >
> > > I strongly encourage us to set up an AFS mirror.
> >
> > Ok, I'm still a little skeptical because our biggest bandwidth hogs
> > aren't mentioned in the list of things mirrored, but that's not a good
> > reason to get in your way. If it proves to be a help then great, if
> > not at least we tried, so what do you need from me to try it out? If I
> > create a d1.medium trusty instance with a floating IP, will that work
> > for you? This should allow you to test things for now; longer term
> > we're going to have the same problems we do with larger jenkins
> > instances, so until we solve this we won't be able to consider this a
> > permanent part of the infrastructure.
> >
> I just need to know the flavor we are using. I'll be using our
> openstack-infra/system-config launch-node script to provision the server, since
> we need to loop it into our ansible / puppet wheel.
>
> If you are okay with d1.medium for now, I can start it.

I'm happy to go for that for now to allow you to test this out; we'll
likely have to change something in the future, though.

>
> > >
> > >> >
> > >> > I'd really like to get some feedback on these 3 issues. I know they might not be
> > >> > solved today because of the hardware move.  However, I think we are pretty close
> > >> > now to getting tripleo-ci more in line with some of the openstack-infra tooling.
> > >> >
> > >> > [1] https://review.openstack.org/#/c/312725/
> > >> > [2] https://review.openstack.org/#/c/312992/
> > >> > [3] https://review.openstack.org/#/c/312058/
> > >> >
> > >> >> The other output from this session was agreement that we'd move our jobs to
> > >> >> a different cloud (managed by the RDO community) ahead of a planned
> > >> >> relocation of our current hardware.  This has advantages in terms of
> > >> >> maintenance overhead, and if it all goes well we can contribute our
> > >> >> hardware to this cloud long term vs maintaining our own infrastructure.
> > >> >>
> > >> >>
> > >> >> Overall it was an excellent week, and I thank all the session participants
> > >> >> for their input and discussion.  Further notes can be found in the
> > >> >> etherpads linked from [1] but feel free to reply if specific items require
> > >> >> clarification (and/or I've missed anything!)
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> Steve
> > >> >>
> > >> >> [1] https://wiki.openstack.org/wiki/Design_Summit/Newton/Etherpads#TripleO
> > >> >> [2] https://review.openstack.org/#/c/299628/
> > >> >> [3] https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests
> > >> >> [4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
> > >> >> [5] https://etherpad.openstack.org/p/tripleo-composable-roles-work
> > >> >> [6] http://lists.openstack.org/pipermail/openstack-dev/2016-April/093533.html
> > >> >> [7] http://paste.fedoraproject.org/360836/87416814/
> > >> >>
> > >> >> __________________________________________________________________________
> > >> >> OpenStack Development Mailing List (not for usage questions)
> > >> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev