[openstack-dev] [Infra] Newton Summit Infra Sessions Recap
fungi at yuggoth.org
Tue May 10 19:00:35 UTC 2016
I'm Cc'ing this to the openstack-infra ML but setting MFT to direct
subsequent discussion to the openstack-dev ML so we can hopefully
avoid further cross-posting as much as possible. If you're replying
on a particular session topic, please update the Subject so that the
subthreads are easier to keep straight.
A brief update was provided from contributors working on Maniphest
(Craige McWhirter) and Storyboard (Zara Zaimeche, Adam Coldrick),
followed by a rehash of general task tracking needs within the
community. This has shifted a bit since the Release team now has
automation covering some of their previous needs, so we confirmed
which features were still a must vs. which had fallen in priority
and whether that made a difference in choosing between the two.
Thierry Carrez volunteered to write a spec and work toward a TC
resolution on consensus for moving the community to a suitable task
tracking platform. It's now in progress at
https://review.openstack.org/314185 . James Blair and I volunteered
as backups on that task.
There were also some related discussions with regard to dashboard
needs for the Product working group (stemming from their "Defining
scope of cross projects specs" session on Tuesday afternoon), and
some further ad hoc discussion during our sprint day about VMT
embargo bug workflow and related needs.
It was additionally confirmed that Infra could still deploy and
maintain a Pholio service for the UI/UX team's use even if Maniphest
did not end up in production as they are separate and distinct
tools, and that the existing deployment automation and configuration
management should remain suitable for that purpose.
Landing Page for Contributors
This ended up being a little about publication/maintenance
mechanisms, and mostly about picking a non-contentious hostname for
the new "contributing" portal. Thierry Carrez and James Blair had
strong opinions on naming, attempting to strike a balance between
clarity of scope and avoiding alienation of potential audiences.
This discussion continued after the session ended, well into the
lunch line, and eventually "project.openstack.org" was settled on as
being clearly related to the upstream project teams while not
overreaching into the domains of work being done by the foundation
and other groups outside the domain of the TC.
The initial plan is to just throw some ugly static HTML (maybe
locally generated with a templating engine and then committed) into
a repo and push that up to a vhost, but not publicize it until it
gets a little more polish. Ultimately, we want a "choose your own
adventure" sort of flow to the site, which avoids giving newcomers
too much information they don't need, so as to avoid confusion. Mike
Perez (who was unable to attend the session due to a conflict) has a
new contributor workflow/walkthrough targeted at low barrier to
entry audiences we might incorporate or borrow from.
Thierry volunteered to lead this, potentially with Mike's help, and
Jimmy McArthur offered to provide layout/formatting and information
engineering assistance to make it more visually appealing and easier
to use.
Launch-Node, Ansible and Puppet
The session was on further automating our server creation, making it
possible to trigger and drive new server creation from configuration
in Git. Spencer Krum volunteered to write a spec for the new
automation needed. The hope is that we might incorporate some of
this into the upcoming distro upgrade process for our servers, as a
means of vetting the proposed solution.
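
As a rough illustration only (this is not the planned design, which
still awaits the spec), driving server creation from configuration in
Git could start by comparing the server names declared in a tracked
file against the servers that already exist. The function name and
host names below are hypothetical:

```python
def plan_server_changes(desired, existing):
    """Compare declared server names with existing ones.

    Returns (to_create, unmanaged), both as sorted lists.
    """
    desired_set = set(desired)
    existing_set = set(existing)
    # Declared in configuration but not yet running: candidates to launch.
    to_create = sorted(desired_set - existing_set)
    # Running but not declared: flagged for a human to look at.
    unmanaged = sorted(existing_set - desired_set)
    return to_create, unmanaged


if __name__ == "__main__":
    # Hypothetical inventory, standing in for a file tracked in Git.
    declared = ["wiki01.example.org", "review01.example.org"]
    running = ["review01.example.org", "old-static.example.org"]
    to_create, unmanaged = plan_server_changes(declared, running)
    print("create:", to_create)
    print("not declared in Git:", unmanaged)
```

Nothing here launches or deletes anything; it only computes a plan,
which keeps the risky step under separate review.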
It was also suggested that there should be a spec for hot/cold
orchestration in service of server replacement cut-over, though I
think we're still lacking a volunteer for that second spec.
We started with a rehash of earlier mailing list threads and a
summary of current state on the wiki server. While intended to be
primarily on our plans to get the wiki back into a working state, it
ended up being more about the long-term viability of running a wiki
for the OpenStack community.
Consensus within the room was that we want to continue the current
years-long effort of moving important content off the wiki,
deprecating it for specific use cases when there are more suitable
publication and information management mechanisms available. Some
current uses will still need new solutions engineered of course, but
over the coming year we'd like to get to a point where sufficient
content is moved off so that we can better determine whether its
continued existence is warranted.
I'll be starting a thread on the openstack at lists.openstack.org
mailing list in the next few days to cover the situation in greater
detail and get the community-facing discussion going for this.
Thierry Carrez volunteered to work on identifying remaining wiki use
cases, and on handling discussion/collecting feedback throughout the
wider community. Elizabeth K. Joseph volunteered to research
alternative solutions for the current under-served use cases as
they're identified, and determine feasibility of implementation.
That said, we still need to have the wiki in sufficient shape to be
able to continue serving its current purpose, to enable us to better
identify its remaining stakeholders, and to provide them with an
opportunity to comfortably transition to other platforms as we make
them available. Paul Belanger volunteered (with Spencer Krum and
James Blair assisting as backups) to get the current deployment into
configuration management, and upgraded into a usable and reasonably
spam-resistant state again.
For this session we mostly reviewed the security model for the
proposal.slave.openstack.org host and discussed job configuration
reviewing concerns related to that. Consensus was reached that jobs
running on privileged workers with access to sensitive credentials
need to be carefully vetted, with any commands/scripts they run
staying self-contained in the project-config repository and relying
only on distro-packaged tools and utilities unless absolutely
necessary.
We also touched on the anti-pattern of jobs which propose Git
commits for review (the inherent risks and added review
effort/churn), and ways to attempt to educate contributors that this
is best avoided except in extreme situations where no other
sufficient option exists (such as publishing autogenerated artifacts
to more durable locations).
Another outcome of this is that Andreas Jaeger put together some
project-config specific reviewing guidelines:
In the future, that will be extended to mention the sorts of tribal
knowledge which came up in this session so that reviewers and
submitters are all on the same page.
This was a feature gap discussion, covering the remaining missing
pieces in the wake of our transition to ansible-driven puppet apply.
It was identified that we still need to improve logging, perhaps
with one file per host stored on the bastion server (James Blair and
Philip Schwartz volunteered to work out this part). Also, puppetboard
failure reporting needs to be debugged (Spencer Krum volunteered to
do it), and we need to start automatically as well as retroactively
cleaning up defunct entries in puppetdb. Ansible hangs (which as it
turns out may have been related to very long SSH timeouts on
unavailable servers) and memory consumption/OOM conditions need to
be solved. Further, it should now be safe to turn off our
puppetmaster service when someone gets time to push the change
ripping that out of our manifests.
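
To make two of those gaps concrete, here is a minimal sketch (not our
actual tooling) of running a per-host command with a bounded timeout
and appending its output to one log file per host. The function name,
log layout and the stand-in command are all assumptions for
illustration; a real run would invoke Ansible or SSH instead:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_host_log(host, command, log_dir, timeout_seconds=30):
    """Run command, append its output to <log_dir>/<host>.log.

    Returns a short status string: "ok", "failed (<rc>)" or "timeout".
    """
    log_path = Path(log_dir) / f"{host}.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
            text=True,
        )
        output = result.stdout
        if result.returncode == 0:
            status = "ok"
        else:
            status = f"failed ({result.returncode})"
    except subprocess.TimeoutExpired as exc:
        # The child is killed on timeout, so one dead host cannot
        # stall the whole run.
        status = "timeout"
        output = exc.stdout or ""
        if isinstance(output, bytes):
            output = output.decode(errors="replace")
    with log_path.open("a") as log_file:
        log_file.write(output)
        log_file.write(f"[{host}] {status}\n")
    return status


if __name__ == "__main__":
    # Stand-in command; the host name is hypothetical.
    with tempfile.TemporaryDirectory() as log_dir:
        cmd = [sys.executable, "-c", "print('puppet apply finished')"]
        print(run_with_host_log("host-a.example.org", cmd, log_dir))
```

The explicit timeout is the point: an unreachable server produces a
"timeout" entry in its own log rather than a hung run.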
OpenID/SSO for Community Systems
This was a review of our current SSO situation across various
community services, and a discussion of whether it makes sense to
continue working toward moving Infra systems from Launchpad's OpenID
provider to an OpenStackID instance or if we should instead be
looking at running Ipsilon to provide OpenID SSO. Pros and cons of
the three systems were compared.
A next step for Ipsilon would be a proof of implementation
interfaced to the foundation's current Silverstripe account backend,
to be able to demonstrate whether some of the systems we're already
using with openstackid.org (groups.openstack.org,
translate.openstack.org, refstack.openstack.org) can interoperate
with it correctly (James Blair is working on this). If this turns
out to be feasible, we also need a spec describing such an
implementation and transition plan.
If sticking with OpenStackID in the long run, alternative
scaling/split options need to be explored so that the development
pace of the foundation Web site and summit scheduling/mobile app and
its out-of-spec OpenID extensions can be safely isolated from
continued stable use by other community systems.
For either system, we also need to make sure that high-availability
(preferably cross-provider) concerns are addressed.
Distro Upgrade Plans
We talked about what systems were still running Ubuntu 12.04 LTS,
and classified them as to whether they could be replaced/migrated to
14.04 LTS trivially vs. which require a planned maintenance window
to minimize disruption. Early in the cycle (date to hopefully be
settled in today's Infra meeting) some root admins will collaborate
on replacing the servers which can be upgraded without impact, and
then in the ensuing weeks we'll attempt to work through the
remainder in a series of planned and announced outages. We also
covered availability of 16.04 LTS and agreed that while we might
deploy some new systems on it, our existing production servers
likely won't move to that until sometime in 2017.
We briefly touched on use of 16.04 LTS in CI jobs for Newton
(retaining 14.04 LTS for jobs running on stable/liberty and
stable/mitaka changes), but as we've been through a similar
transition before there wasn't much planning aside from "yep, we
should get that set up ASAP."
That concludes my recollection of these sessions over the course of
the week--thanks for reading this far--feel free to follow up (on
the openstack-dev ML please) with any corrections/additions!