[openstack-dev] [charms] PTG summary
James Page
james.page at ubuntu.com
Wed Sep 20 12:07:08 UTC 2017
Hi All
Here’s a summary of the charm-related discussion from the PTG last week.
# Cross Project Discussions
## Skip Level Upgrades
This topic was discussed at the start of the week, in the context of
supporting upgrades across multiple OpenStack releases for operators. What
was immediately evident was this was really a discussion around
‘fast-forward’ upgrades, rather than actually skipping any specific
OpenStack series as part of a cloud upgrade. Deployments would still need
to step through each OpenStack release series in turn, so the discussion
centred around how to make this much easier for operators and deployment
tools to consume than it has been to date.
There was general agreement on the principles that all steps required to
update a service between series should be supported whilst the service is
offline – i.e. all database migrations can be completed without the
services actually running. This would allow multiple upgrade steps to be
completed without having to start services up on interim steps. Note that a
lot of projects already support this approach, but it has never been agreed
as a general policy as part of the ‘supports-upgrade’ tag; doing so was one
of the actions resulting from this discussion.
In the context of the OpenStack Charms, we already follow something along
these lines for minimising the amount of service disruption in the control
plane during OpenStack upgrades; with implementation of this approach
across all projects, we can avoid having to start up services on each
series step as we do today, further optimising the upgrade process
delivered by the charms for services that don’t support rolling upgrades.
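To make the fast-forward idea concrete, here is a minimal sketch of the step ordering described above: services are stopped once, every intermediate series is stepped through with its migrations run offline, and services are started only at the end. The release names are real OpenStack series, but `fast_forward_plan` and the step descriptions are hypothetical illustrations, not anything the charms or Juju provide today.

```python
# Illustrative only: the function and step wording are hypothetical,
# not a charm, Juju, or OpenStack API.
RELEASES = ["mitaka", "newton", "ocata", "pike", "queens"]


def fast_forward_plan(current, target, releases=RELEASES):
    """Return the ordered steps for a fast-forward upgrade.

    Every series between current and target is visited in turn, but
    services are only stopped once and started once.
    """
    start, end = releases.index(current), releases.index(target)
    if end <= start:
        raise ValueError("target must be newer than current")
    plan = ["stop services"]
    for series in releases[start + 1:end + 1]:
        plan.append(f"install {series} packages")
        plan.append(f"run {series} database migrations (services offline)")
    plan.append("start services")
    return plan


for step in fast_forward_plan("newton", "pike"):
    print(step)
```

Compare this with what we do today, where the services would be started after each series step rather than only at the end.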
## Policy in Code
Most services in OpenStack rely on a policy.{json,yaml} file to define the
policy for role based access into API endpoints – for example, what
operations require admin level permissions for the cloud. Moving all policy
default definitions to code rather than in a configuration file is a goal
for the Queens development cycle.
This approach will make adapting policies as part of an OpenStack Charm
based deployment much easier, as we only have to manage the delta on top of
the defaults, rather than having to manage the entire policy file for each
OpenStack release. Notably Nova and Keystone have already moved to this
approach during previous development cycles.
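As a rough illustration of why this helps, the sketch below models the defaults as something the service carries in code and the deployment-specific policy as a small set of overrides layered on top. The rule names and the `effective_policy` helper are made up for the example; this is not the oslo.policy API.

```python
# Hypothetical defaults standing in for the rules a service registers in
# code; real services define many more rules and use their own names.
DEFAULT_POLICY = {
    "compute:get": "rule:admin_or_owner",
    "compute:delete": "rule:admin_or_owner",
    "compute:force_delete": "rule:admin_api",
}


def effective_policy(defaults, overrides):
    """Layer operator overrides on top of the in-code defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# The charm would only need to render this delta, not the whole file.
overrides = {"compute:delete": "rule:admin_api"}
policy = effective_policy(DEFAULT_POLICY, overrides)
print(policy["compute:delete"])
```

When the defaults change between OpenStack releases, only the overrides need reviewing, which is the saving described above.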
## Deployment (SIG)
During the first two days, some cross-deployment-tool discussions were
held on a variety of topics; of specific interest for the OpenStack Charms
was the discussion around health/status middleware for projects so that the
general health of a service can be assessed via its API – this would cover
in-depth checks such as access to database and messaging resources, as well
as access to other services that the checked service might depend on – for
example, can Nova access Keystone’s API for authentication of tokens etc.
There was general agreement that this was a good idea, and it will be
proposed as a community goal for the OpenStack project.
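For anyone who has not seen this pattern, here is a minimal sketch of what such middleware could look like as a plain WSGI wrapper. The `/healthcheck` path, the `HealthMiddleware` class and the shape of the JSON response are all assumptions for illustration; whatever is eventually proposed as the community goal may look quite different.

```python
import json


class HealthMiddleware:
    """WSGI middleware that answers a health path with dependency status.

    `checks` maps a name (e.g. "database") to a callable returning True
    when that dependency is reachable. All names here are hypothetical.
    """

    def __init__(self, app, checks, path="/healthcheck"):
        self.app = app
        self.checks = checks
        self.path = path

    def __call__(self, environ, start_response):
        # Anything other than the health path goes to the wrapped API.
        if environ.get("PATH_INFO") != self.path:
            return self.app(environ, start_response)
        results = {}
        for name, check in self.checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                # A check that blows up counts as unhealthy.
                results[name] = False
        healthy = all(results.values())
        status = "200 OK" if healthy else "503 Service Unavailable"
        body = json.dumps(results).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]
```

A deployment tool could then poll the health path and treat a 503 as "a dependency of this service is down", with the body saying which one.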
# OpenStack Charms Devroom
## Keystone: v3 API as default
The OpenStack Charms have optionally supported Keystone v3 for some time.
The Keystone v2 API is officially deprecated, so we discussed the approach
for switching the default API deployed by the charms going forwards. In
summary:

- New deployments should default to the v3 API and associated policy
  definitions.
- Existing deployments that get upgraded to newer charm releases should not
  switch automatically to v3, limiting the impact on services built around
  v2 based deployments already in production.
- The charms already support switching from v2 to v3, so v2 deployments can
  upgrade as and when they are ready to do so.
- At some point in time, we’ll have to automatically switch v2 deployments
  to v3 on OpenStack series upgrade, but that does not have to happen yet.
## Keystone: Fernet Token support
The charms currently only support UUID based tokens (since PKI was dropped
from Keystone). The preferred format is now Fernet, so we should implement
this in the charms – we should be able to leverage the existing PKI key
management code to an extent to support Fernet tokens.
## Stable Branch Life-cycles
Currently the OpenStack Charms team actively maintains two branches – the
current development focus in the master branch, and the most recent stable
branch – which right now is stable/17.08. At the point of the next
release, the stable/17.08 branch will no longer be maintained, being superseded
by the new stable/XX.XX branch. This is reflected in the promulgated
charms in the Juju charm store as well. Older versions of charms remain
consumable (albeit there appears to be some trimming of older revisions
which needs investigating). If a bug is discovered in a charm version from
an inactive stable branch, the only course of action is to upgrade to the
latest stable version for fixes, which may also include new features and
behavioural changes.
There are some technical challenges with regard to consumption of multiple
stable branches from the charm store – we discussed using a different team
namespace for an ‘old-stable’ style consumption model which is not that
elegant, but would work. Maintaining more branches means more resource
effort for cherry-picks and reviews, which is not feasible with the amount
of time the development team currently has for these activities, so no
change for the time being!
## Service Restart Coordination at Scale
tl;dr no one wants enabling debug logging to take out their rabbits
When running the OpenStack Charms at scale, parallel restarts of daemons
for services with large numbers of units (we specifically discussed
hundreds of compute units) can generate a high load on underlying control
plane infrastructure as daemons drop and re-connect to message and database
services, potentially resulting in service outages. We discussed a few
approaches to mitigate this specific problem, but ended up with focus on
how we could implement a feature which batched up restarts of services into
chunks based on a user provided configuration option.
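The batching idea can be sketched in a few lines. Nothing below exists in the charms: `restart_batches`, `rolling_restart`, the `batch_size` option and the pause between batches are all hypothetical, and the sketch deliberately ignores how units would actually coordinate with each other under Juju.

```python
import time


def restart_batches(units, batch_size):
    """Split units into consecutive chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [units[i:i + batch_size]
            for i in range(0, len(units), batch_size)]


def rolling_restart(units, batch_size, restart, pause_seconds=0):
    """Restart one batch at a time, pausing between batches.

    `restart` is whatever callable restarts the daemons on one unit;
    the pause gives message and database services time to settle.
    """
    for batch in restart_batches(units, batch_size):
        for unit in batch:
            restart(unit)
        time.sleep(pause_seconds)


units = [f"nova-compute/{i}" for i in range(5)]
print(restart_batches(units, 2))
```

With hundreds of compute units and a modest batch size, only a small fraction of daemons would be reconnecting to RabbitMQ and the database at any one time.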
We also had some good conversation around how unit level overrides for some
configuration options would be useful – supporting the use case where a
user wants to enable debug logging for a single unit of a service (maybe
it’s causing problems) without having to restart services across all units
to support this. This is not directly supported by Juju today – but we’ll
make the request!
## Cross Model Relations – Use Cases
We brainstormed some ideas about how we might make use of the new
cross-model relation features being developed for future Juju versions;
some general ideas:

- Multiple Region Cloud Deployments
  - Keystone + MySQL and Dashboard in one model (supporting all regions)
  - Each region (including region specific control plane services) deployed
    into a different model and controller, potentially using different MAAS
    deployments in different DCs.
- Keystone Federation Support
  - Use of Keystone deployments in different models/controllers to build out
    federated deployments, with one lead Keystone acting as the identity
    provider to other peon Keystones in different regions or potentially
    completely different OpenStack Clouds.

We’ll look to use the existing relations for some of these ideas, so as
the implementation of this feature in Juju becomes more mature we can be
well positioned to support its use in OpenStack deployments.
## Deployment Duration
We had some discussion about the length of time taken to deploy a fully HA
OpenStack Cloud onto hardware using the OpenStack Charms and how we might
improve this by optimising hook executions.
There was general agreement that scope exists in the charms to improve
general hook execution time – specifically in charms such as RabbitMQ and
Percona XtraDB Cluster which create and distribute credentials to consuming
applications.
We also need to ensure that we’re tracking any improvements made with good
baseline metrics on charm hook execution times on reference hardware
deployments so that any proposed changes to charms can be assessed in terms
of positive or negative impact on individual unit hook execution time and
overall deployment duration – so expect some work in CI over the next
development cycle to support this.
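As a sketch of the kind of comparison this implies, the snippet below flags hooks whose mean execution time has grown beyond a tolerance relative to a baseline run. The data layout (hook name mapped to a list of durations in seconds), the 10% tolerance and the `hook_time_regressions` name are assumptions for illustration, not a description of the CI work itself.

```python
from statistics import mean


def hook_time_regressions(baseline, candidate, tolerance=0.10):
    """Return hooks whose mean duration grew by more than `tolerance`.

    Both arguments map a hook name to a list of durations in seconds.
    Hooks missing from the candidate run are skipped.
    """
    regressions = {}
    for hook, base_samples in baseline.items():
        if hook not in candidate:
            continue
        base, new = mean(base_samples), mean(candidate[hook])
        if new > base * (1 + tolerance):
            regressions[hook] = (base, new)
    return regressions


# Made-up numbers, purely to show the call.
baseline = {"install": [10, 12], "config-changed": [5, 5]}
candidate = {"install": [11, 11], "config-changed": [7, 7]}
print(hook_time_regressions(baseline, candidate))
```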
As a follow up to the PTG, the team is looking at whether we can use the
presence of a VIP configuration option to signal to the charm to postpone
any presentation of access relation data to the point after which HA
configuration has been completed and the service can be accessed across
multiple units using the VIP. This would potentially reduce the number
(and associated cost) of interim hook executions due to pre-HA relation
data being presented to consuming applications.
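The decision being considered is simple enough to sketch. The `vip` option name comes from the discussion above; the `should_publish_access_data` helper and the `ha_complete` flag are hypothetical stand-ins for however a charm would actually detect that its HA configuration has finished.

```python
def should_publish_access_data(config, ha_complete):
    """Decide whether to present access relation data to consumers.

    If a VIP is configured, wait until HA configuration is complete so
    consumers only ever see the VIP. Without a VIP there is nothing to
    wait for, so publish immediately.
    """
    if config.get("vip"):
        return ha_complete
    return True


print(should_publish_access_data({"vip": "10.0.0.100"}, ha_complete=False))
print(should_publish_access_data({"vip": "10.0.0.100"}, ha_complete=True))
print(should_publish_access_data({}, ha_complete=False))
```

The saving comes from the first case: consuming applications never receive the pre-HA unit address, so they do not run hooks for data that is about to be replaced.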
## Mini Sprints
On the Thursday of the PTG, we held a few mini-sprints to get some early
work done on features for the Queens cycle; specifically we hacked on:

- Ceph -> Ceph Mon charm migration – how we migrate existing ceph charm
  deployments to ceph-mon and ceph-osd.
- Service Discovery for optimisation of configuration of OpenStack services.
- OpenStack Loadbalancer for API service call load balancing in more complex
  network topology.

Good progress was made in most areas with some reviews already up.
We had a good turnout with 10 charm developers in the devroom – thanks to
everyone who attended and a special call-out to Billy Olsen who showed up
with team T-Shirts for everyone!
We have some new specs already up for review, and I expect to see a few
more over the next two weeks!
Cheers
James