[openstack-dev] [Cinder] Ocata Design Summit Recap
sean.mcginnis at gmx.com
Tue Nov 1 22:21:49 UTC 2016
The Cinder team had a very productive week at the Design Summit, IMO. There
were a lot of great discussions throughout the week, both in Cinder sessions
and in the hallways. A huge thank you to all who were able to participate, and
a reminder for those that couldn't attend that the PTG is coming up at the end
of February. Everyone is welcome to participate!
Here is a brief recap of the sessions from last week. It nowhere near does them
justice, but hopefully it gives enough of an overview to get an idea of what was
discussed, and pointers on where to follow up if interested.
Cinder Test Working Group Status
scottda has been driving a weekly meeting to focus on testing. This session
was a recap and update on the progress made on the topics discussed in those
meetings.
A lot of our needs require multi-node gate testing, support for which has now
merged. There are a lot of features (active/active HA, multi-backend, etc.)
that require multiple nodes. We can now move forward with getting coverage in
some of these areas.
We also briefly discussed the test focus for the Ocata release and the desire
(at least by me) to get area owners assigned for things like backup, upgrades,
migrations, etc. to review our test coverage and look for areas to improve, as
well as to facilitate manual testing beyond what we have been able to automate.
The etherpad used for this discussion is the same one used for the weekly
meetings.
Cinder Stable Driver Fixes
This was an opportunity to pull in other areas (infra, stable) and get feedback
from packagers and deployers to continue the discussion started in the mailing
list about providing somewhere central to backport driver fixes.
This is not about changing stable policy.
The main point is that many (or most) vendors in Cinder maintain their own
forks of the cinder repo just to provide backported driver fixes that can't be
backported to the upstream repo because the branch is either in security-only
mode or has reached end of life. This makes it difficult for packagers and end
users to find needed bug fixes, with things scattered all over the place. The
desire is to have one central place for them to go to get these fixes, even if
it's no longer supported by the community.
We mostly agreed on creating new branches in the upstream repo to keep these.
We will only accept bug fixes isolated to the driver. We will not accept new
features or changes outside the driver that could impact other drivers. The
branches will be named such that it is obvious they are not supported stable
branches. There will be no third party CI requirement and probably very minimal
gate testing (py27 unit tests?) just to help catch obvious errors.
This will basically act as a file share to help make it easier for those that
need it to find patches.
Etherpad from discussion:
Stand Alone Cinder Service
We've discussed this in the past, but there is still interest in expanding
this. Especially for vendors involved in Cinder, this has some appeal. We do
get questions from users about using Cinder on its own as a way to manage
heterogeneous storage environments.
By making Cinder easier to install and configure as a stand-alone service, or
at least keeping external dependencies down to only core requirements such as
keystone, it becomes easier for users to take advantage of the capabilities
we have implemented in Cinder as a backing for things like Docker, Kubernetes,
and other systems that need storage integration. Otherwise, vendors would need
to implement yet another driver to plug in and enable their storage for these
environments.
No real action items were identified out of this session. It really was meant
more as a discussion and a means to get everyone thinking about things. The
notes from the session can be found here:
Pike (and beyond) Planning
We've decided to make Ocata mostly about bug fixes and finishing out the
outstanding features that we have been working on but have not been able to
complete.
This was mainly a brainstorming session followed by some discussion to get
ideas and start thinking about priorities as we move past Ocata. The etherpad
for the discussion can be found here:
Replication Update
Gorka walked everyone through some of the issues we have with the current
(cheesecake) implementation of replication. Most of the discussion was around
how to handle things consistently when we failover and standardize on what we
are doing to failback.
There are some things that are difficult to control when failed over. There are
operations that we do not want to allow when failed over, but since some things
don't go through the scheduler we don't have one clean place to enforce these
restrictions. A spec will be written with a proposal of how to address this.
Failback was added to some drivers even though it wasn't a capability called
out in the original replication spec. This has become somewhat of a de facto
standard as other drivers have followed suit. We discussed the current
approach and agreed to update documentation for other driver developers to
follow to make sure this is done consistently.
Other topics included handling non-replicated volume and snapshot status on
failover and what needs to be done to handle support for A/A HA.
More details can be found in the etherpad here:
Attach/Detach Refactoring
We dug into John's two proposals for refactoring our attach and detach calls
to clean up the Cinder-Nova interaction as a first necessary step before being
able to move ahead with supporting multiattach.
Both approaches clean up the code for these requests. In either case, there
will need to be work done on the Nova side to handle these calls and to clean
up some non-optimal handling there.
We hope to land changes in Cinder even if Nova can't do anything with them yet.
The hope is having them in place and available will make it easier to implement
the support on that side.
The preference is for the first proposal (v1), but there were some concerns on
the Nova side about this more simplified interface. The v2 proposal is an
attempt to address some of those concerns. After further discussion Friday with
the Nova team, I think most of the concerns with the v1 approach were better
understood and alleviated, so we will likely try to proceed with v1.
These are just PoC patches right now. More work needs to be done to add
microversioning and do some PoC changes in Nova to test out that there are no
unexpected side effects. We do hope to land these changes in Ocata.
NFS Snapshot Support
Erlon went through his findings of what works and what doesn't depending on the
many different ways you can configure NFS shares.
This has been the challenge with getting snapshot support for the NFS driver.
This driver (and its derivatives) are the only ones that do not support
snapshots, which prevents us from fully declaring snapshots a core
functionality for all Cinder backends.
There was discussion around whether we actually need to support all of the
possible config options, or just document the requirements for being able to
use them. We agreed that, given the various limitations and the differences
between implementations, documenting the requirements for being able to do
snapshots was the best option.
Backup Improvements
We discussed scale-out backup, a generic backup driver implementation, and the
state of backup testing, with most of the discussion around the proposal for a
generic backup driver. The spec can be found here:
There is general interest in this, but it was agreed the spec needs a little
more detail added so we are all clear on what is being proposed. Updates will
be made to elaborate on some specific details.
Various topics covered Friday afternoon.
- Changes to Cinder-Nova API
- HA A/A completion
- Ongoing user messaging work
- CG volume group migration
- NFS snapshots
- Test improvements
- Bug fixes
Linux Specific Commands
Agreed to be open to changes to allow isolating platform specific commands to
enable easier support for platforms like Solaris.
Generic Volume Groups
Driver support for consistency groups needs to be migrated to consistency-type
generic volume groups. Drivers need to be updated for Pike.
Devref updates needed to support using generic volume groups.
Vendor Assisted Volume Rollback
There have been multiple proposals to support rolling back volumes to a
snapshot. Part of the issue has been some of the limitations some backends
have with being able to do this, either by not having native support or the
possibility of destroying some snapshots by rolling back.
Agreed to proceed with this if a generic implementation can be supported and
we limit this to only allowing rollback to the last snapshot.
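As a rough sketch of the agreed restriction (the names and fields here are illustrative, not taken from any actual spec), the check reduces to verifying that the rollback target is the volume's most recent snapshot:

```python
# Hypothetical sketch of the agreed restriction: only allow rolling a volume
# back to its most recent snapshot. Field names are illustrative, not Cinder's.
def can_rollback(snapshots, target_id):
    """Return True only if target_id is the newest snapshot of the volume."""
    if not snapshots:
        return False
    latest = max(snapshots, key=lambda s: s["created_at"])
    return latest["id"] == target_id
```

Limiting rollback to the latest snapshot sidesteps the backend-specific problem of newer snapshots being destroyed when rolling back past them.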
zmq Driver Bug
An outstanding bug causes issues for the zmq driver. We agreed fixing this is a
bug fix and not a new feature, so we will try to fix it in Ocata.
Display provider_id and provider_location for admin
Some of this is internal data that shouldn't be exposed, even to the admin.
It was agreed we would allow showing provider_id to make it easier for admins
to do some correlation of volumes between Cinder and the backend array, but
provider_location should not be exposed.
Policies in code
Proposal to adopt the new approach of implementing all default policies in
code, with a sample policy file generated from them that is only necessary if
the admin wants to override any of the defaults.
This is still a new capability and there are some concerns that there will be
issues with upgrades that need to be thought through yet. We will hold off on
this for now.
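The mechanics of policy-in-code can be sketched roughly as follows (a minimal illustration of the concept, not oslo.policy's actual API; the policy names and rules here are made up):

```python
# Illustrative sketch of defaults-in-code: policies ship with defaults defined
# in the code, and an operator-supplied file only needs the entries it changes.
DEFAULT_POLICIES = {
    "volume:create": "rule:admin_or_owner",
    "volume:delete": "rule:admin_or_owner",
    "volume:extend": "rule:admin_or_owner",
}

def effective_policies(operator_overrides):
    """Merge operator overrides on top of the in-code defaults."""
    merged = dict(DEFAULT_POLICIES)
    merged.update(operator_overrides)
    return merged
```

The upgrade concern is the flip side of this merge: once defaults live in code, a stale full policy file from a previous release silently overrides every new default.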
Private volume types
Private volume types were introduced in Kilo. There are no tempest tests
covering this, and it now appears there are problems with the implementation.
A bug is filed here:
Needs to be investigated and fixed, ideally with at least unit tests, and
hopefully also tempest tests, to make sure it works.
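The behavior such tests would need to pin down is simple to state (field names here are hypothetical, for illustration only): a private type should only be visible to projects that have been granted access.

```python
# Hypothetical sketch of expected private-volume-type visibility: a private
# type is listed only for projects granted access. Field names are illustrative.
def visible_types(volume_types, project_id):
    return [t["name"] for t in volume_types
            if t["is_public"] or project_id in t.get("allowed_projects", [])]
```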
API to change log levels at runtime
Sometimes an operator needs to increase the log level while troubleshooting,
then drop it back down once done. Right now that requires editing the config
file and restarting services. Proposal is to expose this capability via an API
call to make it easier for the operator to do this.
This would be a usability improvement and will be accepted in Ocata if an
implementation is ready.
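Python's standard logging module already supports changing levels in a running process; the proposal is essentially about exposing that remotely. A minimal local sketch of the idea (not the actual proposed API):

```python
import logging

# Minimal sketch: raise a logger's level at runtime, then drop it back down,
# without restarting the process. The proposed API would expose this remotely.
def set_log_level(name, level):
    logging.getLogger(name).setLevel(getattr(logging, level.upper()))

log = logging.getLogger("cinder.volume")
set_log_level("cinder.volume", "debug")    # while troubleshooting
assert log.isEnabledFor(logging.DEBUG)
set_log_level("cinder.volume", "warning")  # back to normal when done
```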
Of course there were various other smaller topics and side conversations as
well. Overall very productive in my opinion.