[openstack-dev] [manila] Barcelona Design Summit summary
Ben Swartzlander
ben at swartzlander.org
Thu Nov 3 14:38:43 UTC 2016
Thanks to gouthamr for doing these writeups and for recording!
We had a great turnout at the manila Fishbowl and working sessions.
Important notes and Action Items are below:
===========================
Fishbowl 1: Race Conditions
===========================
Thursday 27th Oct / 11:00 - 11:40 / AC Hotel - Salon Barcelona - P1
Etherpad: https://etherpad.openstack.org/p/ocata-manila-race-conditions
Video: https://www.youtube.com/watch?v=__P7zQobAQw
Gist:
* We have some race conditions that have worsened over time:
* Deleting a share while snapshotting the share
* Two simultaneous delete-share calls
* Two simultaneous create-snapshot calls
* Though the end result of the race conditions is not terrible, we can
leave resources in untenable states, requiring administrative cleanup in
the worst scenario
* Any type of resource interaction must be protected in the database
with a test-and-set using the appropriate status fields
* Any test-and-set must be protected with a lock
* Locks must not be held over long-running tasks, e.g., RPC casts, driver
invocations, etc.
* We need more granular state transitions: micro/transitional states
must be added per resource and judiciously used for state locking
* Ex: Shares need a 'snapshotting' state
* Ex: Share servers need states to signify setup phases, a la nova
compute instances
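As an illustration of the test-and-set idea above, here is a minimal sketch using sqlite3 from the Python standard library. The table layout and the try_transition helper are hypothetical and are not manila's actual schema or code; the point is only that a conditional UPDATE lets exactly one caller win a status transition.

```python
import sqlite3


def try_transition(conn, share_id, expected, new):
    """Move a share from `expected` to `new` in one statement.

    Returns True only if the row was still in the expected status.
    """
    cur = conn.execute(
        "UPDATE shares SET status = ? WHERE id = ? AND status = ?",
        (new, share_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical, simplified table used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO shares VALUES ('share-1', 'available')")
conn.commit()

# A snapshot request claims the share first; a later delete request
# finds that the share is no longer 'available' and is rejected.
first = try_transition(conn, "share-1", "available", "snapshotting")
second = try_transition(conn, "share-1", "available", "deleting")
```

Here the transitional 'snapshotting' status is what blocks the competing delete, which is the role the proposed micro states are meant to play.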
Discussion Item:
* Locks in the manila-api service (or specifically, extending usage of
locks across all manila services)
* Desirable because:
* Adding test-and-set logic at the database layer may render code
unmaintainably complicated, as opposed to using locking abstractions
(oslo.concurrency / tooz)
* Cinder has evolved an elegant test-and-set solution, but we may not
be able to benefit from that implementation because it cannot do
multi-table updates and because the code references OVO
(oslo.versionedobjects), which manila doesn't yet support.
* Un-desirable because:
* Most distributors (Red Hat/SUSE/Kubernetes-based/MOS) want to run
more than one API service in active-active H/A.
* If a true distributed locking mechanism isn't used/supported, the
current file-locks would be useless in the above scenario.
* Running file locks on shared file systems is a possibility, but
imposes a configuration/set-up burden
* Having all the locks on the share service would allow scale out of
the API service and the share manager is really the place where things
are going wrong
* With a limited form of test-and-set, atomic state changes can still
be achieved for the API service.
Agreed:
* File locks will not help
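The rule that locks protect only the test-and-set, and are never held across long-running work, can be sketched as follows. This is an assumption-laden illustration: threading.Lock is an in-process stand-in for whatever oslo.concurrency or tooz lock the spec ends up choosing, and the shares dict, begin_snapshot and snapshot_share are hypothetical names. An in-process lock does not protect multiple services, which is exactly the limitation discussed above.

```python
import threading

# Stand-in for a real (ideally distributed) lock; in-process only.
_state_lock = threading.Lock()
shares = {"share-1": "available"}


def begin_snapshot(share_id):
    """Test-and-set under the lock; True if this caller won the transition."""
    with _state_lock:
        if shares[share_id] != "available":
            return False
        shares[share_id] = "snapshotting"
        return True


def snapshot_share(share_id, driver_call):
    """Claim the share, then run the slow driver call with no lock held."""
    if not begin_snapshot(share_id):
        return False
    try:
        driver_call(share_id)
    finally:
        # A real service would likely set an error status on failure;
        # this sketch always returns the share to 'available'.
        with _state_lock:
            shares[share_id] = "available"
    return True


observed = {}


def fake_driver(share_id):
    # Runs while the share is 'snapshotting' but the lock is free.
    observed["lock_held"] = _state_lock.locked()
    observed["second_caller"] = begin_snapshot(share_id)


result = snapshot_share("share-1", fake_driver)
```

The transitional status, not the lock, is what keeps the second caller out while the driver is busy.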
Action Items:
(bswartz): Will propose a spec for the locking strategy
(volunteers): Act on the spec ^ and help add more transitional states
and locks (or test-and-set if any)
(gouthamr): state transition diagrams for shares/share
instances/replicas, access rules / instance access rules
(volunteers): Review ^ and add state transition diagrams for
snapshots/snapshot instances, share servers
(mkoderer): will help with determining race conditions within
manila-share with tests
=====================================
Fishbowl 2: Data Service / Jobs Table
=====================================
Thursday 27th Oct / 11:50 - 12:30 / AC Hotel - Salon Barcelona - P1
Etherpad:
https://etherpad.openstack.org/p/ocata-manila-data-service-jobs-table
Video: https://www.youtube.com/watch?v=Sajy2Qjqbmk
Gist:
* Currently, a synchronous RPC call is made from the API to the
share-manager/data-service that's performing a migration to get the
progress of a migration
* We need a way to record progress of long running tasks: migration,
backup, data copy etc.
* We need to introduce a jobs table so that the respective service
performing the long running task can write to the database and the API
relies on the database
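A rough sketch of the jobs table idea follows, using sqlite3 so that it is self-contained. The column names and helper functions are assumptions made for illustration; the actual model is to be determined in the spec. The service doing the work writes progress, and the API answers progress queries from the table instead of making a synchronous RPC call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real jobs table model is still to be specified.
conn.execute(
    """CREATE TABLE jobs (
           id INTEGER PRIMARY KEY,
           resource_id TEXT NOT NULL,
           task TEXT NOT NULL,
           progress INTEGER NOT NULL DEFAULT 0,
           status TEXT NOT NULL DEFAULT 'running'
       )"""
)


def start_job(resource_id, task):
    """Called by the service that performs the long-running task."""
    cur = conn.execute(
        "INSERT INTO jobs (resource_id, task) VALUES (?, ?)",
        (resource_id, task),
    )
    conn.commit()
    return cur.lastrowid


def report_progress(job_id, percent):
    """Called periodically by the worker as the task advances."""
    status = "complete" if percent >= 100 else "running"
    conn.execute(
        "UPDATE jobs SET progress = ?, status = ? WHERE id = ?",
        (min(percent, 100), status, job_id),
    )
    conn.commit()


def get_progress(job_id):
    """Called by the API; reads the database, makes no RPC call."""
    row = conn.execute(
        "SELECT task, progress, status FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return {"task": row[0], "progress": row[1], "status": row[2]}


job_id = start_job("share-1", "migration")
report_progress(job_id, 40)
midway = get_progress(job_id)
```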
Discussion Items:
* There was a suggestion to extend the jobs table to all tasks on the
share: snapshotting, creating share from snapshot, extending, shrinking,
etc.
* We agreed not to do this because the table can easily go out of
control; and there isn't a solid use case to register all jobs. Maybe
asynchronous user messages is a better answer to this feature request
* "restartable" jobs would benefit from the jobs table
* service heartbeats could be used to react to services dying while
running long running jobs
* When running the data service in active-active mode, a service going
down can pass on its jobs to the other data service
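The heartbeat-based hand-off mentioned above could look roughly like this. It is a plain-Python sketch under stated assumptions: jobs are dicts with hypothetical "id", "host" and "status" keys, heartbeats map a service host to the time it was last seen, and the 60 second timeout is arbitrary. It does not address how a job would actually be resumed.

```python
def reassign_orphaned_jobs(jobs, heartbeats, now, timeout=60):
    """Hand running jobs owned by dead services to a live service.

    Returns the ids of the jobs that were moved.
    """
    alive = sorted(
        host for host, last_seen in heartbeats.items()
        if now - last_seen <= timeout
    )
    if not alive:
        return []  # nobody left to take over
    moved = []
    for job in jobs:
        if job["status"] == "running" and job["host"] not in alive:
            job["host"] = alive[0]
            moved.append(job["id"])
    return moved


jobs = [
    {"id": 1, "host": "data-1", "status": "running"},
    {"id": 2, "host": "data-2", "status": "running"},
    {"id": 3, "host": "data-1", "status": "complete"},
]
# data-1 was last seen 100 seconds ago, data-2 10 seconds ago.
heartbeats = {"data-1": 100, "data-2": 190}
moved = reassign_orphaned_jobs(jobs, heartbeats, now=200)
```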
Action Items:
(ganso): Will determine the structure of the jobs table model in his spec
(ganso): Will determine the benefit of the data service reacting to
additions in the database rather than acting upon RPC requests
====================================
Working Session 1: High Availability
====================================
Thursday 27th Oct / 14:40 - 15:20 / CCIB - Centre de Convencions
Internacional de Barcelona - P1 - Room 130
Etherpad: https://etherpad.openstack.org/p/ocata-manila-high-availability
Video: https://www.youtube.com/watch?v=xFk8ShK6qxU
Gist:
* We have a patch to introduce the tooz abstraction library to manila,
it currently creates a tooz coordinator for the manila-share service and
demonstrates replacing oslo.concurrency locks with tooz locks:
https://review.openstack.org/#/c/318336/
* The heartbeat seems to have issues and needs debugging
* The owner/committer have tested this patch with both FileDriver and
Kazoo/Zookeeper as tooz backends. We need to test other tooz backends
* Distributors do not package dependencies for all tooz backends
* We plan to introduce leader election via tooz. We plan to use this in
cleanups, designate the service that performs polling (migration,
replication of shares and snapshots, share server cleanup)
* Code needs to be written to integrate the use of tooz/dlm via the
manila devstack plugin so it can be gate tested
Action Items:
(gouthamr): Will document how to set up tooz with 2 or more share services
(bswartz): Will set up a sub group of contributors to code/test H/A
solutions in this release
===============================
Working Session 2: Access Rules
===============================
Friday 28th Oct / 11:00 - 11:40 / CCIB - Centre de Convencions
Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-high-availability
Video: https://www.youtube.com/watch?v=62EllNOZ3aw
Gist:
* We have had a number of bugs with our access rules implementation
since two important design changes were made in manila: introduction of
share instances and unifying allow_access/deny_access driver interfaces
* The most significant of the bugs is the presence of race conditions
that we have tried fixing multiple times but haven't ironed out.
* https://blueprints.launchpad.net/manila/+spec/fix-and-improve-access-rules
attempts to rectify the issues by reintroducing per-share-instance,
per-access-rule statuses, adding transitional statuses, and performing
pessimistic locking around state transitions. Database models and the API
need cleanup with respect to the way they're currently accessed.
Discussion Items:
* While applying access rules in bulk, it is useful to identify which
exact rules were not able to be applied. This was functionality that
manila had but lost during the update_access work.
* State transitions must still be protected with locks
* IPv6 is being enabled across OpenStack; manila needs to support
exporting shares with IPv6 and access control to IPv6 based clients
* https://review.openstack.org/#/c/312321/
* https://review.openstack.org/#/c/362786/
* https://review.openstack.org/#/c/328932/
* No significant requirement seems to exist for access groups
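To make the bulk-apply and IPv6 points above concrete, here is a small sketch that applies rules one by one and records a status per rule, so a failure can be traced to the exact rule. The rule dict keys, the apply_one callback and the 'active'/'error' values are assumptions for illustration, not manila's driver interface. The standard library ipaddress module accepts both IPv4 and IPv6 addresses and CIDR ranges.

```python
import ipaddress


def apply_rules(rules, apply_one):
    """Apply each rule independently; return {rule_id: 'active' or 'error'}."""
    statuses = {}
    for rule in rules:
        try:
            if rule["access_type"] == "ip":
                # Raises ValueError for anything that is not a valid
                # IPv4/IPv6 address or network.
                ipaddress.ip_network(rule["access_to"], strict=False)
            apply_one(rule)
            statuses[rule["id"]] = "active"
        except Exception:
            # Broad on purpose in this sketch: any failure marks only
            # this rule as errored and the loop carries on.
            statuses[rule["id"]] = "error"
    return statuses


rules = [
    {"id": "r1", "access_type": "ip", "access_to": "10.0.0.0/24"},
    {"id": "r2", "access_type": "ip", "access_to": "2001:db8::1"},
    {"id": "r3", "access_type": "ip", "access_to": "not-an-address"},
]
statuses = apply_rules(rules, lambda rule: None)
```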
Action Items:
(gouthamr): Add a spec for the access rules work
(ganso): Will update the devref regarding update_access being a driver
required feature.
(bswartz): Will start an ML discussion regarding IPv6 support across all
vendor drivers.
(vponomaryov): Will add scenario tests around enforcing that newly
created shares are not accessible
============================================================================
Working Session 3: Tempest Direction and ways ahead for Manila tempest tests
============================================================================
Friday 28th Oct / 11:50 - 12:30 / CCIB - Centre de Convencions
Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-tempest-direction
Video: https://www.youtube.com/watch?v=A5_6b369ACY
Gist:
* Manila was the first project to use and implement the tempest plugin
interface
* The fact that the tests are in-tree is creating issues for future
direction in tempest
* TC Proposal to split projects and tempest tests will be re-proposed
for Pike: https://review.openstack.org/#/c/369749/
Discussion Items:
* Cons of the tempest tests being in-tree
* Tempest tests can't be 'branchless'
* Developers change tests as and when new code lands, effectively no
longer testing backwards-compatibility, even where it is promised and
makes sense
* Manila needs to be installed (bringing in its bulky requirements)
even when only manila_tempest_tests is desired
* Pros of tempest tests being in-tree
* Tests and code can land at the same time or in close succession,
ensuring feature quality, i.e., ease and sanity of development and code
review
* With out-of-tree tests, bug fixes that change API behavior need
multiple changes: first skip the relevant tests in the test project, then
make the change in the project, and then re-add the tests in the test
project. That is three patches instead of the one needed today.
* Manila's share clients registration to enable discovery of the
clients: https://review.openstack.org/#/c/334596/
* Manila is still using 'unstable' imports from tempest, which force it
to pin to a specific tempest commit in its tree
* Dynamic Credentials are not part of the stable interface tempest.lib
Action Items:
(mkoderer/dmellado): Will fix dependencies on tempest within
manila_tempest_tests. We only need to use stuff from tempest.lib
(mkoderer/dmellado): Will work on getting more requirements for
manila_tempest_tests within tempest.lib
===================
Contributors Meetup
===================
Friday 28th Oct / 14:00 - 18:30 / CCIB - Centre de Convencions
Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-contributor-meetup
Video: https://www.youtube.com/watch?v=SP10HgUGOnI (See video
description for links to specific discussions)
Discussion Items:
* Exporting shares with multiple protocols:
* Many vendor drivers, including first-party ones, can export shares with
NFS and CIFS
* CEPHFS can support NFS and CEPHFS
* The API and driver interactions may be too complex to standardize
across all available vendors
* Almost all vendor driver developers in the room said they had the
capability to support more than one protocol combination: ex: NFS/CIFS
or CEPHFS/NFS (in development)
* There's an open spec: https://review.openstack.org/#/c/329392/
* We may introduce this if we can do so without breaking API backwards
compatibility, and can provide a common implementation across different
drivers in terms of management of exports and ACLs
* Vendor drivers *cannot* implement this such that users' shares
behave differently between vendors
* Discrepancy in share protocols and access types mapping: Shares
exported with NFS must only support IP based rules and not user based
rules - Our API is currently very inconsistent because of some drivers
supporting IP rules for CIFS shares or user rules for NFS shares
* Shares exported with NFS must only support IP rules
* Shares exported with CIFS/SMB must only support user rules
* Scenario tests must be run on third party CIs
* Scenario testing infrastructure is WIP for supporting drivers
besides the generic driver
* Scenario tests will be run with the API tests in the same jobs for
upstream drivers:
* New set of scenario tests are proposed:
https://review.openstack.org/#/c/374731/
* The goal is to define a broad set of scenarios that test behavior
that is expected across all backends
* Alternate snapshot semantics
* We are removing the overload on snapshot_support to mean two
things: snapshot_support (can take snapshots),
create_share_from_snapshot_support (can create a new share from snapshot
of given share)
* create_share_from_snapshot_support will be added to existing share
types (db migration) and all drivers' capabilities (via detection of
interface methods)
* snapshot_support will not be a required extra-spec anymore
* Not specifying snapshot_support when creating the share type would
mean a "don't care" behavior with respect to picking a backend, but
provides a meaningful behavior to the tenant and administrator. See
Spec: https://review.openstack.org/#/c/391049/
* Specs deadlines and process
* https://review.openstack.org/#/c/374883/ (now merged) details what
the process is for proposing features in manila
* Driver features do not require specs
* High priority specs will be added to the specs repo categorized by
release
* A merged spec requires code review attention across the community
* Generic Driver Enhancements
* a new lightweight test image is proposed:
https://review.openstack.org/#/c/392307
* This will be tested on our gate for the generic driver storage
virtual machines
* We're hoping this lightweight service image solves scalability
concerns in the gate
* Distro-specific logic in the generic driver would be consolidated
and refactored
* Experimental APIs
* Driver support for experimental features isn't catching up, because
product managers (allegedly) feel like experimental features are in too
much of a churn to devote development effort on.
* This is not the meaning of an experimental feature. Our intention
is to provide user sanity with respect to the API
* We want feedback from actual users who don't use manila from
gerrit; which is why APIs are Experimental
* Adoption across drivers for features with 'experimental' APIs can
only be driven by vendors themselves; the community can only make the
feature better and work towards generalizing the design as much as possible.
* Container Driver
* Currently supports only CIFS because Ganesha has issues with access
control
* Ganesha has now been fixed (v2.4); the container driver now needs to
support NFS via Ganesha
* Improving Ganesha in manila
* Ganesha's bug fix for access control merged recently
* rraja will improve ganesha library within manila
* VMT/ Security / Vulnerability Managed tag for manila
* We had our first security bug
* We need a security focused sub team that would help manila achieve
the vulnerability-managed tag
* Tom Barron will lead this effort
* Share Migration
* nondisruptive parameter does not default to True (along the lines
of preserve-metadata or writable)
* preserve-snapshots parameter will be added in ocata
* Changing protocols via migration
* access rules need to be cleared
* This will be more appropriate as a "share modify" operation,
rather than be allowed via the share migration API
* Spec for Ocata improvements: https://review.openstack.org/#/c/392291/
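Returning to the alternate snapshot semantics item in the list above, the "don't care" behavior can be sketched as a simple backend filter. This is only an illustration: backend_matches is a hypothetical helper, not the manila scheduler, and it assumes that an extra-spec which is present must equal the backend's reported capability while an absent one places no constraint.

```python
SNAPSHOT_SPECS = ("snapshot_support", "create_share_from_snapshot_support")


def backend_matches(extra_specs, capabilities):
    """True if the backend satisfies the share type's snapshot extra-specs."""
    for key in SNAPSHOT_SPECS:
        if key not in extra_specs:
            continue  # unspecified means "don't care"
        # Assumption: a backend that does not report the capability
        # is treated as not supporting it.
        if extra_specs[key] != capabilities.get(key, False):
            return False
    return True


no_preference = backend_matches({}, {"snapshot_support": True})
wants_snapshots = backend_matches(
    {"snapshot_support": True}, {"snapshot_support": False}
)
snapshots_only = backend_matches(
    {"snapshot_support": True, "create_share_from_snapshot_support": False},
    {"snapshot_support": True, "create_share_from_snapshot_support": False},
)
```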
Action Items:
(cknight/bswartz): Start an ML discussion regarding the discrepancy in
share protocols and access types mapping. Make changes to the allow
access API to disallow the discrepancies.
(volunteer/s): Document which common capabilities can be reported as
lists and which cannot.
(bswartz): Send ML post about deadlines for Ocata
(markstur): Determine if IBM/GPFS driver can make use of the improved
ganesha library within manila
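The protocol and access type mapping that the first action item refers to could be enforced with a check along these lines. The table and the validate_access helper are hypothetical; only the NFS/IP and CIFS/user pairings come from the discussion, and protocols that were not discussed are left unrestricted in this sketch.

```python
# Pairings taken from the discussion: NFS shares take IP rules,
# CIFS shares take user rules. Everything else is an assumption.
ALLOWED_ACCESS_TYPES = {
    "NFS": {"ip"},
    "CIFS": {"user"},
}


def validate_access(share_proto, access_type):
    """True if the access type is acceptable for the share protocol."""
    allowed = ALLOWED_ACCESS_TYPES.get(share_proto.upper())
    if allowed is None:
        return True  # protocol not covered by this sketch
    return access_type in allowed


nfs_ip = validate_access("NFS", "ip")
nfs_user = validate_access("nfs", "user")
cifs_user = validate_access("CIFS", "user")
cifs_ip = validate_access("CIFS", "ip")
```

A check like this in the allow access API would reject the inconsistent combinations up front rather than leaving the decision to each driver.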
Discussion items we didn't get to:
* ensure share
* manage API requirements
* manila-ui displaying only enabled share protocols
* app-catalog use case
* share replica quotas