[openstack-dev] [manila] Barcelona Design Summit summary

Ben Swartzlander ben at swartzlander.org
Thu Nov 3 14:38:43 UTC 2016

Thanks to gouthamr for doing these writeups and for recording!

We had a great turnout at the manila Fishbowl and working sessions. 
Important notes and Action Items are below:

Fishbowl 1: Race Conditions
Thursday 27th Oct / 11:00 - 11:40 / AC Hotel - Salon Barcelona - P1
Etherpad: https://etherpad.openstack.org/p/ocata-manila-race-conditions
Video: https://www.youtube.com/watch?v=__P7zQobAQw

* We have some race conditions that have worsened over time:
   * Deleting a share while snapshotting the share
   * Two simultaneous delete-share calls
   * Two simultaneous create-snapshot calls
* Though the end result of the race conditions is not terrible, we can 
leave resources in untenable states, requiring administrative cleanup in 
the worst scenario
* Any type of resource interaction must be protected in the database 
with a test-and-set using the appropriate status fields
* Any test-and-set must be protected with a lock
* Locks must not be held over long-running tasks, e.g., RPC casts, driver 
invocations, etc.
* We need more granular state transitions: micro/transitional states 
must be added per resource and judiciously used for state locking
* Ex: Shares need a 'snapshotting' state
* Ex: Share servers need states to signify setup phases, a la nova 
compute instances

Discussion Item:
* Locks in the manila-api service (or specifically, extending usage of 
locks across all manila services)
* Desirable because:
   * Adding test-and-set logic at the database layer may render code 
unmaintainably complicated as opposed to using locking abstractions 
(oslo.concurrency / tooz)
   * Cinder has evolved an elegant test-and-set solution but we may not 
be able to benefit from that implementation because it cannot do 
multi-table updates and because the code references OVO, which 
manila doesn't yet support.
* Un-desirable because:
   * Most distributors (Red Hat/SUSE/Kubernetes-based/MOS) want to run 
more than one API service in active-active H/A.
   * If a true distributed locking mechanism isn't used/supported, the 
current file-locks would be useless in the above scenario.
   * Running file locks on shared file systems is a possibility, but 
imposes a configuration/set-up burden
   * Having all the locks on the share service would allow scale out of 
the API service and the share manager is really the place where things 
are going wrong
   * With a limited form of test-and-set, atomic state changes can still 
be achieved for the API service.

* File locks will not help
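
As a rough sketch of "locks must not be held over long running tasks", the snippet below takes a lock only around each state transition and releases it before the slow work starts. Everything here is hypothetical: `threading.Lock` stands in for whatever oslo.concurrency or tooz lock ends up being used, and the function names and the in-memory `shares` dict are made up for illustration.

```python
import threading

_state_lock = threading.Lock()  # stand-in for an oslo.concurrency/tooz lock
shares = {"share-1": "available"}  # hypothetical in-memory "database"
events = []


def delete_share(share_id):
    # Test-and-set under the lock: only an 'available' share may be deleted.
    with _state_lock:
        if shares[share_id] != "available":
            return False
        shares[share_id] = "deleting"
    return True


def slow_driver_snapshot(share_id):
    # Hypothetical long-running driver call. The lock is free here, but the
    # transitional status still rejects a competing delete.
    events.append(("lock_held", _state_lock.locked()))
    events.append(("delete_allowed", delete_share(share_id)))


def create_snapshot(share_id):
    with _state_lock:
        if shares[share_id] != "available":
            return False
        shares[share_id] = "snapshotting"
    # Lock released before the long-running work begins.
    slow_driver_snapshot(share_id)
    with _state_lock:
        shares[share_id] = "available"
    return True


ok = create_snapshot("share-1")
```

The transitional 'snapshotting' status, not the lock, is what protects the share while the driver is busy.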

Action Items:
(bswartz): Will propose a spec for the locking strategy
(volunteers): Act on the spec ^ and help add more transitional states 
and locks (or test-and-set if any)
(gouthamr): state transition diagrams for shares/share 
instances/replicas, access rules / instance access rules
(volunteers): Review ^ and add state transition diagrams for 
snapshots/snapshot instances, share servers
(mkoderer): will help with determining race conditions within 
manila-share with tests

Fishbowl 2: Data Service / Jobs Table
Thursday 27th Oct / 11:50 - 12:30 / AC Hotel - Salon Barcelona - P1
Video: https://www.youtube.com/watch?v=Sajy2Qjqbmk

* Currently, a synchronous RPC call is made from the API to the 
share-manager/data-service that's performing a migration to get the 
progress of a migration
* We need a way to record progress of long running tasks: migration, 
backup, data copy etc.
* We need to introduce a jobs table so that the respective service 
performing the long running task can write its progress to the database 
and the API can rely on the database instead of an RPC call
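
A minimal sketch of the jobs table idea follows. The actual table structure is left to ganso's spec; the column names and helper functions below are assumptions made purely for illustration, with sqlite3 standing in for the manila database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real model is to be defined in the spec.
conn.execute(
    "CREATE TABLE jobs ("
    " id TEXT PRIMARY KEY, resource_id TEXT, task TEXT,"
    " host TEXT, progress INTEGER, state TEXT)"
)


def start_job(job_id, resource_id, task, host):
    # Called by the service that performs the long-running task.
    conn.execute(
        "INSERT INTO jobs VALUES (?, ?, ?, ?, 0, 'running')",
        (job_id, resource_id, task, host),
    )
    conn.commit()


def report_progress(job_id, percent):
    # The worker records progress; no RPC to the API is involved.
    state = "completed" if percent >= 100 else "running"
    conn.execute(
        "UPDATE jobs SET progress = ?, state = ? WHERE id = ?",
        (min(percent, 100), state, job_id),
    )
    conn.commit()


def get_progress(job_id):
    # The API only reads the database to answer a progress query.
    return conn.execute(
        "SELECT progress, state FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()


start_job("job-1", "share-1", "migration", "data-1")
report_progress("job-1", 40)
mid_way = get_progress("job-1")
report_progress("job-1", 100)
done = get_progress("job-1")
```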

Discussion Items:
* There was a suggestion to extend the jobs table to all tasks on the 
share: snapshotting, creating share from snapshot, extending, shrinking, 
etc.
* We agreed not to do this because the table can easily grow out of 
control, and there isn't a solid use case to register all jobs. Maybe 
asynchronous user messages are a better answer to this feature request
* "restartable" jobs would benefit from the jobs table
* service heartbeats could be used to react to services dying while 
running long running jobs
* When running the data service in active-active mode, a service going 
down can pass on its jobs to the other data service
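
The heartbeat and takeover points above could look roughly like the sketch below. Nothing here was agreed in the session: the timeout value, the job dictionaries and the round-robin reassignment are all assumptions chosen to keep the example small.

```python
HEARTBEAT_TIMEOUT = 60  # seconds; arbitrary value for illustration


def reassign_orphaned_jobs(jobs, heartbeats, now, timeout=HEARTBEAT_TIMEOUT):
    """Hand running jobs of dead services to services that are still alive.

    `jobs` is a list of dicts with 'id', 'host' and 'state' keys, and
    `heartbeats` maps a service host to its last heartbeat timestamp.
    Returns the ids of the jobs that were moved.
    """
    alive = sorted(h for h, seen in heartbeats.items() if now - seen <= timeout)
    moved = []
    if not alive:
        return moved  # nobody left to take over
    for job in jobs:
        if job["state"] == "running" and job["host"] not in alive:
            # Spread orphaned jobs over the surviving services.
            job["host"] = alive[len(moved) % len(alive)]
            moved.append(job["id"])
    return moved


heartbeats = {"data-1": 100, "data-2": 190}
jobs = [
    {"id": "job-1", "host": "data-1", "state": "running"},
    {"id": "job-2", "host": "data-2", "state": "running"},
    {"id": "job-3", "host": "data-1", "state": "completed"},
]
moved = reassign_orphaned_jobs(jobs, heartbeats, now=200)
```

Only "restartable" jobs could safely be picked up this way; the sketch does not model that distinction.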

Action Items:
(ganso): Will determine the structure of the jobs table model in his spec
(ganso): Will determine the benefit of the data service reacting to 
additions in the database rather than acting upon RPC requests

Working Session 1: High Availability
Thursday 27th Oct / 14:40 - 15:20 / CCIB - Centre de Convencions 
Internacional de Barcelona - P1 - Room 130
Etherpad: https://etherpad.openstack.org/p/ocata-manila-high-availability
Video: https://www.youtube.com/watch?v=xFk8ShK6qxU

* We have a patch to introduce the tooz abstraction library to manila; 
it currently creates a tooz coordinator for the manila-share service and 
demonstrates replacing oslo concurrency locks with tooz locks: 
* The heartbeat seems to have issues, needs debugging
* The owner/committer have tested this patch with both FileDriver and 
Kazoo/Zookeeper as tooz backends. We need to test other tooz backends
* Distributors do not package dependencies for all tooz backends
* We plan to introduce leader election via tooz. We plan to use this for 
cleanups and to designate the service that performs polling (migration, 
replication of shares and snapshots, share server cleanup)
* Code needs to be written to integrate the use of tooz/dlm via the 
manila devstack plugin so it can be gate tested
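
To make the leader election idea concrete without depending on any tooz backend, here is a stdlib-only sketch of a lease: whichever service holds an unexpired lease for a role does the polling, and another service can take over once the lease lapses. This is not how tooz implements election and not what the patch does; the `leases` table, the `try_lead` helper and the TTL are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lease table: one row per role (e.g. the polling service).
conn.execute(
    "CREATE TABLE leases (role TEXT PRIMARY KEY, holder TEXT, expires_at REAL)"
)


def try_lead(role, me, now, ttl=30.0):
    """Return True if `me` holds the lease for `role` after this call."""
    # Renew our own lease, or take over one that has expired.
    cur = conn.execute(
        "UPDATE leases SET holder = ?, expires_at = ? "
        "WHERE role = ? AND (holder = ? OR expires_at <= ?)",
        (me, now + ttl, role, me, now),
    )
    if cur.rowcount == 0:
        # Either nobody has ever held the role, or someone else holds a
        # live lease; the insert only succeeds in the first case.
        cur = conn.execute(
            "INSERT OR IGNORE INTO leases VALUES (?, ?, ?)",
            (role, me, now + ttl),
        )
    conn.commit()
    return cur.rowcount == 1


results = [
    try_lead("poller", "share-a", now=0),   # first claim
    try_lead("poller", "share-b", now=10),  # lease still held by share-a
    try_lead("poller", "share-a", now=20),  # renewal by the leader
    try_lead("poller", "share-b", now=60),  # lease expired at 50, takeover
    try_lead("poller", "share-a", now=61),  # share-b is now the leader
]
```

With tooz, the coordinator heartbeat plays the role of the lease renewal here.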

Action Items:
(gouthamr): Will document how to set up tooz with 2 or more share services
(bswartz): Will set up a sub group of contributors to code/test H/A 
solutions in this release

Working Session 2: Access Rules
Friday 28th Oct / 11:00 - 11:40 / CCIB - Centre de Convencions 
Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-high-availability
Video: https://www.youtube.com/watch?v=62EllNOZ3aw

* We have had a number of bugs with our access rules implementation 
since two important design changes were made in manila: introduction of 
share instances and unifying allow_access/deny_access driver interfaces
* The most significant of the bugs is the presence of race conditions 
that we have tried fixing multiple times but haven't ironed out. A 
proposed fix attempts to rectify the issues by reintroducing per share 
instance, per access rule statuses, adding transitional statuses, and 
performing pessimistic locking around state transitions. The database 
models and API need cleanup with respect to the way they're currently 
accessed.

Discussion Items:
* While applying access rules in bulk, it is useful to identify which 
exact rules were not able to be applied. This was functionality that 
manila had but lost during the update_access work.
* State transitions must still be protected with locks
* IPv6 is being enabled across OpenStack; manila needs to support 
exporting shares with IPv6 and access control to IPv6 based clients
   * https://review.openstack.org/#/c/312321/
   * https://review.openstack.org/#/c/362786/
   * https://review.openstack.org/#/c/328932/
* No significant requirement seems to exist for access groups
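
Two of the points above, identifying exactly which rules failed in a bulk update and accepting IPv6 clients, can be sketched together. This is only an illustration using Python's standard `ipaddress` module; `check_ip_rules` is a hypothetical helper, not part of manila or of any driver interface.

```python
import ipaddress


def check_ip_rules(rules):
    """Split IP access rules into (applied, failed) instead of failing as a batch.

    Each entry in `applied` is (rule, ip_version); each entry in `failed`
    is (rule, reason), so the caller can report per-rule status.
    """
    applied, failed = [], []
    for rule in rules:
        try:
            # Accepts IPv4/IPv6 addresses and CIDR networks alike.
            net = ipaddress.ip_network(rule, strict=False)
        except ValueError as exc:
            failed.append((rule, str(exc)))
        else:
            applied.append((rule, net.version))
    return applied, failed


applied, failed = check_ip_rules(
    ["10.0.0.0/24", "2001:db8::/64", "fe80::1", "not-an-ip"]
)
```

A real implementation would record a status per share instance and per access rule rather than return two lists.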

Action Items:
(gouthamr): Add a spec for the access rules work
(ganso): Will update the devref regarding update_access being a driver 
required feature.
(bswartz): Will start an ML discussion regarding IPv6 support across all 
vendor drivers.
(vponomaryov): Will add scenario tests around enforcing that newly 
created shares are not accessible

Working Session 3: Tempest Direction and ways ahead for Manila tempest tests
Friday 28th Oct / 11:50 - 12:30 / CCIB - Centre de Convencions 
Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-tempest-direction
Video: https://www.youtube.com/watch?v=A5_6b369ACY

* Manila was the first project to use and implement the tempest plugin 
* The fact that the tests are in-tree is creating issues for future 
direction in tempest
* TC Proposal to split projects and tempest tests will be re-proposed 
for Pike: https://review.openstack.org/#/c/369749/

Discussion Items:
* Cons of the tempest tests being in-tree
   * Tempest tests can't be 'branchless'
   * Developers change tests as and when new code lands, effectively no 
longer testing backwards-compatibility, even where it is promised and 
makes sense
   * Manila needs to be installed (bringing in its bulky requirements) 
even when only manila_tempest_tests is desired
* Pros of tempest tests being in-tree
   * Tests and code can land at the same time or in close succession, 
ensuring feature quality, i.e., ease and sanity of development and code 
   * If the tests lived out of tree, bug-fixes that change API behavior 
would need multiple changes: first skip the relevant tests in the test 
project, then make the change in the project, and then re-add the tests 
in the test project. That is three patches instead of the one needed today.
* Manila's share clients registration to enable discovery of the 
clients: https://review.openstack.org/#/c/334596/
* Manila is still using 'unstable' imports from tempest, which forces it 
to stay pegged to the tempest commit in its tree
* Dynamic Credentials are not part of the stable interface tempest.lib

Action Items:
(mkoderer/dmellado): Will fix dependencies on tempest within 
manila_tempest_tests. We only need to use stuff from tempest.lib
(mkoderer/dmellado): Will work on getting more requirements for 
manila_tempest_tests within tempest.lib

Contributors Meetup
Friday 28th Oct / 14:00 - 18:30 / CCIB - Centre de Convencions 
Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-contributor-meetup
Video: https://www.youtube.com/watch?v=SP10HgUGOnI   (See video 
description for links to specific discussions)

Discussion Items:
* Exporting shares with multiple protocols:
   * Many vendor drivers, including first party can export shares with 
   * CEPHFS can support NFS and CEPHFS
   * The API and driver interactions may be too complex to standardize 
across all available vendors
   * Almost all vendor driver developers in the room said they had the 
capability to support more than one protocol combination: ex: NFS/CIFS 
or CEPHFS/NFS (in development)
   * There's an open spec: https://review.openstack.org/#/c/329392/
   * We may introduce this if we can do so without breaking API 
backwards compatibility, and can provide a common implementation 
across different drivers in terms of management of exports and ACLs
   * Vendor drivers *cannot* implement this such that users' shares 
behave differently between vendors
* Discrepancy in share protocols and access types mapping: Shares 
exported with NFS must only support IP based rules and not user based 
rules - Our API is currently very inconsistent because of some drivers 
supporting IP rules for CIFS shares or user rules for NFS shares
   * Shares exported with NFS must only support IP rules
   * Shares exported with CIFS/SMB must only support user rules
* Scenario tests must be run on third party CIs
   * Scenario testing infrastructure is WIP for supporting drivers 
besides the generic driver
   * Scenario tests will be run with the API tests in the same jobs for 
upstream drivers:
   * New set of scenario tests are proposed: 
   * The goal is to define a broad set of scenarios that test behavior 
that is expected across all backends
* Alternate snapshot semantics
   * We are removing the overload on snapshot_support to mean two 
things: snapshot_support (can take snapshots), 
create_share_from_snapshot_support (can create a new share from snapshot 
of given share)
   * create_share_from_snapshot_support will be added to existing share 
types (db migration) and all drivers' capabilities (via detection of 
interface methods)
   * snapshot_support will not be a required extra-spec anymore
   * Not specifying snapshot_support when creating the share type would 
mean a "don't care" behavior with respect to picking a backend, but 
provides a meaningful behavior to the tenant and administrator. See 
Spec: https://review.openstack.org/#/c/391049/
* Specs deadlines and process
   * https://review.openstack.org/#/c/374883/ (now merged) details what 
the process is for proposing features in manila
   * Driver features do not require specs
   * High priority specs will be added to the specs repo categorized by 
   * A merged spec requires code review attention across the community
* Generic Driver Enhancements
   * A new lightweight test image is proposed: 
   * This will be tested on our gate for the generic driver storage 
virtual machines
   * We're hoping this lightweight service image solves scalability 
concerns in the gate
   * Distro-specific logic in the generic driver would be consolidated 
and refactored
* Experimental APIs
   * Driver support for experimental features isn't catching up, because 
product managers (allegedly) feel that experimental features churn too 
much to devote development effort to.
   * This is not the meaning of an experimental feature. Our intention 
is to provide user sanity with respect to the API
   * We want feedback from actual users who don't use manila from 
gerrit; which is why APIs are Experimental
   * Adoption across drivers for features with 'experimental' APIs can 
only be driven by vendors themselves; the community can only make the 
feature better and work towards generalizing the design as much as possible.
* Container Driver
   * Currently supports only CIFS because Ganesha has issues with access 
   * Ganesha has now been fixed (v2.4); the container driver now needs to 
support NFS via Ganesha
* Improving Ganesha in manila
   * Ganesha's bug fix for access control merged recently
   * rraja will improve ganesha library within manila
* VMT/ Security / Vulnerability Managed tag for manila
   * We had our first security bug
   * We need a security focused sub team that would help manila achieve 
the vulnerability-managed tag
   * Tom Barron will lead this effort
* Share Migration
   * nondisruptive parameter does not default to True (along the lines 
of preserve-metadata or writable)
   * preserve-snapshots parameter will be added in ocata
   * Changing protocols via migration
     * access rules need to be cleared
     * This will be more appropriate as a "share modify" operation, 
rather than be allowed via the share migration API
   * Spec for Ocata improvements: https://review.openstack.org/#/c/392291/
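
Going back to the protocol and access type discrepancy discussed above (NFS shares accept only IP rules, CIFS/SMB shares accept only user rules), a check in the allow access API could be as small as the sketch below. The mapping and the `access_type_allowed` helper are hypothetical; only the two protocols covered in the discussion are listed, and anything else is reported as undecided rather than guessed.

```python
# Mapping as stated in the discussion; other protocols were not covered.
ALLOWED_ACCESS_TYPES = {
    "NFS": {"ip"},
    "CIFS": {"user"},
}


def access_type_allowed(share_proto, access_type):
    """Return True/False for NFS and CIFS, or None if the protocol is not covered."""
    allowed = ALLOWED_ACCESS_TYPES.get(share_proto.upper())
    if allowed is None:
        return None  # no rule was agreed for this protocol
    return access_type.lower() in allowed


checks = [
    access_type_allowed("NFS", "ip"),
    access_type_allowed("nfs", "user"),
    access_type_allowed("CIFS", "user"),
    access_type_allowed("CIFS", "ip"),
    access_type_allowed("CEPHFS", "cephx"),
]
```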

Action Items:
(cknight/bswartz): Start an ML discussion regarding the discrepancy in 
share protocols and access types mapping. Make changes to the allow 
access API to disallow the discrepancies.
(volunteer/s): Document which common capabilities support being reported 
as lists and which do not.
(bswartz): Send ML post about deadlines for Ocata
(markstur): Determine if IBM/GPFS driver can make use of the improved 
ganesha library within manila

Discussion items we didn't get to:
* ensure share
* manage API requirements
* manila-ui displaying only enabled share protocols
* app-catalog use case
* share replica quotas
