[openstack-dev] [Manila] Design summit notes

Ben Swartzlander ben at swartzlander.org
Tue Nov 10 21:25:55 UTC 2015


I wasn't going to write a wrap-up email this time around since so many 
people were able to attend in person, but enough people missed it that I 
changed my mind and decided to write down my own impressions of the 
sessions.

Wednesday working session: Migration Improvements
-------------------------------------------------
During this session we covered the status of the migration feature so 
far (it's merged but experimental) and the existing gaps:
1) Will fail for shares on most backends with share servers
2) Controller node in the data path -- needs data copy service
3) No implementations of optimized migration yet
4) Confusion around task_state vs state
5) Need to disable most existing operations during a migration
6) Possibly need to change driver interface for getting mount info

Basically there is a lot of work left to be done on migration but we're 
happy with the direction it's going. If we can address the gaps we could 
make the APIs supported in Mitaka. We're eager to get to building the 
valuable APIs on top of migration, but we can't do that until migration 
itself is solid.

I also suggested that migration might benefit from an API change to
support a 2-phase migration, which would let the user (the admin in
this case) control when the final cutover happens instead of letting
it happen by surprise.
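
As a rough illustration of the 2-phase idea (the names below are
hypothetical, not an agreed-on Manila interface), the first call would
start copying data while the share stays available, and the cutover
would only happen when the admin explicitly asks for it:

    class TwoPhaseMigrationAPI(object):
        """Hypothetical sketch of a 2-phase migration flow."""

        def migration_start(self, context, share, dest_host):
            # Phase 1: begin copying data to dest_host in the
            # background; the share stays exported and writable.
            raise NotImplementedError()

        def migration_complete(self, context, share):
            # Phase 2: final sync and cutover of exports to the
            # destination, run only when the admin requests it.
            raise NotImplementedError()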

Wednesday working session: Access Allow/Deny Driver Interface
-------------------------------------------------------------
During this session I proposed a new driver interface for 
allowing/denying access to shares which is a single "sync access" API 
that the manager would call and pass all of the rules to. The main 
benefits of the change would be:
1) More reliable cleanup of errors
2) Support for atomically updating multiple rules
3) Simpler/more efficient implementation on some backends

Most vendors agreed that the new interface would be superior, and
generally speaking said that it would be simpler and more efficient
than the existing one.

There were some who were unsure and one who specifically said an access 
sync would be inefficient compared to the current allow/deny semantics. 
We need to see if we can provide enough information in the new interface 
to allow them to be more efficient (such as providing the new rules AND 
the diff against the old rules).
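
A minimal sketch of what such a "sync access" driver call might look
like, assuming we pass both the full rule set and the diff (the method
and parameter names are illustrative, not a settled interface):

    class ShareDriverSketch(object):
        """Hypothetical "sync access" interface from the session."""

        def update_access(self, context, share, access_rules,
                          add_rules=None, delete_rules=None):
            # access_rules: the complete list of rules that should be
            # in effect after this call, letting the driver resync and
            # clean up leftover state from earlier errors.
            # add_rules/delete_rules: the diff against the previous
            # state, for backends where applying only the changes is
            # cheaper than a full resync.
            raise NotImplementedError()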

It was also pointed out that error reporting would be different using 
this new interface, because errors applying rules couldn't be associated 
with the specific rule that caused them. We need a solution to that problem.

Thursday fishbowl session: Share Replication
--------------------------------------------
There was a demo of the POC code from NetApp and general education on 
the new design. Since the idea was new to some and the implementation 
was new to everyone, there was not a lot of feedback.

We did discuss a few issues, such as whether multiple replicas of a
share should be allowed in a single AZ.

We agreed that this replication feature will be exposed to the same kind 
of race conditions that exist in active/active HA, so there is 
additional pressure to solve the distributed locking problem. 
Fortunately the community seems to be converging on a solution to that 
problem -- the tooz library.
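
For reference, the kind of coordination tooz provides looks roughly
like this (the ZooKeeper URL is only an example; tooz supports several
backends):

    from tooz import coordination

    # Each manila-share service registers itself as a member.
    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'manila-share-host1')
    coordinator.start()

    # Take a distributed lock around operations that must not race
    # between services, e.g. promoting a replica of the same share.
    lock = coordinator.get_lock(b'share-lock-example')
    with lock:
        pass  # critical section goes here

    coordinator.stop()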

We agreed that support for replication in a first party driver is 
essential for the feature to be accepted -- otherwise developers who 
don't have proprietary storage systems would be unable to develop/test 
on the feature.

Thursday fishbowl session: Alternative Snapshot Semantics
---------------------------------------------------------
During this session I proposed 2 new things you can do with snapshots:
1) Revert a share to a snapshot
2) Export snapshots directly as readonly shares

For reverting snapshots, we agreed that the operation should preserve 
all existing snapshots. If a backend is unable to revert without 
deleting snapshots, it should not advertise the capability.
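
A minimal sketch of the capability-advertisement idea, assuming a
stats entry along these lines (the capability name is illustrative and
not standardized yet):

    # Reported by the driver alongside its other stats; the API and
    # scheduler would only allow revert on backends advertising it.
    stats = {
        'snapshot_support': True,
        # Only True if the backend can revert without deleting any
        # snapshot taken after the one being reverted to.
        'revert_to_snapshot_support': True,
    }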

For mounting snapshots, it was pointed out that we need to define the 
access rules for the share. I proposed simply inheriting the rules for 
the parent share with rw rules squashed to ro. That approach has
downsides though, because it ties the access on the snapshot to the
access on the parent share (which may not be desired) and also forces
us to pass a list of snapshots into the access calls so the driver can
update snapshot access when updating share access.

Sage proposed creating a new concept of a readonly share and simply
overloading the existing create-share-from-snapshot logic with a
readonly flag, which gives us the semantics we want without much
complexity. The downside to that approach is that we will have to add
code to handle readonly shares.

There was an additional proposal to allow the create/delete snapshot 
calls without any other snapshot-related calls because some backends 
have in-band snapshot semantics. This is probably unnecessary because 
every backend that has snapshots is likely to support at least one of 
the proposed semantics so we don't need another mode.

Thursday working session: Export Location Metadata
--------------------------------------------------
In this session we discussed the idea Jason proposed back in the winter 
midcycle meetup to allow drivers to tag export locations with various 
types of metadata which is meaningful to clients of Manila. There were 
several proposed use cases discussed.

The main thing we agreed to was that the metadata itself seems like a 
good thing as long as the metadata keys and values are standardized. We 
didn't like the possibility of vendor-defined or admin-defined metadata 
appearing in the export location API.

One use case discussed was the idea of preferred/non-preferred export
locations. This use case makes sense and nobody was against it.
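
To make the preferred/non-preferred case concrete, an export location
entry with standardized metadata might look something like this (a
sketch only; no key names have been decided):

    export_locations = [
        {'path': '10.0.0.10:/shares/share-1234',
         'metadata': {'preferred': True}},
        {'path': '10.0.0.11:/shares/share-1234',
         'metadata': {'preferred': False}},
    ]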

Another use case discussed was "readonly" export locations which might 
allow certain drivers to use different export locations for writing and 
reading. There was debate about how much sense this made.

The third discussed use case was Jason's original suggestion: locality 
information of different export locations to enable clients to determine 
which export location is closer to them. We were generally opposed to 
this idea because it's not clear how locality information would be 
expressed. We didn't like admin-defined values, and the only standard 
thing we have is AZs, which are already exposed through a different 
mechanism.

Some time was spent in this session discussing the forthcoming Ceph 
driver and its special requirements. We discussed the possibility of
dual-protocol access to shares (Ceph+NFS in this case). Dual protocol 
access was previously rejected (NFS+CIFS) due to concerns about 
interoperability. We still need to decide if we want to allow Ceph+NFS 
as a special case based on the idea that Ceph shares would always 
support NFS and there's unlikely to ever be a second Ceph driver. If we 
allow this, then the share protocol would make sense to expose as a 
metadata value.

Lastly we discussed the Ceph key-sharing requirement where people want
to use Manila to discover the Ceph secret. That would require adding 
some new metadata, but on the access rules, not on the export locations.

Thursday working session: Interactions Between New Features
-----------------------------------------------------------
In this session we considered the possible interactions between share 
migration, consistency groups, and share replication (the 3 new 
experimental features).

We quickly concluded that in the current code, bad things are likely to 
happen if you use 2 of these features at the same time, so as a top 
priority we must prevent that and return a sensible error message 
instead of allowing undefined behavior.
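
The guard we have in mind is simple; roughly speaking (a sketch, not
actual Manila code, and the error type is a placeholder):

    class ShareBusyError(Exception):
        """Placeholder for whatever sensible error we settle on."""

    def create_replica(context, share):
        # Refuse to start a second experimental operation (here,
        # replication) while the share is mid-migration, instead of
        # letting the two features race into undefined behavior.
        if share.get('task_state') is not None:
            raise ShareBusyError(
                'Share %s is being migrated; retry when the migration '
                'completes.' % share['id'])
        # ... proceed with normal replica creation ...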

We spent the rest of the session discussing how the features should
interact and concluded that enabling the behaviors we want requires
significant new code for all of the pairings.

Migration+CGs: requires the concept of CG instance or some way of 
tracking which share instances make up the original CG and the migrated 
CG. Alternatively, requires the ability to disband and reconstruct CGs. 
A blueprint with an actual design is needed.

Migration+Replication: can probably be implemented by simply migrating 
the primary (active) replica to the destination and re-replicating from 
there. This requires significant new code in the migration paths though 
because they'll need to rebuild the replication topology on the 
destination side. Also, for safety the migration should not complete 
until the destination side is fully replicated to avoid the chance of a 
failure immediately after migration causing a loss of access. There may 
be opportunities for optimized versions of the above, especially when 
cross-AZ bandwidth is limited and the migration is within an AZ. More
thought is needed, and a blueprint should spell out the design.

Replication+CGs: it doesn't make a lot of sense to replicate individual 
shares from a CG -- more likely users will want to replicate the whole 
CG. This is an assumption though and we have no supporting data. Either 
way, replication at the granularity of a CG would require more logic to 
schedule the replica CGs before scheduling the share replicas. This is
likely to be significant new code and need a blueprint.

Replication+CGs+Migration: this was proposed as a joke, but it's a 
serious concern. The above designs should consider what happens if we 
have a replicated CG and we wish to migrate it. If the above designs are 
done carefully we should get correct behavior here for free.

Friday contributor meetup
-------------------------
On Friday we quickly reviewed the above topics for those that missed 
earlier sessions, then we launched into the laundry list of topics that 
we weren't able to fit into design sessions/working sessions.

QoS: Zhongjun proposed a QoS implementation similar to Cinder's. After a 
brief discussion, there was no agreement that Cinder's model was a good
one to follow, since QoS was introduced to Cinder before later
enhancements such as standardized extra specs. We're inclined to use
standardized extra specs instead of QoS policies in Manila. We still 
need to agree on which QoS-related extra specs we should standardize. 
There are 2 criteria for standardizing on QoS-related extra specs:
(1) they should mean the same thing across vendors, e.g.
max-bytes-per-second, and (2) they should be something that's widely
implemented. I expect we'll see lots of vendor-specific QoS-related
extra specs, and we need to make sure it's possible to mix vendors in
the same share type and assign a QoS policy that's equivalent across
all of them.
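
As an illustration of the extra-specs direction (the spec names here
are made up; agreeing on the actual standardized names is the open
item), a share type might carry something like:

    # Standardized, vendor-neutral QoS-related extra specs.
    qos_extra_specs = {
        'qos:max_bytes_per_second': '104857600',  # 100 MiB/s cap
        'qos:max_iops': '1000',
    }
    # Vendors can still add vendor-specific specs, but mixing vendors
    # in one share type only works for the standardized ones.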

Minimum required features: The main open question about minimum features 
was access control types for each protocol. We agreed that for CIFS, 
user-based access control is required, and IP-based access control is 
not. Furthermore, support for read-only access rules is required.

Improving gate performance/stability: We briefly discussed the gate 
issues that plagued us during Liberty and our plan to address them, 
which is to add more first party drivers that don't depend on 
quickly-evolving projects. The big offender during Liberty was Neutron, 
although Nova and Cinder have both bitten us in the past. To be clear, 
the existing Generic driver is not going away, and will still be QA'd, 
but we would rather not make it the gating driver.

Manila HA: We briefly discussed tooz and agreed that we will use it in 
Manila to address race conditions that exist in active-active HA 
scenarios, as well as race conditions introduced by the replication code.

Multiple share servers: There was some confusion about what a share 
server is. Some assumed that it would require multiple Manila "share 
servers" to implement a highly available shared storage system. In 
reality, the "share server" in Manila is a collection of resources which 
can include an arbitrary number of underlying physical or virtual 
resources -- enough to implement a highly available solution with a 
single share server.

Rolling Upgrades: We punted on this again. There was a discussion about 
how slow the progress in Cinder appears to be and we don't want to get 
trapped in limbo for multiple cycles. However if the work will 
unavoidably take multiple cycles then we need to know that so we can 
plan accordingly. Support for rolling upgrades is still viewed as 
desirable, but the team is worried about the apparent implementation cost.

More mount automation: We covered what was done during Liberty (2 
approaches) and briefly discussed the new approach Sage suggested. The 
nova-metadata approach that Clinton proposed in Vancouver, which was not 
accepted by the Nova team, will probably get a warmer reception based on 
the additional integration Sage is proposing. We should re-propose it 
and continue with our existing plans. There was also broad support for 
Sage's NFS-over-vsock proposal and Nova share-attach semantics.

Key-based authentication: We went into more detail on John's Ceph-driver 
requirements, and why he thinks it makes sense to communicate 
secrets from the Ceph backend to the end user through Manila. We didn't 
really reach a decision, but nobody was strongly against the change John 
proposed. I think we're still interested in alternative proposals if 
anyone has one.

Make all deleted columns booleans: We discussed soft deletes and the 
fact that they're not implemented consistently in Manila today. We also 
questioned the value of soft deletes and reasoning for why we use them. 
Some believed there was a performance benefit, and others suggested that 
it had more to do with preservation of history for auditing and 
troubleshooting. Depending on the real motivation, this proposal may 
need to be scrapped or modified.

Replication 2.0: We discussed the remaining work for the share 
replication feature before it could be accepted. There were 2 main 
issues: support for share servers, and a first party driver 
implementation of the feature. There was some dispute about the value of 
a first party implementation but the strongest argument in favor was the 
need for community members who didn't have access to proprietary 
hardware to be able to maintain the replication code.

Functional tests for network plugins: We discussed the fact that the 
existing network plugins aren't covered by tests in the gate. For 
nova-network, we decided that was acceptable since that functionality is 
deprecated. For the neutron network plugin and the standalone network 
plugin, we need additional test jobs that exercise them.

Capability lists: The addition of standardized extra specs pointed out a 
gap, which is that with some extra specs a backend should be able to 
advertise both the positive and negative capability. In the past vendors 
have achieved that with gross pairs of vendor-specific extra specs (e.g. 
netapp_dedupe and netapp_no_dedupe). We agreed it would make more sense 
for the backend to simply advertise dedupe=[False,True]. Changes to the 
filter scheduler are needed to allow capability lists, so Clinton 
volunteered to implement those changes.
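
In other words, instead of the netapp_dedupe/netapp_no_dedupe pair, a
backend would report something like the following, and the filter
scheduler would treat a list-valued capability as "any of these can be
satisfied" (a sketch of the idea, not the final code):

    # Backend stats: this pool can create shares with or without dedupe.
    pool_capabilities = {
        'pool_name': 'pool1',
        'dedupe': [False, True],
    }

    # Share type extra spec requesting deduplicated shares; the
    # scheduler change is to match it against any value in the list.
    extra_specs = {'dedupe': '<is> True'}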

Client microversions: There wasn't much to discuss on the topic of 
microversions. The client patches are in progress. Cinder seems to be 
headed down a similar path. We are happy with the feature so far.

Fix delete operations: We discussed what the force-delete APIs should do 
and what they shouldn't do. It was agreed that force-delete means 
"remove it from the manila DB no matter what, and make a best effort 
attempt to delete it off the backend". Some changes are needed to 
support that semantic. We also discussed the common problem of "stuck" 
shares and how to clean them up. We agreed that admins should typically 
use the reset-state API and retry an ordinary delete. The force-delete 
approach can leave garbage behind on the backend. The reset-state and 
retry-delete approach should never leave garbage behind, so it's safer 
to use.
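
Assuming the python-manilaclient admin calls behave the way I expect
(treat the exact method names here as an assumption), the recommended
cleanup flow is roughly:

    # 'client' is an already-authenticated manilaclient instance.
    # Preferred cleanup for a "stuck" share: reset its state, then
    # retry a normal delete so backend resources are really removed.
    client.shares.reset_state(share, 'error')
    client.shares.delete(share)

    # Last resort only: removes the DB record no matter what and makes
    # a best-effort attempt at backend cleanup, so it can leave garbage.
    # client.shares.force_delete(share)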

Remove all extensions: We discussed the current effort to move 
extensions into core. This work is mostly done and not much discussion 
was needed.

Removing task state: We agreed to remove the task-state column 
introduced by migration and use ordinary states for migration.

Interaction with Nova attach file system API: We went into more detail 
on Sage's Nova file-system-attach proposal and concluded that it should 
"just work" without changes from Manila.


-Ben Swartzlander



