[openstack-dev] [Manila] Design summit notes

Silvan Kaiser silvan at quobyte.com
Wed Nov 11 06:25:37 UTC 2015


Ben, thanks a lot for this wrap-up, much appreciated!


2015-11-10 22:25 GMT+01:00 Ben Swartzlander <ben at swartzlander.org>:

> I wasn't going to write a wrap-up email this time around since so many
> people were able to attend in person, but enough people missed it that I
> changed my mind and decided to write down my own impressions of the
> sessions.
>
> Wednesday working session: Migration Improvements
> -------------------------------------------------
> During this session we covered the status of the migration feature so far
> (it's merged but experimental) and the existing gaps:
> 1) Will fail for shares on most backends that use share servers
> 2) Controller node in the data path -- needs data copy service
> 3) No implementations of optimized migration yet
> 4) Confusion around task_state vs state
> 5) Need to disable most existing operations during a migration
> 6) Possibly need to change driver interface for getting mount info
>
> Basically there is a lot of work left to be done on migration but we're
> happy with the direction it's going. If we can address the gaps we could
> make the APIs supported in Mitaka. We're eager to get to building the
> valuable APIs on top of migration, but we can't do that until migration
> itself is solid.
>
> I also suggested that migration might benefit from an API change to allow
> a 2-phase migration which would allow the user (the admin in this case) to
> control when the final cutover happens instead of letting it happen by
> surprise.
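>
> To make the two-phase idea concrete, here is a minimal sketch of what the
> admin-facing flow could look like, assuming hypothetical migration_start
> and migration_complete actions on the v2 shares API (none of these names
> are decided yet):
>
>     import requests
>
>     # Action names and payloads below are illustrative only.
>     MANILA = 'http://controller:8786/v2/PROJECT_ID'
>     HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',
>                'Content-Type': 'application/json'}
>     share_id = 'SHARE_ID'
>
>     # Phase 1: start copying data to the destination backend while the
>     # share stays available on the source.
>     requests.post('%s/shares/%s/action' % (MANILA, share_id),
>                   headers=HEADERS,
>                   json={'migration_start': {'host': 'dest@backend#pool'}})
>
>     # ... the admin monitors progress and picks a quiet moment ...
>
>     # Phase 2: explicit cutover, triggered by the admin instead of
>     # happening by surprise when the data copy finishes.
>     requests.post('%s/shares/%s/action' % (MANILA, share_id),
>                   headers=HEADERS,
>                   json={'migration_complete': None})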
>
> Wednesday working session: Access Allow/Deny Driver Interface
> -------------------------------------------------------------
> During this session I proposed a new driver interface for allowing/denying
> access to shares which is a single "sync access" API that the manager would
> call and pass all of the rules to. The main benefits of the change would be:
> 1) More reliable cleanup of errors
> 2) Support for atomically updating multiple rules
> 3) Simpler/more efficient implementation on some backends
>
> Most vendors agreed that the new interface would be superior, and
> generally speaking simpler and more efficient to implement than the
> existing one.
>
> There were some who were unsure and one who specifically said an access
> sync would be inefficient compared to the current allow/deny semantics. We
> need to see if we can provide enough information in the new interface to
> allow them to be more efficient (such as providing the new rules AND the
> diff against the old rules).
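>
> As a rough illustration of the proposal (the method name, its parameters
> and the _backend_* helpers are placeholders, not a settled interface), a
> driver could implement a single sync-style call that receives both the
> full rule set and the diff:
>
>     class MyShareDriver(object):
>         """Sketch of the proposed sync-style access interface."""
>
>         def update_access(self, context, share, access_rules,
>                           add_rules=None, delete_rules=None):
>             # access_rules: every rule that should exist after this call.
>             # add_rules/delete_rules: the diff since the last call, for
>             # backends where recomputing the whole set is expensive.
>             if add_rules or delete_rules:
>                 for rule in delete_rules or []:
>                     self._backend_remove_rule(share, rule)
>                 for rule in add_rules or []:
>                     self._backend_add_rule(share, rule)
>             else:
>                 # Replace the exported rule set atomically, which also
>                 # cleans up anything left behind by earlier failures.
>                 self._backend_set_rules(share, access_rules)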
>
> It was also pointed out that error reporting would be different using this
> new interface, because errors applying rules couldn't be associated with
> the specific rule that caused them. We need a solution to that problem.
>
> Thursday fishbowl session: Share Replication
> --------------------------------------------
> There was a demo of the POC code from NetApp and general education on the
> new design. Since the idea was new to some and the implementation was new
> to everyone, there was not a lot of feedback.
>
> We did discuss a few issues, such as whether multiple replicas of a share
> should be allowed in a single AZ.
>
> We agreed that this replication feature will be exposed to the same kind
> of race conditions that exist in active/active HA, so there is additional
> pressure to solve the distributed locking problem. Fortunately the
> community seems to be converging on a solution to that problem -- the tooz
> library.
>
> We agreed that support for replication in a first party driver is
> essential for the feature to be accepted -- otherwise developers who don't
> have proprietary storage systems would be unable to develop/test on the
> feature.
>
> Thursday fishbowl session: Alternative Snapshot Semantics
> ---------------------------------------------------------
> During this session I proposed 2 new things you can do with snapshots:
> 1) Revert a share to a snapshot
> 2) Exporting snapshots directly as readonly shares
>
> For reverting snapshots, we agreed that the operation should preserve all
> existing snapshots. If a backend is unable to revert without deleting
> snapshots, it should not advertise the capability.
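>
> A minimal sketch of how that could look at the driver level; the
> capability key and the method signature are assumptions, not a settled
> interface:
>
>     class MyShareDriver(object):
>
>         def get_capabilities(self):
>             # Only advertise the capability if the backend can revert in
>             # place without deleting any other snapshots of the share.
>             return {'revert_to_snapshot_support': True}
>
>         def revert_to_snapshot(self, context, share, snapshot):
>             # Roll the share's contents back to the given snapshot; every
>             # other existing snapshot must survive the operation.
>             self._backend_rollback(share['id'],
>                                    snapshot['provider_location'])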
>
> For mounting snapshots, it was pointed out that we need to define what the
> access rules for a mounted snapshot would be. I proposed simply inheriting
> the rules from the parent share with rw rules squashed to ro. That approach
> has downsides though, because it ties the access on the snapshot to the
> access on the share (which may not be desired) and also forces us to pass a
> list of snapshots into the access calls so the driver can update snapshot
> access when updating share access.
>
> Sage proposed creating a new concept of a readonly share and simply
> overloading the existing create-share-from-snapshot logic with a readonly
> flag, which gives us the semantics we want without much added complexity. The
> downside to that approach is that we will have to add code to handle
> readonly shares.
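>
> For illustration only, the request could look something like the sketch
> below; the readonly field (and its name) is purely hypothetical:
>
>     import requests
>
>     MANILA = 'http://controller:8786/v2/PROJECT_ID'
>     HEADERS = {'X-Auth-Token': 'TOKEN',
>                'Content-Type': 'application/json'}
>
>     # Reuse the existing create-from-snapshot path, with a flag marking
>     # the resulting share as readonly (hypothetical field).
>     requests.post('%s/shares' % MANILA, headers=HEADERS, json={
>         'share': {
>             'share_proto': 'NFS',
>             'size': 1,
>             'snapshot_id': 'SNAPSHOT_ID',
>             'readonly': True,
>         },
>     })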
>
> There was an additional proposal to allow the create/delete snapshot calls
> without any other snapshot-related calls because some backends have in-band
> snapshot semantics. This is probably unnecessary because every backend that
> has snapshots is likely to support at least one of the proposed semantics
> so we don't need another mode.
>
> Thursday working session: Export Location Metadata
> --------------------------------------------------
> In this session we discussed the idea Jason proposed back in the winter
> midcycle meetup to allow drivers to tag export locations with various types
> of metadata which is meaningful to clients of Manila. Several proposed use
> cases were discussed.
>
> The main thing we agreed to was that the metadata itself seems like a good
> thing as long as the metadata keys and values are standardized. We didn't
> like the possibility of vendor-defined or admin-defined metadata appearing
> in the export location API.
>
> One use case discussed was the idea of preferred/non-preferred export
> locations. This use case makes sense and nobody was against it.
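>
> As a sketch of that use case (the key name is illustrative, not an agreed
> schema), an export location listing might carry a preferred flag that
> clients sort on:
>
>     export_locations = [
>         {'path': '10.0.0.10:/shares/share-1234', 'preferred': True},
>         {'path': '10.0.0.11:/shares/share-1234', 'preferred': False},
>     ]
>
>     # A client mounts the preferred location when one is advertised.
>     best = sorted(export_locations, key=lambda loc: not loc['preferred'])[0]
>     print(best['path'])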
>
> Another use case discussed was "readonly" export locations which might
> allow certain drivers to use different export locations for writing and
> reading. There was debate about how much sense this made.
>
> The third discussed use case was Jason's original suggestion: locality
> information of different export locations to enable clients to determine
> which export location is closer to them. We were generally opposed to this
> idea because it's not clear how locality information would be expressed. We
> didn't like admin-defined values, and the only standard thing we have is
> AZs, which are already exposed through a different mechanism.
>
> Some time was spent in this session discussing the forthcoming Ceph driver
> and its special requirements. We discussed the possibility of dual-protocol
> access to shares (Ceph+NFS in this case). Dual protocol access was
> previously rejected (NFS+CIFS) due to concerns about interoperability. We
> still need to decide if we want to allow Ceph+NFS as a special case based
> on the idea that Ceph shares would always support NFS and there's unlikely
> to ever be a second Ceph driver. If we allow this, then the share protocol
> would make sense to expose as a metadata value.
>
> Lastly we discussed the Ceph key-sharing requirement where people want to
> use Manila to discover the Ceph secret. That would require adding some new
> metadata, but on the access rules, not on the export locations.
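>
> A sketch of what that could look like; the 'cephx' access type and the
> 'access_key' field are assumptions about the forthcoming driver, not an
> existing API:
>
>     # The user requests identity-based access for a Ceph client name...
>     access_rule = {'access_type': 'cephx', 'access_to': 'alice',
>                    'access_level': 'rw'}
>
>     # ...and once the driver applies the rule, the generated secret is
>     # reported back as metadata on the rule itself, not on the export
>     # location.
>     access_rule['access_key'] = 'AQD...generated-by-the-backend...=='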
>
> Thursday working session: Interactions Between New Features
> -----------------------------------------------------------
> In this session we considered the possible interactions between share
> migration, consistency groups, and share replication (the 3 new
> experimental features).
>
> We quickly concluded that in the current code, bad things are likely to
> happen if you use 2 of these features at the same time, so as a top
> priority we must prevent that and return a sensible error message instead
> of allowing undefined behavior.
>
> We spent the rest of the session discussing how the features should
> interact and concluded that enabling the behaviors we want requires
> significant new code for all of the pairings.
>
> Migration+CGs: requires the concept of CG instance or some way of tracking
> which share instances make up the original CG and the migrated CG.
> Alternatively, requires the ability to disband and reconstruct CGs. A
> blueprint with an actual design is needed.
>
> Migration+Replication: can probably be implemented by simply migrating the
> primary (active) replica to the destination and re-replicating from there.
> This requires significant new code in the migration paths though because
> they'll need to rebuild the replication topology on the destination side.
> Also, for safety the migration should not complete until the destination
> side is fully replicated to avoid the chance of a failure immediately after
> migration causing a loss of access. There may be opportunities for
> optimized versions of the above, especially when cross-AZ bandwidth is
> limited and the migration is within an AZ. More thought is needed, and a
> blueprint should spell out the design.
>
> Replication+CGs: it doesn't make a lot of sense to replicate individual
> shares from a CG -- more likely users will want to replicate the whole CG.
> This is an assumption though and we have no supporting data. Either way,
> replication at the granularity of a CG would require more logic to schedule
> the replica CGs before scheduling the share replicas. This is likely to be
> significant new code and need a blueprint.
>
> Replication+CGs+Migration: this was proposed as a joke, but it's a serious
> concern. The above designs should consider what happens if we have a
> replicated CG and we wish to migrate it. If the above designs are done
> carefully we should get correct behavior here for free.
>
> Friday contributor meetup
> -------------------------
> On Friday we quickly reviewed the above topics for those that missed
> earlier sessions, then we launched into the laundry list of topics that we
> weren't able to fit into design sessions/working sessions.
>
> QoS: Zhongjun proposed a QoS implementation similar to Cinder's. After a
> brief discussion, there was not agreement that Cinder's model was a good
> one to follow, as QoS was introduced to Cinder before some other later
> enhancements such as standardized extra specs. We're inclined to use
> standardized extra specs instead of QoS policies in Manila. We still need
> to agree on which QoS-related extra specs we should standardize. There are
> 2 criteria for standardizing on QoS-related extra specs: (1) they should
> mean the same thing across vendors, e.g. max-bytes-per-second, and (2) they
> should be something that's widely implemented. I expect we'll see lots of
> vendor-specific QoS-related extra specs, and we need to make sure it's
> possible to mix vendors in the same share type and express equivalent QoS
> settings across all of them.
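>
> To make that concrete, a "gold" share type might carry extra specs along
> these lines; every key below other than driver_handles_share_servers is a
> hypothetical candidate, not an agreed standard:
>
>     gold_type_extra_specs = {
>         'driver_handles_share_servers': 'False',
>         # Candidates for standardization: same meaning across vendors and
>         # widely implemented.
>         'qos:max_bytes_per_sec': '104857600',   # 100 MiB/s cap
>         'qos:max_iops': '5000',
>         # Vendor-scoped specs can still coexist with the standard ones.
>         'somevendor:qos_policy': 'gold',
>     }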
>
> Minimum required features: The main open question about minimum features
> was access control types for each protocol. We agreed that for CIFS,
> user-based access control is required, and IP-based access control is not.
> Furthermore, support for read-only access rules is required.
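>
> Put in terms of Manila access rules, the minimum a CIFS driver must
> support looks like the first two rules below, while the IP-based rule
> stays optional:
>
>     # Required: user-based rules, including the read-only level.
>     required_cifs_rules = [
>         {'access_type': 'user', 'access_to': 'DOMAIN\\alice',
>          'access_level': 'rw'},
>         {'access_type': 'user', 'access_to': 'DOMAIN\\bob',
>          'access_level': 'ro'},
>     ]
>
>     # Optional for CIFS backends: IP-based access control.
>     optional_cifs_rule = {'access_type': 'ip', 'access_to': '10.0.0.0/24',
>                           'access_level': 'rw'}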
>
> Improving gate performance/stability: We briefly discussed the gate issues
> that plagued us during Liberty and our plan to address them, which is to
> add more first party drivers that don't depend on quickly-evolving
> projects. The big offender during Liberty was Neutron, although Nova and
> Cinder have both bitten us in the past. To be clear, the existing Generic
> driver is not going away, and will still be QA'd, but we would rather not
> make it the gating driver.
>
> Manila HA: We briefly discussed tooz and agreed that we will use it in
> Manila to address race conditions that exist in active-active HA scenarios,
> as well as race conditions introduced by the replication code.
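>
> For reference, a minimal tooz lock looks like the sketch below; the
> backend URL and lock name are examples only, and where exactly Manila
> takes such locks is still to be designed:
>
>     from tooz import coordination
>
>     coordinator = coordination.get_coordinator(
>         'zookeeper://controller:2181', b'manila-share-host-1')
>     coordinator.start()
>
>     # Serialize work on a single share across manila-share instances.
>     lock = coordinator.get_lock(b'manila-share-SHARE_ID')
>     with lock:
>         pass  # e.g. promote a replica or update access rules
>
>     coordinator.stop()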
>
> Multiple share servers: There was some confusion about what a share server
> is. Some assumed that it would require multiple Manila "share servers" to
> implement a highly available shared storage system. In reality, the "share
> server" in Manila is a collection of resources which can include an
> arbitrary number of underlying physical or virtual resources -- enough to
> implement a highly available solution with a single share server.
>
> Rolling Upgrades: We punted on this again. There was a discussion about
> how slow the progress in Cinder appears to be and we don't want to get
> trapped in limbo for multiple cycles. However if the work will unavoidably
> take multiple cycles then we need to know that so we can plan accordingly.
> Support for rolling upgrades is still viewed as desirable, but the team is
> worried about the apparent implementation cost.
>
> More mount automation: We covered what was done during Liberty (2
> approaches) and discussed briefly the new approach Sage suggested. The
> nova-metadata approach that Clinton proposed in Vancouver, which was not
> accepted by the Nova team, will probably get a warmer reception based on the
> additional integration Sage is proposing. We should re-propose it and
> continue with our existing plans. There was also broad support for Sage's
> NFS-over-vsock proposal and Nova share-attach semantics.
>
> Key-based authentication: We went into more detail on John's Ceph-driver
> requirements, and why he thinks it makes sense to communicate secrets
> from the Ceph backend to the end user through Manila. We didn't really
> reach a decision, but nobody was strongly against the change John proposed.
> I think we're still interested in alternative proposals if anyone has one.
>
> Make all deleted columns booleans: We discussed soft deletes and the fact
> that they're not implemented consistently in Manila today. We also
> questioned the value of soft deletes and the reasoning for why we use them.
> Some believed there was a performance benefit, and others suggested that it
> had more to do with preservation of history for auditing and
> troubleshooting. Depending on the real motivation, this proposal may need
> to be scrapped or modified.
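>
> For readers who missed the context, the two patterns under discussion
> look roughly like this in SQLAlchemy terms (column details are
> illustrative, not Manila's actual schema):
>
>     import sqlalchemy as sa
>     from sqlalchemy.ext.declarative import declarative_base
>
>     Base = declarative_base()
>
>     class IntegerStyle(Base):
>         __tablename__ = 'int_style'
>         id = sa.Column(sa.Integer, primary_key=True)
>         name = sa.Column(sa.String(255))
>         # Set to the row's id on delete, so a unique constraint on
>         # (name, deleted) still lets a live row reuse the name.
>         deleted = sa.Column(sa.Integer, default=0)
>         __table_args__ = (sa.UniqueConstraint('name', 'deleted'),)
>
>     class BooleanStyle(Base):
>         __tablename__ = 'bool_style'
>         id = sa.Column(sa.Integer, primary_key=True)
>         name = sa.Column(sa.String(255))
>         # Simpler to query, but name reuse needs a different uniqueness
>         # strategy once more than one deleted row exists.
>         deleted = sa.Column(sa.Boolean, default=False)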
>
> Replication 2.0: We discussed the remaining work for the share replication
> feature before it could be accepted. There were 2 main issues: support for
> share servers, and a first party driver implementation of the feature.
> There was some dispute about the value of a first party implementation but
> the strongest argument in favor was the need for community members who
> didn't have access to proprietary hardware to be able to maintain the
> replication code.
>
> Functional tests for network plugins: We discussed the fact that the
> existing network plugins aren't covered by tests in the gate. For
> nova-network, we decided that was acceptable since that functionality is
> deprecated. For the neutron network plugin and the standalone network
> plugin, we need additional test jobs that exercise them.
>
> Capability lists: The addition of standardized extra specs pointed out a
> gap, which is that with some extra specs a backend should be able to
> advertise both the positive and negative capability. In the past vendors
> have achieved that with gross pairs of vendor-specific extra specs (e.g.
> netapp_dedupe and netapp_no_dedupe). We agreed it would make more sense for
> the backend to simply advertise dedupe=[False,True]. Changes to the filter
> scheduler are needed to allow capability lists, so Clinton volunteered to
> implement those changes.
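>
> A toy illustration of the idea; the matching helper is made up, not the
> actual filter scheduler code:
>
>     backend_capabilities = {'dedupe': [False, True]}   # both supported
>     share_type_extra_specs = {'capabilities:dedupe': '<is> True'}
>
>     def capability_satisfied(requested, reported):
>         # Treat a scalar capability as a one-element list so both the old
>         # and the new reporting styles work.
>         values = reported if isinstance(reported, list) else [reported]
>         wanted = requested.split('<is>')[-1].strip().lower() == 'true'
>         return wanted in values
>
>     print(capability_satisfied(
>         share_type_extra_specs['capabilities:dedupe'],
>         backend_capabilities['dedupe']))   # True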
>
> Client microversions: There wasn't much to discuss on the topic of
> microversions. The client patches are in progress. Cinder seems to be
> headed down a similar path. We are happy with the feature so far.
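>
> For anyone who has not tried it yet, requesting a specific microversion
> is just an extra header on the v2 API; the version number below is only
> an example:
>
>     import requests
>
>     resp = requests.get(
>         'http://controller:8786/v2/PROJECT_ID/shares',
>         headers={'X-Auth-Token': 'TOKEN',
>                  'X-OpenStack-Manila-API-Version': '2.6'})
>     print(resp.status_code)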
>
> Fix delete operations: We discussed what the force-delete APIs should do
> and what they shouldn't do. It was agreed that force-delete means "remove
> it from the manila DB no matter what, and make a best effort attempt to
> delete it off the backend". Some changes are needed to support that
> semantic. We also discussed the common problem of "stuck" shares and how to
> clean them up. We agreed that admins should typically use the reset-state
> API and retry an ordinary delete. The force-delete approach can leave
> garbage behind on the backend. The reset-state and retry-delete approach
> should never leave garbage behind, so it's safer to use.
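>
> Expressed as v2 API calls (the exact action names vary with the
> microversion; older versions carry an os- prefix), the two paths look
> like this:
>
>     import requests
>
>     MANILA = 'http://controller:8786/v2/PROJECT_ID'
>     HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',
>                'Content-Type': 'application/json'}
>     share = 'SHARE_ID'
>
>     # Preferred: put the "stuck" share back into a deletable state and
>     # retry a normal delete, which still asks the backend to remove the
>     # export.
>     requests.post('%s/shares/%s/action' % (MANILA, share), headers=HEADERS,
>                   json={'reset_status': {'status': 'error'}})
>     requests.delete('%s/shares/%s' % (MANILA, share), headers=HEADERS)
>
>     # Last resort: drop the record from Manila's DB no matter what, with
>     # only a best-effort cleanup attempt on the backend.
>     requests.post('%s/shares/%s/action' % (MANILA, share), headers=HEADERS,
>                   json={'force_delete': None})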
>
> Remove all extensions: We discussed the current effort to move extensions
> into core. This work is mostly done and not much discussion was needed.
>
> Removing task state: We agreed to remove the task-state column introduced
> by migration and use ordinary states for migration.
>
> Interaction with Nova attach file system API: We went into more detail on
> Sage's Nova file-system-attach proposal and concluded that it should "just
> work" without changes from Manila.
>
>
> -Ben Swartzlander
>
>



-- 
Dr. Silvan Kaiser
Quobyte GmbH
Hardenbergplatz 2, 10623 Berlin - Germany
+49-30-814 591 800 - www.quobyte.com
Amtsgericht Berlin-Charlottenburg, HRB 149012B
Management board: Dr. Felix Hupfeld, Dr. Björn Kolbeck, Dr. Jan Stender
