<div dir="ltr"><div>Ben, thanks a lot for this wrap-up, much appreciated!</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-11-10 22:25 GMT+01:00 Ben Swartzlander <span dir="ltr"><<a href="mailto:ben@swartzlander.org" target="_blank">ben@swartzlander.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I wasn't going to write a wrap-up email this time around since so many people were able to attend in person, but enough people missed it that I changed my mind and decided to write down my own impressions of the sessions.<br>

<br>

Wednesday working session: Migration Improvements<br>

-------------------------------------------------<br>

During this session we covered the status of the migration feature so far (it's merged but experimental) and the existing gaps:<br>

1) Will fail on shares on most backends with share servers<br>

2) Controller node in the data path -- needs data copy service<br>

3) No implementations of optimized migration yet<br>

4) Confusion around task_state vs state<br>

5) Need to disable most existing operations during a migration<br>

6) Possibly need to change driver interface for getting mount info<br>

<br>

Basically there is a lot of work left to be done on migration but we're happy with the direction it's going. If we can address the gaps we could make the APIs supported in Mitaka. We're eager to get to building the valuable APIs on top of migration, but we can't do that until migration itself is solid.<br>

<br>

I also suggested that migration might benefit from an API change to allow a 2-phase migration which would allow the user (the admin in this case) to control when the final cutover happens instead of letting it happen by surprise.<br>
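
<br>

As a rough illustration of the 2-phase idea, here is a minimal Python sketch. The class, state names, and method names are hypothetical and are not part of Manila's API; the only point is that the data copy and the cutover are separate, explicitly triggered steps.<br>

```python
from enum import Enum


class MigrationState(Enum):
    # Hypothetical state names, not Manila's actual share states.
    AVAILABLE = "available"
    COPYING = "copying"
    READY_FOR_CUTOVER = "ready_for_cutover"
    COMPLETED = "completed"


class TwoPhaseMigration:
    """Toy model of a migration that waits for an explicit cutover."""

    def __init__(self, share_id, destination):
        self.share_id = share_id
        self.destination = destination
        self.state = MigrationState.AVAILABLE

    def start(self):
        # Phase 1: begin copying data; the source stays in service.
        if self.state is not MigrationState.AVAILABLE:
            raise RuntimeError("migration already started")
        self.state = MigrationState.COPYING

    def copy_finished(self):
        # The copy is done, but nothing has been switched over yet.
        if self.state is not MigrationState.COPYING:
            raise RuntimeError("no copy in progress")
        self.state = MigrationState.READY_FOR_CUTOVER

    def complete(self):
        # Phase 2: the admin decides when the cutover happens.
        if self.state is not MigrationState.READY_FOR_CUTOVER:
            raise RuntimeError("share is not ready for cutover")
        self.state = MigrationState.COMPLETED
```

In this sketch, calling complete() before the copy has finished raises an error, so the cutover can never happen by surprise.<br>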

<br>

Wednesday working session: Access Allow/Deny Driver Interface<br>

-------------------------------------------------------------<br>

During this session I proposed a new driver interface for allowing/denying access to shares which is a single "sync access" API that the manager would call and pass all of the rules to. The main benefits of the change would be:<br>

1) More reliable cleanup of errors<br>

2) Support for atomically updating multiple rules<br>

3) Simpler/more efficient implementation on some backends<br>

<br>

Most vendors agreed that the new interface would be superior: simpler and more efficient than the existing one.<br>

<br>

There were some who were unsure and one who specifically said an access sync would be inefficient compared to the current allow/deny semantics. We need to see if we can provide enough information in the new interface to allow them to be more efficient (such as providing the new rules AND the diff against the old rules).<br>
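
<br>

One way to picture "the new rules AND the diff" is the sketch below. It assumes rules can be represented as (access_type, access_to, access_level) tuples; the function names and the shape of the returned dict are made up for illustration and are not the proposed driver interface.<br>

```python
def diff_access_rules(old_rules, new_rules):
    """Return (to_add, to_remove) between two rule lists.

    Each rule is a hypothetical (access_type, access_to, access_level)
    tuple.
    """
    old_set = set(old_rules)
    new_set = set(new_rules)
    to_add = sorted(new_set - old_set)
    to_remove = sorted(old_set - new_set)
    return to_add, to_remove


def sync_access(share_id, new_rules, old_rules):
    # Hypothetical payload a manager could hand to a driver: the full
    # desired state plus the delta, so the driver can use either one.
    to_add, to_remove = diff_access_rules(old_rules, new_rules)
    return {
        "share_id": share_id,
        "rules": list(new_rules),
        "add_rules": to_add,
        "delete_rules": to_remove,
    }
```

A backend that prefers to rewrite its whole export configuration would read only "rules", while one that is more efficient with incremental changes could apply just "add_rules" and "delete_rules".<br>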

<br>

It was also pointed out that error reporting would be different using this new interface, because errors applying rules couldn't be associated with the specific rule that caused them. We need a solution to that problem.<br>

<br>

Thursday fishbowl session: Share Replication<br>

--------------------------------------------<br>

There was a demo of the POC code from NetApp and general education on the new design. Since the idea was new to some and the implementation was new to everyone, there was not a lot of feedback.<br>

<br>

We did discuss a few issues, such as whether multiple shares should be allowed in a single AZ.<br>

<br>

We agreed that this replication feature will be exposed to the same kind of race conditions that exist in active/active HA, so there is additional pressure to solve the distributed locking problem. Fortunately the community seems to be converging on a solution to that problem -- the tooz library.<br>

<br>

We agreed that support for replication in a first party driver is essential for the feature to be accepted -- otherwise developers who don't have proprietary storage systems would be unable to develop/test on the feature.<br>

<br>

Thursday fishbowl session: Alternative Snapshot Semantics<br>

---------------------------------------------------------<br>

During this session I proposed 2 new things you can do with snapshots:<br>

1) Revert a share to a snapshot<br>

2) Exporting snapshots directly as readonly shares<br>

<br>

For reverting snapshots, we agreed that the operation should preserve all existing snapshots. If a backend is unable to revert without deleting snapshots, it should not advertise the capability.<br>

<br>

For mounting snapshots, it was pointed out that we need to define the access rules for the share. I proposed simply inheriting the rules for the parent share with rw rules squashed to ro. That approach has downsides though because it links the access on the snapshot and the share (which may not be desired) and also forces us to pass a list of snapshots into the access calls so the driver can update snapshot access when updating share access.<br>
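
<br>

The "inherit and squash" idea can be sketched in a few lines of Python. The rule layout (dicts with access_type, access_to and access_level keys) and the function name are assumptions made for this example only.<br>

```python
def snapshot_rules_from_share(share_rules):
    """Derive read-only snapshot rules from the parent share's rules.

    Rules are hypothetical dicts with 'access_type', 'access_to' and
    'access_level' keys.
    """
    snapshot_rules = []
    for rule in share_rules:
        squashed = dict(rule)  # copy so the share's rule is untouched
        squashed["access_level"] = "ro"  # rw and ro both become ro
        snapshot_rules.append(squashed)
    return snapshot_rules
```

Because the snapshot rules are derived from the share rules, they would have to be recomputed every time the share's rules change, which is the coupling described above.<br>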

<br>

Sage proposed creating a new concept of a readonly share and simply overloading the existing create-share-from-snapshot logic with a -readonly flag which gives us the semantics we want without much complexity. The downside to that approach is that we will have to add code to handle readonly shares.<br>

<br>

There was an additional proposal to allow the create/delete snapshot calls without any other snapshot-related calls because some backends have in-band snapshot semantics. This is probably unnecessary because every backend that has snapshots is likely to support at least one of the proposed semantics so we don't need another mode.<br>

<br>

Thursday working session: Export Location Metadata<br>

--------------------------------------------------<br>

In this session we discussed the idea Jason proposed back in the winter midcycle meetup to allow drivers to tag export locations with various types of metadata which is meaningful to clients of Manila. There were several proposed use cases discussed.<br>

<br>

The main thing we agreed to was that the metadata itself seems like a good thing as long as the metadata keys and values are standardized. We didn't like the possibility of vendor-defined or admin-defined metadata appearing in the export location API.<br>

<br>

One use case discussed was the idea of preferred/non-preferred export locations. This use case makes sense and nobody was against it.<br>
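
<br>

From a client's point of view, preferred/non-preferred metadata could be consumed roughly as follows. The "preferred" key and the dict layout are assumptions for this sketch; the session did not settle on names.<br>

```python
def pick_export_location(locations):
    """Return the first preferred location, else the first one listed.

    Each location is a hypothetical dict with a 'path' key and an
    optional boolean 'preferred' key.
    """
    for location in locations:
        if location.get("preferred"):
            return location
    # No location is marked preferred: fall back to list order.
    return locations[0] if locations else None
```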

<br>

Another use case discussed was "readonly" export locations which might allow certain drivers to use different export locations for writing and reading. There was debate about how much sense this made.<br>

<br>

The third discussed use case was Jason's original suggestion: locality information of different export locations to enable clients to determine which export location is closer to them. We were generally opposed to this idea because it's not clear how locality information would be expressed. We didn't like admin-defined values, and the only standard thing we have is AZs, which are already exposed through a different mechanism.<br>

<br>

Some time was spent in this session discussing the forthcoming Ceph driver and its special requirements. We discussed the possibility of dual-protocol access to shares (Ceph+NFS in this case). Dual protocol access was previously rejected (NFS+CIFS) due to concerns about interoperability. We still need to decide if we want to allow Ceph+NFS as a special case based on the idea that Ceph shares would always support NFS and there's unlikely to ever be a second Ceph driver. If we allow this, then the share protocol would make sense to expose as a metadata value.<br>

<br>

Lastly we discussed the Ceph key-sharing requirement where people want to use Manila to discover the Ceph secret. That would require adding some new metadata, but on the access rules, not on the export locations.<br>

<br>

Thursday working session: Interactions Between New Features<br>

-----------------------------------------------------------<br>

In this session we considered the possible interactions between share migration, consistency groups, and share replication (the 3 new experimental features).<br>

<br>

We quickly concluded that in the current code, bad things are likely to happen if you use 2 of these features at the same time, so as a top priority we must prevent that and return a sensible error message instead of allowing undefined behavior.<br>
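
<br>

A guard of the kind described here might look like the sketch below. The feature flags on the share dict, the exception class, and the function name are all hypothetical; real code would derive this information from the share's actual state.<br>

```python
class FeatureConflict(Exception):
    """Raised when a share already uses another experimental feature."""


# Hypothetical flags describing which experimental features a share uses.
EXPERIMENTAL_FEATURES = ("migration", "consistency_group", "replication")


def check_feature_allowed(share, requested):
    """Refuse to combine two experimental features on one share."""
    if requested not in EXPERIMENTAL_FEATURES:
        raise ValueError("unknown feature: %s" % requested)
    in_use = [f for f in EXPERIMENTAL_FEATURES
              if share.get(f) and f != requested]
    if in_use:
        raise FeatureConflict(
            "cannot use %s on share %s: already using %s"
            % (requested, share["id"], ", ".join(in_use)))
```

The API layer would call such a check before starting a migration, adding a share to a CG, or creating a replica, and turn the exception into a clear error response.<br>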

<br>

We spent the rest of the session discussing how the features should interact and concluded that enabling the behaviors we want requires significant new code for all of the pairings.<br>

<br>

Migration+CGs: requires the concept of CG instance or some way of tracking which share instances make up the original CG and the migrated CG. Alternatively, requires the ability to disband and reconstruct CGs. A blueprint with an actual design is needed.<br>

<br>

Migration+Replication: can probably be implemented by simply migrating the primary (active) replica to the destination and re-replicating from there. This requires significant new code in the migration paths though because they'll need to rebuild the replication topology on the destination side. Also, for safety the migration should not complete until the destination side is fully replicated to avoid the chance of a failure immediately after migration causing a loss of access. There may be opportunities for optimized versions of the above, especially when cross-AZ bandwidth is limited and the migration is within an AZ. More thought is needed, and a blueprint should spell out the design.<br>

<br>

Replication+CGs: it doesn't make a lot of sense to replicate individual shares from a CG -- more likely users will want to replicate the whole CG. This is an assumption though and we have no supporting data. Either way, replication at the granularity of a CG would require more logic to schedule the replica CGs before scheduling the share replicas. This is likely to be significant new code and needs a blueprint.<br>

<br>

Replication+CGs+Migration: this was proposed as a joke, but it's a serious concern. The above designs should consider what happens if we have a replicated CG and we wish to migrate it. If the above designs are done carefully we should get correct behavior here for free.<br>

<br>

Friday contributor meetup<br>

-------------------------<br>

On Friday we quickly reviewed the above topics for those that missed earlier sessions, then we launched into the laundry list of topics that we weren't able to fit into design sessions/working sessions.<br>

<br>

QoS: Zhongjun proposed a QoS implementation similar to Cinder's. After a brief discussion, there was not agreement that Cinder's model was a good one to follow, as QoS was introduced to Cinder before some other later enhancements such as standardized extra specs. We're inclined to use standardized extra specs instead of QoS policies in Manila. We still need to agree on which QoS-related extra specs we should standardize. There are 2 criteria for standardizing on QoS-related extra specs: (1) they should mean the same thing across vendors, e.g. max-bytes-per-second, and (2) they should be something that's widely implemented. I expect we'll see lots of vendor-specific QoS-related extra specs, and we need to make sure it's possible to mix vendors in the same share-type and assign a QoS policy that's equivalent across both.<br>

<br>

Minimum required features: The main open question about minimum features was access control types for each protocol. We agreed that for CIFS, user-based access control is required, and IP-based access control is not. Furthermore, support for read-only access rules is required.<br>

<br>

Improving gate performance/stability: We briefly discussed the gate issues that plagued us during Liberty and our plan to address them, which is to add more first party drivers that don't depend on quickly-evolving projects. The big offender during Liberty was Neutron, although Nova and Cinder have both bitten us in the past. To be clear, the existing Generic driver is not going away, and will still be QA'd, but we would rather not make it the gating driver.<br>

<br>

Manila HA: We briefly discussed tooz and agreed that we will use it in Manila to address race conditions that exist in active-active HA scenarios, as well as race conditions introduced by the replication code.<br>

<br>

Multiple share servers: There was some confusion about what a share server is. Some assumed that it would require multiple Manila "share servers" to implement a highly available shared storage system. In reality, the "share server" in Manila is a collection of resources which can include an arbitrary amount of underlying physical or virtual resources -- enough to implement a highly available solution with a single share server.<br>

<br>

Rolling Upgrades: We punted on this again. There was a discussion about how slow the progress in Cinder appears to be and we don't want to get trapped in limbo for multiple cycles. However if the work will unavoidably take multiple cycles then we need to know that so we can plan accordingly. Support for rolling upgrades is still viewed as desirable, but the team is worried about the apparent implementation cost.<br>

<br>

More mount automation: We covered what was done during Liberty (2 approaches) and discussed briefly the new approach Sage suggested. The nova-metadata approach that Clinton proposed in Vancouver, which was not accepted by the Nova team, will probably get a warmer reception based on the additional integration Sage is proposing. We should re-propose it and continue with our existing plans. There was also broad support for Sage's NFS-over-vsock proposal and Nova share-attach semantics.<br>

<br>

Key-based authentication: We went into more detail on John's Ceph-driver requirements, and why he thinks it makes sense to communicate secrets from the Ceph backend to the end user through Manila. We didn't really reach a decision, but nobody was strongly against the change John proposed. I think we're still interested in alternative proposals if anyone has one.<br>

<br>

Make all deleted columns booleans: We discussed soft deletes and the fact that they're not implemented consistently in Manila today. We also questioned the value of soft deletes and reasoning for why we use them. Some believed there was a performance benefit, and others suggested that it had more to do with preservation of history for auditing and troubleshooting. Depending on the real motivation, this proposal may need to be scrapped or modified.<br>

<br>

Replication 2.0: We discussed the remaining work for the share replication feature before it could be accepted. There were 2 main issues: support for share servers, and a first party driver implementation of the feature. There was some dispute about the value of a first party implementation but the strongest argument in favor was the need for community members who didn't have access to proprietary hardware to be able to maintain the replication code.<br>

<br>

Functional tests for network plugins: We discussed the fact that the existing network plugins aren't covered by tests in the gate. For nova-network, we decided that was acceptable since that functionality is deprecated. For the neutron network plugin and the standalone network plugin, we need additional test jobs that exercise them.<br>

<br>

Capability lists: The addition of standardized extra specs pointed out a gap, which is that with some extra specs a backend should be able to advertise both the positive and negative capability. In the past vendors have achieved that with gross pairs of vendor-specific extra specs (e.g. netapp_dedupe and netapp_no_dedupe). We agreed it would make more sense for the backend to simply advertise dedupe=[False,True]. Changes to the filter scheduler are needed to allow capability lists, so Clinton volunteered to implement those changes.<br>
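
<br>

To make the dedupe=[False,True] idea concrete, here is a minimal sketch of how a scheduler filter could match a requested extra spec against a capability that is either a single value or a list. The function names are hypothetical, and this is not the existing filter scheduler code.<br>

```python
def capability_matches(required, advertised):
    """Check one requested extra-spec value against a backend capability.

    The backend may advertise a single value or a list of supported
    values.
    """
    if isinstance(advertised, (list, tuple, set)):
        return required in advertised
    return required == advertised


def backend_passes(extra_specs, capabilities):
    # A backend passes only if every requested spec is satisfied.
    return all(
        key in capabilities and capability_matches(value, capabilities[key])
        for key, value in extra_specs.items()
    )
```

With this matching rule a backend advertising dedupe=[False,True] satisfies share types that ask for either value, so the paired vendor-specific specs are no longer needed.<br>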

<br>

Client microversions: There wasn't much to discuss on the topic of microversions. The client patches are in progress. Cinder seems to be headed down a similar path. We are happy with the feature so far.<br>

<br>

Fix delete operations: We discussed what the force-delete APIs should do and what they shouldn't do. It was agreed that force-delete means "remove it from the manila DB no matter what, and make a best effort attempt to delete it off the backend". Some changes are needed to support that semantic. We also discussed the common problem of "stuck" shares and how to clean them up. We agreed that admins should typically use the reset-state API and retry an ordinary delete. The force-delete approach can leave garbage behind on the backend. The reset-state and retry-delete approach should never leave garbage behind, so it's safer to use.<br>
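
<br>

The agreed force-delete semantic can be sketched as follows. Here 'db' is a plain dict and 'driver' is any object with a delete_share() method; both are stand-ins chosen for this example, not Manila's real DB API or driver interface.<br>

```python
import logging

LOG = logging.getLogger(__name__)


def force_delete(share_id, db, driver):
    """Best-effort backend delete, then unconditional DB removal.

    'db' is a plain dict and 'driver' any object with a delete_share()
    method; both are stand-ins, not Manila's real interfaces.
    """
    backend_clean = True
    try:
        driver.delete_share(share_id)
    except Exception:
        # Best effort only: log and continue, garbage may remain.
        LOG.exception("backend delete failed for share %s", share_id)
        backend_clean = False
    db.pop(share_id, None)  # remove the record no matter what
    return backend_clean
```

The return value makes the trade-off visible: the DB record is always gone afterwards, but a False result means something may have been left behind on the backend.<br>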

<br>

Remove all extensions: We discussed the current effort to move extensions into core. This work is mostly done and not much discussion was needed.<br>

<br>

Removing task state: We agreed to remove the task-state column introduced by migration and use ordinary states for migration.<br>

<br>

Interaction with Nova attach file system API: We went into more detail on Sage's Nova file-system-attach proposal and concluded that it should "just work" without changes from Manila.<br>

<br>

<br>

-Ben Swartzlander<br>

<br>

<br>


</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div><span style="color:rgb(0,0,0);font-size:small">Dr. Silvan Kaiser</span></div><div dir="ltr"><span style="color:rgb(0,0,0);font-size:small">Quobyte GmbH</span><br style="color:rgb(0,0,0);font-size:small"><span style="color:rgb(0,0,0);font-size:small">Hardenbergplatz 2, 10623 Berlin - Germany</span><br style="color:rgb(0,0,0);font-size:small"><span style="color:rgb(0,0,0);font-size:small">+49-30-814 591 800 - </span><a href="http://www.quobyte.com/" style="color:rgb(17,85,204);text-decoration:none;font-size:small" target="_blank">www.quobyte.com</a><span style="color:rgb(0,0,0);font-size:small"><</span><a href="http://www.quobyte.com/" style="color:rgb(17,85,204);text-decoration:none;font-size:small" target="_blank">http://www.quobyte.com/</a><span style="color:rgb(0,0,0);font-size:small">></span><br style="color:rgb(0,0,0);font-size:small"><span style="color:rgb(0,0,0);font-size:small">Amtsgericht Berlin-Charlottenburg, HRB 149012B</span><br style="color:rgb(0,0,0);font-size:small"><span style="color:rgb(0,0,0);font-size:small">Management board: Dr. Felix Hupfeld, Dr. Björn Kolbeck, Dr. Jan Stender</span><br></div></div></div></div></div></div></div></div>

</div>


<br>
