<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 26, 2015 at 4:32 PM, John Spray <span dir="ltr"><<a href="mailto:jspray@redhat.com" target="_blank">jspray@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Sep 26, 2015 at 1:27 AM, Ben Swartzlander <<a href="mailto:ben@swartzlander.org">ben@swartzlander.org</a>> wrote:<br>

> On 09/24/2015 09:49 AM, John Spray wrote:<br>

>><br>

>> Hi all,<br>

>><br>

>> I've recently started work on a CephFS driver for Manila.  The (early)<br>

>> code is here:<br>

>> <a href="https://github.com/openstack/manila/compare/master...jcsp:ceph" rel="noreferrer" target="_blank">https://github.com/openstack/manila/compare/master...jcsp:ceph</a><br>

><br>

><br>

> Awesome! This is something that's been talked about for quite some time and<br>

> I'm pleased to see progress on making it a reality.<br>

><br>

>> It requires a special branch of ceph which is here:<br>

>> <a href="https://github.com/ceph/ceph/compare/master...jcsp:wip-manila" rel="noreferrer" target="_blank">https://github.com/ceph/ceph/compare/master...jcsp:wip-manila</a><br>

>><br>

>> This isn't done yet (hence this email rather than a gerrit review),<br>

>> but I wanted to give everyone a heads up that this work is going on,<br>

>> and a brief status update.<br>

>><br>

>> This is the 'native' driver in the sense that clients use the CephFS<br>

>> client to access the share, rather than re-exporting it over NFS.  The<br>

</span>>> idea is that this driver will be useful for anyone who has such<br>

<span class="">>> clients, as well as acting as the basis for a later NFS-enabled<br>

>> driver.<br>

><br>

><br>

> This makes sense, but have you given thought to the optimal way to provide<br>

> NFS semantics for those who prefer that? Obviously you can pair the existing<br>

> Manila Generic driver with Cinder running on ceph, but I wonder how that<br>

> would compare to some kind of ganesha bridge that translates between NFS and<br>

> cephfs. Is that something you've looked into?<br>

<br>

</span>The Ceph FSAL in ganesha already exists; some work is going on at the<br>

moment to get it more regularly built and tested.  There's some<br>

separate design work to be done to decide exactly how that part of<br>

things is going to work, including discussing with all the right<br>

people, but I didn't want to let that hold up getting the initial<br>

native driver out there.<br>

<span class=""><br>

>> The export location returned by the driver gives the client the Ceph<br>

>> mon IP addresses, the share path, and an authentication token.  This<br>

>> authentication token is what permits the clients access (Ceph does not<br>

>> do access control based on IP addresses).<br>

>><br>

>> It's just capable of the minimal functionality of creating and<br>

>> deleting shares so far, but I will shortly be looking into hooking up<br>

>> snapshots/consistency groups, albeit for read-only snapshots only<br>

>> (cephfs does not have writeable snapshots).  Currently deletion is<br>

>> just a move into a 'trash' directory; the idea is to add something<br>

>> later that cleans this up in the background: the downside to the<br>

>> "shares are just directories" approach is that clearing them up has a<br>

>> "rm -rf" cost!<br>

><br>

><br>

> All snapshots are read-only... The question is whether you can take a<br>

> snapshot and clone it into something that's writable. We're looking at<br>

> allowing for different kinds of snapshot semantics in Manila for Mitaka.<br>

> Even if there's no create-share-from-snapshot functionality a readable<br>

> snapshot is still useful and something we'd like to enable.<br>

<br>

</span>Enabling creation of snapshots is pretty trivial, the slightly more<br>

interesting part will be accessing them.  CephFS doesn't provide a<br>

rollback mechanism, so<br>

<span class=""><br>

> The deletion issue sounds like a common one, although if you don't have the<br>

> thing that cleans them up in the background yet I hope someone is working on<br>

> that.<br>

<br>

</span>Yeah, that would be me -- the most important sentence in my original<br>

email was probably "this isn't done yet" :-)<br>
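<br>
As a rough illustration of the "move into a trash directory now, clean up in the background later" scheme discussed above, here is a minimal Python sketch. It is not the driver's code: the `.trash` directory name and the functions `delete_share` and `purge_trash` are hypothetical, and it assumes the filesystem is mounted at a local path.<br>

```python
import os
import shutil
import uuid

TRASH_DIR = ".trash"  # hypothetical name, not taken from the driver


def delete_share(fs_root, share_name):
    """'Delete' a share cheaply by renaming its directory into the trash."""
    trash = os.path.join(fs_root, TRASH_DIR)
    os.makedirs(trash, exist_ok=True)
    # A unique suffix lets a share name be reused and deleted again
    # before the background purge has run.
    dest = os.path.join(trash, "{}.{}".format(share_name, uuid.uuid4().hex))
    os.rename(os.path.join(fs_root, share_name), dest)
    return dest


def purge_trash(fs_root):
    """Pay the deferred 'rm -rf' cost; meant to run in the background."""
    trash = os.path.join(fs_root, TRASH_DIR)
    if not os.path.isdir(trash):
        return 0
    purged = 0
    for entry in os.listdir(trash):
        shutil.rmtree(os.path.join(trash, entry))
        purged += 1
    return purged
```

The rename is a single metadata operation, so the delete call returns quickly; the recursive removal is the expensive part and is what a periodic task would call `purge_trash` for.<br>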

<span class=""><br>

>> A note on the implementation: cephfs recently got the ability (not yet<br>

>> in master) to restrict client metadata access based on path, so this<br>

>> driver is simply creating shares by creating directories within a<br>

>> cluster-wide filesystem, and issuing credentials to clients that<br>

>> restrict them to their own directory.  They then mount that subpath,<br>

>> so that from the client's point of view it's like having their own<br>

>> filesystem.  We also have a quota mechanism that I'll hook in later to<br>

>> enforce the share size.<br>

><br>

><br>

> So quotas aren't enforced yet? That seems like a serious issue for any<br>

> operator except those that want to support "infinite" size shares. I hope<br>

> that gets fixed soon as well.<br>

<br>

</span>Same again, just not done yet.  Well, actually since I wrote the<br>

original email I added quota support to my branch, so never mind!<br>
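<br>
To make the "shares are just directories" idea above concrete, here is a hedged Python sketch of what creating a share might compute: a directory path, a set of capabilities that restrict a client to that path, and a quota attribute. The helper names, the `/volumes` root and the pool name are assumptions for illustration; the capability strings follow the path-restricted MDS cap style and the `ceph.quota.max_bytes` attribute as I understand them, and may differ from what the branch actually uses.<br>

```python
import posixpath
from typing import Dict, Tuple


def share_path(volumes_root: str, share_id: str) -> str:
    """Each share is a directory under one cluster-wide filesystem."""
    # Reject ids that could escape the volumes root.
    if not share_id or "/" in share_id or share_id in (".", ".."):
        raise ValueError("invalid share id: %r" % share_id)
    return posixpath.join(volumes_root, share_id)


def client_caps(path: str, data_pool: str) -> Dict[str, str]:
    """Capabilities restricting a client's metadata access to its own path."""
    return {
        "mon": "allow r",
        "mds": "allow rw path={}".format(path),
        # OSD access is per pool, so data isolation between shares in the
        # same pool still relies on trusted clients.
        "osd": "allow rw pool={}".format(data_pool),
    }


def quota_xattr(size_gb: int) -> Tuple[str, str]:
    """Extended attribute name and value expressing the share size limit."""
    return ("ceph.quota.max_bytes", str(size_gb * 1024 ** 3))
```

For example, `client_caps(share_path("/volumes", "share-1"), "data")` yields an MDS cap of `allow rw path=/volumes/share-1`, which is what lets the client mount that subpath as if it were its own filesystem.<br>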

<span class=""><br>

>> Currently the security here requires clients (i.e. the ceph-fuse code<br>

>> on client hosts, not the userspace applications) to be trusted, as<br>

>> quotas are enforced on the client side.  The OSD access control<br>

>> operates on a per-pool basis, and creating a separate pool for each<br>

>> share is inefficient.  In the future it is expected that CephFS will<br>

>> be extended to support file layouts that use RADOS namespaces, which<br>

>> are cheap, such that we can issue a new namespace to each share and<br>

>> enforce the separation between shares on the OSD side.<br>

><br>

><br>

> I think it will be important to document all of these limitations. I<br>

> wouldn't let them stop you from getting the driver done, but if I was a<br>

> deployer I'd want to know about these details.<br>

<br>

</span>Yes, definitely.  I'm also adding an optional flag when creating<br>

volumes to give them their own RADOS pool for data, which would make<br>

the level of isolation much stronger, at the cost of using more<br>

resources per volume.  Creating separate pools has a substantial<br>

overhead, but in sites with a relatively small number of shared<br>

filesystems it could be desirable.  We may also want to look into<br>

making this a layered thing with a pool per tenant, and then<br>

less-isolated shares within that pool.  (pool in this paragraph means<br>

the ceph concept, not the manila concept).<br>

<br>

At some stage I would like to add the ability to have physically<br>

separate filesystems within ceph (i.e. filesystems don't share the<br>

same MDSs), which would add a second optional level of isolation for<br>

metadata as well as data.<br>

<br>

Overall though, there's going to be sort of a race here between the<br>

native ceph multitenancy capability, and the use of NFS to provide<br>

similar levels of isolation.<br></blockquote><div><br></div><div>Thanks for the explanation, this helps me understand things nicely, though I have<br></div><div>a small question. When you say separate filesystems within the ceph cluster, do you<br></div><div>mean the same as mapping them to different RADOS namespaces, with each<br></div><div>namespace having its own MDS, thus providing additional isolation on top of<br></div><div>having one pool per tenant?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

>> However, for many people the ultimate access control solution will be<br>

>> to use a NFS gateway in front of their CephFS filesystem: it is<br>

>> expected that an NFS-enabled cephfs driver will follow this native<br>

>> driver in the not-too-distant future.<br>

><br>

><br>

> Okay this answers part of my above question, but how do you expect the NFS<br>

> gateway to work? Ganesha has been used successfully in the past.<br>

<br>

</span>Ganesha is the preferred server right now.  There is probably going to<br>

need to be some level of experimentation to confirm that it's<br>

working and performing sufficiently well compared with knfs on top of<br>

the cephfs kernel client.  Personally though, I have a strong<br>

preference for userspace solutions where they work well enough.<br>

<br>

The broader question is exactly where in the system the NFS gateways<br>

run, and how they get configured -- that's the very next conversation<br>

to have after the guts of this driver are done.  We are interested in<br>

approaches that bring the CephFS protocol as close to the guests as<br>

possible before bridging it to NFS, possibly even running ganesha<br>

instances locally on the hypervisors, but I don't think we're ready to<br>

draw a clear picture of that just yet, and I suspect we will end up<br>

wanting to enable multiple methods, including the lowest common<br>

denominator "run a VM with a ceph client and ganesha" case.<br></blockquote><div><br></div><div>By the lowest common denominator case, do you mean the manila concept<br></div><div>of running the share server inside a service VM, or something else?<br><br></div><div>thanx,<br></div><div>deepak<br></div></div><br></div></div>