[openstack-dev] [Manila] CephFS native driver

John Spray jspray at redhat.com
Sat Sep 26 11:02:34 UTC 2015


On Sat, Sep 26, 2015 at 1:27 AM, Ben Swartzlander <ben at swartzlander.org> wrote:
> On 09/24/2015 09:49 AM, John Spray wrote:
>>
>> Hi all,
>>
>> I've recently started work on a CephFS driver for Manila.  The (early)
>> code is here:
>> https://github.com/openstack/manila/compare/master...jcsp:ceph
>
>
> Awesome! This is something that's been talked about for quite some time, and
> I'm pleased to see progress on making it a reality.
>
>> It requires a special branch of ceph which is here:
>> https://github.com/ceph/ceph/compare/master...jcsp:wip-manila
>>
>> This isn't done yet (hence this email rather than a gerrit review),
>> but I wanted to give everyone a heads up that this work is going on,
>> and a brief status update.
>>
>> This is the 'native' driver in the sense that clients use the CephFS
>> client to access the share, rather than re-exporting it over NFS.  The
>> idea is that this driver will be useful for anyone who has such
>> clients, as well as acting as the basis for a later NFS-enabled
>> driver.
>
>
> This makes sense, but have you given thought to the optimal way to provide
> NFS semantics for those who prefer that? Obviously you can pair the existing
> Manila Generic driver with Cinder running on ceph, but I wonder how that
> would compare to some kind of ganesha bridge that translates between NFS and
> cephfs. Is that something you've looked into?

The Ceph FSAL in ganesha already exists; some work is going on at the
moment to get it more regularly built and tested.  There's some
separate design work to be done to decide exactly how that part of
things is going to work, including discussing with all the right
people, but I didn't want to let that hold up getting the initial
native driver out there.

>> The export location returned by the driver gives the client the Ceph
>> mon IP addresses, the share path, and an authentication token.  This
>> authentication token is what permits the clients access (Ceph does not
>> do access control based on IP addresses).
>>
>> It's just capable of the minimal functionality of creating and
>> deleting shares so far, but I will shortly be looking into hooking up
>> snapshots/consistency groups, albeit for read-only snapshots only
>> (cephfs does not have writeable snapshots).  Currently deletion is
>> just a move into a 'trash' directory, the idea is to add something
>> later that cleans this up in the background: the downside to the
>> "shares are just directories" approach is that clearing them up has a
>> "rm -rf" cost!
>
>
> All snapshots are read-only... The question is whether you can take a
> snapshot and clone it into something that's writable. We're looking at
> allowing for different kinds of snapshot semantics in Manila for Mitaka.
> Even if there's no create-share-from-snapshot functionality, a readable
> snapshot is still useful and something we'd like to enable.

Enabling creation of snapshots is pretty trivial; the slightly more
interesting part will be accessing them.  CephFS doesn't provide a
rollback mechanism, so exposing snapshots to users will be about
read-only access rather than restoring a share in place.
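
To make the creation side concrete, here's a rough sketch (not the
actual driver code; paths and names here are invented) of how a
snapshot gets taken: CephFS exposes snapshots through a special
".snap" directory, so from a host with the filesystem mounted it's
just a mkdir:

    import os

    def create_share_snapshot(fs_mount, share_dir, snapshot_name):
        # A CephFS snapshot of a directory is taken by creating a
        # subdirectory under its special ".snap" directory; the result
        # is a read-only view of the directory's contents at that point.
        share_path = os.path.join(fs_mount, share_dir)
        os.mkdir(os.path.join(share_path, ".snap", snapshot_name))

    # e.g. create_share_snapshot("/mnt/cephfs", "volumes/share-0001", "snap-1")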

> The deletion issue sounds like a common one, although if you don't have the
> thing that cleans them up in the background yet I hope someone is working on
> that.

Yeah, that would be me -- the most important sentence in my original
email was probably "this isn't done yet" :-)
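
For anyone wondering what the trash approach amounts to, it's roughly
this (a sketch only, names invented; the real cleanup will need to be
rate-limited and resilient):

    import os
    import shutil
    import uuid

    def delete_share(fs_mount, share_dir):
        # Deletion is cheap: rename the share's directory into a trash
        # area.  The expensive recursive remove happens later.
        trash = os.path.join(fs_mount, "_deleting")
        os.makedirs(trash, exist_ok=True)
        os.rename(os.path.join(fs_mount, share_dir),
                  os.path.join(trash, uuid.uuid4().hex))

    def purge_trash(fs_mount):
        # Background task: this is where the "rm -rf" cost gets paid.
        trash = os.path.join(fs_mount, "_deleting")
        for entry in os.listdir(trash):
            shutil.rmtree(os.path.join(trash, entry))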

>> A note on the implementation: cephfs recently got the ability (not yet
>> in master) to restrict client metadata access based on path, so this
>> driver is simply creating shares by creating directories within a
>> cluster-wide filesystem, and issuing credentials to clients that
>> restrict them to their own directory.  They then mount that subpath,
>> so that from the client's point of view it's like having their own
>> filesystem.  We also have a quota mechanism that I'll hook in later to
>> enforce the share size.
>
>
> So quotas aren't enforced yet? That seems like a serious issue for any
> operator except those that want to support "infinite" size shares. I hope
> that gets fixed soon as well.

Same again, just not done yet.  Well, actually since I wrote the
original email I added quota support to my branch, so never mind!
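
For reference, the quota itself is just an extended attribute on the
share's directory, so the driver side is tiny (sketch only, invented
names; and as noted below, enforcement currently happens in the
client):

    import os

    def set_share_quota(share_path, size_gb):
        # CephFS directory quotas are set via the ceph.quota.max_bytes
        # xattr; the ceph-fuse client enforces the limit when writing.
        os.setxattr(share_path, "ceph.quota.max_bytes",
                    str(size_gb * 1024 ** 3).encode())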

>> Currently the security here requires clients (i.e. the ceph-fuse code
>> on client hosts, not the userspace applications) to be trusted, as
>> quotas are enforced on the client side.  The OSD access control
>> operates on a per-pool basis, and creating a separate pool for each
>> share is inefficient.  In the future it is expected that CephFS will
>> be extended to support file layouts that use RADOS namespaces, which
>> are cheap, such that we can issue a new namespace to each share and
>> enforce the separation between shares on the OSD side.
>
>
> I think it will be important to document all of these limitations. I
> wouldn't let them stop you from getting the driver done, but if I was a
> deployer I'd want to know about these details.

Yes, definitely.  I'm also adding an optional flag when creating
volumes to give them their own RADOS pool for data, which would make
the level of isolation much stronger, at the cost of using more
resources per volume.  Creating separate pools has a substantial
overhead, but in sites with a relatively small number of shared
filesystems it could be desirable.  We may also want to look into
making this a layered thing with a pool per tenant, and then
less-isolated shares within that pool.  (Pool in this paragraph means
the ceph concept, not the manila concept.)
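
To give a feel for the path-restricted credentials: the driver
essentially asks the mons for a cephx key whose MDS cap is limited to
the share's directory.  This is only a sketch (the exact cap syntax is
part of the Ceph work that hasn't merged yet, and the names here are
invented):

    import subprocess

    def authorize_share(client_id, share_path, data_pool):
        # Issue a cephx identity that can only traverse/modify the
        # share's own subtree, plus read/write on the data pool.
        caps = [
            "mon", "allow r",
            "mds", "allow rw path={}".format(share_path),
            "osd", "allow rw pool={}".format(data_pool),
        ]
        return subprocess.check_output(
            ["ceph", "auth", "get-or-create",
             "client.{}".format(client_id)] + caps)

    # e.g. authorize_share("share-0001", "/volumes/share-0001", "cephfs_data")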

At some stage I would like to add the ability to have physically
separate filesystems within ceph (i.e. filesystems that don't share the
same MDSs), which would add a second optional level of isolation for
metadata as well as data.

Overall though, there's going to be sort of a race here between the
native ceph multitenancy capability, and the use of NFS to provide
similar levels of isolation.

>> However, for many people the ultimate access control solution will be
>> to use a NFS gateway in front of their CephFS filesystem: it is
>> expected that an NFS-enabled cephfs driver will follow this native
>> driver in the not-too-distant future.
>
>
> Okay, this answers part of my above question, but how do you expect the NFS
> gateway to work? Ganesha has been used successfully in the past.

Ganesha is the preferred server right now.  There will probably need
to be some experimentation to confirm that it's working and performing
sufficiently well compared with knfs on top of
the cephfs kernel client.  Personally though, I have a strong
preference for userspace solutions where they work well enough.

The broader question is exactly where in the system the NFS gateways
run, and how they get configured -- that's the very next conversation
to have after the guts of this driver are done.  We are interested in
approaches that bring the CephFS protocol as close to the guests as
possible before bridging it to NFS, possibly even running ganesha
instances locally on the hypervisors, but I don't think we're ready to
draw a clear picture of that just yet, and I suspect we will end up
wanting to enable multiple methods, including the lowest common
denominator "run a VM with a ceph client and ganesha" case.
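
As a strawman for the configuration side: per share, the ganesha piece
is essentially just an EXPORT block using the Ceph FSAL.  Something
like the following sketch (field values are invented, and I'm glossing
over how the cephx credential gets to ganesha):

    # Render a per-share export stanza to drop into ganesha's config
    # (e.g. via an included file) before reloading the service.
    EXPORT_TEMPLATE = """
    EXPORT {{
        Export_ID = {export_id};
        Path = "{share_path}";
        Pseudo = "{pseudo_path}";
        Access_Type = RW;
        FSAL {{
            Name = CEPH;
        }}
    }}
    """

    def render_export(export_id, share_path, pseudo_path):
        return EXPORT_TEMPLATE.format(export_id=export_id,
                                      share_path=share_path,
                                      pseudo_path=pseudo_path)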

>> This will be my first openstack contribution, so please bear with me
>> while I come up to speed with the submission process.  I'll also be in
>> Tokyo for the summit next month, so I hope to meet other interested
>> parties there.
>
>
> Welcome, and I look forward to meeting you in Tokyo!

Likewise!

John

>
> -Ben
>
>
>
>> All the best,
>> John


