[openstack-dev] [Manila] CephFS native driver

Deepak Shetty dpkshetty at gmail.com
Thu Oct 1 07:26:45 UTC 2015


On Sat, Sep 26, 2015 at 4:32 PM, John Spray <jspray at redhat.com> wrote:

> On Sat, Sep 26, 2015 at 1:27 AM, Ben Swartzlander <ben at swartzlander.org>
> wrote:
> > On 09/24/2015 09:49 AM, John Spray wrote:
> >>
> >> Hi all,
> >>
> >> I've recently started work on a CephFS driver for Manila.  The (early)
> >> code is here:
> >> https://github.com/openstack/manila/compare/master...jcsp:ceph
> >
> >
> > Awesome! This is something that's been talked about for quite some time
> > and I'm pleased to see progress on making it a reality.
> >
> >> It requires a special branch of ceph which is here:
> >> https://github.com/ceph/ceph/compare/master...jcsp:wip-manila
> >>
> >> This isn't done yet (hence this email rather than a gerrit review),
> >> but I wanted to give everyone a heads up that this work is going on,
> >> and a brief status update.
> >>
> >> This is the 'native' driver in the sense that clients use the CephFS
> >> client to access the share, rather than re-exporting it over NFS.  The
> >> idea is that this driver will be useful for anyone who has such
> >> clients, as well as acting as the basis for a later NFS-enabled
> >> driver.
> >
> >
> > This makes sense, but have you given thought to the optimal way to
> > provide NFS semantics for those who prefer that? Obviously you can pair
> > the existing Manila Generic driver with Cinder running on ceph, but I
> > wonder how that would compare to some kind of ganesha bridge that
> > translates between NFS and cephfs. Is that something you've looked into?
>
> The Ceph FSAL in ganesha already exists, some work is going on at the
> moment to get it more regularly built and tested.  There's some
> separate design work to be done to decide exactly how that part of
> things is going to work, including discussing with all the right
> people, but I didn't want to let that hold up getting the initial
> native driver out there.
>
> >> The export location returned by the driver gives the client the Ceph
> >> mon IP addresses, the share path, and an authentication token.  This
> >> authentication token is what permits the clients access (Ceph does not
> >> do access control based on IP addresses).
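
To make that concrete, here is a rough sketch of the sort of export location
this implies and how a ceph-fuse client might consume it. All addresses, names
and keys below are made up, and the exact format is up to the driver:

    # Illustrative only: the export format and values here are invented.
    import subprocess

    export_location = {
        "mon_addrs": ["192.168.0.10:6789", "192.168.0.11:6789"],  # Ceph monitors
        "path": "/volumes/share-0001",                            # share directory
        "auth_id": "share-0001",                                  # cephx identity
        "auth_key": "AQD...base64-secret...==",                   # authentication token
    }

    # A client host could then mount just that subtree with ceph-fuse,
    # e.g. (hypothetical invocation; the key would normally sit in a
    # keyring file rather than being passed around like this):
    subprocess.check_call([
        "ceph-fuse",
        "-m", ",".join(export_location["mon_addrs"]),
        "--id", export_location["auth_id"],
        "-r", export_location["path"],
        "/mnt/share-0001",
    ])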
> >>
> >> It's just capable of the minimal functionality of creating and
> >> deleting shares so far, but I will shortly be looking into hooking up
> >> snapshots/consistency groups, albeit for read-only snapshots only
> >> (cephfs does not have writeable snapshots).  Currently deletion is
> >> just a move into a 'trash' directory, the idea is to add something
> >> later that cleans this up in the background: the downside to the
> >> "shares are just directories" approach is that clearing them up has a
> >> "rm -rf" cost!
> >
> >
> > All snapshots are read-only... The question is whether you can take a
> > snapshot and clone it into something that's writable. We're looking at
> > allowing for different kinds of snapshot semantics in Manila for Mitaka.
> > Even if there's no create-share-from-snapshot functionality a readable
> > snapshot is still useful and something we'd like to enable.
>
> Enabling creation of snapshots is pretty trivial; the slightly more
> interesting part will be accessing them.  CephFS doesn't provide a
> rollback mechanism, so
>
> > The deletion issue sounds like a common one, although if you don't have
> > the thing that cleans them up in the background yet, I hope someone is
> > working on that.
>
> Yeah, that would be me -- the most important sentence in my original
> email was probably "this isn't done yet" :-)
>
> >> A note on the implementation: cephfs recently got the ability (not yet
> >> in master) to restrict client metadata access based on path, so this
> >> driver is simply creating shares by creating directories within a
> >> cluster-wide filesystem, and issuing credentials to clients that
> >> restrict them to their own directory.  They then mount that subpath,
> >> so that from the client's point of view it's like having their own
> >> filesystem.  We also have a quota mechanism that I'll hook in later to
> >> enforce the share size.
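
As a concrete (hypothetical) example of minting such a credential, assuming
the path-restricted MDS cap syntax from the branch mentioned above, and with
made-up identity, path and pool names:

    # Illustrative only: issue a cephx identity confined to one share's directory.
    import subprocess

    def authorize_share(auth_id, share_path, data_pool="cephfs_data"):
        out = subprocess.check_output([
            "ceph", "auth", "get-or-create", "client.%s" % auth_id,
            "mds", "allow rw path=%s" % share_path,   # metadata access only under the share
            "osd", "allow rw pool=%s" % data_pool,    # data access (per-pool for now)
            "mon", "allow r",
        ])
        return out  # keyring text containing the client's secret key

    # e.g. authorize_share("share-0001", "/volumes/share-0001")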
> >
> >
> > So quotas aren't enforced yet? That seems like a serious issue for any
> > operator except those that want to support "infinite" size shares. I hope
> > that gets fixed soon as well.
>
> Same again, just not done yet.  Well, actually since I wrote the
> original email I added quota support to my branch, so never mind!
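
For reference, CephFS exposes directory quotas as virtual xattrs on the
share's directory, so the hook is presumably something along these lines
(a sketch only, with an invented path; note the client-side enforcement
caveat in the next paragraph):

    # Sketch: cap a share at its requested size via the quota xattr on its directory.
    import os

    def set_share_quota(share_dir, size_gb):
        os.setxattr(share_dir, "ceph.quota.max_bytes",
                    str(size_gb * 1024 ** 3).encode())

    # e.g. set_share_quota("/mnt/cephfs/volumes/share-0001", 10)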
>
> >> Currently the security here requires clients (i.e. the ceph-fuse code
> >> on client hosts, not the userspace applications) to be trusted, as
> >> quotas are enforced on the client side.  The OSD access control
> >> operates on a per-pool basis, and creating a separate pool for each
> >> share is inefficient.  In the future it is expected that CephFS will
> >> be extended to support file layouts that use RADOS namespaces, which
> >> are cheap, such that we can issue a new namespace to each share and
> >> enforce the separation between shares on the OSD side.
> >
> >
> > I think it will be important to document all of these limitations. I
> > wouldn't let them stop you from getting the driver done, but if I was a
> > deployer I'd want to know about these details.
>
> Yes, definitely.  I'm also adding an optional flag when creating
> volumes to give them their own RADOS pool for data, which would make
> the level of isolation much stronger, at the cost of using more
> resources per volume.  Creating separate pools has a substantial
> overhead, but in sites with a relatively small number of shared
> filesystems it could be desirable.  We may also want to look into
> making this a layered thing with a pool per tenant, and then
> less-isolated shares within that pool.  (pool in this paragraph means
> the ceph concept, not the manila concept).
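
One plausible shape for that optional per-share pool flag, sketched with CLI
calls and made-up names; the real driver may well go through librados or the
Ceph admin APIs instead:

    # Illustrative only: give one share its own data pool via a directory layout.
    import os
    import subprocess

    def attach_dedicated_pool(share_dir, pool_name, fs_name="cephfs", pg_num=64):
        # Create the pool and register it as a CephFS data pool...
        subprocess.check_call(["ceph", "osd", "pool", "create", pool_name, str(pg_num)])
        subprocess.check_call(["ceph", "fs", "add_data_pool", fs_name, pool_name])
        # ...then point the share directory's file layout at it, so new files
        # written into the share land in the dedicated pool.
        os.setxattr(share_dir, "ceph.dir.layout.pool", pool_name.encode())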
>
> At some stage I would like to add the ability to have physically
> separate filesystems within ceph (i.e. filesystems don't share the
> same MDSs), which would add a second optional level of isolation for
> metadata as well as data.
>
> Overall though, there's going to be sort of a race here between the
> native ceph multitenancy capability, and the use of NFS to provide
> similar levels of isolation.
>

Thanks for the explanation, this helps in understanding things nicely, though
I have a small question. When you say separate filesystems within the Ceph
cluster, do you mean the same as mapping them to different RADOS namespaces,
with each namespace having its own MDS, thus providing additional isolation
on top of having one pool per tenant?


>
> >> However, for many people the ultimate access control solution will be
> >> to use a NFS gateway in front of their CephFS filesystem: it is
> >> expected that an NFS-enabled cephfs driver will follow this native
> >> driver in the not-too-distant future.
> >
> >
> > Okay this answers part of my above question, but how do you expect the
> > NFS gateway to work? Ganesha has been used successfully in the past.
>
> Ganesha is the preferred server right now.  There is probably going to
> need to be some level of experimentation to confirm that it's
> working and performing sufficiently well compared with knfs on top of
> the cephfs kernel client.  Personally though, I have a strong
> preference for userspace solutions where they work well enough.
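
To make the Ganesha option concrete: an NFS-enabled driver would presumably
end up generating export blocks along these lines, using Ganesha's Ceph FSAL.
This is a sketch only, with invented IDs, paths and option choices:

    # Illustrative only: render an nfs-ganesha export block for a CephFS share.
    GANESHA_EXPORT_TEMPLATE = """
    EXPORT {{
        Export_ID = {export_id};
        Path = "{share_path}";
        Pseudo = "{share_path}";
        Access_Type = RW;
        Squash = No_Root_Squash;
        FSAL {{
            Name = CEPH;
        }}
    }}
    """

    def render_ganesha_export(export_id, share_path):
        return GANESHA_EXPORT_TEMPLATE.format(export_id=export_id,
                                              share_path=share_path)

    # e.g. render_ganesha_export(101, "/volumes/share-0001")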
>
> The broader question is exactly where in the system the NFS gateways
> run, and how they get configured -- that's the very next conversation
> to have after the guts of this driver are done.  We are interested in
> approaches that bring the CephFS protocol as close to the guests as
> possible before bridging it to NFS, possibly even running ganesha
> instances locally on the hypervisors, but I don't think we're ready to
> draw a clear picture of that just yet, and I suspect we will end up
> wanting to enable multiple methods, including the lowest common
> denominator "run a VM with a ceph client and ganesha" case.
>

By the lowest common denominator case, do you mean the Manila concept of
running the share server inside a service VM, or something else?

thanx,
deepak