[openstack-dev] [Manila] CephFS native driver
jspray at redhat.com
Thu Oct 1 10:10:39 UTC 2015
On Thu, Oct 1, 2015 at 8:26 AM, Deepak Shetty <dpkshetty at gmail.com> wrote:
>> > I think it will be important to document all of these limitations. I
>> > wouldn't let them stop you from getting the driver done, but if I was a
>> > deployer I'd want to know about these details.
>> Yes, definitely. I'm also adding an optional flag when creating
>> volumes to give them their own RADOS pool for data, which would make
>> the level of isolation much stronger, at the cost of using more
>> resources per volume. Creating separate pools has a substantial
>> overhead, but in sites with a relatively small number of shared
>> filesystems it could be desirable. We may also want to look into
>> making this a layered thing with a pool per tenant, and then
>> less-isolated shares within that pool. (pool in this paragraph means
>> the ceph concept, not the manila concept).
>> At some stage I would like to add the ability to have physically
>> separate filesystems within ceph (i.e. filesystems don't share the
>> same MDSs), which would add a second optional level of isolation for
>> metadata as well as data.
>> Overall though, there's going to be sort of a race here between the
>> native ceph multitenancy capability, and the use of NFS to provide
>> similar levels of isolation.
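To make the per-volume pool idea above concrete, here is a rough sketch of the extra provisioning steps such a flag might imply. The function, the flag, the pool naming, and the /volumes/<share> path are all hypothetical, not the actual driver interface; the CLI calls are the standard way on recent Ceph releases to create a pool, register it as a CephFS data pool, and point a directory tree at it via a file layout.

```python
# Illustrative sketch only (not the real driver code): what a hypothetical
# "dedicated data pool" flag might expand to, as a list of Ceph CLI calls.
# Pool naming, pg_num, and the /volumes/<share> path are assumptions.

def share_setup_commands(share_name, dedicated_pool=False, pg_num=64):
    """Return the extra commands needed to give a share its own data pool."""
    if not dedicated_pool:
        return []  # share data lands in the default CephFS data pool
    pool = "manila_%s_data" % share_name
    return [
        # Creating the pool is the substantial per-share overhead.
        "ceph osd pool create %s %d" % (pool, pg_num),
        # Register it as an additional CephFS data pool.
        "ceph fs add_data_pool cephfs %s" % pool,
        # Point the share's directory tree at the new pool via a layout.
        "setfattr -n ceph.dir.layout.pool -v %s /volumes/%s"
        % (pool, share_name),
    ]

print(share_setup_commands("share01", dedicated_pool=True))
```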
> Thanks for the explanation, this helps me understand things nicely, though I have
> a small doubt. When you say separate filesystems within ceph cluster, you
> meant the same as mapping them to different RADOS namespaces, and each
> namespace will have its own MDS, thus providing additional isolation on top of
> having 1 pool per tenant ?
Physically separate filesystems would be using separate MDSs, and
separate RADOS pools. For ultra isolation, the RADOS pools would also
be configured to map to different OSDs.
Separate RADOS namespaces do not provide physical separation (multiple
namespaces exist within one pool, hence on the same OSDs), but they
would provide server-side security, preventing clients from seeing into
one another's data. The terminology is confusing because a RADOS
namespace is a Ceph-specific concept, distinct from the filesystem
notion of a namespace.
CephFS doesn't currently have either the "separate MDSs" isolation, or
the support for using RADOS namespaces in layouts. They're both
pretty well understood and not massively complex to implement though,
so it's pretty much just a matter of time.
This is all very ceph-implementation-specific stuff, so apologies if
it's not crystal clear at this stage.
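As a concrete illustration of the pool-per-tenant layering: assuming each tenant gets its own data pool, that tenant's cephx key can be restricted to the pool, so the OSDs enforce the isolation server-side. The client and pool names below are invented for the example:

```
# hypothetical cephx capabilities for a tenant confined to its own pool
mon 'allow r'
mds 'allow rw'
osd 'allow rw pool=manila_tenant1_data'
```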
>> >> However, for many people the ultimate access control solution will be
>> >> to use a NFS gateway in front of their CephFS filesystem: it is
>> >> expected that an NFS-enabled cephfs driver will follow this native
>> >> driver in the not-too-distant future.
>> > Okay this answers part of my above question, but how do you expect the
>> > NFS gateway to work? Ganesha has been used successfully in the past.
>> Ganesha is the preferred server right now. There is probably going to
>> need to be some level of experimentation needed to confirm that it's
>> working and performing sufficiently well compared with knfs on top of
>> the cephfs kernel client. Personally though, I have a strong
>> preference for userspace solutions where they work well enough.
>> The broader question is exactly where in the system the NFS gateways
>> run, and how they get configured -- that's the very next conversation
>> to have after the guts of this driver are done. We are interested in
>> approaches that bring the CephFS protocol as close to the guests as
>> possible before bridging it to NFS, possibly even running ganesha
>> instances locally on the hypervisors, but I don't think we're ready to
>> draw a clear picture of that just yet, and I suspect we will end up
>> wanting to enable multiple methods, including the lowest common
>> denominator "run a VM with a ceph client and ganesha" case.
> By the lowest denominator case, you mean the manila concept
> of running the share server inside a service VM or something else ?
Yes, that's exactly what I mean. To be clear, by "lowest common
denominator" I don't mean least good, I mean most generic.
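For what it's worth, the generic "VM with a ceph client and ganesha" case mostly comes down to a per-share EXPORT block in ganesha.conf using the CEPH FSAL. Something along these lines, with all paths and IDs invented for illustration:

```
EXPORT {
    Export_Id = 100;            # arbitrary unique id per share
    Path = "/volumes/share01";  # CephFS directory backing the share
    Pseudo = "/share01";        # path clients see in the NFS namespace
    Access_Type = RW;
    FSAL {
        Name = CEPH;            # Ganesha's userspace CephFS client
    }
}
```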