[openstack-dev] [Manila] CephFS native driver

Shinobu Kinjo skinjo at redhat.com
Fri Sep 25 12:35:13 UTC 2015


Thanks!
Keep me in the loop.

Shinobu

----- Original Message -----
From: "John Spray" <jspray at redhat.com>
To: "Shinobu Kinjo" <skinjo at redhat.com>
Cc: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Sent: Friday, September 25, 2015 6:54:09 PM
Subject: Re: [openstack-dev] [Manila] CephFS native driver

On Fri, Sep 25, 2015 at 10:16 AM, Shinobu Kinjo <skinjo at redhat.com> wrote:
> Thank you for your reply.
>
>> The main distinction here is that for
>> native CephFS clients, they get a shared filesystem where all the
>> clients can talk to all the Ceph OSDs directly, and avoid the
>> potential bottleneck of an NFS->local fs->RBD server.
>
> As you know, each path from clients to RADOS is:
>
>  1) CephFS
>   [Apps] -> [VFS] -> [Kernel Driver] -> [Ceph-Kernel Client]
>    -> [MON], [MDS], [OSD]
>
>  2) RBD
>   [Apps] -> [VFS] -> [librbd] -> [librados] -> [MON], [OSD]
>
> Considering the above, there could be more of a bottleneck in 1) than
> in 2), I think.
>
> What do you think?

The bottleneck I'm talking about is when you share the filesystem
between many guests.  In the RBD image case, you would have a single
NFS server, through which all the data and metadata would have to
flow: that becomes a limiting factor.  In the CephFS case, the clients
can talk to the MDS and OSD daemons individually, without having to
flow through one NFS server.

The preference depends on the use case: the benefits of a shared
filesystem like CephFS don't become apparent until you have lots of
guests using the same shared filesystem.  I'd expect people to keep
using Cinder+RBD for cases where a filesystem is just exposed to one
guest at a time.

>>  3.What are you thinking of regarding integration with OpenStack
>>   using a new implementation?
>>   Since it's going to be a new kind of driver, there should be a
>>   different architecture.
>
> Sorry, that was just too ambiguous. Frankly, how are you going to
> implement such a new feature, was my question.
>
> Make sense?

Right now this is just about building Manila drivers to enable use of
Ceph, rather than re-architecting anything.  A user would create a
conventional Ceph cluster and a conventional OpenStack cluster; this
is just about enabling the use of the two together via Manila (i.e. to
do for CephFS/Manila what is already done for RBD/Cinder).

I expect there will be more discussion later about exactly what the
NFS layer will look like, though we can start with the simple case of
creating a guest VM that acts as a gateway.

>>  4.Is this implementation intended for OneStack integration
>>   mainly?
>
> Yes, that's just my typo -;
>
>  OneStack -> OpenStack

Naturally the Manila part is just for OpenStack.  However, some of the
utility parts (e.g. the "VolumeClient" class) might get re-used in
other systems that require a similar concept (like containers, other
clouds).
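
To give a rough idea of the kind of interface I mean, here is a sketch
(class and method names are illustrative only, not the actual code in
the branch):

    # Hypothetical sketch of a volume-client-style helper: it only
    # knows about Ceph, not about Manila, which is what would make it
    # reusable from containers or other cloud systems.
    class IllustrativeVolumeClient(object):
        def __init__(self, ceph_conf_path):
            # The real thing would open connections to the cluster
            # (librados/libcephfs); elided here.
            self.ceph_conf_path = ceph_conf_path

        def create_volume(self, volume_id, size_bytes):
            # Create a directory to back the share (size would be
            # recorded via the quota mechanism, not shown) and return
            # the path that clients will mount.
            return "/volumes/{0}".format(volume_id)

        def authorize(self, volume_id, client_id):
            # Issue cephx credentials restricted to the share's path.
            return {"auth_id": client_id,
                    "path": "/volumes/{0}".format(volume_id)}

The point is that nothing in such a class needs to know about Manila,
which is why it can live on the Ceph side and be picked up by other
consumers.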

John

>
>
>> This piece of work is specifically about Manila; general improvements
>> in Ceph integration would be a different topic.
>
> That's interesting to me.
>
> Shinobu
>
> ----- Original Message -----
> From: "John Spray" <jspray at redhat.com>
> To: "Shinobu Kinjo" <skinjo at redhat.com>
> Cc: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> Sent: Friday, September 25, 2015 5:51:36 PM
> Subject: Re: [openstack-dev] [Manila] CephFS native driver
>
> On Fri, Sep 25, 2015 at 8:04 AM, Shinobu Kinjo <skinjo at redhat.com> wrote:
>> So here are some questions from my side.
>> Just questions.
>>
>>
>>  1.What is the biggest advantage compared to others such as RBD?
>>   We should be able to implement what you are going to do in an
>>   existing module, shouldn't we?
>
> I guess you mean compared to using a local filesystem on top of RBD,
> and exporting it over NFS?  The main distinction here is that for
> native CephFS clients, they get a shared filesystem where all the
> clients can talk to all the Ceph OSDs directly, and avoid the
> potential bottleneck of an NFS->local fs->RBD server.
>
> Workloads requiring a local filesystem would probably continue to map
> a cinder block device and use that.  The Manila driver is intended for
> use cases that require a shared filesystem.
>
>>  2.What are you going to focus on with a new implementation?
>>   It seems you are going to use NFS in front of that implementation,
>>   with more transparency.
>
> The goal here is to make cephfs accessible to people by making it easy
> to provision it for their applications, just like Manila in general.
> The motivation for putting an NFS layer in front of CephFS is to make
> it easier for people to adopt, because they won't need to install any
> ceph-specific code in their guests.  It will also be easier to
> support, because any ceph client bugfixes would not need to be
> installed within guests (if we assume existing nfs clients are bug
> free :-))
>
>>  3.What are you thinking of regarding integration with OpenStack
>>   using a new implementation?
>>   Since it's going to be a new kind of driver, there should be a
>>   different architecture.
>
> Not sure I understand this question?
>
>>  4.Is this implementation intended for OneStack integration
>>   mainly?
>
> Nope (I had not heard of onestack before).
>
>> Since the velocity of OpenStack feature expansion is much higher than
>> it used to be, it's much more important to think about performance.
>
>> Is a new implementation also going to improve Ceph integration
>> with the OpenStack system?
>
> This piece of work is specifically about Manila; general improvements
> in Ceph integration would be a different topic.
>
> Thanks,
> John
>
>>
>> Thank you so much for your explanation in advance.
>>
>> Shinobu
>>
>> ----- Original Message -----
>> From: "John Spray" <jspray at redhat.com>
>> To: openstack-dev at lists.openstack.org, "Ceph Development" <ceph-devel at vger.kernel.org>
>> Sent: Thursday, September 24, 2015 10:49:17 PM
>> Subject: [openstack-dev] [Manila] CephFS native driver
>>
>> Hi all,
>>
>> I've recently started work on a CephFS driver for Manila.  The (early)
>> code is here:
>> https://github.com/openstack/manila/compare/master...jcsp:ceph
>>
>> It requires a special branch of ceph which is here:
>> https://github.com/ceph/ceph/compare/master...jcsp:wip-manila
>>
>> This isn't done yet (hence this email rather than a gerrit review),
>> but I wanted to give everyone a heads up that this work is going on,
>> and a brief status update.
>>
>> This is the 'native' driver in the sense that clients use the CephFS
>> client to access the share, rather than re-exporting it over NFS.  The
>> idea is that this driver will be useful for anyone who has such
>> clients, as well as acting as the basis for a later NFS-enabled
>> driver.
>>
>> The export location returned by the driver gives the client the Ceph
>> mon IP addresses, the share path, and an authentication token.  This
>> authentication token is what permits the clients access (Ceph does not
>> do access control based on IP addresses).
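>>
>> To make that concrete, here is roughly the shape of it, with made-up
>> values (the exact format the driver returns may still change):
>>
>>     # Rough sketch: the pieces a native CephFS client needs.
>>     export_location = {
>>         "mon_addrs": "192.168.0.1:6789,192.168.0.2:6789",  # monitors
>>         "path": "/volumes/share-0001",      # directory backing the share
>>         "auth_token": "AQBmadeupsecretkey==",  # cephx key for this share
>>     }
>>
>>     # A native client would then mount something like:
>>     #   mount -t ceph 192.168.0.1:6789,192.168.0.2:6789:/volumes/share-0001 \
>>     #         /mnt -o name=share-0001,secret=AQBmadeupsecretkey==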
>>
>> It's just capable of the minimal functionality of creating and
>> deleting shares so far, but I will shortly be looking into hooking up
>> snapshots/consistency groups, albeit for read-only snapshots only
>> (cephfs does not have writeable snapshots).  Currently deletion is
>> just a move into a 'trash' directory; the idea is to add something
>> later that cleans this up in the background: the downside to the
>> "shares are just directories" approach is that clearing them up has a
>> "rm -rf" cost!
>>
>> A note on the implementation: cephfs recently got the ability (not yet
>> in master) to restrict client metadata access based on path, so this
>> driver is simply creating shares by creating directories within a
>> cluster-wide filesystem, and issuing credentials to clients that
>> restrict them to their own directory.  They then mount that subpath,
>> so that from the client's point of view it's like having their own
>> filesystem.  We also have a quota mechanism that I'll hook in later to
>> enforce the share size.
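>>
>> To give a flavour of the mechanism, roughly (illustrative commands;
>> the pool and share names are made up, and the path= restriction is
>> the part that only exists in the branch mentioned above):
>>
>>     import subprocess
>>
>>     share = "share-0001"
>>     path = "/volumes/" + share
>>
>>     # Cephx caps: metadata access limited to the share's directory,
>>     # plus ordinary rw access to the data pool.
>>     subprocess.check_call([
>>         "ceph", "auth", "get-or-create", "client." + share,
>>         "mds", "allow rw path={0}".format(path),
>>         "osd", "allow rw pool=cephfs_data",
>>         "mon", "allow r"])
>>
>>     # The quota is an xattr on the directory (enforced client-side,
>>     # hence the trust caveat below).  1 GiB here, value made up.
>>     subprocess.check_call([
>>         "setfattr", "-n", "ceph.quota.max_bytes",
>>         "-v", str(1024 ** 3), path])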
>>
>> Currently the security here requires clients (i.e. the ceph-fuse code
>> on client hosts, not the userspace applications) to be trusted, as
>> quotas are enforced on the client side.  The OSD access control
>> operates on a per-pool basis, and creating a separate pool for each
>> share is inefficient.  In the future it is expected that CephFS will
>> be extended to support file layouts that use RADOS namespaces, which
>> are cheap, such that we can issue a new namespace to each share and
>> enforce the separation between shares on the OSD side.
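>>
>> For comparison, the difference at the OSD cap level would be roughly
>> this (illustrative cap strings only):
>>
>>     # Today: isolation only at pool granularity, and a pool per share
>>     # is too expensive to be practical.
>>     per_pool_cap = "allow rw pool=share-0001-pool"
>>
>>     # Expected later: file layouts pointing at RADOS namespaces, which
>>     # are cheap, so each share gets its own namespace in a shared pool.
>>     per_namespace_cap = "allow rw pool=cephfs_data namespace=share-0001"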
>>
>> However, for many people the ultimate access control solution will be
>> to use an NFS gateway in front of their CephFS filesystem: it is
>> expected that an NFS-enabled cephfs driver will follow this native
>> driver in the not-too-distant future.
>>
>> This will be my first openstack contribution, so please bear with me
>> while I come up to speed with the submission process.  I'll also be in
>> Tokyo for the summit next month, so I hope to meet other interested
>> parties there.
>>
>> All the best,
>> John
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


