[Openstack-operators] RAID / stripe block storage volumes

Joe Topjian joe at topjian.net
Tue Feb 9 02:18:47 UTC 2016


Yep. Don't get me wrong -- I agree 100% with everything you've said
throughout this thread. Applications that have native replication are
awesome. Swift is crazy awesome. :)

I understand that some may see the use of mdadm, Cinder-assisted
replication, etc. as supporting "pet" environments, and I agree to some
extent. But I do think there are applicable use-cases where those services
could be very helpful.

As one example, I know of large cloud-based environments which handle very
large data sets and are entirely stood up through configuration management
systems. However, due to the sheer size of data being handled, rebuilding
or resyncing a portion of the environment could take hours. Failing over to
a replicated volume is instant. In addition, being able to both stripe and
replicate goes a very long way toward making the most of commodity block
storage environments (for example, avoiding packing problems and such).

Should these types of applications be reading / writing directly to Swift,
HDFS, or handling replication themselves? Sure, in a perfect world. Does
Gluster fill all gaps I've mentioned? Kind of.

I guess I'm just trying to survey the options available for applications
and environments that would otherwise be very flexible and resilient if it
wasn't for their awkward use of storage. :)
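
To make the stripe-and-replicate idea a bit more concrete, here's a rough
sketch (untested, and the device names are hypothetical) of what assembling a
handful of attached volumes into a single mdadm RAID 10 array from inside an
instance could look like -- striping for throughput, mirroring for some
resilience:

import subprocess

# Hypothetical device names for four attached block storage volumes; adjust
# to whatever the guest actually sees (virtio disks usually appear as /dev/vdX).
devices = ["/dev/vdb", "/dev/vdc", "/dev/vdd", "/dev/vde"]

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.run(cmd, check=True)

# RAID 10 stripes across mirrored pairs of volumes.
run(["mdadm", "--create", "/dev/md0", "--level=10",
     "--raid-devices=%d" % len(devices)] + devices)

# Filesystem and mount point for the array.
run(["mkfs.ext4", "/dev/md0"])
run(["mkdir", "-p", "/mnt/data"])
run(["mount", "/dev/md0", "/mnt/data"])

# Persist the array definition so it reassembles on reboot (path varies by
# distro: /etc/mdadm/mdadm.conf on Debian/Ubuntu, /etc/mdadm.conf on RHEL).
scan = subprocess.check_output(["mdadm", "--detail", "--scan"])
with open("/etc/mdadm/mdadm.conf", "ab") as conf:
    conf.write(scan)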

On Mon, Feb 8, 2016 at 6:18 PM, Robert Starmer <robert at kumul.us> wrote:

> Besides, wouldn't it be better to actually do application-layer
> backup/restore, or application-level distribution for replication?  That
> architecture at least lets the application determine and deal with corrupt
> data transmission, rather than the DRBD-like model where if you corrupt one
> data set, you corrupt them all...
>
> Hence my comment about having some form of object storage (SWIFT is
> perhaps even a good example of this architecture: the proxy replicates,
> checks MD5, etc. to verify good data, rather than just replicating blocks
> of data).
>
>
>
> On Mon, Feb 8, 2016 at 7:15 PM, Robert Starmer <robert at kumul.us> wrote:
>
>> I have not run into anyone replicating volumes or creating redundancy at
>> the VM level (beyond, as you point out, HDFS, etc.).
>>
>> R
>>
>> On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <joe at topjian.net> wrote:
>>
>>> This is a great conversation and I really appreciate everyone's input.
>>> Though, I agree, we wandered off the original question and that's my fault
>>> for mentioning various storage backends.
>>>
>>> For the sake of conversation, let's just say the user has no knowledge
>>> of the underlying storage technology. They're presented with a Block
>>> Storage service and the rest is up to them. What known, working options
>>> does the user have to build their own block storage resilience? (Ignoring
>>> "obvious" solutions where the application has native replication, such as
>>> Galera, elasticsearch, etc)
>>>
>>> I have seen references to Cinder supporting replication, but I'm not
>>> able to find a lot of information about it. The support matrix[1] lists
>>> very few drivers that actually implement replication -- is this true or is
>>> there a trove of replication docs that I just haven't been able to find?
>>>
>>> Amazon AWS publishes instructions on how to use mdadm with EBS[2]. One
>>> might interpret that to mean mdadm is a supported solution within EC2-based
>>> instances.
>>>
>>> There are also references to DRBD and EC2, though I could not find
>>> anything as "official" as mdadm and EC2.
>>>
>>> Does anyone have experience (or know users) doing either? (specifically
>>> with libvirt/KVM, but I'd be curious to know in general)
>>>
>>> Or is it more advisable to create multiple instances where data is
>>> replicated instance-to-instance, rather than a single instance with multiple
>>> volumes where data is replicated volume-to-volume? And if so, why? Is a lack
>>> of stable volume-to-volume replication a limitation of certain hypervisors?
>>>
>>> Or has this area just not been explored in depth within OpenStack
>>> environments yet?
>>>
>>> 1: https://wiki.openstack.org/wiki/CinderSupportMatrix
>>> 2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
>>>
>>>
>>> On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <robert at kumul.us> wrote:
>>>
>>>> I'm not against Ceph, but even two machines (and really two machines with
>>>> enough storage to be meaningful, e.g. not the all-blade environments I've
>>>> built some o7k systems on) may not be available for storage, so there are
>>>> cases where that's not necessarily the solution. I built resiliency in one
>>>> environment with a two-node controller/Glance/db system with Gluster, which
>>>> enabled enough middleware resiliency to meet the customer's recovery
>>>> expectations. Regardless, even with a cattle application model, the
>>>> infrastructure middleware still needs to be able to provide some level of
>>>> resiliency.
>>>>
>>>> But we've kind of wandered off the original question. To bring this back
>>>> on topic: I think users can build resilience into their own storage
>>>> construction, but I still think there are use cases where the middleware
>>>> either needs to use its own resiliency layer, and/or may end up
>>>> providing it for the end user.
>>>>
>>>> R
>>>>
>>>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <Kevin.Fox at pnnl.gov>
>>>> wrote:
>>>>
>>>>> We've used Ceph to address the storage requirement in small clouds
>>>>> pretty well. It works fine with only two storage nodes with
>>>>> replication set to 2, and because of the radosgw, you can share your small
>>>>> amount of storage between the object store and the block store, avoiding the
>>>>> need to overprovision Swift-only or Cinder-only capacity to handle usage
>>>>> unknowns. It's just one pool of storage.
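>>>>>
>>>>> As a very rough sketch of what that amounts to (the pool names below are
>>>>> just examples, and this assumes the ceph CLI is available on a monitor
>>>>> node), the replica count is simply set per pool:
>>>>>
>>>>> import subprocess
>>>>>
>>>>> # Example pool names -- Cinder, Glance and radosgw each point at their
>>>>> # own pool, but everything draws from the same two-node cluster.
>>>>> for pool in ["volumes", "images", ".rgw.buckets"]:
>>>>>     subprocess.run(["ceph", "osd", "pool", "set", pool, "size", "2"],
>>>>>                    check=True)
>>>>>     # With only two replicas, min_size 1 lets I/O continue while one
>>>>>     # node is down (with the usual caveats about running on one copy).
>>>>>     subprocess.run(["ceph", "osd", "pool", "set", pool, "min_size", "1"],
>>>>>                    check=True)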
>>>>>
>>>>> You're right, using LVM is like telling your users "don't do pets" but
>>>>> then having pets at the heart of your system. When you lose one, you lose
>>>>> a lot. With a small Ceph cluster, you can take out one of the nodes, burn it
>>>>> to the ground and put it back, and it just works. No pets.
>>>>>
>>>>> Do consider ceph for the small use case.
>>>>>
>>>>> Thanks,
>>>>> Kevin
>>>>>
>>>>> ------------------------------
>>>>> *From:* Robert Starmer [robert at kumul.us]
>>>>> *Sent:* Monday, February 08, 2016 1:30 PM
>>>>> *To:* Ned Rhudy
>>>>> *Cc:* OpenStack Operators
>>>>>
>>>>> *Subject:* Re: [Openstack-operators] RAID / stripe block storage
>>>>> volumes
>>>>>
>>>>> Ned's model is the model I meant by "multiple underlying storage
>>>>> services".  Most of the systems I've built are LV/LVM only, a few added
>>>>> Ceph as an alternative/live-migration option, and one used Gluster
>>>>> due to size.  Note that the environments I have worked with are generally
>>>>> small (~20 compute nodes), so huge Ceph environments aren't common.  I am
>>>>> also working on a project where the storage backend is entirely NFS...
>>>>>
>>>>> And I think users are increasingly educated to assume that nothing is
>>>>> guaranteed.  There is the realization, at least among a good portion of
>>>>> the customers I've worked with (and I try to educate the non-believers),
>>>>> that the way you get the most out of a system like OpenStack is to
>>>>> consider everything disposable. The one gap I've seen is that there are
>>>>> plenty of folks who don't deploy SWIFT, and without some form of object
>>>>> store, there's still the question of where you place your datasets so that
>>>>> they can be quickly recovered (and how you keep them up to date if you
>>>>> do have one).  With VMs, there's the notion that you can recover quickly
>>>>> because the "dataset", e.g. your OS image, is already there for you, and in
>>>>> plenty of small environments, that's only as true as the Glance repository
>>>>> (guess what's usually backing that when there's no SWIFT around...).
>>>>>
>>>>> So I see the issue as a holistic one. How do you show operators/users
>>>>> that they should consider everything disposable if we only look at the
>>>>> currently running instance as the "thing"?  Somewhere you still likely need
>>>>> some form of distributed resilience (and yes, I can see using the
>>>>> distributed Canonical, CentOS, Red Hat, Fedora, Debian, etc. mirrors as your
>>>>> distributed image backup, but what about the database content, etc.?).
>>>>>
>>>>> Robert
>>>>>
>>>>> On Mon, Feb 8, 2016 at 1:44 PM, Ned Rhudy (BLOOMBERG/ 731 LEX) <
>>>>> erhudy at bloomberg.net> wrote:
>>>>>
>>>>>> In our environments, we offer two types of storage. Tenants can
>>>>>> either use Ceph/RBD and trade speed/latency for reliability and protection
>>>>>> against physical disk failures, or they can launch instances that are
>>>>>> realized as LVs on an LVM VG that we create on top of a RAID 0 spanning all
>>>>>> but the OS disk on the hypervisor. This lets the users elect to go all-in
>>>>>> on speed and sacrifice reliability for applications where replication/HA is
>>>>>> handled at the app level, if the data on the instance is sourced from
>>>>>> elsewhere, or if they just don't care much about the data.
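>>>>>>
>>>>>> (For the curious, the hypervisor-side layout is roughly the following
>>>>>> sketch -- the disk names and volume group name are illustrative, and the
>>>>>> nova.conf settings mentioned in the comments are approximate:)
>>>>>>
>>>>>> import subprocess
>>>>>>
>>>>>> # Hypothetical: the OS lives on /dev/sda, the rest are data disks.
>>>>>> disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
>>>>>>
>>>>>> # One RAID 0 across all the data disks -- pure speed, no redundancy.
>>>>>> subprocess.run(["mdadm", "--create", "/dev/md0", "--level=0",
>>>>>>                 "--raid-devices=%d" % len(disks)] + disks, check=True)
>>>>>>
>>>>>> # LVM volume group on top; nova-compute then carves instance disks out
>>>>>> # of it (roughly images_type = lvm / images_volume_group = nova-local
>>>>>> # in the [libvirt] section of nova.conf).
>>>>>> subprocess.run(["pvcreate", "/dev/md0"], check=True)
>>>>>> subprocess.run(["vgcreate", "nova-local", "/dev/md0"], check=True)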
>>>>>>
>>>>>> There are some further changes to our approach that we would like to
>>>>>> make down the road, but in general our users seem to like the current
>>>>>> system and being able to forgo reliability or speed as their circumstances
>>>>>> demand.
>>>>>>
>>>>>> From: joe at topjian.net
>>>>>> Subject: Re: [Openstack-operators] RAID / stripe block storage volumes
>>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> Can you elaborate on "multiple underlying storage services"?
>>>>>>
>>>>>> The reason I asked the initial question is that, historically, we've
>>>>>> made our block storage service resilient to failure. We also made our
>>>>>> compute environment resilient to failure, but over time,
>>>>>> we've seen users become better educated about coping with compute failure.
>>>>>> As a result, we've been able to become more relaxed about building
>>>>>> resilient compute environments.
>>>>>>
>>>>>> We've been discussing whether it would be possible to translate that same
>>>>>> idea to block storage. Rather than have a large HA storage cluster (whether
>>>>>> Ceph, Gluster, NetApp, etc.), is it possible to offer simple, standalone
>>>>>> LVM volume servers and push the failure handling onto the user?
>>>>>>
>>>>>> Of course, this doesn't work for all types of use cases and
>>>>>> environments. We still have projects which require the cloud to own more of
>>>>>> the responsibility for failure than the users do.
>>>>>>
>>>>>> But for environments where we offer general purpose / best effort
>>>>>> compute and storage, what methods are available to help the user be
>>>>>> resilient to block storage failures?
>>>>>>
>>>>>> Joe
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 12:09 PM, Robert Starmer <robert at kumul.us>
>>>>>> wrote:
>>>>>>
>>>>>>> I've always recommended offering multiple underlying storage
>>>>>>> services to provide this, rather than adding the overhead to the VM.  So,
>>>>>>> no, not in any of my systems or any I've worked with.
>>>>>>>
>>>>>>> R
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian <joe at topjian.net> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Does anyone have users RAID'ing or striping multiple block storage
>>>>>>>> volumes from within an instance?
>>>>>>>>
>>>>>>>> If so, what was the experience? Good, bad, possible but with
>>>>>>>> caveats?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Joe
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>