[Openstack-operators] Experience with Cinder volumes as root disks?

Van Leeuwen, Robert rovanleeuwen at ebay.com
Wed Aug 2 06:54:49 UTC 2017


>>> Mike Smith <mismith at overstock.com>
>>On the plus side, Cinder does allow you to do QOS to limit I/O, whereas I do not believe that’s an option with Nova ephemeral.

You can specify IOPS limits in the flavor via extra specs (a minimal sketch follows the drawbacks below).
Drawbacks:
* You might end up with a lot of different flavors because of IOPS requirements
* Modifying an existing flavor won’t retroactively apply the new limits to existing instances
   You can hack the values directly in the database, but the instances will still need to be rebooted or you need to run a lot of virsh commands.
   (not sure if this is any better with Cinder)
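For reference, a minimal sketch of the extra-spec approach, assuming the openstacksdk cloud layer, admin credentials, and a clouds.yaml entry; the flavor name, sizes and IOPS numbers are made up for illustration, not taken from this thread:

    import openstack

    conn = openstack.connect(cloud='mycloud')  # assumes a matching clouds.yaml entry

    # Create a flavor and attach front-end (libvirt) I/O limits as extra specs.
    flavor = conn.create_flavor('m1.db', ram=8192, vcpus=4, disk=80)
    conn.set_flavor_specs(flavor.id, {
        'quota:disk_read_iops_sec': '2000',
        'quota:disk_write_iops_sec': '1000',
    })

As per the drawback above, these limits only take effect for instances created (or resized/rebuilt) after the flavor change; running instances keep whatever was applied at boot time.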

>>> Mike Smith <mismith at overstock.com>
>> And, again depending on the Cinder solution employed, the disk I/O for this kind of setup can be significantly better than some other options including Nova ephemeral with a Ceph backend.
IMHO Ceph performance specifically scales out very well (e.g. lots of instances each doing 100 IOPS) but scaling up might be an issue (e.g. running a sizeable database with lots of sync writes doing 10K IOPS).
Even with an optimally tuned SSD/NVMe cluster it still might not be as fast as you would like it to be.

>>>Kimball, Conrad <conrad.kimball at boeing.com>
>> and while it is possible to boot an image onto a new volume this is clumsy
As mentioned, you can make RBD the default backend for ephemeral storage so you no longer need to specify boot from volume.
Another option would be to use automation tooling to bring up your instances.
I recommend looking at e.g. Terraform or some other way to automate deployments.
Running a single command to install a whole environment, booting from volume where necessary, is really great and makes sure things are reproducible.
Our tech-savvy users like it, but if you have people who can only just cope with the web interface it might be a challenge ;)
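To illustrate the "boot from volume in a single step" point, here is a minimal sketch using the openstacksdk cloud layer; the server, image, flavor and network names are placeholders, and this is the kind of call an automation tool would wrap, not a prescribed workflow:

    import openstack

    conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

    server = conn.create_server(
        'web-01',                 # placeholder server name
        image='ubuntu-16.04',     # placeholder image name
        flavor='m1.small',        # placeholder flavor name
        network='private',        # placeholder network name
        boot_from_volume=True,    # create a Cinder root volume from the image
        volume_size=50,           # root volume size in GB
        terminate_volume=True,    # delete the root volume with the server
        wait=True,
    )

Putting a call like this (or its Terraform equivalent) in version control is what makes the deployments reproducible.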

Some more points regarding ephemeral local storage:

Pros of ephemeral local storage:
* No SPOF for your cloud (e.g. with shared storage, if a Ceph software upgrade goes wrong the whole cloud can hang)
* Assuming SSDs: great performance
* Discourages pets: people will get used to instances being down for maintenance or unrecoverable due to hardware failure, and will build and automate accordingly
* No volume storage to manage, assuming you do not offer it anyway

Cons of ephemeral local storage:
* IMHO live migration with block migration is not really usable
(the instance will be sluggish for a while and e.g. the performance of a whole Cassandra or Elasticsearch cluster will tank)
* No independent scaling of compute and storage. E.g. with ephemeral you might have lots of disk left but no memory/CPU on the compute node, or the other way around.
* Hardware failure means losing that local data, at least for some period of time, assuming it is recoverable at all. With enough compute nodes this becomes a weekly or even daily event.
* Some pets (e.g. Jenkins boxes) are hard to get rid of even if you control the application landscape to a great degree.

I think that if you have a lot of “pets”, or other reasons why e.g. a server/rack/availability zone cannot go down for maintenance, you probably want to run from volume storage.
You get your data highly available and can do live-migrations for maintenance.
Note that you still have to do some manual work to boot instances somewhere else if a hypervisor goes down, but that’s being worked on IIRC.
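As a rough sketch of the maintenance case, draining a hypervisor by live-migrating its instances could look roughly like this with the openstacksdk compute proxy; the host name is a placeholder, admin credentials and a reasonably recent SDK are assumed, and error handling is left out:

    import openstack

    conn = openstack.connect(cloud='mycloud')

    # List all instances on the hypervisor that is going into maintenance.
    for server in conn.compute.servers(all_projects=True, host='compute-01'):
        # Let the scheduler pick a target host; with volume/RBD backed disks
        # no block migration is needed.
        conn.compute.live_migrate_server(server, host=None, block_migration=False)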


>>>Kimball, Conrad <conrad.kimball at boeing.com>
>> Bottom line:  it depends what you need, as both options work well and there are people doing both out there in the wild.
Totally agree.


Cheers,
Robert van Leeuwen

