[Openstack-operators] [openstack][nova] Several questions/experiences about _base directory on a big production environment

Tim Bell Tim.Bell at cern.ch
Thu Apr 3 19:01:42 UTC 2014


Sounds like a great topic for the ops unconference architecture show and tell... A realistic assessment of what shared storage on OpenStack can currently do (and the problems it causes) would be very useful input to https://etherpad.openstack.org/p/ATL-ops-unconference-RFC

Tim

From: Joe Topjian [mailto:joe at topjian.net]
Sent: 03 April 2014 05:29
To: matt
Cc: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] [openstack][nova] Several questions/experiences about _base directory on a big production environment

Is it Ceph live migration that you don't think is mature for production or live migration in general? If the latter, I'd like to understand why you feel that way.

Looping back to Alejandro's original message: I share his pain of _base issues. It's happened to me before and it sucks.

We use shared storage for a production cloud of ours. The cloud has a 24x7 SLA and shared storage with live migration helps us achieve that. It's not a silver bullet, but it has saved us so many hours of work.

The remove_unused_base_images option is stable and works. I still disagree with the default value being "true", but I can vouch that it has worked without harm for the past year in an environment where it previously shot me in the foot.
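
A minimal sketch of how one might check this setting on a compute node. It assumes the option sits in the [DEFAULT] section of nova.conf (other Nova releases may place it in a different section), and the helper name base_image_cleanup_enabled is made up for illustration:

```python
import configparser


def base_image_cleanup_enabled(conf_text, default=True):
    """Return True if remove_unused_base_images is on in the given nova.conf text.

    Assumption: the option is read from [DEFAULT]; `default` mirrors the
    "true" default mentioned above and is used when the option is absent.
    """
    # interpolation=None so literal '%' characters in nova.conf are not expanded
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean(
        "DEFAULT", "remove_unused_base_images", fallback=default
    )


if __name__ == "__main__":
    sample = "[DEFAULT]\nremove_unused_base_images = false\n"
    print(base_image_cleanup_enabled(sample))
```

On a real node the text would come from the deployed nova.conf; the sample string here only stands in for it.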

With that option enabled, you should not have to go into _base at all. The only work we do in _base is manual audits and the rare cleanup when the database is inconsistent with what's really on disk.
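
A manual audit of this kind could be sketched roughly as follows. This is an illustration only, not the procedure used above: it assumes instance disks are qcow2 overlays whose backing file can be read with `qemu-img info --output=json`, and the function names are hypothetical. It only reports candidates and never deletes anything.

```python
import json
import os
import subprocess


def backing_file_of(disk_path):
    """Ask qemu-img which backing file an instance disk points at (None if none)."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", disk_path]
    )
    return json.loads(out).get("backing-filename")


def unreferenced_base_files(base_files, backing_files):
    """Return the files in _base that no instance disk names as its backing file."""
    in_use = {os.path.realpath(p) for p in backing_files if p}
    return sorted(f for f in base_files if os.path.realpath(f) not in in_use)
```

When _base is shared, the list of backing files has to be collected from every compute node that mounts it before a file can be considered unreferenced.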

To mitigate against potential _base issues, we just try to be as careful as possible -- measure 5 times before cutting. Our standard procedure is to move the files we plan on removing to a temporary directory and wait a few days to see if any users raise an alarm.
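
The move-then-wait procedure could look something like the sketch below. The directory layout, the helper names, and the retention period are all assumptions chosen for illustration:

```python
import os
import shutil
import time


def quarantine(path, quarantine_dir):
    """Move a file into a holding directory instead of deleting it outright."""
    os.makedirs(quarantine_dir, exist_ok=True)
    dest = os.path.join(quarantine_dir, os.path.basename(path))
    shutil.move(path, dest)
    os.utime(dest, None)  # stamp mtime with the moment of quarantine
    return dest


def purge_expired(quarantine_dir, max_age_days, now=None):
    """Delete quarantined files older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(quarantine_dir)):
        full = os.path.join(quarantine_dir, name)
        if os.path.isfile(full) and os.path.getmtime(full) < cutoff:
            os.remove(full)
            removed.append(name)
    return removed
```

Keeping the holding directory on the same filesystem as _base makes the move a cheap rename, and moving a file back is just as cheap if someone does raise an alarm.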

Diego has a great point about not using qemu backing files: if your backend storage implements deduplication and/or compression, you should see the same savings as what _base is trying to achieve.

We're in the process of building a new public cloud and made the decision to not implement shared storage. I have a queue of blog posts that I'd love to write, and the reasoning behind this decision is one of them. Very briefly, the decision was based on the SLA that the public cloud will have combined with our feeling that "cattle" instances are more acceptable to the average end-user nowadays.

That's not to say that I'm "done" with shared storage. IMO, it all depends on the environment. One great thing about OpenStack is that it can be tailored to work in so many different environments.


On Wed, Apr 2, 2014 at 5:48 PM, matt <matt at nycresistor.com> wrote:
there's shared storage on a centralized network filesystem... then there's shared storage on a distributed network filesystem.  thus the age old openafs vs nfs war is reborn.
i'd check out ceph block device for live migration... but saying that... live migration has not achieved a maturity level that i'd even consider trying it in production.
-matt

On Wed, Apr 2, 2014 at 7:40 PM, Chris Friesen <chris.friesen at windriver.com> wrote:
So if you're recommending not using shared storage, what's your answer to people asking for live-migration?  (Given that block migration is supposed to be going away.)

Chris


On 04/02/2014 05:08 PM, George Shuklin wrote:
Every time anyone starts to consolidate resources (shared storage,
virtual chassis for routers, etc.), they consolidate all failures into
one. One failure, and every consolidated system joins the festival.

Then they start to increase the fault tolerance of the consolidated
system, raising the administrative bar to the sky, requesting more and
more hardware for clustering, requesting enterprise-grade gear ("no one
was fired for buying enterprise <bullshit-brand-name-here>"). As a
result, the consolidated system works with the same MTBF as the
non-consolidated one. It saves "costs" compared to an even more
enterprise-grade super-solution priced at a few percent of a country's
GDP, yet it actually costs more than the non-consolidated solution.

Failure on x86 is ALWAYS an option. The processor cannot retry
instructions, there is no comparator between several parallel
processors, and so on. Compare that to mainframes. So, if failure is an
option, reduce the importance of that failure, its scope.

If one of 1k hosts goes down for three hours, that is sad. But it is
much, much, much better than a central system that every one of the 1k
hosts depends on going down for just 11 seconds (3h*3600/1000).
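
The arithmetic behind that figure, as a small sketch (the function name is
made up for illustration): a single-host outage is converted to
host-seconds and then spread across every host that a central system
would take down with it.

```python
def equivalent_central_outage_seconds(hosts, single_host_outage_hours):
    """Central-system downtime that costs as many host-seconds as one host's outage."""
    host_seconds = single_host_outage_hours * 3600  # one host, N hours
    return host_seconds / hosts  # same loss spread over all dependent hosts


print(equivalent_central_outage_seconds(1000, 3))  # 10.8, i.e. about 11 seconds
```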

So the answer is simple: do not aggregate. Put _base on slower drives if
you want to save costs, but do not consolidate failures.

On 04/02/2014 09:04 PM, Alejandro Comisario wrote:
Hi guys ...
We have a pretty big OpenStack environment and we use a shared NFS to
populate the backing file directory (the famous _base directory located
at /var/lib/nova/instances/_base). Due to a human error, the backing
file used by thousands of guests was deleted, causing these guests to
go to a read-only filesystem in a second.

Until that moment we were convinced that the _base directory should be
a shared NFS because:

* spawning a new AMI makes it visible to the whole cloud at once, so
instances take no time to boot regardless of the nova region
* it eases the glance workload
* it is the easiest to manage: no need to replicate files constantly
and no extra internal bandwidth usage

But after this really big issue, and after what it took us to recover
from it, we started thinking about how to protect against this kind of
"single point of failure".
Our first approach these days was to make the NFS share read-only,
making it impossible for computes (and humans) to write to that
directory, and giving write permission to just one compute, which is
responsible for spawning an instance from a new AMI and writing the
file to the directory. Still, the storage remains the SPOF.

So, we are considering keeping the in-use backing files LOCAL on every
compute (+1K hosts) to reduce the chance of failure to the minimum,
obviously with a parallel discussion about which technology to use to
keep data replicated among computes when a new AMI is launched, launch
times, the performance impact on compute nodes of storing backing
files locally, etc.

This made me realize I have a huge community behind OpenStack, so I
wanted to hear from it:

* what are your thoughts about what happened / what we are thinking
right now?
* how do other users manage the backing file (_base) directory, given
all these considerations, on big OpenStack deployments?

I will be thrilled to read other users' experiences and thoughts.

As always, best.
Alejandro

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

