[Openstack-operators] [openstack][nova] Several questions/experiences about _base directory on a big production environment
matt at nycresistor.com
Wed Apr 2 18:27:37 UTC 2014
The original goal of cloud and OpenStack was to adhere to a
service-oriented architecture so that you could rely on n+1 service
structures. This goes back to the puppies versus cattle thing. Of course,
OpenStack just isn't all the way there yet, and when we as operators go out
and try to ensure parts of our infrastructure don't go down, or have to deal
with very real hardware / financial constraints... we end up violating that
goal.
I'd say that by centralizing _base you basically set up a vertical (or a
puppy), much like the one we have to deal with in MySQL for OpenStack itself
today. We're still building verticals when the goal was to remove
verticals from the herd and rely on horizontal scaling.
So from a theory perspective, it's not the direction we want to go in terms
of openstack culture. By keeping _base local to each compute node, you are
ensuring the compute nodes can operate independently without reliance on
any centralized service. That's the theoretical goal. Of course, that can
be a costly goal. Not just in terms of _base.
So I think we're still struggling with achieving what is still a
theoretical ideal rather than building against a proven model. And that's
what happens when you decide to operate on the ragged edge of technology.
From my perspective, I would not have set up the NFS as a direct link to
the compute nodes. I'd have maybe rsync'ed against the NFS as a sort of
backup. But I'd have wanted a local path that could operate independently,
even if in limited capacity, if the authoritative data source had failed.
It's fine to build authoritative sources and central repos... as long as
their failure will not impact the service-oriented architecture.
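A rough sketch of that "local path fed from an authoritative source" idea,
written in Python instead of rsync. Everything here is an assumption for
illustration: the function name sync_base is made up, the NFS mount point in
the usage note is hypothetical, and this is not how nova manages _base
itself. The point it shows is the copy-only behaviour: files are pulled from
the central source, nothing is ever deleted locally, and an unreachable
source just means "keep what we have".

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_base(source: Path, local: Path) -> list:
    """Copy backing files missing from `local`; never delete anything locally.

    Returns the names of the files copied during this run.
    """
    copied = []
    local.mkdir(parents=True, exist_ok=True)
    try:
        names = sorted(p.name for p in source.iterdir() if p.is_file())
    except OSError:
        # Authoritative source is gone or unreachable: the local copies stay
        # untouched, so the compute node keeps working with what it has.
        return copied
    for name in names:
        target = local / name
        if target.exists():
            continue
        # Copy under a temporary name, then rename, so a half-copied file
        # never shows up under its final name.
        fd, tmp = tempfile.mkstemp(dir=local, prefix=name + ".", suffix=".part")
        os.close(fd)
        try:
            shutil.copy2(source / name, tmp)
            os.replace(tmp, target)
        except OSError:
            # Source failed mid-copy: drop the partial file and move on.
            if os.path.exists(tmp):
                os.remove(tmp)
            continue
        copied.append(name)
    return copied
```

Run periodically on each compute node, for example as
sync_base(Path("/mnt/nfs_base"), Path("/var/lib/nova/instances/_base")),
where /mnt/nfs_base is a hypothetical mount of the central share. A file
deleted by mistake on the central share is then not removed from the
compute nodes that already hold it.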
That's my stream of thought on this. I'd love to hear other folks' ideas.
On Wed, Apr 2, 2014 at 2:04 PM, Alejandro Comisario <
alejandro.comisario at mercadolibre.com> wrote:
> Hi guys ...
> We have a pretty big openstack environment and we use a shared NFS to
> populate the backing file directory ( the famous _base directory located
> on /var/lib/nova/instances/_base ). Due to a human error, the backing
> file used by thousands of guests was deleted, causing these guests to
> go to a read-only filesystem in a second.
> Until that moment we were convinced that keeping the _base directory on
> a shared NFS was the right choice because:
> * spawning a new ami makes it visible to the whole cloud at once, so
> instances take no time to boot regardless of the nova region
> * it eases the glance workload
> * management is easier: no need to replicate files constantly, and no
> extra internal bandwidth usage
> But after this really big issue, and after what it took us to recover
> from it, we were thinking about how to protect against this kind of
> "single point of failure".
> Our first approach these days was to make the NFS share read-only, making
> it impossible for computes ( and humans ) to write to that directory, and
> giving write permission to just one compute, which is the one responsible
> for spawning an instance from a new ami and writing the file to the
> directory. Still ... the storage remains the SPOF.
> So, we are considering the possibility of having the used backing files
> LOCAL on every compute ( +1K hosts ) to reduce the failure chances to
> the minimum, obviously with a parallel discussion about what technology to
> use to keep data replicated among computes when a new ami is launched,
> launch times, performance issues on compute nodes having to store
> backing files locally, etc.
> This made me realize I have a huge community behind openstack, so I
> wanted to hear from it:
> * what are your thoughts about what happened / what we are thinking right
> now ?
> * how do other users manage the backing file ( _base ) directory,
> given all these considerations, on big openstack deployments ?
> I will be thrilled to read other users' experiences and thoughts.
> As always, best.