<div dir="ltr"><div><div><div><div><div><div>The original goal of cloud and OpenStack was to adhere to a service-oriented architecture so that you could rely on n+1 service structures. <br><br></div>This goes back to the puppies versus cattle thing.  Of course, OpenStack just isn't all the way there yet, and when we as operators try to ensure parts of our infrastructure don't go down, or have to deal with very real hardware / financial constraints... we end up violating that lofty goal.<br>

<br></div>I'd say that by centralizing _base you basically set up a vertical ( or a puppy ).  Much like what we have to deal with in MySQL for OpenStack itself today.  We're still building verticals when the goal was to remove verticals from the herd and rely on horizontal scaling.<br>

<br></div>So from a theory perspective, it's not the direction we want to go in terms of OpenStack culture.  By keeping _base local to each compute node, you ensure the compute nodes can operate independently without relying on any centralized service.  That's the theoretical goal.  Of course, that can be a costly goal, and not just in terms of _base.<br>

<br></div>So I think we're still struggling to achieve a theoretical ideal rather than building against a proven model.  And that's what happens when you decide to operate on the ragged edge of technology. <br>

<br></div>From my perspective, I would not have set up the NFS as a direct link to the compute nodes.  I'd maybe have rsynced against the NFS as a sort of backup.  But I'd have wanted a local path that could operate independently, even if in limited capacity, if the authoritative data source failed.  It's fine to build authoritative sources and central repos... as long as their failure will not impact the service-oriented architecture components.  <br>
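<br>A rough sketch of one reading of that idea, in Python rather than rsync itself: each compute node pulls backing files from the NFS copy into its own local _base, and never deletes anything locally, so a deletion or outage on the NFS side does not reach running guests.  The function name sync_base and the NFS mount point in the note below are made up for illustration, and the sketch assumes backing files are never modified once written, so a size match is taken as "already copied".<br>

```python
import os
import shutil
import tempfile


def sync_base(nfs_base, local_base):
    """One-way, additive copy from nfs_base into local_base.

    Hypothetical helper for illustration only. Returns the names copied.
    Local files are never removed, so a deletion on the NFS side does not
    propagate to the compute node.
    """
    copied = []
    if not os.path.isdir(nfs_base):
        # Authoritative source is unavailable: keep serving the local copy.
        return copied
    os.makedirs(local_base, exist_ok=True)
    for name in sorted(os.listdir(nfs_base)):
        src = os.path.join(nfs_base, name)
        dst = os.path.join(local_base, name)
        if not os.path.isfile(src):
            continue
        # Assumption: backing files are immutable, so equal size means done.
        if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
            continue
        # Copy to a temp file first, then rename, so a half-written file
        # never appears under the final name.
        fd, tmp = tempfile.mkstemp(dir=local_base, prefix=".sync-")
        os.close(fd)
        shutil.copyfile(src, tmp)
        os.replace(tmp, dst)
        copied.append(name)
    return copied
```

Something like sync_base("/mnt/nfs/_base", "/var/lib/nova/instances/_base") could then run periodically on each compute node; the first path is an assumed mount point, the second is the _base location mentioned in the original message.  Plain rsync without --delete would give the same additive behaviour.<br>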

<br>That's my stream of thought on this.  I'd love to hear other folks' ideas.<br><br></div>-Matt<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Apr 2, 2014 at 2:04 PM, Alejandro Comisario <span dir="ltr"><<a href="mailto:alejandro.comisario@mercadolibre.com" target="_blank">alejandro.comisario@mercadolibre.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi guys ...<br>

We have a pretty big openstack environment and we use a shared NFS to<br>

populate the backing file directory ( the famous _base directory located<br>

at /var/lib/nova/instances/_base ). Due to a human error, the backing<br>

file used by thousands of guests was deleted, causing those guests to<br>

go to a read-only filesystem in a second.<br>

<br>

Until that moment we were convinced that keeping the _base directory on<br>

a shared NFS was the right choice because:<br>

<br>

* spawning from a new AMI makes it visible to the whole cloud at once, so<br>

instances take no time to boot regardless of the nova region<br>

* it eases the glance workload<br>

* it is the easiest to manage: no need to replicate files constantly, and<br>

no extra internal bandwidth usage<br>

<br>

But after this really big issue, and after what it took us to recover<br>

from it, we started thinking about how to protect against this kind of<br>

"single point of failure".<br>

Our first approach these days was to make the NFS share read-only, making it<br>

impossible for computes ( and humans ) to write to that directory, and<br>

giving write permission to just one compute, which is responsible for spawning<br>

an instance from a new AMI and writing the file to the directory. Still<br>

... the storage remains the SPOF.<br>

<br>

So, we are considering the possibility of keeping the backing files in use<br>

LOCAL on every compute ( +1K hosts ) to reduce the chance of failure to<br>

the minimum, obviously with a parallel discussion about what technology to<br>

use to keep data replicated among computes when a new AMI is launched,<br>

launch times, the performance impact on compute nodes of having to store<br>

backing files locally, etc.<br>

<br>

This made me realize I have a huge community behind openstack, so I<br>

wanted to hear from it:<br>

<br>

* what are your thoughts about what happened / what we are thinking right now ?<br>

* how do other users manage the backing file ( _base ) directory,<br>

given all these considerations, on big openstack deployments ?<br>

<br>

I will be thrilled to read other users' experiences and thoughts.<br>

<br>

As always, best.<br>

Alejandro<br>

<br>

_______________________________________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>

</blockquote></div><br></div>