[Openstack-operators] [openstack][nova] Several questions/experiences about _base directory on a big production environment
alejandro.comisario at mercadolibre.com
Wed Apr 2 18:04:45 UTC 2014
Hi guys ...
We have a pretty big OpenStack environment, and we use a shared NFS
export to populate the backing file directory (the famous _base
directory located at /var/lib/nova/instances/_base). Due to a human
error, the backing file used by thousands of guests was deleted,
causing those guests' filesystems to go read-only within a second.
Until that moment we were convinced that sharing the _base directory
over NFS was the right choice because:
* once a new AMI has been spawned, its backing file is visible to the
whole cloud, so instances take almost no time to boot regardless of
the nova region
* it eases the Glance workload
* it is the easiest to manage: no need to replicate files constantly,
and no extra internal bandwidth usage
But after this really big issue, and after what it took us to recover
from it, we have been thinking about how to protect against this kind
of "single point of failure".
Our first approach these days was to make the NFS share read-only,
making it impossible for computes (and humans) to write to that
directory, and giving write permission to just one compute, which is
responsible for spawning an instance from a new AMI and writing the
file to the directory. Still ... the storage remains the SPOF.
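A compute node could check that it really sees the share as read-only by inspecting the mount flags. Below is a minimal sketch in Python, assuming a Unix host. The function name is hypothetical and not part of nova, and it only reports the client-side mount option; enforcing read-only on the NFS server export is a separate step.

```python
import os


def is_read_only(path):
    """Return True if the filesystem containing `path` is mounted read-only."""
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)


if __name__ == "__main__":
    # Path taken from the message above; adjust for your deployment.
    base_dir = "/var/lib/nova/instances/_base"
    if os.path.isdir(base_dir):
        print(base_dir, "read-only:", is_read_only(base_dir))
    else:
        print(base_dir, "not found on this host")
```

Run periodically on every compute, a check like this would flag a host that has the share mounted read-write by mistake.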
So, we are considering keeping the backing files in use LOCAL on every
compute (+1K hosts) to reduce the chances of failure to the minimum,
obviously with a parallel discussion about what technology to use to
keep data replicated among computes when a new AMI is launched,
launch times, the performance impact on compute nodes of having to
store backing files locally, etc.
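To make the local-copy idea concrete, here is a rough sketch in Python of pulling missing backing files from a source directory into a local _base directory. It is only an illustration under stated assumptions: the function names are hypothetical, the source is assumed to be reachable as a plain directory (for example the read-only NFS mount), and a local file that differs from the source is reported rather than overwritten, since guests may already depend on it.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_base(source_dir, local_dir):
    """Copy backing files missing from local_dir.

    Returns (copied, mismatched): names copied in this run, and names
    that exist locally but whose checksum differs from the source.
    """
    os.makedirs(local_dir, exist_ok=True)
    copied, mismatched = [], []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        dst = os.path.join(local_dir, name)
        if not os.path.isfile(src):
            continue
        if os.path.isfile(dst):
            # Never overwrite a file guests may be using; just report it.
            if sha256_of(dst) != sha256_of(src):
                mismatched.append(name)
            continue
        # Copy to a temp file in the same directory, verify, then rename,
        # so a half-written backing file never appears under its final name.
        fd, tmp = tempfile.mkstemp(dir=local_dir, prefix=".sync-")
        os.close(fd)
        try:
            shutil.copyfile(src, tmp)
            if sha256_of(tmp) != sha256_of(src):
                raise IOError("checksum mismatch copying %s" % name)
            os.replace(tmp, dst)
        finally:
            if os.path.exists(tmp):
                os.remove(tmp)
        copied.append(name)
    return copied, mismatched
```

With +1K hosts all pulling from one source, the storage is still a bottleneck at distribution time, so this only addresses the failure mode at run time, not the replication technology question.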
This made me realize I have a huge community behind OpenStack, so I
wanted to hear from it:
* what are your thoughts about what happened / what we are thinking right now?
* how do other users manage the backing file ( _base ) directory,
with all these considerations in mind, on big OpenStack deployments?
I will be thrilled to read other users' experiences and thoughts.
As always, best.