<div dir="ltr"><div><div>there's shared storage on a centralized network filesystem... then there's shared storage on a distributed network filesystem.  thus the age-old openafs vs nfs war is reborn.<br><br></div>i'd check out the ceph block device for live migration... but having said that, live migration has not reached a maturity level where i'd even consider trying it in production.<br>

<br></div>-matt<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Apr 2, 2014 at 7:40 PM, Chris Friesen <span dir="ltr">&lt;<a href="mailto:chris.friesen@windriver.com" target="_blank">chris.friesen@windriver.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">So if you're recommending not using shared storage, what's your answer to people asking for live-migration?  (Given that block migration is supposed to be going away.)<span class="HOEnZb"><font color="#888888"><br>


<br>

Chris</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

On 04/02/2014 05:08 PM, George Shuklin wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Every time anyone starts to consolidate resources (shared storage,<br>

virtual chassis for routers, etc.), they consolidate all failures into one.<br>

One failure, and every consolidated system joins the festival.<br>

<br>

Then they start to increase the fault tolerance of the consolidated system,<br>

raising the administrative bar to the sky, requesting more and more<br>

hardware for clustering, requesting enterprise-grade gear ("no one was<br>

fired for buying enterprise &lt;bullshit-brand-name-here&gt;"). As a result, the<br>

consolidated system works with the same MTBF as the non-consolidated one, "saving<br>

costs" only compared to an even more enterprise-grade super-solution that costs<br>

a few percent of a country's GDP, while actually costing more than the<br>

non-consolidated solution.<br>

<br>

Failure on x86 is ALWAYS an option. The processor cannot retry instructions,<br>

there is no comparator between several parallel processors, and so on. Compare that to<br>

mainframes. So, if failure is an option, the thing to do is reduce the importance<br>

of that failure, its scope.<br>

<br>

If one of 1k hosts goes down for three hours, that is sad. But it is much<br>

much much better than a central system that every one of the 1k hosts depends on going<br>

down for just 11 seconds (3 h * 3600 / 1000 = 10.8 s).<br>
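<br>
A rough sketch of that arithmetic, assuming downtime is compared as total host-seconds lost. The function name is made up for illustration:<br>

```python
def equivalent_central_outage_seconds(host_outage_hours: float, host_count: int) -> float:
    """Central outage length that costs the same host-seconds as one host being down."""
    # One host down for host_outage_hours loses this many host-seconds.
    host_seconds_lost = host_outage_hours * 3600
    # A central outage hits every host at once, so divide by the host count.
    return host_seconds_lost / host_count


print(equivalent_central_outage_seconds(3, 1000))  # 10.8, rounded up to 11 in the text
```
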

<br>

So the answer is simple: do not aggregate. Put _base on slower drives if you<br>

want to save costs, but do not consolidate failures.<br>

<br>

On 04/02/2014 09:04 PM, Alejandro Comisario wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi guys ...<br>

We have a pretty big OpenStack environment and we use a shared NFS to<br>

populate the backing file directory (the famous _base directory located<br>

at /var/lib/nova/instances/_base). Due to a human error, the backing<br>

file used by thousands of guests was deleted, causing those guests to<br>

go to a read-only filesystem within a second.<br>

<br>

Until that moment we were convinced that keeping the _base directory on<br>

shared NFS was right because:<br>

<br>

* spawning a new AMI makes it visible to the whole cloud at once, so<br>

instances take almost no time to boot regardless of the nova region<br>

* it eases the glance workload<br>

* management is easiest: no need to replicate files constantly, and no<br>

extra internal bandwidth usage<br>

<br>

But after this really big issue, and after what it took us to recover<br>

from it, we started thinking about how to protect against this kind of<br>

"single point of failure".<br>

Our first approach these days was to make the NFS share read-only, making<br>

it impossible for computes (and humans) to write to that directory, and<br>

giving write permission to just one compute, the one responsible for spawning<br>

an instance from a new AMI and writing the file to the directory. Still,<br>

the storage remains the SPOF.<br>
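<br>
As a minimal sketch of how a compute could confirm that it really sees _base read-only, assuming a Linux host and the usual /proc/mounts field layout. The helper name and the sample NFS server path are hypothetical, and this only checks the client-side mount options, not the server export:<br>

```python
def is_mounted_read_only(mounts_text: str, mount_point: str) -> bool:
    """Return True if mount_point appears in mounts_text with the 'ro' option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return "ro" in fields[3].split(",")
    # Not mounted at all: treat it as not protected.
    return False


# Hypothetical sample line; on a real host, read the text from /proc/mounts.
sample = "nfs-server:/export/base /var/lib/nova/instances/_base nfs ro,relatime 0 0\n"
print(is_mounted_read_only(sample, "/var/lib/nova/instances/_base"))  # True
```
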

<br>

So, we are considering keeping the in-use backing files<br>

LOCAL on every compute (1K+ hosts) to reduce the chance of failure to<br>

the minimum, obviously with a parallel discussion about what technology to<br>

use to keep data replicated among computes when a new AMI is launched,<br>

launch times, performance impact on compute nodes having to store<br>

backing files locally, etc.<br>
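<br>
One possible shape for the local-copy idea, sketched under assumptions: each compute pulls a backing file from some source path, verifies it against a known SHA-256, and only then moves it into its local directory. The function names, and the idea of having a checksum available, are illustrative and not something nova provides in this form:<br>

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_local_copy(source_path: str, local_dir: str, expected_sha256: str) -> str:
    """Copy source_path into local_dir unless a verified copy is already there."""
    target = os.path.join(local_dir, os.path.basename(source_path))
    if os.path.exists(target) and sha256_of(target) == expected_sha256:
        return target  # local copy is present and intact
    # Copy to a temporary name first so a half-written file is never used.
    fd, tmp_path = tempfile.mkstemp(dir=local_dir)
    os.close(fd)
    try:
        shutil.copyfile(source_path, tmp_path)
        if sha256_of(tmp_path) != expected_sha256:
            raise ValueError("checksum mismatch for " + source_path)
        os.replace(tmp_path, target)  # atomic rename within one filesystem
    finally:
        # Clean up the temporary file if the copy or the check failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return target
```

Calling ensure_local_copy a second time with the same arguments does no copying, because the existing file already matches the checksum. How the source file and checksum reach each compute is left open here.<br>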

<br>

This made me realize I have a huge community behind OpenStack, so I<br>

wanted to hear from it:<br>

<br>

* what are your thoughts about what happened / what we are thinking<br>

right now?<br>

* how do other users manage the backing file (_base) directory,<br>

given all these considerations, on big OpenStack deployments?<br>

<br>

I will be thrilled to read other users' experiences and thoughts.<br>

<br>

As always, best.<br>

Alejandro<br>

<br>

_______________________________________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org" target="_blank">OpenStack-operators@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>

</blockquote>


</blockquote>


</div></div></blockquote></div><br></div>