<div dir="ltr"><div>Hi Alejandro,</div><div><br></div><div>In our case, though we use shared storage for volumes and application data, we use local disks for the VM's backing files (_base).</div><div><br></div><div>To mitigate the space and performance issues, we adopted the following measures, of which standardization and minimization of ami's quantity are very important:</div>
<div><br></div><div>* Minimization of the number of AMIs by standardizing common software into a golden image (this also improves deployment speed); we use only two standard, versioned AMIs, and only the latest version of each is used and cached. Any additional software installation is scripted and prepackaged in repositories</div>
<div>* Cleaning of old/unused backing files in the _base directory using the configurable nova-compute periodic task (new in Folsom), plus a custom script which cleans unresized/unsuffixed backing files; the latter is no longer necessary in Grizzly (see the config and script sketches after this list)</div>
<div>* Reservation of a percentage of unallocatable disk space on each compute host specifically for _base files</div><div>* Use of the glance-cache tools to cache images locally and avoid network traffic at instance launch time; we cache only the golden images (see the example commands after this list)</div>
<div>* Better use of local disk space by using RAID 0: instances are disposable, since we run lots of them and keep redundancy at the application level</div>
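<div><br></div><div>In case it is useful, here is roughly what this looks like for us. The periodic cleanup from the second bullet is just a few nova.conf options on every compute node; I am quoting the option names from memory and the exact names, defaults, and units vary by release, so double-check the configuration reference for yours:</div><div><br></div><pre>[DEFAULT]
# let the nova-compute periodic task prune cached base images that no
# instance on this host references any more
remove_unused_base_images = true
# minimum idle age before an unused original / resized base file is removed
remove_unused_original_minimum_age_seconds = 86400
remove_unused_resized_minimum_age_seconds = 3600
# how often the image cache manager task runs (units differ per release)
image_cache_manager_interval = 2400
</pre><div><br></div><div>For the golden images, we queue them into the local glance cache on each glance-api host ahead of time, roughly like this (again, check the glance-cache docs for your release, the exact flags may differ):</div><div><br></div><pre>glance-cache-manage --host GLANCE_API_HOST queue-image GOLDEN_IMAGE_UUID
glance-cache-prefetcher   # pull the queued images into the cache right away
</pre><div><br></div><div>And the custom _base cleanup is conceptually like the sketch below. This is not our exact script, just a minimal illustration: the quarantine directory and the 7-day threshold are made-up values, and it moves candidates aside instead of deleting them so that a mistake can be undone before any real damage:</div><div><br></div><pre>#!/usr/bin/env python
# Sketch only: quarantine _base files that no local instance disk references
# and that have not been touched for a week. Adjust paths/threshold to taste.
import os
import shutil
import subprocess
import time

INSTANCES = '/var/lib/nova/instances'
BASE = os.path.join(INSTANCES, '_base')
QUARANTINE = os.path.join(INSTANCES, '_base_quarantine')  # made-up holding area
MAX_AGE = 7 * 86400  # seconds


def referenced_backing_files():
    """Return the backing files referenced by instance disks on this host."""
    refs = set()
    for entry in os.listdir(INSTANCES):
        disk = os.path.join(INSTANCES, entry, 'disk')
        if not os.path.isfile(disk):
            continue
        out = subprocess.check_output(['qemu-img', 'info', disk]).decode()
        for line in out.splitlines():
            if line.startswith('backing file:'):
                path = line.split(':', 1)[1].split(' (actual path')[0].strip()
                refs.add(path)
    return refs


def main():
    if not os.path.isdir(QUARANTINE):
        os.makedirs(QUARANTINE)
    in_use = referenced_backing_files()
    now = time.time()
    for name in os.listdir(BASE):
        path = os.path.join(BASE, name)
        if path not in in_use and now - os.path.getmtime(path) >= MAX_AGE:
            # move instead of delete, so users have a few days to raise an alarm
            shutil.move(path, os.path.join(QUARANTINE, name))


if __name__ == '__main__':
    main()
</pre>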
<div><br></div><div>Cheers,</div><div><br></div><div>--</div><div>Gustavo Randich</div><div>Devop</div><div>Despegar.com</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Apr 3, 2014 at 6:41 PM, Alejandro Comisario <span dir="ltr"><<a href="mailto:alejandro.comisario@mercadolibre.com" target="_blank">alejandro.comisario@mercadolibre.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I would love to have insights regarding people using _base with no<br>
shared storage but locally on the compute, up&down sides, experiences<br>
& comments.<br>
<br>
Having the base files on the same SATA disks where the VMs are running seems<br>
like the big concern when decoupling _base from shared storage.<br>
<br>
best regards.<br>
<span class="HOEnZb"><font color="#888888">Alejandro<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Thu, Apr 3, 2014 at 11:19 AM, Alejandro Comisario<br>
<<a href="mailto:alejandro.comisario@mercadolibre.com">alejandro.comisario@mercadolibre.com</a>> wrote:<br>
><br>
> Thanks to everyone for the prompt responses!<br>
> It's clear that _base on NFS is definitely not the way to go when thinking<br>
> about avoiding disasters.<br>
> So, I believe it's good to start talking about not using _base backing<br>
> files at all and, IF using _base, about impressions and concerns around<br>
> having these files locally on the compute, on the same disks where the<br>
> VMs are running (in our case SATA disks).<br>
><br>
> That kind of discussion, I think, is the most relevant one.<br>
> What are people's experiences running the backing files locally on the<br>
> same compute where the VMs are running?<br>
><br>
> best<br>
> Alejandro Comisario<br>
><br>
> On Thu, Apr 3, 2014 at 12:28 AM, Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>> wrote:<br>
> > Is it Ceph live migration that you don't think is mature for production or<br>
> > live migration in general? If the latter, I'd like to understand why you<br>
> > feel that way.<br>
> ><br>
> > Looping back to Alejandro's original message: I share his pain of _base<br>
> > issues. It's happened to me before and it sucks.<br>
> ><br>
> > We use shared storage for a production cloud of ours. The cloud has a 24x7<br>
> > SLA and shared storage with live migration helps us achieve that. It's not a<br>
> > silver bullet, but it has saved us so many hours of work.<br>
> ><br>
> > The remove_unused_base_images option is stable and works. I still disagree<br>
> > with the default value being "true", but I can vouch that it has worked<br>
> > without harm for the past year in an environment where it previously shot me<br>
> > in the foot.<br>
> ><br>
> > With that option enabled, you should not have to go into _base at all. The<br>
> > only work we do in _base is manual audits, and the rare occasion when the<br>
> > database might be inconsistent with what's really hosted.<br>
> ><br>
> > To mitigate potential _base issues, we just try to be as careful as<br>
> > possible -- measure 5 times before cutting. Our standard procedure is to<br>
> > move the files we plan on removing to a temporary directory and wait a few<br>
> > days to see if any users raise an alarm.<br>
> ><br>
> > Diego has a great point about not using qemu backing files: if your backend<br>
> > storage implements deduplication and/or compression, you should see the same<br>
> > savings as what _base is trying to achieve.<br>
> ><br>
> > We're in the process of building a new public cloud and made the decision to<br>
> > not implement shared storage. I have a queue of blog posts that I'd love to<br>
> > write, and the thinking behind this decision is one of them. Very briefly,<br>
> > the decision was based on the SLA that the public cloud will have combined<br>
> > with our feeling that "cattle" instances are more acceptable to the average<br>
> > end-user nowadays.<br>
> ><br>
> > That's not to say that I'm "done" with shared storage. IMO, it all depends<br>
> > on the environment. One great thing about OpenStack is that it can be<br>
> > tailored to work in so many different environments.<br>
> ><br>
> ><br>
> ><br>
> > On Wed, Apr 2, 2014 at 5:48 PM, matt <<a href="mailto:matt@nycresistor.com">matt@nycresistor.com</a>> wrote:<br>
> >><br>
> >> there's shared storage on a centralized network filesystem... then there's<br>
> >> shared storage on a distributed network filesystem. thus the age old<br>
> >> openafs vs nfs war is reborn.<br>
> >><br>
> >> i'd check out ceph block device for live migration... but saying that...<br>
> >> live migration has not achieved a maturity level that i'd even consider<br>
> >> trying it in production.<br>
> >><br>
> >> -matt<br>
> >><br>
> >><br>
> >> On Wed, Apr 2, 2014 at 7:40 PM, Chris Friesen<br>
> >> <<a href="mailto:chris.friesen@windriver.com">chris.friesen@windriver.com</a>> wrote:<br>
> >>><br>
> >>> So if you're recommending not using shared storage, what's your answer to<br>
> >>> people asking for live-migration? (Given that block migration is supposed<br>
> >>> to be going away.)<br>
> >>><br>
> >>> Chris<br>
> >>><br>
> >>><br>
> >>> On 04/02/2014 05:08 PM, George Shuklin wrote:<br>
> >>>><br>
> >>>> Every time anyone starts to consolidate resources (shared storage,<br>
> >>>> virtual chassis for routers, etc.), they consolidate all failures into one.<br>
> >>>> One failure, and every consolidated system joins the festival.<br>
> >>>><br>
> >>>> Then they start to increase the fault tolerance of the consolidated<br>
> >>>> system, raising the administrative bar to the sky, requesting more and<br>
> >>>> more hardware for clustering, requesting enterprise-grade gear ("no one<br>
> >>>> was ever fired for buying enterprise <bullshit-brand-name-here>"). As a<br>
> >>>> result, the consolidated system ends up with the same MTBF as the<br>
> >>>> non-consolidated one, "saving costs" only compared to an even more<br>
> >>>> enterprise-grade super-solution costing a few percent of a country's GDP,<br>
> >>>> while actually costing more than the non-consolidated solution.<br>
> >>>><br>
> >>>> Failure on x86 is ALWAYS an option: the processor cannot replay<br>
> >>>> instructions, there is no comparator between a few parallel processors,<br>
> >>>> and so on, unlike on mainframes. So, if failure is an option anyway, that<br>
> >>>> means reducing the importance of that failure, and its scope.<br>
> >>>><br>
> >>>> If one of 1k hosts goes down for three hours, that is sad. But it is much,<br>
> >>>> much better than the central system that all 1k hosts depend on going down<br>
> >>>> for just 11 seconds (3 h * 3600 s / 1000 hosts = ~11 s of equivalent downtime).<br>
> >>>><br>
> >>>> So the answer is simple: do not aggregate. Put _base on slower drives if you<br>
> >>>> want to save costs, but do not consolidate failures.<br>
> >>>><br>
> >>>> On 04/02/2014 09:04 PM, Alejandro Comisario wrote:<br>
> >>>>><br>
> >>>>> Hi guys ...<br>
> >>>>> We have a pretty big OpenStack environment and we use shared NFS to<br>
> >>>>> populate the backing file directory (the famous _base directory located<br>
> >>>>> at /var/lib/nova/instances/_base). Due to a human error, the backing<br>
> >>>>> file used by thousands of guests was deleted, causing these guests to<br>
> >>>>> go read-only on their filesystems in a second.<br>
> >>>>><br>
> >>>>> Until that moment we were convinced that keeping the _base directory on<br>
> >>>>> shared NFS was right because:<br>
> >>>>><br>
> >>>>> * spawning a new AMI gives total visibility to the whole cloud, making<br>
> >>>>> instances take almost no time to boot regardless of the nova region<br>
> >>>>> * it eases the glance workload<br>
> >>>>> * easier management: no having to replicate files constantly, no<br>
> >>>>> pushing bandwidth usage internally<br>
> >>>>><br>
> >>>>> But after this really big issue, and after what it took us to recover<br>
> >>>>> from it, we started thinking about how to protect against this kind of<br>
> >>>>> "single point of failure".<br>
> >>>>> Our first approach these days was to make the NFS share read-only, making<br>
> >>>>> it impossible for computes (and humans) to write to that directory, and<br>
> >>>>> giving write permission to just one compute, which is the one responsible<br>
> >>>>> for spawning an instance from a new AMI and writing the file to the<br>
> >>>>> directory. Still... the storage keeps being the SPOF.<br>
> >>>>><br>
> >>>>> So, we are considering the possibility of having the used backing files<br>
> >>>>> LOCAL on every compute (+1K hosts) to reduce the failure chances to<br>
> >>>>> the minimum, obviously with a parallel discussion about what technology to<br>
> >>>>> use to keep data replicated among computes when a new AMI is launched,<br>
> >>>>> launch times, performance concerns on compute nodes having to store<br>
> >>>>> backing files locally, etc.<br>
> >>>>><br>
> >>>>> This made me realize I have a huge community behind OpenStack, so I<br>
> >>>>> wanted to hear from it:<br>
> >>>>><br>
> >>>>> * what are your thoughts about what happened / what we are thinking<br>
> >>>>> right now?<br>
> >>>>> * how do other users manage the backing file (_base) directory, with<br>
> >>>>> all these considerations, on big OpenStack deployments?<br>
> >>>>><br>
> >>>>> I will be thrilled to read other users' experiences and thoughts.<br>
> >>>>><br>
> >>>>> As always, best.<br>
> >>>>> Alejandro<br>
> >>>>><br>
> >>>><br>
> >>>><br>
> >>>><br>
> >>><br>
> >>><br>
> >>><br>
> >><br>
> >><br>
> >><br>
> >><br>
> ><br>
> ><br>
> ><br>
<br>
_______________________________________________<br>
Mailing list: <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack</a><br>
Post to : <a href="mailto:openstack@lists.openstack.org">openstack@lists.openstack.org</a><br>
Unsubscribe : <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack</a><br>
</div></div></blockquote></div><br></div>