[Openstack-operators] [Openstack] [openstack][nova] Several questions/experiences about _base directory on a big production environment
Gustavo Randich
gustavo.randich at gmail.com
Fri Apr 4 16:52:52 UTC 2014
Hi Alejandro,
In our case, though we use shared storage for volumes and application data,
we use local disks for the VMs' backing files (_base).
To mitigate the space and performance issues, we adopted the following
measures, of which standardization and minimizing the number of AMIs are
especially important:
* Minimizing the number of AMIs by standardizing common software into a
golden image (this also improves deployment speed); we use only two standard,
versioned AMIs, only the latest version of each is used and cached, and any
additional software installation is scripted and prepackaged in repositories
* Cleaning old/unused backing files in the _base directory using the
configurable nova-compute periodic task (new in Folsom), plus a custom
script that cleans unresized/unsuffixed backing files (see the config sketch
after this list); the latter is no longer necessary in Grizzly
* Reserving a percentage of space on each compute host that cannot be
allocated to instances, specifically for _base files
* Using the glance-cache tools to cache images locally and avoid network
usage at instance launch time; we cache only the golden images
* Making better use of local disk space with RAID 0 => instances are
disposable => lots of instances => lots of redundancy at the app level
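A rough sketch of the nova.conf knobs involved; the option names are as I
remember them from Folsom/Grizzly-era nova, and the values below are only
illustrative defaults, not our exact production settings:

    # nova.conf on each compute node
    [DEFAULT]
    # periodic task that prunes _base entries no instance references any more
    remove_unused_base_images = True
    # minimum age (seconds) before an unused original / resized image is removed
    remove_unused_original_minimum_age_seconds = 86400
    remove_unused_resized_minimum_age_seconds = 3600
    # how often the image cache manager periodic task runs (seconds)
    image_cache_manager_interval = 2400

For pre-seeding the golden images into the local cache, the glance-cache
tooling can be driven roughly along these lines (host and image UUID are
placeholders):

    glance-cache-manage --host <cache-host> queue-image <golden-image-uuid>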
Cheers,
--
Gustavo Randich
Devop
Despegar.com
On Thu, Apr 3, 2014 at 6:41 PM, Alejandro Comisario
<alejandro.comisario at mercadolibre.com> wrote:
> I would love to have insights regarding people using _base not on
> shared storage but locally on the compute nodes: upsides & downsides,
> experiences & comments.
>
> Having the base files on the same SATA disks where the VMs are running
> seems like the big concern when decoupling _base from shared storage.
>
> Best regards.
> Alejandro
>
> On Thu, Apr 3, 2014 at 11:19 AM, Alejandro Comisario
> <alejandro.comisario at mercadolibre.com> wrote:
> >
> > Thanks to everyone for the prompt responses!
> > It's clear that _base on NFS is not the way to go 100% when thinking
> > about avoiding disasters.
> > So, I believe it's good to start talking about not using _base backing
> > files at all and, IF using _base, about impressions and concerns around
> > keeping these files locally on the compute nodes, on the same disks where
> > the VMs are running (in our case SATA disks).
> >
> > That kind of discussion, I think, is the most relevant one.
> > What are people's experiences running the backing files locally on the
> > same compute where the VMs are running?
> >
> > best
> > Alejandro Comisario
> >
> > On Thu, Apr 3, 2014 at 12:28 AM, Joe Topjian <joe at topjian.net> wrote:
> > > Is it Ceph live migration that you don't think is mature for production,
> > > or live migration in general? If the latter, I'd like to understand why
> > > you feel that way.
> > >
> > > Looping back to Alejandro's original message: I share his pain of _base
> > > issues. It's happened to me before and it sucks.
> > >
> > > We use shared storage for a production cloud of ours. The cloud has a
> > > 24x7 SLA, and shared storage with live migration helps us achieve that.
> > > It's not a silver bullet, but it has saved us so many hours of work.
> > >
> > > The remove_unused_base_images option is stable and works. I still
> > > disagree with the default value being "true", but I can vouch that it has
> > > worked without harm for the past year in an environment where it
> > > previously shot me in the foot.
> > >
> > > With that option enabled, you should not have to go into _base at all.
> > > Any work that we do in _base is manual audits and the rare time when the
> > > database might be inconsistent with what's really hosted.
> > >
> > > To mitigate against potential _base issues, we just try to be as careful
> > > as possible -- measure 5 times before cutting. Our standard procedure is
> > > to move the files we plan on removing to a temporary directory and wait a
> > > few days to see if any users raise an alarm.
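> > > (A minimal sketch of that quarantine step as it might look on a compute
> > > node; the paths and the referenced-file check are assumptions, not our
> > > exact tooling:)
> > >
> > >     #!/bin/sh
> > >     # Hypothetical quarantine-before-delete pass for _base; paths are placeholders.
> > >     BASE=/var/lib/nova/instances/_base
> > >     QUARANTINE=/var/lib/nova/instances/_base.quarantine
> > >     mkdir -p "$QUARANTINE"
> > >     # Backing files still referenced by instance disks on this host.
> > >     referenced=$(for d in /var/lib/nova/instances/*/disk; do
> > >         qemu-img info "$d" 2>/dev/null | awk -F': ' '/^backing file:/ {print $2}'
> > >     done | sort -u)
> > >     # Move anything unreferenced aside; delete only days later if nobody complains.
> > >     for f in "$BASE"/*; do
> > >         echo "$referenced" | grep -qxF "$f" || mv -v "$f" "$QUARANTINE/"
> > >     done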
> > >
> > > Diego has a great point about not using qemu backing files: if your
> > > backend storage implements deduplication and/or compression, you should
> > > see the same savings as what _base is trying to achieve.
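> > > (If memory serves, the nova.conf switch for that approach is the one
> > > below; instance disks then become full copies with no runtime dependency
> > > on a _base backing file, at the cost of a full copy per boot. Worth
> > > double-checking against your release:)
> > >
> > >     use_cow_images = False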
> > >
> > > We're in the process of building a new public cloud and made the decision
> > > not to implement shared storage. I have a queue of blog posts that I'd
> > > love to write, and the thinking behind this decision is one of them. Very
> > > briefly, the decision was based on the SLA that the public cloud will
> > > have, combined with our feeling that "cattle" instances are more
> > > acceptable to the average end-user nowadays.
> > >
> > > That's not to say that I'm "done" with shared storage. IMO, it all
> > > depends on the environment. One great thing about OpenStack is that it
> > > can be tailored to work in so many different environments.
> > >
> > >
> > >
> > > On Wed, Apr 2, 2014 at 5:48 PM, matt <matt at nycresistor.com> wrote:
> > >>
> > >> there's shared storage on a centralized network filesystem... then
> > >> there's shared storage on a distributed network filesystem. thus the age
> > >> old openafs vs nfs war is reborn.
> > >>
> > >> i'd check out ceph block device for live migration... but that said...
> > >> live migration has not achieved a maturity level that i'd even consider
> > >> trying it in production.
> > >>
> > >> -matt
> > >>
> > >>
> > >> On Wed, Apr 2, 2014 at 7:40 PM, Chris Friesen
> > >> <chris.friesen at windriver.com> wrote:
> > >>>
> > >>> So if you're recommending not using shared storage, what's your answer
> > >>> to people asking for live-migration? (Given that block migration is
> > >>> supposed to be going away.)
> > >>>
> > >>> Chris
> > >>>
> > >>>
> > >>> On 04/02/2014 05:08 PM, George Shuklin wrote:
> > >>>>
> > >>>> Every time anyone starts to consolidate resources (shared storage, a
> > >>>> virtual chassis for routers, etc.), they consolidate all failures into
> > >>>> one. One failure, and every system that depends on the consolidated
> > >>>> component joins the festival.
> > >>>>
> > >>>> Then they start to increase the fault tolerance of the consolidated
> > >>>> system, raising the administrative bar to the sky, requesting more and
> > >>>> more hardware for clustering, requesting enterprise-grade gear ("no one
> > >>>> was ever fired for buying enterprise <bullshit-brand-name-here>"). As a
> > >>>> result, the consolidated system ends up with the same MTBF as the
> > >>>> non-consolidated one, saving "costs" only compared to an even more
> > >>>> enterprise-grade super-solution priced at a few percent of a country's
> > >>>> GDP, and it actually costs more than the non-consolidated solution.
> > >>>>
> > >>>> For x86, failure is ALWAYS an option. The processor cannot replay
> > >>>> instructions, there is no comparator across a few parallel processors,
> > >>>> and so on (compare to mainframes). So, since failure is always an
> > >>>> option, the goal is to reduce the importance of each failure and its
> > >>>> scope.
> > >>>>
> > >>>> If one of 1k hosts goes down for three hours, that is sad. But it is
> > >>>> much, much better than a central system that every one of the 1k hosts
> > >>>> depends on going down for just 11 seconds (3 h * 3600 s / 1000 hosts is
> > >>>> roughly the same aggregate downtime).
> > >>>>
> > >>>> So the answer is simple: do not aggregate. Put _base on slower drives
> > >>>> if you want to save costs, but do not consolidate failures.
> > >>>>
> > >>>> On 04/02/2014 09:04 PM, Alejandro Comisario wrote:
> > >>>>>
> > >>>>> Hi guys ...
> > >>>>> We have a pretty big OpenStack environment and we use a shared NFS
> > >>>>> export to populate the backing-file directory (the famous _base
> > >>>>> directory located at /var/lib/nova/instances/_base). Due to a human
> > >>>>> error, the backing file used by thousands of guests was deleted,
> > >>>>> causing those guests to go read-only on their filesystems in a second.
> > >>>>>
> > >>>>> Till that moment we were convinced that keeping the _base directory
> > >>>>> on a shared NFS export was the way to go because:
> > >>>>>
> > >>>>> * spawning a new AMI gives total visibility to the whole cloud, making
> > >>>>> instances take almost nothing to boot regardless of the nova region
> > >>>>> * it eases the glance workload
> > >>>>> * it is the easiest to manage: no having to replicate files constantly,
> > >>>>> no pushing bandwidth usage internally
> > >>>>>
> > >>>>> But after this really big issue, and after what it took us to recover
> > >>>>> from it, we started thinking about how to protect against this kind of
> > >>>>> "single point of failure".
> > >>>>> Our first approach these days was to make the NFS share read-only,
> > >>>>> making it impossible for computes (and humans) to write to that
> > >>>>> directory, and giving write permission to just one compute host, the
> > >>>>> one responsible for spawning an instance from a new AMI and writing the
> > >>>>> file to the directory. Still... the storage keeps being the SPOF.
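> > >>>>> (In /etc/exports terms the idea is roughly the following; the export
> > >>>>> path, host name and network are placeholders, not our real ones:)
> > >>>>>
> > >>>>>     /srv/nova/_base  spawn-compute01(rw,sync,no_root_squash) 10.0.0.0/8(ro,sync)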
> > >>>>>
> > >>>>> So, we are considering the possibility of keeping the backing files
> > >>>>> LOCAL on every compute (+1K hosts) to reduce the failure exposure to
> > >>>>> the minimum, obviously with a parallel discussion about what technology
> > >>>>> to use to keep the data replicated among computes when a new AMI is
> > >>>>> launched, launch times, performance concerns on compute nodes having to
> > >>>>> store backing files locally, etc.
> > >>>>>
> > >>>>> This made me realize I have a huge community behind OpenStack, so I
> > >>>>> wanted to hear from it:
> > >>>>>
> > >>>>> * what are your thoughts about what happened / what we are thinking
> > >>>>> right now?
> > >>>>> * how do other users manage the backing-file (_base) directory, given
> > >>>>> all these considerations, on big OpenStack deployments?
> > >>>>>
> > >>>>> I will be thrilled to read other users' experiences and thoughts.
> > >>>>>
> > >>>>> As always, best.
> > >>>>> Alejandro
> > >>>>>