[nova] workarounds and operator experience around bug 1522307/1908133

Sean Mooney smooney at redhat.com
Fri Jan 8 22:07:09 UTC 2021


On Fri, 2021-01-08 at 18:27 -0300, Rodrigo Barbieri wrote:
> Thanks for the responses Eugen and Sean!
> 
> The placement.yaml approach sounds good if it can prevent the compute host
> from reporting local_gb repeatedly, and then as you suggested use Placement
> Aggregates I can perhaps make that work for a subset of use cases. Too bad
> it is only available on Victoria+. I was looking for something that could
> work, even if partially, on Queens and Stein.
> 
> The cron job updating the reservation, I'm not sure if it will clash with
> the host updates (being overridden, as I've described in the LP bug), but
> you actually gave me another idea. I may be able to create a fake
> allocation in the nodes to cancel out their reported values, and then rely
> only on the shared value through placement.
Well, actually, you could use the host reserved disk space config value to do that on older releases:
just set it equal to the pool size.
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_disk_mb
Not sure why that is in MB; it really should be GB. But anyway, if you set that, it will set the reserved
value in placement.
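
A rough sketch of the arithmetic behind that suggestion (this is not nova code). Placement treats the
usable capacity of an inventory as (total - reserved) * allocation_ratio, so reserving the full pool size
leaves the compute node's own disk capacity at zero. The function names and the 100 TiB pool size are
made up for illustration.

```python
def reserved_host_disk_mb(pool_size_gb: int) -> int:
    """Value for [DEFAULT]/reserved_host_disk_mb that reserves the whole pool."""
    return pool_size_gb * 1024


def placement_capacity(total: int, reserved: int, allocation_ratio: float = 1.0) -> int:
    """Amount placement will allow to be allocated from a single inventory."""
    return int((total - reserved) * allocation_ratio)


pool_gb = 102400  # hypothetical 100 TiB Ceph pool, reported by every compute node
reserved_mb = reserved_host_disk_mb(pool_gb)
print(reserved_mb)  # value to put in nova.conf
# With everything reserved, the node itself offers no disk capacity.
print(placement_capacity(total=pool_gb, reserved=reserved_mb // 1024))
```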

> 
> Monitoring Ceph is only part of the problem. The second part, if you end up
> needing it (and you may if you're not very conservative in the monitoring
> parameters and have unpredictable workload) is to prevent new instances
> from being created, thus new data from being stored, to prevent it from
> filling up before you can react to it (think of an accidental DoS attack by
> running certain storage-heavy workloads).
> 
> @Eugen, yes. I was actually looking for more reliable ways to prevent it
> from happening.
> 
> Overall, the shared placement + fake allocation sounded like the cleanest
> workaround for me. I will try that and report back.

If I get time in the next week or two, I'm hoping to try and tweak our Ceph CI job to test
that topology in the upstream CI. But just looking at the placement functional tests, it should work.

This covers the use of sharing resource providers:
https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml

The final section tests the allocation candidates endpoint and asserts we get an allocation for both providers:
https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml#L135-L143

It's relatively simple to read this file top to bottom, and it's only 143 lines long. It basically steps
through and constructs the topology I was describing, or at least a similar one, and shows step by step what
the behavior will be as the RPs are created, the aggregates are created, etc.
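
For reference, that kind of topology can be sketched as the request bodies one would send to the
placement REST API. This only builds the JSON and does not talk to placement. The helper name, the
"ceph-pool" provider name, the UUIDs and the generations are placeholders, and the payload shapes are
written from memory of the placement API reference, so check them against your release before relying
on them.

```python
import uuid


def sharing_topology_requests(pool_gb, compute_generations, pool_generation=0):
    """Build (method, path, body) tuples; nothing is sent to placement.

    compute_generations maps each compute node provider UUID to its current
    generation (hypothetical input; a real client would GET it first).
    """
    pool_rp = str(uuid.uuid4())    # placeholder UUID for the pool provider
    aggregate = str(uuid.uuid4())  # placeholder placement aggregate UUID
    base = f"/resource_providers/{pool_rp}"
    requests = [
        # 1. create the provider that models the ceph pool
        ("POST", "/resource_providers", {"name": "ceph-pool", "uuid": pool_rp}),
        # 2. give it the DISK_GB inventory
        ("PUT", f"{base}/inventories",
         {"resource_provider_generation": pool_generation,
          "inventories": {"DISK_GB": {"total": pool_gb}}}),
        # 3. mark it as a sharing provider
        ("PUT", f"{base}/traits",
         {"resource_provider_generation": pool_generation,
          "traits": ["MISC_SHARES_VIA_AGGREGATE"]}),
    ]
    # 4. put the pool provider and every compute node in the same aggregate.
    # Each successful write bumps a provider's generation, so a real client
    # re-reads it between steps; it is reused here only to keep this short.
    members = {pool_rp: pool_generation, **compute_generations}
    for rp_uuid, generation in members.items():
        requests.append(
            ("PUT", f"/resource_providers/{rp_uuid}/aggregates",
             {"resource_provider_generation": generation,
              "aggregates": [aggregate]}))
    return requests


for method, path, body in sharing_topology_requests(102400, {"compute-1-uuid": 5}):
    print(method, path, body)
```

The compute nodes would still need their own disk inventory zeroed or reserved away, as discussed
earlier in the thread, for the shared provider to be the only source of disk.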

The main issue with this approach is that we don't really have a good way to upgrade existing deployments
to this topology beyond live migrating everything one node at a time, so that their allocations get
reshaped as a side effect of the move operation.

Looking at the history of this file, it was added 3 years ago
https://github.com/openstack/placement/commit/caeae7a41ed41535195640dfa6c5bb58a7999a9b
around Stein, although it may also have worked before that; I'm not sure when we added sharing providers.

> 
> Thanks for the help!
> 
> On Wed, Jan 6, 2021 at 10:57 AM Eugen Block <eblock at nde.ag> wrote:
> 
> > Hi,
> > 
> > we're using OpenStack with Ceph in production and also have customers
> > doing that.
> > From my point of view fixing nova to be able to deal with shared
> > storage of course would improve many things, but it doesn't liberate
> > you from monitoring your systems. Filling up a ceph cluster should be
> > avoided and therefore proper monitoring is required.
> > 
> > I assume you were able to resolve the frozen instances?
> > 
> > Regards,
> > Eugen
> > 
> > 
> > Zitat von Sean Mooney <smooney at redhat.com>:
> > 
> > > On Tue, 2021-01-05 at 14:17 -0300, Rodrigo Barbieri wrote:
> > > > Hi Nova folks and OpenStack operators!
> > > > 
> > > > I have had some trouble recently where, while using the
> > > > "images_type = rbd" libvirt option, my Ceph cluster filled up without
> > > > my noticing and froze all my nova services and instances.
> > > > 
> > > > I started digging and investigating why and how I could prevent or
> > > > work around this issue, but I didn't find a very reliable clean way.
> > > > 
> > > > I documented all my steps and investigation in bug 1908133 [0]. It has
> > > > been marked as a duplicate of 1522307 [1] which has been around for
> > > > quite some time, so I am wondering if any operators have been using
> > > > nova + ceph in production with "images_type = rbd" config set and how
> > > > you have been handling/working around the issue.
> > > 
> > > This is indeed a known issue, and the long-term plan to fix it was to
> > > track shared storage as a sharing resource provider in placement. That
> > > never happened, so there is currently no mechanism available to prevent
> > > this explicitly in nova.
> > > 
> > > The disk filter, which is no longer used, could prevent the boot of a
> > > VM that would fill the Ceph pool, but it could not protect against two
> > > concurrent requests filling the pool.
> > > 
> > > Placement can protect against that due to the transactional nature of
> > > allocations, which serialise all resource usage. However, since each
> > > host reports the total size of the Ceph pool as its local storage, that
> > > won't work out of the box.
> > > 
> > > As a quick hack, what you can do is set
> > > [DEFAULT]/disk_allocation_ratio=(1/number of compute nodes)
> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.disk_allocation_ratio
> > > in the config of each of your compute agents.
> > > 
> > > 
> > > That will prevent oversubscription; however, it has other negative side
> > > effects, mainly that you will fail to schedule instances that could
> > > boot if a host exceeds its 1/n usage. So unless you have perfectly
> > > balanced consumption, this is not a good approach.
> > > 
> > > A better approach, but one that requires external scripting, is to have
> > > a cron job that updates the reserved value of each of the disk_gb
> > > inventories to the actual amount of storage allocated from the pool.
> > > 
> > > The real fix, however, is for nova to track its shared usage in
> > > placement correctly as a sharing resource provider.
> > > 
> > > It's possible you might be able to do that via the provider.yaml file,
> > > by overriding the local disk_gb to 0 on all compute nodes and
> > > then creating a single sharing resource provider of disk_gb that
> > > models the Ceph pool.
> > > 
> > > 
> > > https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/provider-config-file.html
> > > Currently that does not support adding providers to placement
> > > aggregates. So while it could be used to zero out the compute node
> > > disk inventories and to create a sharing provider with the
> > > MISC_SHARES_VIA_AGGREGATE trait, it can't do the final step of mapping
> > > which compute nodes can consume from the sharing provider via the
> > > aggregate, but you could do that form.
> > > That assumes that "sharing resource providers" actually work.
> > > 
> > > 
> > > Basically, what it comes down to today is that you need to monitor the
> > > available resources yourself externally and ensure you never run out of
> > > space.
> > > That sucks, but until we properly track things in placement there is
> > > nothing we can really do.
> > > The two approaches I suggested above might work for a subset of
> > > use cases, but really this is a feature that needs native support in
> > > nova to address properly.
> > > 
> > > > 
> > > > Thanks in advance!
> > > > 
> > > > [0] https://bugs.launchpad.net/nova/+bug/1908133
> > > > [1] https://bugs.launchpad.net/nova/+bug/1522307
> > > > 
> > 
> > 
> > 
> > 
> > 
> 




