[nova] workarounds and operator experience around bug 1522307/1908133

Sean Mooney smooney at redhat.com
Thu Jan 14 17:15:38 UTC 2021


On Thu, 2021-01-14 at 13:50 -0300, Rodrigo Barbieri wrote:
> Hello there,
> 
> Thanks Sean for the suggestions. I've tested them and reported my findings
> in https://bugs.launchpad.net/nova/+bug/1908133
> 
> Your links helped me a lot in figuring out that my placement aggregates
> were set up incorrectly, and the fake reservation worked slightly better
> than the reserved_host_disk_mb (more details on that in the bug update).
> And it works very well on Rocky+, so that's very good.
> 
> This problem is now much more manageable, thanks for the suggestions!
I'm glad to hear it worked.
I'm still hoping to see if I can configure our ceph multi-node job to replicate
this shared-provider configuration in our CI and test it, but I likely won't get to that
until after feature freeze at M3. Assuming I can get it to work there too, we can document
a procedure for how to do this, and next cycle we can consider whether there is a clean way to
automate the process.

Thanks for updating the bug with your findings :)
> 
> Regards,
> 
> On Fri, Jan 8, 2021 at 7:13 PM Sean Mooney <smooney at redhat.com> wrote:
> 
> > On Fri, 2021-01-08 at 18:27 -0300, Rodrigo Barbieri wrote:
> > > Thanks for the responses Eugen and Sean!
> > > 
> > > The placement.yaml approach sounds good if it can prevent the compute host
> > > from reporting local_gb repeatedly, and then, as you suggested, by using
> > > Placement Aggregates I can perhaps make that work for a subset of use cases.
> > > Too bad it is only available on Victoria+. I was looking for something that
> > > could work, even if partially, on Queens and Stein.
> > > 
> > > The cron job updating the reservation, I'm not sure if it will clash with
> > > the host updates (being overridden, as I've described in the LP bug), but
> > > you actually gave me another idea. I may be able to create a fake
> > > allocation on the nodes to cancel out their reported values, and then rely
> > > only on the shared value through placement.
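> > > Something along these lines with the osc-placement plugin is what I have in
> > > mind (the consumer UUID and the DISK_GB amount below are placeholders for
> > > illustration, not values from my environment):
> > > 
> > >   # find the compute node resource provider and its reported DISK_GB total
> > >   openstack resource provider list
> > >   openstack resource provider inventory list <compute_rp_uuid>
> > > 
> > >   # write a "fake" allocation against a made-up consumer that consumes the
> > >   # whole locally reported DISK_GB, leaving only the shared value usable
> > >   openstack resource provider allocation set $(uuidgen) \
> > >       --allocation rp=<compute_rp_uuid>,DISK_GB=<reported_total_gb>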
> > Well, actually, on older releases you could use the host reserved disk space
> > config value to do that: just set it equal to the pool size.
> > 
> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_disk_mb
> > Not sure why that option is in MB (it really should be GB), but if you set it,
> > the reserved value is reflected in placement.
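> > For example, on each compute node something like this (the 500 TiB pool size
> > here is made up; use your real pool capacity converted to MB):
> > 
> >   [DEFAULT]
> >   # reserve the full pool size so the locally reported DISK_GB can never be
> >   # consumed, and scheduling relies only on the shared value
> >   reserved_host_disk_mb = 524288000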
> > 
> > > 
> > > Monitoring Ceph is only part of the problem. The second part, if you end up
> > > needing it (and you may, if you're not very conservative in the monitoring
> > > parameters and have an unpredictable workload), is to prevent new instances
> > > from being created, and thus new data from being stored, so the cluster
> > > doesn't fill up before you can react to it (think of an accidental DoS attack
> > > by running certain storage-heavy workloads).
> > > 
> > > @Eugen, yes. I was actually looking for more reliable ways to prevent it
> > > from happening.
> > > 
> > > Overall, the shared placement + fake allocation sounded like the cleanest
> > > workaround for me. I will try that and report back.
> > 
> > If I get time in the next week or two I'm hoping to try and tweak our ceph
> > CI job to test that topology in the upstream CI, but just looking at the
> > placement functional tests it should work.
> > 
> > This covers the use of sharing resource providers:
> > 
> > https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml
> > 
> > The final section tests the allocation candidate endpoint and asserts that we
> > get an allocation for both providers:
> > 
> > https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml#L135-L143
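> > Once such a topology is in place you can sanity-check the same thing against a
> > real deployment with something like this (microversion 1.10 is the minimum for
> > allocation candidates; the requested amounts are just example values):
> > 
> >   openstack --os-placement-api-version 1.10 allocation candidate list \
> >       --resource VCPU=1 --resource MEMORY_MB=512 --resource DISK_GB=100
> > 
> > and check that the returned candidates pair each compute node RP with the
> > sharing DISK_GB provider rather than the node's own disk inventory.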
> > 
> > It's relatively simple to read this file top to bottom, and it's only 143
> > lines long; it basically steps through and constructs the topology I was
> > describing (or at least a similar one) and shows, step by step, how the
> > behaviour changes as the RPs are created, the aggregates are created, etc.
> > 
> > The main issue with this approach is that we don't really have a good way to
> > upgrade existing deployments to this topology beyond live migrating everything
> > one node at a time so that the allocations get reshaped as a side effect of
> > the move operation.
> > 
> > Looking at the history of this file, it was added 3 years ago
> > https://github.com/openstack/placement/commit/caeae7a41ed41535195640dfa6c5bb58a7999a9b
> > around Stein, although it may also have worked before that; I'm not sure when
> > we added sharing providers.
> > 
> > > 
> > > Thanks for the help!
> > > 
> > > On Wed, Jan 6, 2021 at 10:57 AM Eugen Block <eblock at nde.ag> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > we're using OpenStack with Ceph in production and also have customers
> > > > doing that.
> > > > From my point of view, fixing nova to be able to deal with shared
> > > > storage would of course improve many things, but it doesn't free you
> > > > from monitoring your systems. Filling up a ceph cluster should be
> > > > avoided, and therefore proper monitoring is required.
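> > > > For example, keep an eye on (and alert on) the pool fill level with the
> > > > usual ceph tooling (which thresholds to alert on is up to your environment):
> > > > 
> > > >   # overall and per-pool usage, including the pools backing nova/glance/cinder
> > > >   ceph df detail
> > > > 
> > > >   # cluster health will also warn as OSDs approach the nearfull ratio
> > > >   ceph health detail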
> > > > 
> > > > I assume you were able to resolve the frozen instances?
> > > > 
> > > > Regards,
> > > > Eugen
> > > > 
> > > > 
> > > > Quoting Sean Mooney <smooney at redhat.com>:
> > > > 
> > > > > On Tue, 2021-01-05 at 14:17 -0300, Rodrigo Barbieri wrote:
> > > > > > Hi Nova folks and OpenStack operators!
> > > > > > 
> > > > > > I have had some trouble recently where, while using the "images_type = rbd"
> > > > > > libvirt option, my ceph cluster got filled up without me noticing and froze
> > > > > > all my nova services and instances.
> > > > > > 
> > > > > > I started digging and investigating why, and how I could prevent or work
> > > > > > around this issue, but I didn't find a very reliable, clean way.
> > > > > > 
> > > > > > I documented all my steps and investigation in bug 1908133 [0]. It has been
> > > > > > marked as a duplicate of 1522307 [1], which has been around for quite some
> > > > > > time, so I am wondering if any operators have been using nova + ceph in
> > > > > > production with the "images_type = rbd" config set and how you have been
> > > > > > handling/working around the issue.
> > > > > 
> > > > > This is indeed a known issue, and the long-term plan to fix it was to
> > > > > track shared storage as a sharing resource provider in placement. That
> > > > > never happened, so there is currently no mechanism available to prevent
> > > > > this explicitly in nova.
> > > > > 
> > > > > The DiskFilter, which is no longer used, could prevent the boot of a VM
> > > > > that would fill the ceph pool, but it could not protect against two
> > > > > concurrent requests from filling the pool.
> > > > > 
> > > > > Placement can protect against that due to the transactional nature of
> > > > > allocations, which serialises all resource usage; however, since each host
> > > > > reports the total size of the ceph pool as its local storage, that won't
> > > > > work out of the box.
> > > > > 
> > > > > As a quick hack, what you can do is set
> > > > > [DEFAULT]/disk_allocation_ratio=(1/number of compute nodes)
> > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.disk_allocation_ratio
> > > > > in each of your compute agents' configs.
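> > > > > For example, with 10 compute nodes sharing the pool, that would be something
> > > > > like this in nova.conf on every node (the node count here is only an example):
> > > > > 
> > > > >   [DEFAULT]
> > > > >   # 1 / 10 compute nodes, so the sum of all hosts' reported DISK_GB
> > > > >   # roughly matches the real pool capacity
> > > > >   disk_allocation_ratio = 0.1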
> > > > > 
> > > > > 
> > > > > That will prevent oversubscription, however it has other negative side
> > > > > effects: mainly that you will fail to schedule instances that could have
> > > > > booted if a host exceeds its 1/n usage, so unless you have perfectly
> > > > > balanced consumption this is not a good approach.
> > > > > 
> > > > > A better approach, but one that requires external scripting, is to have a
> > > > > cron job that will update the reserved value of each of the DISK_GB
> > > > > inventories to the actual amount of storage allocated from the pool.
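> > > > > Roughly, such a job could look something like the sketch below, using the
> > > > > osc-placement plugin (the totals, used amounts and provider selection are
> > > > > placeholders you would fill in from your own deployment, e.g. "ceph df"):
> > > > > 
> > > > >   #!/bin/bash
> > > > >   TOTAL_GB=<pool size in GB>
> > > > >   USED_GB=<GB currently consumed in the pool, e.g. from "ceph df">
> > > > > 
> > > > >   # push the real pool usage into every compute node's DISK_GB inventory
> > > > >   # as "reserved", so placement stops handing out space that is already used
> > > > >   for RP in $(openstack resource provider list -f value -c uuid); do
> > > > >       openstack resource provider inventory class set "$RP" DISK_GB \
> > > > >           --total "$TOTAL_GB" --reserved "$USED_GB"
> > > > >   done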
> > > > > 
> > > > > The real fix, however, is for nova to track its shared usage in placement
> > > > > correctly as a sharing resource provider.
> > > > > 
> > > > > It's possible you might be able to do that via the provider.yaml file,
> > > > > by overriding the local DISK_GB to 0 on all compute nodes and then
> > > > > creating a single sharing resource provider of DISK_GB that models the
> > > > > ceph pool.
> > > > > 
> > > > > 
> > > > > https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/provider-config-file.html
> > > > > Currently that does not support adding providers to placement aggregates,
> > > > > so while it could be used to zero out the compute node disk inventories
> > > > > and to create a sharing provider with the MISC_SHARES_VIA_AGGREGATE trait,
> > > > > it can't do the final step of mapping which compute nodes can consume from
> > > > > the sharing provider via the aggregate, but you could do that step manually.
> > > > > That assumes that "sharing resource providers" actually work.
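> > > > > The manual part could look roughly like this with the osc-placement plugin
> > > > > (the provider name, pool size, UUIDs and microversions below are
> > > > > placeholder assumptions, not a tested procedure):
> > > > > 
> > > > >   # one sharing provider modelling the ceph pool
> > > > >   openstack resource provider create ceph-shared-disk
> > > > >   openstack resource provider inventory class set <sharing_rp_uuid> \
> > > > >       DISK_GB --total <pool_size_gb>
> > > > >   openstack --os-placement-api-version 1.6 resource provider trait set \
> > > > >       --trait MISC_SHARES_VIA_AGGREGATE <sharing_rp_uuid>
> > > > > 
> > > > >   # put the sharing provider and every compute node RP in the same
> > > > >   # placement aggregate so the computes can consume DISK_GB from it
> > > > >   openstack --os-placement-api-version 1.6 resource provider aggregate set \
> > > > >       --aggregate <agg_uuid> <sharing_rp_uuid>
> > > > >   openstack --os-placement-api-version 1.6 resource provider aggregate set \
> > > > >       --aggregate <agg_uuid> <compute_rp_uuid>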
> > > > > 
> > > > > 
> > > > > Basically, what it comes down to today is that you need to monitor the
> > > > > available resources yourself externally and ensure you never run out of
> > > > > space. That sucks, but until we properly track things in placement there
> > > > > is nothing we can really do. The two approaches I suggested above might
> > > > > work for a subset of use cases, but really this is a feature that needs
> > > > > native support in nova to address properly.
> > > > > 
> > > > > > 
> > > > > > Thanks in advance!
> > > > > > 
> > > > > > [0] https://bugs.launchpad.net/nova/+bug/1908133
> > > > > > [1] https://bugs.launchpad.net/nova/+bug/1522307
> > > > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> > 
> > 
> 




