<div dir="ltr"><div>Hello there,</div><div><br></div><div>Thanks, Sean, for the suggestions. I've tested them and reported my findings in <a href="https://bugs.launchpad.net/nova/+bug/1908133">https://bugs.launchpad.net/nova/+bug/1908133</a></div><div><br></div><div>Your links helped me a lot in figuring out that my placement aggregates were set up incorrectly, and the fake reservation worked slightly better than reserved_host_disk_mb (more details on that in the bug update). And it works very well on Rocky+, so that's very good.<br></div><div><br></div><div>This problem is now much more manageable, thanks for the suggestions!</div><div><br></div><div>Regards,<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jan 8, 2021 at 7:13 PM Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, 2021-01-08 at 18:27 -0300, Rodrigo Barbieri wrote:<br>

> Thanks for the responses Eugen and Sean!<br>

> <br>

> The placement.yaml approach sounds good if it can prevent the compute host<br>

> from reporting local_gb repeatedly, and then as you suggested use Placement<br>

> Aggregates I can perhaps make that work for a subset of use cases. Too bad<br>

> it is only available on Victoria+. I was looking for something that could<br>

> work, even if partially, on Queens and Stein.<br>

> <br>

> The cron job updating the reservation, I'm not sure if it will clash with<br>

> the host updates (being overridden, as I've described in the LP bug), but<br>

> you actually gave me another idea. I may be able to create a fake<br>

> allocation in the nodes to cancel out their reported values, and then rely<br>

> only on the shared value through placement.<br>

Well, actually, you could use the reserved host disk space config value to do that on older releases:<br>

just set it equal to the pool size.<br>

<a href="https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_disk_mb" rel="noreferrer" target="_blank">https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_disk_mb</a><br>

I'm not sure why that is in MB (it really should be GB), but anyway, if you set it, it will set the reserved value in placement.<br>
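<br>
A minimal sketch of the unit conversion involved, assuming the pool size is known in GB and that 1 GB is counted as 1024 MB. The function names and the 10240 GB pool are hypothetical; only the option name reserved_host_disk_mb comes from the nova configuration reference linked above.<br>
<pre>
```python
MB_PER_GB = 1024  # assumption: binary units, 1 GB = 1024 MB


def reserved_host_disk_mb(pool_size_gb):
    """Return the value that reserves the whole pool on one compute node."""
    pool_size_gb = int(pool_size_gb)
    if min(pool_size_gb, 0) != 0:
        raise ValueError("pool size must not be negative")
    return pool_size_gb * MB_PER_GB


def nova_conf_snippet(pool_size_gb):
    """Render the nova.conf lines a deployment tool could write out."""
    value = reserved_host_disk_mb(pool_size_gb)
    return "[DEFAULT]\nreserved_host_disk_mb = {}\n".format(value)


# Hypothetical 10240 GB ceph pool.
print(nova_conf_snippet(10240))
```
</pre>
The generated lines would go into nova.conf on every compute node that reports the pool as its local disk.<br>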

<br>

> <br>

> Monitoring Ceph is only part of the problem. The second part, if you end up<br>

> needing it (and you may if you're not very conservative in the monitoring<br>

> parameters and have unpredictable workload) is to prevent new instances<br>

> from being created, thus new data from being stored, to prevent it from<br>

> filling up before you can react to it (think of an accidental DoS attack by<br>

> running certain storage-heavy workloads).<br>

> <br>

> @Eugen, yes. I was actually looking for more reliable ways to prevent it<br>

> from happening.<br>

> <br>

> Overall, the shared placement + fake allocation sounded like the cleanest<br>

> workaround for me. I will try that and report back.<br>

<br>

If I get time in the next week or two, I'm hoping to try and tweak our ceph CI job to test<br>

that topology in the upstream CI, but just looking at the placement functional tests, it should work.<br>

<br>

This covers the use of sharing resource providers:

<a href="https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml" rel="noreferrer" target="_blank">https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml</a><br>

<br>

The final section tests the allocation candidates endpoint and asserts we get an allocation for both providers:

<a href="https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml#L135-L143" rel="noreferrer" target="_blank">https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/shared-resources.yaml#L135-L143</a><br>

<br>

It's relatively simple to read this file top to bottom, and it's only 143 lines long, but it basically steps<br>

through and constructs the topology I was describing, or at least a similar one, and shows step by step what<br>

the different behavior will be as the RPs and aggregates are created, etc.<br>
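<br>
To illustrate the idea, here is a toy model of that topology. It is not placement code: the provider names, the aggregate name and the allocation_candidates() helper are all hypothetical, and only the MISC_SHARES_VIA_AGGREGATE trait comes from placement itself. Each compute node has no local disk inventory, and disk is taken from a sharing provider in the same aggregate.<br>
<pre>
```python
SHARING_TRAIT = "MISC_SHARES_VIA_AGGREGATE"


def allocation_candidates(providers, request):
    """Return one candidate per compute node that can satisfy the request.

    providers maps a name to {"inventory", "aggregates", "traits"};
    request maps a resource class to the amount asked for.
    """
    sharing = [n for n, p in providers.items() if SHARING_TRAIT in p["traits"]]
    candidates = []
    for name, prov in providers.items():
        if SHARING_TRAIT in prov["traits"]:
            continue  # sharing providers do not host instances themselves
        candidate = {}
        for rc, amount in request.items():
            source = None
            if prov["inventory"].get(rc, 0) >= amount:
                source = name  # satisfied from the node's own inventory
            else:
                for s in sharing:
                    shared = providers[s]
                    same_agg = not prov["aggregates"].isdisjoint(shared["aggregates"])
                    if same_agg and shared["inventory"].get(rc, 0) >= amount:
                        source = s  # satisfied via the shared aggregate
                        break
            if source is None:
                candidate = None
                break
            candidate.setdefault(source, {})[rc] = amount
        if candidate:
            candidates.append(candidate)
    return candidates


providers = {
    "compute1": {"inventory": {"VCPU": 8, "MEMORY_MB": 16384},
                 "aggregates": {"agg-ceph"}, "traits": set()},
    "compute2": {"inventory": {"VCPU": 8, "MEMORY_MB": 16384},
                 "aggregates": set(), "traits": set()},
    "ceph-pool": {"inventory": {"DISK_GB": 1000},
                  "aggregates": {"agg-ceph"}, "traits": {SHARING_TRAIT}},
}
request = {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20}
print(allocation_candidates(providers, request))
```
</pre>
In this model only compute1 yields a candidate, split across the compute node and the pool, because compute2 is not in the aggregate and has no disk of its own.<br>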

<br>

The main issue with this approach is that we don't really have a good way to upgrade existing deployments to this topology beyond<br>

live migrating everything one node at a time so that their allocations get reshaped as a side effect of the move operation.<br>

<br>

Looking at the history of this file, it was added 3 years ago <a href="https://github.com/openstack/placement/commit/caeae7a41ed41535195640dfa6c5bb58a7999a9b" rel="noreferrer" target="_blank">https://github.com/openstack/placement/commit/caeae7a41ed41535195640dfa6c5bb58a7999a9b</a><br>

around Stein, although it may also have worked before that; I'm not sure when we added sharing providers.<br>

<br>

> <br>

> Thanks for the help!<br>

> <br>

> On Wed, Jan 6, 2021 at 10:57 AM Eugen Block <<a href="mailto:eblock@nde.ag" target="_blank">eblock@nde.ag</a>> wrote:<br>

> <br>

> > Hi,<br>

> > <br>

> > we're using OpenStack with Ceph in production and also have customers<br>

> > doing that.<br>

> >  From my point of view fixing nova to be able to deal with shared<br>

> > storage of course would improve many things, but it doesn't liberate<br>

> > you from monitoring your systems. Filling up a ceph cluster should be<br>

> > avoided and therefore proper monitoring is required.<br>

> > <br>

> > I assume you were able to resolve the frozen instances?<br>

> > <br>

> > Regards,<br>

> > Eugen<br>

> > <br>

> > <br>

> > Zitat von Sean Mooney <<a href="mailto:smooney@redhat.com" target="_blank">smooney@redhat.com</a>>:<br>

> > <br>

> > > On Tue, 2021-01-05 at 14:17 -0300, Rodrigo Barbieri wrote:<br>

> > > > Hi Nova folks and OpenStack operators!<br>

> > > > <br>

> > > > I have had some trouble recently where, while using the "images_type = rbd"<br>

> > > > libvirt option, my ceph cluster got filled up without my noticing and froze<br>

> > > > all my nova services and instances.<br>

> > > > <br>

> > > > I started digging and investigating why and how I could prevent or<br>

> > > > work around this issue, but I didn't find a very reliable, clean way.<br>

> > > > <br>

> > > > I documented all my steps and investigation in bug 1908133 [0]. It has been<br>

> > > > marked as a duplicate of 1522307 [1], which has been around for quite some<br>

> > > > time, so I am wondering if any operators have been using nova + ceph in<br>

> > > > production with "images_type = rbd" config set and how you have been<br>

> > > > handling/working around the issue.<br>

> > > <br>

> > > This is indeed a known issue, and the long-term plan to fix it was to<br>

> > > track shared storage as a sharing resource provider in placement.<br>

> > > That never happened, so there is currently no mechanism<br>

> > > available to prevent this explicitly in nova.<br>

> > > <br>

> > > The disk filter, which is no longer used, could prevent the boot of a<br>

> > > VM that would fill the ceph pool, but it could not protect against<br>

> > > two concurrent requests filling the pool.<br>

> > > <br>

> > > Placement can protect against that due to the transactional nature of<br>

> > > allocations, which serialise all resource usage; however, since each<br>

> > > host reports the total size of the ceph pool as its local storage,<br>

> > > that won't work out of the box.<br>

> > > <br>

> > > As a quick hack, what you can do is set<br>

> > > [DEFAULT]/disk_allocation_ratio=(1/number of compute nodes)<br>

> > > <br>

> > > <a href="https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.disk_allocation_ratio" rel="noreferrer" target="_blank">https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.disk_allocation_ratio</a><br>

> > > in the config of each of your compute agents.<br>

> > > <br>

> > > <br>

> > > That will prevent oversubscription; however, it has other negative<br>

> > > side effects, mainly that you will fail to schedule instances that<br>

> > > could boot if a host exceeds its 1/n usage,<br>

> > > so unless you have perfectly balanced consumption this is not a good<br>

> > > approach.<br>
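<br>
A small worked example of that side effect, assuming placement computes capacity as (total - reserved) * allocation_ratio. The pool size, node names and helper functions are hypothetical and only serve to show the arithmetic.<br>
<pre>
```python
def per_node_capacity_gb(pool_size_gb, num_compute_nodes, reserved_gb=0):
    """Capacity seen for one node: (total - reserved) * allocation_ratio."""
    disk_allocation_ratio = 1.0 / num_compute_nodes
    return (pool_size_gb - reserved_gb) * disk_allocation_ratio


def fits_on_node(node_used_gb, request_gb, capacity_gb):
    """True if the request still fits under this node's share of the pool."""
    return capacity_gb >= node_used_gb + request_gb


# Hypothetical numbers: a 1000 GB pool shared by 4 compute nodes.
capacity = per_node_capacity_gb(1000, 4)
used_gb = {"compute1": 250, "compute2": 0, "compute3": 0, "compute4": 0}

for node, used in used_gb.items():
    print(node, "accepts a 10 GB disk:", fits_on_node(used, 10, capacity))
```
</pre>
Here compute1 has used up its 1/n share and rejects the request even though three quarters of the pool is still free, which is the unbalanced case described above.<br>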

> > > <br>

> > > A better approach, but one that requires external scripting, is to have<br>

> > > a cron job that will update the reserved<br>

> > > usage of each of the disk_gb inventories to the actual amount of<br>

> > > storage allocated from the pool.<br>
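<br>
A sketch of the value such a cron job could compute per node. One assumption here goes beyond the suggestion above: the node's own allocations are subtracted from the pool usage so they are not counted twice (once as reserved, once as allocations). The function names and numbers are hypothetical, pool usage is assumed to be measured in the same provisioned GB as the allocations, and the sketch does not address whether the compute service later overwrites the reserved value.<br>
<pre>
```python
def reserved_gb_for_node(pool_total_gb, pool_used_gb, node_allocated_gb):
    """Reserved value for one node: pool usage caused by everyone else."""
    used_by_others = max(pool_used_gb - node_allocated_gb, 0)
    return min(used_by_others, pool_total_gb)


def free_gb_seen_for_node(pool_total_gb, reserved_gb, node_allocated_gb):
    """Free space on the node's inventory with an allocation ratio of 1."""
    return pool_total_gb - reserved_gb - node_allocated_gb


# Hypothetical pool: 1000 GB total, 600 GB allocated across two nodes.
pool_total_gb = 1000
allocated_gb = {"compute1": 400, "compute2": 200}
pool_used_gb = sum(allocated_gb.values())

for node, allocated in allocated_gb.items():
    reserved = reserved_gb_for_node(pool_total_gb, pool_used_gb, allocated)
    free = free_gb_seen_for_node(pool_total_gb, reserved, allocated)
    print(node, "reserved:", reserved, "free:", free)
```
</pre>
With these values every node ends up advertising the same free space, equal to what is actually left in the pool at the time the job runs.<br>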

> > > <br>

> > > The real fix, however, is for nova to track its shared usage in<br>

> > > placement correctly as a sharing resource provider.<br>

> > > <br>

> > > It's possible you might be able to do that via the provider.yaml file<br>

> > > by overriding the local disk_gb to 0 on all compute nodes,<br>

> > > then creating a single sharing resource provider of disk_gb that<br>

> > > models the ceph pool.<br>

> > > <br>

> > > <br>

> > <a href="https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/provider-config-file.html" rel="noreferrer" target="_blank">https://specs.openstack.org/openstack/nova-specs/specs/ussuri/approved/provider-config-file.html</a><br>

> > > Currently that does not support the addition of providers to placement<br>

> > > aggregates, so while it could be used to zero out the compute node<br>

> > > disk inventories and to create a sharing provider with the<br>

> > > MISC_SHARES_VIA_AGGREGATE trait, it can't do the final step of mapping<br>

> > > which compute nodes can consume from the sharing provider via the<br>

> > > aggregate, but you could do that form.<br>

> > > That assumes that "sharing resource providers" actually work.<br>

> > > <br>

> > > <br>

> > > Basically, what it comes down to today is that you need to monitor the<br>

> > > available resources yourself externally and ensure you never run out of<br>

> > > space.<br>

> > > That sucks, but until we properly track things in placement there is<br>

> > > nothing we can really do.<br>

> > > The two approaches I suggested above might work for a subset of<br>

> > > use cases, but really this is a feature that needs native support in<br>

> > > nova to address properly.<br>

> > > <br>

> > > > <br>

> > > > Thanks in advance!<br>

> > > > <br>

> > > > [0] <a href="https://bugs.launchpad.net/nova/+bug/1908133" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1908133</a><br>

> > > > [1] <a href="https://bugs.launchpad.net/nova/+bug/1522307" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1522307</a><br>

> > > > <br>


</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Rodrigo Barbieri<div>MSc Computer Scientist</div><div>OpenStack Manila Core Contributor</div><div>Federal University of São Carlos</div><div><br></div></div></div></div></div></div></div></div>