[nova][dev] Bug about disabled compute during scheduling
Belmiro/Surya,

I'm trying to follow up on something Belmiro mentioned at the summit before I forget about it.

CERN sets this value low:

https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.m...

And as a result, when disabling nova-computes during maintenance, you can fail during scheduling because placement only returns resource providers for disabled computes.

I believe Dan and I kicked around some ideas on how we could deal with this, like either via a periodic in the compute service or when the compute service is disabled in the API, we would set the 'reserved' inventory value equal to the total to take those computes out of scheduling. I think Belmiro said this is what CERN is doing today as a workaround?

For the latter solution, I don't know if we'd proxy that change directly from nova-api to placement, or make an RPC cast/call to nova-compute to do it, but that's an implementation detail.

I mostly just want to make sure we get a bug reported for this so we don't lose track of it. Can one of you open a bug with your scenario and current workaround?

--

Thanks,

Matt
Hi Matt,

Thanks for looking into this.

On Wed, Dec 5, 2018 at 10:27 PM Matt Riedemann <mriedemos@gmail.com> wrote:

> Belmiro/Surya,
>
> I'm trying to follow up on something Belmiro mentioned at the summit before I forget about it.
>
> CERN sets this value low:
>
> https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.m...
>
> And as a result, when disabling nova-computes during maintenance, you can fail during scheduling because placement only returns resource providers for disabled computes.
>
> I believe Dan and I kicked around some ideas on how we could deal with this, like either via a periodic in the compute service or when the compute service is disabled in the API, we would set the 'reserved' inventory value equal to the total to take those computes out of scheduling.

I just read the discussion on the channel and saw that a couple of other approaches were proposed in addition to the two above, such as traits and neg-aggregates.

> I think Belmiro said this is what CERN is doing today as a workaround?

As far as I know, we don't have it in production, but I will let Belmiro confirm this.

> For the latter solution, I don't know if we'd proxy that change directly from nova-api to placement, or make an RPC cast/call to nova-compute to do it, but that's an implementation detail.
>
> I mostly just want to make sure we get a bug reported for this so we don't lose track of it. Can one of you open a bug with your scenario and current workaround?

We have already filed a bug for this: https://bugs.launchpad.net/nova/+bug/1805984. I will add the workaround we have to the description.

------------
Regards,
Surya.
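
For anyone reading this thread later, the failure mode and the 'reserved' workaround can be sketched with a small, self-contained simulation. This is only an illustration under stated assumptions: `Provider`, `get_candidates`, `schedule`, and `reserve_all` are hypothetical names, not nova or placement code, and the "first `limit` providers" ordering is a simplification. The one part taken from placement itself is the capacity rule, (total - reserved) * allocation_ratio.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical stand-in for a compute node's resource provider."""
    name: str
    total: int                  # total inventory, e.g. VCPU
    reserved: int = 0           # amount withheld from scheduling
    used: int = 0
    allocation_ratio: float = 1.0
    disabled: bool = False      # compute service status; the capped query ignores it

    def free(self) -> int:
        # Capacity rule: (total - reserved) * allocation_ratio, minus usage.
        return int((self.total - self.reserved) * self.allocation_ratio) - self.used


def get_candidates(providers, requested, limit):
    """Mimic a capped candidate query: the first `limit` providers with room."""
    fits = [p for p in providers if p.free() >= requested]
    return fits[:limit]


def schedule(providers, requested, limit):
    """Mimic the scheduler discarding disabled computes after the capped query."""
    candidates = get_candidates(providers, requested, limit)
    enabled = [p for p in candidates if not p.disabled]
    return enabled[0].name if enabled else None


def reserve_all(provider):
    """Workaround discussed in the thread: set reserved equal to total."""
    provider.reserved = provider.total


providers = [
    Provider("cn1", total=16, disabled=True),
    Provider("cn2", total=16, disabled=True),
    Provider("cn3", total=16),
]

# With a low limit, only the two disabled computes come back, so scheduling fails.
before = schedule(providers, requested=4, limit=2)

# Reserving everything on disabled computes removes them from the candidate list.
for p in providers:
    if p.disabled:
        reserve_all(p)

after = schedule(providers, requested=4, limit=2)
print(before, after)  # None cn3
```

In this sketch the disabled computes still exist as providers, but with zero free capacity they no longer consume slots in the capped result, so the enabled compute is returned.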
participants (2)
- Matt Riedemann
- Surya Seetharaman