“I would generally say that it’s not a race, although it is undefined behavior.”
Is just calling the “add host to aggregate” API <https://docs.openstack.org/api-ref/compute/#add-host> concurrently undefined behavior?
On Wed, 2024-04-24 at 21:07 +0000, Michael Sherman wrote:

The effect that has on instances that are being scheduled is: they may or may not see the host in the aggregate, and we have no API guarantee about what will happen. So you cannot depend on the update being seen by the scheduler immediately. For any requests that were made before the add-host command completed, we cannot guarantee that they will see the new host, but we also cannot guarantee that they won't. This is because a request might be sitting in the RabbitMQ queue for a non-deterministic period of time, so we do not know whether the scheduler, when it processes that request, will see the old value or the new one.
My sequence of operations is:
1. Add N hosts to an aggregate concurrently (a rough sketch of this step follows the list)
2. Wait a while (minutes to hours), and verify that “aggregate show” lists the correct hosts in the aggregate
3. Then attempt to schedule N compute instances, with a filter that checks host membership in the aggregate
4. Observe scheduling failures
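For step 1, the concurrent calls look roughly like this (a minimal sketch using openstacksdk; the cloud name, aggregate name, and host names are placeholders, not our actual code):

```python
# Minimal sketch of the concurrent add-host calls (step 1); the cloud
# name, aggregate name, and host names below are placeholders.
from concurrent.futures import ThreadPoolExecutor

import openstack

conn = openstack.connect(cloud="devstack")
agg = conn.compute.find_aggregate("test-aggregate")
hosts = [f"fake-host-{i}" for i in range(10)]

# Each submission issues its own add-host API request, so the N
# requests race against each other on the server side.
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    futures = [pool.submit(conn.compute.add_host_to_aggregate, agg, host)
               for host in hosts]
    for f in futures:
        f.result()  # surface any API errors
```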
There is some level of caching in the scheduler. For example, if you add new hosts to a cloud and map them to cells, you need to restart the scheduler to clear the cell cache. You may be hitting a similar caching issue, but I don't think we cache the aggregate membership in the same way.
To rule out our code as the issue, I was able to reproduce the behavior using devstack on master, using the nova_fake driver with 10 fake compute services and the aggregate_instance_extra_specs filter instead of ironic and the blazar-nova filter.
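Roughly, the devstack knobs for this look like the following (an illustrative local.conf sketch built on devstack's fake virt driver support, not the exact configuration used):

```
[[local|localrc]]
VIRT_DRIVER=fake
NUMBER_FAKE_NOVA_COMPUTE=10

[[post-config|$NOVA_CONF]]
[filter_scheduler]
enabled_filters = AggregateInstanceExtraSpecsFilter,ComputeFilter
```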
So long as the N “add_host_to_aggregate” calls to nova_api are made in parallel, there’s a decent probability that the host_state aggregate info passed to the filters will not agree with the values in the DB.
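For context, the filters only ever see aggregate membership through the cached host_state, roughly like this (a simplified illustration, not the actual aggregate_instance_extra_specs or blazar-nova filter code; the aggregate name is a placeholder):

```python
# Simplified illustration of how a filter consumes the scheduler's
# cached aggregate view; not the actual filter code.
from nova.scheduler import filters


class AggregateMembershipFilter(filters.BaseHostFilter):
    """Pass only hosts that the scheduler's cache puts in the aggregate."""

    def host_passes(self, host_state, spec_obj):
        # host_state.aggregates is populated from the HostManager's
        # in-memory cache, not from a fresh DB read, so a stale cache
        # surfaces here as spurious filter failures.
        return any(agg.name == "my-reservation-agg"  # placeholder name
                   for agg in host_state.aggregates)
```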
You might want to set https://docs.openstack.org/nova/latest/configuration/config.html#filter_sche... to false:

```
track_instance_changes

Type: boolean
Default: True

Enable querying of individual hosts for instance information. The scheduler may need information about the instances on a host in order to evaluate its filters and weighers. The most common need for this information is for the (anti-)affinity filters, which need to choose a host based on the instances already running on a host. If the configured filters and weighers do not need this information, disabling this option will improve performance. It may also be disabled when the tracking overhead proves too heavy, although this will cause classes requiring host usage data to query the database on each request instead.
```

I did not think that affected the aggregate membership, but it might be worth testing.
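For concreteness, that option lives in the [filter_scheduler] group of nova.conf:

```
[filter_scheduler]
track_instance_changes = False
```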
This doesn’t depend on launching instances quickly after making the changes; the inconsistency does not seem to ever resolve until nova-scheduler is restarted.
Right, so that sounds like you're hitting a caching issue.

We have a fanout RPC that updates all running schedulers with the updated aggregate info https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... which is called on aggregate create https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... and aggregate update https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... and when we add or remove hosts from an aggregate https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174...

That updates the cached aggregate associations in the HostManager by calling https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... which calls https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174...

So that should not require a restart, based on what I'm seeing in the code. The cache in the scheduler will get updated once the scheduler has time to process that RPC call.
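To sketch the shape of that mechanism (a simplified illustration only; the linked nova source is the real implementation, and the class and method names below are paraphrased, not nova's exact API):

```python
# Simplified sketch of the fanout-update pattern described above; see
# the links in this mail for the real nova code. Names are paraphrased.


class SchedulerNotifier:
    """API side: broadcast aggregate changes to every scheduler."""

    def __init__(self, rpc_client):
        self.rpc = rpc_client

    def update_aggregates(self, context, aggregates):
        # fanout=True means every running nova-scheduler receives the
        # cast whenever it next drains its RPC queue -- so the cache
        # converges eventually, not immediately.
        self.rpc.cast(context, "update_aggregates",
                      aggregates=aggregates, fanout=True)


class HostManager:
    """Scheduler side: the in-memory aggregate cache the filters read."""

    def __init__(self):
        self.aggs_by_id = {}           # aggregate uuid -> aggregate
        self.host_aggregates_map = {}  # host name -> set of aggregate uuids

    def update_aggregates(self, aggregates):
        # Refresh the cached associations for each updated aggregate.
        for agg in aggregates:
            self.aggs_by_id[agg.uuid] = agg
            for host in agg.hosts:
                self.host_aggregates_map.setdefault(host, set()).add(agg.uuid)
```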
-Mike