“I would generally say that it’s not a race, although it is undefined behavior.”
Is just calling the “add host to aggregate” API <https://docs.openstack.org/api-ref/compute/#add-host> concurrently undefined behavior?
On Wed, 2024-04-24 at 21:07 +0000, Michael Sherman wrote:

The effect that has on instances that are being scheduled is: they may or may not see the host in the aggregate, and we have no API guarantee about what will happen. So you cannot depend on the update being seen by the scheduler immediately. For any requests that were made before the add-host command completed, we cannot guarantee that they will see the new host, but we also cannot guarantee that they won't. This is because a request might be sitting in the RabbitMQ queue for a non-deterministic period of time, so we do not know whether the scheduler, when it processes that request, will see the old value or the new one.
My sequence of operations is:
1. Add N hosts to an aggregate concurrently (a rough sketch of this step follows the list)
2. Wait a while (minutes to hours), and verify that “aggregate show” lists the correct hosts in the aggregate
3. Then attempt to schedule N compute instances, with a filter that checks host membership in the aggregate
4. Observe scheduling failures
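For step 1, the concurrent calls look roughly like this (a minimal sketch using openstacksdk; the cloud name, aggregate name, and host names are placeholders, not our actual code):

```python
# Minimal sketch of the concurrent add-host calls (step 1); the cloud
# name, aggregate name, and host names below are placeholders.
from concurrent.futures import ThreadPoolExecutor

import openstack

conn = openstack.connect(cloud="devstack")
agg = conn.compute.find_aggregate("test-aggregate")
hosts = [f"fake-host-{i}" for i in range(10)]

# Each submission issues its own add-host API request, so the N
# requests race against each other on the server side.
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    futures = [pool.submit(conn.compute.add_host_to_aggregate, agg, host)
               for host in hosts]
    for f in futures:
        f.result()  # surface any API errors
```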
There is some level of caching in the scheduler. For example, if you add new hosts to a cloud and map them to cells, you need to restart the scheduler to clear the cell cache. You may be hitting a similar caching issue, but I don't think we cache the aggregate membership in the same way.
To rule out our code as the issue, I was able to reproduce the behavior using devstack on master, using the nova_fake driver with 10 fake compute services and the aggregate_instance_extra_specs filter instead of ironic and the blazar-nova filter.
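Roughly, the devstack knobs for this look like the following (an illustrative local.conf sketch built on devstack's fake virt driver support, not the exact configuration used):

```
[[local|localrc]]
VIRT_DRIVER=fake
NUMBER_FAKE_NOVA_COMPUTE=10

[[post-config|$NOVA_CONF]]
[filter_scheduler]
enabled_filters = AggregateInstanceExtraSpecsFilter,ComputeFilter
```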
So long as the N “add_host_to_aggregate” calls to nova_api are made in parallel, there’s a decent probability that the host_state aggregate info passed to the filters will not agree with the values in the DB.
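For context, the filters only ever see aggregate membership through the cached host_state, roughly like this (a simplified illustration, not the actual aggregate_instance_extra_specs or blazar-nova filter code; the aggregate name is a placeholder):

```python
# Simplified illustration of how a filter consumes the scheduler's
# cached aggregate view; not the actual filter code.
from nova.scheduler import filters


class AggregateMembershipFilter(filters.BaseHostFilter):
    """Pass only hosts that the scheduler's cache puts in the aggregate."""

    def host_passes(self, host_state, spec_obj):
        # host_state.aggregates is populated from the HostManager's
        # in-memory cache, not from a fresh DB read, so a stale cache
        # surfaces here as spurious filter failures.
        return any(agg.name == "my-reservation-agg"  # placeholder name
                   for agg in host_state.aggregates)
```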
You might want to set https://docs.openstack.org/nova/latest/configuration/config.html#filter_sche... to false:

```
track_instance_changes

Type: boolean
Default: True

Enable querying of individual hosts for instance information. The scheduler may need information about the instances on a host in order to evaluate its filters and weighers. The most common need for this information is for the (anti-)affinity filters, which need to choose a host based on the instances already running on a host. If the configured filters and weighers do not need this information, disabling this option will improve performance. It may also be disabled when the tracking overhead proves too heavy, although this will cause classes requiring host usage data to query the database on each request instead.
```

I did not think that affected the aggregate membership, but it might be worth testing.
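For concreteness, that option lives in the [filter_scheduler] group of nova.conf:

```
[filter_scheduler]
track_instance_changes = False
```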
This doesn’t depend on launching instances quickly after making the changes; the inconsistency does not seem to ever resolve until nova-scheduler is restarted.
Right, so that sounds like you're hitting a caching issue.

We have a fanout RPC that updates all running schedulers with the updated aggregate info https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... which is called on aggregate create https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... and aggregate update https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... and when we add or remove hosts from an aggregate https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174...

That updates the cached aggregate associations in the HostManager by calling https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174... which calls https://github.com/openstack/nova/blob/ca1db54f1bc498528ac3c8601157cb32e5174...

So that should not require a restart, based on what I'm seeing in the code. The cache in the scheduler will get updated once the scheduler has time to process that RPC call.
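To sketch the shape of that mechanism (a simplified illustration only; the linked nova source is the real implementation, and the class and method names below are paraphrased, not nova's exact API):

```python
# Simplified sketch of the fanout-update pattern described above; see
# the links in this mail for the real nova code. Names are paraphrased.


class SchedulerNotifier:
    """API side: broadcast aggregate changes to every scheduler."""

    def __init__(self, rpc_client):
        self.rpc = rpc_client

    def update_aggregates(self, context, aggregates):
        # fanout=True means every running nova-scheduler receives the
        # cast whenever it next drains its RPC queue -- so the cache
        # converges eventually, not immediately.
        self.rpc.cast(context, "update_aggregates",
                      aggregates=aggregates, fanout=True)


class HostManager:
    """Scheduler side: the in-memory aggregate cache the filters read."""

    def __init__(self):
        self.aggs_by_id = {}           # aggregate uuid -> aggregate
        self.host_aggregates_map = {}  # host name -> set of aggregate uuids

    def update_aggregates(self, aggregates):
        # Refresh the cached associations for each updated aggregate.
        for agg in aggregates:
            self.aggs_by_id[agg.uuid] = agg
            for host in agg.hosts:
                self.host_aggregates_map.setdefault(host, set()).add(agg.uuid)
```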
-Mike