On 5/19/20 16:10, melanie witt wrote:
On 5/19/20 15:23, Laurent Dumont wrote:
From what we can gather, there are a couple of parameters that can be tweaked:
1. host_subset_size (return X hosts instead of 1?)
2. randomize_allocation_candidates (not 100% sure about this one)
3. shuffle_best_same_weighed_hosts (pick randomly among X computes if they are all weighed equally, instead of returning the same list for all scheduling requests)
4. max_attempts (how many times the scheduler will try to fit the instance somewhere)
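As we understand the Nova option help, host_subset_size does not return X hosts. The scheduler chooses the winner at random from the X best-weighed hosts, so concurrent schedulers with the same view of the cloud are less likely to pick the same compute. The toy model below illustrates that idea. It is not Nova's code: `pick_host`, `collision_rate` and the host names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pick_host(weighed_hosts: Sequence[Tuple[str, float]], host_subset_size: int,
              rng: random.Random) -> Tuple[str, List[str]]:
    """Pick one host at random from the top `host_subset_size` weighed hosts.

    Hypothetical simplification: sort by weight (highest first), draw the
    winner from the top N, and return the rest as alternates in weight order.
    """
    ordered = [h for h, _ in sorted(weighed_hosts, key=lambda hw: hw[1], reverse=True)]
    subset = ordered[:max(1, host_subset_size)]  # treat values below 1 as 1
    chosen = rng.choice(subset)
    alternates = [h for h in ordered if h != chosen]
    return chosen, alternates


def collision_rate(weighed_hosts: Sequence[Tuple[str, float]], host_subset_size: int,
                   trials: int = 10_000, seed: int = 42) -> float:
    """Fraction of trials in which two schedulers with the same view pick the same host."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = pick_host(weighed_hosts, host_subset_size, rng)[0]
        second = pick_host(weighed_hosts, host_subset_size, rng)[0]
        hits += first == second
    return hits / trials


cluster = [("cmp1", 10.0), ("cmp2", 9.5), ("cmp3", 9.0), ("cmp4", 1.0)]
print(collision_rate(cluster, 1))  # 1.0: both always pick cmp1
print(collision_rate(cluster, 3))  # roughly 1/3 with three equally likely picks
```

The trade-off is that a larger subset spreads racing requests apart at the cost of sometimes choosing a slightly lower-weighed host.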
We've already raised "max_attempts" to 5 from the default of 3 and will raise it further. That said, what are the recommendations for the rest of the settings? We are not exactly concerned with stacking vs spreading (but that's always nice) of the instances, but rather with making sure deployments fail only for real reasons and not just because Nova/Scheduler keeps stepping on its own toes.
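Why raising max_attempts helps with self-inflicted collisions can be pictured with a toy model. It assumes, per our reading of the option help, that the scheduler hands back the selected host plus up to max_attempts - 1 alternates, and that each host whose claim or build fails uses up one attempt. `build_with_alternates` and the host names are hypothetical, not Nova APIs.

```python
from typing import Callable, List, Optional


def build_with_alternates(selected_hosts: List[str], max_attempts: int,
                          try_build: Callable[[str], bool]) -> Optional[str]:
    """Toy model: try the chosen host, then alternates, up to max_attempts in total.

    Assumes selected_hosts is [chosen] + alternates and that every failed
    claim or build on a host consumes one attempt.
    """
    for host in selected_hosts[:max_attempts]:
        if try_build(host):
            return host
    return None  # attempts exhausted; roughly where the instance would go to ERROR


# Hypothetical scenario: concurrent requests already took the first three hosts.
taken = {"cmp1", "cmp2", "cmp3"}
candidates = ["cmp1", "cmp2", "cmp3", "cmp4", "cmp5"]
print(build_with_alternates(candidates, 3, lambda h: h not in taken))  # None
print(build_with_alternates(candidates, 5, lambda h: h not in taken))  # cmp4
```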
Here is something I wrote in the past about the anti-affinity part of what you're describing, which might be of help:
https://bugzilla.redhat.com/show_bug.cgi?id=1780380#c4
Option (2) in your list only helps if you have > 1000 hosts in your deployment and you want to make sure resource provider candidates beyond the same first 1000 are regularly made available for scheduling (by randomizing before returning the top 1000 weighted hosts). The placement API will limit the maximum number of returned allocation candidates to 1000 for performance reasons.
And for reference, here is where the limit of 1000 results comes from; it is configurable: https://docs.openstack.org/nova/queens/configuration/config.html#scheduler.m...
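The effect of option (2) can be shown with a small simulation: a capped candidate list taken from a fixed order never reaches hosts beyond the cap, while shuffling before capping gives them a turn across requests. This is a toy model under stated assumptions, not placement's actual query logic; `allocation_candidates`, the `limit` parameter and the provider names are hypothetical.

```python
import random
from typing import List, Optional


def allocation_candidates(providers: List[str], limit: int, randomize: bool,
                          rng: Optional[random.Random] = None) -> List[str]:
    """Toy model of a capped candidate list, optionally shuffled before capping."""
    pool = list(providers)  # a fixed order, e.g. the order rows come back in
    if randomize:
        (rng or random.Random()).shuffle(pool)
    return pool[:limit]


providers = [f"cn{i:04d}" for i in range(1500)]
shuffle_rng = random.Random(1)
seen_fixed, seen_random = set(), set()
for _ in range(20):  # 20 scheduling requests
    seen_fixed.update(allocation_candidates(providers, 1000, randomize=False))
    seen_random.update(allocation_candidates(providers, 1000, randomize=True, rng=shuffle_rng))
print(len(seen_fixed))   # 1000: the same first 1000 every time
print(len(seen_random))  # larger: hosts past the first 1000 get a turn
```

With 1000 or fewer providers the cap never bites, so shuffling changes only the order, not which hosts are considered, which matches the "only helps if you have > 1000 hosts" point above.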
Option (3) in your list only helps if you have lots of hosts being weighed equally and you need some randomization among hosts with exactly the same weight to help prevent collisions. This usually applies to requests for a particular NUMA topology, where many hosts end up weighed equally.
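A minimal sketch of what option (3) does, assuming a simplified model: sort hosts by weight, then shuffle only the group tied for the best weight, leaving lower-weighed hosts in place. Without the shuffle, a stable sort returns the same first host every time, so concurrent requests pile onto it. `order_hosts` and the host names are hypothetical, not Nova's code.

```python
import random
from typing import List, Tuple


def order_hosts(weighed: List[Tuple[str, float]], shuffle_best: bool,
                rng: random.Random) -> List[str]:
    """Sort hosts by weight; optionally shuffle only the hosts tied for the best weight."""
    ordered = sorted(weighed, key=lambda hw: hw[1], reverse=True)  # stable for ties
    if shuffle_best and ordered:
        best_weight = ordered[0][1]
        best = [hw for hw in ordered if hw[1] == best_weight]  # contiguous prefix
        rng.shuffle(best)
        ordered = best + ordered[len(best):]
    return [host for host, _ in ordered]


# Hypothetical: five hosts fit a NUMA request equally well, one fits worse.
numa_hosts = [("numa1", 1.0), ("numa2", 1.0), ("numa3", 1.0),
              ("numa4", 1.0), ("numa5", 1.0), ("small1", 0.5)]
order_rng = random.Random(7)
print({order_hosts(numa_hosts, False, order_rng)[0] for _ in range(100)})  # {'numa1'}
print({order_hosts(numa_hosts, True, order_rng)[0] for _ in range(100)})   # several hosts
```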
Hope this helps, -melanie