[nova] Scheduler Optimiser

smooney at redhat.com
Tue Jul 4 11:08:39 UTC 2023


Not to bias either approach, but just to ensure you both understand the current
recommendations: when configuring the scheduler there are some rules of thumb that should be followed.

First, if something can be done by both a scheduler filter and placement, placement should be faster.
When specifying the filters, operators are encouraged to order them following a pseudo ID3 algorithm approach;
in other words, always put the filters that eliminate the most hosts first, i.e. the aggregate* filters, then simple
host-specific filters (e.g. the NumInstancesFilter), then complex host-specific filters like the NUMATopologyFilter.
Finally, if a filter is not useful for your cloud (e.g. you have no PCI devices for passthrough) you can remove it
from the enabled filters list.
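For example, an ordering along those lines in nova.conf could look like this (purely illustrative, not a recommended default set; keep only the filters your cloud actually needs):

  [filter_scheduler]
  # aggregate* filters first (they tend to eliminate the most hosts), then
  # cheap per-host filters, then expensive per-host filters last.
  enabled_filters = AggregateInstanceExtraSpecsFilter,AggregateImagePropertiesIsolation,ComputeFilter,NumInstancesFilter,NUMATopologyFilter
  # PciPassthroughFilter is left out here on the assumption that this cloud
  # has no PCI devices for passthrough.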

For the weighers, all weighers are enabled by default and every enabled weigher runs on each host that passed
the filters, so there is no ordering to consider with weighers; but if you don't need some of them for your cloud,
just like the filters you can remove them from the enabled set.
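If you only want a subset, you can set weight_classes explicitly instead of the default all_weighers shortcut, e.g. something along these lines (illustrative subset only):

  [filter_scheduler]
  # the default is nova.scheduler.weights.all_weighers; listing classes
  # explicitly disables every weigher not named here.
  weight_classes = nova.scheduler.weights.ram.RAMWeigher,nova.scheduler.weights.cpu.CPUWeigher,nova.scheduler.weights.disk.DiskWeigher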

For placement there are only two things you can really tweak: the max number of results and the randomisation of
allocation candidates.
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.max_placement_results
https://docs.openstack.org/placement/latest/configuration/config.html#placement.randomize_allocation_candidates

I know CERN set max_placement_results very low, i.e. <20; the default is 1000 but most don't modify this.
I generally think setting randomize_allocation_candidates is good, especially if you reduce max_placement_results.
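For example (the values here are illustrative, not a recommendation):

  # nova.conf on the scheduler nodes
  [scheduler]
  max_placement_results = 100

  # placement.conf on the placement service
  [placement]
  randomize_allocation_candidates = true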

As noted, there are quite a few flags to delegate filters to placement, like:
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.query_placement_for_routed_network_aggregates
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.limit_tenants_to_placement_aggregate
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.placement_aggregate_required_for_tenants
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.query_placement_for_availability_zone
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.query_placement_for_image_type_support
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.image_metadata_prefilter


These often replace filters like the AZ filter with a more efficient placement query, so they should generally be
preferred. However, they don't always have exactly the same behaviour, so sometimes you can't just swap from one to
the other if you rely on the semantics of the filter that are changed.
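As a concrete example, assuming you are happy with the placement-level semantics, the AZ case would look something like:

  # nova.conf on the scheduler nodes
  [scheduler]
  # let placement narrow the allocation candidates to the requested
  # availability zone; the AvailabilityZoneFilter can then usually be removed
  # from [filter_scheduler]/enabled_filters, provided you do not rely on its
  # exact filter-level behaviour.
  query_placement_for_availability_zone = true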

There are some options like
https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.host_subset_size
https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.shuffle_best_same_weighed_hosts

that allow you to add a little bit of additional randomness to the host selection, which may help with
spreading/packing or overall better utilisation of hosts, but it should not really affect the performance of the
scheduler. I generally recommend setting shuffle_best_same_weighed_hosts=true but leaving host_subset_size at its
default of 1; that way you only get randomness if there are multiple equally good options.
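That combination would look like this in nova.conf:

  [filter_scheduler]
  # default is 1; raising it trades determinism for more spreading
  host_subset_size = 1
  # default is false; only randomises among hosts that share the top weight
  shuffle_best_same_weighed_hosts = true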

By default the scheduler should be deterministic today if you do not adjust either host_subset_size or
shuffle_best_same_weighed_hosts.

I don't know if that will help either of your research efforts, but those are the current best practices for
configuring the scheduler.

Regards,
Sean

On Mon, 2023-07-03 at 14:08 -0600, Alvaro Soto wrote:
> Nice idea, but my main question is: how do you plan to beat the ones
> currently implemented?
> 
> I'm doing a little research into some techniques to try to beat the
> random resource allocation schedulers.
> 
> Can you share more about your research and/or implementation idea?
> 
> Cheers.
> 
> ---
> Alvaro Soto.
> 
> Note: My work hours may not be your work hours. Please do not feel the need
> to respond during a time that is not convenient for you.
> ----------------------------------------------------------
> Great people talk about ideas,
> ordinary people talk about things,
> small people talk... about other people.
> 
> On Mon, Jul 3, 2023, 2:00 PM Dominik Danelski <ddanelski at cloudferro.com>
> wrote:
> 
> > 
> > Hello,
> > 
> > 
> > I would like to introduce you to the tool developed under the working
> > title "Scheduler Optimiser". It is meant to test the effectiveness of
> > different Scheduler configurations, both weights and filters, on a given
> > list of VM orders and in a semi-realistic infrastructure.
> > 
> > My company - CloudFerro - has been preparing it in-house for the last few
> > months and foresees publishing the project as FOSS once it reaches the
> > MVP stage. To make the final result more useful to the community and
> > speed up the development (and release), I humbly ask for your expertise:
> > Are you aware of previous similar efforts? Do you notice some flaws in
> > the current approach? What, in your opinion, are the more important aspects
> > of the infrastructure behaviour, and what can be relatively safely
> > ignored in terms of the effect on Scheduler results/allocation?
> > 
> > 
> > Project objectives:
> > 
> >   * Use Devstack (or another OpenStack deployer) with a real Scheduler
> >     to replay a list of compute VM orders, either real from one's
> >     infrastructure or artificially created.
> >   * Assess the effectiveness of the scheduling in various terms like:
> >     "How many machines of a given type can still be allocated at the
> >     moment?" using plug-in "success meters". In a strict sense, the
> >     project does not simulate THE Scheduler but interacts with it.
> >   * Use fake-virt to emulate huge architectures on a relatively tiny
> >     test bench.
> >   * Have as few changes as possible, and ideally none, to Devstack's
> >     code that could not be included in the upstream repository. The
> >     usage should be as simple as: 1. Install Devstack. 2. Configure
> >     Devstack's cluster with its infrastructure information like flavours
> >     and hosts. 3. Configure Scheduler for a new test case. 4. Replay VM
> >     orders. 5. Repeat steps 3 and 4 to find better Scheduler settings.
> >   * Facilitate creating a minimal required setup of the test bench. Not
> >     by replacing standard Devstack scripts, but mainly through tooling
> >     for quick rebuilding data like flavours, infrastructure state, and
> >     other factors relevant to the simulation.
> > 
> > 
> > Outside of the scope:
> > 
> >   * Running continuous analysis on the production environment, even if
> >     some plug-ins could be extracted for this purpose.
> >   * Retaining information about users and projects when replaying orders.
> >   * (Probably / low priority) replaying actions other than VM
> >     creation/deletion as they form a minority of operations and ignoring
> >     them should not have a distinct effect on the comparison experiments.
> > 
> > 
> > Current state:
> > 
> >     Implemented:
> > 
> >   * Recreating flavours from JSON file exported via OpenStack CLI.
> >   * Replaying a list of orders in the form of (creation_date,
> >     termination_date, resource_id (optional), flavor_id) with basic
> >     flavour properties like VCPU, RAM, and DISK GB. The orders are
> >     replayed consecutively.
> >   * Plug-in success-rater mechanism which runs rater classes (returning a
> >     quantified success measure) after each VM add/delete action and retains
> >     their intermediate history and "total success" - how that is defined
> >     is implementation dependent. The first classes interact with Placement
> >     and ask things like: "How many VMs of flavour x (with basic parameters
> >     for now) can fit in the cluster?" or "How many hosts are empty?".
> > 
> > 
> > Missing:
> > 
> >   * Recreating hosts; note the fake-virt remark from "Risks and
> >     Challenges".
> >   * Tools facilitating Scheduler configuration.
> >   * Creating VMs with more parameters like VGPU, traits, and aggregates.
> >   * (Lower priority) saving the intermediate state of the cluster (i.e.
> >     allocations) during the simulation to analyse it without rerunning the
> >     experiment. Currently, only the quantified meters are saved.
> >   * Failing gracefully and saving all information in case of resource
> >     depletion: this is close to completion; handling one exception type in
> >     the upper layers is still needed.
> >   * More success meters.
> > 
> > 
> > Risks and Challenges:
> > 
> >   * Currently, the tool replays actions one by one: it waits for each
> >     creation and deletion to be complete before running success raters
> >     and taking another order. Thus, the order of actions is important,
> >     but not their absolute time and temporal density. This might skip
> >     some side-effects of a realistic execution.
> >   * Similarly to the above, fake-virt provides simple classes that will
> >     not reproduce some behaviours of real-world hypervisors. For example,
> >     the Scheduler avoids hosts that have recently failed to allocate a VM,
> >     but most likely fake-virt will not mock such behaviour.
> >   * Fake-virt should reproduce a real diverse infrastructure instead of
> >     x copies of the same flavour. This might be the only, but a very
> >     important, change to the OpenStack codebase. If successful, it could
> >     benefit other projects and tests as well.
> > 
> > 
> > Even though the list of missing features is seemingly longer, the most
> > important parts of the program are already there, so we hope to finish
> > the MVP development in a relatively short amount of time. We are going
> > to publish it as FOSS in either case, but, as mentioned, your observations
> > would be very much welcome at this stage. I am also open to answering
> > more questions about the project.
> > 
> > 
> > Kind regards
> > 
> > Dominik Danelski
> > 
> > 



