@Alvaro Soto
The 2 main motivations are:
1. By default, the VMs are allocated evenly, so we sometimes lack space for very large VMs. A less balanced system with "reservations" for them, along the lines of "fill this remaining space with a smaller VM only if no other option is available", could be designed.
2. Patterns such as one customer ordering dozens of identical VMs in a row can be observed. Knowledge about them could perhaps be used to handle them better.

Currently, there are no very concrete ideas on how to proceed beyond what I wrote above. We expect to tinker with the settings a bit at first with these ideas in mind, and hope that iterative research will lead to better settings or filters.

@Sean
Thank you very much; that is a lot of knowledge, which we will definitely use as a starting point. It's hard for me to comment more about that at this point, but I'm grateful and will keep these hints in mind when we get to the experimentation phase.

Kind regards
Dominik Danelski

On 4.07.2023 13:08, smooney@redhat.com wrote:
Not to bias either approach, but just to ensure you both understand the current recommendation: when configuring the scheduler there are some rules of thumb that should be followed.
First, if it can be done by both a scheduler filter and placement, then placement should be faster. When specifying the filters, operators are encouraged to order them following a pseudo-ID3 algorithm approach; in other words, always put the filters that eliminate the most hosts first, i.e. aggregate* filters, then simple host-specific filters (num instances), then complex host-specific filters like the NUMA topology filter. Finally, if a filter is not useful for your cloud (i.e. you have no PCI devices for passthrough) you can remove it from the enabled filters list.
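The ordering rule above can be illustrated with a small, self-contained sketch. It is not Nova code: the `HostFilter` class, the per-check `cost` values, and the made-up host attributes are all assumptions chosen only to show why cheap, highly selective filters belong first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Host = Dict[str, int]


@dataclass
class HostFilter:
    name: str
    passes: Callable[[Host], bool]
    cost: int  # hypothetical relative cost of checking one host


def run_filters(hosts: List[Host], filters: List[HostFilter]) -> Tuple[List[Host], int]:
    """Apply filters in order; return surviving hosts and total check cost."""
    survivors = list(hosts)
    total_cost = 0
    for f in filters:
        total_cost += f.cost * len(survivors)  # every remaining host is checked
        survivors = [h for h in survivors if f.passes(h)]
    return survivors, total_cost


# 100 made-up hosts; only every tenth one is in the requested aggregate.
hosts = [
    {"id": i, "in_aggregate": int(i % 10 == 0), "instances": i % 4, "numa_ok": int(i % 3 != 0)}
    for i in range(100)
]

aggregate = HostFilter("aggregate", lambda h: h["in_aggregate"] == 1, cost=1)
num_instances = HostFilter("num_instances", lambda h: h["instances"] < 2, cost=1)
numa = HostFilter("numa_topology", lambda h: h["numa_ok"] == 1, cost=20)

# Selective and cheap first, expensive last ...
good, good_cost = run_filters(hosts, [aggregate, num_instances, numa])
# ... versus the expensive filter first.
bad, bad_cost = run_filters(hosts, [numa, num_instances, aggregate])

print(good_cost, bad_cost)
```

Both orderings keep the same hosts, because the filters are simply ANDed together; only the amount of work differs.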
For the weighers, all weighers are enabled by default and all enabled weighers run on each host that passed the filters. So there is no ordering to consider with weighers, but if you don't need them for your cloud, just like the filters, you can remove them from the enabled set.
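Why ordering does not matter for weighers can be seen in a simplified sketch of weighted ranking. This is an illustration under assumptions, not Nova's implementation: each weigher's raw values are scaled to the 0..1 range across the surviving hosts, multiplied by a per-weigher multiplier, and summed. The function names and host dictionaries are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Host = Dict[str, object]
Weigher = Tuple[Callable[[Host], float], float]  # (raw value function, multiplier)


def normalize(values: List[float]) -> List[float]:
    """Scale values to 0..1; all-equal inputs contribute nothing."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weigh_hosts(hosts: List[Host], weighers: List[Weigher]) -> List[Tuple[Host, float]]:
    """Return (host, total weight) pairs sorted best-first."""
    totals = [0.0] * len(hosts)
    for raw_fn, multiplier in weighers:
        normed = normalize([raw_fn(h) for h in hosts])
        for i, n in enumerate(normed):
            totals[i] += multiplier * n  # a sum, so weigher order is irrelevant
    return sorted(zip(hosts, totals), key=lambda pair: pair[1], reverse=True)


def free_ram(host: Host) -> float:
    return float(host["free_ram_mb"])


hosts = [
    {"name": "h1", "free_ram_mb": 1024},
    {"name": "h2", "free_ram_mb": 4096},
    {"name": "h3", "free_ram_mb": 8192},
]

spread = weigh_hosts(hosts, [(free_ram, 1.0)])   # emptiest host ranks first
pack = weigh_hosts(hosts, [(free_ram, -1.0)])    # fullest host ranks first
print(spread[0][0]["name"], pack[0][0]["name"])
```

In this sketch a negative multiplier turns a spreading weigher into a packing one, which is the kind of knob relevant to keeping room for very large VMs.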
For placement there are only 2 things you can really tweak: the max number of results and randomisation of allocation candidates.
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.m...
https://docs.openstack.org/placement/latest/configuration/config.html#placem...
I know CERN set max_placement_results very low, i.e. <20; the default is 1000, but most don't modify this. I generally think setting randomize_allocation_candidates is good, especially if you reduce max_placement_results.
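A toy sketch of why the two settings interact, assuming the limit simply truncates an ordered result list (the function below is a stand-in, not the Placement API): with a low limit and no randomisation, the scheduler keeps seeing the same few hosts.

```python
import random
from typing import List


def allocation_candidates(all_hosts: List[str], max_results: int,
                          randomize: bool, rng=random) -> List[str]:
    """Return at most max_results candidates, optionally as a random sample."""
    pool = list(all_hosts)
    if randomize:
        rng.shuffle(pool)  # shuffle, then truncate: a random sample of the full set
    return pool[:max_results]


all_hosts = [f"host{i}" for i in range(100)]
rng = random.Random(0)  # seeded only to make the illustration repeatable

seen_fixed = set()
seen_random = set()
for _ in range(50):
    seen_fixed.update(allocation_candidates(all_hosts, 5, randomize=False))
    seen_random.update(allocation_candidates(all_hosts, 5, randomize=True, rng=rng))

# Without randomisation only the first 5 hosts are ever offered.
print(len(seen_fixed), len(seen_random))
```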
As noted, there are quite a few flags to delegate filters to placement, like
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.q...
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.l...
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.p...
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.q...
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.q...
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.i...
These often replace filters like the AZ filter with a more efficient placement query, so they should generally be preferred. However, they don't always have exactly the same behaviour, so sometimes you can't just swap from one to another if you rely on the semantics of the filter that are changed.
There are some options like
https://docs.openstack.org/nova/latest/configuration/config.html#filter_sche...
https://docs.openstack.org/nova/latest/configuration/config.html#filter_sche...
that allow you to add a little bit of additional randomness to the host selection, which may help with spreading/packing or overall better utilisation of hosts, but it should not really affect performance of the scheduler. I generally recommend setting shuffle_best_same_weighed_hosts=true but leaving host_subset_size at its default of 1; that way you only get randomness if there are multiple equally good options.
By default the scheduler should be deterministic today if you do not adjust either host_subset_size or shuffle_best_same_weighed_hosts.
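The interplay of the two options can be sketched as follows. This is a simplified model of the behaviour described above, not the scheduler's code; `pick_host` and its arguments are hypothetical names that merely mirror the option names.

```python
import random
from typing import List, Tuple


def pick_host(weighed: List[Tuple[str, float]], host_subset_size: int = 1,
              shuffle_best_same_weighed_hosts: bool = False, rng=random) -> str:
    """Pick a host from (host, weight) pairs that are already sorted best-first."""
    if not weighed:
        raise ValueError("no hosts passed the filters")
    weighed = list(weighed)
    if shuffle_best_same_weighed_hosts:
        best_weight = weighed[0][1]
        n_best = sum(1 for _, w in weighed if w == best_weight)
        best = weighed[:n_best]
        rng.shuffle(best)  # randomise only among the equally best hosts
        weighed = best + weighed[n_best:]
    subset = weighed[:max(1, host_subset_size)]
    return rng.choice(subset)[0]  # with a subset of 1 this is simply the first host


ranked = [("a", 1.0), ("b", 1.0), ("c", 0.5)]

print(pick_host(ranked))                                        # always "a"
print(pick_host(ranked, shuffle_best_same_weighed_hosts=True))  # "a" or "b", never "c"
print(pick_host(ranked, host_subset_size=3))                    # any of the three
```

With the recommended combination, a worse host such as "c" is never chosen over a better one; raising host_subset_size gives up that guarantee in exchange for more spreading.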
I don't know if that will help either of your research, but those are the current best practices for configuring the scheduler.
regards sean.
On Mon, 2023-07-03 at 14:08 -0600, Alvaro Soto wrote:
Nice idea, but my main question is, how do you plan to beat the ones implemented currently?
I'm doing a little research with some techniques to try to beat the random resource allocation schedulers.
Can you share more about your research and/or implementation idea?
Cheers.
--- Alvaro Soto.
On Mon, Jul 3, 2023, 2:00 PM Dominik Danelski <ddanelski@cloudferro.com> wrote:
Hello,
I would like to introduce you to the tool developed under the working title "Scheduler Optimiser". It is meant to test the effectiveness of different Scheduler configurations, both weights and filters, on a given list of VM orders and in a semi-realistic infrastructure.
My company - CloudFerro - has been preparing it in-house for the last few months and foresees publishing the project as FOSS once it reaches the MVP stage. To make the final result more useful to the community and speed up the development (and release), I humbly ask for your expertise: Are you aware of previous similar efforts? Do you notice some flaws in the current approach? What, in your opinion, are the more important aspects of the infrastructure behaviour, and what can be relatively safely ignored in terms of the effect on Scheduler results/allocation?
Project objectives:
* Use Devstack (or another OpenStack deployer) with a real Scheduler to replay a list of compute VM orders, either real from one's infrastructure or artificially created.
* Assess the effectiveness of the scheduling in various terms like: "How many machines of a given type can still be allocated at the moment?" using plug-in "success meters". In a strict sense, the project does not simulate THE Scheduler but interacts with it.
* Use fake-virt to emulate huge architectures on a relatively tiny test bench.
* Have as little as possible, and ideally no changes to the Devstack's code that could not be included in the upstream repository. The usage should be as simple as:
  1. Install Devstack.
  2. Configure Devstack's cluster with its infrastructure information like flavours and hosts.
  3. Configure Scheduler for a new test case.
  4. Replay VM orders.
  5. Repeat steps 3 and 4 to find better Scheduler settings.
* Facilitate creating a minimal required setup of the test bench. Not by replacing standard Devstack scripts, but mainly through tooling for quick rebuilding data like flavours, infrastructure state, and other factors relevant to the simulation.
Outside of the scope:
* Running continuous analysis on the production environment, even if some plug-ins could be extracted for this purpose.
* Retaining information about users and projects when replaying orders.
* (Probably / low priority) replaying actions other than VM creation/deletion as they form a minority of operations and ignoring them should not have a distinct effect on the comparison experiments.
Current state:
Implemented:
* Recreating flavours from JSON file exported via OpenStack CLI.
* Replaying a list of orders in the form of (creation_date, termination_date, resource_id (optional), flavor_id) with basic flavour properties like VCPU, RAM, and DISK GB. The orders are replayed consecutively.
* Plug-in success-rater mechanism which runs rater classes (returning quantified success measure) after each VM add/delete action, retains their intermediate history and "total success" - how it is defined is implementation dependent. First classes interacting with Placement like: "How many VMs of flavours x (with basic parameters for now) can fit in the cluster?" or "How many hosts are empty?".
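To make the success-rater idea concrete, here is a minimal sketch of what such a plug-in interface could look like. It is not the project's code: all class names are hypothetical, the cluster is an in-memory list instead of a Placement query, "total success" is arbitrarily defined as the mean of the history, and flavours are assumed to have positive VCPU, RAM, and disk values.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flavor:
    vcpus: int
    ram_mb: int
    disk_gb: int


@dataclass
class Host:
    free_vcpus: int
    free_ram_mb: int
    free_disk_gb: int
    instances: int = 0


class Rater(ABC):
    """Base class: rate the cluster after every action and keep the history."""

    def __init__(self) -> None:
        self.history: List[float] = []

    @abstractmethod
    def rate(self, hosts: List[Host]) -> float:
        ...

    def record(self, hosts: List[Host]) -> float:
        value = self.rate(hosts)
        self.history.append(value)
        return value

    def total(self) -> float:
        # One possible "total success": the mean over all recorded steps.
        return sum(self.history) / len(self.history) if self.history else 0.0


class FlavorFitRater(Rater):
    """How many more VMs of one flavour fit in the cluster?"""

    def __init__(self, flavor: Flavor) -> None:
        super().__init__()
        self.flavor = flavor

    def rate(self, hosts: List[Host]) -> float:
        f = self.flavor
        return sum(
            min(h.free_vcpus // f.vcpus, h.free_ram_mb // f.ram_mb, h.free_disk_gb // f.disk_gb)
            for h in hosts
        )


class EmptyHostsRater(Rater):
    """How many hosts carry no instances?"""

    def rate(self, hosts: List[Host]) -> float:
        return sum(1 for h in hosts if h.instances == 0)


cluster = [Host(8, 16384, 100), Host(4, 8192, 50, instances=1)]
fit = FlavorFitRater(Flavor(vcpus=4, ram_mb=8192, disk_gb=40))
empty = EmptyHostsRater()
print(fit.record(cluster), empty.record(cluster))
```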
Missing:
* Recreating hosts, note the fake-virt remark from "Risks and Challenges".
* Tools facilitating Scheduler configuration.
* Creating VMs with more parameters like VGPU, traits, and aggregates.
* (Lower priority) saving the intermediate state of the cluster during simulation i.e. allocations to analyse it without rerunning the experiment. Currently, only the quantified meters are saved.
* Gently failing and saving all information in case of resource depletion: close to completion, handling one exception type in upper layers is needed.
* More success meters.
Risks and Challenges:
* Currently, the tool replays actions one by one: it waits for each creation and deletion to be complete before running success raters and taking another order. Thus, the order of actions is important, but not their absolute time and temporal density. This might skip some side-effects of a realistic execution.
* Similarly to the above, fake-virt provides simple classes that will not reproduce some behaviours of real-world hypervisors. For example, the Scheduler avoids hosts that had recently failed to allocate a VM, but most likely fake-virt will not mock such behaviour.
* Fake-virt should reproduce a real diverse infrastructure instead of x copies of the same flavour. This might be the only, but very important change to the OpenStack codebase. If successful, it could benefit other projects and tests as well.
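The "order matters, absolute time does not" point can be sketched by flattening orders into a strictly sequential event list. The snippet below is only an illustration: the `Order` tuple mirrors the order format from the "Implemented" list, while the function name, the treatment of a missing termination date as "still running", and the tie-break of deletions before creations at equal timestamps are assumptions.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional, Tuple


class Order(NamedTuple):
    creation_date: datetime
    termination_date: Optional[datetime]  # None: assumed to mean "never deleted"
    resource_id: Optional[str]
    flavor_id: str


def to_events(orders: List[Order]) -> List[Tuple[str, Order]]:
    """Flatten orders into a sequential list of ("create" | "delete", order)."""
    events = []
    for idx, order in enumerate(orders):
        events.append((order.creation_date, 1, idx, "create", order))
        if order.termination_date is not None:
            events.append((order.termination_date, 0, idx, "delete", order))
    # Sort by time; at equal times deletions (0) come before creations (1).
    events.sort(key=lambda e: e[:3])
    return [(action, order) for _, _, _, action, order in events]


t1, t2, t3 = datetime(2023, 7, 1), datetime(2023, 7, 2), datetime(2023, 7, 3)
orders = [
    Order(t1, t3, "vm-a", "small"),
    Order(t2, t3, "vm-b", "large"),
    Order(t3, None, "vm-c", "large"),
]

for action, order in to_events(orders):
    # A real replay would wait here for the action to complete, then run the raters.
    print(action, order.resource_id)
```

Only the relative order of events survives this flattening; gaps between timestamps are discarded, which is exactly the simplification the first risk describes.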
Even though the list of missing features is seemingly larger, the most important parts of the program are already there, so we hope to finish the MVP development in a relatively short amount of time. We are going to publish it as FOSS in either case, but as mentioned, your observations would be very much welcome at this stage. I am also open to answering more questions about the project.
Kind regards
Dominik Danelski