<div dir="auto">Nice idea, but my main question is, how do you plan to beat the ones implemented currently? <div dir="auto"><br></div><div dir="auto">I have been doing a little research on some techniques to try to beat the random resource allocation schedulers.</div><div dir="auto"><br></div><div dir="auto">Can you share more about your research and/or implementation idea?</div><div dir="auto"><br></div><div dir="auto">Cheers.<br><br><div data-smartmail="gmail_signature" dir="auto">---<br>Alvaro Soto.<br><br>Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.<br>----------------------------------------------------------<br>Great people talk about ideas,<br>ordinary people talk about things,<br>small people talk... about other people.</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 3, 2023, 2:00 PM Dominik Danelski <<a href="mailto:ddanelski@cloudferro.com">ddanelski@cloudferro.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Hello,<br>

<br>

<br>

I would like to introduce you to the tool developed under the working <br>

title "Scheduler Optimiser". It is meant to test the effectiveness of <br>

different Scheduler configurations, both weights and filters, on a given <br>

list of VM orders and in a semi-realistic infrastructure.<br>

<br>

My company - CloudFerro - has been preparing it in-house for the last few <br>

months and foresees publishing the project as FOSS once it reaches the <br>

MVP stage. To make the final result more useful to the community and <br>

speed up the development (and release), I humbly ask for your expertise: <br>

Are you aware of previous similar efforts? Do you notice any flaws in <br>

the current approach? What, in your opinion, are the more important aspects <br>

of the infrastructure behaviour, and what can be relatively safely <br>

ignored in terms of the effect on Scheduler results/allocation?<br>

<br>

<br>

Project objectives:<br>

<br>

  * Use Devstack (or another OpenStack deployer) with a real Scheduler<br>

    to replay a list of compute VM orders, either real from one's<br>

    infrastructure or artificially created.<br>

  * Assess the effectiveness of the scheduling in various terms like:<br>

    "How many machines of a given type can still be allocated at the<br>

    moment?" using plug-in "success meters". In a strict sense, the<br>

    project does not simulate THE Scheduler but interacts with it.<br>

  * Use fake-virt to emulate huge architectures on a relatively tiny<br>

    test bench.<br>

  * Have as few changes as possible, and ideally none, to Devstack's<br>

    code that could not be included in the upstream repository. The<br>

    usage should be as simple as: 1. Install Devstack. 2. Configure<br>

    Devstack's cluster with its infrastructure information like flavours<br>

    and hosts. 3. Configure Scheduler for a new test case. 4. Replay VM<br>

    orders. 5. Repeat steps 3 and 4 to find better Scheduler settings.<br>

  * Facilitate creating a minimal required setup of the test bench. Not<br>

    by replacing standard Devstack scripts, but mainly through tooling<br>

    for quickly rebuilding data like flavours, infrastructure state, and<br>

    other factors relevant to the simulation.<br>
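<br>
The "configure, replay, repeat" loop from steps 3 to 5 above could be driven by a small sweep helper. This is only a sketch under assumptions: sweep and run_experiment are hypothetical names, and run_experiment stands for whatever applies one Scheduler configuration, replays the orders and returns a single "total success" number. The option names in the usage note are merely examples of weight settings.<br>
<pre>
```python
from itertools import product


def sweep(grid, run_experiment):
    """Try every combination of settings in `grid`.

    grid: dict mapping an option name to the list of values to try.
    run_experiment: hypothetical callable taking one config dict and
    returning a numeric score (higher is better).
    Returns (best_score, best_config, all_results).
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        # One full "configure Scheduler + replay orders" cycle per config.
        results.append((run_experiment(config), config))
    best_score, best_config = max(results, key=lambda item: item[0])
    return best_score, best_config, results
```
</pre>
For example, sweep({"cpu_weight_multiplier": [-1.0, 1.0], "ram_weight_multiplier": [-1.0, 1.0]}, run_experiment) would run four experiments and report the best-scoring combination.<br>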

<br>

<br>

Outside of the scope:<br>

<br>

  * Running continuous analysis on the production environment, even if<br>

    some plug-ins could be extracted for this purpose.<br>

  * Retaining information about users and projects when replaying orders.<br>

  * (Probably / low priority) replaying actions other than VM<br>

    creation/deletion as they form a minority of operations and ignoring<br>

    them should not have a significant effect on the comparison experiments.<br>

<br>

<br>

Current state:<br>

<br>

Implemented:<br>

<br>

  * Recreating flavours from a JSON file exported via the OpenStack CLI.<br>

  * Replaying a list of orders in the form of (creation_date,<br>

    termination_date, resource_id (optional), flavor_id) with basic<br>

    flavour properties like VCPU, RAM, and DISK GB. The orders are<br>

    replayed consecutively.<br>

  * Plug-in success-rater mechanism which runs rater classes (each returning<br>

    a quantified success measure) after each VM add/delete action and retains<br>

    their intermediate history and a "total success" value; how the latter is<br>

    defined is implementation dependent. The first rater classes interact<br>

    with Placement, e.g. "How many VMs of flavour x (with basic parameters<br>

    for now) can fit in the cluster?" or "How many hosts are empty?".<br>
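<br>
To illustrate the plug-in rater idea described above, here is a minimal sketch. It is not the project's code: Flavor, Host, EmptyHosts, FitCount and run_raters are hypothetical names, and an in-memory list of hosts stands in for the data that would really come from Placement.<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flavor:
    flavor_id: str
    vcpus: int
    ram_mb: int
    disk_gb: int  # all three resources are assumed to be positive


@dataclass
class Host:
    name: str
    vcpus: int
    ram_mb: int
    disk_gb: int
    vms: dict = field(default_factory=dict)  # resource_id -> Flavor

    def capacity_for(self, flavor):
        """How many more VMs of `flavor` fit on this host."""
        used = list(self.vms.values())
        free_cpu = self.vcpus - sum(f.vcpus for f in used)
        free_ram = self.ram_mb - sum(f.ram_mb for f in used)
        free_disk = self.disk_gb - sum(f.disk_gb for f in used)
        return min(free_cpu // flavor.vcpus,
                   free_ram // flavor.ram_mb,
                   free_disk // flavor.disk_gb)


class EmptyHosts:
    """Rater: "How many hosts are empty?"."""
    name = "empty_hosts"

    def rate(self, hosts):
        return float(sum(1 for host in hosts if not host.vms))


class FitCount:
    """Rater: "How many VMs of flavour x can fit in the cluster?"."""

    def __init__(self, flavor):
        self.flavor = flavor
        self.name = "fit_" + flavor.flavor_id

    def rate(self, hosts):
        return float(sum(host.capacity_for(self.flavor) for host in hosts))


def run_raters(raters, hosts, history):
    """Call every rater once and append its result to its history."""
    for rater in raters:
        history.setdefault(rater.name, []).append(rater.rate(hosts))
```
</pre>
Calling run_raters after every VM add/delete action builds the intermediate history per rater; a "total success" value could then be any aggregate over that history, which the sketch deliberately leaves open.<br>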

<br>

<br>

Missing:<br>

<br>

  * Recreating hosts; see the fake-virt remark in "Risks and Challenges".<br>

  * Tools facilitating Scheduler configuration.<br>

  * Creating VMs with more parameters like VGPU, traits, and aggregates.<br>

  * (Lower priority) saving the intermediate state of the cluster during<br>

    simulation, i.e. allocations, to analyse it without rerunning the<br>

    experiment. Currently, only the quantified meters are saved.<br>

  * Failing gracefully and saving all information in case of resource<br>

    depletion. This is close to completion; only the handling of one<br>

    exception type in the upper layers is still needed.<br>

  * More success meters.<br>

<br>

<br>

Risks and Challenges:<br>

<br>

  * Currently, the tool replays actions one by one: it waits for each<br>

    creation and deletion to be complete before running success raters<br>

    and taking another order. Thus, the order of actions is important,<br>

    but not their absolute time and temporal density. This might skip<br>

    some side-effects of a realistic execution.<br>

  * Similarly to the above, fake-virt provides simple classes that will<br>

    not reproduce some behaviours of real-world hypervisors. For example, the<br>

    Scheduler avoids hosts that have recently failed to allocate a VM,<br>

    but most likely fake-virt will not mock such behaviour.<br>

  * Fake-virt should reproduce a real, diverse infrastructure instead of<br>

    x copies of the same flavour. This might be the only, but a very<br>

    important change to the OpenStack codebase. If successful, it could<br>

    benefit other projects and tests as well.<br>
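<br>
The first risk above (order matters, absolute time does not) can be made concrete with a small sketch of sequential replay. It assumes the order tuple described under "Implemented"; Order, to_events and replay are hypothetical names, and processing deletions before creations at identical timestamps is an assumption of this sketch, not a documented behaviour of the tool.<br>
<pre>
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Order:
    creation_date: datetime
    termination_date: Optional[datetime]  # None: never deleted in the trace
    flavor_id: str
    resource_id: Optional[str] = None


def to_events(orders):
    """Flatten orders into (timestamp, action, resource_id, flavor_id) tuples,
    sorted by timestamp."""
    events = []
    for index, order in enumerate(orders):
        # Orders without a resource_id get a generated placeholder id.
        rid = order.resource_id or "order-%d" % index
        events.append((order.creation_date, "create", rid, order.flavor_id))
        if order.termination_date is not None:
            events.append((order.termination_date, "delete", rid, order.flavor_id))
    # Assumption: at equal timestamps, deletions are handled first.
    events.sort(key=lambda event: (event[0], 0 if event[1] == "delete" else 1))
    return events


def replay(events, handle):
    """Apply events strictly one at a time. Timestamps only decide the
    order; gaps between them are ignored, so temporal density is lost."""
    for _timestamp, action, rid, flavor_id in events:
        handle(action, rid, flavor_id)
```
</pre>
Here handle is a placeholder for whatever creates or deletes the VM, waits for completion and then runs the success raters.<br>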

<br>

<br>

Even though the list of missing features is seemingly larger, the most <br>

important parts of the program are already there, so we hope to finish <br>

the MVP development in a relatively short amount of time. We are going <br>

to publish it as FOSS in either case, but, as mentioned, your observations <br>

would be very much welcome at this stage. I am also open to answering <br>

more questions about the project.<br>

<br>

<br>

Kind regards<br>

<br>

Dominik Danelski<br>

<br>

</blockquote></div>