Re: Customization of nova-scheduler

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Wed Jun 2 14:12:57 UTC 2021


Hi Sean,

Maybe this is the time to bring up the discussion regarding preemptible
instance support in Nova again.

Preemptible/spot instances are available in all of the major public clouds
to allow better resource utilization. OpenStack private clouds suffer from
exactly the same issue.

There was a lot of work done in this area during the last 3 years.

Most of the work is summarized in the blogs/presentations/CERN GitLab links
that you mentioned.

CERN has been running this code in production for about a year now. It allows
us to use the spare capacity of compute nodes dedicated to specific
services to run batch workloads.

I heard that "ARDC Nectar Research Cloud" is also running it.

I believe the work that was done is an excellent PoC.

Also, to me this looks like it should be a Nova feature. Having an external
project to support this functionality is a huge overhead.

cheers,

Belmiro


On Tue, Jun 1, 2021 at 11:03 PM Sean Mooney <smooney at redhat.com> wrote:

> On Mon, 2021-05-31 at 17:21 +0100, Stephen Finucane wrote:
> > On Mon, 2021-05-31 at 13:44 +0200, levonmelikbekjan at yahoo.de wrote:
> > > Hello Stephen,
> > >
> > > I am a student from Germany who is currently working on his bachelor
> thesis. My job is to build a cloud solution for my university with
> Openstack. The functionality should include the prioritization of users. So
> that you can imagine exactly how the whole thing should work, I would like
> to give you an example.
> > >
> > > Two cases should be solved!
> > >
> > > Case 1: A user A with a low priority uses a VM from Openstack with
> half the performance of the available host. Then user B comes in with a high
> priority and needs the full performance of the host for his VM. When
> creating the VM of user B, the VM of user A should be deleted because there
> is not enough compute power for user B. The VM of user B is successfully
> created.
> > >
> > > Case 2: A user A with a low priority uses a VM with half the
> performance of the available host, then user B comes in with a high
> priority and needs half of the performance of the host for his VM. When
> creating the VM of user B, user A should not be deleted, since enough
> computing power is available for both users.
> > >
> One thing to keep in mind is that end users are not allowed to know the
> capacity of the cloud in terms of the number of hosts, the resources on a
> host, or which host their VM is placed on. So as a user, the concept of "a
> low priority user uses a VM from OpenStack with half the performance of the
> available host" is not something that you can express architecturally in
> nova. Flavors define the size of VMs in absolute terms, i.e. 4GB of RAM,
> not relative terms like "50% of the host".
> We have a 3-layer scheduling process that starts with a query to the
> placement service for a set of quantitative resource classes and
> qualitative traits. That produces a set of allocation candidates against a
> series of hosts that could fit the instance. We then filter those hosts
> using python filters, which are boolean functions that either pass the host
> or reject it. Finally, after filtering, we weigh the remaining hosts and
> select one to boot the VM.
>
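> For illustration only, an out-of-tree filter is just a boolean check per
> host. A minimal sketch (the 1024 MB headroom rule is made up for the
> example, it is not an existing in-tree filter):
>
>     from nova.scheduler import filters
>
>     class HeadroomFilter(filters.BaseHostFilter):
>         """Reject hosts that would drop below 1 GB of free RAM."""
>
>         def host_passes(self, host_state, spec_obj):
>             # host_state carries the tracked inventory of one candidate
>             # host, spec_obj is the RequestSpec for the new instance.
>             return host_state.free_ram_mb - spec_obj.memory_mb >= 1024
>
> Filters like this run on every candidate host for every scheduling request,
> which is why they need to stay cheap.
>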
> Once you have completed a step in this process you can no longer go back to
> a previous step, and you can never re-add a host for consideration after it
> has been eliminated by placement or a filter. As a result, if you get to the
> end of the available hosts and there are none that can fit your VM, we
> cannot delete a VM and start again without redoing all the work and possibly
> racing with concurrent API requests.
> This is why this is a hard problem without an external service that can
> rebalance existing workloads and free up capacity.
>
>
>
> > > These cases should work for unlimited users. In order to optimize the
> whole thing, I would like to write a function that precisely calculates all
> performance components to determine whether enough resources are available
> for the VM of the high priority user.
> >
> > What you're describing is commonly referred to as "preemptible" or "spot"
> > instances. This topic has a long, complicated history in nova and has
> yet to be
> > implemented. Searching for "preemptible instances openstack" should
> yield you
> > lots of discussion on the topic along with a few proof-of-concept
> approaches
> > using external services or out-of-tree modifications to nova.
> >
> > > I’m new to Openstack, but I’ve already implemented cloud projects with
> Microsoft Azure and have solid programming skills. Can you give me a hint
> where and how I can start?
> >
> > As hinted above, this is likely to be a very difficult project given the
> fraught
> > history of the idea. I don't want to dissuade you from this work but you
> should
> > be aware of what you're getting into from the start. If you're serious
> about
> > pursuing this, I suggest you first do some research on prior art. As
> noted
> > above, there is lots of information on the internet about this. With this
> > research done, you'll need to decide whether this is something you want
> to
> > approach within nova itself, via out-of-tree extensions or via a third
> party
> > project. If you're opting for integration with nova, then you'll need to
> think
> > long and hard about how you would design such a system and start working
> on a
> > spec (a design document) outlining your proposed solution. Details on
> how to
> > write a spec are discussed at [1]. The only extension points nova offers
> today
> > are scheduler filters and weighers so your options for an out-of-tree
> extension
> > approach will be limited. A third party project will arguably be the
> easiest
> > approach but you will be restricted to talking to nova's REST APIs which
> may
> > limit the design somewhat. This Blazar spec [2] could give you some
> ideas on
> > this approach (assuming it was never actually implemented, though it may
> well
> > have been).
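> >
> > As a purely hypothetical sketch of that third-party approach, such a
> > service could mark low-priority instances with a metadata key and delete
> > one when a high-priority boot fails for lack of capacity. Something along
> > these lines with openstacksdk (the "preemptible" metadata key and the
> > cloud name are assumptions, not an existing convention):
> >
> >     import openstack
> >
> >     conn = openstack.connect(cloud='mycloud')
> >
> >     def free_some_capacity():
> >         # Candidates are servers an operator tagged as preemptible via
> >         # server metadata.
> >         victims = [s for s in conn.compute.servers(details=True)
> >                    if s.metadata.get('preemptible') == 'true']
> >         if not victims:
> >             return False
> >         # Naive heuristic: delete the newest preemptible instance first.
> >         victims.sort(key=lambda s: s.created_at, reverse=True)
> >         conn.compute.delete_server(victims[0])
> >         return True
> >
> > The hard part is everything around this: deciding when to run it, which
> > instances to sacrifice, and how to retry the failed boot.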
> >
> > > My university gave me three compute hosts and one control host to
> implement this solution for the bachelor thesis. I’m currently setting up
> Openstack and all the services on the control host all by myself to
> understand all the functionality (sorry for not using Packstack) 😉. All my
> hosts have CentOS 7 and the minimum deployment which I configure is Train.
> > >
> > > My idea is to work with nova schedulers, because they seem to be
> interesting for my case. I've found a whole infrastructure description of
> the provisioning of an instance in Openstack
> https://docs.openstack.org/operations-guide/de/_images/provision-an-instance.png.
>
> > >
> > > The nova scheduler
> https://docs.openstack.org/operations-guide/ops-customize-compute.html is
> the first component, where it is possible to implement functions via Python
> and the Compute API
> https://docs.openstack.org/api-ref/compute/?expanded=show-details-of-specific-api-version-detail,list-servers-detail
> to check for active VMs and probably delete them if needed before a
> successful request for an instantiation can be made.
> > >
> > > What do you guys think about it? Does it seem like a good starting
> point for you or is it the wrong approach?
> >
> > This could potentially work, but I suspect there will be serious
> performance
> > implications with this, particularly at scale. Scheduler filters are
> > historically used for simple things like "find me a group of hosts that
> have
> > this metadata attribute I set on my image". Making API calls sounds like
> > something that would take significant time and therefore slow down the
> scheduling
> > process. You'd also have to decide what your heuristic for deciding
> which VM(s)
> > to delete would be, since there's nothing obvious in nova that you could
> use.
> > You could use something as simple as flavor extra specs or something as
> > complicated as an external service.
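> >
> > As a toy illustration of such a heuristic (the priority value and where
> > it comes from are entirely made up here, e.g. a flavor extra spec an
> > operator sets), victim selection could be as simple as:
> >
> >     def pick_victims(servers, needed_ram_mb):
> >         # servers: list of dicts such as
> >         #   {'id': ..., 'ram_mb': 2048, 'priority': 10}
> >         # Lower priority is preempted first, freeing just enough RAM
> >         # for the pending high-priority instance.
> >         victims, freed = [], 0
> >         for s in sorted(servers, key=lambda s: s['priority']):
> >             if freed >= needed_ram_mb:
> >                 break
> >             victims.append(s)
> >             freed += s['ram_mb']
> >         return victims, freed >= needed_ram_mb
> >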
> Yes, implementing preemption in the scheduler as a filter was discussed in
> the past and discounted because of the performance implications Stephen
> hinted at. In tree, we currently do not allow filters to make any API or DB
> queries. That approach also will not work today since you would have to
> re-execute the query to the placement service after deleting an instance
> when you run out of capacity and restart the filtering, which a filter
> cannot do, as I noted above.
>
> the most recent spec in this area was
> https://review.opendev.org/c/openstack/nova-specs/+/438640 for the
> integrated approach and
> https://review.opendev.org/c/openstack/nova-specs/+/554212/12 which
> proposed adding a pending state for use with a standalone service:
>
> https://gitlab.cern.ch/ttsiouts/ReaperServicePrototype
>
> There are a number of presentations on this from CERN/StackHPC:
> https://www.stackhpc.com/scientific-sig-at-the-dublin-ptg.html
>
> http://openstack-in-production.blogspot.com/2018/02/maximizing-resource-utilization-with.html
>
> https://openlab.cern/sites/openlab.web.cern.ch/files/2018-07/Containers_on_Baremetal_and_Preemptible_VMs_at_CERN_and_SKA.pdf
>
> https://indico.cern.ch/event/739089/sessions/282073/attachments/1689073/2717151/ASDF_preemptible.pdf
>
>
> The current state is that rebuilding from cell0 is not supported; the
> pending state was never added and the reaper service was never upstreamed.
>
> Work in this area has now moved to the Blazar project, as Stephen noted in [2]:
>
> https://specs.openstack.org/openstack/blazar-specs/specs/ussuri/blazar-preemptible-instances.html
> but I don't think it has made much progress:
> https://review.opendev.org/q/topic:%22preemptibles%22+(status:open%20OR%20status:merged)
>
> Nova previously had a pluggable scheduler that would have allowed you to
> reimplement the scheduler entirely from scratch, but we removed that
> capability in the last year or two. At this point the only viable approach
> that will not take multiple upstream cycles is really to use an
> external service.
>
> >
> > This should be lots to get you started. Once again, do make sure you're
> aware of
> > what you're getting yourself into before you start. This could get
> complicated
> > very quickly :)
>
> Yes, anything other than adding the pending state to nova will be very
> complex due to the placement interaction.
> You would really need to implement a fallback query mechanism in the
> scheduler itself; anything after the call to placement is already too late.
> You might be able to reuse consumer types to make some allocations
> preemptible and have a prefilter decide whether an allocation should be a
> normal nova consumer or a preemptible consumer based on
> a flavor extra spec:
> https://docs.openstack.org/placement/train/specs/train/approved/2005473-support-consumer-types.html
> This would still require the pending state and an external reaper service
> to free the capacity, to be clear, but it's a possible direction.
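>
> A very rough sketch of what such a prefilter could look like (purely
> hypothetical: nova has no such hook wired to consumer types today, and
> both the "preemptible" extra spec and the attribute it sets are invented
> for illustration):
>
>     def preemptible_request_filter(ctxt, request_spec):
>         # Runs before the placement query. Decide whether this boot
>         # should be accounted against a hypothetical "preemptible"
>         # consumer type instead of the normal nova consumer type.
>         extra_specs = request_spec.flavor.extra_specs
>         if extra_specs.get('preemptible') == 'true':
>             # A later reaper service could reclaim any allocation made
>             # under this consumer type to free capacity.
>             request_spec.requested_consumer_type = 'PREEMPTIBLE'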
>
>
> >
> > Cheers,
> > Stephen
> >
> > > I'm very happy to have found you!!!
> > >
> > > Thank you really much for your time!
> >
> >
> > [1] https://specs.openstack.org/openstack/nova-specs/readme.html
> > [2]
> https://specs.openstack.org/openstack/blazar-specs/specs/ussuri/blazar-preemptible-instances.html
> >
> > > Best regards
> > > Levon
> > >
> > > -----Original Message-----
> > > From: Stephen Finucane <stephenfin at redhat.com>
> > > Sent: Monday, 31 May 2021 12:34
> > > To: Levon Melikbekjan <levonmelikbekjan at yahoo.de>;
> openstack at lists.openstack.org
> > > Subject: Re: Customization of nova-scheduler
> > >
> > > On Wed, 2021-05-26 at 22:46 +0200, Levon Melikbekjan wrote:
> > > > Hello Openstack team,
> > > >
> > > > is it possible to customize the nova-scheduler via Python? If yes,
> how?
> > >
> > > Yes, you can provide your own filters and weighers. This is documented
> at [1].
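> > >
> > > For a sense of scale, a custom weigher is only a handful of lines. A
> > > minimal sketch that simply prefers emptier hosts (illustrative only;
> > > it mirrors what the in-tree RAM weigher already does):
> > >
> > >     from nova.scheduler import weights
> > >
> > >     class FreeRamWeigher(weights.BaseHostWeigher):
> > >         def _weigh_object(self, host_state, weight_properties):
> > >             # Higher return values make the host more likely to win.
> > >             return host_state.free_ram_mb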
> > >
> > > Hope this helps,
> > > Stephen
> > >
> > > [1]
> https://docs.openstack.org/nova/latest/user/filter-scheduler#writing-your-own-filter
> > >
> > > >
> > > > Best regards
> > > > Levon
> > > >
> > >
> > >
> >
> >
> >
>
>
>
>