Re: Customization of nova-scheduler

Sean Mooney smooney at redhat.com
Wed Jun 2 15:04:16 UTC 2021


On Wed, 2021-06-02 at 16:12 +0200, Belmiro Moreira wrote:
> Hi Sean,
> 
> maybe this is the time to bring up again the discussion regarding
> preemptible instances support in Nova.
maybe. realistically i'm not sure we have the capacity to do the detailed design required
this cycle, but we could discuss it with an aim to having something ready for next cycle.
i still think this is a valuable capability, which is partly why i brought this topic up with
gibi this morning http://eavesdrop.openstack.org/irclogs/%23openstack-nova/latest.log.html#t2021-06-02T10:26:24
his reply is here http://eavesdrop.openstack.org/irclogs/%23openstack-nova/latest.log.html#t2021-06-02T12:00:03
i was exploring the question: do the soon to be introduced consumer types impact the design in any way?

if unified limits was aware of consumer types and we had a placement:consumer_type=preemptible extra spec, for example,
and we enhanced nova to use that, we could address some of the awkwardness in the current design where you have to have
two projects to do quota properly.
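
to make that idea a bit more concrete, here is a rough sketch of how an extra spec could map to a consumer type
and how usage could then be counted per type inside a single project. everything in it is an assumption for
illustration: the placement:consumer_type key is only the example from above, the INSTANCE default and the
upper-casing are my guess at how it would be normalised, and neither helper exists in nova or unified limits.

```python
# hypothetical names: this extra spec key and these helpers do not exist in nova.
CONSUMER_TYPE_KEY = "placement:consumer_type"
DEFAULT_CONSUMER_TYPE = "INSTANCE"  # assumed default for normal instances


def consumer_type_for(extra_specs):
    """Map a flavor's extra specs to a consumer type name (assumed upper case)."""
    value = extra_specs.get(CONSUMER_TYPE_KEY, "").strip()
    return value.upper() if value else DEFAULT_CONSUMER_TYPE


def usage_by_consumer_type(instances):
    """Sum vcpus per consumer type for the instances of one project.

    With usage split like this, a limit could be applied per consumer type
    instead of needing a second project for the preemptible quota.
    """
    usage = {}
    for inst in instances:
        ctype = consumer_type_for(inst["extra_specs"])
        usage[ctype] = usage.get(ctype, 0) + inst["vcpus"]
    return usage


if __name__ == "__main__":
    project_instances = [
        {"vcpus": 4, "extra_specs": {}},
        {"vcpus": 2, "extra_specs": {CONSUMER_TYPE_KEY: "preemptible"}},
        {"vcpus": 8, "extra_specs": {CONSUMER_TYPE_KEY: "preemptible"}},
    ]
    print(usage_by_consumer_type(project_instances))
```

the point is only that one project could carry two separate counters, one for normal and one for preemptible usage.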

effectively i think unified limits + consumer types should probably be a prerequisite.
we might want to revive the pending state also, although we now have rebuild from cell 0 i believe, so that may not be required.

if there is interest in this perhaps we should explore a subteam/popup team to pursue this again?


> 
> Preemptible/Spot instances are available in all of the major public clouds
> to allow a better resource utilization. OpenStack private clouds suffer
> exactly from the same issue.
> 
> There was a lot of work done in this area during the last 3 years.
> 
> Most of the work is summarized by the blogs/presentations/cern-gitlab that
> you mentioned.
> 
> CERN has been running this code in production since 1 year ago. It allows
> us to use the spare capacity in the compute nodes dedicated for specific
> services to run batch workloads.

yep i see utility in it for providing extra cloud capacity for ci also.
> 
> I heard that "ARDC Nectar Research Cloud" is also running it.
> 
> I believe the work that was done is an excellent PoC.
well, since cern and nectar are potentially running it already, is that not an endorsement of an
external agent approach :)

> 
> Also, to me this looks like it should be a Nova feature. Having an external
> project to support this functionality it's a huge overhead.

so we have been debating adding a new agent to nova for a while that would be
responsible for running some of the periodic healing type tasks.
we were calling it nova audit as a placeholder, but it would basically do things like archiving
deleted rows, healing allocations etc. the other logical approach would be to incorporate it into
the nova conductor, but i'm still not sold that it should be in the nova tree.

i'm not against that either, but perhaps a better approach would be to create a separate repo that
is a deliverable of nova based on the poc code and incubate it there. i'm not really convinced that an external process is
a huge overhead, but having to maintain the project, release it etc. probably is.

with that said i have always been a fan of the idea of having a common agent on a node that ran multiple services,
e.g. a way to deploy nova api, nova conductor and nova scheduler as a single binary to reduce the number of services
you need to manage, but i think that is a separate topic.

> 
> cheers,
> 
> Belmiro
> 
> 
> On Tue, Jun 1, 2021 at 11:03 PM Sean Mooney <smooney at redhat.com> wrote:
> 
> > On Mon, 2021-05-31 at 17:21 +0100, Stephen Finucane wrote:
> > > On Mon, 2021-05-31 at 13:44 +0200, levonmelikbekjan at yahoo.de wrote:
> > > > Hello Stephen,
> > > > 
> > > > I am a student from Germany who is currently working on his bachelor
> > thesis. My job is to build a cloud solution for my university with
> > Openstack. The functionality should include the prioritization of users. So
> > that you can imagine exactly how the whole thing should work, I would like
> > to give you an example.
> > > > 
> > > > Two cases should be solved!
> > > > 
> > > > Case 1: A user A with a low priority uses a VM from Openstack with
> > half performance of the available host. Then user B comes in with a high
> > priority and needs the full performance of the host for his VM. When
> > creating the VM of user B, the VM of user A should be deleted because there
> > is not enough compute power for user B. The VM of user B is successfully
> > created.
> > > > 
> > > > Case 2: A user A with a low priority uses a VM with half the
> > performance of the available host, then user B comes in with a high
> > priority and needs half of the performance of the host for his VM. When
> > creating the VM of user B, user A should not be deleted, since enough
> > computing power is available for both users.
> > > > 
> > one thing to keep in mind is that end users are not allowed to know the
> > capacity of the cloud in terms of number of hosts, the resources on a host or
> > what
> > host their vm is placed on. so as a user the concept of "a low priority
> > uses a VM from Openstack with half performance of the available host" is not
> > something that you can express architecturally in nova.
> > flavors define the size of vms in absolute terms, i.e. 4GB of ram, not relative
> > "50% of the host".
> > we have a 3 layer scheduling process that starts with a query to the
> > placement service for a set of quantitative resource classes and qualitative
> > traits.
> > that produces a set of allocation candidates against a series of hosts that
> > could fit the instance. we then filter those hosts using python filters,
> > which are boolean functions that either pass the host or reject it. finally,
> > after filtering we weigh the remaining hosts and select one to boot the
> > vm.
> > 
> > once you have completed a step in this process you can no longer go to a
> > previous step and you can never re-add a host after it has been eliminated by
> > placement or a filter to be considered again. as a result if you get to the
> > end of the available hosts and there are none that can fit your vm we cannot
> > delete a vm and start again without redoing all the work and possibly
> > racing with concurrent api requests.
> > this is why this is a hard problem without an external service that can
> > rebalance existing workloads and free up capacity.
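
the one-way flow described above can be sketched in a few lines. this is a toy model, not nova code: the
function names, the dict layout for hosts and requests and the weigher are all made up for illustration, and
the first stage only stands in for the real placement query.

```python
def placement_candidates(hosts, request):
    """Stage 1: stand-in for the placement query (resource classes and traits)."""
    return [
        h for h in hosts
        if all(h["free"].get(rc, 0) >= amount
               for rc, amount in request["resources"].items())
        and request["traits"] <= h["traits"]  # required traits must be a subset
    ]


def run_filters(hosts, request, filters):
    """Stage 2: every filter is a boolean function; a rejected host never comes back."""
    return [h for h in hosts if all(f(h, request) for f in filters)]


def pick_host(hosts, weigher):
    """Stage 3: weigh the surviving hosts and select the best one, if any."""
    return max(hosts, key=weigher) if hosts else None


def schedule(hosts, request, filters, weigher):
    """Run the three stages strictly in order; there is no way back to an earlier stage."""
    candidates = placement_candidates(hosts, request)
    survivors = run_filters(candidates, request, filters)
    return pick_host(survivors, weigher)


if __name__ == "__main__":
    hosts = [
        {"name": "host1", "free": {"VCPU": 4, "MEMORY_MB": 4096}, "traits": {"HW_CPU_X86_AVX2"}},
        {"name": "host2", "free": {"VCPU": 8, "MEMORY_MB": 8192}, "traits": set()},
    ]
    request = {"resources": {"VCPU": 8, "MEMORY_MB": 4096}, "traits": {"HW_CPU_X86_AVX2"}}
    enabled_only = lambda host, req: not host.get("disabled", False)
    most_free_ram = lambda host: host["free"]["MEMORY_MB"]
    # host1 is too small and host2 lacks the trait, so nothing is left to filter or weigh
    print(schedule(hosts, request, [enabled_only], most_free_ram))
```

when the last call returns nothing, the only option inside this flow is to fail the request. freeing capacity
and trying again has to happen somewhere outside it.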
> > 
> > 
> > 
> > > > These cases should work for unlimited users. In order to optimize the
> > whole thing, I would like to write a function that precisely calculates all
> > performance components to determine whether enough resources are available
> > for the VM of the high priority user.
> > > 
> > > What you're describing is commonly referred to as "preemptible" or "spot"
> > > instances. This topic has a long, complicated history in nova and has
> > yet to be
> > > implemented. Searching for "preemptible instances openstack" should
> > yield you
> > > lots of discussion on the topic along with a few proof-of-concept
> > approaches
> > > using external services or out-of-tree modifications to nova.
> > > 
> > > > I’m new to Openstack, but I’ve already implemented cloud projects with
> > Microsoft Azure and have solid programming skills. Can you give me a hint
> > where and how I can start?
> > > 
> > > As hinted above, this is likely to be a very difficult project given the
> > fraught
> > > history of the idea. I don't want to dissuade you from this work but you
> > should
> > > be aware of what you're getting into from the start. If you're serious
> > about
> > > pursuing this, I suggest you first do some research on prior art. As
> > noted
> > > above, there is lots of information on the internet about this. With this
> > > research done, you'll need to decide whether this is something you want
> > to
> > > approach within nova itself, via out-of-tree extensions or via a third
> > party
> > > project. If you're opting for integration with nova, then you'll need to
> > think
> > > long and hard about how you would design such a system and start working
> > on a
> > > spec (a design document) outlining your proposed solution. Details on
> > how to
> > > write a spec are discussed at [1]. The only extension points nova offers
> > today
> > > are scheduler filters and weighers so your options for an out-of-tree
> > extension
> > > approach will be limited. A third party project will arguably be the
> > easiest
> > > approach but you will be restricted to talking to nova's REST APIs which
> > may
> > > limit the design somewhat. This Blazar spec [2] could give you some
> > ideas on
> > > this approach (assuming it was never actually implemented, though it may
> > well
> > > have been).
> > > 
> > > > My university gave me three compute hosts and one control host to
> > implement this solution for the bachelor thesis. I’m currently setting up
> > Openstack and all the services on the control host all by myself to
> > understand all the functionality (sorry for not using Packstack) 😉. All my
> > hosts have CentOS 7 and the minimum deployment which I configure is Train.
> > > > 
> > > > My idea is to work with nova schedulers, because they seem to be
> > interesting for my case. I've found a whole infrastructure description of
> > the provisioning of an instance in Openstack
> > https://docs.openstack.org/operations-guide/de/_images/provision-an-instance.png.
> > 
> > > > 
> > > > The nova scheduler
> > https://docs.openstack.org/operations-guide/ops-customize-compute.html is
> > the first component, where it is possible to implement functions via Python
> > and the Compute API
> > https://docs.openstack.org/api-ref/compute/?expanded=show-details-of-specific-api-version-detail,list-servers-detail
> > to check for active VMs and probably delete them if needed before a
> > successful request for an instantiation can be made.
> > > > 
> > > > What do you guys think about it? Does it seem like a good starting
> > point for you or is it the wrong approach?
> > > 
> > > This could potentially work, but I suspect there will be serious
> > performance
> > > implications with this, particularly at scale. Scheduler filters are
> > > historically used for simple things like "find me a group of hosts that
> > have
> > > this metadata attribute I set on my image". Making API calls sounds like
> > > something that would take significant time and therefore slow down the
> > schedule
> > > process. You'd also have to decide what your heuristic for deciding
> > which VM(s)
> > > to delete would be, since there's nothing obvious in nova that you could
> > use.
> > > You could use something as simple as filter extra specs or something as
> > > complicated as an external service.
> > yes, implementing preemption in the scheduler as a filter was discussed in
> > the past and discounted for the performance implications stephen hinted at.
> > in tree we currently do not allow filters to make any api or db queries.
> > that approach also will not work today since you would have to re-execute the
> > query to the placement service after deleting an instance when you run out
> > of capacity and restart the filtering, which a filter cannot do as i noted
> > above.
> > 
> > the most recent specs in this area were
> > https://review.opendev.org/c/openstack/nova-specs/+/438640 for the
> > integrated approach and
> > https://review.opendev.org/c/openstack/nova-specs/+/554212/12 which
> > proposed adding a pending state for use with a standalone service
> > 
> > https://gitlab.cern.ch/ttsiouts/ReaperServicePrototype
> > 
> > there are a number of presentations on this from cern/stackhpc
> > https://www.stackhpc.com/scientific-sig-at-the-dublin-ptg.html
> > 
> > http://openstack-in-production.blogspot.com/2018/02/maximizing-resource-utilization-with.html
> > 
> > https://openlab.cern/sites/openlab.web.cern.ch/files/2018-07/Containers_on_Baremetal_and_Preemptible_VMs_at_CERN_and_SKA.pdf
> > 
> > https://indico.cern.ch/event/739089/sessions/282073/attachments/1689073/2717151/ASDF_preemptible.pdf
> > 
> > 
> > the current state is rebuilding from cell0 is now supported but the pending
> > state was never added and the reaper service was not upstreamed.
> > 
> > work in this area has now moved to the blazar project as stephen noted in [2]
> > 
> > https://specs.openstack.org/openstack/blazar-specs/specs/ussuri/blazar-preemptible-instances.html
> > but i don't think it has made much progress.
> > https://review.opendev.org/q/topic:%22preemptibles%22+(status:open%20OR%20status:merged)
> > 
> > nova previously had a pluggable scheduler that would have allowed you to
> > reimplement the scheduler entirely from scratch but we removed that
> > capability in the last year or two. at this point the only viable approach
> > that will not take multiple upstream cycles to do this is really to use an
> > external service.
> > 
> > > 
> > > This should be lots to get you started. Once again, do make sure you're
> > aware of
> > > what you're getting yourself into before you start. This could get
> > complicated
> > > very quickly :)
> > 
> > yes, anything other than adding the pending state to nova will be very
> > complex due to placement interaction.
> > you would really need to implement a fallback query mechanism in the
> > scheduler itself.
> > anything after the call to placement is already too late. you might be
> > able to reuse consumer types to make some allocations
> > preemptible and have a prefilter decide if an allocation should be a
> > normal nova consumer or preemptible consumer based on
> > a flavor extra spec.
> > https://docs.openstack.org/placement/train/specs/train/approved/2005473-support-consumer-types.html
> > this would still require the pending state and an external reaper service
> > to free the capacity to be clean, but it's a possible direction.
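
as a rough idea of what the reaper side could decide, here is a small sketch applied to the two cases from
the original question. it is not the cern prototype: the function name, the single vcpu dimension and the
"largest first" heuristic are assumptions made purely for illustration.

```python
def select_victims(free_vcpus, requested_vcpus, preemptible):
    """Choose which preemptible instances on one host to delete.

    Returns [] if the request already fits, a list of instances to delete if
    reclaiming them makes it fit, or None if even deleting all of them is not
    enough. Only vcpus are considered here (an assumption for brevity).
    """
    if requested_vcpus <= free_vcpus:
        return []
    victims = []
    reclaimed = free_vcpus
    # assumed heuristic: take the largest instances first to delete as few as possible
    for inst in sorted(preemptible, key=lambda i: i["vcpus"], reverse=True):
        victims.append(inst)
        reclaimed += inst["vcpus"]
        if reclaimed >= requested_vcpus:
            return victims
    return None


if __name__ == "__main__":
    # an 8 vcpu host where low priority user A already holds half of it
    vm_a = {"name": "vm-a", "vcpus": 4}
    print(select_victims(4, 8, [vm_a]))   # case 1: user B needs the whole host
    print(select_victims(4, 4, [vm_a]))   # case 2: user B needs half of the host
    print(select_victims(4, 16, [vm_a]))  # request larger than the host
```

in case 1 vm-a is selected for deletion, in case 2 nothing has to be deleted, and in the last case the
request cannot be satisfied on this host at all.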
> > 
> > 
> > > 
> > > Cheers,
> > > Stephen
> > > 
> > > > I'm very happy to have found you!!!
> > > > 
> > > > Thank you really much for your time!
> > > 
> > > 
> > > [1] https://specs.openstack.org/openstack/nova-specs/readme.html
> > > [2]
> > https://specs.openstack.org/openstack/blazar-specs/specs/ussuri/blazar-preemptible-instances.html
> > > 
> > > > Best regards
> > > > Levon
> > > > 
> > > > -----Original Message-----
> > > > From: Stephen Finucane <stephenfin at redhat.com>
> > > > Sent: Monday, 31 May 2021 12:34
> > > > To: Levon Melikbekjan <levonmelikbekjan at yahoo.de>;
> > openstack at lists.openstack.org
> > > > Subject: Re: Customization of nova-scheduler
> > > > 
> > > > On Wed, 2021-05-26 at 22:46 +0200, Levon Melikbekjan wrote:
> > > > > Hello Openstack team,
> > > > > 
> > > > > is it possible to customize the nova-scheduler via Python? If yes,
> > how?
> > > > 
> > > > Yes, you can provide your own filters and weighers. This is documented
> > at [1].
> > > > 
> > > > Hope this helps,
> > > > Stephen
> > > > 
> > > > [1]
> > https://docs.openstack.org/nova/latest/user/filter-scheduler#writing-your-own-filter
> > > > 
> > > > > 
> > > > > Best regards
> > > > > Levon
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 




