[Openstack] OpenStack API, Reservation ID's and Num Instances ...

Sandy Walsh sandy.walsh at rackspace.com
Wed May 25 11:41:31 UTC 2011


Heh, you're right, I was completely mistaken :)

That's a really cool idea Soren!

One of the problems we're faced with is keeping instances from the same customer off a single compute node. We want to spread them out to lessen the impact of a machine failure. To solve this we need to put a weight on each host candidate and use that weight to decide the final ordering. The implication is that we can't dump all the requests into the Scheduler queues and have them round-robin'ed, as is currently done. A single Scheduler has to devise a plan, and then all the Compute nodes can execute it concurrently. A large part of the Distributed Scheduler work has been working around this issue.
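Just to make that concrete, here's a rough sketch of the kind of weighting step I mean. This isn't the actual Distributed Scheduler code; the host/instance fields and the penalty constant are made up for illustration:

    # Hypothetical weighting pass: spread a customer's instances across
    # compute nodes.  host.instances, host.free_ram_mb and the penalty
    # constant are illustrative, not real nova attributes.
    SAME_CUSTOMER_PENALTY = 100.0

    def weigh_hosts(project_id, candidate_hosts):
        """Return (weight, host) pairs; lowest weight wins."""
        weighted = []
        for host in candidate_hosts:
            # How many instances does this customer already have here?
            existing = sum(1 for inst in host.instances
                           if inst["project_id"] == project_id)
            # Prefer emptier hosts, penalize co-locating the same customer.
            weight = -host.free_ram_mb + existing * SAME_CUSTOMER_PENALTY
            weighted.append((weight, host))
        # One scheduler has to sort the whole candidate list to build the
        # plan -- that ordering is exactly what per-message round-robining
        # of the queue can't give us.
        return sorted(weighted, key=lambda pair: pair[0])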

That said, your idea has tremendous merit for those customers that don't have that requirement. 

Right now, we have two ways of dispersing "Provision Resource" requests:
1. The way it is now: dump all the requests in the Scheduler queues and have them picked off concurrently.
2. The way the Distributed Scheduler works: Send the request for N resources out as an atomic unit. A single Scheduler devises a plan and sends tiny units-of-work to the Compute nodes.

Your idea would be the third approach:
3. Place the request in a Flavor-specific queue and let any capable Compute node handle the request (sketched below).
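Roughly, the publish side of #3 could look like the sketch below. The topic naming and the rpc_cast helper are placeholders for whatever AMQP wrapper we'd actually use, not existing nova calls:

    # Hypothetical publish side of approach #3: route the provision
    # request to a flavor-specific topic instead of the scheduler topic.
    # rpc_cast() stands in for a real AMQP "cast" helper.
    def provision_by_flavor(rpc_cast, flavor_name, num_instances, request):
        """Drop N units of work on a per-flavor topic; any capable
        compute node subscribed to that topic may pick one up."""
        topic = "compute.flavor.%s" % flavor_name   # e.g. compute.flavor.m1.large
        for _ in range(num_instances):
            # Each message is one unit of work, so the queue itself does
            # the spreading across the subscribed compute nodes.
            rpc_cast(topic, {"method": "run_instance", "args": request})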

Let me give a little thought to how we can do this cleanly. It's going to get complicated in the code if each request potentially has a different dispersal strategy. The request dispersal strategy seems to go hand-in-hand with the underlying worker-node selection process; they both need to agree on the approach.

But ... it's certainly not impossible. We just need to tie the code in nova.compute.api.create() to the configured nova.scheduler.driver ... 1:1.
Believe me, based on the stuff we've got going on in the Distributed Scheduler, that would be a really nice refactoring. 
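As a rough illustration of that refactoring (the flag value and the driver method are hypothetical, not what's in the tree today):

    # Hypothetical sketch: compute.api.create() delegates 1:1 to whatever
    # scheduler driver is configured, and the driver owns the dispersal
    # strategy.  The flag value and schedule_create() are made up.
    import importlib

    SCHEDULER_DRIVER = "nova.scheduler.simple.SimpleScheduler"  # flag value

    def load_scheduler_driver(path=SCHEDULER_DRIVER):
        module_name, cls_name = path.rsplit(".", 1)
        return getattr(importlib.import_module(module_name), cls_name)()

    def create(context, flavor, num_instances, **kwargs):
        driver = load_scheduler_driver()
        # The driver decides whether to fan out per-flavor, build one plan
        # for all N instances, or round-robin them -- create() doesn't care.
        return driver.schedule_create(context, flavor, num_instances, **kwargs)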

As an aside, the implications of #3 that I can see at first blush:
- We'd be effectively bypassing the scheduler, so we'd need to think about how that would work with Zones since Queues don't span Zones.
- We'd need to create a per-Flavor FanOut queue. Some Compute nodes would express their interest in handling the request and we'd, somehow, need to decide which one gets the work. Who would decide that? How could they do it without creating a single point of failure?

Regardless ... it's a great idea and definitely one that deserves more consideration.

Thanks!
-S

________________________________________
From: Soren Hansen [soren at linux2go.dk]
Sent: Tuesday, May 24, 2011 4:56 PM
To: Sandy Walsh
Cc: openstack at lists.launchpad.net
Subject: Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

2011/5/23 Sandy Walsh <sandy.walsh at rackspace.com>:
> To Soren's point about "losing the ability to rely on a fixed set of
> topics in the message queue for doing scheduling" this is not the case,
> there are no new topics introduced.

That's not exactly what I meant.

If we stuck with the simple flavours that we have right now, we could
schedule stuff exclusively using the message queue. The scheduler
would not need to know *anything* about the various compute nodes.
Scheduling an instance of flavour X would be achieved simply by
sending a "run this instance" message on the message queue with the
"flavour-X" topic. Any compute node able to accommodate an instance of
that size would be subscribed to that topic, and the message queue
would simply route the message to a "random" one of them. As a compute
node fills up, it'll unsubscribe from the topics representing flavours
that it no longer has room for. This sounds Very Scalable[tm] to me :)
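A rough sketch of what that could look like on the compute node (the subscribe/unsubscribe helpers and the capacity check are placeholders, not an existing nova or AMQP API):

    # Sketch: subscribe to the topic for every flavour the node can still
    # accommodate, and drop topics as the node fills up.  subscribe(),
    # unsubscribe() and free_resources() are placeholders.
    FLAVOURS = {
        "m1.tiny":  {"ram_mb": 512,  "vcpus": 1},
        "m1.large": {"ram_mb": 8192, "vcpus": 4},
    }

    def refresh_topic_subscriptions(node, subscribe, unsubscribe):
        free = node.free_resources()   # e.g. {"ram_mb": 4096, "vcpus": 6}
        for name, spec in FLAVOURS.items():
            topic = "compute.flavour.%s" % name
            fits = (spec["ram_mb"] <= free["ram_mb"] and
                    spec["vcpus"] <= free["vcpus"])
            if fits:
                subscribe(topic)     # the queue may now route us work this size
            else:
                unsubscribe(topic)   # no room left for this flavour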

Even if all the scheduling attributes we come up with were discrete
and enumerable, the cartesian product of all of them is potentially
*huge*, so having a topic for each of the possible combinations sounds
like very little fun. If any of them are not enumerable, it gets even
less fun. So, adding these capabilities would get in the way of
implementing something like the above. I guess it could be a
configuration option, i.e. if you choose the rich scheduling option
set, you don't get to use the cool scheduler.

--
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/

