[openstack-dev] [Nova] Does Nova really need an SQL database?

Alex Glikson GLIKSON at il.ibm.com
Wed Nov 20 05:46:10 UTC 2013

Another possible approach could be that only some of the 50 succeed 
(reported back to the user), and then a retry mechanism at a higher level 
would potentially approach the other partition/scheduler, similar to 
today's retries.
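A minimal sketch of this partial-success-plus-retry idea, assuming each scheduler simply owns a fixed number of free instance slots (real placement also has to weigh flavors, hosts and filters). `PartitionScheduler` and `boot_with_retries` are hypothetical names, not Nova APIs. It is run against the scenario discussed further down in this thread: two schedulers with room for 40 each and a single request for 50. The first scheduler places what it can, and only the remainder is retried on the next one.

```python
from dataclasses import dataclass


@dataclass
class PartitionScheduler:
    """Hypothetical scheduler that owns a fixed slice of cluster capacity."""
    name: str
    free_slots: int

    def schedule(self, count: int) -> int:
        """Place as many of `count` instances as fit; return how many were placed."""
        placed = min(count, self.free_slots)
        self.free_slots -= placed
        return placed


def boot_with_retries(schedulers, requested: int) -> dict:
    """Higher-level retry: offer only the still-unplaced instances to the next scheduler."""
    placements = {}
    remaining = requested
    for sched in schedulers:
        if remaining == 0:
            break
        placed = sched.schedule(remaining)
        if placed:
            placements[sched.name] = placed
            remaining -= placed
    # Partial success is visible to the caller via `unplaced`.
    return {"placements": placements, "unplaced": remaining}


# Each scheduler has room for 40; the user asks for 50.
schedulers = [PartitionScheduler("sched-a", 40), PartitionScheduler("sched-b", 40)]
print(boot_with_retries(schedulers, 50))
# {'placements': {'sched-a': 40, 'sched-b': 10}, 'unplaced': 0}
```

If every scheduler runs out before the request is satisfied, `unplaced` is non-zero, which corresponds to the "only part of the 50 succeed" case reported back to the user.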


From:   Mike Wilson <geekinutah at gmail.com>
To:     "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev at lists.openstack.org>, 
Date:   20/11/2013 05:53 AM
Subject:        Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

I've been thinking about this use case for a DHT-like design. I think I 
want to do what other people have alluded to here and try to intercept 
problematic requests like this one in some sort of "pre-sending to 
ring-segment" stage. In this case the "pre-stage" could decide to send 
this off to a scheduler that has a more complete view of the world. 
Alternatively, instead of making a single request for 50 instances, why not 
send 50 requests for one? Is that viable for this use case?
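One way to picture a DHT-like design with a "pre-stage" is a consistent-hash ring that maps keys (compute host names for resource ownership, request IDs for routing) to schedulers, plus a front-end that diverts large requests. This is only a sketch: `HashRing`, `pre_stage`, the `global-scheduler` name and the threshold of 10 instances are hypothetical choices, and consistent hashing is just one possible DHT scheme. It uses `hashlib` rather than Python's built-in `hash()`, because string hashing is randomized per process and every scheduler must compute the same ring.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash so every scheduler computes the same ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring mapping keys to scheduler names."""

    def __init__(self, schedulers, vnodes: int = 64):
        # Several virtual points per scheduler smooth out the key distribution.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name) for name in schedulers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical cut-off: requests larger than this bypass the ring.
LARGE_REQUEST_THRESHOLD = 10
GLOBAL_SCHEDULER = "global-scheduler"  # hypothetical scheduler with a full view


def pre_stage(ring: HashRing, request_id: str, count: int, split: bool = False):
    """Return a list of (scheduler, instance_count) routing decisions."""
    if count <= LARGE_REQUEST_THRESHOLD:
        return [(ring.owner(request_id), count)]
    if split:
        # Alternative: turn one request for N instances into N requests for one.
        return [(ring.owner(f"{request_id}-{i}"), 1) for i in range(count)]
    # Otherwise hand the whole request to a scheduler with a complete view.
    return [(GLOBAL_SCHEDULER, count)]


ring = HashRing(["sched-a", "sched-b", "sched-c"])
# Resource ownership: which scheduler manages a given compute host.
print(ring.owner("compute-1"))
print(pre_stage(ring, "req-42", 50))  # [('global-scheduler', 50)]
print(len(pre_stage(ring, "req-42", 50, split=True)))  # 50
```

Because every scheduler derives the same ring from the same membership list, removing a scheduler only remaps the keys it owned, and adding one only claims keys adjacent to its new ring points. That is one way to get a table that is "refreshed whenever schedulers come and go" without a central database.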


On Tue, Nov 19, 2013 at 7:03 PM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
At Yahoo, at least, 50+ simultaneous instance requests will be the common case (maybe we are …)

Think of what happens on www.yahoo.com during, say, the Olympics: news.yahoo.com
could need 50+ instances very quickly (especially if, say, a gold medal is won by
some famous person). So I wouldn't discount those being the common case
(it may not be common for some, but it is common for others). In fact, any
website with sporadic/spiky traffic will have the same need, so it
might be a target use case for website-like companies (or ones that can't
predict spikes up front).

Overall, though, I think what you said about 'don't fill it up' is good
general knowledge. Filling anything beyond a certain threshold is
dangerous in general (one should only push the limits so far before …)

On 11/19/13 4:08 PM, "Clint Byrum" <clint at fewbar.com> wrote:

>Excerpts from Chris Friesen's message of 2013-11-19 12:18:16 -0800:
>> On 11/19/2013 01:51 PM, Clint Byrum wrote:
>> > Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:
>> >> On 11/19/2013 12:35 PM, Clint Byrum wrote:
>> >>
>> >>> Each scheduler process can own a different set of resources. If 
>> >>> each grabs instance requests in a round-robin fashion, then they 
>> >>> fill their resources up in a relatively well-balanced way until one
>> >>> scheduler's resources are exhausted. At that time it should bow out of
>> >>> taking new instances. If it can't fit a request in, it should kick the
>> >>> request out for retry on another scheduler.
>> >>>
>> >>> In this way, they only need to be in sync in that they need a way to
>> >>> agree on who owns which resources. A distributed hash table that is
>> >>> refreshed whenever schedulers come and go would be fine for that.
>> >>
>> >> That has some potential, but at high occupancy you could end up unable
>> >> to schedule something because no one scheduler has sufficient resources,
>> >> even if the cluster as a whole does.
>> >>
>> >
>> > I'm not sure what you mean here. What resource spans multiple compute
>> > hosts?
>> Imagine the cluster is running close to full occupancy and each scheduler
>> has room for 40 more instances. Now I come along and issue a single
>> request to boot 50 instances. The cluster has room for that, but none
>> of the schedulers do.
>You're assuming that all 50 come in at once. That is only one use case
>and not at all the most common.
>> >> This gets worse once you start factoring in things like Heat and
>> >> instance groups that will want to schedule whole sets of resources
>> >> (instances, IP addresses, network links, Cinder volumes, etc.) at once,
>> >> with constraints on where they can be placed relative to each other.
>> > Actually that is rather simple. Such requests have to be serialized
>> > into a work-flow. So if you say "give me 2 instances in 2 different
>> > locations" then you allocate 1 instance, and then another one with
>> > 'not_in_location(1)' as a condition.
>> Actually, you don't want to serialize it; you want to hand the whole set
>> of resource requests and constraints to the scheduler all at once.
>> If you do them one at a time, then early decisions made with
>> less-than-complete knowledge can result in later scheduling requests
>> failing due to being unable to meet constraints, even if there are
>> actually sufficient resources in the cluster.
>> The "VM ensembles" document at
>> has a good example of how one-at-a-time scheduling can cause spurious
>> failures.
>> And if you're handing the whole set of requests to a scheduler all at
>> once, then you want the scheduler to have access to as many resources as
>> possible so that it has the highest likelihood of being able to satisfy
>> the request given the constraints.
>This use case is real and valid, which is why I think there is room for
>multiple approaches. For instance, the situation you describe can also be
>dealt with by simply keeping the cloud under-utilized and accepting
>that once utilization passes a certain percentage, spurious failures
>will happen. We have a similar solution in the ext3 filesystem on Linux.
>Don't fill it up, or suffer a huge performance penalty.
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
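To illustrate Chris Friesen's point about one-at-a-time scheduling, here is a toy example (it is not the "VM ensembles" example, which is not reproduced here): two hosts with 2 and 1 free slots, and two VMs of sizes 1 and 2 that must land on different hosts. The greedy pass uses an assumed "emptiest host first" heuristic, not Nova's actual filter scheduler; the batch pass brute-forces every assignment of the whole set, which is only practical for tiny inputs. All function names are hypothetical.

```python
from itertools import product


def fits(assignment, sizes, capacity):
    """True if all VMs are on distinct hosts (anti-affinity) and no host is over capacity."""
    if len(set(assignment)) != len(assignment):
        return False
    used = {}
    for vm, host in enumerate(assignment):
        used[host] = used.get(host, 0) + sizes[vm]
    return all(used[h] <= capacity[h] for h in used)


def schedule_one_at_a_time(sizes, capacity):
    """Place VMs in order, each on the emptiest host not already used by this group."""
    free = dict(capacity)  # copy so the caller's capacity map is untouched
    placed = []
    for size in sizes:
        candidates = [h for h in free if h not in placed and free[h] >= size]
        if not candidates:
            return None  # an earlier choice made with partial knowledge blocks this VM
        host = max(candidates, key=lambda h: free[h])
        free[host] -= size
        placed.append(host)
    return placed


def schedule_all_at_once(sizes, capacity):
    """Consider every assignment of the whole group (exponential; toy sizes only)."""
    for assignment in product(capacity, repeat=len(sizes)):
        if fits(assignment, sizes, capacity):
            return list(assignment)
    return None


capacity = {"host-a": 2, "host-b": 1}
sizes = [1, 2]  # two VMs that must land on different hosts
print(schedule_one_at_a_time(sizes, capacity))  # None
print(schedule_all_at_once(sizes, capacity))    # ['host-b', 'host-a']
```

The greedy pass puts the small VM on `host-a` because it has the most room, leaving no host that can take the large VM, while the batch pass finds the valid placement. Giving `host-b` one extra slot, in the spirit of Clint's "don't fill it up", lets the greedy pass succeed too (it returns `['host-a', 'host-b']`).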

