[openstack-dev] A simple way to improve nova scheduler

Sandy Walsh sandy.walsh at rackspace.com
Fri Jul 19 20:23:31 UTC 2013



On 07/19/2013 05:01 PM, Boris Pavlovic wrote:
> Sandy,
> 
> Hm, I don't know that algorithm. But our approach doesn't have
> exponential exchange.
> I don't think that in a 10k-node cloud we will have a problem with 150
> RPC calls/sec. Even at 100k nodes we will have only 1.5k RPC calls/sec.
> Moreover, compute nodes already update their state in the DB through the
> conductor, which produces the same number of RPC calls.
> 
> So I don't see any explosion here.

Sorry, I was commenting on Soren's suggestion from way back (essentially
listening on a separate exchange for each unique flavor ... so no
scheduler was needed at all). It was a great idea, but fell apart rather
quickly.
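
To illustrate why the per-flavor exchange idea falls apart, here is a rough
count. It is only a sketch: the exchange naming scheme and the assumption
that a request may ask for any subset of capabilities are hypothetical, not
something the original proposal spelled out.

```python
from itertools import combinations


def exchange_count(num_flavors, num_capabilities):
    """Exchanges needed if every (flavor, capability subset) pair gets one."""
    return num_flavors * 2 ** num_capabilities


def exchanges_for_host(flavors_that_fit, capabilities):
    """Hypothetical exchange names a single host would have to listen on."""
    names = []
    for flavor in flavors_that_fit:
        # Every subset of the host's capabilities is a request it can serve.
        for size in range(len(capabilities) + 1):
            for subset in combinations(sorted(capabilities), size):
                names.append("sched.%s.%s" % (flavor, "+".join(subset) or "none"))
    return names
```

With flavors alone the count is just the number of flavors. Each extra
capability doubles it, so 5 flavors and 10 capabilities already give
exchange_count(5, 10) == 5120.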

The existing approach the scheduler takes is expensive (asking the db
for state of all hosts) and polling the compute nodes might be do-able,
but you're still going to have latency problems waiting for the
responses (the states are invalid nearly immediately, especially if a
fill-first scheduling algorithm is used). We ran into this problem
before in an earlier scheduler implementation. The round-tripping kills.

We have a lot of really great information on Host state in the form of
notifications right now. I think having a service (or notification
driver) listening for these and keeping the HostState incrementally
updated (and reported back to all of the schedulers via the fanout
queue) would be a better approach.
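
A minimal sketch of that idea, under stated assumptions: the class and
method names are made up for illustration, the event types and payload
fields (host, memory_mb, disk_gb) are assumed to match what the compute
notifications carry, and publish is a plain callable standing in for a cast
on the scheduler fanout queue. No real messaging library is used here.

```python
class HostState(object):
    """Free resources for one compute host, as a scheduler would cache them."""

    def __init__(self, host, free_ram_mb, free_disk_gb):
        self.host = host
        self.free_ram_mb = free_ram_mb
        self.free_disk_gb = free_disk_gb


class HostStateTracker(object):
    """Applies notifications as deltas instead of re-reading all hosts."""

    # Sign applied to the instance's resources for each (assumed) event type.
    DELTAS = {
        "compute.instance.create.end": -1,
        "compute.instance.delete.end": +1,
    }

    def __init__(self, publish):
        self.hosts = {}
        self.publish = publish  # stand-in for the fanout cast to schedulers

    def add_host(self, host, free_ram_mb, free_disk_gb):
        self.hosts[host] = HostState(host, free_ram_mb, free_disk_gb)

    def on_notification(self, event_type, payload):
        sign = self.DELTAS.get(event_type)
        state = self.hosts.get(payload.get("host"))
        if sign is None or state is None:
            return None  # not an event or host we track
        state.free_ram_mb += sign * payload["memory_mb"]
        state.free_disk_gb += sign * payload["disk_gb"]
        update = {
            "host": state.host,
            "free_ram_mb": state.free_ram_mb,
            "free_disk_gb": state.free_disk_gb,
        }
        self.publish(update)
        return update
```

Each scheduler would only ever receive the small per-host update, so there
is no db query for all hosts and no round trip to the compute nodes at
scheduling time.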

-S


> 
> Best regards,
> Boris Pavlovic
> 
> Mirantis Inc.  
> 
> 
> On Fri, Jul 19, 2013 at 11:47 PM, Sandy Walsh <sandy.walsh at rackspace.com>
> wrote:
> 
> 
> 
>     On 07/19/2013 04:25 PM, Brian Schott wrote:
>     > I think Soren suggested this way back in Cactus to use MQ for compute
>     > node state rather than database and it was a good idea then.
> 
>     The problem with that approach was the number of queues went exponential
>     as soon as you went beyond simple flavors. Add Capabilities or other
>     criteria and you get an explosion of exchanges to listen to.
> 
> 
> 
>     > On Jul 19, 2013, at 10:52 AM, Boris Pavlovic <boris at pavlovic.me>
>     > wrote:
>     >
>     >> Hi all,
>     >>
>     >>
>     >> At Mirantis, Alexey Ovtchinnikov and I are working on nova scheduler
>     >> improvements.
>     >>
>     >> As far as we can see, the scheduler currently has two major issues:
>     >>
>     >> 1) Scalability. Factors that contribute to bad scalability are these:
>     >> *) Every periodic task interval (60 sec by default), each compute node
>     >> updates its resource state in the DB.
>     >> *) On every boot request, the scheduler has to fetch information about
>     >> all compute nodes from the DB.
>     >>
>     >> 2) Flexibility. Flexibility suffers due to problems with:
>     >> *) Adding new complex resources (such as big lists of complex objects,
>     >> e.g. as required by PCI Passthrough
>     >> https://review.openstack.org/#/c/34644/5/nova/db/sqlalchemy/models.py)
>     >> *) Using different sources of data in the scheduler, for example from
>     >> cinder or ceilometer
>     >> (as required by Volume Affinity Filter
>     >> https://review.openstack.org/#/c/29343/)
>     >>
>     >>
>     >> We found a simple way to mitigate these issues by avoiding DB usage
>     >> for host state storage.
>     >>
>     >> A more detailed discussion of the problem and one possible solution
>     >> can be found here:
>     >>
>     >> https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit#
>     >>
>     >>
>     >> Best regards,
>     >> Boris Pavlovic
>     >>
>     >> Mirantis Inc.
>     >>
>     >> _______________________________________________
>     >> OpenStack-dev mailing list
>     >> OpenStack-dev at lists.openstack.org
>     >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


