[openstack-dev] Nova scheduler sub-group meeting agenda 6/11
Mike Wilson
geekinutah at gmail.com
Wed Jun 12 18:43:16 UTC 2013
One other aspect of how the fanout_cast is not scalable, which I
neglected to mention, is the demand it puts on the message broker. You
may be aware that schedulers do a fanout_cast to all compute nodes when
they start up, asking for an update. We see ridiculously high
failure-to-deliver rates when this happens. This is obviously a fault of
the broker itself, but it seems like a good idea in general to avoid
broadcast-style communication. We still see plenty of messages failing
to be delivered even during normal, unspectacular operation. I doubt
this problem is specific to us, either, as I have talked with a few
people who have had the same experience.
-Mike Wilson
On Wed, Jun 12, 2013 at 12:14 PM, Mike Wilson <geekinutah at gmail.com> wrote:
> Wow, I missed this thread completely, sorry. I just went over the meeting
> notes and I'd like to add what I can from our own experience with the
> scheduler at Bluehost.
>
> The first issue we had was dealing with the fanout_cast to the schedulers
> from the compute nodes. With a large number of nodes, all of the
> scheduler's processing time goes to receiving and processing these
> updates. I wasn't the one who dug into this and tore it out, but I think
> we determined that for us it was sufficient to get the information from
> the DB and rely on that. In any case, we need a single reporting path
> instead of reporting both to the DB and to the individual schedulers, as
> was discussed in the meeting. Personally, I think the fanout_cast needs
> to go away. If updating capabilities via RPC is desired, that's fine, but
> it shouldn't be a broadcast-type communication. It would be better to
> have the schedulers share host state, with one of them at a time taking
> an update and applying it to the shared store. That way you can just spin
> up more schedulers when the current set isn't keeping up.
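>
> Something like this is the shape I have in mind (just a sketch, with an
> in-process dict standing in for whatever external store you would really
> use, e.g. the DB or memcached):
>
>     import threading
>
>     class SharedHostState(object):
>         """One consumer applies updates; any scheduler reads."""
>
>         def __init__(self):
>             self._lock = threading.Lock()
>             self._hosts = {}   # hostname -> latest capabilities dict
>
>         def apply_update(self, host, capabilities):
>             # Called by a single update consumer, not by every
>             # scheduler, so adding schedulers adds no update load.
>             with self._lock:
>                 self._hosts[host] = capabilities
>
>         def snapshot(self):
>             # Any scheduler grabs a cheap read-only copy to filter on.
>             with self._lock:
>                 return dict(self._hosts)
>
> In a real deployment the consumer would pull updates off a single queue,
> so the broker sees one delivery per update no matter how many schedulers
> you run.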
>
> The second issue is something Phil brought up: the filtering. This, to
> me, was the larger issue, and it's why we dropped in our own scheduler
> instead of trying to fix the problem. There are a few filters that you
> don't need to spin through the whole host list to apply. For example,
> filters that select or exclude specific hosts should be applied to the
> collection as a whole rather than to each item in it (a quick sketch of
> what I mean follows below). Btw, I'm geekinutah on IRC, feel free to msg
> me about Bluehost stuff anytime.
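>
> Here's the kind of thing I mean (illustrative code, not nova's actual
> filter API; host_states_by_name is assumed to be a dict keyed by
> hostname):
>
>     def exclude_hosts(host_states, excluded):
>         # One pass with O(1) membership tests, instead of evaluating a
>         # per-host filter predicate against all 16k host states.
>         excluded = set(excluded)
>         return [hs for hs in host_states if hs.host not in excluded]
>
>     def select_host(host_states_by_name, requested):
>         # Selecting a named host is a direct lookup, not a scan.
>         hs = host_states_by_name.get(requested)
>         return [hs] if hs is not None else []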
>
> -Mike Wilson
>
>
> On Wed, Jun 12, 2013 at 11:31 AM, Joe Gordon <joe.gordon0 at gmail.com> wrote:
>
>>
>> On Mon, Jun 10, 2013 at 3:11 PM, Dugger, Donald D <
>> donald.d.dugger at intel.com> wrote:
>>
>>> Current list of topics we're going over is:
>>>
>>> 1) Extending data in host state
>>> 2) Utilization based scheduling
>>> 3) Whole host allocation capability
>>> 4) Coexistence of different schedulers
>>> 5) Rack aware scheduling
>>> 6) List scheduler hints via API
>>> 7) Host directory service
>>> 8) The future of the scheduler
>>> 9) Network bandwidth aware scheduling (and wider aspects)
>>> 10) ensembles/vclusters
>>>
>>> We've done a first pass over all of these so next will be follow ups to
>>> see where we are. But first, a new issue was raised at the last meeting:
>>>
>>> 11) Scheduler scalability
>>>
>>> The assertion was that BlueHost has created an OpenStack cluster with
>>> ~16,000 nodes on which the scheduler didn't scale; they had to throw it
>>> out completely and just put in a simple random-selection scheduler.
>>> Obviously the scalability of the scheduler is a concern, so I'd like to
>>> spend this meeting discussing this topic. (If someone from BlueHost
>>> could attend, that would be great.)
>>>
>>
>>
>> This is what I am basing my information on (
>> http://www.openstack.org/summit/portland-2013/session-videos/presentation/using-openstack-in-a-traditional-hosting-environment
>> starting at 9:45). Compute nodes broadcast updates to the schedulers
>> every minute, which for 16k nodes is about 266 messages a second on
>> average. And with the scheduler being single-threaded, just processing
>> these compute broadcasts will keep the scheduler(s) very busy.
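>>
>> For what it's worth, the arithmetic (the scheduler count is a made-up
>> number, and the broker-side multiplier assumes each update is fanned
>> out to every scheduler):
>>
>>     nodes = 16000
>>     period = 60.0                    # seconds between updates
>>     per_scheduler = nodes / period   # ~266.7 msg/s at each scheduler
>>
>>     schedulers = 4                   # hypothetical
>>     broker_load = per_scheduler * schedulers   # ~1066.7 deliveries/s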
>>
>>
>>>
>>> --
>>> Don Dugger
>>> "Censeo Toto nos in Kansa esse decisse." - D. Gale
>>> Ph: 303/443-3786