Open Stack

Thu Mar 6 12:26:36 UTC 2014

On 3 March 2014 19:33, David Peraza <david_peraza at persistentsys.com> wrote:
> Thanks John,
>
> What I'm trying to do is run an asynchronous task that pre-organizes the target hosts for an image. Then the scheduler only needs to read the top of the list or priority queue. We have proposed a paper for the summit that will explain the approach; hopefully it gets accepted so we can have a conversation about this at the summit. I suspect the DB overhead will go away if we try our approach. It is still theory, though, which is why I want a sizeable test environment to get a better sense of the performance.

I attempted something similar as part of the caching scheduler work.

When picking the size of the "slot" cache, I found I got the best
performance when I turned it off. Small bursts of builds were slightly
quicker, but would get delayed if they came in when the cache was
being populated. Large bursts of requests depleted the cache very
quickly, and filling it back up was quite expensive, with other
requests queuing up while that happened. So choosing the cache size
was very tricky. And all the while, you end up making some bad choices
because you are only looking at a subset of the nodes.
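
To illustrate the depletion problem described above, here is a minimal
sketch of a pre-computed slot cache. It is not the caching scheduler
code from Nova: the class name SlotCache, the host dictionary, and the
"most free RAM wins" ranking are all assumptions made for the example,
and every request is assumed to ask for the same amount of RAM.

```python
from collections import deque


class SlotCache:
    """Hypothetical pre-computed cache of placement slots (host names)."""

    def __init__(self, hosts, size):
        self.hosts = dict(hosts)  # host name -> free RAM in MB (assumed model)
        self.size = size          # slots to pre-compute on each refill
        self.slots = deque()
        self.refills = 0          # counts the expensive refill passes

    def _refill(self, ram_mb):
        # Stand-in for the expensive step: rank the hosts and reserve slots.
        self.refills += 1
        for _ in range(self.size):
            candidates = [h for h, free in self.hosts.items() if free >= ram_mb]
            if not candidates:
                break
            # Pick the host with the most free RAM; the name breaks ties.
            best = max(candidates, key=lambda h: (self.hosts[h], h))
            self.hosts[best] -= ram_mb
            self.slots.append(best)

    def take(self, ram_mb):
        # A request that finds the cache empty has to wait for a refill.
        if not self.slots:
            self._refill(ram_mb)
        return self.slots.popleft() if self.slots else None


# A burst of five identical requests against a cache of two slots.
cache = SlotCache({"h1": 4096, "h2": 4096}, size=2)
placements = [cache.take(1024) for _ in range(5)]
print(placements, cache.refills)
```

With a cache of two slots, the burst of five requests forces three
refills, which is the queuing behaviour described above in miniature.
A larger size means fewer refills, but each refill takes longer and
reserves capacity that may never be used.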

I am however very interested in seeing if you have found a balance
that works well. It feels like some combination would help in certain
situations. I just couldn't find one myself.

My current approach is just to cache the lists of hosts you get from
the DB, and update the host state with each decision you make, so
those requests don't race each other.
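
As a rough sketch of that idea (not the actual Nova implementation):
the host list is fetched once, and each scheduling decision consumes
resources from the in-memory copy, so the next request sees the
updated state without another DB call. The names CachedHostState and
fetch_hosts are hypothetical, and the threading.Lock is only a
stand-in for whatever serialisation the real service relies on under
eventlet.

```python
import threading


class CachedHostState:
    """Hypothetical in-memory copy of host state, refreshed from the DB."""

    def __init__(self, fetch_hosts):
        self._fetch_hosts = fetch_hosts  # callable standing in for the DB query
        self._lock = threading.Lock()
        self._hosts = {}                 # host name -> free RAM in MB

    def refresh(self):
        # The only place that touches the (simulated) database.
        hosts = self._fetch_hosts()
        with self._lock:
            self._hosts = dict(hosts)

    def select(self, ram_mb):
        # Choose a host and consume its RAM in one step, so two requests
        # cannot both claim the same free capacity.
        with self._lock:
            candidates = [h for h, free in self._hosts.items() if free >= ram_mb]
            if not candidates:
                return None
            best = max(candidates, key=lambda h: (self._hosts[h], h))
            self._hosts[best] -= ram_mb
            return best


db_calls = 0


def fetch_hosts():
    # Fake DB query that counts how often it is called.
    global db_calls
    db_calls += 1
    return {"a": 2048, "b": 1024}


state = CachedHostState(fetch_hosts)
state.refresh()
choices = [state.select(1024) for _ in range(4)]
print(choices, db_calls)
```

Four decisions are made against a single fetch. The fourth returns
None because the cached copy already reflects the three earlier
placements.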

Some simple optimisations to the filter and weights system seemed to
be a much better route to improving the performance. (I had some
patches up for that, will refresh them when Juno opens).

But until we get the move-to-conductor work complete (using
select_destinations instead of run_instance), the DB calls blocking
all the eventlet threads seem like the biggest issue.

Anyways, looking forward to a good discussion at the summit.

John

>
> Regards,
> David Peraza
>
> -----Original Message-----
> From: John Garbutt [mailto:john at johngarbutt.com]
> Sent: Tuesday, February 25, 2014 5:45 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] Simulating many fake nova compute nodes for scheduler testing
>
> On 24 February 2014 20:13, David Peraza <david_peraza at persistentsys.com> wrote:
>> Thanks John,
>>
>> I also think it is a good idea to test the algorithm at the unit test level, but I would like to try it out over AMQP as well, that is, with processes and threads talking to each other over RabbitMQ or Qpid. I'm trying to test performance as well.
>>
>
> Nothing beats testing the thing for real, of course.
>
> As a heads up, the overheads of DB calls turned out to dwarf any algorithmic improvements I managed. There will clearly be some RPC overhead, but it didn't stand out as much as the DB issue.
>
> The move to conductor work should certainly stop the scheduler making those pesky DB calls to update the nova instance. And then, improvements like no-db-scheduler and improvements to scheduling algorithms should shine through much more.
>
> Thanks,
> John
>
>
>> -----Original Message-----
>> From: John Garbutt [mailto:john at johngarbutt.com]
>> Sent: Monday, February 24, 2014 11:51 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] Simulating many fake nova compute
>> nodes for scheduler testing
>>
>> On 24 February 2014 16:24, David Peraza <david_peraza at persistentsys.com> wrote:
>>> Hello all,
>>>
>>> I have been trying some new ideas on the scheduler and I think I'm
>>> reaching a resource limit. I'm running 6 compute services on my
>>> 4 CPU, 4 GB VM, and I have started to get memory allocation issues.
>>> Keystone and Nova are already complaining that there is not enough
>>> memory. The obvious way to add more candidates is to get another VM
>>> and set up another 6 fake compute services. I could do that, but I
>>> think I need to be able to scale further without using this many
>>> resources. I would like to simulate a cloud of 100, maybe 1000,
>>> compute nodes that do nothing (Fake driver); this should not take
>>> this much memory. Does anyone know of a more efficient way to
>>> simulate many computes? I was thinking of changing the Fake driver
>>> to report many compute services in different threads instead of
>>> having to spawn a process per compute service. Any other ideas?
>>
>> It depends what you want to test, but I was able to look at tuning the filters and weights using the test at the end of this file:
>> https://review.openstack.org/#/c/67855/33/nova/tests/scheduler/test_caching_scheduler.py
>>
>> Cheers,
>> John
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>> DISCLAIMER
>> ==========
>> This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
