[Openstack] deal with booting lots of instance simultaneously

Diego Parrilla Santamaría diego.parrilla.santamaria at gmail.com
Tue Feb 19 09:52:45 UTC 2013


Increasing the RPC timeout should help. I have seen this problem in
nova-network in the past. Vish's suggestion sounds good.

Recently we launched 128 VMs by mistake in a customer's production
environment: 0 errors. They are using 12 cores and several gigs of RAM for
the nova-network servers, with dual 10G pipes. So hardware matters, of course.

My two cents,
Diego
 --
Diego Parrilla
CEO
www.stackops.com | diego.parrilla at stackops.com | +34 649 94 43 29 |
skype:diegoparrilla



On Tue, Feb 19, 2013 at 10:09 AM, gtt116 <gtt116 at 126.com> wrote:

>  Hi Diego
>
> Thanks for your reply.
> How many hosts do you have? I have 4 hosts. In this bug,
> https://bugs.launchpad.net/nova/+bug/1094226, N is 20. In my
> environment N is about 16.
>
> I found that nova-network is too busy to deal with so many RPC requests at
> the same time. RabbitMQ is strong enough in this scenario.
>
> On 2013-02-19 16:54, Diego Parrilla Santamaría wrote:
>
> Hi gtt,
>
> What does 'lots of instances simultaneously' mean for you? 100, 1000,
> 10000, more?
>
> We have launched more than 100 (but fewer than 1000) simultaneously without
> any issue. RabbitMQ runs on a multicore machine with several gigs of RAM and
> the out-of-the-box configuration.
>
>  Cheers
> Diego
>
> On Tue, Feb 19, 2013 at 9:35 AM, gtt116 <gtt116 at 126.com> wrote:
>
>>  Hi all,
>>
>> When creating lots of instances simultaneously, many of them end up in
>> ERROR state, and most of those failures are caused by network RPC request
>> timeouts. This result is not very graceful.
>>
>> I think it would be better if the scheduler kept a queue of create requests.
>> When it finds that all the hosts are busy enough (compute_node.current_workload
>> reaches some value), it stops casting requests to hosts temporarily, until it
>> finds a host that is free enough. This way, booting lots of instances
>> simultaneously results in ACTIVE instances rather than lots of ERROR
>> instances. The weak point is that if the limit on current_workload is set
>> too low, instance creation will be slow.
>>
>> Do you have another quick fix?
>>
>> Thanks,
>>
>> --
>> best regards,
>> gtt
>>
>>
>>
>>
>
>
> --
> best regards,
> gtt
>
>
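
The queueing idea from gtt's original message (hold create requests in the
scheduler and cast them only while some host's current_workload is below a
limit) could look roughly like the sketch below. This is only an illustration
of the proposal, not nova's scheduler code: the Host and ThrottlingScheduler
classes, the max_workload parameter, and the build_finished callback are all
hypothetical names; only current_workload comes from the thread.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for a compute node record."""
    name: str
    current_workload: int = 0  # builds currently in progress on this host


class ThrottlingScheduler:
    """Queues create requests and casts them only to hosts below a limit."""

    def __init__(self, hosts, max_workload):
        self.hosts = list(hosts)
        self.max_workload = max_workload  # hypothetical per-host limit
        self.pending = deque()            # requests not yet cast to a host

    def submit(self, request_id):
        """Accept a create request without casting it yet."""
        self.pending.append(request_id)

    def dispatch(self):
        """Cast queued requests while some host is below the limit.

        Returns the (request_id, host_name) pairs that were cast.
        """
        casts = []
        while self.pending:
            # Pick the least loaded host; ties go to the first one listed.
            host = min(self.hosts, key=lambda h: h.current_workload)
            if host.current_workload >= self.max_workload:
                break  # every host is busy, keep the rest queued
            host.current_workload += 1
            casts.append((self.pending.popleft(), host.name))
        return casts

    def build_finished(self, host_name):
        """Record that a build ended on a host, freeing one slot."""
        for host in self.hosts:
            if host.name == host_name:
                host.current_workload -= 1
                return
        raise KeyError(host_name)


if __name__ == "__main__":
    scheduler = ThrottlingScheduler([Host("node1"), Host("node2")], max_workload=2)
    for request_id in range(5):
        scheduler.submit(request_id)
    print(scheduler.dispatch())       # casts until both hosts hit the limit
    scheduler.build_finished("node1")
    print(scheduler.dispatch())       # the queued request goes to the freed host
```

As the original message notes, the trade-off is in the limit: set too low,
requests sit in the queue and instance creation becomes slow.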