Open Stack

Fri Nov 13 15:09:24 UTC 2015

On 11/12/2015 5:34 PM, Joshua Harlow wrote:
> Ok, so the following is starting to form:
>
> https://etherpad.openstack.org/p/remote-conductor-performance
>
> Hopefully we can get to the bottom of this (especially for clouds that
> run a large amount of computes in a single cell/only one cell).
>
> Andrew Laski wrote:
>> On 11/12/15 at 10:53am, Clint Byrum wrote:
>>> Excerpts from Joshua Harlow's message of 2015-11-12 10:35:21 -0800:
>>>> Mike Dorman wrote:
>>>> > We do have a backlog story to investigate this more deeply, we just
>>>> > have not had the time to do it yet. For us, it’s been easier/faster
>>>> > to add more hardware to conductor to get over the hump temporarily.
>>>> >
>>>> > We kind of have that work earmarked for after the Liberty upgrade,
>>>> > in hopes that maybe it’ll be fixed there.
>>>> >
>>>> > If anybody else has done even some trivial troubleshooting already,
>>>> > it’d be great to get that info as a starting point. I.e. which
>>>> > specific calls to conductor are causing the load, etc.
>>>> >
>>>> > Mike
>>>> >
>>>>
>>>> +1 I think we in the #openstack-performance channel really need to
>>>> investigate this, because it really worries me personally from hearing
>>>> many many rumors about how the remote conductor falls over. Please join
>>>> there and we can try to work through a plan to figure out what to do
>>>> about this situation. It would be great if the nova people also joined
>>>> there (because in the end, likely something in nova will need to be
>>>> fixed/changed/something else to resolve what appears to be a problem
>>>> for many operators).
>>>>
>>>
>>> Falling over is definitely a bad sign. ;)
>>>
>>> The concept of pushing messages over a bus instead of just making local
>>> calls shouldn't result in much extra load. Perhaps we just have too many
>>> layers of unoptimized encapsulation. I have to wonder if something like
>>> protobuf would help.
>>
>> Falling over is also a very broad description and doesn't let us know
>> what the actual issue is.
>>
>> From my experience the performance concern with conductor has been in
>> not understanding the ratio of conductor nodes to computes that are
>> necessary for our usage. Conductor doesn't add much extra load, but it
>> concentrates it on a smaller number of services. If we ran one conductor
>> per compute I suspect we would have no performance issues, but that's a
>> lot of capacity to use for this.
>>
>> I am curious what conductor/compute ratios others are trying to
>> achieve, given equal hardware types for each, and what the barriers
>> to achieving them are?
>>
>>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Cool, that's helpful for taking notes. I've posted some questions in there.

I also added this to the next performance team meeting agenda.  I have a 
conflict at that time so I might not be able to join, but I'm assuming 
notes will be put back into the etherpad.

-- 

Thanks,

Matt Riedemann

[Openstack-operators] [nova] FYI, local conductor mode is deprecated, pending removal in N