[openstack-dev] [nova] ops meetup feedback
Joshua Harlow
harlowja at fastmail.com
Wed Sep 28 15:36:00 UTC 2016
Matt Riedemann wrote:
> On 9/28/2016 12:10 AM, Joshua Harlow wrote:
>>>>> ACTION: we should make sure workarounds are advertised better
>>>>> ACTION: we should have some document about "when cells"?
>>>> This is a difficult question to answer because "it depends." It's akin
>>>> to asking "how many nova-api/nova-conductor processes should I run?"
>>>> Well, what hardware is being used, how much traffic do you get, is it
>>>> bursty or sustained, are instances created and left alone or are they
>>>> torn down regularly, do you prune your database, what version of rabbit
>>>> are you using, etc...
>>>>
>>>> I would expect the best answer(s) to this question are going to come
>>>> from the operators themselves. What I've seen with cellsv1 is that
>>>> someone will decide for themselves that they should put no more than X
>>>> computes in a cell and that information filters out to other operators.
>>>> That provides a starting point for a new deployment to tune from.
>>>
>>> I don't think we need "don't go larger than N nodes" kind of advice. But
>>> we should probably know what kinds of things we expect to be hot spots.
>>> Like MySQL load, possibly indicated by system load or a high level of DB
>>> conflicts. Or RabbitMQ load. Or something along those lines.
>>>
>>> Basically the things to look out for that indicate you are approaching
>>> a scale point where cells is going to help. That also helps in defining
>>> what kind of scaling issues cells won't help on, which need to be
>>> addressed in other ways (such as optimizations).
>>
>> Big +1. If we can really get out of the pattern of guessing at the
>> overall system characteristics *somehow*, I think it would be great for
>> our own community's maturity and for each project. Even though I know
>> such things are hard, it scares the bejeezus out of me when we (as a
>> group) create software but can't give recommendations on its behavioral
>> characteristics (we aren't doing quantum physics here, the last time I
>> checked).
>>
>> Just some ideas:
>>
>> * Rally maybe can help here?
>> * Fixing a standard set of configuration options and testing that at
>> scale (using the Intel lab?) - and then possibly using Rally (or other)
>> to probe the system characteristics and then giving recommendations
>> before releasing the software for general consumption based on observed
>> system characteristics (this is basically what operators are going to
>> have to do anyway to qualify a release, especially if the community
>> isn't doing it and/or is shying away from doing it).
>>
>> I just have a hard time accepting that tribal knowledge about scale that
>> has to filter from operator to operator (yes, I know from personal
>> experience this is how things trickled down) is a good way to go. It
>> reminds me of the medicine and practices of the late 1800s, when all
>> sorts of quackery was happening; and IMHO we can do better than
>> this :)
>
> Hmm, that reminds me that I'm running low on leeches...
>
Don't forget your mercury and radioactive toothpaste[1] too; they
perform miracles, I tell you (or that's what I've heard) :)
[1] https://en.wikipedia.org/wiki/Doramad_Radioactive_Toothpaste