[openstack-dev] [nova] ops meetup feedback

Joshua Harlow harlowja at fastmail.com
Wed Sep 28 05:10:25 UTC 2016


>>> ACTION: we should make sure workarounds are advertised better
>>> ACTION: we should have some document about "when cells"?
>> This is a difficult question to answer because "it depends." It's akin
>> to asking "how many nova-api/nova-conductor processes should I run?"
>> Well, what hardware is being used, how much traffic do you get, is it
>> bursty or sustained, are instances created and left alone or are they
>> torn down regularly, do you prune your database, what version of rabbit
>> are you using, etc...
>>
>> I would expect the best answer(s) to this question are going to come
>> from the operators themselves. What I've seen with cellsv1 is that
>> someone will decide for themselves that they should put no more than X
>> computes in a cell and that information filters out to other operators.
>> That provides a starting point for a new deployment to tune from.
>
> I don't think we need "don't go larger than N nodes" kind of advice. But
> we should probably know what kinds of things we expect to be hot spots.
> Like mysql load, possibly indicated by system load or high level of db
> conflicts. Or rabbit mq load. Or something along those lines.
>
> Basically the things to look out for that indicate you're approaching
> a scale point where cells is going to help. That also helps in defining
> what kind of scaling issues cells won't help on, which need to be
> addressed in other ways (such as optimizations).

Big +1. If we can somehow get out of the pattern of guessing at the 
overall system characteristics, I think it would be great for our 
community's maturity and for each project. Even though I know such 
things are hard, it scares the bejeezus out of me when we (as a group) 
create software but can't give recommendations on its behavioral 
characteristics (we aren't doing quantum physics here, the last time I 
checked).

Just some ideas:

* Maybe Rally can help here?
* Fix a standard set of configuration options and test that at scale 
(using the intel lab?), then use Rally (or another tool) to probe the 
system characteristics and publish recommendations, based on what was 
observed, before releasing the software for general consumption. This 
is basically what operators are going to have to do anyway to qualify a 
release, especially if the community isn't doing it and/or is shying 
away from doing it.
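To make the probing idea concrete, here is a rough sketch (this is not 
Rally; `probe` and `fake_api_call` are made-up names for illustration) 
of the kind of measurement I mean: drive an endpoint at increasing 
concurrency and record latency, so a recommendation like "beyond N 
concurrent requests, p95 latency degrades" comes from observation 
rather than guessing:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def probe(request_fn, concurrency, requests_per_worker=20):
    """Issue requests at a fixed concurrency; return latency stats (seconds)."""
    def worker():
        latencies = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            request_fn()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        all_latencies = [lat for f in futures for lat in f.result()]
    return {
        "concurrency": concurrency,
        "mean": statistics.mean(all_latencies),
        "p95": sorted(all_latencies)[int(0.95 * len(all_latencies)) - 1],
    }


def fake_api_call():
    # Stand-in for a real API call (e.g. an actual nova request); swap in
    # a real client call when probing a deployment under test.
    time.sleep(0.001)


if __name__ == "__main__":
    # Sweep concurrency levels; the knee in the latency curve is the kind
    # of number a "when cells" document could actually cite.
    for level in (1, 4, 16):
        stats = probe(fake_api_call, level)
        print("concurrency=%(concurrency)d mean=%(mean).4fs p95=%(p95).4fs"
              % stats)
```

A real tool (Rally or otherwise) would of course measure much more than 
latency (DB conflicts, rabbit queue depth, etc.), but the point is the 
same: recommendations derived from repeatable measurement, not folklore.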

I just have a hard time accepting that tribal knowledge about scale 
that has to filter from operator to operator (yes, I know from personal 
experience this is how things trickled down) is a good way to go. It 
reminds me of medicine in the late 1800s, when all sorts of quack 
science was being practiced; IMHO we can do better than this :)

Anyway, back to your regularly scheduled programming,

-Josh
