[openstack-dev] [Tempest][Production] Tempest / the gate / real world load
Maru Newby
marun at redhat.com
Mon Jan 13 20:52:53 UTC 2014
I'm afraid I missed this topic the first time around, and I think it bears revisiting.
tl;dr: I think we should consider ensuring gate stability in the face of resource-starved services by some combination of more intelligent test design and better handling of resource starvation (for example, rate-limiting). Stress-testing would be more effective if it were explicitly focused on real-world usage scenarios and run separately from the gate. I think stress-testing is about the 'when' of failure, whereas the gate is about 'if'.
I don't think anyone would dispute that OpenStack services (especially Neutron) can do better to ensure reliability under load. Running things in parallel in the gate shone a bright light on many problem areas, and that was inarguably a good thing. Now that we have a better sense of the problem, though, it may be time to think about evolving our approach.
From the perspective of gating commits, I think it makes sense to (a) minimize gate execution time and (b) provide some guarantees of reliability under reasonable load. I don't think either of these requires continuing to evaluate unrealistic usage scenarios against services running in a severely resource-starved environment. Every service eventually falls over when too much is asked of it. These kinds of failure are not likely to be particularly deterministic, so wouldn't it make sense to avoid triggering them in the gate as much as possible?
In the specific case of Neutron, the current approach to testing isolation involves creating and tearing down networks at a tremendous rate. I'm not sure anyone can argue that this constitutes a usage scenario that is likely to appear in production, but because it causes problems in the gate, we've had to prioritize working on it over initiatives that might prove more useful to operators. While this may have been a necessary stop on the road to Neutron stability, I think it may be worth considering whether we want the gate to continue having an outsized role in defining optimization priorities.
Thoughts?
m.
On Dec 12, 2013, at 11:23 AM, Robert Collins <robertc at robertcollins.net> wrote:
> A few times now we've run into patches for devstack-gate / devstack
> that change default configuration to handle 'tempest load'.
>
> For instance - https://review.openstack.org/61137 (Sorry Salvatore I'm
> not picking on you really!)
>
> So there appears to be a meme that the gate is particularly stressful
> - a bad environment - and that real world situations have less load.
>
> This could happen a few ways: (a) deployers might separate out
> components more; (b) they might have faster machines; (c) they might
> have less concurrent activity.
>
> (a) - unlikely! Deployers will cram stuff together as much as they can
> to save overheads. Big clouds will have components split out - yes,
> but they will also have correspondingly more load to drive that split
> out.
>
> (b) Perhaps, but not orders of magnitude faster: the clouds we run on
> are running on fairly recent hardware, and by using big instances we
> don't get crammed in with that many other tenants.
>
> (c) Almost certainly not. Tempest currently does a maximum of four
> concurrent requests. A small business cloud could easily have 5 or 6
> people making concurrent requests from time to time, and bigger but
> not huge clouds will certainly have that. Their /average/ rate of API
> requests may be much lower, but when they point service orchestration
> tools at it -- particularly tools that walk their dependencies in
> parallel - load is going to be much much higher than what we generate
> with Tempest.
>
> tl;dr : if we need to change a config file setting in devstack-gate or
> devstack *other than* setting up the specific scenario, think thrice -
> should it be a production default and set in the relevant project's
> default config setting?
>
> Cheers,
> Rob
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev