[oslo][nova][stable][requirements] Fixing high CPU usage from oslo.service in the stable/rocky branch

Herve Beraud hberaud at redhat.com
Wed Nov 21 14:47:25 UTC 2018


Hey all!

Here is a thread to coordinate all the teams (oslo, nova, stable,
requirements) working on the update of the oslo.service constraint in the
Rocky requirements.

# Summary

Using a threading Event together with eventlet led to inefficient code that
made many useless system calls and caused high CPU usage.
This issue has already been fixed in oslo.service master, and we also want to
fix it in stable/rocky.
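To illustrate why this matters, here is a simplified model, not the actual oslo.service or eventlet code. It assumes the cost comes from a timed wait implemented by polling, similar in spirit to how Python 2's `threading.Condition.wait(timeout)` loops with short sleeps capped at 50 ms. The `polling_wait` and `blocking_wait` helpers are hypothetical. The polling version wakes up over and over while nothing happens, whereas a proper event primitive simply blocks until it is set or the timeout expires.

```python
import threading
import time


def polling_wait(is_set, timeout, max_delay=0.05):
    """Hypothetical polling wait: check a flag, sleep briefly, repeat.

    Returns (flag_was_set, number_of_wakeups). Each wakeup stands for at
    least one extra system call (a sleep followed by another flag check).
    """
    wakeups = 0
    delay = 0.0005
    deadline = time.monotonic() + timeout
    while not is_set():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False, wakeups
        # Back off exponentially, but never sleep longer than max_delay.
        delay = min(delay * 2, remaining, max_delay)
        time.sleep(delay)
        wakeups += 1
    return True, wakeups


def blocking_wait(event, timeout):
    """Wait on a real event primitive: block until set() or the timeout."""
    return event.wait(timeout)


if __name__ == "__main__":
    event = threading.Event()
    set_flag, wakeups = polling_wait(event.is_set, timeout=0.5)
    print(f"polling wait: set={set_flag}, wakeups={wakeups}")
    print(f"blocking wait: set={blocking_wait(event, timeout=0.5)}")
```

With nothing ever setting the event, the polling wait performs a few dozen wakeups per second per waiter; multiplied across many looping calls, that is the kind of overhead the fix removes.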

Our main issue is how to fix the high CPU usage on stable/rocky without
breaking the nova CI.

Indeed, we have already backported the eventlet-related fix to oslo.service,
but this fix also requires a nova update: without it, the removal of
threading from oslo.service causes errors in the nova CI.

A fix was proposed and merged on oslo.service master that introduces a new
feature (a fixture) to avoid the nova CI errors, but backporting this master
fix to Rocky would introduce a new feature into a stable branch, which is
also an issue.

So we need to discuss this across all the teams to find a proper solution.

# History

A few weeks ago this issue was opened on oslo.service (
https://bugs.launchpad.net/oslo.service/+bug/1798774 ) and it was fixed on
the master branch by this patch (
https://review.openstack.org/#/c/611807/ ).

This change uses the proper event primitive to fix the performance issue.

A new version of oslo.service was then released (1.32.1).

Since these changes were introduced into oslo.service master, nova has faced
some issues in its master CI due to the threading changes; they were fixed
in master by these patches ( https://review.openstack.org/#/c/615724/,
https://review.openstack.org/#/c/617989/ ).

A few weeks ago I backported some oslo.service changes (
https://review.openstack.org/#/c/614489/ ) from master to stable/rocky to
also fix the problem in the Rocky release.

When this backport was merged, we created a new stable/rocky release of
oslo.service, 1.31.6 ( https://review.openstack.org/#/c/616505/ ).

Then the OpenStack proposal bot submitted a patch to the stable/rocky
requirements to bump oslo.service to the latest version (1.31.6), but using
that version would break the nova CI, so this patch (
https://review.openstack.org/#/c/618834/ ) is currently blocked.

# Issue

Since the oslo.service threading changes were backported to Rocky, we risk
facing the same issues in the nova Rocky CI if we update the requirements.

In parallel, in oslo.service we have started to backport from master to
Rocky a new patch that introduces a fixture (
https://review.openstack.org/#/c/617989/ ), and we have also started to
backport to the nova Rocky branch the patches (
https://review.openstack.org/619019, https://review.openstack.org/619022 )
that use oslo_service.fixture and solve the nova CI issue. The oslo.service
patch exposes a public oslo_service.fixture.SleepFixture for this purpose.
It can be maintained opaquely as the internals change, without affecting
its consumers.
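The idea behind a public test fixture can be sketched as follows. This is not the oslo.service implementation: `_Event`, `wait_for_tick` and `SleepFixture` below are hypothetical stand-ins, and the sketch uses a plain context manager with `unittest.mock` instead of the `fixtures` library. Tests depend only on the public helper, so the private class it patches can change between releases without touching the tests.

```python
import contextlib
import threading
from unittest import mock


class _Event:
    """Hypothetical private event wrapper; its name may change between releases."""

    def __init__(self):
        self._event = threading.Event()

    def wait(self, timeout=None):
        return self._event.wait(timeout)


def wait_for_tick(interval):
    """Hypothetical library code: block for `interval` seconds between runs."""
    return _Event().wait(interval)


@contextlib.contextmanager
def SleepFixture():
    """Hypothetical public test helper hiding which private class is patched.

    Yields the mock so tests can check the requested sleep durations
    without actually sleeping.
    """
    # A plain MagicMock set on the class is not bound as a method,
    # so it records only the arguments passed after `self`.
    with mock.patch.object(_Event, "wait", return_value=False) as mock_wait:
        yield mock_wait
```

A test would then write `with SleepFixture() as mock_wait: wait_for_tick(60)` and assert on `mock_wait`; if the library later swaps `_Event` for another primitive, only `SleepFixture` needs updating.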

The main problem is that this patch brings a new feature to a stable branch
(oslo.service Rocky), even though it helps fix the nova issue.

The OpenStack proposal bot's patch to the stable/rocky requirements (
https://review.openstack.org/#/c/618834/ ), which bumps oslo.service to the
latest version (1.31.6), would also break the CI, because oslo.service
1.31.6 is incompatible with nova's stable/rocky unit tests due to the
threading changes.

# Questions and proposed solutions

This thread tries to summarize the current situation.

We need to find a way to proceed, so this thread aims to let the teams
discuss the best way to fix this.

1. Should we continue trying to backport the fixture to oslo.service to fix
the CI problem (https://review.openstack.org/#/c/617989/)?

2. Should we find another approach, such as mocking
oslo_service.loopingcall._Event.wait in nova instead of mocking
oslo_service.loopingcall._ThreadingEvent.wait (example:
https://review.openstack.org/#/c/616697/2/nova/tests/unit/compute/test_compute_mgr.py)?
This is a nova-only fix, and it allows us to update the oslo.service
requirement and fix the high CPU usage issue. I've submitted a patch
(https://review.openstack.org/619246) that implements this approach.
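Option 2 essentially changes which private attribute the nova tests patch. The sketch below uses hypothetical stand-ins (a `loopingcall` namespace and an `_Event` class, not the real oslo_service module) to show why the old target breaks once `_ThreadingEvent` is removed: `mock.patch.object` refuses to patch an attribute that does not exist, so the test fails before its body runs, while patching `_Event.wait` intercepts the call as intended.

```python
import types
from unittest import mock


class _Event:
    """Stand-in for the event class that remains after the fix (hypothetical)."""

    def wait(self, timeout=None):
        raise RuntimeError("a unit test should never really wait")


# Hypothetical stand-in for oslo_service.loopingcall after the fix:
# the old _ThreadingEvent class no longer exists.
loopingcall = types.SimpleNamespace(_Event=_Event)


def old_target_breaks():
    """Patching the removed class raises AttributeError (create=False by default)."""
    try:
        with mock.patch.object(loopingcall, "_ThreadingEvent"):
            pass
    except AttributeError:
        return True
    return False


def new_target_works():
    """Patching _Event.wait, as proposed in option 2, intercepts the call."""
    with mock.patch.object(loopingcall._Event, "wait", return_value=True) as m:
        result = loopingcall._Event().wait(30)
    m.assert_called_once_with(30)
    return result
```

The trade-off is that nova still depends on a private name, which is exactly what the public fixture from option 1 would avoid.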

Personally, I think we should go with another approach, such as the mock
replacement (see option 2).

We need to decide which way to go and to discuss any other solutions.

-- 
Hervé Beraud
Senior Software Engineer
Red Hat - Openstack Oslo
irc: hberaud