[oslo][nova][stable][requirements] Fixing a high CPU usage from oslo.service into stable/rocky branch

Sean Mooney smooney at redhat.com
Wed Nov 21 15:02:00 UTC 2018


On Wed, 2018-11-21 at 09:57 -0500, Doug Hellmann wrote:
> Herve Beraud <hberaud at redhat.com> writes:
> 
> > Hey all!
> > 
> > Here is a thread to coordinate all the teams (oslo, nova, stable,
> > requirements) working on the update of the oslo.service constraint in the
> > Rocky requirements.
> > 
> > # Summary
> > 
> > Using threading events with eventlet led to inefficient code (many useless
> > system calls and high CPU usage).
> > This issue was already fixed on oslo.service master and we also want to fix
> > it in stable/rocky.
> > 
> > Our main issue is how to fix the high CPU usage on stable/rocky without
> > breaking the nova CI.
> > 
> > Indeed, we have already backported the eventlet-related fix to oslo.service,
> > but this fix also requires a nova update, because the removal of threading
> > from oslo.service causes nova CI errors.
> > 
> > A fix was proposed and merged on oslo.service master to introduce a new
> > feature (a fixture) that avoids the nova CI errors, but backporting the
> > master fix to Rocky introduces a new feature into a stable branch, so this
> > is also an issue.
> > 
> > So we need to discuss with all the teams to find a proper solution.
> > 
> > # History
> > 
> > A few weeks ago this issue was opened on oslo.service (
> > https://bugs.launchpad.net/oslo.service/+bug/1798774) and it was fixed by
> > this patch submitted on the master branch (
> > https://review.openstack.org/#/c/611807/ ).
> > 
> > This change uses the proper event primitive to fix the performance issue.
> > 
> > A new version of oslo.service was released (1.32.1).
> > 
> > Since these changes were introduced into oslo.service master, nova has been
> > facing some issues in the master CI process due to the threading changes,
> > and they were fixed by these patches ( https://review.openstack.org/#/c/615724/,
> > https://review.openstack.org/#/c/617989/ ) in master.
> > 
> > A few weeks ago I backported some oslo.service changes (
> > https://review.openstack.org/#/c/614489/ ) from master to stable/rocky to
> > also fix the problem in the Rocky release.
> > 
> > When this backport was merged we created a new release of oslo.service
> > (1.31.6) ( https://review.openstack.org/#/c/616505/ ) (the stable/rocky
> > version).
> > 
> > Then the OpenStack proposal bot submitted a patch to requirements on
> > stable/rocky to update oslo.service to the latest version (1.31.6), but
> > using it would break the CI ( https://review.openstack.org/#/c/618834/ ), so
> > this patch is currently blocked to avoid nova CI errors.
> > 
> > # Issue
> > 
> > Since the oslo.service threading changes were backported to Rocky, we risk
> > facing the same issues inside the nova Rocky CI if we update the
> > requirements.
> > 
> > In parallel, in oslo.service we have started to backport a new patch that
> > introduces a fixture ( https://review.openstack.org/#/c/617989/ ) from
> > master to Rocky, and we have also started to backport to the nova Rocky
> > branch ( https://review.openstack.org/619019,
> > https://review.openstack.org/619022 ) patches that use oslo.service.fixture
> > and solve the nova CI issue. The patch on oslo.service exposes a public
> > oslo_service.fixture.SleepFixture for this purpose. It can be maintained
> > opaquely as internals change without affecting its consumers.
> > 
> > The main problem is that the patch brings new functionality to a stable
> > branch (oslo.service Rocky), but this patch helps to fix the nova issue.
> > 
> > Also, the OpenStack proposal bot submitted a patch to requirements on
> > stable/rocky to update oslo.service to the latest version (1.31.6), but
> > using it would break the CI ( https://review.openstack.org/#/c/618834/ ),
> > since oslo.service 1.31.6 is incompatible with nova's stable/rocky unit
> > tests due to the threading changes.
> > 
> > # Questions and proposed solutions
> > 
> > This thread tries to summarize the current situation.
> > 
> > We need to find a way to proceed, so this thread aims to let the teams
> > discuss and find the best way to fix this.
> > 
> > 1. Do we need to continue trying to backport the fixture on oslo.service to
> > fix the CI problem (https://review.openstack.org/#/c/617989/)?
> > 
> > 2. Do we need to find another approach, like mocking
> > oslo.service.loopingcall._Event.wait in nova instead of mocking
> > oslo_service.loopingcall._ThreadingEvent.wait (example:
> > https://review.openstack.org/#/c/616697/2/nova/tests/unit/compute/test_compute_mgr.py)
> > ?
> > This is only a fix on the nova side; it allows us to update the oslo.service
> > requirements and fix the high CPU usage issue. I've submitted this patch
> > (https://review.openstack.org/619246) that implements the description
> > above.
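
For readers following along, option 2 amounts to something like the sketch
below. It is self-contained and uses stand-in classes: _Event and RetryLoop
are hypothetical and are not the real oslo.service or nova code. Only the
unittest.mock calls are real.

```python
import time
from unittest import mock


class _Event:
    """Hypothetical stand-in for a library-private event class."""

    def wait(self, timeout=None):
        # The real thing would block; here we simply sleep.
        time.sleep(timeout or 0)
        return False


class RetryLoop:
    """Hypothetical code under test that waits between attempts."""

    def __init__(self, interval):
        self.interval = interval
        self._event = _Event()

    def run(self, attempts):
        for _ in range(attempts):
            self._event.wait(self.interval)
        return attempts


def run_without_sleeping(loop, attempts):
    # Patch the private wait() so the unit test never really sleeps.
    with mock.patch.object(_Event, "wait", return_value=False) as fake_wait:
        result = loop.run(attempts)
    return result, fake_wait.call_count
```

The catch is that the test names a private attribute, so it has to be
updated whenever the library renames or replaces that class.
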
> > 
> > Personally I think we need to find another approach, like the mocking
> > replacement (cf. 2).
> > 
> > We need to decide which way to go and to discuss other solutions.
> > 
> > -- 
> > Hervé Beraud
> > Senior Software Engineer
> > Red Hat - Openstack Oslo
> > irc: hberaud
> 
> Thank you for summarizing this issue, Hervé, and for working on the
> patches we need.
> 
> I think I would be happy with either solution. Using clean backports
> seems less risky, and even though we are adding a new feature to
> oslo.service it's only a unit test fixture. On the other hand if we want
> to be very strict about not adding features in stable branches and we
> are OK with creating a change to nova's unit tests that is not
> backported from master, then that works for me, too.

it should be noted this is not just a blocker for the nova ci.
if we don't fix the unit tests it will break distributions that run
the unit tests as part of their packaging of nova downstream.
i would prefer to backport the fixture personally and do a clean backport of the nova patches
also, rather than a stable-only patch. while technically a feature, i don't really consider a
test fixture to be a feature in the normal sense, and it is relatively small and self-contained.
> 
> I have a slight preference for the first proposal, but not strong enough
> to fight for it if the majority decides to go with the second
> option.
> 


