[oslo][nova][stable][requirements] Fixing a high CPU usage from oslo.service into stable/rocky branch
Herve Beraud
hberaud at redhat.com
Thu Nov 22 16:58:08 UTC 2018
Here is a short report that summarizes the situation, gives the big picture,
and analyzes the proposed solutions, since people have submitted new patches
and others have shared their opinions.
For each solution I list the impacted projects, the related Gerrit reviews,
and the pros and cons.
### 1 - Backport the master changes, including the new feature, into a stable
branch
impacted projects: oslo.service, nova
related reviews:
- https://review.openstack.org/617989
- https://review.openstack.org/619022
- https://review.openstack.org/619019
pros:
- follows the master changes
- is just a backport
cons:
- backports a new feature into a stable branch, which violates the stable
policy
- introduces a semver minor bump, and a version 1.32 already exists
- we won't be able to use an older nova with a newer oslo.service
- increasing lower-constraints is not allowed on stable branches
- vendors need to ensure that there is a 'fixture' package available
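To make the semver concern concrete, here is a minimal sketch. It is not OpenStack release tooling, and the helpers `parse` and `next_version` are hypothetical names I made up for illustration. It only shows why a feature backport is awkward: a new feature requires a minor bump from the Rocky 1.31.x series, which lands in the 1.32 series that master has already released from.

```python
def parse(version):
    """Split an 'X.Y.Z' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def next_version(current, change):
    """Return the next semver string for a 'feature' or 'bugfix' change."""
    major, minor, patch = parse(current)
    if change == "feature":
        # New functionality: bump the minor number, reset the patch number.
        return f"{major}.{minor + 1}.0"
    if change == "bugfix":
        # Bug fix only: bump the patch number.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


# Versions mentioned in this thread: 1.31.6 is the stable/rocky release,
# and 1.32.1 was released from master.
rocky = "1.31.6"
master_series = parse("1.32.1")[:2]

bugfix = next_version(rocky, "bugfix")    # stays inside the 1.31.x series
feature = next_version(rocky, "feature")  # jumps into the 1.32.x series

# True when the feature bump lands in the series already used by master.
collides = parse(feature)[:2] == master_series
print(bugfix, feature, collides)
```

A pure bug fix stays inside 1.31.x, while the feature backport has nowhere to go without stepping on the master series.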
### 2 - Update the nova test to mock the new private interface
impacted projects: nova
related reviews:
- https://review.openstack.org/619246
pros:
- straightforward
cons:
- still mocking a private interface
- stable-only patches
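As a rough illustration of what "mocking a private interface" means in a test, here is a self-contained sketch. The `_Event` and `LoopingCall` classes below are stand-ins defined in the example itself, not the real oslo.service classes, and `run_without_sleeping` is a hypothetical helper; only `unittest.mock.patch.object` is a real API.

```python
from unittest import mock


class _Event:
    """Stand-in for a private event class inside a library (hypothetical)."""

    def wait(self, timeout=None):
        raise RuntimeError("a real implementation would block here")


class LoopingCall:
    """Stand-in for library code that waits between iterations."""

    def __init__(self, func):
        self._event = _Event()
        self._func = func

    def run(self, times, interval):
        for _ in range(times):
            self._func()
            self._event.wait(interval)


def run_without_sleeping():
    """Run the loop with the private wait() patched out, as a test would."""
    calls = []
    # The test names the private class directly, so it is coupled to it:
    # if the library renames the class, this patch target no longer exists.
    with mock.patch.object(_Event, "wait", autospec=True) as fake_wait:
        LoopingCall(lambda: calls.append(1)).run(times=3, interval=60)
    return len(calls), fake_wait.call_count


print(run_without_sleeping())
```

The test runs instantly because the wait is replaced, but it only keeps working as long as the private name stays the same, which is exactly the weakness listed in the cons.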
### 3 - Maintain a private interface for ThreadingEvent only on stable/rocky
impacted projects: oslo.service
related reviews:
- https://review.openstack.org/619342/
pros:
- straightforward
cons:
- changes the loopingcall module semantics, since the event is a different
type
- stable-only patches
- potential confusion for service consumers between threading, eventlet,
etc., and the master behavior
### 4 - Don't use private interface in oslo.service
impacted projects: nova
related reviews:
- https://review.openstack.org/#/c/619360/
pros:
- straightforward
cons:
- this could work, but the semantics are not the same as before because the
type has changed
- stable-only patches
- potential confusion for service consumers between threading, eventlet,
etc., and the master behavior
### 5 - Leave the CPU bug open on rocky
impacted projects: oslo.service
related reviews: -
pros:
- the Nova project isn't impacted
cons:
- reintroduces the CPU issue
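For readers who want a feel for why a timed wait can turn into "many useless system calls", here is a simplified simulation. It is an assumption on my side about the mechanism, not the oslo.service code: it models a timed wait implemented by polling with a capped backoff (the approach Python 2's threading module used for timed waits), and `polling_wait_wakeups` is a hypothetical helper. Time is simulated with integer microseconds, so nothing actually sleeps.

```python
def polling_wait_wakeups(timeout_us, start_delay_us=500, max_delay_us=50_000):
    """Count wakeups of a timed wait implemented by polling.

    Each iteration doubles the sleep delay, capped by max_delay_us and by
    the time remaining, and counts as one wakeup (one sleep call).
    """
    elapsed = 0
    delay = start_delay_us
    wakeups = 0
    while elapsed < timeout_us:
        delay = min(delay * 2, timeout_us - elapsed, max_delay_us)
        elapsed += delay  # simulated sleep, no real waiting
        wakeups += 1
    return wakeups


# A single 60 second wait where the event is never set.
wakeups = polling_wait_wakeups(60 * 1_000_000)
print(wakeups)
```

Under these assumptions one idle 60 second wait wakes up more than a thousand times, whereas a blocking primitive would wake up once. Multiplied by every periodic task in a service, that is the kind of overhead option 5 would leave in place.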
### 6 - Revert the CPU fix and totally rework it into something that doesn't
break the Nova CI
impacted projects: oslo.service
related reviews: -
pros:
- the Nova project isn't impacted
cons:
- potentially introduces more rewrites in the future that depend on the fix
in oslo.service loopingcall on master
- stable-only patches
- increases potential backport difficulties for oslo.service upstream and
downstream
- increases the workload upstream and downstream
Personally:
- I prefer #4 or #2; they seem to be smooth changes that don't shake
anything up
- #6 seems to me to be the worst solution
- #1 introduces a semver issue and policy violations, so I don't think we
can continue with it
Thoughts?
I hope this summary helps you get a better overview :)
I hope I have not forgotten anything, and if so I apologize in advance.
On Wed, Nov 21, 2018 at 15:47, Herve Beraud <hberaud at redhat.com> wrote:
> Hey all!
>
> Here is a thread to coordinate all the teams (oslo, nova, stable,
> requirements) working on the update of the oslo.service constraint in the
> Rocky requirements.
>
> # Summary
>
> Using a threading event with eventlet resulted in inefficient code (many
> useless system calls and high CPU usage).
> This issue has already been fixed on oslo.service master and we also want
> to fix it in stable/rocky.
>
> Our main issue is how to fix the high CPU usage on stable/rocky without
> breaking the nova CI.
>
> We have already backported the eventlet-related fix to oslo.service, but
> this fix also requires a nova update, because the removal of threading in
> oslo.service is what causes the nova CI errors.
>
> A fix was proposed and merged on oslo.service master to introduce a new
> feature (a fixture) that avoids the nova CI errors, but backporting the
> master fix to Rocky introduces a new feature into a stable branch, so this
> is also an issue.
>
> So we need to discuss with all the teams to find a proper solution.
>
> # History
>
> A few weeks ago this issue was opened on oslo.service (
> https://bugs.launchpad.net/oslo.service/+bug/1798774) and it was fixed by
> this patch submitted on the master branch (
> https://review.openstack.org/#/c/611807/ ).
>
> This change uses the proper event primitive to fix the performance issue.
>
> A new version of oslo.service was released (1.32.1).
>
> Since these changes were introduced into oslo.service master, nova has
> faced some issues in its master CI due to the threading changes; they were
> fixed in master by these patches ( https://review.openstack.org/#/c/615724/,
> https://review.openstack.org/#/c/617989/ ).
>
> A few weeks ago I backported some oslo.service changes (
> https://review.openstack.org/#/c/614489/ ) from master to stable/rocky to
> also fix the problem in the Rocky release.
>
> When this backport was merged we created a new release of
> oslo.service (1.31.6) ( https://review.openstack.org/#/c/616505/ )
> (stable/rocky version).
>
> Then the OpenStack proposal bot submitted a patch to requirements on
> stable/rocky to update oslo.service to the latest version (1.31.6), but
> using it would break the CI ( https://review.openstack.org/#/c/618834/ ),
> so this patch is currently blocked to avoid nova CI errors.
>
> # Issue
>
> Since the oslo.service threading changes were backported to Rocky, we
> risk facing the same issues in the nova Rocky CI if we update the
> requirements.
>
> In parallel, in oslo.service we have started to backport a new patch that
> introduces the fixture ( https://review.openstack.org/#/c/617989/ ) from
> master to Rocky, and we have also started to backport to the nova Rocky
> branch ( https://review.openstack.org/619019,
> https://review.openstack.org/619022 ) patches that use the oslo.service
> fixture and solve the nova CI issue. The patch on oslo.service exposes a
> public oslo_service.fixture.SleepFixture for this purpose. It can be
> maintained opaquely as internals change without affecting its consumers.
>
> The main problem is that the patch brings new functionality to a stable
> branch (oslo.service Rocky), even though it helps to fix the nova issue.
>
> Also, the OpenStack proposal bot submitted a patch to requirements on
> stable/rocky to update oslo.service to the latest version (1.31.6), but
> using it would break the CI ( https://review.openstack.org/#/c/618834/ ),
> since oslo.service 1.31.6 is incompatible with nova's stable/rocky unit
> tests due to the threading changes.
>
> # Questions and proposed solutions
>
> This thread tries to summarize the current situation.
>
> We need to find a way to proceed, so this thread aims to let the teams
> discuss and find the best fix.
>
> 1. Do we need to keep trying to backport the fixture to oslo.service to
> fix the CI problem (https://review.openstack.org/#/c/617989/) ?
>
> 2. Do we need to find another approach, like mocking
> oslo_service.loopingcall._Event.wait in nova instead of mocking
> oslo_service.loopingcall._ThreadingEvent.wait (example:
> https://review.openstack.org/#/c/616697/2/nova/tests/unit/compute/test_compute_mgr.py)
> ?
> This is a fix only on the nova side; it allows us to update the
> oslo.service requirements and to fix the high CPU usage issue.
> I've submitted a patch (https://review.openstack.org/619246) that
> implements the approach described above.
>
> Personally, I think we need to find another approach, like the mocking
> replacement (cf. 2).
>
> We need to decide which way to go and to discuss other solutions.
>
> --
> Hervé Beraud
> Senior Software Engineer
> Red Hat - Openstack Oslo
> irc: hberaud
--
Hervé Beraud
Senior Software Engineer
Red Hat - Openstack Oslo
irc: hberaud