[oslo][stable] Backport of the default value of the config option change

Takashi Kajinami tkajinam at redhat.com
Tue Aug 9 08:31:34 UTC 2022


I tend to approve the backport as an exception, based on the following
points.

- The old default has been used for a long time and has proven to be stable.
  The backport changes behavior, but it does not require any change in other
  components (no change is required in rabbitmq, for example).

- The reason we made that switch was just to get rid of the "noisy" heartbeat
  warning, which did not affect actual functionality, and I don't expect any
  risk from restoring these warning logs.

- On the other hand, we've learned that the current default breaks non-wsgi
  services, which has a huge impact given our current architecture. The issue
  has already been confirmed by multiple organizations. What is worse,
  debugging the issue is quite difficult.

- Earlier it was suggested that users should configure the parameter
  according to the process architecture, but it's not easy to determine the
  proper setup unless you have a basic understanding of the OpenStack
  architecture. Also, not all deployment tools support setting options per
  service (neither Puppet OpenStack nor TripleO supports it now). Using the
  "safer" default would be much more beneficial for users/operators.

In the meantime I'll also look into ways to override the option in the
deployment tools, with my Puppet OpenStack core and TripleO core hat on,
but IMHO backporting the change is justified.
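
To make the per-service override idea concrete, here is a minimal sketch of
what a deployment tool would have to render for each service. It assumes the
option lives in the [oslo_messaging_rabbit] section; the function name
render_override and the runs_under_wsgi flag are hypothetical, chosen only for
illustration, and this is not what Puppet OpenStack or TripleO do today.

```python
import configparser
import io


def render_override(runs_under_wsgi: bool) -> str:
    """Render a config snippet that sets heartbeat_in_pthread for one service.

    runs_under_wsgi is a hypothetical flag the deployment tool would have to
    know per service: True for services run under mod_wsgi/uWSGI, False for
    non-wsgi services such as nova-compute or the neutron agents.
    """
    parser = configparser.ConfigParser()
    parser["oslo_messaging_rabbit"] = {
        # Only wsgi-hosted services keep the pthread-based heartbeat.
        "heartbeat_in_pthread": str(runs_under_wsgi).lower(),
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Snippet for a non-wsgi service (e.g. an agent).
    print(render_override(runs_under_wsgi=False))
```

The point of the sketch is that the tool needs to know, per service, whether
it runs under a wsgi server, which is exactly the knowledge many operators
and tools lack today.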

On Tue, Aug 9, 2022 at 1:30 AM Sean Mooney <smooney at redhat.com> wrote:

> On Mon, Aug 8, 2022 at 4:37 PM Dmitriy Rabotyagov
> <noonedeadpunk at gmail.com> wrote:
> >
> > Hey
> >
> > At the very least in OpenStack-Ansible we already handle that case,
> > and have overridden heartbeat_in_pthread for non-uWSGI services,
> > which is already in stable branches. So backporting this new default
> > setting would make us revert this patch and apply a set of new ones
> > for uWSGI, which is kind of a nasty thing to do on stable branches.
> >
> > IIRC (can be wrong here), kolla-ansible and TripleO also adopted such
> > changes in their codebase.
> TripleO is specifically broken by the current default in wallaby.
> Slawek raised this question of backporting partly because we are
> trying to decide if we need to backport this downstream only for our
> osp product or modify tripleo/puppet to override this.
>
> we would strongly prefer not to ship a different default in our
> product than upstream if we can avoid it,
> but we likely cannot release with the current default without either
> changing this downstream or upstream in ooo.
>
> >  So with quite high probability, if you use
> > any deployment tooling, this should be already handled relatively
> > well.
> >
> > We also can post a release note to stable branches about "known issue"
> > instead of backporting a new default.
> >
> > On Mon, 8 Aug 2022 at 12:46, Radosław Piliszek
> > <radoslaw.piliszek at gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > Could this config option support "auto" by default and autodetect
> > > whether the application is running under mod_wsgi (and uwsgi if it
> > > also has the issue with green threads, but here I'm not really sure...)
> > > and then decide on the best option?
> > > This way I would consider the backport a fix (i.e. the library
> > > tries harder to work in the target environment).
> > >
> > > As a final thought, bear in mind there are operators who have already
> > > overridden the default, and the deployment projects can help as well.
> > >
> > > -yoctozepto
> > >
> > > On Mon, 8 Aug 2022 at 10:30, Rodolfo Alonso Hernandez
> > > <ralonsoh at redhat.com> wrote:
> > > >
> > > > Hello all:
> > > >
> > > > > I understand that by default we don't allow backporting a change to
> > > > > a config knob's default value. But I'm with Sean and his explanation.
> > > > > For "uwsgi" applications, if pthread is False, the only drawback will
> > > > > be the reconnection of the MQ socket. But in the case described by
> > > > > Slawek, the problem is more serious because once the agent has been
> > > > > disconnected from the MQ for a long time, it cannot reconnect and
> > > > > needs to be manually restarted. I would backport the patch setting
> > > > > this config knob to False.
> > > >
> > > > Regards.
> > > >
> > > >
> > > > > On Sat, Aug 6, 2022 at 12:08 AM Sean Mooney <smooney at redhat.com> wrote:
> > > > >>
> > > > >> On Fri, Aug 5, 2022 at 7:40 PM Ghanshyam Mann <gmann at ghanshyammann.com> wrote:
> > > > >> >
> > > > >> >  ---- On Fri, 05 Aug 2022 17:54:25 +0530  Slawek Kaplonski  wrote ---
> > > >> >  > Hi,
> > > >> >  >
> > > > >> >  > Some time ago oslo.messaging changed the default value of the
> > > > >> >  > "heartbeat_in_pthread" config option to "True" [1].
> > > > >> >  > As was noticed some time ago, this doesn't work well with
> > > > >> >  > nova-compute - see bug [2] for details.
> > > > >> >  > Recently we noticed in our downstream Red Hat OpenStack that
> > > > >> >  > it's not only nova-compute which doesn't work well with it and
> > > > >> >  > can hang. We saw the same issue in various neutron agent
> > > > >> >  > processes. And it seems that it can be the same for any non-wsgi
> > > > >> >  > service which uses rabbitmq to send heartbeats.
> > > > >> >  > So given all of that, I just proposed changing the default
> > > > >> >  > value of that config option to "False" again [3].
> > > > >> >  > And my question is - would it be possible and acceptable to
> > > > >> >  > backport such a change up to stable/wallaby (if and when it is
> > > > >> >  > approved for master, of course)? IMO this could be useful for
> > > > >> >  > users, as having this option set to "True" by default doesn't
> > > > >> >  > make any sense for non-wsgi applications and may do more harm
> > > > >> >  > than good. What are your opinions about it?
> > > >> >
> > > > >> > This is tricky. In general a default value change should not be
> > > > >> > backported because it changes the default behavior and so affects
> > > > >> > compatibility. But along with considering the cases that do not
> > > > >> > work with the current default value (which you mentioned in this
> > > > >> > email), we should consider whether it worked in any other case.
> > > > >> > If so, then I think we should not backport this and should tell
> > > > >> > operators to override it to False as a workaround on stable branches.
> > > > >> as far as i am aware the only impact of setting the default to
> > > > >> false for wsgi applications is that services
> > > > >> running under mod_wsgi or uwsgi may have the heartbeat greenthread
> > > > >> killed when the wsgi server suspends the application
> > > > >> after a timeout following the processing of an api request.
> > > >>
> > > > >> there is no known negative impact from this other than a log message
> > > > >> that can safely be ignored on both rabbitmq and the api log relating
> > > > >> to the amqp messaging connection being closed and reopened.
> > > >>
> > > > >> keeping the value at true can cause the nova compute agent, neutron
> > > > >> agent and i suspect nova conductor/scheduler to hang following a
> > > > >> rabbitmq disconnect.
> > > > >> that can leave the relevant service unresponsive until it's restarted.
> > > >>
> > > > >> so having the default set to true is known to break several
> > > > >> services but there are no known issues caused by setting it to false
> > > > >> that impact the operation of any service.
> > > >>
> > > > >> so i have a strong preference for setting this to false by default on
> > > > >> stable branches.
> > > >> >
> > > >> > -gmann
> > > >> >
> > > >> >  >
> > > >> >  > [1]
> https://review.opendev.org/c/openstack/oslo.messaging/+/747395
> > > >> >  > [2] https://bugs.launchpad.net/oslo.messaging/+bug/1934937
> > > >> >  > [3]
> https://review.opendev.org/c/openstack/oslo.messaging/+/852251/
> > > >> >  >
> > > >> >  > --
> > > >> >  > Slawek Kaplonski
> > > >> >  > Principal Software Engineer
> > > >> >  > Red Hat
> > > >> >
> > > >>
> > > >>
> > >
> >
>
>
>