Good morning, I'm testing OpenStack (DevStack). I downloaded an Ubuntu qcow2 image and created an instance from it, but when I access the console I find the message "Booting from Hard Disk" and nothing happens after that.
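A common first thing to check is whether the image was registered in Glance with the right disk-format: a qcow2 file uploaded as raw (or vice versa) often produces exactly this hang at the BIOS "Booting from Hard Disk" message. A minimal openstacksdk sketch to verify (hedged: the cloud name "devstack" and image name "ubuntu" are hypothetical placeholders):

```python
import openstack

# "devstack" is a hypothetical cloud name from clouds.yaml.
conn = openstack.connect(cloud="devstack")

# "ubuntu" is a hypothetical image name; use the name you uploaded.
image = conn.image.find_image("ubuntu")
if image is None:
    raise SystemExit("image not found in Glance")

# A qcow2 file registered with disk_format=raw (or vice versa) commonly
# leaves the guest stuck at "Booting from Hard Disk".
print(image.disk_format, image.container_format)
```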
On Sat, Oct 5, 2024 at 12:50, <openstack-discuss-request@lists.openstack.org> wrote:
Send openstack-discuss mailing list submissions to openstack-discuss@lists.openstack.org
To subscribe or unsubscribe via email, send a message with subject or body 'help' to openstack-discuss-request@lists.openstack.org
You can reach the person managing the list at openstack-discuss-owner@lists.openstack.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of openstack-discuss digest..."
Today's Topics:
1. Re: [all][qa][infra] Restore unit tests for intermediate python versions (Clark Boylan)
2. Re: Stop bumping major versions when dropping EOL Python versions (Előd Illés)
3. Re: [oslo.messaging] Heartbeat in pthread (Herve Beraud)
----------------------------------------------------------------------
Message: 1
Date: Fri, 04 Oct 2024 11:22:17 -0700
From: "Clark Boylan" <cboylan@sapwetik.org>
Subject: Re: [all][qa][infra] Restore unit tests for intermediate python versions
To: "Takashi Kajinami" <kajinamit@oss.nttdata.com>, openstack-discuss@lists.openstack.org
Message-ID: <99b5ae0b-2d69-4040-9d3b-8b28491969a6@app.fastmail.com>
Content-Type: text/plain
On 10/5/24 01:47, Clark Boylan wrote:
On Fri, Oct 4, 2024, at 9:30 AM, Takashi Kajinami wrote:
Hello,
As you may know, current master runs unit tests with only Python 3.9 and Python 3.12. My understanding of this setup is that we test the minimum and maximum boundaries and assume the change works with all the versions in between.
However, in some of my recent work I had to introduce a switch between these two versions. For example, https://review.opendev.org/c/openstack/neutron/+/931271 introduces logic to distinguish Python < 3.10 from Python >= 3.10, and we don't specifically test that boundary. I'm afraid this may cause uncaught problems.
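For illustration, a switch of this kind looks roughly like the sketch below (a hedged example, not the actual neutron patch; `load_plugins` is a hypothetical helper). The 3.10 boundary matters because `importlib.metadata.entry_points()` only grew the `group=` selection keyword in Python 3.10:

```python
import sys

if sys.version_info >= (3, 10):
    from importlib.metadata import entry_points

    def load_plugins(group):
        # Python >= 3.10: entry_points() accepts the group= keyword.
        return [ep.load() for ep in entry_points(group=group)]
else:
    from importlib.metadata import entry_points

    def load_plugins(group):
        # Python < 3.10: entry_points() returns a dict keyed by group.
        return [ep.load() for ep in entry_points().get(group, [])]
```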
One approach that wasn't discussed in the change is to continue to use the existing code (pkg_resources) until python3.12, then only change the behavior for 3.12. That approach would generally work if we are only testing the boundaries, since we're flipping the behavior on the boundary.
On Fri, Oct 4, 2024, at 10:26 AM, Takashi Kajinami wrote:
For this case we would need a python 3.11 job to test the boundary. Also, this leaves noisy deprecation warnings in < Python 3.11.
The lower bound (currently 3.9) would continue to test the old behavior (pkg_resources) and the upper bound (python3.12) would test the new behavior (importlib). It would be covered as well as all of the other upper and lower bound testing we've done. Python3.10 and 3.11 would be "covered" by the 3.9 test ensuring that pkg_resources continues to work. When 3.9 is dropped 3.10 would take over this responsibility. Eventually we'll only have the importlib behavior and python3.12/3.13/etc would cover it.
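To make the coverage argument concrete, here is a hypothetical unit test (the module and names are invented for illustration) that exercises whichever branch the running interpreter selects; running it under the 3.9 and 3.12 jobs covers both code paths, while 3.10/3.11 would only re-exercise one of them:

```python
import unittest

# Hypothetical import; see the load_plugins sketch earlier in the thread.
from myproject.plugins import load_plugins


class TestPluginLoading(unittest.TestCase):
    def test_unknown_group_returns_empty_list(self):
        # Both the < 3.10 and >= 3.10 branches should agree on this
        # behavior; the 3.9 and 3.12 jobs each exercise one branch.
        self.assertEqual(load_plugins("myproject.no_such_group"), [])


if __name__ == "__main__":
    unittest.main()
```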
So my question here is ... Can we test all supported python versions, not only min/max, at least in check jobs? This would help us catch any regressions caused by such implementations.
We can. The main limitation is that we may not have a test platform for every version of python, depending on how the distro releases align with python releases. Currently I think we have a platform for every release though (centos/rocky 3.9, jammy 3.10, bookworm 3.11, noble 3.12). If we end up without a platform for a version of python, jobs may need to be updated to install python from source or some other method.
acknowledged
I know there could be tradeoffs between test coverage and infra resources, but IMO this is a coverage gap we should fill.
Yes, the main concern was simply to avoid unnecessary effort. In the case of unittest jobs, the vast majority of those are much cheaper in terms of resources than tempest jobs. I think if we stuck to this primarily for unit tests, the impact would be minimal. If we tried to expand the tempest matrix, that would become unwieldy.
Yes. I don't intend to run all tests on all versions, but at least having unit test jobs, which are supposed to be cheap, would avoid zero test coverage and would be enough for now.
If we find any real problems, we can then expand the scope to other jobs.
Thank you,
--
Takashi Kajinami
irc: tkajinam
github: https://github.com/kajinamit
launchpad: https://launchpad.net/~kajinamit
------------------------------
Message: 2
Date: Fri, 4 Oct 2024 20:05:02 +0000
From: Előd Illés <elod.illes@est.tech>
Subject: Re: Stop bumping major versions when dropping EOL Python versions
To: Jeremy Stanley <fungi@yuggoth.org>, "openstack-discuss@lists.openstack.org" <openstack-discuss@lists.openstack.org>
Message-ID: <VI1P18901MB07518D7DBD7CA7D11AF91E18FF722@VI1P18901MB0751.EURP189.PROD.OUTLOOK.COM>
Content-Type: text/plain; charset="utf-8"
Indeed, we started bumping the MAJOR version due to dropping py27 support. The rule, though, is that a change in a required package's version only warrants a MINOR version bump. But Python is not just a simple dependency, and dropping support for a Python version is more than just bumping any Python package's version. Anyway, if the community wants to stop bumping major versions when dropping Python version support, then I'm probably OK with that.
Cheers, Előd
________________________________________
From: Jeremy Stanley
Sent: Monday, September 23, 2024 15:22
To: openstack-discuss@lists.openstack.org
Subject: Re: Stop bumping major versions when dropping EOL Python versions
On 2024-09-23 12:25:09 +0100 (+0100), Stephen Finucane wrote: [...]
I'd be in favour of getting rid of this particular constraint, and instead focusing on API changes when deciding whether to bump the major version. Is this a reasonable suggestion, and if so what would we need to do to tweak our policy to allow it?
It makes sense to me. Dropping Python 2.x support was a more significant event, but these days removing support for minor Python versions after the corresponding CPython interpreter reaches EOL upstream is fairly unremarkable. We don't perform major version increments just for raising the minimum required versions of any other dependencies, after all.
--
Jeremy Stanley
------------------------------
Message: 3
Date: Sat, 5 Oct 2024 12:50:04 +0200
From: Herve Beraud <hberaud@redhat.com>
Subject: Re: [oslo.messaging] Heartbeat in pthread
To: Michel Jouvin <michel.jouvin@ijclab.in2p3.fr>
Cc: openstack-discuss@lists.openstack.org
Message-ID: <CAFDq9gXbkoiRj18zTuqbWAzhg5yhf00JPn6kXh0R83i2CdpyRw@mail.gmail.com>
Content-Type: multipart/alternative; boundary="0000000000008792280623b88de9"
Here is the updated version with the **retry backoff** solution and the additional links:
Hey folks,
The major concern behind this thread is that RabbitMQ connections drop due to the absence of a reliable heartbeat. While `heartbeat_in_pthread=True` aimed to fix this, it introduced other bugs.
Indeed, the Greenlet documentation is pretty clear: the limitations of mixing Python threads and greenlets lead to issues. As eventlet is itself based on greenlet, this leads to recurring issues in our stacks. The heartbeat_in_pthread bugs are living examples of this kind of issue.
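To make the interaction concrete, here is a minimal sketch of the mechanism (assuming eventlet is installed; this mirrors the general pattern, not oslo.messaging's actual code) of what running a heartbeat in a native thread inside a monkey-patched service involves:

```python
import eventlet
eventlet.monkey_patch()  # typical first line of an eventlet-based service

import threading  # after monkey_patch this is the *green* threading module
print(threading.current_thread())  # a greenlet posing as a thread

# What heartbeat_in_pthread=True effectively relies on: fetching the
# unpatched module so the heartbeat runs in a real OS thread that keeps
# ticking even when greenlets block the event hub.
real_threading = eventlet.patcher.original("threading")

def heartbeat():
    # A real pthread -- but it now shares connection state with green
    # code, which is the mixing the greenlet docs warn about and the
    # source of the bugs discussed in this thread.
    pass

t = real_threading.Thread(target=heartbeat, daemon=True)
t.start()
t.join()
```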
For this reason, we support keeping `heartbeat_in_pthread` disabled by default.
As a workaround, adjusting the RabbitMQ `heartbeat_timeout` and `rabbit_heartbeat_timeout_threshold` can mitigate connection drops.
Additionally, oslo.messaging offers the `connection_retry_interval` and `connection_retry_backoff` parameters, which implement retry backoff strategies to better handle connection drops. This ensures that the system can manage reconnections more efficiently.
We encourage investigating these paths to mitigate the connection problems.
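As a concrete sketch of this tuning (hedged: written against the upstream `[oslo_messaging_rabbit]` option names `heartbeat_timeout_threshold`, `rabbit_retry_interval`, and `rabbit_retry_backoff`; the names mentioned above may differ by release, so check your release's configuration reference):

```python
from oslo_config import cfg
import oslo_messaging

conf = cfg.ConfigOpts()

# Creating the transport registers the [oslo_messaging_rabbit] options;
# the URL here is a placeholder for a real RabbitMQ endpoint.
transport = oslo_messaging.get_rpc_transport(
    conf, url="rabbit://guest:guest@127.0.0.1:5672/")

# Widen the heartbeat window so transient stalls don't drop connections.
conf.set_override("heartbeat_timeout_threshold", 120,
                  group="oslo_messaging_rabbit")

# Back off between reconnection attempts instead of hammering RabbitMQ.
conf.set_override("rabbit_retry_interval", 2,
                  group="oslo_messaging_rabbit")
conf.set_override("rabbit_retry_backoff", 4,
                  group="oslo_messaging_rabbit")
```

In a deployed service the same values would normally live in the service's configuration file under `[oslo_messaging_rabbit]` rather than being set in code.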
For more details, please read:
- https://greenlet.readthedocs.io/en/latest/python_threads.html
- https://docs.openstack.org/oslo.messaging/xena/configuration/opts.html#oslo_...
- https://www.rabbitmq.com/docs/heartbeats
- https://docs.openstack.org/oslo.messaging/xena/configuration/opts.html#oslo_...
- https://docs.openstack.org/oslo.messaging/xena/configuration/opts.html#oslo_...
On Thu, Oct 3, 2024 at 22:46, Michel Jouvin <michel.jouvin@ijclab.in2p3.fr> wrote:
Hi Sean,
Not sure why we misunderstood each other, but we agree! I understood your sentence as "people should avoid changing this option from its default (False) to True.", but I understand now that you meant the opposite, and I totally agree based on our experience. Heat seems to be another service that will be in trouble if it is changed.
Michel
On Wed, 2024-10-02 at 13:06 +0200, Michel Jouvin wrote:
Hi Sean,
As for the situation in our cloud after reverting to heartbeat_in_pthread=false, it was a "black and white" situation: creating a cluster was impossible (because of the Heat issue mentioned) after we changed to heartbeat_in_pthread=true (though we didn't realize it immediately, as we don't create clusters every day), and everything started working properly again immediately after reverting to heartbeat_in_pthread=false. There is a clear link between this parameter and Heat's behaviour (Caracal version in our case, so oslo client 24.0).
As for your last sentence, "people should avoid changing this option from its default of False.", I think/hope you wanted to say the opposite: "people should avoid changing this option from its default to True."...
On 02/10/2024 at 14:17, smooney@redhat.com wrote:
No, at least for nova we default to false in our downstream product:
https://github.com/openstack-k8s-operators/nova-operator/blob/main/templates...
We had significant CI issues when we had it set to true originally, because of the oslo.log issue, but we did not revert to enabling this for nova-api after that was backported, because we have not seen any side effects from setting it to false.
We have never set this to true in OSP, to my knowledge, for nova-compute.
puppet-nova considered it experimental
https://opendev.org/openstack/puppet-nova/src/commit/17bd61e042591305e461e5c...
In tripleo we disabled it in many services, including heat and nova:
https://github.com/openstack-archive/tripleo-heat-templates/commit/cf4d4f881...
In kolla it also defaults to false for nova-compute and other eventlet services:
https://github.com/openstack/kolla-ansible/blob/2218b7852fda94d0f498d5140f71...
although it is enabled for nova-api and some heat components:
https://github.com/openstack/kolla-ansible/blob/2218b7852fda94d0f498d5140f71...
At least for nova, I would not recommend using heartbeat_in_pthread = True for any service. nova-api running under uwsgi or mod_wsgi is the only possible exception, and even then I would discourage it.
I can't really speak to other services, but I think `heartbeat_in_pthread = false` is generally the correct default.
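For operators who want to confirm what a service is actually running with, a minimal sketch (hedged: assumes oslo.messaging is installed, the config file path is a hypothetical example, and the file defines transport_url; `heartbeat_in_pthread` is registered under `[oslo_messaging_rabbit]`):

```python
from oslo_config import cfg
import oslo_messaging

conf = cfg.ConfigOpts()
# Hypothetical path; point at the service's real config file.
conf(["--config-file", "/etc/nova/nova.conf"])

# Creating the transport registers the rabbit driver's options so the
# group below becomes readable.
oslo_messaging.get_rpc_transport(conf)

print(conf.oslo_messaging_rabbit.heartbeat_in_pthread)
```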
Michel
On 02/10/2024 at 11:44, smooney@redhat.com wrote:
On Tue, 2024-10-01 at 22:32 +0200, Michel Jouvin wrote:
> Hi,
>
> I am not an expert in these matters, but we recently suffered the problem
> of client disconnection in RabbitMQ due to the heartbeat timeout, and I
> confirm it was a disaster for cloud usage, with many things not
> working properly (we are running Antelope, except Barbican/Heat/Magnum,
> where we run Caracal). The reason is still not clear to me; it was
> fixed by increasing the heartbeat timeout, but at the same time, my
> colleague who implemented the change also defined
> heartbeat_in_pthread=true for all the services, something normally
> unnecessary as we configure uwsgi or Apache to use only one thread (and
> several processes). Initially we didn't see any bad impact of this
> setting, but a few days ago users started to report that Magnum cluster
> creation was failing due to a "response timeout" in Heat during the master
> software deployment.
>
> Reading this thread this morning, I had the idea it could be the source
> of the problem (as the service was running properly a couple of weeks
> ago, before the change). We reverted the change, defined
> heartbeat_in_pthread=false, and it restored the normal behaviour of Heat.
> We have not seen a negative impact on other services so far. So I
> confirm that setting this parameter to false by default seems a good
> idea and that setting it to true can break some services like Heat.

Thank you for the data point. I'm sure you will monitor the situation in your cloud, but please let us know in a week or two if the Heat/Magnum issues you observed return or if the cloud continues to function normally. I expect it to, but again, it would be a good data point.

> Cheers,
>
> Michel
>
> On 01/10/2024 at 16:31, Arnaud Morin wrote:
>> Hey,
>>
>> I totally agree that heartbeat_in_pthread and the
>> oslo.log PipeMutex are technical debt that we need to get rid of,
>> as well as eventlet.
>>
>> However, despite the fact that it seems purely cosmetic on your side,
>> we believe it's not.
>> I can't prove / reproduce the issue on a small infra, but definitely,
>> at large scale, having those TCP connections dropped by RabbitMQ
>> and recreated in a loop by agents is affecting the cluster.
>>
>> I know all the pain that these settings introduced in the past, but now
>> I feel we are in a stable situation regarding this, which is why I am
>> surprised about deprecating heartbeat_in_pthread now.

Deprecating a config option requires the deprecation to be advertised in a SLURP release before it can then be removed in a following release. Given the deprecation was done in Dalmatian (2024.2), which is not a SLURP release, the removal cannot take effect in 2025.1; 2025.2 is the earliest release we could remove this option.
As a result, I think maintaining the deprecation is correct here. We may decide not to remove this until 2026.1 or later, but I think it's correct to send the message that people should avoid changing this option from its default of False. We could even tag this option as advanced to make that more clear:
https://docs.openstack.org/oslo.config/latest/reference/defining.html#advanc...
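For reference, tagging an option as advanced is a one-line change where the option is defined; a minimal sketch (hedged: the help text is invented, and the real definition lives in oslo.messaging's rabbit driver):

```python
from oslo_config import cfg

# A sketch of an option flagged "advanced": sample-config generators
# then list it separately, signalling that most deployments should
# leave it at its default.
opt = cfg.BoolOpt(
    "heartbeat_in_pthread",
    default=False,
    advanced=True,
    help="Run the RabbitMQ heartbeat in a native thread instead of a "
         "greenlet (invented help text for illustration).",
)
```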
>> Can we, at least, make sure we keep all of this until we switch off
>> eventlet? In other words, can we get rid of eventlet, then remove these
>> params, and not the opposite?
>>
>> Regards,
>>
>> Arnaud
--
Hervé Beraud
Senior Software Engineer at Red Hat
irc: hberaud
https://github.com/4383/