[all] Eventlet broken again with SSL, this time under Python 3.9
Hi,
Swift proxy fails under Debian Unstable with Python 3.9: http://paste.openstack.org/show/802063/
(that's when doing a simple "openstack container list")
I got the same problem with neutron-rpc-server: http://paste.openstack.org/show/802071/
(this happens when neutron-rpc-server tries to tell Nova that my new VM port is up)
I didn't really look further, but I guess many services are affected. This looks general to OpenStack, feels like yet another problem with eventlet, and looks very similar to this one: https://github.com/eventlet/eventlet/issues/677
even though this one is a Python 3.9 issue, not 3.7.
I'd very much appreciate it if some Eventlet specialist could look into this problem, because it's beyond my skills.
Cheers,
Thomas Goirand (zigo)
On Thu, 2021-01-28 at 21:02 +0100, Thomas Goirand wrote:
Hi,
Swift proxy fails with Python 3.9 under Debian Unstable with Python 3.9: http://paste.openstack.org/show/802063/
(that's when doing a simple "openstack container list")
I got the same problem with neutron-rpc-server: http://paste.openstack.org/show/802071/
(this happens when neutron-rpc-server tries to tell Nova that my new VM port is up)
I'm surprised you got this far. I was still expecting nova-compute to hang while connecting to RabbitMQ, as it did in December. Has the 0.30.0 release of eventlet fixed that?
I didn't really look further, but I guess so many services are affected. This looks like general to OpenStack, and feels like yet-another-problem-with-eventlet, and looks very similar to this one: https://github.com/eventlet/eventlet/issues/677
even though that's here a Python 3.9 issue, not 3.7.
I'd very much appreciate if some Eventlet specialist could look into this problem, because that's beyond my skills.
I don't think we have any on this list, and as far as I know eventlet still doesn't fully support Python 3.9. If you get past this issue, the next thing you will probably hit is the nova VNC proxy failing because websockify double monkey-patches the ssl/websocket modules, or something like that.
Cheers,
Thomas Goirand (zigo)
On 1/29/21 1:19 PM, Sean Mooney wrote:
I'm surprised you got this far. I was still expecting nova-compute to hang while connecting to RabbitMQ, as it did in December. Has the 0.30.0 release of eventlet fixed that?
The Debian package for Eventlet 0.26.1-4 contains these patches:
https://github.com/eventlet/eventlet/commit/46fc185c8f92008c65aef2713fc1445b...
https://github.com/eventlet/eventlet/pull/664
https://github.com/eventlet/eventlet/pull/672
Cheers,
Thomas
On Fri, 2021-01-29 at 13:44 +0100, Thomas Goirand wrote:
The Debian package for Eventlet 0.26.1-4 contains these patches:
https://github.com/eventlet/eventlet/commit/46fc185c8f92008c65aef2713fc1445b...
https://github.com/eventlet/eventlet/pull/664
https://github.com/eventlet/eventlet/pull/672
https://github.com/eventlet/eventlet/pull/664 was not in the version I used in early December, so that is likely what allows nova to not hang. I expect the SSL issue to still impact the VNC proxy, but I might try it again and see. I think all 3 of those are in 0.30.0.
Cheers,
Thomas
On 1/29/21 2:53 PM, Sean Mooney wrote:
https://github.com/eventlet/eventlet/pull/664 was not in the version I used in early December
Indeed. Neither were the other 2.
so that is likely what allows nova to not hang.
Yeah.
I expect the SSL issue to still impact the VNC proxy, but I might try it again and see.
I think all 3 of those are in 0.30.0.
Eventlet being a very touchy dependency of OpenStack, we very much prefer cherry-picking patches whenever possible. However, I tried 0.30.0, and it didn't solve the SSL problem (so I'm back to our patched 0.26.1 Debian release).
Cheers,
Thomas Goirand (zigo)
In the meantime, we see that most of the services fail to interact with RabbitMQ over self-signed SSL when RDO packages are used, even with Python 3.6. We don't see this happening when installing things from pip packages, though. Both the RDO and pip versions of eventlet we used were 0.30.0.
RDO started failing for us several days back with: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
Not sure, maybe it's not directly related to eventlet, but it sounds like it might be.
-- Kind Regards, Dmitriy Rabotyagov
On 1/30/21 10:47 AM, Dmitriy Rabotyagov wrote:
In the meantime, we see that most of the services fail to interact with RabbitMQ over self-signed SSL when RDO packages are used, even with Python 3.6. We don't see this happening when installing things from pip packages, though. Both the RDO and pip versions of eventlet we used were 0.30.0.
RDO started failing for us several days back with: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
Not sure, maybe it's not directly related to eventlet, but it sounds like it might be.
Does RDO have version 5.0.3 of AMQP and version 5.0.2 of Kombu? That's what I had to do in Debian to pass this stage.
Though the next issue is the one I wrote about: a service tries to validate a keystone token (i.e. keystoneauth1 calls requests, which calls urllib3, which in turn calls Python 3.9 SSL) and then crashes with "maximum recursion depth exceeded". I'm not 100% sure the problem is in Eventlet, but it really looks like it, as it's similar to another SSL crash we had in Python 3.7.
Cheers,
Thomas Goirand (zigo)
Yeah, they do:
[root@centos-distro openstack-ansible]# rpm -qa | egrep "amqp|kombu"
python3-kombu-5.0.2-1.el8.noarch
python3-amqp-5.0.3-1.el8.noarch
[root@centos-distro openstack-ansible]#
But I'm not sure about keystoneauth1, since I see this happening in oslo.messaging. The full error in systemd looks like this:
Jan 30 11:51:04 aio1 nova-conductor[97314]: 2021-01-30 11:51:04.543 97314 ERROR oslo.messaging._drivers.impl_rabbit [req-61609624-b577-475d-996e-bc8f9899eae0 - - - - -] Connection failed: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
-- Kind Regards, Dmitriy Rabotyagov
On 1/30/21 1:11 PM, Dmitriy Rabotyagov wrote:
Yeah, they do:
[root@centos-distro openstack-ansible]# rpm -qa | egrep "amqp|kombu"
python3-kombu-5.0.2-1.el8.noarch
python3-amqp-5.0.3-1.el8.noarch
[root@centos-distro openstack-ansible]#
But I'm not sure about keystoneauth1, since I see this happening in oslo.messaging. The full error in systemd looks like this:
Jan 30 11:51:04 aio1 nova-conductor[97314]: 2021-01-30 11:51:04.543 97314 ERROR oslo.messaging._drivers.impl_rabbit [req-61609624-b577-475d-996e-bc8f9899eae0 - - - - -] Connection failed: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
If I'm not mistaken (it's hard to remember, because of the amount of brokenness...), this happens with Eventlet 0.30.0, but not with the patched version of 0.26.1 in Sid/Testing.
Are you using a self-signed certificate for the RabbitMQ cluster, meaning having a root CA set in the configuration file, like I do? (My installer maintains a PKI internal to the deployed clusters.)
Cheers,
Thomas Goirand (zigo)
We updated kombu and amqp on Jan 28th in RDO (https://review.rdoproject.org/r/#/c/31661/), so it may be related to that.
Could you point me to some logs about the failure?
Best regards.
Alfredo
Yes, I can confirm that amqp version 5.0.3 and later does not accept self-signed certificates when the root CA has not been provided. It was recently bumped to 5.0.5 in upper-constraints (u-c), which made things fail for us everywhere.
However, once the root CA is added to the system, things continue working properly.
-- Kind Regards, Dmitriy Rabotyagov
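The behaviour described above (verification fails until the root CA is known to the client) can be illustrated with the standard library alone. This is only a minimal sketch and not the code path used by amqp or oslo.messaging: the function name `build_tls_context` and the `ca_file` parameter are hypothetical, and the path you would pass is whatever PEM file holds your private root CA.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a client-side TLS context that verifies the server certificate.

    ca_file is a hypothetical path to the PEM-encoded root CA of a private
    PKI. When it is None, only the CAs trusted by the system are used, so a
    server certificate signed by a private CA fails verification.
    """
    # create_default_context() turns on certificate and hostname checks
    # and loads the system trust store.
    context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH)
    if ca_file is not None:
        # Trust the private root CA in addition to the system trust store.
        context.load_verify_locations(cafile=ca_file)
    return context


if __name__ == "__main__":
    ctx = build_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

With a context like this, a handshake against a broker whose certificate chains to an unknown CA is expected to raise `ssl.SSLCertVerificationError`, which is reported as CERTIFICATE_VERIFY_FAILED.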
On 2/1/21 1:21 PM, Dmitriy Rabotyagov wrote:
Yes, I can confirm that amqp of version 5.0.3 and later does not accept self-signed certificates in case root ca has not been provided. It has been bumped to 5.0.5 in u-c recently which made things fail for us everywhere now.
However, in case of adding root CA into the system things continue working properly.
I'm making a bit of progress over here.
I found out that downgrading to python3-dnspython 1.16.0 made swift-proxy (and probably others) work again.
What version of dnspython are you using in RDO?
Cheers,
Thomas Goirand (zigo)
On Tue, Feb 2, 2021 at 12:36 PM Thomas Goirand <zigo@debian.org> wrote:
I'm making a bit of progress over here.
I found out that downgrading to python3-dnspython 1.16.0 made swift-proxy (and probably others) back to working.
What version of dnspython are you using in RDO?
The current build bundles 1.16.0.
Regards,
Alfredo
Cheers,
Thomas Goirand (zigo)
On 2021-02-02 12:32:29 +0100 (+0100), Thomas Goirand wrote: [...]
I found out that downgrading to python3-dnspython 1.16.0 made swift-proxy (and probably others) back to working. [...]
If memory serves, dnspython and eventlet both monkey-patch the stdlib in potentially conflicting ways, and we've seen them interact badly in the past.
-- Jeremy Stanley
On Tue, 2021-02-02 at 15:57 +0000, Jeremy Stanley wrote:
On 2021-02-02 12:32:29 +0100 (+0100), Thomas Goirand wrote: [...]
I found out that downgrading to python3-dnspython 1.16.0 made swift-proxy (and probably others) back to working. [...]
If memory serves, dnspython and eventlet both monkey-patch the stdlib in potentially conflicting ways, and we've seen them interact badly in the past.
Upstream eventlet forces 1.16.0 to be used via their requirements files, in response to us filing an upstream bug after 2.0.0 was released, so it's known that you can't use dnspython 2.0.0 with eventlet currently.
Part of the issue, however, is that this was not communicated to distros well. Fedora, for example, ships eventlet and dnspython 2.0.0 in F33, so they are technically incompatible, but since they don't have the upper limit in the eventlet spec file they were not aware of that. Eventlet has fixed some of the incompatibilities in the last few months, but not all of them.
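A packager who wants to notice this combination early could check the installed versions at build or start-up time. The sketch below is an assumption-laden illustration, not something eventlet or any distro ships: the helper names are hypothetical, and the threshold (major version 2) simply mirrors the incompatibility described in this thread as of early 2021.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.0'."""
    return int(version.split(".", 1)[0])


def dnspython_too_new(dns_version: str, first_bad_major: int = 2) -> bool:
    """True if this dnspython version falls in the range reported as broken."""
    return major_version(dns_version) >= first_bad_major


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    dns_version = installed_version("dnspython")
    eventlet_version = installed_version("eventlet")
    if dns_version and eventlet_version and dnspython_too_new(dns_version):
        print(f"warning: eventlet {eventlet_version} with dnspython {dns_version}")
```

The check only inspects package metadata, so it does not import eventlet or dnspython and cannot itself trigger the monkey-patching problem.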
On 2/2/21 5:35 PM, Sean Mooney wrote:
Upstream eventlet forces 1.16.0 to be used via their requirements files, in response to us filing an upstream bug after 2.0.0 was released, so it's known that you can't use dnspython 2.0.0 with eventlet currently.
Setting such an upper bound is just a time bomb, which should be defused as fast as possible.
Eventlet has fixed some of the incompatibilities in the last few months, but not all of them.
I wonder where / how dnspython is doing the monkey patching of the SSL library. Is everything located in query.py?
Cheers,
Thomas Goirand (zigo)
On Tue, 2021-02-02 at 19:52 +0100, Thomas Goirand wrote:
On 2/2/21 5:35 PM, Sean Mooney wrote:
Upstream eventlet forces 1.16.0 to be used via their requirements files, in response to us filing an upstream bug after 2.0.0 was released, so it's known that you can't use dnspython 2.0.0 with eventlet currently.
Setting such an upper bound is just a time bomb, which should be defused as fast as possible.
Yes, it was done because dnspython broke backwards compatibility and revved a new major version. The eventlet maintainers were not aware of it, and it was capped to give them time to fix it. They have merged some patches to make it work, but I think some of it needs to be fixed in dnspython too.
This is the bug: https://github.com/eventlet/eventlet/issues/619
The pin was put in place in August: https://github.com/eventlet/eventlet/issues/619#issuecomment-681480014
But the fix submitted to eventlet, https://github.com/eventlet/eventlet/pull/639, didn't actually fully fix it: https://github.com/eventlet/eventlet/issues/619#issuecomment-689903897
https://github.com/rthalley/dnspython/issues/559 was the dnspython bug, but that seems to be closed. https://github.com/rthalley/dnspython/issues/557 and https://github.com/rthalley/dnspython/issues/558 are also closed, but it's still not actually fixed.
I wonder where / how dnspython is doing the monkey patching of the SSL library. Is everything located in query.py ?
Cheers,
Thomas Goirand (zigo)
On 2/2/21 4:57 PM, Jeremy Stanley wrote:
On 2021-02-02 12:32:29 +0100 (+0100), Thomas Goirand wrote: [...]
I found out that downgrading to python3-dnspython 1.16.0 made swift-proxy (and probably others) back to working. [...]
If memory serves, dnspython and eventlet both monkey-patch the stdlib in potentially conflicting ways, and we've seen them interact badly in the past.
According to the upstream dnspython author, no, dnspython does not monkey-patch the SSL std lib. However, Eventlet monkey-patches dnspython, in a way which is incompatible with version 2.0.0. See Bob's comment on the Eventlet issue: https://github.com/eventlet/eventlet/issues/619#issuecomment-660250478
Cheers,
Thomas Goirand
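To make the failure mode concrete, here is a toy model of a patcher that was written against one release of a library and breaks when a later release renames an internal helper. Everything in it is hypothetical (`make_library`, `apply_green_patch`, `_connect`, `open_connection`); it does not reproduce eventlet's or dnspython's real code, only the general shape of the problem.

```python
import types


def make_library(version: str) -> types.SimpleNamespace:
    """Build a toy 'library'; the 2.x release renames its internal helper."""
    lib = types.SimpleNamespace(version=version)
    if version.startswith("1."):
        lib._connect = lambda host: f"blocking:{host}"
    else:
        lib.open_connection = lambda host: f"blocking:{host}"
    return lib


def apply_green_patch(lib: types.SimpleNamespace) -> None:
    """Swap the helper for a cooperative variant, as a patcher written
    against 1.x would do."""
    if not hasattr(lib, "_connect"):
        # The attribute the patch relies on no longer exists.
        raise RuntimeError(f"cannot patch version {lib.version}: _connect is gone")
    lib._connect = lambda host: f"green:{host}"


if __name__ == "__main__":
    old = make_library("1.16.0")
    apply_green_patch(old)
    print(old._connect("rabbit"))
```

A real patcher that skips the `hasattr` check may fail less loudly, for example by patching nothing and leaving blocking code in place, which is harder to diagnose than an explicit error.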
participants (5)
- Alfredo Moralejo Alonso
- Dmitriy Rabotyagov
- Jeremy Stanley
- Sean Mooney
- Thomas Goirand