[Openstack-operators] [openstack-dev] [Octavia] [Kolla] SSL errors polling amphorae and missing tenant network interface
Michael Johnson
johnsomor at gmail.com
Fri Oct 26 00:34:02 UTC 2018
FYI, I took some time out this afternoon and wrote a detailed
certificate configuration guide. Hopefully this will help.
https://review.openstack.org/613454
Reviews would be welcome!
Michael
On Thu, Oct 25, 2018 at 7:00 AM Tobias Urdin <tobias.urdin at binero.se> wrote:
>
> Might as well throw it out here.
>
> After a lot of troubleshooting we were able to narrow our issue down to
> our test environment running qemu virtualization. We moved our compute
> node to hardware and used kvm full virtualization instead.
>
> We could reliably reproduce the issue: generating a CSR from a private
> key and then trying to verify that CSR would fail, complaining about
> "Signature did not match the certificate request".
>
> We suspect qemu floating point emulation caused this. The same OpenSSL
> function that validates a CSR is the one used when validating the SSL
> handshake, which is what caused our issue.
> After going through the whole stack, we now have Octavia working
> flawlessly.
>
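To illustrate why a miscomputed signature shows up as a padding error rather than a plain "wrong signature", here is a toy sketch in Python. It is not OpenSSL's code: the key is built from two Mersenne primes purely for convenience (insecure), the ASN.1 DigestInfo prefix used in real PKCS#1 v1.5 signatures is left out, and the function names are made up for this example. The verifier raises the signature to the public exponent and expects the result to start with 00 01; any arithmetic fault scrambles that block.

```python
import hashlib

# Toy key from two Mersenne primes: easy to write down, NOT secure.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)
K = (N.bit_length() + 7) // 8      # modulus length in bytes


def pad_type_1(payload: bytes) -> bytes:
    """Build a type 1 block: 00 01 FF..FF 00 || payload (DigestInfo omitted)."""
    fill = K - 3 - len(payload)
    if fill < 8:
        raise ValueError("payload too long for this modulus")
    return b"\x00\x01" + b"\xff" * fill + b"\x00" + payload


def sign(message: bytes) -> int:
    """Sign the SHA-256 digest of the message with the private exponent."""
    digest = hashlib.sha256(message).digest()
    return pow(int.from_bytes(pad_type_1(digest), "big"), D, N)


def verify(message: bytes, signature: int):
    """Return (ok, reason); the check order loosely follows the error stack."""
    block = pow(signature, E, N).to_bytes(K, "big")
    if block[:2] != b"\x00\x01":
        return False, "block type is not 01"
    if block != pad_type_1(hashlib.sha256(message).digest()):
        return False, "padding check failed"
    return True, "ok"


if __name__ == "__main__":
    sig = sign(b"example CSR body")
    print(verify(b"example CSR body", sig))
    # Flip one bit to stand in for a faulty computation on the signing side.
    print(verify(b"example CSR body", sig ^ 1))
```

A correct signature verifies; with a single flipped bit the recovered block is effectively random, so the check almost always stops at the block type test. That is consistent with the same failure appearing both in CSR verification and in the TLS handshake.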
> Best regards
> Tobias
>
> On 10/23/2018 04:31 PM, Tobias Urdin wrote:
> > Hello Erik,
> >
> > Could you specify the DNs you used for all certificates, just so that I
> > can rule it out on my side?
> > You can redact anything sensitive; I just want to get a feel for how
> > it's configured.
> >
> > Best regards
> > Tobias
> >
> > On 10/22/2018 04:47 PM, Erik McCormick wrote:
> >> On Mon, Oct 22, 2018 at 4:23 AM Tobias Urdin <tobias.urdin at binero.se> wrote:
> >>> Hello,
> >>>
> >>> I've been having a lot of issues with SSL certificates myself, and I'm
> >>> on my second attempt now at getting it working.
> >>>
> >>> Before this I spent a lot of time walking through every line in the
> >>> DevStack plugin and fixing my config options. I also used the generate
> >>> script [1], and it still didn't work.
> >>>
> >>> When I got the "invalid padding" issue it was because of the DN I used
> >>> for the CA and the certificate IIRC.
> >>>
> >>> > 19:34 < tobias-urdin> 2018-09-10 19:43:15.312 15032 WARNING
> >>> octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect
> >>> to instance. Retrying.: SSLError: ("bad handshake: Error([('rsa
> >>> routines', 'RSA_padding_check_PKCS1_type_1', 'block type is not 01'),
> >>> ('rsa routines', 'RSA_EAY_PUBLIC_DECRYPT', 'padding check failed'),
> >>> ('SSL routines', 'ssl3_get_key_exchange', 'bad signature')],)",)
> >>> > 19:47 < tobias-urdin> after a quick google "The problem was that my
> >>> CA DN was the same as the certificate DN."
> >>>
> >>> IIRC that solved it, but then again I don't remember fully
> >>> since I've tried so many different angles by now.
> >>>
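A quick way to check for the DN collision described above is to compare the subject of the issued certificate with the subject of its CA. The sketch below assumes the two lines come from something like `openssl x509 -noout -subject` (or `-issuer`), and that the output is either the older slash-separated form or the newer comma-separated form. The helper names are hypothetical, and values that themselves contain "/" or "," are not handled.

```python
def parse_dn(line: str) -> tuple:
    """Turn a subject/issuer line into a tuple of (ATTRIBUTE, value) pairs."""
    label, sep, rest = line.partition("=")
    # Drop a leading "subject" / "issuer" label if there is one.
    if sep and label.strip().lower() in ("subject", "issuer"):
        dn = rest.strip()
    else:
        dn = line.strip()
    # Older output looks like "/C=SE/O=Example/CN=ca",
    # newer output looks like "C = SE, O = Example, CN = ca".
    parts = dn.strip("/").split("/") if dn.startswith("/") else dn.split(",")
    pairs = []
    for part in parts:
        if not part.strip():
            continue
        key, _, value = part.partition("=")
        pairs.append((key.strip().upper(), value.strip()))
    return tuple(pairs)


def dn_collides(cert_subject: str, ca_subject: str) -> bool:
    """True when the certificate and its CA carry the same DN."""
    return parse_dn(cert_subject) == parse_dn(ca_subject)


if __name__ == "__main__":
    # Example values only; substitute the lines printed for your own files.
    print(dn_collides("subject=C = SE, O = Example, CN = octavia",
                      "subject= /C=SE/O=Example/CN=octavia"))
    print(dn_collides("subject=CN = amphora-client",
                      "subject=CN = octavia-client-ca"))
```

If the two DNs match, giving the CA and the certificates it signs distinct DNs (for example a different CN) is the change the quoted fix points at.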
> >>> Here is my IRC log history from the #openstack-lbaas channel; perhaps
> >>> it can help you out:
> >>> http://paste.openstack.org/show/732575/
> >>>
> >> Tobias, I owe you a beer. This was precisely the issue. I'm deploying
> >> Octavia with kolla-ansible. It only deploys a single CA. After hacking
> >> the templates and playbook to incorporate a separate server CA, the
> >> amphorae now load and provision the required namespace. I'm adding a
> >> kolla tag to the subject of this in hopes that someone might want to
> >> take on changing this behavior in the project. Hopefully after I get
> >> through Upstream Institute in Berlin I'll be able to do it myself if
> >> nobody else wants to do it.
> >>
> >> For certificate generation, I extracted the contents of
> >> octavia_certs_install.yml (which sets up the directory structure,
> >> openssl.cnf, and the client CA), and octavia_certs.yml (which creates
> >> the server CA and the client certificate) and mashed them into a
> >> separate playbook just for this purpose. At the end I get:
> >>
> >> ca_01.pem - Client CA Certificate
> >> ca_01.key - Client CA Key
> >> ca_server_01.pem - Server CA Certificate
> >> cakey.pem - Server CA Key
> >> client.pem - Concatenated Client Key and Certificate
> >>
> >> If it would help to have the playbook, I can stick it up on github
> >> with a huge "This is a hack" disclaimer on it.
> >>
> >>> -----
> >>>
> >>> Sorry for hijacking the thread but I'm stuck as well.
> >>>
> >>> I've in the past tried to generate the certificates with [1] but now
> >>> moved on to using the openstack-ansible way of generating them [2]
> >>> with some modifications.
> >>>
> >>> Right now I'm just getting: Could not connect to instance. Retrying.:
> >>> SSLError: [SSL: BAD_SIGNATURE] bad signature (_ssl.c:579)
> >>> from the amphorae. I haven't gotten any further, but I've eliminated a
> >>> lot of stuff in the middle.
> >>>
> >>> I tried deploying Octavia on Ubuntu with python3 just to make sure there
> >>> wasn't an issue with CentOS and OpenSSL versions, since it tends to lag
> >>> behind.
> >>> Checking the amphora with openssl s_client [3] gives the same error,
> >>> but the verification is successful. I just don't understand what the
> >>> bad signature part is about. From browsing some OpenSSL code it seems
> >>> to be related to RSA signatures somehow.
> >>>
> >>> 140038729774992:error:1408D07B:SSL routines:ssl3_get_key_exchange:bad
> >>> signature:s3_clnt.c:2032:
> >>>
> >>> So I've basically ruled out Ubuntu (openssl-1.1.0g) and CentOS
> >>> (openssl-1.0.2k) being the problem and ruled out signing_digest, so I'm
> >>> back to something related to the certificates, the communication
> >>> between the endpoints, or what actually responds inside the amphora
> >>> (gunicorn IIUC?). Since it is the "verify" functions that actually
> >>> raise the bad signature error, I would assume it's the generated
> >>> certificate that the amphora presents that is causing it.
> >>>
> >>> I'll have to continue the troubleshooting inside the amphora. I've
> >>> used the test-only amphora image before but have now built my own one
> >>> that uses the amphora-agent from the actual stable branch, with the
> >>> same issue (bad signature).
> >>>
> >>> For completeness, these are the config options set for the certificates
> >>> in octavia.conf and which file each was copied from [4]; again, a
> >>> replication of what openstack-ansible does.
> >>>
> >>> Appreciate any feedback or help :)
> >>>
> >>> Best regards
> >>> Tobias
> >>>
> >>> [1]
> >>> https://github.com/openstack/octavia/blob/master/bin/create_certificates.sh
> >>> [2] http://paste.openstack.org/show/732483/
> >>> [3] http://paste.openstack.org/show/732486/
> >>> [4] http://paste.openstack.org/show/732487/
> >>>
> >>> On 10/20/2018 01:53 AM, Michael Johnson wrote:
> >>>> Hi Erik,
> >>>>
> >>>> Sorry to hear you are still having certificate issues.
> >>>>
> >>>> Issue #2 is probably caused by issue #1. Since we hot-plug the tenant
> >>>> network for the VIP, one of the first steps after the worker connects
> >>>> to the amphora agent is finishing the required configuration of the
> >>>> VIP interface inside the network namespace on the amphora.
> >>>>
> >> Thanks for the hint on the workflow of this. I hadn't gotten deep
> >> enough into the code to find that yet, but I suspected it was blocking
> >> since the namespace never got created either. Thanks
> >>
> >>>> If I remember correctly, you are attempting to configure Octavia with
> >>>> the dual CA option (which is good for non-development use).
> >>>>
> >>>> This is what I have for notes:
> >>>>
> >>>> [certificates] gets the following:
> >>>> cert_generator = local_cert_generator
> >>>> ca_certificate = server CA's "server.pem" file
> >>>> ca_private_key = server CA's "server.key" file
> >>>> ca_private_key_passphrase = pass phrase for ca_private_key
> >>>> [controller_worker]
> >>>> client_ca = Client CA's ca_cert file
> >>>> [haproxy_amphora]
> >>>> client_cert = Client CA's client.pem file (I think with its key
> >>>> concatenated is what rm_work said the other day)
> >>>> server_ca = Server CA's ca_cert file
> >>>>
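As a rough sanity check of the notes above, the following Python sketch parses an octavia.conf-style text and reports missing certificate options, plus the single-CA case where client_ca and server_ca point at the same file. The option and section names are taken from the notes; the sample paths reuse the file names listed earlier in the thread under /etc/octavia/certs, and mapping them to options this way is an assumption, as is the passphrase value. The function name is hypothetical, and none of this is part of Octavia itself.

```python
import configparser

# Options named in the notes above, grouped by section.
REQUIRED = {
    "certificates": ["cert_generator", "ca_certificate", "ca_private_key",
                     "ca_private_key_passphrase"],
    "controller_worker": ["client_ca"],
    "haproxy_amphora": ["client_cert", "server_ca"],
}

# Example values only (assumed mapping of file names to options).
SAMPLE = """
[certificates]
cert_generator = local_cert_generator
ca_certificate = /etc/octavia/certs/ca_server_01.pem
ca_private_key = /etc/octavia/certs/cakey.pem
ca_private_key_passphrase = example-passphrase
[controller_worker]
client_ca = /etc/octavia/certs/ca_01.pem
[haproxy_amphora]
client_cert = /etc/octavia/certs/client.pem
server_ca = /etc/octavia/certs/ca_server_01.pem
"""


def check_dual_ca(conf_text: str) -> list:
    """Return a list of human-readable problems found in the config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    problems = []
    for section, options in REQUIRED.items():
        for option in options:
            if not parser.has_option(section, option):
                problems.append(f"missing [{section}] {option}")
    client_ca = parser.get("controller_worker", "client_ca", fallback=None)
    server_ca = parser.get("haproxy_amphora", "server_ca", fallback=None)
    if client_ca and client_ca == server_ca:
        problems.append(
            "client_ca and server_ca point at the same file (single CA)")
    return problems


if __name__ == "__main__":
    for problem in check_dual_ca(SAMPLE) or ["no problems found"]:
        print(problem)
```

The check only looks at option names and paths. It does not open the files or validate the certificates, so it would not catch a DN collision or a bad signature.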
> >> This is all very helpful. It's a bit difficult to know what goes where
> >> the way the documentation is written presently. For something that's
> >> going to be the de facto standard for load balancing, we as a community
> >> need to do a better job of documenting how to set up, configure, and
> >> manage this in production. I'm trying to capture my lessons learned
> >> and processes as I go to help with that if I can.
> >>
> >> -Erik
> >>
> >>>> That said, I can probably run through this and write something up next
> >>>> week that is more step-by-step/detailed.
> >>>>
> >>>> Michael
> >>>>
> >>>> On Fri, Oct 19, 2018 at 2:31 PM Erik McCormick
> >>>> <emccormick at cirrusseven.com> wrote:
> >>>>> Apologies for cross-posting, but in the event that these might be
> >>>>> worth filing as bugs, I wanted the Octavia devs to see it as well...
> >>>>>
> >>>>> I've been wrestling with getting Octavia up and running and have
> >>>>> become stuck on two issues. I'm hoping someone has run into these
> >>>>> before. My Google-fu has come up empty.
> >>>>>
> >>>>> Issue 1:
> >>>>> When the Octavia controller tries to poll the amphora instance, it
> >>>>> tries repeatedly and eventually fails. The error on the controller
> >>>>> side is:
> >>>>>
> >>>>> 2018-10-19 14:17:39.181 26 ERROR
> >>>>> octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connection
> >>>>> retries (currently set to 300) exhausted. The amphora is unavailable.
> >>>>> Reason: HTTPSConnectionPool(host='10.7.0.112', port=9443): Max retries
> >>>>> exceeded with url: /0.5/plug/vip/10.250.20.15 (Caused by
> >>>>> SSLError(SSLError("bad handshake: Error([('rsa routines',
> >>>>> 'RSA_padding_check_PKCS1_type_1', 'invalid padding'), ('rsa routines',
> >>>>> 'rsa_ossl_public_decrypt', 'padding check failed'), ('asn1 encoding
> >>>>> routines', 'ASN1_item_verify', 'EVP lib'), ('SSL routines',
> >>>>> 'tls_process_server_certificate', 'certificate verify
> >>>>> failed')],)",),)): SSLError: HTTPSConnectionPool(host='10.7.0.112',
> >>>>> port=9443): Max retries exceeded with url: /0.5/plug/vip/10.250.20.15
> >>>>> (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines',
> >>>>> 'RSA_padding_check_PKCS1_type_1', 'invalid padding'), ('rsa routines',
> >>>>> 'rsa_ossl_public_decrypt', 'padding check failed'), ('asn1 encoding
> >>>>> routines', 'ASN1_item_verify', 'EVP lib'), ('SSL routines',
> >>>>> 'tls_process_server_certificate', 'certificate verify
> >>>>> failed')],)",),))
> >>>>>
> >>>>> On the amphora side I see:
> >>>>> [2018-10-19 17:52:54 +0000] [1331] [DEBUG] Error processing SSL request.
> >>>>> [2018-10-19 17:52:54 +0000] [1331] [DEBUG] Invalid request from
> >>>>> ip=::ffff:10.7.0.40: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake
> >>>>> failure (_ssl.c:1754)
> >>>>>
> >>>>> I've generated certificates both with the script in the Octavia git
> >>>>> repo, and with the OpenStack-Ansible playbook. I can see that they are
> >>>>> present in /etc/octavia/certs.
> >>>>>
> >>>>> I'm using the Kolla (Queens) containers for the control plane so I'm
> >>>>> sure I've satisfied all the python library constraints.
> >>>>>
> >>>>> Issue 2:
> >>>>> I'm not sure how it gets configured, but the tenant network interface
> >>>>> (ens6) never comes up. I can spawn other instances on that network
> >>>>> with no issue, and I can see that Neutron has the port attached to the
> >>>>> instance. However, in the instance this is all I get:
> >>>>>
> >>>>> ubuntu at amphora-33e0aab3-8bc4-4fcb-bc42-b9b36afb16d4:~$ ip a
> >>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> >>>>> group default qlen 1
> >>>>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >>>>> inet 127.0.0.1/8 scope host lo
> >>>>> valid_lft forever preferred_lft forever
> >>>>> inet6 ::1/128 scope host
> >>>>> valid_lft forever preferred_lft forever
> >>>>> 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast
> >>>>> state UP group default qlen 1000
> >>>>> link/ether fa:16:3e:30:c4:60 brd ff:ff:ff:ff:ff:ff
> >>>>> inet 10.7.0.112/16 brd 10.7.255.255 scope global ens3
> >>>>> valid_lft forever preferred_lft forever
> >>>>> inet6 fe80::f816:3eff:fe30:c460/64 scope link
> >>>>> valid_lft forever preferred_lft forever
> >>>>> 3: ens6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
> >>>>> default qlen 1000
> >>>>> link/ether fa:16:3e:89:a2:7f brd ff:ff:ff:ff:ff:ff
> >>>>>
> >>>>> There's no evidence of the interface anywhere else including udev rules.
> >>>>>
> >>>>> Any help with either or both issues would be greatly appreciated.
> >>>>>
> >>>>> Cheers,
> >>>>> Erik
> >>>>>
> >>>>> __________________________________________________________________________
> >>>>> OpenStack Development Mailing List (not for usage questions)
> >>>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>>>
> >>
> >
> >
>
>