[Openstack-operators] [OCTAVIA][QUEENS][KOLLA] - Amphora to Health-manager invalid UDP heartbeat.

Michael Johnson johnsomor at gmail.com
Tue Oct 23 17:09:13 UTC 2018


Are the controller and the amphora using the same version of Octavia?

We had a python3 issue where we had to change the HMAC digest used. If
your controller is running an older version of Octavia than your
amphora images, it may not have the compatibility code to support the
new format. The compatibility code is here:
https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/health_daemon/status_message.py#L56

There is also a release note about the issue here:
https://docs.openstack.org/releasenotes/octavia/rocky.html#upgrade-notes
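
As a rough illustration of how a digest-format mismatch can produce the
warning quoted below, here is a minimal sketch. It is not the Octavia
code: the function names, the key, and the payload are hypothetical, and
it simply assumes the sender appends a hex-encoded SHA-256 HMAC (64 ASCII
bytes) while an older receiver expects a raw 32-byte digest.

```python
import hashlib
import hmac

KEY = b"example-heartbeat-key"  # hypothetical key, for illustration only


def sign(payload: bytes, key: bytes, hex_digest: bool) -> bytes:
    """Append an HMAC-SHA256 trailer, either raw (32 B) or hex (64 B)."""
    mac = hmac.new(key, payload, hashlib.sha256)
    trailer = mac.hexdigest().encode("ascii") if hex_digest else mac.digest()
    return payload + trailer


def verify(envelope: bytes, key: bytes, hex_digest: bool) -> bool:
    """Check the trailer, assuming one specific digest format."""
    size = 64 if hex_digest else 32
    payload, trailer = envelope[:-size], envelope[-size:]
    mac = hmac.new(key, payload, hashlib.sha256)
    expected = mac.hexdigest().encode("ascii") if hex_digest else mac.digest()
    return hmac.compare_digest(expected, trailer)


def verify_compat(envelope: bytes, key: bytes) -> bool:
    """Backward-compatible check: accept either trailer format."""
    return verify(envelope, key, False) or verify(envelope, key, True)


# A sender using the hex format, checked by a receiver that only knows
# the raw format, is rejected; a receiver that tries both accepts it.
new_style = sign(b"heartbeat", KEY, hex_digest=True)
old_receiver_ok = verify(new_style, KEY, hex_digest=False)
compat_ok = verify_compat(new_style, KEY)

# The "msg hmac" from the quoted log line decodes to ASCII hex characters,
# which is what a raw-digest reader would see at the tail of a hex trailer.
logged = bytes.fromhex(
    "6137613337316432636365393832376431343337306537353066626130653261"
)
decoded = logged.decode("ascii")
```

In the quoted log, the "msg hmac" bytes are themselves printable hex
characters, which is consistent with this kind of mismatch, though it
does not prove it.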

If that is not the issue, I would double-check the heartbeat_key in
the health manager configuration files and inside one of the amphorae.

Note that this key is only used for health heartbeats and stats; it
is not used for the controller to amphora communication on port 9443.
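
One way to compare the key on both sides is sketched below. It assumes
the option lives in a [health_manager] section of an INI-style file on
both the controller and the amphora; the helper names and the sample
file contents are hypothetical, so check them against your own files.

```python
import configparser
from typing import Optional


def read_heartbeat_key(conf_text: str) -> Optional[str]:
    """Return heartbeat_key from an INI-style config, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Assumption: the option is defined under [health_manager].
    return parser.get("health_manager", "heartbeat_key", fallback=None)


def keys_match(controller_conf: str, amphora_conf: str) -> bool:
    """True only if both files define the same, non-empty key."""
    controller_key = read_heartbeat_key(controller_conf)
    amphora_key = read_heartbeat_key(amphora_conf)
    return bool(controller_key) and controller_key == amphora_key


# Hypothetical file contents, copied from each side for comparison.
controller_sample = "[health_manager]\nheartbeat_key = insecure-example\n"
amphora_sample = "[health_manager]\nheartbeat_key = insecure-example \n"
mismatch_sample = "[health_manager]\nheartbeat_key = something-else\n"

same = keys_match(controller_sample, amphora_sample)
different = keys_match(controller_sample, mismatch_sample)
```

Reading the file contents from the controller container and from an
amphora is left out, since how you reach each one depends on the
deployment.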

Also, load balancers cannot get "stuck" in PENDING_* states unless
someone has killed the controller process that was actively working on
that load balancer. By killed I mean a non-graceful shutdown of the
process that was in the middle of working on the load balancer.
Otherwise all code paths lead back to ACTIVE or ERROR status after it
finishes the work or gives up retrying the requested action. Check
your controller logs to make sure this load balancer is not still
being worked on by one of the controllers. The default retry timeouts
are very long (some are up to 25 minutes), so the controller will keep
trying to accomplish the request. They are set that way to accommodate
very slow (VirtualBox) hosts and the test gates. You will want to tune
those down for a production deployment.
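
To get a feel for what tuning means here, the worst-case wait is roughly
the retry count times the retry interval. The sketch below assumes a
pair of options such as connection_max_retries and
connection_retry_interval, and the numbers are illustrative only; please
check the configuration reference for your release before changing
anything.

```python
def worst_case_wait_seconds(max_retries: int, retry_interval_s: float) -> float:
    """Upper bound on how long one retry loop keeps trying."""
    return max_retries * retry_interval_s


# Illustrative values: 300 retries at 5 seconds would account for a
# 25 minute wait before the load balancer goes to ERROR.
assumed_default = worst_case_wait_seconds(300, 5)

# A hypothetical production setting: give up after about two minutes.
tuned = worst_case_wait_seconds(24, 5)

print(f"assumed default: {assumed_default / 60:.0f} min")
print(f"tuned:           {tuned / 60:.0f} min")
```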

Michael

On Tue, Oct 23, 2018 at 7:09 AM Gaël THEROND <gael.therond at gmail.com> wrote:
>
> Hi guys,
>
> I'm finishing work on my POC for Octavia, and after solving a few issues with my configuration I'm close to getting a properly working setup.
> However, I'm facing a small yet annoying bug: the health-manager receives amphora heartbeat UDP packets that it considers incorrect, and so it drops them.
>
> Here are the messages that can be found in logs:
>
> 2018-10-23 13:53:21.844 25 WARNING octavia.amphorae.backends.health_daemon.status_message [-] calculated hmac: faf73e41a0f843b826ee581c3995b7f7e56b5e5a294fca0b84eda426766f8415 not equal to msg hmac: 6137613337316432636365393832376431343337306537353066626130653261 dropping packet
>
> Which comes from this part of the HM code:
>
> https://docs.openstack.org/octavia/pike/_modules/octavia/amphorae/backends/health_daemon/status_message.html#get_payload
>
> The annoying thing is that I don't understand why the UDP packet is considered stale, or how I can reproduce the payload that is sent to the HealthManager.
> I'm willing to write a simple Python program to simulate the heartbeat payload, but I don't know exactly what the message is and I think I'm missing some information.
>
> Both the HealthManager and the amphora use the same heartbeat_key, and they can reach each other on the network, since the initial health-manager to amphora connection on port 9443 is validated.
>
> As a result of this situation, my load balancer is stuck in PENDING_UPDATE.
>
> Do you have any idea how I can handle this, or whether it's something anyone else has already seen?
>
> Kind regards,
> G.