[Octavia][Victoria] No service listening on port 9443 in the amphora instance

Luke Camilleri luke.camilleri at zylacomputing.com
Sun May 16 19:36:13 UTC 2021


Hi Michael, thanks as always for your input. Below please find my replies:

1- I do not know why this status is being shown as DOWN; the deployment 
for the entire cloud platform is a manual one (or a bare-metal 
installation, as I have sometimes seen it called). There must be a script 
somewhere that is checking the status of this port. Although I have no 
apparent issue from this, I would like to see how I can get the status of 
this port into an ACTIVE state, so any pointers would be helpful.

2- Agreed, and that is in fact how it is set up, thanks.

3- Below please find the commands run from the amphora (192.168.1.11 is 
an Ubuntu web server configured as a member server):

# export HAPROXY_SERVER_ADDR=192.168.1.11

# ip netns exec amphora-haproxy /var/lib/octavia/ping-wrapper.sh

# ip netns exec amphora-haproxy echo $?
0

# ip netns exec amphora-haproxy /usr/sbin/ping -q -n -w 1 -c 1 
$HAPROXY_SERVER_ADDR
PING 192.168.1.11 (192.168.1.11) 56(84) bytes of data.

--- 192.168.1.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.183/0.183/0.183/0.000 ms

And yet I still receive the below:

(openstack) loadbalancer healthmonitor list

| 34e1e2cf-826d-41b4-98b7-7866b680eea6 | hm3-ping | 35a0fa65de1741619709485c5f6d989b | PING | True

(openstack) loadbalancer member list pool3
+--------------------------------------+----------+----------------------------------+---------------------+---------------+---------------+------------------+--------+
| id                                   | name     | project_id                       | provisioning_status | address       | protocol_port | operating_status | weight |
+--------------------------------------+----------+----------------------------------+---------------------+---------------+---------------+------------------+--------+
| 2ce664e2-c84c-4e71-a903-d5f650b0f0e7 | ubuntu-2 | 35a0fa65de1741619709485c5f6d989b | ACTIVE              | 192.168.1.11  |            80 | ERROR            |      1 |
| 376070b6-d290-455f-a718-aa957864e456 | ubuntu-1 | 35a0fa65de1741619709485c5f6d989b | ACTIVE              | 192.168.1.235 |            80 | ERROR            |      1 |
+--------------------------------------+----------+----------------------------------+---------------------+---------------+---------------+------------------+--------+
Ping is not rocket science, and the amphora interface does not even need 
any routing to reach the members since they are on the same layer-2 
network. Below you can also see the MAC addresses of both members from 
the amphora:

# ip netns exec amphora-haproxy arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.1.235            ether   fa:16:3e:70:70:8e   C                     eth1
192.168.1.11             ether   fa:16:3e:cc:03:05   C                     eth1

Deleting the health-monitor of type ping and adding an HTTP health check 
on "/" works immediately (which clearly shows reachability to the member 
nodes). I agree with you 100% that a ping health check in this day and 
age is something that one should not even consider, but I want to make 
sure that I am not overlooking a bigger issue here....
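
For reference, the replacement HTTP monitor was created with something 
along these lines (the monitor name and timing values are just what I 
happened to use here):

openstack loadbalancer healthmonitor create --name hm3-http --type HTTP \
     --url-path / --delay 5 --timeout 3 --max-retries 3 pool3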

Thanks in advance

On 14/05/2021 20:12, Michael Johnson wrote:
> Hi Luke,
>
> 1. The "octavia-health-manager listent-port"(s) should be "ACTIVE" in
> neutron. Something may have gone wrong in the deployment tooling or in
> neutron for those ports.
> 2. As for the VIP ports on the amphora, the base port should be
> "ACTIVE", but the VIP port we use to store the VIP IP address should
> be "DOWN". The "ACTIVE" base ports will list the VIP IP as it's
> "allowed-address-pairs" port/ip.
> 3. On the issue with the health monitor of type PING, it's rarely used
> outside of some of our tests as it's a poor gauge of the health of an
> endpoint (https://docs.openstack.org/octavia/latest/user/guides/basic-cookbook.html#other-health-monitors).
> That said, it should be working. It's an external test in the amphora
> provider that runs "ping/ping6", so it's interesting that you can ping
> from the netns successfully.
>   Inside the amphora network namespace, can you try running the ping
> script directly?
> export HAPROXY_SERVER_ADDR=<IP of the member server>
> /var/lib/octavia/ping-wrapper.sh
> echo $?
> An answer of 0 means the ping was successful, 1 a failure.
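>
> For context, that wrapper is just a thin shell around ping/ping6,
> picking the binary by address family. A rough sketch (the shipped file
> may differ slightly):
>
> #!/bin/bash
> # choose ping6 for IPv6 members, plain ping otherwise
> if [[ "$HAPROXY_SERVER_ADDR" == *":"* ]]; then
>     ping6 -q -n -w 1 -c 1 "$HAPROXY_SERVER_ADDR" > /dev/null 2>&1
> else
>     ping -q -n -w 1 -c 1 "$HAPROXY_SERVER_ADDR" > /dev/null 2>&1
> fi
> exit $?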
>
> Michael
>
> On Fri, May 14, 2021 at 8:15 AM Luke Camilleri
> <luke.camilleri at zylacomputing.com> wrote:
>> Hi Michael, thanks as always for the below. I have watched the video and configured all the requirements as shown in those guides. Working great right now.
>>
>> I have noticed the following points and would like to know if you can give me some feedback please:
>>
>> In the Octavia project, at the networks screen --> lb-mgmt-net --> Ports --> octavia-health-manager-listen-port (the IP bound to the health-manager service), the port status is Down. It does not create any sort of issue, but I was wondering if this is normal behaviour?
>> Similarly to the above point, in the tenant networks screen, every port whose "Attached Device" is "Octavia" has its status reported as "Down" (these are the VIP addresses assigned to the amphorae). I just need to confirm that this is normal behaviour.
>> Creating a health monitor of type PING fails to get the operating status of the nodes: the nodes are in error (horizon) and the amphora reports that there are no backends, hence it is not working. (I am using the same backend nodes with another load balancer but with an HTTP check and it is working fine; a security group is set up to allow ping from 0.0.0.0/0, and from the amphora-haproxy network namespace on the amphora instance I can ping both nodes without issues.) Below is the amphora's haproxy.log:
>>
>> May 14 15:00:50 amphora-9658d9ec-3bf1-407f-a134-86304899c015 haproxy[1984]: Server c0092bf4-d2a2-431f-8b7f-9dc3ace52933:e268db93-2d20-4395-bd6f-f6d835bce769/f04824b7-6fdf-46dc-bc83-b98b3b9f5be0 is DOWN, reason: Socket error, info: "Resource temporarily unavailable", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>
>> May 14 15:00:50 amphora-9658d9ec-3bf1-407f-a134-86304899c015 haproxy[1984]: Server c0092bf4-d2a2-431f-8b7f-9dc3ace52933:e268db93-2d20-4395-bd6f-f6d835bce769/f04824b7-6fdf-46dc-bc83-b98b3b9f5be0 is DOWN, reason: Socket error, info: "Resource temporarily unavailable", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>
>> May 14 15:00:50 amphora-9658d9ec-3bf1-407f-a134-86304899c015 haproxy[1984]: backend c0092bf4-d2a2-431f-8b7f-9dc3ace52933:e268db93-2d20-4395-bd6f-f6d835bce769 has no server available!
>>
>> May 14 15:00:50 amphora-9658d9ec-3bf1-407f-a134-86304899c015 haproxy[1984]: backend c0092bf4-d2a2-431f-8b7f-9dc3ace52933:e268db93-2d20-4395-bd6f-f6d835bce769 has no server available!
>>
>> Thanks in advance
>>
>> On 13/05/2021 18:33, Michael Johnson wrote:
>>
>> You are correct that two IPs are being allocated for the VIP; one is a
>> secondary IP which neutron implements as an "allowed address pairs"
>> port. We do this to allow failover of the amphora instance should nova
>> fail the service VM. We hold the VIP IP in a special port so the IP is
>> not lost while we rebuild the service VM.
>> If you are using the active/standby topology (or an Octavia flavor
>> with active/standby enabled), this failover is accelerated with nearly
>> no visible impact to the flows through the load balancer.
>> Active/Standby has been an Octavia feature since the Mitaka release. I
>> gave a demo of it at the Tokyo summit here:
>> https://youtu.be/8n7FGhtOiXk?t=1420
>>
>> You can enable active/standby as the default by setting the
>> "loadbalancer_topology" setting in the configuration file
>> (https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.loadbalancer_topology)
>> or by creating an Octavia flavor that creates the load balancer with
>> an active/standby topology
>> (https://docs.openstack.org/octavia/latest/admin/flavors.html).
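>>
>> As a rough sketch (the flavor names below are just examples):
>>
>> # octavia.conf on the controllers -- make active/standby the default
>> [controller_worker]
>> loadbalancer_topology = ACTIVE_STANDBY
>>
>> # or expose it per load balancer via an Octavia flavor
>> openstack loadbalancer flavorprofile create --name amphora-act-stdby \
>>     --provider amphora --flavor-data '{"loadbalancer_topology": "ACTIVE_STANDBY"}'
>> openstack loadbalancer flavor create --name act-stdby \
>>     --flavorprofile amphora-act-stdby --enable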
>>
>> Michael
>>
>> On Thu, May 13, 2021 at 4:23 AM Luke Camilleri
>> <luke.camilleri at zylacomputing.com> wrote:
>>
>> Hi Michael, thanks a lot for the below information, it is very helpful. I
>> ended up setting the o-hm0 interface address statically in the
>> octavia-interface.sh script, which is called by the service, and I also
>> added a delay to make sure that the bridges are up before trying to
>> create the veth pair and connect the endpoints.
>>
>> I also edited the [Unit] section of the health-manager service and added
>> octavia-interface.service to the After= option, or else on startup the
>> health manager will not bind to the lb-mgmt-net since it would not be up
>> yet (a sketch of the drop-in is below).
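>>
>> The drop-in is just something along these lines (unit name as it is on
>> my install, created with "systemctl edit octavia-health-manager"):
>>
>> [Unit]
>> After=octavia-interface.service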
>>
>> The floating IPs part was a bit tricky until I understood what was
>> really going on with the VIP concept and how much better and more
>> flexible it is to set the VIP on the tenant network and then associate a
>> public IP with the VIP.
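>>
>> Roughly what I do now (the load balancer and external network names are
>> just the ones from my tests):
>>
>> VIP_PORT_ID=$(openstack loadbalancer show lb1 -f value -c vip_port_id)
>> FIP=$(openstack floating ip create public -f value -c floating_ip_address)
>> openstack floating ip set --port $VIP_PORT_ID $FIP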
>>
>> With this being said, I noticed that 2 IPs are being assigned to the
>> amphora instance and that the actual port attached to the instance has
>> an allowed-address pair with the VIP port. I checked online and it seems
>> that there is an active/standby feature using VRRP/keepalived, and in
>> fact the keepalived daemon is running in the amphora instance.
>>
>> Am I on the right track with the active/standby feature and, if so, do
>> you have any installation/project links to share so that I can test it?
>>
>> Regards
>>
>> On 12/05/2021 08:37, Michael Johnson wrote:
>>
>> Answers inline below.
>>
>> Michael
>>
>> On Mon, May 10, 2021 at 5:15 PM Luke Camilleri
>> <luke.camilleri at zylacomputing.com> wrote:
>>
>> Hi Michael and thanks a lot for the detailed answer below.
>>
>> I believe I have got most of this sorted out apart from some small issues below:
>>
>> If the o-hm0 interface gets its IP information from the DHCP server set up by neutron for the lb-mgmt-net, then the management node will always have 2 default gateways, and this will bring along issues. The same DHCP settings, when applied to the amphora, do not cause the same issue since the amphora only has 1 IP assigned on the lb-mgmt-net. Can you please confirm this?
>>
>> The amphorae do not have issues with DHCP and gateways as we control
>> the DHCP client configuration inside the amphora. It only has one IP on
>> the lb-mgmt-net; it will honor gateways provided by neutron
>> for the lb-mgmt-net traffic, but a gateway is not required on the
>> lb-mgmt-network unless you are routing the lb-mgmt-net traffic across
>> subnets.
>>
>> How does the amphora know where to locate the worker and housekeeping processes, or does the traffic originate from the services instead? Maybe the addresses are "injected" from the config file?
>>
>> The worker and housekeeping processes only create connections to the
>> amphora, they do not receive connections from them. Each amphora sends a
>> heartbeat packet to the health manager endpoints every ten seconds by
>> default. The list of valid health manager endpoints is included in the
>> amphora agent configuration file that is injected into the service VM
>> at boot time. It can be updated using the Octavia admin API for
>> refreshing the amphora agent configuration.
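>>
>> On the controller side those endpoints and the interval come from
>> octavia.conf, roughly like this (the address is just an example):
>>
>> [health_manager]
>> # health manager listeners the amphorae report to over UDP 5555
>> controller_ip_port_list = 172.16.0.2:5555
>> heartbeat_interval = 10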
>>
>> Can you please confirm whether the same floating IP concept applies from the public (external) IP to the private (tenant) network, and from the private network to the lb-mgmt-net?
>>
>> Octavia does not use floating IPs. Users can create and assign
>> floating IPs via neutron if they would like, but they are not
>> necessary. Octavia VIPs can be created directly on neutron "external"
>> networks, avoiding the NAT overhead of floating IPs.
>> There is no practical reason to assign a floating IP to a port on the
>> lb-mgmt-net as tenant traffic is never on or accessible from that
>> network.
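>>
>> For example, to put the VIP directly on an external network (the names
>> are placeholders for whatever your deployment uses):
>>
>> openstack loadbalancer create --name lb1 --vip-subnet-id public-subnet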
>>
>> Thanks in advance for any feedback
>>
>> On 06/05/2021 22:46, Michael Johnson wrote:
>>
>> Hi Luke,
>>
>> 1. I agree that DHCP is technically unnecessary for the o-hm0
>> interface if you can manage your address allocation on the network you
>> are using for the lb-mgmt-net.
>>    I don't have detailed information about the Ubuntu install
>> instructions, but I suspect it was done to simplify the IPAM, leaving it
>> to be managed by whatever is providing DHCP on the lb-mgmt-net (be it
>> neutron or some other resource on a provider network).
>> The lb-mgmt-net is simply a neutron network that the amphora
>> management address is on. It is routable and does not require external
>> access. The only tricky part to it is the worker, health manager, and
>> housekeeping processes need to be reachable from the amphora, and the
>> controllers need to reach the amphora over the network(s). There are
>> many ways to accomplish this.
>>
>> 2. See my above answer. Fundamentally the lb-mgmt-net is just a
>> neutron network that nova can use to attach an interface to the
>> amphora instances for command and control traffic. As long as the
>> controllers can reach TCP 9443 on the amphora, and the amphora can
>> send UDP 5555 back to the health manager endpoints, it will work fine.
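>>
>> In a manual install that usually boils down to security group rules
>> along these lines (the group names are just the ones the install guides
>> commonly use):
>>
>> # controllers -> amphora agent (TCP 9443)
>> openstack security group rule create --protocol tcp --dst-port 9443 lb-mgmt-sec-grp
>> # amphorae -> health manager (UDP 5555)
>> openstack security group rule create --protocol udp --dst-port 5555 lb-health-mgr-sec-grp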
>>
>> 3. Octavia, with the amphora driver, does not require any special
>> configuration in Neutron (beyond the advanced services RBAC policy
>> being available for the neutron service account used in your octavia
>> configuration file). The neutron_lbaas.conf and services_lbaas.conf
>> are legacy configuration files/settings that were used for
>> neutron-lbaas which is now end of life. See the wiki page for
>> information on the deprecation of neutron-lbaas:
>> https://wiki.openstack.org/wiki/Neutron/LBaaS/Deprecation.
>>
>> Michael
>>
>> On Thu, May 6, 2021 at 12:30 PM Luke Camilleri
>> <luke.camilleri at zylacomputing.com> wrote:
>>
>> Hi Michael, and thanks a lot for your help on this; after following your
>> steps the agent got deployed successfully in the amphora image.
>>
>> I have some other queries that I would like to ask, mainly related to the
>> health-manager/load-balancer network setup and IP assignment. First of
>> all, let me point out that I am using a manual installation process, and
>> it might help others to understand the underlying infrastructure
>> required to make this component work as expected.
>>
>> 1- The installation procedure contains this step:
>>
>> $ sudo cp octavia/etc/dhcp/dhclient.conf /etc/dhcp/octavia
>>
>> which is later on called to assign the IP to the o-hm0 interface which
>> is connected to the lb-management network as shown below:
>>
>> $ sudo dhclient -v o-hm0 -cf /etc/dhcp/octavia
>>
>> Apart from the fact that having a DHCP config for a single IP seems a
>> bit of an overkill, using these steps injects additional routes
>> (including a second default gateway) into the default namespace, as
>> shown below in my case:
>>
>> # route -n
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 0.0.0.0         172.16.0.1      0.0.0.0         UG    0 0        0 o-hm0
>> 0.0.0.0         10.X.X.1        0.0.0.0         UG    100 0        0 ensX
>> 10.X.X.0        0.0.0.0         255.255.255.0   U     100 0        0 ensX
>> 169.254.169.254 172.16.0.100    255.255.255.255 UGH   0 0        0 o-hm0
>> 172.16.0.0      0.0.0.0         255.240.0.0     U     0 0        0 o-hm0
>>
>> Since the load-balancer management network does not need any external
>> connectivity (but only communication between health-manager service and
>> amphora-agent), why is a gateway required and why isn't the IP address
>> allocated as part of the interface creation script which is called when
>> the service is started or stopped (example below)?
>>
>> ---
>>
>> #!/bin/bash
>>
>> set -ex
>>
>> MAC=$MGMT_PORT_MAC
>> BRNAME=$BRNAME
>>
>> if [ "$1" == "start" ]; then
>>      ip link add o-hm0 type veth peer name o-bhm0
>>      brctl addif $BRNAME o-bhm0
>>      ip link set o-bhm0 up
>>      ip link set dev o-hm0 address $MAC
>>     *** ip addr add 172.16.0.2/12 dev o-hm0
>>     *** ip link set o-hm0 mtu 1500
>>      ip link set o-hm0 up
>>      iptables -I INPUT -i o-hm0 -p udp --dport 5555 -j ACCEPT
>> elif [ "$1" == "stop" ]; then
>>      ip link del o-hm0
>> else
>>      brctl show $BRNAME
>>      ip a s dev o-hm0
>> fi
>>
>> ---
>>
>> 2- Is there a possibility to specify a fixed VLAN outside of the tenant
>> range for the load-balancer management network?
>>
>> 3- Are the configuration changes required only in neutron.conf or also
>> in additional config files like neutron_lbaas.conf and
>> services_lbaas.conf, similar to the vpnaas configuration?
>>
>> Thanks in advance for any assistance, but it's like putting together a
>> puzzle of information :-)
>>
>> On 05/05/2021 20:25, Michael Johnson wrote:
>>
>> Hi Luke.
>>
>> Yes, the amphora-agent will listen on 9443 in the amphorae instances.
>> It uses TLS mutual authentication, so you can get a TLS response, but
>> it will not let you into the API without a valid certificate. A simple
>> "openssl s_client" is usually enough to prove that it is listening and
>> requesting the client certificate.
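>>
>> For example, from a controller (using the amphora's lb-mgmt-net
>> address):
>>
>> openssl s_client -connect 172.16.4.46:9443
>>
>> A listening agent will complete the TLS handshake and ask for a client
>> certificate; a "connection refused" means nothing is bound to the port.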
>>
>> I can't speak to the "openstack-octavia-diskimage-create" package you
>> found in CentOS, but I can discuss how to build an amphora image using
>> the OpenStack tools.
>>
>> If you get Octavia from git or via a release tarball, we provide a
>> script to build the amphora image. This is how we build our images for
>> the testing gates, etc. and is the recommended way (at least from the
>> OpenStack Octavia community) to create amphora images.
>>
>> https://opendev.org/openstack/octavia/src/branch/master/diskimage-create
>>
>> For CentOS 8, the command would be:
>>
>> diskimage-create.sh -g stable/victoria -i centos-minimal -d 8 -s 3 (3
>> is the minimum disk size for centos images, you may want more if you
>> are not offloading logs)
>>
>> I just did a run on a fresh centos 8 instance:
>> git clone https://opendev.org/openstack/octavia
>> python3 -m venv dib
>> source dib/bin/activate
>> pip3 install diskimage-builder PyYAML six
>> sudo dnf install yum-utils
>> ./diskimage-create.sh -g stable/victoria -i centos-minimal -d 8 -s 3
>>
>> This built an image.
>>
>> Off and on we have had issues building CentOS images due to issues in
>> the tools we rely on. If you run into issues with this image, drop us
>> a note back.
>>
>> Michael
>>
>> On Wed, May 5, 2021 at 9:37 AM Luke Camilleri
>> <luke.camilleri at zylacomputing.com> wrote:
>>
>> Hi there, I am trying to get Octavia running on a Victoria deployment on
>> CentOS 8. It was a bit rough getting to the point of launching an
>> instance, mainly due to the load-balancer management network and the lack
>> of documentation
>> (https://docs.openstack.org/octavia/victoria/install/install.html) for
>> deploying this on CentOS. I will try to fix this once I have my deployment
>> up and running, to help others along the way with installing and configuring this :-)
>>
>> At this point an LB can be launched by the tenant, the instance is
>> spawned in the Octavia project, and I can ping and SSH into the amphora
>> instance from the Octavia node where the octavia-health-manager service
>> is running, using an IP within the same subnet as the amphorae
>> (172.16.0.0/12).
>>
>> Unfortunately I keep on getting these errors in the worker log file
>> (/var/log/octavia/worker.log):
>>
>> 2021-05-05 01:54:49.368 14521 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='172.16.4.46', port=9443): Max retries exceeded with url: // (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f83e0181550>: Failed to establish a new connection: [Errno 111] Connection refused',))
>>
>> 2021-05-05 01:54:54.374 14521 ERROR octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connection retries (currently set to 120) exhausted.  The amphora is unavailable. Reason: HTTPSConnectionPool(host='172.16.4.46', port=9443): Max retries exceeded with url: // (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f83e0181550>: Failed to establish a new connection: [Errno 111] Connection refused',))
>>
>> 2021-05-05 01:54:54.374 14521 ERROR octavia.controller.worker.v1.tasks.amphora_driver_tasks [-] Amphora compute instance failed to become reachable. This either means the compute driver failed to fully boot the instance inside the timeout interval or the instance is not reachable via the lb-mgmt-net.: octavia.amphorae.driver_exceptions.exceptions.TimeOutException: contacting the amphora timed out
>>
>> Obviously the instance is then deleted and the task fails from the
>> tenant's perspective.
>>
>> The main issue here is that there is no service running on port 9443 on
>> the amphora instance. I am assuming that this should in fact be the
>> amphora-agent service running on the instance and listening on port
>> 9443, but the service does not seem to be up, or is not installed at
>> all.
>>
>> To create the image I have installed the CentOS package
>> "openstack-octavia-diskimage-create", which provides the utility
>> disk-image-create, but from what I can conclude the amphora-agent is not
>> being installed (I thought this was done automatically by default :-( )
>>
>> Can anyone let me know if the amphora-agent is what gets queried on
>> port 9443?
>>
>> Is the agent not installed/injected by default when building the
>> amphora image?
>>
>> And what is the command to inject the amphora-agent into the amphora
>> image when using the disk-image-create command?
>>
>> Thanks in advance for any assistance
>>
>>


