[Openstack] Instances lost connectivity with metadata service.

Itxaka Serrano Garcia igarcia at suse.com
Mon Feb 26 12:44:37 UTC 2018


Hi!


On 26/02/18 12:53, Jorge Luiz Correa wrote:
> I would like some help identifying (and correcting) a problem with 
> instance metadata during boot. My environment is a Mitaka 
> installation on Ubuntu 16.04 LTS, with 1 controller, 1 network node, 
> and 5 compute nodes. I'm using classic OVS as the network setup.
>
> The problem occurs after some period of time in some projects (not all 
> projects at the same time). When booting an Ubuntu Cloud Image with 
> cloud-init, instances lose their connection to the metadata API and 
> don't get their information, such as key pairs and cloud-init scripts.
>
> [  118.924311] cloud-init[932]: 2018-02-23 18:27:05,003 - 
> url_helper.py[WARNING]: Calling 
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed 
> [101/120s]: request error [HTTPConnectionPool(host='169.254.169.254', 
> port=80): Max retries exceeded with url: 
> /2009-04-04/meta-data/instance-id (Caused by 
> ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection 
> object at 0x7faabcd6fa58>, 'Connection to 169.254.169.254 timed out. 
> (connect timeout=50.0)'))]
> [  136.959361] cloud-init[932]: 2018-02-23 18:27:23,038 - 
> url_helper.py[WARNING]: Calling 
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed 
> [119/120s]: request error [HTTPConnectionPool(host='169.254.169.254', 
> port=80): Max retries exceeded with url: 
> /2009-04-04/meta-data/instance-id (Caused by 
> ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection 
> object at 0x7faabcd7f240>, 'Connection to 169.254.169.254 timed out. 
> (connect timeout=17.0)'))]
> [  137.967469] cloud-init[932]: 2018-02-23 18:27:24,040 - 
> DataSourceEc2.py[CRITICAL]: Giving up on md from 
> ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 120 
> seconds
> [  137.972226] cloud-init[932]: 2018-02-23 18:27:24,048 - 
> url_helper.py[WARNING]: Calling 
> 'http://192.168.0.7/latest/meta-data/instance-id' failed [0/120s]: 
> request error [HTTPConnectionPool(host='192.168.0.7', port=80): Max 
> retries exceeded with url: /latest/meta-data/instance-id (Caused by 
> NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection 
> object at 0x7faabcd7fc18>: Failed to establish a new connection: 
> [Errno 111] Connection refused',))]
> [  138.974223] cloud-init[932]: 2018-02-23 18:27:25,053 - 
> url_helper.py[WARNING]: Calling 
> 'http://192.168.0.7/latest/meta-data/instance-id' failed [1/120s]: 
> request error [HTTPConnectionPool(host='192.168.0.7', port=80): Max 
> retries exceeded with url: /latest/meta-data/instance-id (Caused by 
> NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection 
> object at 0x7faabcd7fa58>: Failed to establish a new connection: 
> [Errno 111] Connection refused',))]
>
> After giving up on 169.254.169.254, it tries 192.168.0.7, which is the 
> DHCP address for the project.
>
> I've checked that neutron-l3-agent is running without errors. On the 
> compute node where the VM is running, the agents and the vswitch are 
> running. I checked the namespace of a problematic project and saw an 
> iptables rule redirecting traffic from 169.254.169.254:80 to 
> 0.0.0.0:9697, and there is a neutron-ns-metadata_proxy_ID process that 
> has that port open. So it looks like the metadata proxy is running 
> fine. But, as the logs show, there is a timeout.
>

Did you check whether port 80 is listening inside the DHCP namespace, with
"ip netns exec NAMESPACE netstat -punta"?
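
If you want to script that check, here is a minimal Python sketch. It assumes
the usual net-tools column layout for "netstat -punta" and needs root to enter
the namespace. The function names and the SAMPLE text are hypothetical and
made up for illustration; they are not output from any real node.

```python
import subprocess


def listening_ports(netstat_output):
    """Return the set of TCP ports in LISTEN state from `netstat -punta`-style text."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # Local address looks like 0.0.0.0:80 or :::80; the port follows the last colon.
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


def netstat_in_namespace(namespace):
    """Run the command from this mail inside a namespace (requires root)."""
    cmd = ["ip", "netns", "exec", namespace, "netstat", "-punta"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.168.0.7:53          0.0.0.0:*               LISTEN      2211/dnsmasq
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      2290/python
udp        0      0 0.0.0.0:67              0.0.0.0:*                           2211/dnsmasq
"""

print(80 in listening_ports(SAMPLE))
```

On the network node you would call listening_ports(netstat_in_namespace(NAMESPACE))
with the namespace of the affected network and check whether 80 is in the result.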

We recently hit something similar: the ns-proxy and the metadata-agent were
both up, but port 80 was missing inside the namespace. A restart fixed it,
but there were no logs of a failure anywhere, so your case may be similar.
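
The log you quoted shows two different failure modes, and it can help to
separate them. A connect timeout (169.254.169.254) means no reply came back at
all, while "[Errno 111] Connection refused" (192.168.0.7) usually means the
address was reachable but nothing accepted connections on port 80, which would
fit a missing listener in the namespace. A rough sketch for classifying such
lines follows. The regex and the function name are my own assumptions, and the
sample messages are abridged and unwrapped from the log above.

```python
import re

# Matches the "Calling '<url>' failed [<elapsed>/<limit>s]" part of a url_helper warning.
CALL_RE = re.compile(
    r"Calling '(?P<url>[^']+)' failed \[(?P<elapsed>\d+)/(?P<limit>\d+)s\]"
)


def classify(message):
    """Return (url, elapsed_s, limit_s, reason) for a url_helper warning, else None."""
    match = CALL_RE.search(message)
    if match is None:
        return None
    if "ConnectTimeoutError" in message or "timed out" in message:
        reason = "timeout"   # no answer at all within the connect timeout
    elif "Connection refused" in message:
        reason = "refused"   # peer reachable, but no listener on the port
    else:
        reason = "other"
    return (
        match.group("url"),
        int(match.group("elapsed")),
        int(match.group("limit")),
        reason,
    )


# Abridged, single-line versions of two messages from the quoted log.
TIMEOUT_LINE = (
    "url_helper.py[WARNING]: Calling "
    "'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [101/120s]: "
    "request error [... 'Connection to 169.254.169.254 timed out. "
    "(connect timeout=50.0)']"
)
REFUSED_LINE = (
    "url_helper.py[WARNING]: Calling "
    "'http://192.168.0.7/latest/meta-data/instance-id' failed [0/120s]: "
    "request error [... Failed to establish a new connection: "
    "[Errno 111] Connection refused]"
)

for line in (TIMEOUT_LINE, REFUSED_LINE):
    print(classify(line))
```

Note that the console log wraps each message over several lines, so the lines
need to be joined before they are passed to classify().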

> Restarting all services on the network node sometimes solves the 
> problem. In some cases I have to restart services on the controller 
> node (nova-api). Then everything works fine for some time before the 
> problems start again.
>
> Where should I look to find the cause of the problem?
>
> I appreciate any help. Thank you!
>
> - JLC
