Hi all,
It looks like the issue was actually with the Ubuntu images for both 16.04 and 18.04. We changed the DHCP timeout in "/etc/dhcp/dhclient.conf" from 300 seconds down to 2 seconds and the instances then worked absolutely fine. Not sure why it was only happening for some tenants and not others, but that has resolved it.
I am still going to look into the metadata service, as the fix still doesn't feel right to me.
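For anyone wanting to script the same change when building images, here is a minimal Python sketch of the edit. It assumes the file uses the usual `timeout N;` statement of dhclient.conf; the function name `set_dhclient_timeout` is made up for illustration, and it only transforms text, leaving the reading and writing of the file to the caller.

```python
import re

# Matches an active (uncommented) "timeout N;" statement at the start of a line.
_TIMEOUT_RE = re.compile(r"^([ \t]*)timeout[ \t]+\d+[ \t]*;", re.MULTILINE)


def set_dhclient_timeout(conf_text, seconds):
    """Return conf_text with its DHCP timeout statement set to `seconds`.

    Hypothetical helper: replaces every active "timeout N;" statement,
    or appends one if the file has none. Commented-out lines are untouched.
    """
    new_stmt = "timeout %d;" % seconds
    updated, count = _TIMEOUT_RE.subn(lambda m: m.group(1) + new_stmt, conf_text)
    if count == 0:
        # No active timeout statement found, so add one at the end.
        if updated and not updated.endswith("\n"):
            updated += "\n"
        updated += new_stmt + "\n"
    return updated
```

Applied to the contents of "/etc/dhcp/dhclient.conf" with `seconds=2`, this would produce the change described above. Note that it shortens the wait rather than explaining why the lease was slow to arrive in the first place.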
Thanks again for all your help.
Grant
Maybe the following could provide a bit more data:
- Launch a test instance in the tenant project experiencing the issue.
- Run tcpdump directly on the instance's TAP interface and confirm whether you are seeing DHCP DISCOVER/OFFER/REQUEST.
- That would also allow you to see the cloud-init traffic.
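The capture in the steps above could be set up roughly as follows. This is only a sketch: it assumes the common Neutron convention of naming the TAP device "tap" plus the first 11 characters of the port ID (worth verifying with `ip link` on the compute node), the port ID shown is a made-up placeholder, and the helper names are hypothetical. The filter covers DHCP (UDP ports 67/68) and traffic to the metadata address 169.254.169.254.

```python
import shlex


def tap_name(port_id):
    """Guess the TAP device name for a Neutron port (assumed naming convention)."""
    return "tap" + port_id[:11]


def tcpdump_command(interface):
    """Build a tcpdump command line for DHCP and metadata traffic on `interface`."""
    # UDP 67/68 is DHCP; 169.254.169.254 is the metadata address cloud-init talks to.
    bpf_filter = "(udp and (port 67 or port 68)) or host 169.254.169.254"
    # -n: no name resolution, -e: show MAC addresses, -i: capture interface
    return ["tcpdump", "-n", "-e", "-i", interface, bpf_filter]


if __name__ == "__main__":
    # Placeholder port ID for illustration only; use the instance's real port ID.
    port_id = "3f2c1a9e-7b4d-4c1e-9a55-0d6e2b8f1c77"
    print(shlex.join(tcpdump_command(tap_name(port_id))))
```

The printed command would be run as root on the compute node hosting the test instance while the instance boots.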
On Wed, Dec 4, 2019 at 7:06 PM Grant Morley <grant@civo.com> wrote:
Hi Cory,
Thanks for the response. I'll take a look at the metadata service from the instance and from OpenStack itself tomorrow. It's midnight here in the UK and I need to get some rest. Thanks for the tip; hopefully I'll find something useful to go on from there.
Grant,
On 04/12/2019 23:49, Cory Hawkless wrote:
Are they failing to contact the metadata service and hanging during the boot process while they try to retrieve metadata?
From the VM, can you hit http://169.254.169.254? That's the default IP of the metadata server; it should respond with a basic page showing some date-based subdirectories.
If it doesn't respond, you can start following the metadata service path instead of DHCP.
The fact that the machines eventually come up with an IP leads me to think the DHCP service is actually working OK.
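If curl isn't handy in the image, the same check can be done with a few lines of Python run inside the VM. This is a rough sketch using only the standard library; `probe_metadata` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. Proxies are deliberately bypassed, since the link-local metadata address should be reached directly.

```python
import http.client
import time
import urllib.request

METADATA_URL = "http://169.254.169.254/"


def probe_metadata(url=METADATA_URL, timeout=5.0):
    """Fetch `url` and return (ok, elapsed_seconds, body_or_error_text)."""
    # An empty ProxyHandler ignores any http_proxy settings in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        return True, time.monotonic() - start, body
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False, time.monotonic() - start, str(exc)
```

Calling `probe_metadata()` from the instance and printing the result shows whether the service answered and how long it took. A slow or failed response would point at the metadata path rather than DHCP.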
From: Grant Morley [mailto:grant@civo.com]
Sent: Thursday, 5 December 2019 10:10 AM
To: Eric K. Miller <emiller@genesishosting.com>; openstack-operators@lists.openstack.org
Cc: Ian Banks <ian@civo.com>
Subject: Re: DHCP timeout when creating instances for specific tenants
Hi Eric,
Thanks for getting back to me. I am fairly sure it is a DHCP error. The instances are getting an IP when they eventually boot, it is just taking a long time for them to bring up networking. The strange thing is, it only seems to be new tenants. All existing tenants are absolutely fine.
I can check DNS as well just to be on the safe side, however I wasn't seeing any errors in the Nova or Neutron logs when the instance(s) were being created.
Regards,
On 04/12/2019 22:47, Eric K. Miller wrote:
Hi Grant,
Are you sure this is a DHCP timeout and not a DNS resolution issue? I ask because we have seen a strange DNS issue occur that can cause something similar.
Are the VMs being assigned an IP after they finally boot?
Eric K. Miller
Genesis Hosting Solutions, LLC
Try our Genesis Public Cloud - powered by OpenStack!
From: Grant Morley [mailto:grant@civo.com]
Sent: Wednesday, December 04, 2019 11:00 AM
To: openstack-operators@lists.openstack.org
Cc: Ian Banks
Subject: DHCP timeout when creating instances for specific tenants
Hi all,
I wonder if anyone can help shed any light on an odd issue we are seeing with only a couple of specific tenants. Basically, when they launch an instance, it takes about 5 minutes to come up rather than our usual 30 seconds or so.
We are seeing the following on the instance logs:
Weirdly, it only seems to be happening for 1 or 2 new tenants. I have tested this on our own account, and a few other customers have tested as well; their instances launch really quickly, as expected.
Is there anything specific during the tenant creation that can cause this issue? Or are there any logs in nova / neutron I should be looking out for that might shed some light?
I haven't seen anything that is obvious. Any help would be much appreciated as we are a little stumped at the moment.
Many thanks,