DHCP timeout when creating instances for specific tenants
Hi all,
I wonder if anyone can help shed any light on an odd issue we are seeing with only a couple of specific tenants. When they launch an instance, it takes about 5 minutes to come up rather than our usual 30 seconds or so.
We are seeing the following in the instance logs: https://pastebin.com/hDstsd8G
Weirdly, it only seems to be happening for one or two new tenants. I have tested this on our own account, and a few other customers have tested as well; their instances launch quickly, as expected.
Is there anything specific during tenant creation that could cause this issue? Or are there any logs in nova / neutron I should be looking at that might shed some light?
I haven't seen anything obvious. Any help would be much appreciated as we are a little stumped at the moment.
Many thanks,
--
Grant Morley
Cloud Lead, Civo Ltd
www.civo.com <https://www.civo.com/>
Hi Grant,
Are you sure this is a DHCP timeout and not a DNS resolution issue? I ask because we have seen a strange DNS issue occur that can cause something similar.
Are the VMs being assigned an IP after they finally boot?
Eric K. Miller
Genesis Hosting Solutions, LLC
Try our Genesis Public Cloud - powered by OpenStack! https://genesishosting.com/
Hi Eric,
Thanks for getting back to me. I am fairly sure it is a DHCP error. The instances are getting an IP when they eventually boot; it is just taking a long time for them to bring up networking. The strange thing is that it only seems to affect new tenants. All existing tenants are absolutely fine.
I can check DNS as well just to be on the safe side; however, I wasn't seeing any errors in the Nova or Neutron logs when the instance(s) were being created.
Regards,
Hi Grant,
I didn't see any DNS errors either. The solution was to explicitly configure the DNS servers in the subnet. Are you doing this already? Or are you relying on the dnsmasq processes created for the router to respond to DNS queries (and forward them accordingly)?
Eric
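For anyone wanting to rule out the subnet DNS question quickly, here is a minimal sketch that flags subnets with no explicit DNS servers. It assumes the subnet records are plain dictionaries shaped like Neutron subnet objects, which carry a `dns_nameservers` list; the function name and the sample records are hypothetical, and how the records are fetched is left out.

```python
def subnets_without_dns(subnets):
    """Return the subnets whose dns_nameservers list is missing or empty."""
    return [s for s in subnets if not s.get("dns_nameservers")]


# Hypothetical sample records, for illustration only.
SAMPLE_SUBNETS = [
    {"id": "subnet-a", "name": "existing-tenant", "dns_nameservers": ["8.8.8.8"]},
    {"id": "subnet-b", "name": "new-tenant", "dns_nameservers": []},
    {"id": "subnet-c", "name": "another-new-tenant"},
]
```

Calling `subnets_without_dns(SAMPLE_SUBNETS)` returns the two records that would depend on whatever resolver the DHCP reply hands out by default.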
Hi Eric,
We are indeed already setting the DNS servers explicitly in the subnet, so I don't think that is the issue (from what I can tell).
I was wondering if it could be an issue with Neutron not responding in time to the DHCP request from the instance; however, I haven't yet found any evidence of this.
The only other thought I had was that it could somehow be an issue with RabbitMQ, and that increasing the "rpc timeout" on Neutron might help, as I have seen some errors stating that RabbitMQ didn't respond to a message request in time. However, I think it could be a red herring, as I would assume that if RabbitMQ were to blame, existing tenants and instances would also be suffering.
Grant
Are they failing to contact the metadata service and hanging during the boot process while they try to retrieve metadata?
From the VM, can you hit http://169.254.169.254? That's the default IP of the metadata server; it should respond with a basic page showing some date-based subdirectories.
If it doesn't respond, you can start following the metadata service path instead of DHCP.
The fact that the machines eventually come up with an IP leads me to think the DHCP service is actually working OK.
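The metadata check suggested in this thread can be scripted from inside the instance. The following is a rough sketch using only the Python standard library; the function name `probe_metadata` and the 5 second timeout are assumptions for illustration, and it only reports whether the address answers and how long that took.

```python
import time
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/"  # default metadata address named in the thread


def probe_metadata(url=METADATA_URL, timeout=5.0):
    """Issue one GET and return (ok, seconds, detail).

    On success, detail is the list of entries in the response body;
    on failure, detail is the error message.
    """
    # Bypass any configured HTTP proxy: the metadata address is link-local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        return True, time.monotonic() - start, body.split()
    except (urllib.error.URLError, OSError) as exc:
        return False, time.monotonic() - start, str(exc)
```

Running `probe_metadata()` on a slow-booting instance and on a healthy one gives a direct comparison: a failure or a long elapsed time on the slow one would point at the metadata path rather than DHCP.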
Hi Cory,
Thanks for the response. I'll take a look at the metadata service from the instance and from OpenStack itself tomorrow. It's midnight here in the UK and I need to get some rest. Thanks for the tip; hopefully I'll find something useful to go on from there.
Grant
Maybe the following could provide a bit more data:
- Launch a test instance in the tenant project experiencing the issue.
- Run tcpdump directly on the instance's TAP interface and confirm whether you are seeing DHCP DISCOVER/OFFER/REQUEST.
- That would also allow you to see the cloud-init traffic.
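As a starting point for the tcpdump suggestion, here is a small sketch that builds the capture command for one instance port. It assumes the common Neutron convention that the host-side device is named "tap" plus the first 11 characters of the port ID, which may not hold for every ML2 driver or firewall setup; the helper names and the example port ID are hypothetical.

```python
TAP_PREFIX = "tap"
PORT_ID_CHARS = 11  # assumed convention: "tap" + first 11 characters of the port ID


def tap_device_name(port_id):
    """Derive the host-side TAP device name for a Neutron port (assumed convention)."""
    return TAP_PREFIX + port_id[:PORT_ID_CHARS]


def dhcp_capture_command(port_id, include_metadata=True):
    """Build a tcpdump argv for DHCP (and optionally metadata) traffic on the port's TAP device."""
    capture_filter = "udp and (port 67 or port 68)"
    if include_metadata:
        # Also capture requests to the metadata address used by cloud-init.
        capture_filter = f"({capture_filter}) or host 169.254.169.254"
    # -n: no name resolution, -e: print link-level (MAC) headers, -i: capture interface
    return ["tcpdump", "-n", "-e", "-i", tap_device_name(port_id), capture_filter]
```

The returned list can be passed to `subprocess.run` as root on the compute node hosting the test instance. Seeing repeated DISCOVER packets with no OFFER would point at the DHCP path, while a clean exchange followed by unanswered metadata requests would point elsewhere.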
Hi all,
It looks like the issue was actually with the Ubuntu image, for both 16.04 and 18.04. We changed the DHCP timeout in "/etc/dhcp/dhclient.conf" from 300 seconds down to 2 seconds, and the instances then worked absolutely fine.
I'm not sure why it was only happening for some tenants and not others, but that has resolved it. I am still going to look into the metadata service, as the fix still doesn't feel right to me.
Thanks again for all your help.
Grant
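For reference, the dhclient change described in this thread could be applied while building an image with something like the sketch below. It assumes the file contains (or should contain) a plain `timeout N;` statement; the function name is hypothetical, and the sketch only rewrites text, leaving reading and writing "/etc/dhcp/dhclient.conf" to the caller. It is a workaround illustration, not a root-cause fix.

```python
import re

# Matches an active (uncommented) "timeout N;" statement at the start of a line.
_TIMEOUT_RE = re.compile(r"^([ \t]*)timeout[ \t]+\d+[ \t]*;", re.MULTILINE)


def set_dhclient_timeout(conf_text, seconds):
    """Return conf_text with the dhclient timeout set to `seconds`.

    Existing active timeout statements are rewritten in place; if none
    exists, one is appended. Commented-out lines are left untouched.
    """
    new_stmt = f"timeout {int(seconds)};"
    updated, count = _TIMEOUT_RE.subn(lambda m: m.group(1) + new_stmt, conf_text)
    if count == 0:
        sep = "" if updated.endswith("\n") or not updated else "\n"
        updated = f"{updated}{sep}{new_stmt}\n"
    return updated
```

For example, `set_dhclient_timeout(text, 2)` reproduces the 300 to 2 second change mentioned in the thread on a copy of the file's contents.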
participants (4): Cory Hawkless, Eric K. Miller, Grant Morley, Laurent Dumont