[Openstack] Need help - Compute Node restarted - Random Instances don't get an IP anymore

Razique Mahroua razique.mahroua at gmail.com
Tue Nov 19 01:18:43 UTC 2013


Yeah, well, it's always better to have a production-deployment schedule
based on a specific OpenStack release and stick to it.
Consider pushing the latest "stable" version to production only after
extended tests. Truth is, every deployment should be preceded by a bench
run with a specific testing protocol, to make sure the new version will
work as expected in production.
I haven't extensively tested networking on Havana, so I can't really
tell. Maybe you could file a bug if you are able to reproduce it
every time?



On 18 Nov 2013, at 13:09, Martinx - ジェームズ wrote:

> I did nothing besides a Compute Node reboot... Everything is back to
> normal now, a few hours after restarting it...
>
> It is easy to reproduce this: every time I reboot a Compute Node, some
> instances don't get their IPs... I need to wait hours for things to get
> back to normal, without any intervention...
>
> Unfortunately, Havana has a LOT of network problems. I believe that all
> of its problems are related to the "Per-Tenant Router with Private
> Networks" topology; it is almost useless (this topology is definitely
> not production-ready). I mean, I'm using it here, but I'm "living on the
> edge" with Havana... One more problematic step and my cloud is dead...
>
> Nothing unusual appeared in the Network Node DHCP log during this
> outage. I can enable DEBUG and restart the Compute Node again to see if
> I catch something, but it will interfere with my running tenants...
>
> Tks!
> Thiago
>
>
> On 18 November 2013 16:50, Razique Mahroua
> <razique.mahroua at gmail.com> wrote:
>
>> Now that's interesting, you didn't even restart a service?
>> Did you find anything in the dhcp-agent logs?
>>
>> On 18 Nov 2013, at 10:46, Martinx - ジェームズ wrote:
>>
>>> Thank you Razique!
>>>
>>> Out of nowhere, all instances got their IPs automatically again,
>>> without even restarting them... I have no idea what happened.
>>>
>>> But this is very weird: every time I restart a compute node, those
>>> network problems appear... No idea about the source of this problem...
>>>
>>> Tks again!
>>>
>>> Best,
>>> Thiago
>>>
>>>
>>> On 18 November 2013 16:36, Razique Mahroua
>>> <razique.mahroua at gmail.com> wrote:
>>>
>>>> Check the dhcp-agent logs, especially when you force a DHCP renew on
>>>> these instances.
>>>> Meanwhile, use tcpdump with:
>>>> tcpdump -i ROUTER-INTERFACE -vvv -s 1500 '((port 67 or port 68) and
>>>> (udp[8:1] = 0x1))'
>>>>
>>>> If you want to check the DHCP packets for a particular instance, get
>>>> its MAC and match its last four bytes (in hex) against the client
>>>> hardware address:
>>>> tcpdump -i ROUTER-INTERFACE -vvv -s 1500 '((port 67 or port 68) and
>>>> (udp[38:4] = 0xMAC-ADDR))'
>>>>
>>>> Razique
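
A minimal sketch, not taken from the thread itself, of how the per-instance
filter above could be built from a MAC address. It assumes the standard BOOTP
layout, where the client hardware address starts 28 bytes into the payload
(offset 36 from the start of the UDP header), so udp[36:2] holds the first two
bytes of the MAC and udp[38:4] the last four. The function name
dhcp_filter_for_mac is hypothetical.

```shell
# Build a tcpdump filter that matches DHCP traffic for one client MAC.
# Assumption: BOOTP chaddr begins at udp[36] (8-byte UDP header + 28 bytes).
dhcp_filter_for_mac() {
    # Strip ':' or '-' separators and lowercase the hex digits.
    hex=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')

    # Reject anything that is not exactly 12 hex digits.
    case "$hex" in
        *[!0-9a-f]*|"") echo "invalid MAC: $1" >&2; return 1 ;;
    esac
    [ "${#hex}" -eq 12 ] || { echo "invalid MAC: $1" >&2; return 1; }

    first=$(printf '%s' "$hex" | cut -c1-4)   # first two bytes of the MAC
    last=$(printf '%s' "$hex" | cut -c5-12)   # last four bytes of the MAC

    printf '((port 67 or port 68) and (udp[36:2] = 0x%s) and (udp[38:4] = 0x%s))\n' \
        "$first" "$last"
}
```

For the instance whose boot log appears further down (fa:16:3e:a2:71:74), the
result could then be passed to tcpdump in place of the hand-written filter,
e.g. tcpdump -i ROUTER-INTERFACE -vvv -s 1500 "$(dhcp_filter_for_mac fa:16:3e:a2:71:74)".
Matching all six bytes rather than only the last four avoids catching another
instance whose MAC happens to share the same tail.
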
>>>>
>>>> On 18 Nov 2013, at 10:10, Martinx - ジェームズ wrote:
>>>>
>>>> Okay... I'm calm... :-P
>>>>
>>>> This is the second time I'm seeing this with Havana.
>>>>
>>>> The Compute Node reboots and lots of Instances don't get their IPs
>>>> anymore, look:
>>>> ------------------------------
>>>>
>>>> cloud-init start-local running: Mon, 18 Nov 2013 16:34:06 +0000. up 18.19 seconds
>>>> no instance data found in start-local
>>>> cloud-init-nonet waiting 120 seconds for a network device.
>>>> cloud-init-nonet gave up waiting for a network device.
>>>> ci-info: lo : 1 127.0.0.1 255.0.0.0 .
>>>> ci-info: eth0 : 1 . . fa:16:3e:a2:71:74
>>>> route_info failed
>>>> Waiting for network configuration...
>>>> Waiting up to 60 more seconds for network configuration...
>>>> Booting system without full network configuration...
>>>>
>>>> New Instances that I launch right now get their IPs normally.
>>>>
>>>> The only way to put the website online again now is: take a snapshot
>>>> of an Instance without an IP, launch a new instance based on that
>>>> image, voilà! The Instance gets its IP again... But this is not viable.
>>>>
>>>> I appreciate any help!
>>>>
>>>> Tks!
>>>> Thiago
>>>>
>>>> On 18 November 2013 15:35, Razique Mahroua
>>>> razique.mahroua at gmail.com wrote:
>>>>
>>>> Hey Martin :)
>>>> On 18 Nov 2013, at 8:40, Martinx - ジェームズ wrote:
>>>>
>>>> Guys,
>>>>
>>>> My Havana (Ubuntu based) Compute Node was restarted and lots of
>>>> Instances do not get an IP anymore.
>>>>
>>>> Tips?!
>>>>
>>>> Stay calm
>>>>
>>>> It is random, I mean, some instances on this same compute node are
>>>> normal, while others have no IP.
>>>>
>>>> Are you referring to the private IP pool or the public one? When you
>>>> say "don't have", do you mean the IPs don't get allocated, or that the
>>>> instances don't retrieve them (via DHCP?)?
>>>>
>>>> I really need help here because my client's web site is completely
>>>> offline now...
>>>>
>>>> I'm using Per-Tenant router with private networks + VXLAN.
>>>>
>>>> Tks!
>>>> Thiago



