Hey all,<div><br></div><div>Many thanks for all your replies. Having raised the DHCP timeouts for now, I can get some sleep</div><div>and will actually apply the dnsmasq fix tomorrow.</div><div><br></div><div>Yes, I am running in VLAN mode, since this is also the recommended way.</div><div><br></div><div>Maybe OpenStack (nova-network) should check the version number of dnsmasq and,</div><div>if running in VLAN mode, issue a critical warning to the logs,</div><div>especially since this kind of error can lead to disasters in datacenters. :)</div><div><br></div><div>I also hope that Ubuntu 12.04 will pick up this patch soon enough, so that we won't</div>
<div>end up with a patch-dominated distribution :-)</div><div><br></div><div>Good night all,</div><div>Christian.<br><br><div class="gmail_quote">On Fri, Jun 15, 2012 at 1:16 AM, Narayan Desai <span dir="ltr"><<a href="mailto:narayan.desai@gmail.com" target="_blank">narayan.desai@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I vaguely recall Vish mentioning a bug in dnsmasq that caused a somewhat<br>
similar problem (it had to do with lease renewal problems on IP<br>
aliases, or something like that).<br>

<br>

Apparently, this issue was particularly pronounced with Windows VMs.<br>
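A minimal sketch of the dnsmasq version check suggested at the top of this message, in Python since nova-network is written in Python. It assumes `dnsmasq --version` prints a first line of the form `Dnsmasq version 2.59  Copyright ...`. `MIN_DNSMASQ_VERSION` is a hypothetical placeholder, because nobody in this thread names the release that contains the fix, and `check_dnsmasq_version` is an illustrative name, not an existing nova function.

```python
import logging
import re

# Hypothetical placeholder: the first dnsmasq release with the fix is not
# named in this thread, so substitute the real (major, minor) before use.
MIN_DNSMASQ_VERSION = (2, 99)

LOG = logging.getLogger(__name__)


def parse_dnsmasq_version(version_output):
    """Extract (major, minor) from `dnsmasq --version` output."""
    match = re.search(r"Dnsmasq version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised dnsmasq version output")
    return int(match.group(1)), int(match.group(2))


def check_dnsmasq_version(version_output, vlan_mode):
    """Return True if dnsmasq looks safe; log a critical warning otherwise.

    Only VLAN mode is checked, following the suggestion in the thread.
    """
    version = parse_dnsmasq_version(version_output)
    if vlan_mode and version < MIN_DNSMASQ_VERSION:
        LOG.critical(
            "dnsmasq %d.%d predates %d.%d; DHCP offers may be lost in VLAN mode",
            version[0], version[1],
            MIN_DNSMASQ_VERSION[0], MIN_DNSMASQ_VERSION[1],
        )
        return False
    return True
```

In a real service the version string would come from something like `subprocess.run(["dnsmasq", "--version"], capture_output=True, text=True).stdout`; keeping the parsing separate from the subprocess call makes the check easy to test.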

 -nld<br>

<div class="HOEnZb"><div class="h5"><br>

On Thu, Jun 14, 2012 at 6:02 PM, Christian Parpart <<a href="mailto:trapni@gmail.com">trapni@gmail.com</a>> wrote:<br>

> Hey,<br>

><br>

> Thanks for your reply. Unfortunately, neither nova-network nor dnsmasq<br>
> was restarted; both processes seem to have been up for about 2 and 3 days, respectively.<br>

><br>

> However, why is the default dhcp_lease_time value 120s? Without overriding it,<br>
> the clients re-acquire a DHCP lease every 42 seconds (at least on my nodes),<br>
> which is completely ridiculous.<br>
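For context on how the lease time drives client traffic: RFC 2131 (section 4.4.5) sets the default renewal time T1 to 0.5 and the rebinding time T2 to 0.875 of the lease duration, servers can override both with DHCP options 58 and 59, and clients are expected to add some random "fuzz". The sketch below only applies those defaults, so it will not necessarily reproduce the 42-second interval observed here; the function names are illustrative.

```python
# RFC 2131, section 4.4.5: default renewal (T1) and rebinding (T2) times
# as fractions of the lease duration. Servers may override them with
# DHCP options 58 and 59, and clients may randomise them slightly.
T1_FRACTION = 0.5
T2_FRACTION = 0.875


def renewal_schedule(lease_seconds):
    """Return (t1, t2) in seconds for a lease using the RFC defaults."""
    if lease_seconds <= 0:
        raise ValueError("lease time must be positive")
    return lease_seconds * T1_FRACTION, lease_seconds * T2_FRACTION


def renewals_per_day(lease_seconds):
    """Approximate DHCP renewals per client per day at the default T1."""
    t1, _ = renewal_schedule(lease_seconds)
    return 86400 / t1
```

With the 120 s default, `renewal_schedule(120)` gives `(60.0, 105.0)`, i.e. roughly 1440 renewals per client per day; a one-day lease brings that down to about 2.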

> OTOH, I took a look at the sources (linux_net.py) and found that max_lease_time<br>
> is set to 2048 because that is the size of my network.<br>
> So why is the max lease time the size of my network?<br>
> I've written a tiny patch to allow overriding this value in nova.conf and will submit it to Launchpad<br>
> soon. I hope it will be accepted and also applied to Essex, since it is a very straightforward,<br>
> helpful few-line change.<br>
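A minimal sketch of how a configurable lease time might be threaded into the dnsmasq command line, assuming a `dhcp_lease_time` setting as mentioned above; `build_dhcp_args` is a hypothetical helper, not nova's actual linux_net.py code. Note that, per the dnsmasq manual, `--dhcp-lease-max` limits the number of leases dnsmasq will hand out, while the lease duration is the last field of `--dhcp-range`; that may explain why the former is sized to the network.

```python
import ipaddress


def build_dhcp_args(cidr, dhcp_start, dhcp_lease_time=120):
    """Hypothetical helper returning DHCP-related dnsmasq arguments.

    dhcp_lease_time is the lease duration in seconds (the value worth
    making configurable); --dhcp-lease-max caps the number of leases
    and is sized here to the number of addresses in the network.
    """
    network = ipaddress.ip_network(cidr)
    if dhcp_lease_time <= 0:
        raise ValueError("dhcp_lease_time must be positive")
    return [
        # "static" mode: dnsmasq only serves hosts with a fixed address
        # (e.g. from dhcp-host entries) instead of allocating dynamically.
        "--dhcp-range=%s,static,%ds" % (dhcp_start, dhcp_lease_time),
        "--dhcp-lease-max=%d" % network.num_addresses,
    ]
```

For a /21 network and a one-day lease, `build_dhcp_args("10.0.0.0/21", "10.0.0.2", 86400)` yields `--dhcp-range=10.0.0.2,static,86400s` and `--dhcp-lease-max=2048`; the addresses are illustrative.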

><br>

> Nevertheless, that does not explain why 2 (well, actually 3) instances<br>
> stopped getting DHCP replies/offers after some hours or days.<br>

><br>

> I fixed the host that caused issues today (a few hours ago) by hard rebooting the instance.<br>
> However, just about 40 minutes later it lost its IP again, so it seems that it<br>
> stopped getting replies from the DHCP server (dnsmasq) almost right after it got<br>
> a lease on instance boot.<br>

><br>

> So long,<br>

> Christian.<br>

><br>

> On Thu, Jun 14, 2012 at 10:55 PM, Nathanael Burton<br>

> <<a href="mailto:nathanael.i.burton@gmail.com">nathanael.i.burton@gmail.com</a>> wrote:<br>

>><br>

>> Has nova-network been restarted? There was an issue where nova-network's signalling<br>
>> of dnsmasq would cause dnsmasq to stop responding to requests while it still<br>
>> appeared to be running fine.<br>

>><br>

>> You can check whether killing dnsmasq, restarting nova-network, and rebooting an<br>
>> instance allows it to get a DHCP address again...<br>

>><br>

>> Nate<br>

>><br>

>> On Jun 14, 2012 4:46 PM, "Christian Parpart" <<a href="mailto:trapni@gmail.com">trapni@gmail.com</a>> wrote:<br>

>>><br>

>>> Hey all,<br>

>>><br>

>>> I feel really sad saying this: now that we have had quite a few<br>
>>> instances in production for at least 5 days, I have encountered the second instance<br>
>>> losing its IP address due to "No DHCPOFFER" (according to the syslog in the instance).<br>

>>><br>

>>> I checked the logs on the central nova-network and gateway node and found that<br>
>>> dnsmasq still replies to requests from all the other instances. As far as I can tell<br>
>>> so far (I'm investigating and will post logs ASAP), it even received the request from the<br>
>>> instance in question and sent an OFFER. But while dnsmasq seems to send an offer,<br>
>>> the instance says it didn't receive one. WTF?<br>
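To check systematically whether dnsmasq logged an OFFER for every DISCOVER, a small log tally like the one below can help. It assumes the usual dnsmasq syslog lines of the form `DHCPDISCOVER(br100) fa:16:3e:11:22:33` and `DHCPOFFER(br100) 10.0.0.5 fa:16:3e:11:22:33`; the interface name, addresses and function names are illustrative, and the comparison only counts messages rather than matching them up in time.

```python
import re
from collections import defaultdict

# Assumed dnsmasq syslog format, e.g.
#   "dnsmasq-dhcp[1234]: DHCPDISCOVER(br100) fa:16:3e:11:22:33"
#   "dnsmasq-dhcp[1234]: DHCPOFFER(br100) 10.0.0.5 fa:16:3e:11:22:33"
DHCP_LINE = re.compile(
    r"(?P<msg>DHCPDISCOVER|DHCPOFFER)\((?P<iface>[^)]+)\)"
    r"(?:\s+\d+\.\d+\.\d+\.\d+)?\s+(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def tally_discover_offer(lines):
    """Count DHCPDISCOVER and DHCPOFFER messages per client MAC address."""
    counts = defaultdict(lambda: {"DHCPDISCOVER": 0, "DHCPOFFER": 0})
    for line in lines:
        match = DHCP_LINE.search(line)
        if match:
            counts[match.group("mac").lower()][match.group("msg").upper()] += 1
    return dict(counts)


def unanswered_macs(lines):
    """MACs that sent more DISCOVERs than dnsmasq logged OFFERs for."""
    return sorted(
        mac for mac, c in tally_discover_offer(lines).items()
        if c["DHCPDISCOVER"] > c["DHCPOFFER"]
    )
```

If a MAC has as many OFFERs as DISCOVERs on the server while the guest still reports no DHCPOFFER, the offers are probably being lost somewhere after dnsmasq sends them.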

>>><br>

>>> Please tell me what I can do to actually *fix* this issue, since it is<br>
>>> extremely serious.<br>

>>><br>

>>> One workaround I can see is to let newly created instances retrieve<br>
>>> their IP via DHCP, but then reconfigure /etc/network/interfaces to continue<br>
>>> with a static network setup. However, I'd rather just get the DHCP issue<br>
>>> fixed.<br>
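The workaround described above (take the DHCP-assigned address, then switch to static configuration) could be scripted roughly like this. It is a sketch assuming an Ubuntu-style ifupdown setup and an interface named eth0; `static_interfaces_stanza` is a hypothetical helper, and the DNS settings a real guest would also need are left out. Pinning the address also means the instance would no longer pick up changes served via DHCP.

```python
import ipaddress


def static_interfaces_stanza(iface, address_cidr, gateway):
    """Render a static stanza for /etc/network/interfaces (ifupdown).

    address_cidr is the DHCP-assigned address with its prefix length,
    e.g. "10.0.0.5/24"; gateway is the router the lease pointed to.
    """
    interface = ipaddress.ip_interface(address_cidr)
    gw = ipaddress.ip_address(gateway)
    # Refuse to write a config whose gateway is unreachable on-link.
    if gw not in interface.network:
        raise ValueError("gateway %s is outside %s" % (gw, interface.network))
    return "\n".join([
        "auto %s" % iface,
        "iface %s inet static" % iface,
        "    address %s" % interface.ip,
        "    netmask %s" % interface.netmask,
        "    gateway %s" % gw,
        "",  # trailing newline
    ])
```

The returned text would replace the `inet dhcp` stanza for that interface; generating it from the live lease keeps the static values identical to what dnsmasq handed out.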

>>><br>

>>> I'm very open to any kind of helpful comments. :)<br>

>>><br>

>>> So long,<br>

>>> Christian.<br>

>>><br>

>>><br>

>>> _______________________________________________<br>

>>> Mailing list: <a href="https://launchpad.net/~openstack" target="_blank">https://launchpad.net/~openstack</a><br>

>>> Post to     : <a href="mailto:openstack@lists.launchpad.net">openstack@lists.launchpad.net</a><br>

>>> Unsubscribe : <a href="https://launchpad.net/~openstack" target="_blank">https://launchpad.net/~openstack</a><br>

>>> More help   : <a href="https://help.launchpad.net/ListHelp" target="_blank">https://help.launchpad.net/ListHelp</a><br>

>>><br>

><br>


><br>

</div></div></blockquote></div><br></div>