<div dir="ltr"><div><div><div><div><div><div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><div>Do you have this log in agent log file :</div><div>
2013-08-07 13:21:46 WARNING [quantum.<span class="">openstack</span>.common.loopingcall] task run outlasted interval by 2.375859 sec</div></blockquote>Yes, I have:<br>"WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 4.738189 sec"<br>
<br>I set report_interval = 15 and agent_down_time = 30, then launched 50 instances simultaneously.<br></div>Now every instance is OK and I can ping them all. But in the Dashboard I still see a bug: some instances have 2 IP addresses (screenshot attached), and of course I can only ping 1 IP address on each instance.<br>
<br></div>For example, for the first instance in the attached image, I checked Dnsmasq's host file and saw 2 entries:<br><b>fa:16:3e:7c:42:bb</b>,10-2-1-41.openstacklocal,10.2.1.41<br>fa:16:3e:0c:bc:4b,10-2-1-52.openstacklocal,10.2.1.52<br>
</div>I can only ping 10.2.1.52.<br><br></div>I checked the Quantum DB and saw that <b>fa:16:3e:7c:42:bb</b> still exists in the 'ports' table.<br>('6260622a6b324557bc9064698c8c03ed','<span style="color:rgb(255,0,0)"><b>f3e79e1b-2236-4189-8516-fb18dc7e58a9</b></span>','','dbc59888-e2be-4b31-b579-0a4575159bb1',<b>'fa:16:3e:7c:42:bb</b>',1,'DOWN','8420f945-2d88-4204-8444-9c078491def0','compute:None')<br>
</div><div>Same result with quantum port-list:<br>| <span style="color:rgb(255,0,0)"><b>f3e79e1b-2236-4189-8516-fb18dc7e58a9</b></span> | | fa:16:3e:7c:42:bb | {"subnet_id": "4d238201-a8d5-4175-a9b4-c1d13efb5e2e", "ip_address": "10.2.1.41"} |<br>
<br></div><div><br></div>Then, I check Nova DB, and found a record in <span style="color:rgb(0,0,255)"><b>instance_info_caches</b></span> table:<br>| 2013-08-08 04:34:38 | 2013-08-08 04:38:19 | NULL | 2730 | [{"ovs_interfaceid": "43802d05-ee1f-401d-9d61-055444da8df4", "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.2.1.52"}], "version": 4, "meta": {"dhcp_server": "10.2.1.9"}, "dns": [], "routes": [], "cidr": "<a href="http://10.2.1.0/24">10.2.1.0/24</a>", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.2.1.1"}}], "meta": {"injected": false, "tenant_id": "6260622a6b324557bc9064698c8c03ed"}, "id": "dbc59888-e2be-4b31-b579-0a4575159bb1", "label": "net_minhcd_proj1"}, "devname": "tap43802d05-ee", "qbh_params": null, "meta": {}, "address": "fa:16:3e:0c:bc:4b", "type": "ovs", "id": "43802d05-ee1f-401d-9d61-055444da8df4", "qbg_params": null}, {"ovs_interfaceid": "<span style="color:rgb(255,0,0)"><b>f3e79e1b-2236-4189-8516-fb18dc7e58a9</b></span>", "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.2.1.41"}], "version": 4, "meta": {"dhcp_server": "10.2.1.9"}, "dns": [], "routes": [], "cidr": "<a href="http://10.2.1.0/24">10.2.1.0/24</a>", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.2.1.1"}}], "meta": {"injected": false, "tenant_id": "6260622a6b324557bc9064698c8c03ed"}, "id": "dbc59888-e2be-4b31-b579-0a4575159bb1", "label": "net_minhcd_proj1"}, "devname": "tapf3e79e1b-22", "qbh_params": null, "meta": {}, "address": "fa:16:3e:7c:42:bb", "type": "ovs", "id": "<span style="color:rgb(255,0,0)"><b>f3e79e1b-2236-4189-8516-fb18dc7e58a9</b></span>", "qbg_params": null}] | 8420f945-2d88-4204-8444-9c078491def0 | 0 |<br>
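<div>To make a record like the one above easier to read, here is a minimal Python sketch that lists the VIFs in a Nova network_info JSON value and flags those whose Quantum port is not ACTIVE. The helper names and the trimmed sample are hypothetical; the sample keeps only the fields the sketch reads, and the ACTIVE status of the second port is an assumption, since only the DOWN port's status appears in the thread.</div>

```python
import json

# Trimmed copy of the network_info value quoted above (only the fields used here).
TRIMMED_NETWORK_INFO = json.dumps([
    {"id": "43802d05-ee1f-401d-9d61-055444da8df4",
     "address": "fa:16:3e:0c:bc:4b",
     "network": {"subnets": [{"ips": [{"address": "10.2.1.52"}]}]}},
    {"id": "f3e79e1b-2236-4189-8516-fb18dc7e58a9",
     "address": "fa:16:3e:7c:42:bb",
     "network": {"subnets": [{"ips": [{"address": "10.2.1.41"}]}]}},
])


def list_vifs(network_info_json):
    """Return one dict per VIF with its port id, MAC and fixed IPs."""
    vifs = []
    for vif in json.loads(network_info_json):
        ips = [ip["address"]
               for subnet in vif["network"]["subnets"]
               for ip in subnet["ips"]]
        vifs.append({"port_id": vif["id"], "mac": vif["address"], "ips": ips})
    return vifs


def vifs_with_inactive_port(vifs, port_status):
    """Return the VIFs whose port status (from Quantum) is not ACTIVE."""
    return [v for v in vifs if port_status.get(v["port_id"]) != "ACTIVE"]


# Port statuses as they might be read from the Quantum 'ports' table.
# 'DOWN' comes from the row quoted in this thread; 'ACTIVE' is assumed.
PORT_STATUS = {
    "43802d05-ee1f-401d-9d61-055444da8df4": "ACTIVE",
    "f3e79e1b-2236-4189-8516-fb18dc7e58a9": "DOWN",
}

for vif in vifs_with_inactive_port(list_vifs(TRIMMED_NETWORK_INFO), PORT_STATUS):
    print(vif["port_id"], vif["mac"], vif["ips"])
```

<div>With the sample data, only the f3e79e1b port (10.2.1.41) is reported, which is the address that cannot be pinged.</div>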
</div><br><div><div><div><div><div><div>In quantum-server.log:<br>2013-08-08 11:30:14 DEBUG [quantum.openstack.common.rpc.amqp] received {u'_context_roles': [u'admin'], u'_context_read_deleted': u'no', u'_context_tenant_id': None, u'args': {u'network_id': u'dbc59888-e2be-4b31-b579-0a4575159bb1', u'lease_remaining': 0, u'host': u'thor-quantum-01.localdomain', u'ip_address': u'10.2.1.41'}, u'_unique_id': u'49f419ee040d4d77822ecf696533e484', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_timestamp': u'2013-08-08 04:26:09.092921', u'_context_user_id': None, u'method': u'update_lease_expiration'}<br>
2013-08-08 11:30:16 DEBUG [quantum.db.dhcp_rpc_base] Updating lease expiration for 10.2.1.41 on network dbc59888-e2be-4b31-b579-0a4575159bb1 from thor-quantum-01.localdomain.<br>2013-08-08 11:33:53 DEBUG [quantum.db.db_base_plugin_v2] Recycle 10.2.1.41<br>
2013-08-08 11:33:53 DEBUG [quantum.db.db_base_plugin_v2] Recycle: updated last 10.2.1.39-10.2.1.41<br>2013-08-08 11:33:53 DEBUG [quantum.db.db_base_plugin_v2] Delete allocated IP 10.2.1.41 (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e)<br>
2013-08-08 11:33:53 DEBUG [quantum.db.db_base_plugin_v2] Recycle: last match for 10.2.1.39-10.2.1.41<br>2013-08-08 11:35:29 DEBUG [quantum.db.db_base_plugin_v2] Allocated IP - 10.2.1.41 from 10.2.1.41 to 10.2.1.42<br>
2013-08-08 11:35:29 DEBUG [quantum.db.db_base_plugin_v2] Allocated IP 10.2.1.41 (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e/f3e79e1b-2236-4189-8516-fb18dc7e58a9)<br></div><div>(seems normal?)<br>
<br>And when I deleted all instances, some entries still existed in Dnsmasq's host file, so the instances can't be pinged on the next launch.<br></div><div>Maybe I need to increase report_interval further, because I still see the message "WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by X seconds" under a high-stress test.<br>
<br></div><div>But the question is, how much is enough?<br></div><div>Can I fix this bug thoroughly? (Apply a patch? But I would need to rename Quantum<->Neutron first.)<br><br></div><div>Thank you very much!<br></div></div>
</div></div></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Aug 7, 2013 at 9:46 PM, Édouard Thuleau <span dir="ltr"><<a href="mailto:thuleau@gmail.com" target="_blank">thuleau@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I think we have found (Sylvain and me) a problem that can explain this trouble:<div><br></div><div><div>
When the load on the DHCP agent is too heavy (updating the dnsmasq host file and sending lease updates), its state report to the Neutron server is delayed. The Neutron server then considers the agent down and doesn't send the port creation to it, so the dnsmasq host file isn't updated to serve that port's IP.</div>
<div><br></div><div>Do you have this log in agent log file :</div><div>2013-08-07 13:21:46 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 2.375859 sec</div><div><br></div><div>You can increase the 'report_interval' flag on the agent and the 'agent_down_time' flag on the Neutron server side.</div>
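<div>As a rough way to reason about these two flags, here is a minimal Python sketch. It assumes that the overrun printed in the loopingcall warning delays the next state report by about that amount, so the worst gap between two reports is roughly report_interval plus the overrun. That is a simplification, not the server's actual logic, and the function names and the safety factor of 2 are hypothetical.</div>

```python
def max_report_gap(report_interval, worst_overrun):
    """Approximate longest gap between two agent state reports, in seconds."""
    return report_interval + worst_overrun


def agent_may_look_down(report_interval, worst_overrun, agent_down_time):
    """True if the worst gap could exceed what the server tolerates."""
    return max_report_gap(report_interval, worst_overrun) > agent_down_time


def suggested_agent_down_time(report_interval, worst_overrun, safety_factor=2.0):
    """A cautious agent_down_time: the worst gap times a safety factor."""
    return safety_factor * max_report_gap(report_interval, worst_overrun)


# Hypothetical small values combined with the 2.375859 sec overrun from the log.
print(agent_may_look_down(4, 2.375859, 5))
# Values tried later in this thread (15 / 30) with a 4.738189 sec overrun.
print(agent_may_look_down(15, 4.738189, 30))
print(suggested_agent_down_time(15, 4.738189))
```

<div>Under these assumptions, the overrun to plug in is the largest one seen in the agent log under load; if larger overruns show up later, the estimate has to be redone.</div>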
<div>This problem should be corrected with this bp: <a href="https://blueprints.launchpad.net/neutron/+spec/remove-dhcp-lease" target="_blank">https://blueprints.launchpad.net/neutron/+spec/remove-dhcp-lease</a></div><div>
Meanwhile, I think we should add a log warning in the Neutron server code to flag when it cannot notify any DHCP agent of a port creation, and backport that to the Grizzly release.</div>
<div><br></div><div>What do you think?</div><div><br></div></div><div>I added this comment to the bug <a href="https://bugs.launchpad.net/neutron/+bug/1185916" target="_blank">https://bugs.launchpad.net/neutron/+bug/1185916</a></div>
<div><br></div><div>Édouard.</div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote"><div><div class="h5">On Fri, Aug 2, 2013 at 11:45 AM, Chu Duc Minh <span dir="ltr"><<a href="mailto:chu.ducminh@gmail.com" target="_blank">chu.ducminh@gmail.com</a>></span> wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr"><div><div>After I deleted 2 instances, 10.2.1.10 and 10.2.1.12,<br></div>
Dnsmasq's host file is:<br>
<span style="font-family:courier new,monospace"><div>fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1<br>
fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11<br></div><b>fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12</b> <b><span style="color:rgb(255,0,0)"><span style="font-family:arial,helvetica,sans-serif"><-- still exist, problem?!</span></span></b><div>
<br>
fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9</div></span><br><br></div>BR,<br></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 2, 2013 at 4:27 PM, Chu Duc Minh <span dir="ltr"><<a href="mailto:chu.ducminh@gmail.com" target="_blank">chu.ducminh@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi, I have the same problem when I create -> terminate -> create instances.<br></div>This problem only occurs when the new instances have the same IPs as the deleted instances.<br>
</div><div><br>I checked dnsmasq's host file /var/lib/quantum/dhcp/dbc59888-e2be-4b31-b579-0a4575159bb1/host;<br>
</div><div>sometimes it is not updated.<br></div><div><br>I think this problem may not be related only to Dnsmasq; it may also be related to the firewall rules (generated by Quantum) on the compute node, because I see some dropped DHCP packets:<br>
<span style="font-family:courier new,monospace">Aug 2 14:08:11 thor-compute-03 kernel: [95971.005423] IN=qbr23c67719-14 OUT=qbr23c67719-14 PHYSIN=qvb23c67719-14 PHYSOUT=tap23c67719-14 MAC=ff:ff:ff:ff:ff:ff:fa:16:3e:34:72:05:08:00 SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x10 PREC=0x00 TTL=128 ID=0 <b>PROTO=UDP SPT=68 DPT=67</b> LEN=308 <br>
</span></div><div><span style="font-family:courier new,monospace">(DHCP Discover packet?)<br></span></div><div><span style="font-family:courier new,monospace"></span></div><div>It was dropped in the chain quantum-openvswi-sg-fallback, so the instance can't get an IP, although in the Dashboard I see that the instance got an IP.<br>
<br></div><div>I tried many times, and got a strange case: duplicate IP in Dnsmasq's host file: <br><span style="font-family:courier new,monospace">fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1<br>fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11<br>
<b>fa:16:3e:78:b5:2f,10-2-1-10.openstacklocal,10.2.1.10</b><br>fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9<br>fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12<br><b>fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10</b></span><br>
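<div>A duplicate like this can be spotted mechanically. Below is a minimal Python sketch, assuming each line of the host file has the form MAC,hostname,IP as in the listing above; the function names are hypothetical and the sample text is copied from that listing.</div>

```python
from collections import defaultdict

# Host file content as quoted above (MAC,hostname,IP per line).
SAMPLE_HOST_FILE = """\
fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
fa:16:3e:78:b5:2f,10-2-1-10.openstacklocal,10.2.1.10
fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10
"""


def parse_host_file(text):
    """Return (mac, hostname, ip) tuples, skipping blank or malformed lines."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) == 3:
            entries.append(tuple(parts))
    return entries


def duplicate_ips(entries):
    """Map each IP that appears more than once to the MACs claiming it."""
    macs_by_ip = defaultdict(list)
    for mac, _hostname, ip in entries:
        macs_by_ip[ip].append(mac)
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


for ip, macs in duplicate_ips(parse_host_file(SAMPLE_HOST_FILE)).items():
    print(ip, "is claimed by", ", ".join(macs))
```

<div>To check a live file, the text could be read from the host file path mentioned earlier in this message instead of the sample string.</div>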
<br></div><div>My newest instance is <span style="font-family:courier new,monospace"><b>10.2.1.10</b><font face="arial,helvetica,sans-serif">, and I can't ping it. In the boot log of this instance, I found:<br></font></span><pre>
cloudinitnonet waiting 120 seconds for a network device.
cloudinitnonet gave up waiting for a network device.
ciinfo: lo : 1 127.0.0.1 255.0.0.0 .
ciinfo: eth0 : 1 . . fa:16:3e:c7:ea:0c
route_info failed</pre></div><div>Restarting the instance didn't make it work, but restarting quantum-dhcp-agent on the Quantum node did.<br></div><div>After the restart, the content of Dnsmasq's host file is:<br><span style="font-family:courier new,monospace">fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1<br>
fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11<br>fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12<br>fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9<br><b>fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10</b></span><br>
<br></div><div>I think it is a serious problem; I hope someone can fix it soon. :)<br><br>Best Regards,<br></div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jul 2, 2013 at 8:01 PM, James Page <span dir="ltr"><<a href="mailto:james.page@ubuntu.com" target="_blank">james.page@ubuntu.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On 20/05/13 07:51, Heinonen, Johanna (NSN - FI/Espoo) wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
I have installed grizzly with quantum and ovs-plugin. It seems that<br>
grizzly allocates the third address of each subnet for dhcp. (In folsom<br>
it was the second address). This means that the VMs will get addresses<br>
</blockquote>
<br></div>
This sounds a lot like <a href="https://bugs.launchpad.net/ubuntu/+source/quantum/+bug/1189909" target="_blank">https://bugs.launchpad.net/ubuntu/+source/quantum/+bug/1189909</a>; I'll raise a task for dnsmasq as well.<br>
<br>
Cheers<span><font color="#888888"><br>
<br>
James<br>
<br>
-- <br>
James Page<br>
Ubuntu Core Developer<br>
Debian Maintainer<br>
<a href="mailto:james.page@ubuntu.com" target="_blank">james.page@ubuntu.com</a></font></span><div><div><br>
<br>
_______________________________________________<br>
Mailing list: <a href="https://launchpad.net/~openstack" target="_blank">https://launchpad.net/~openstack</a><br>
Post to : <a href="mailto:openstack@lists.launchpad.net" target="_blank">openstack@lists.launchpad.net</a><br>
Unsubscribe : <a href="https://launchpad.net/~openstack" target="_blank">https://launchpad.net/~openstack</a><br>
More help : <a href="https://help.launchpad.net/ListHelp" target="_blank">https://help.launchpad.net/ListHelp</a><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div><br></div></div>
<br></blockquote></div><br></div>
</blockquote></div><br></div>