[Openstack] DHCP problem in grizzly

Chu Duc Minh chu.ducminh at gmail.com
Thu Aug 8 05:52:11 UTC 2013


>
> Do you have this log in agent log file :
> 2013-08-07 13:21:46  WARNING [quantum.openstack.common.loopingcall] task
> run outlasted interval by 2.375859 sec
>
Yes, I have:
"WARNING [quantum.openstack.common.loopingcall] task run outlasted interval
by 4.738189 sec"

I set report_interval = 15 and agent_down_time = 30, then launched 50
instances simultaneously.
Now every instance is OK and I can ping them all. But in the Dashboard I still
see a bug: some instances have 2 IP addresses (screenshot attached), and
of course I can only ping 1 IP address on each of them.

For example, for the first instance in the attached image, I checked Dnsmasq's
host file and saw 2 entries:
*fa:16:3e:7c:42:bb*,10-2-1-41.openstacklocal,10.2.1.41
fa:16:3e:0c:bc:4b,10-2-1-52.openstacklocal,10.2.1.52
I can only ping 10.2.1.52.
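
A sketch for checking this file mechanically instead of by eye. It assumes
every line has the form MAC,hostname,IP as in the entries above, and it groups
the entries by IP so that the duplicate-IP case from my earlier mail (two MACs
for 10.2.1.10) shows up directly. The function names are my own and the sample
is the host file from that earlier mail.

```python
from collections import defaultdict


def parse_host_file(text):
    """Parse dnsmasq host-file lines of the form MAC,hostname,IP."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mac, hostname, ip = line.split(",")
        entries.append((mac.lower(), hostname, ip))
    return entries


def duplicate_ips(entries):
    """Return {ip: [mac, ...]} for every IP listed under more than one MAC."""
    macs_by_ip = defaultdict(list)
    for mac, _hostname, ip in entries:
        macs_by_ip[ip].append(mac)
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Host file content quoted from my earlier mail (emphasis markers removed).
HOST_FILE = """\
fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
fa:16:3e:78:b5:2f,10-2-1-10.openstacklocal,10.2.1.10
fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10
"""

for ip, macs in duplicate_ips(parse_host_file(HOST_FILE)).items():
    print(ip, "is listed for", ", ".join(macs))
```

For the two entries above it reports nothing, because the stale entry there
has a different IP (10.2.1.41), so that case needs a comparison with the port
list instead.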

I checked the Quantum DB and saw that *fa:16:3e:7c:42:bb* still exists in the
'ports' table.
('6260622a6b324557bc9064698c8c03ed','*f3e79e1b-2236-4189-8516-fb18dc7e58a9*
','','dbc59888-e2be-4b31-b579-0a4575159bb1',*'fa:16:3e:7c:42:bb*
',1,'DOWN','8420f945-2d88-4204-8444-9c078491def0','compute:None')
Same result with quantum port-list:
| *f3e79e1b-2236-4189-8516-fb18dc7e58a9* |      | fa:16:3e:7c:42:bb |
{"subnet_id": "4d238201-a8d5-4175-a9b4-c1d13efb5e2e", "ip_address":
"10.2.1.41"}     |


Then I checked the Nova DB and found a record in the *instance_info_caches* table:
| 2013-08-08 04:34:38 | 2013-08-08 04:38:19 | NULL       | 2730 |
[{"ovs_interfaceid": "43802d05-ee1f-401d-9d61-055444da8df4", "network":
{"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4,
"type": "fixed", "floating_ips": [], "address": "10.2.1.52"}], "version":
4, "meta": {"dhcp_server": "10.2.1.9"}, "dns": [], "routes": [], "cidr": "
10.2.1.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway",
"address": "10.2.1.1"}}], "meta": {"injected": false, "tenant_id":
"6260622a6b324557bc9064698c8c03ed"}, "id":
"dbc59888-e2be-4b31-b579-0a4575159bb1", "label": "net_minhcd_proj1"},
"devname": "tap43802d05-ee", "qbh_params": null, "meta": {}, "address":
"fa:16:3e:0c:bc:4b", "type": "ovs", "id":
"43802d05-ee1f-401d-9d61-055444da8df4", "qbg_params": null},
{"ovs_interfaceid": "*f3e79e1b-2236-4189-8516-fb18dc7e58a9*", "network":
{"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4,
"type": "fixed", "floating_ips": [], "address": "10.2.1.41"}], "version":
4, "meta": {"dhcp_server": "10.2.1.9"}, "dns": [], "routes": [], "cidr": "
10.2.1.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway",
"address": "10.2.1.1"}}], "meta": {"injected": false, "tenant_id":
"6260622a6b324557bc9064698c8c03ed"}, "id":
"dbc59888-e2be-4b31-b579-0a4575159bb1", "label": "net_minhcd_proj1"},
"devname": "tapf3e79e1b-22", "qbh_params": null, "meta": {}, "address":
"fa:16:3e:7c:42:bb", "type": "ovs", "id": "*
f3e79e1b-2236-4189-8516-fb18dc7e58a9*", "qbg_params": null}] |
8420f945-2d88-4204-8444-9c078491def0 |       0 |
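
The cached record is hard to read as one wrapped line. A small sketch like the
following lists what the cache holds per interface. The JSON is a trimmed copy
of the record above (only the keys the loop reads are kept), and
`cached_addresses` is a name I made up for illustration.

```python
import json

# Trimmed copy of the network_info JSON from the record above.
NETWORK_INFO = json.loads("""
[
  {"id": "43802d05-ee1f-401d-9d61-055444da8df4",
   "address": "fa:16:3e:0c:bc:4b",
   "network": {"subnets": [{"ips": [{"address": "10.2.1.52"}]}]}},
  {"id": "f3e79e1b-2236-4189-8516-fb18dc7e58a9",
   "address": "fa:16:3e:7c:42:bb",
   "network": {"subnets": [{"ips": [{"address": "10.2.1.41"}]}]}}
]
""")


def cached_addresses(network_info):
    """Return (port_id, mac, [fixed IPs]) for every cached interface."""
    result = []
    for vif in network_info:
        ips = [ip["address"]
               for subnet in vif["network"]["subnets"]
               for ip in subnet["ips"]]
        result.append((vif["id"], vif["address"], ips))
    return result


for port_id, mac, ips in cached_addresses(NETWORK_INFO):
    print(port_id, mac, ", ".join(ips))
```

It shows two interfaces cached for this one instance, which matches the two
addresses in the Dashboard.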

In quantum-server.log:
2013-08-08 11:30:14    DEBUG [quantum.openstack.common.rpc.amqp] received
{u'_context_roles': [u'admin'], u'_context_read_deleted': u'no',
u'_context_tenant_id': None, u'args': {u'network_id':
u'dbc59888-e2be-4b31-b579-0a4575159bb1', u'lease_remaining': 0, u'host':
u'thor-quantum-01.localdomain', u'ip_address': u'10.2.1.41'},
u'_unique_id': u'49f419ee040d4d77822ecf696533e484', u'_context_is_admin':
True, u'version': u'1.0', u'_context_project_id': None,
u'_context_timestamp': u'2013-08-08 04:26:09.092921', u'_context_user_id':
None, u'method': u'update_lease_expiration'}
2013-08-08 11:30:16    DEBUG [quantum.db.dhcp_rpc_base] Updating lease
expiration for 10.2.1.41 on network dbc59888-e2be-4b31-b579-0a4575159bb1
from thor-quantum-01.localdomain.
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle
10.2.1.41
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle:
updated last 10.2.1.39-10.2.1.41
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Delete
allocated IP 10.2.1.41
(dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e)
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle: last
match for 10.2.1.39-10.2.1.41
2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP -
10.2.1.41 from 10.2.1.41 to 10.2.1.42
2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP
10.2.1.41
(dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e/f3e79e1b-2236-4189-8516-fb18dc7e58a9)
(This seems normal?)
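
To follow one address through quantum-server.log, a filter like this may help.
It is only a sketch: the line layout in the regular expression is guessed from
the lines above, the sample lines are re-joined copies of some of them, and
`events_for_ip` is a name I invented.

```python
import re

# Layout guessed from the log above: "<date> <time>    LEVEL [logger] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<level>\w+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+(?P<msg>.*)$"
)

LOG = """\
2013-08-08 11:30:16    DEBUG [quantum.db.dhcp_rpc_base] Updating lease expiration for 10.2.1.41 on network dbc59888-e2be-4b31-b579-0a4575159bb1 from thor-quantum-01.localdomain.
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle 10.2.1.41
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Delete allocated IP 10.2.1.41 (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e)
2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP 10.2.1.41 (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e/f3e79e1b-2236-4189-8516-fb18dc7e58a9)
"""


def events_for_ip(log_text, ip):
    """Return (timestamp, message) for each log line that mentions `ip`."""
    # Guard both ends so that e.g. 10.2.1.4 does not match 10.2.1.41.
    ip_re = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and ip_re.search(match.group("msg")):
            events.append((match.group("ts"), match.group("msg")))
    return events


for ts, msg in events_for_ip(LOG, "10.2.1.41"):
    print(ts, msg[:60])
```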

And when I deleted all instances, some entries still existed in Dnsmasq's
host file, so I can't ping the new instances on the next launch.
Maybe I need to increase report_interval further, because I still see the
message "WARNING [quantum.openstack.common.loopingcall] task run outlasted
interval by X seconds" under a high-stress test.
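
My rough understanding of the timing, as a sketch only. I assume the server
treats an agent as down when no state report has arrived within
agent_down_time, and that a loop run which outlasts its interval by X seconds
pushes the gap between two reports to about report_interval + X. Neither
assumption is checked against the Quantum code, and the function names are
invented.

```python
def max_report_gap(report_interval, overrun):
    """Assumed worst-case seconds between two state reports when one loop
    run outlasts its interval by `overrun` seconds."""
    return report_interval + overrun


def looks_down(report_interval, overrun, agent_down_time):
    """True if the assumed gap between reports exceeds agent_down_time."""
    return max_report_gap(report_interval, overrun) > agent_down_time


def headroom(report_interval, agent_down_time):
    """Largest overrun (seconds) the settings tolerate under this model."""
    return agent_down_time - report_interval


# My current settings with the overrun from my agent log.
print(looks_down(15, 4.738189, 30))
# Hypothetical tighter settings (values picked only for illustration)
# with the overrun from the quoted warning.
print(looks_down(4, 2.375859, 5))
print(headroom(15, 30))
```

Under these assumptions my current values tolerate an overrun of up to 15
seconds, so what matters is the difference between the two settings rather
than report_interval alone.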

But the question is: how much is enough?
Can I fix this bug thoroughly? (Apply a patch? But then I would need to rename
Quantum<->Neutron first.)

Thank you very much!


On Wed, Aug 7, 2013 at 9:46 PM, Édouard Thuleau <thuleau at gmail.com> wrote:

> I think we (Sylvain and I) have found a problem that can explain this
> trouble:
>
> When the load on the DHCP agent is too heavy (updating the dnsmasq host
> file and sending lease updates), the state report to the Neutron server is
> delayed, so the Neutron server considers the agent down and doesn't send
> the port creation to the agent. As a result, the dnsmasq host file isn't
> updated to serve that port's IP.
>
> Do you have this log in agent log file :
> 2013-08-07 13:21:46  WARNING [quantum.openstack.common.loopingcall] task
> run outlasted interval by 2.375859 sec
>
> You can increase the 'report_interval' flag on the agent and the
> 'agent_down_time' flag on the Neutron server side.
> This problem should be corrected with this bp:
> https://blueprints.launchpad.net/neutron/+spec/remove-dhcp-lease
> Meanwhile, I think we should add a log warning in the Neutron server code
> to report when it cannot notify any DHCP agent of a port creation, and
> backport that to the Grizzly release.
>
> What do you think ?
>
> I added this comment on the bug
> https://bugs.launchpad.net/neutron/+bug/1185916
>
> Édouard.
>
>
> On Fri, Aug 2, 2013 at 11:45 AM, Chu Duc Minh <chu.ducminh at gmail.com> wrote:
>
>> After I deleted 2 instances (10.2.1.10 & 10.2.1.12),
>> Dnsmasq's host file is:
>> fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
>> fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
>> *fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12* *<-- still exist,
>> problem?!*
>>
>> fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
>>
>>
>> BR,
>>
>>
>> On Fri, Aug 2, 2013 at 4:27 PM, Chu Duc Minh <chu.ducminh at gmail.com> wrote:
>>
>>> Hi, I have the same problem when I create -> terminate -> create instances.
>>> This problem only occurs when the new instances get the same IPs as the
>>> deleted instances.
>>>
>>> I checked dnsmasq's host file,
>>> /var/lib/quantum/dhcp/dbc59888-e2be-4b31-b579-0a4575159bb1/host,
>>> and sometimes it is not updated.
>>>
>>> I think this problem may not be related only to Dnsmasq; it may also be
>>> related to the firewall rules (generated by Quantum) on the compute node,
>>> because I see some dropped DHCP packets:
>>> Aug  2 14:08:11 thor-compute-03 kernel: [95971.005423] IN=qbr23c67719-14
>>> OUT=qbr23c67719-14 PHYSIN=qvb23c67719-14 PHYSOUT=tap23c67719-14
>>> MAC=ff:ff:ff:ff:ff:ff:fa:16:3e:34:72:05:08:00 SRC=0.0.0.0
>>> DST=255.255.255.255 LEN=328 TOS=0x10 PREC=0x00 TTL=128 ID=0 *PROTO=UDP
>>> SPT=68 DPT=67* LEN=308
>>> (DHCP Discovery packet?)
>>> It was dropped in chain quantum-openvswi-sg-fallback, so the instance can't
>>> get an IP, although in the Dashboard I see that the instance got an IP.
>>>
>>> I tried many times and hit a strange case: a duplicate IP in Dnsmasq's
>>> host file:
>>> fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
>>> fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
>>> *fa:16:3e:78:b5:2f,10-2-1-10.openstacklocal,10.2.1.10*
>>> fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
>>> fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
>>> *fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10*
>>>
>>> My newest instance is *10.2.1.10*, and I can't ping it. In the boot log of
>>> this instance, I found:
>>>
>>> cloudinitnonet waiting 120 seconds for a network device.
>>> cloudinitnonet gave up waiting for a network device.
>>> ciinfo: lo    : 1 127.0.0.1       255.0.0.0       .
>>> ciinfo: eth0  : 1 .               .               fa:16:3e:c7:ea:0c
>>> route_info failed
>>>
>>> Restarting the instance didn't make it work, but restarting
>>> quantum-dhcp-agent on the Quantum node did.
>>> After the restart, the content of Dnsmasq's host file is:
>>> fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
>>> fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
>>> fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
>>> fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
>>> *fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10*
>>>
>>> I think it is a serious problem; I hope someone can fix it soon. :)
>>>
>>> Best Regards,
>>>
>>>
>>> On Tue, Jul 2, 2013 at 8:01 PM, James Page <james.page at ubuntu.com> wrote:
>>>
>>>> On 20/05/13 07:51, Heinonen, Johanna (NSN - FI/Espoo) wrote:
>>>>
>>>>> Hi,
>>>>> I have installed grizzly with quantum and ovs-plugin. It seems that
>>>>> grizzly allocates the third address of each subnet for dhcp. (In folsom
>>>>> it was the second address). This means that the VMs will get addresses
>>>>>
>>>>
>>>> This sounds a lot like
>>>> <https://bugs.launchpad.net/ubuntu/+source/quantum/+bug/1189909>;
>>>> I'll raise a task for dnsmasq as well.
>>>>
>>>> Cheers
>>>>
>>>> James
>>>>
>>>> --
>>>> James Page
>>>> Ubuntu Core Developer
>>>> Debian Maintainer
>>>> james.page at ubuntu.com
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~openstack
>>>> Post to     : openstack at lists.launchpad.net
>>>> Unsubscribe : https://launchpad.net/~openstack
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>
>>>
>>
>> _______________________________________________
>> Mailing list:
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack at lists.openstack.org
>> Unsubscribe :
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Grizzly_DHCP_error1.png
Type: image/png
Size: 50922 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20130808/106286e7/attachment.png>

