[Openstack] DHCP problem in grizzly

Édouard Thuleau thuleau at gmail.com
Fri Aug 9 06:30:32 UTC 2013


I cannot reproduce this problem.
Did you change the DHCP lease time?

Édouard.


On Thu, Aug 8, 2013 at 7:52 AM, Chu Duc Minh <chu.ducminh at gmail.com> wrote:

> Do you have this line in the agent log file:
>> 2013-08-07 13:21:46  WARNING [quantum.openstack.common.loopingcall] task
>> run outlasted interval by 2.375859 sec
>>
> Yes, I have:
> "WARNING [quantum.openstack.common.loopingcall] task run outlasted
> interval by 4.738189 sec"
>
> I set report_interval = 15 and agent_down_time = 30, then launched 50
> instances simultaneously.
> Now every instance is OK and I can ping them all. But in the Dashboard I
> still see a bug: some instances have 2 IP addresses (screenshot attached),
> and of course, for each instance I can only ping 1 IP address.
>
> For example, for the first instance in the attached image, I checked
> dnsmasq's host file and saw 2 entries:
> *fa:16:3e:7c:42:bb*,10-2-1-41.openstacklocal,10.2.1.41
> fa:16:3e:0c:bc:4b,10-2-1-52.openstacklocal,10.2.1.52
> I can only ping 10.2.1.52.
>
> I checked the Quantum DB and saw that *fa:16:3e:7c:42:bb* still exists in
> the 'ports' table:
> ('6260622a6b324557bc9064698c8c03ed','*f3e79e1b-2236-4189-8516-fb18dc7e58a9*','','dbc59888-e2be-4b31-b579-0a4575159bb1','*fa:16:3e:7c:42:bb*',1,'DOWN','8420f945-2d88-4204-8444-9c078491def0','compute:None')
> Same result with quantum port-list:
> | *f3e79e1b-2236-4189-8516-fb18dc7e58a9* |      | fa:16:3e:7c:42:bb | {"subnet_id": "4d238201-a8d5-4175-a9b4-c1d13efb5e2e", "ip_address": "10.2.1.41"} |
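Since the symptom is "one instance, two IPs", one quick check is to group ports by the instance that owns them. A minimal sketch, assuming rows have been exported as (port_id, device_id, mac) tuples; the two rows below come from this thread (the second port's ID and MAC are taken from the instance_info_caches record quoted further down in this message). `devices_with_multiple_ports` is a hypothetical helper, and it will also flag instances that legitimately have several NICs.

```python
from collections import defaultdict


def devices_with_multiple_ports(ports):
    """Group (port_id, device_id, mac) rows by device_id and keep only
    the devices (instances) that own more than one port."""
    by_device = defaultdict(list)
    for port_id, device_id, mac in ports:
        by_device[device_id].append((port_id, mac))
    return {dev: owned for dev, owned in by_device.items() if len(owned) > 1}


# Both ports reference instance 8420f945-... in this thread.
ports = [
    ("f3e79e1b-2236-4189-8516-fb18dc7e58a9",
     "8420f945-2d88-4204-8444-9c078491def0", "fa:16:3e:7c:42:bb"),
    ("43802d05-ee1f-401d-9d61-055444da8df4",
     "8420f945-2d88-4204-8444-9c078491def0", "fa:16:3e:0c:bc:4b"),
]

for device, owned in devices_with_multiple_ports(ports).items():
    print(device, "owns", len(owned), "ports:", owned)
```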
>
>
> Then I checked the Nova DB and found a record in the *instance_info_caches* table:
> | 2013-08-08 04:34:38 | 2013-08-08 04:38:19 | NULL       | 2730 |
> [{"ovs_interfaceid": "43802d05-ee1f-401d-9d61-055444da8df4", "network":
> {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4,
> "type": "fixed", "floating_ips": [], "address": "10.2.1.52"}], "version":
> 4, "meta": {"dhcp_server": "10.2.1.9"}, "dns": [], "routes": [],
> "cidr": "10.2.1.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway",
> "address": "10.2.1.1"}}], "meta": {"injected": false, "tenant_id":
> "6260622a6b324557bc9064698c8c03ed"}, "id":
> "dbc59888-e2be-4b31-b579-0a4575159bb1", "label": "net_minhcd_proj1"},
> "devname": "tap43802d05-ee", "qbh_params": null, "meta": {}, "address":
> "fa:16:3e:0c:bc:4b", "type": "ovs", "id":
> "43802d05-ee1f-401d-9d61-055444da8df4", "qbg_params": null},
> {"ovs_interfaceid": "*f3e79e1b-2236-4189-8516-fb18dc7e58a9*", "network":
> {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4,
> "type": "fixed", "floating_ips": [], "address": "10.2.1.41"}], "version":
> 4, "meta": {"dhcp_server": "10.2.1.9"}, "dns": [], "routes": [],
> "cidr": "10.2.1.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway",
> "address": "10.2.1.1"}}], "meta": {"injected": false, "tenant_id":
> "6260622a6b324557bc9064698c8c03ed"}, "id":
> "dbc59888-e2be-4b31-b579-0a4575159bb1", "label": "net_minhcd_proj1"},
> "devname": "tapf3e79e1b-22", "qbh_params": null, "meta": {}, "address":
> "fa:16:3e:7c:42:bb", "type": "ovs", "id":
> "*f3e79e1b-2236-4189-8516-fb18dc7e58a9*", "qbg_params": null}] |
> 8420f945-2d88-4204-8444-9c078491def0 |       0 |
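The network_info column above is plain JSON, so the VIFs Nova has cached for the instance can be listed programmatically. A minimal sketch using only the standard json module; the sample is the record above trimmed to the keys the function reads, and `fixed_ips_per_vif` is a hypothetical name.

```python
import json

# The network_info value above, trimmed to the keys used below.
NETWORK_INFO = """
[{"id": "43802d05-ee1f-401d-9d61-055444da8df4", "address": "fa:16:3e:0c:bc:4b",
  "network": {"subnets": [{"ips": [{"type": "fixed", "address": "10.2.1.52"}]}]}},
 {"id": "f3e79e1b-2236-4189-8516-fb18dc7e58a9", "address": "fa:16:3e:7c:42:bb",
  "network": {"subnets": [{"ips": [{"type": "fixed", "address": "10.2.1.41"}]}]}}]
"""


def fixed_ips_per_vif(network_info):
    """Yield (vif_id, mac, [fixed IPs]) for every VIF in a network_info list."""
    for vif in network_info:
        ips = [ip["address"]
               for subnet in vif["network"]["subnets"]
               for ip in subnet["ips"]
               if ip["type"] == "fixed"]
        yield vif["id"], vif["address"], ips


vifs = list(fixed_ips_per_vif(json.loads(NETWORK_INFO)))
for vif_id, mac, ips in vifs:
    print(vif_id, mac, ips)
if len(vifs) > 1:
    print("instance has", len(vifs), "VIFs in its info cache")
```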
>
> In quantum-server.log:
> 2013-08-08 11:30:14    DEBUG [quantum.openstack.common.rpc.amqp] received
> {u'_context_roles': [u'admin'], u'_context_read_deleted': u'no',
> u'_context_tenant_id': None, u'args': {u'network_id':
> u'dbc59888-e2be-4b31-b579-0a4575159bb1', u'lease_remaining': 0, u'host':
> u'thor-quantum-01.localdomain', u'ip_address': u'10.2.1.41'},
> u'_unique_id': u'49f419ee040d4d77822ecf696533e484', u'_context_is_admin':
> True, u'version': u'1.0', u'_context_project_id': None,
> u'_context_timestamp': u'2013-08-08 04:26:09.092921', u'_context_user_id':
> None, u'method': u'update_lease_expiration'}
> 2013-08-08 11:30:16    DEBUG [quantum.db.dhcp_rpc_base] Updating lease
> expiration for 10.2.1.41 on network dbc59888-e2be-4b31-b579-0a4575159bb1
> from thor-quantum-01.localdomain.
> 2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle
> 10.2.1.41
> 2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle:
> updated last 10.2.1.39-10.2.1.41
> 2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Delete
> allocated IP 10.2.1.41
> (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e)
> 2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Recycle: last
> match for 10.2.1.39-10.2.1.41
> 2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP -
> 10.2.1.41 from 10.2.1.41 to 10.2.1.42
> 2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP
> 10.2.1.41
> (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e/f3e79e1b-2236-4189-8516-fb18dc7e58a9)
> (Does this seem normal?)
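The log above shows 10.2.1.41 being released and handed out again shortly afterwards, which matches the later observation that the failure appears when a new instance gets a deleted instance's IP. A minimal sketch for finding such quick reuses in quantum-server.log, assuming one record per line (the archive wraps them) and an arbitrary 300-second window; `quick_reuses` is a hypothetical helper.

```python
import re
from datetime import datetime

# The regex requires an address right after "IP ", so the
# "Allocated IP - x from a to b" range-bookkeeping lines are skipped.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+DEBUG "
    r"\[quantum\.db\.db_base_plugin_v2\] "
    r"(Delete allocated IP|Allocated IP) (\d+\.\d+\.\d+\.\d+)")

SERVER_LOG = """\
2013-08-08 11:33:53    DEBUG [quantum.db.db_base_plugin_v2] Delete allocated IP 10.2.1.41 (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e)
2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP - 10.2.1.41 from 10.2.1.41 to 10.2.1.42
2013-08-08 11:35:29    DEBUG [quantum.db.db_base_plugin_v2] Allocated IP 10.2.1.41 (dbc59888-e2be-4b31-b579-0a4575159bb1/4d238201-a8d5-4175-a9b4-c1d13efb5e2e/f3e79e1b-2236-4189-8516-fb18dc7e58a9)
"""


def quick_reuses(lines, window_s=300):
    """Return (ip, seconds) for each IP allocated again within window_s
    seconds of being released. window_s is an arbitrary threshold."""
    released = {}
    reuses = []
    for line in lines:
        match = EVENT_RE.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        action, ip = match.group(2), match.group(3)
        if action == "Delete allocated IP":
            released[ip] = when
        elif ip in released:
            gap = (when - released.pop(ip)).total_seconds()
            if gap <= window_s:
                reuses.append((ip, gap))
    return reuses


print(quick_reuses(SERVER_LOG.splitlines()))
```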
>
> And when I deleted all instances, some entries still existed in dnsmasq's
> host file, so I can't ping the instances launched next.
> Maybe I need to increase report_interval further, because I still see the
> message "WARNING [quantum.openstack.common.loopingcall] task run outlasted
> interval by X seconds" under high-stress tests.
>
> But the question is: how much is enough?
> Can I fix this bug thoroughly? (Apply a patch? But I would need to rename
> Quantum to Neutron first.)
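One way to approach "how much is enough?" is to measure the overruns the agent actually logs. A rough sketch under a loud assumption: that each overrun delays the next state report by the same amount, so reports arrive about report_interval + overrun seconds apart, and the server flags the agent down once that gap exceeds agent_down_time. `max_overrun` and `heartbeat_slack` are hypothetical helpers, and the answer only covers overruns already observed, not worse ones under heavier load.

```python
import re

# Matches the loopingcall warning quoted in this thread.
OVERRUN_RE = re.compile(r"task run outlasted interval by ([0-9.]+) sec")


def max_overrun(log_lines):
    """Largest overrun in seconds reported by the warning, 0.0 if none."""
    overruns = []
    for line in log_lines:
        match = OVERRUN_RE.search(line)
        if match:
            overruns.append(float(match.group(1)))
    return max(overruns, default=0.0)


def heartbeat_slack(report_interval, agent_down_time, overrun):
    """Assumed model: seconds to spare before the server would treat the
    agent as down. A negative result means the agent would be flagged."""
    return agent_down_time - (report_interval + overrun)


agent_log = [
    "2013-08-07 13:21:46  WARNING [quantum.openstack.common.loopingcall] "
    "task run outlasted interval by 2.375859 sec",
    "WARNING [quantum.openstack.common.loopingcall] task run outlasted "
    "interval by 4.738189 sec",
]
worst = max_overrun(agent_log)
print("worst overrun: %.2f s" % worst)
print("slack with report_interval=15, agent_down_time=30: %.2f s"
      % heartbeat_slack(15, 30, worst))
```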
>
> Thank you very much!
>
>
> On Wed, Aug 7, 2013 at 9:46 PM, Édouard Thuleau <thuleau at gmail.com> wrote:
>
>> I think Sylvain and I have found a problem that can explain this
>> trouble:
>>
>> When the load on the DHCP agent is too heavy (updating the dnsmasq host
>> file and sending lease updates), the agent's state report to the Neutron
>> server is delayed. The Neutron server then considers the agent down and
>> doesn't send the port creation to it, so the dnsmasq host file isn't
>> updated to serve that port's IP.
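The mechanism described above can be modelled in a few lines. This is a toy sketch, not Quantum's code: it assumes the server compares the age of an agent's last state report with agent_down_time and only notifies agents it considers alive. `DhcpAgent`, `is_agent_down` and `agents_to_notify` are hypothetical names, and the timestamps are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DhcpAgent:
    host: str
    last_heartbeat: float  # server-side time of the last state report, seconds


def is_agent_down(agent, now, agent_down_time):
    # An agent whose last report is older than agent_down_time counts as down.
    return now - agent.last_heartbeat > agent_down_time


def agents_to_notify(agents, now, agent_down_time):
    """Only agents considered alive receive the port-create notification."""
    return [a.host for a in agents if not is_agent_down(a, now, agent_down_time)]


# A busy agent whose last report is 40 s old, with agent_down_time = 30:
busy = DhcpAgent("thor-quantum-01.localdomain", last_heartbeat=1000.0)
targets = agents_to_notify([busy], now=1040.0, agent_down_time=30)
if not targets:
    print("port creation is not sent to any DHCP agent; host file stays stale")
```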
>>
>> Do you have this line in the agent log file:
>> 2013-08-07 13:21:46  WARNING [quantum.openstack.common.loopingcall] task
>> run outlasted interval by 2.375859 sec
>>
>> You can increase the 'report_interval' flag on the agent and the
>> 'agent_down_time' flag on the Neutron server side.
>> This problem should be fixed by this blueprint:
>> https://blueprints.launchpad.net/neutron/+spec/remove-dhcp-lease
>> Meanwhile, I think we should add a warning log in the Neutron server code
>> for when it cannot notify any DHCP agent of a port creation, and
>> backport that to the Grizzly release.
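The proposed warning could look roughly like the sketch that follows. Everything except Python's standard logging module is hypothetical: `notify_port_create` and the `cast(host, method, port_id)` callable stand in for whatever RPC notifier the server actually uses.

```python
import logging

LOG = logging.getLogger("dhcp_notify_sketch")


def notify_port_create(active_hosts, port_id, cast):
    """Send port_create_end to each active DHCP agent host through the
    supplied cast callable; log a warning instead of silently doing
    nothing when no agent is available. Returns the number of casts."""
    if not active_hosts:
        LOG.warning("Unable to notify any DHCP agent of the creation of "
                    "port %s; dnsmasq may not serve its IP", port_id)
        return 0
    for host in active_hosts:
        cast(host, "port_create_end", port_id)
    return len(active_hosts)


logging.basicConfig(level=logging.WARNING)
sent = []
notify_port_create([], "f3e79e1b-2236-4189-8516-fb18dc7e58a9",
                   lambda *args: sent.append(args))
```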
>>
>> What do you think ?
>>
>> I added this comment on the bug
>> https://bugs.launchpad.net/neutron/+bug/1185916
>>
>> Édouard.
>>
>>
>> On Fri, Aug 2, 2013 at 11:45 AM, Chu Duc Minh <chu.ducminh at gmail.com> wrote:
>>
>>> After I deleted 2 instances (10.2.1.10 and 10.2.1.12),
>>> dnsmasq's hosts file is:
>>> fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
>>> fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
>>> *fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12* *<-- still exists, problem?!*
>>> fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
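Leftovers like the 10.2.1.12 line can be spotted by comparing the host file with the IPs of the instances just deleted. A minimal sketch, assuming the three-field mac,hostname,ip format shown in this thread; `stale_entries` is a hypothetical helper.

```python
# Host file quoted above, after deleting the instances at 10.2.1.10 and 10.2.1.12.
HOSTS = """\
fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
"""


def stale_entries(host_file_text, deleted_ips):
    """Return the host-file lines that still point at a deleted instance's IP."""
    stale = []
    for line in host_file_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) == 3 and fields[2] in deleted_ips:
            stale.append(line.strip())
    return stale


for entry in stale_entries(HOSTS, {"10.2.1.10", "10.2.1.12"}):
    print("stale:", entry)
```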
>>>
>>>
>>> BR,
>>>
>>>
>>> On Fri, Aug 2, 2013 at 4:27 PM, Chu Duc Minh <chu.ducminh at gmail.com> wrote:
>>>
>>>> Hi, I have the same problem when I create -> terminate -> create
>>>> instances.
>>>> This problem only occurs when the new instances get the same IP as
>>>> deleted instances.
>>>>
>>>> I checked dnsmasq's host file
>>>> /var/lib/quantum/dhcp/dbc59888-e2be-4b31-b579-0a4575159bb1/host;
>>>> sometimes it isn't updated.
>>>>
>>>> I think this problem may not be related only to dnsmasq; it may also be
>>>> related to the firewall rules (generated by Quantum) on the compute node,
>>>> because I see some dropped DHCP packets:
>>>> Aug  2 14:08:11 thor-compute-03 kernel: [95971.005423]
>>>> IN=qbr23c67719-14 OUT=qbr23c67719-14 PHYSIN=qvb23c67719-14
>>>> PHYSOUT=tap23c67719-14 MAC=ff:ff:ff:ff:ff:ff:fa:16:3e:34:72:05:08:00
>>>> SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x10 PREC=0x00 TTL=128 ID=0
>>>> *PROTO=UDP SPT=68 DPT=67* LEN=308
>>>> (A DHCP Discover packet?)
>>>> It is dropped in the chain quantum-openvswi-sg-fallback, so the instance
>>>> can't get an IP, although in the Dashboard I see that the instance got one.
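The KEY=VALUE fields of that netfilter LOG line can be pulled apart to confirm what was dropped. A minimal sketch, assuming the line has been rejoined from the archive's wrapping; `parse_netfilter_log`, `split_mac_field` and `looks_like_dhcp_client_broadcast` are hypothetical helpers. The MAC= value is the destination MAC, source MAC and EtherType run together, so it also yields the sender's MAC. UDP 68 to 67 from 0.0.0.0 to the broadcast address fits a DHCPDISCOVER, but also a DHCPREQUEST sent before the client has an address, so the log alone cannot tell which one it is.

```python
# The dropped packet quoted above, rejoined onto one line.
LINE = ("Aug  2 14:08:11 thor-compute-03 kernel: [95971.005423] "
        "IN=qbr23c67719-14 OUT=qbr23c67719-14 PHYSIN=qvb23c67719-14 "
        "PHYSOUT=tap23c67719-14 MAC=ff:ff:ff:ff:ff:ff:fa:16:3e:34:72:05:08:00 "
        "SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x10 PREC=0x00 TTL=128 "
        "ID=0 PROTO=UDP SPT=68 DPT=67 LEN=308")


def parse_netfilter_log(line):
    """Collect KEY=VALUE tokens into a dict. Tokens without '=' are ignored;
    a repeated key (LEN: IP length, then UDP length) keeps the last value."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def split_mac_field(mac_field):
    """Split MAC= into (destination MAC, source MAC, EtherType)."""
    octets = mac_field.split(":")
    return ":".join(octets[0:6]), ":".join(octets[6:12]), "".join(octets[12:14])


def looks_like_dhcp_client_broadcast(fields):
    # Client-to-server DHCP traffic uses UDP source port 68, destination 67.
    return (fields.get("PROTO") == "UDP"
            and fields.get("SPT") == "68" and fields.get("DPT") == "67"
            and fields.get("SRC") == "0.0.0.0"
            and fields.get("DST") == "255.255.255.255")


fields = parse_netfilter_log(LINE)
dst_mac, src_mac, ethertype = split_mac_field(fields["MAC"])
print(fields["PHYSOUT"], src_mac, looks_like_dhcp_client_broadcast(fields))
```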
>>>>
>>>> I tried many times and got a strange case: a duplicate IP in dnsmasq's
>>>> host file:
>>>> fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
>>>> fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
>>>> *fa:16:3e:78:b5:2f,10-2-1-10.openstacklocal,10.2.1.10*
>>>> fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
>>>> fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
>>>> *fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10*
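Duplicate IPs like the two 10.2.1.10 lines above are easy to detect mechanically. A minimal sketch, assuming the three-field mac,hostname,ip format shown in this thread; `duplicate_ips` is a hypothetical helper.

```python
from collections import defaultdict

# Host file quoted above (mac,hostname,ip per line).
HOSTS = """\
fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
fa:16:3e:78:b5:2f,10-2-1-10.openstacklocal,10.2.1.10
fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10
"""


def duplicate_ips(host_file_text):
    """Map each IP that appears on more than one line to its MACs."""
    macs_by_ip = defaultdict(list)
    for line in host_file_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip blank or differently formatted lines
        mac, _hostname, ip = fields
        macs_by_ip[ip].append(mac)
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


print(duplicate_ips(HOSTS))
```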
>>>>
>>>> My newest instance is *10.2.1.10*, and I can't ping it. In the boot log of
>>>> this instance, I found:
>>>>
>>>> cloud-init-nonet waiting 120 seconds for a network device.
>>>> cloud-init-nonet gave up waiting for a network device.
>>>> ci-info: lo    : 1 127.0.0.1       255.0.0.0       .
>>>> ci-info: eth0  : 1 .               .               fa:16:3e:c7:ea:0c
>>>> route_info failed
>>>>
>>>> Restarting the instance didn't make it work, but restarting
>>>> quantum-dhcp-agent on the Quantum node did.
>>>> After the restart, the content of dnsmasq's host file is:
>>>> fa:16:3e:01:d1:70,10-2-1-1.openstacklocal,10.2.1.1
>>>> fa:16:3e:71:6a:4e,10-2-1-11.openstacklocal,10.2.1.11
>>>> fa:16:3e:cf:0f:c1,10-2-1-12.openstacklocal,10.2.1.12
>>>> fa:16:3e:35:a1:72,10-2-1-9.openstacklocal,10.2.1.9
>>>> *fa:16:3e:c7:ea:0c,10-2-1-10.openstacklocal,10.2.1.10*
>>>>
>>>> I think it's a serious problem; I hope someone can fix it soon. :)
>>>>
>>>> Best Regards,
>>>>
>>>>
>>>> On Tue, Jul 2, 2013 at 8:01 PM, James Page <james.page at ubuntu.com> wrote:
>>>>
>>>>> On 20/05/13 07:51, Heinonen, Johanna (NSN - FI/Espoo) wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have installed Grizzly with Quantum and the OVS plugin. It seems that
>>>>>> Grizzly allocates the third address of each subnet for DHCP (in Folsom
>>>>>> it was the second address). This means that the VMs will get addresses
>>>>>>
>>>>>
>>>>> This sounds a lot like
>>>>> https://bugs.launchpad.net/ubuntu/+source/quantum/+bug/1189909;
>>>>> I'll raise a task for dnsmasq as well.
>>>>>
>>>>> Cheers
>>>>>
>>>>> James
>>>>>
>>>>> --
>>>>> James Page
>>>>> Ubuntu Core Developer
>>>>> Debian Maintainer
>>>>> james.page at ubuntu.com
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~openstack
>>>>> Post to     : openstack at lists.launchpad.net
>>>>> Unsubscribe : https://launchpad.net/~openstack
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Mailing list:
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>> Post to     : openstack at lists.openstack.org
>>> Unsubscribe :
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>
>>>
>>
>

