[Openstack] pacemaker would be wrong when both node have same hostname
Marica Antonacci
marica.antonacci at gmail.com
Wed May 14 11:49:11 UTC 2014
Hi,
attached is our modified resource agent. We have noticed that the network namespaces (router and dhcp) are automatically re-created on the new node when the resource manager migrates the network controller to the other physical node (we have grouped all the services related to the network node).
Please note that the attached script also contains other patches with respect to the RA available at https://raw.githubusercontent.com/madkiss/openstack-resource-agents/master/ocf/neutron-agent-l3: we found some issues with the resource agent parameters and with the port used to check the established connection with the server. We have also added the start/stop operations for neutron-plugin-openvswitch-agent, since no RA is currently available for this service.
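As an illustration of the connection check mentioned above, here is a minimal sketch, not the code from the attached RA. It assumes the check is fed `netstat -punt` output and looks for an ESTABLISHED TCP connection from the agent's PID to the configured server port (5672 is the default AMQP port, 9696 the default neutron-server API port). The helper name has_established_conn is hypothetical.

```shell
#!/bin/sh
# Sketch only: has_established_conn is a hypothetical helper, not part of the RA.
# Reads `netstat -punt`-style lines on stdin and succeeds (exit 0) if process PID
# holds an ESTABLISHED TCP connection whose remote port is PORT.
# usage: netstat -punt | has_established_conn PORT PID
has_established_conn() {
    port="$1"
    pid="$2"
    awk -v port="$port" -v pid="$pid" '
        # Columns: proto recv-q send-q local-addr foreign-addr state pid/program
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            n = split($5, addr, ":")     # remote port is the last ":" field
            split($7, owner, "/")        # "4242/python" -> 4242
            if (addr[n] == port && owner[1] == pid) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

In a monitor operation this could be called as `netstat -punt 2>/dev/null | has_established_conn "$OCF_RESKEY_neutron_server_port" "$agent_pid"`, where agent_pid is an assumed variable holding the L3 agent's PID. Matching on the exact port field avoids false positives from a plain grep for the port number.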
Cheers,
Marica
---------------------------
dhcp117:~ marica$ diff -w -b neutron-agent-l3 neutron-agent-l3.1
20c20
< # OCF_RESKEY_agent_config
---
> # OCF_RESKEY_plugin_config
37c37
< OCF_RESKEY_agent_config_default="/etc/neutron/l3_agent.ini"
---
> OCF_RESKEY_plugin_config_default="/etc/neutron/l3_agent.ini"
40c40
< OCF_RESKEY_neutron_server_port_default="5672"
---
> OCF_RESKEY_neutron_server_port_default="9696"
44c44
< : ${OCF_RESKEY_agent_config=${OCF_RESKEY_agent_config_default}}
---
> : ${OCF_RESKEY_plugin_config=${OCF_RESKEY_plugin_config_default}}
98c98
< <parameter name="agent_config" unique="0" required="0">
---
> <parameter name="plugin config" unique="0" required="0">
103c103
< <content type="string" default="${OCF_RESKEY_agent_config_default}" />
---
> <content type="string" default="${OCF_RESKEY_plugin_config_default}" />
241,247d240
< # Aleita
< # change hostname
< hostname network-controller
<
< #Marant: temporary patch - restart neutron-plugin-openvswitch-agent
< service neutron-plugin-openvswitch-agent start
<
251c244
< --config-file=$OCF_RESKEY_agent_config --log-file=/var/log/neutron/l3-agent.log $OCF_RESKEY_additional_parameters"' >> \
---
> --config-file=$OCF_RESKEY_plugin_config --log-file=/var/log/neutron/l3-agent.log $OCF_RESKEY_additional_parameters"' >> \
320,325d312
< # Aleita
< # restore old hostname
< hostname node1
<
< #Marant:
< service neutron-plugin-openvswitch-agent stop
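
Note that the stop() hunk in the diff above restores the hard-coded name node1, which would not be correct on node2. A minimal sketch of one way to avoid this is to save the node's real hostname on start and restore it on stop. STATE_FILE, HOSTNAME_CMD and the two function names are hypothetical and not part of the attached RA; changing the hostname requires root.

```shell
#!/bin/sh
# Sketch only. STATE_FILE and HOSTNAME_CMD are hypothetical knobs introduced here
# so the logic can be exercised without touching the real hostname.
STATE_FILE="${STATE_FILE:-/var/run/neutron-l3-orig-hostname}"
HOSTNAME_CMD="${HOSTNAME_CMD:-hostname}"
SHARED_NAME="network-controller"   # name seen by the neutron scheduler

# Called from the RA start operation: remember the real name, then switch.
set_shared_hostname() {
    current="$($HOSTNAME_CMD)"
    # Save only if we are not already using the shared name, so a repeated
    # start does not overwrite the saved real hostname.
    if [ "$current" != "$SHARED_NAME" ]; then
        printf '%s\n' "$current" > "$STATE_FILE"
    fi
    $HOSTNAME_CMD "$SHARED_NAME"
}

# Called from the RA stop operation: restore whatever name was saved.
restore_hostname() {
    if [ -s "$STATE_FILE" ]; then
        $HOSTNAME_CMD "$(cat "$STATE_FILE")"
        rm -f "$STATE_FILE"
    fi
}
```

With this approach the same script can be deployed unchanged on both nodes.
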
On 14 May 2014, at 13:04, walterxj <walterxj at gmail.com> wrote:
> Hi Marica:
> Could you send me your modified RA? And with your RA, do you configure pacemaker just as in the guide? I ask because I noticed that when the former network node goes down, the l3-agent on the backup node must bind the routers that belonged to the former node.
>
> walterxj
>
> From: Marica Antonacci
> Date: 2014-05-14 18:55
> To: walterxj
> CC: openstack
> Subject: Re: [Openstack] pacemaker would be wrong when both node have same hostname
> Hi all,
>
> we are currently using pacemaker to manage 2 network nodes (node1, node2) and we have modified the neutron L3 agent RA to dynamically change the hostname of the active network node: the start() function sets the hostname to "network-controller", the name used by the scheduler; the stop() function restores the old hostname ("node1" or "node2"). It seems to work, yet it's a crude patch :) A more general solution that exploits neutron functionality would be much appreciated!
>
> Best,
> Marica
>
> On 14 May 2014, at 12:34, walterxj <walterxj at gmail.com> wrote:
>
>> hi:
>> the high-availability-guide (http://docs.openstack.org/high-availability-guide/content/ch-network.html) says that both nodes should have the same hostname, since the Networking scheduler will be aware of one node, for example a virtual router attached to a single L3 node.
>>
>> But when I tested it on two servers with the same hostname, after installing the corosync and pacemaker services on them (with no resources configured), the crm_mon output went into an endless loop. The corosync log contained many messages like:
>> May 09 22:25:40 [2149] TEST crmd: warning: crm_get_peer: Node 'TEST' and 'TEST' share the same cluster nodeid: 1678901258
>> After this I set a different nodeid in /etc/corosync/corosync.conf on each test node, but it didn't help.
>> So I set a different hostname for each server and then configured pacemaker just as in the manual, apart from the hostname. The neutron-dhcp-agent and neutron-metadata-agent work well, but neutron-l3-agent does not (VM instances can't access the external network; furthermore, the gateway of the VM instance can't be reached either).
>> After two days of checking, I finally found that we can use "neutron l3-agent-router-remove network1_l3_agentid external-routeid" and "neutron l3-agent-router-add network2_l3_agentid external-routeid" to let the backup l3-agent take over when the former network node is down (assuming the two nodes are named network1 and network2). Alternatively, we can update the MySQL table routerl3agentbindings in the neutron database directly. If this makes sense, I think we can change the script neutron-agent-l3: its neutron_l3_agent_start() function needs only a few lines to make it work well.
>>
>> Walter Xu
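
The remove/add sequence described above could be wrapped in a small helper along these lines. This is a sketch under assumptions, not the change Walter proposes for neutron_l3_agent_start(): move_router and the NEUTRON override are hypothetical names, and the neutron CLI is assumed to find its credentials in the usual OS_* environment variables.

```shell
#!/bin/sh
# Sketch only. NEUTRON is a hypothetical override so the helper can be
# dry-run (e.g. NEUTRON=echo) without a working neutron installation.
NEUTRON="${NEUTRON:-neutron}"

# move_router OLD_AGENT_ID NEW_AGENT_ID ROUTER_ID
# Unbinds ROUTER_ID from the L3 agent of the failed node and binds it
# to the L3 agent of the backup node.
move_router() {
    [ $# -eq 3 ] || { echo "usage: move_router OLD_AGENT NEW_AGENT ROUTER" >&2; return 2; }
    old_agent="$1"
    new_agent="$2"
    router="$3"
    # Stop here if the unbind fails, so the router is never bound twice.
    $NEUTRON l3-agent-router-remove "$old_agent" "$router" || return 1
    $NEUTRON l3-agent-router-add "$new_agent" "$router"
}
```

For example, `NEUTRON=echo move_router network1_l3_agentid network2_l3_agentid external-routeid` prints the two commands instead of running them.
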
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack at lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
-------------- next part --------------
A non-text attachment was scrubbed...
Name: neutron-agent-l3
Type: application/octet-stream
Size: 12038 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20140514/d690e905/attachment.obj>