[Openstack] Re: Re: Ceilometer high availability in active-active

Pan, Fengyun panfy.fnst at cn.fujitsu.com
Wed Mar 18 03:16:07 UTC 2015


Thank you!

I have set the backend_url on both the compute node and the controller node as follows:
	backend_url=redis://193.168.196.246:6379
The IP address of my compute node is "193.168.196.246".
Redis is installed on the compute node:
	# rpm -qa | grep redis
	redis-2.8.15-2.el7ost.x86_64
	python-redis-2.10.3-1.el7ost.noarch
So when the ceilometer-agent-central service runs on the compute node, it connects to the Redis service successfully.
But when the ceilometer-agent-central service runs on the controller node, we get the following log:
______________________________
2015-03-18 18:48:05.948 16236 ERROR ceilometer.coordination [-] Error connecting to coordination backend.
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination Traceback (most recent call last):
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/ceilometer/coordination.py", line 70, in start
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination     self._coordinator.start()
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 182, in start
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination     self._start()
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 354, in _start
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination     self._server_info = self._client.info()
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination     self.gen.throw(type, value, traceback)
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 78, in _translate_failures
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination     raise coordination.ToozConnectionError(utils.exception_message(e))
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination ToozConnectionError: Error 113 connecting to 193.168.196.246:6379. EHOSTUNREACH.
2015-03-18 18:48:05.948 16236 TRACE ceilometer.coordination
2015-03-18 18:48:36.953 16236 ERROR ceilometer.coordination [-] Error sending a heartbeat to coordination backend.
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination Traceback (most recent call last):
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/ceilometer/coordination.py", line 86, in heartbeat
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination     self._coordinator.heartbeat()
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 408, in heartbeat
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination     value=b"Not dead!")
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination     self.gen.throw(type, value, traceback)
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination   File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 78, in _translate_failures
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination     raise coordination.ToozConnectionError(utils.exception_message(e))
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination ToozConnectionError: Error connecting to 193.168.196.246:6379. timed out.
2015-03-18 18:48:36.953 16236 TRACE ceilometer.coordination
_____________________________
Why does the connection time out?
Is there a problem with my configuration?
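
The "Error 113 ... EHOSTUNREACH" line in the first traceback means the controller node could not reach 193.168.196.246 at the network level (on Linux, errno 113 is EHOSTUNREACH, "No route to host"). That would suggest looking at routing or firewall rules between the two nodes before looking at the Ceilometer configuration. A minimal sketch for checking this from the controller node, independent of Ceilometer and tooz, might look like the following. The helper name check_tcp and the 3-second timeout are illustrative assumptions, not part of any OpenStack tool.

```python
import errno
import socket


def check_tcp(host, port, timeout=3.0):
    """Try a plain TCP connect to host:port and return (ok, detail)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped?).
        return False, "timed out"
    except socket.error as exc:
        # Map the numeric errno (e.g. 113) to its symbolic name (EHOSTUNREACH).
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, "%s (%s)" % (name, exc)
    sock.close()
    return True, "connected"
```

Calling check_tcp("193.168.196.246", 6379) on the controller node should return (True, "connected") once the port is reachable. A result naming EHOSTUNREACH or "timed out" would point at the network path or a firewall rather than at backend_url.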

-----Original Message-----
From: Chris Dent [mailto:chdent at redhat.com]
Sent: March 11, 2015 21:08
To: Pan, Fengyun/潘 风云
Cc: Vijaya Bhaskar; openstack
Subject: Re: Re: [Openstack] Ceilometer high availability in active-active

On Wed, 11 Mar 2015, Pan, Fengyun wrote:

> We know that:
> backend_url',
>               default=None,
>               help='The backend URL to use for distributed coordination. If '
>                    'left empty, per-deployment central agent and per-host '
>                    'compute agent won\'t do workload '
>                    'partitioning and will only function correctly if a '
>                    'single instance of that service is running.'),
> But how should the 'backend_url' be set?

This appears to be an oversight in the documentation. The main starting point is here:

    http://docs.openstack.org/admin-guide-cloud/content/section_telemetry-cetral-compute-agent-ha.html

but nothing there nor what it links to actually says what should go as the value of the setting. It's entirely dependent on the backend being used and how that backend is being configured. Each of the tooz drivers has some information on some of the options, but again, it is not fully documented yet.

For reference, what I use in my own testing is redis as follows:

    redis://localhost:6379
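
As a rough illustration of where that value goes and what it contains, the sketch below reads backend_url from a ceilometer.conf-style text and splits it into scheme, host and port. It assumes the option sits in the [coordination] section. The sample text and the function name coordination_endpoint are made up for the example.

```python
import configparser
from urllib.parse import urlparse

# Assumed layout: backend_url lives in the [coordination] section.
SAMPLE_CONF = """
[coordination]
backend_url = redis://localhost:6379
"""


def coordination_endpoint(conf_text):
    """Return (scheme, host, port) of backend_url, or None if it is unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback=None also covers a missing [coordination] section.
    url = parser.get("coordination", "backend_url", fallback=None)
    if not url:
        return None
    parts = urlparse(url)
    return parts.scheme, parts.hostname, parts.port
```

Note that "localhost" is resolved separately on each node, so agents running on different hosts that are meant to share one Redis server need a host name or address that is reachable from all of them.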

This uses a single redis server, so introduces another single point of failure. It's possible to use sentinel to improve upon this situation:

    http://docs.openstack.org/developer/tooz/developers.html#redis
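
For sentinel, the URL points at a sentinel server rather than at the Redis master and names the master in a query parameter. The sketch below builds such a URL. It assumes the form redis://<sentinel host>:<sentinel port>?sentinel=<master name>, with optional sentinel_fallback=<host>:<port> parameters for additional sentinels, as I understand the tooz Redis driver documentation linked above. Please verify the parameter names against the tooz version you have installed. The function name, the addresses and the master name "mymaster" are placeholders.

```python
from urllib.parse import urlencode


def sentinel_backend_url(sentinel_host, master_name,
                         sentinel_port=26379, fallbacks=()):
    """Build a sentinel-style backend_url (parameter names are assumptions)."""
    query = [("sentinel", master_name)]
    # Each extra sentinel is listed as its own sentinel_fallback parameter.
    query.extend(("sentinel_fallback", fb) for fb in fallbacks)
    # safe=":" keeps the host:port separator readable in the query string.
    return "redis://%s:%d?%s" % (
        sentinel_host, sentinel_port, urlencode(query, safe=":"))


url = sentinel_backend_url("10.0.0.1", "mymaster",
                           fallbacks=["10.0.0.2:26379"])
print(url)
```

26379 is the default port Redis Sentinel listens on. The resulting string would be used as the backend_url value in place of the single-server URL.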

The other drivers work in similar ways with their own unique arguments.

I'm sorry I'm not able to point to more complete information but I can say that it is in the process of being improved.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent

