[Openstack] [senlin] Unable to get senlin health monitoring to work
Peter White
Peter.White at metaswitch.com
Wed Jul 20 11:34:27 UTC 2016
I have been unable to get Senlin health policies to work at all. I'm not sure what I am doing wrong, though I suspect it is related to credentials. Any help would be much appreciated.
What I have tried is the following (using a Heat template, though I get the same error when issuing senlin commands directly from the command line).
* Create a cluster with a single node in it (initial size), and a health policy attached to it using NODE_STATUS_POLLING.
* Verify that the cluster exists, that a node has been created with a VM, and that the health policy exists and has been linked to the cluster.
* Nuke the VM (nova delete) to try and trigger healing.
I would expect the Senlin health policy to detect that the VM has gone and heal the cluster. However, that does not happen. If I run "senlin node-check", the node state changes to ERROR and the cluster state changes to WARNING (so Senlin can tell that the cluster is in a bad way). Even so, the health policy does not do what I would expect, which is to replace the Senlin node.
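To make the expectation above concrete, here is a toy model of the detect-then-recover cycle I have in mind. It is only a sketch: every name in it (check_nodes, recover, poll_once, the node dictionaries) is hypothetical and none of it is Senlin code. It assumes a poll marks a node ERROR when its server is gone, reports the cluster as WARNING, and then applies a RECREATE-style recovery.

```python
# Toy model of the expected behaviour. All names are hypothetical; this is
# not Senlin code, just the check -> heal cycle described in the text.

def check_nodes(nodes, live_servers):
    """Mark nodes whose server is gone as ERROR and return a cluster status."""
    for node in nodes:
        node["status"] = "ACTIVE" if node["server_id"] in live_servers else "ERROR"
    return "WARNING" if any(n["status"] == "ERROR" for n in nodes) else "ACTIVE"


def recover(nodes, live_servers, create_server):
    """RECREATE-style recovery: boot a replacement server for each ERROR node."""
    for node in nodes:
        if node["status"] == "ERROR":
            node["server_id"] = create_server()
            live_servers.add(node["server_id"])
            node["status"] = "ACTIVE"


def poll_once(nodes, live_servers, create_server):
    """One polling pass: check the cluster, heal it if the check found a problem."""
    status = check_nodes(nodes, live_servers)
    if status != "ACTIVE":
        recover(nodes, live_servers, create_server)
        status = check_nodes(nodes, live_servers)
    return status
```

What I actually observe matches check_nodes only when I trigger the check by hand. The automatic pass that would correspond to poll_once never completes.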
I'm seeing some odd log extracts that make me think that the issue is that the health policy does not have access to the right credentials in order to issue the polling requests. I have found http://docs.openstack.org/developer/senlin/developer/authorization.html but cannot see quite how it relates.
I'm using stable/mitaka with devstack on a single Ubuntu server. The Heat template and an extract from the logs are below.
Can anybody suggest what I might be doing wrong or point me at some documentation that explains how healing / authentication in Senlin should / does work?
Thanks, Peter White
Heat template
heat_template_version: 2016-04-08
description: Simple template to test healing
resources:
  profile:
    type: OS::Senlin::Profile
    properties:
      type: os.nova.server-1.0
      properties:
        image: cirros-0.3.4-x86_64-uec
        flavor: m1.tiny
  cluster1:
    type: OS::Senlin::Cluster
    properties:
      name: cluster1
      profile: {get_resource: profile}
      desired_capacity: 1
      min_size: 1
  heal_policy:
    type: OS::Senlin::Policy
    properties:
      type: senlin.policy.health-1.0
      bindings:
        - cluster: {get_resource: cluster1}
      properties:
        detection:
          type: NODE_STATUS_POLLING
          options:
            interval: 60
        recovery:
          actions:
            - RECREATE
          #fencing: # Not sure what this does, but didn't seem to make any difference.
          #  - COMPUTE
Senlin log extract
2016-07-20 11:20:04.379 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 3b7f3c9c28074c7eb14af8e50ba10a42 from (pid=21537) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-07-20 11:20:04.379 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'senlin.engine.health_manager.HealthManager._poll_cluster' failed
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall Traceback (most recent call last):
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 136, in _run_loop
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall result = func(*self.args, **self.kw)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/engine/health_manager.py", line 110, in _poll_cluster
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall self.rpc_client.cluster_check(self.ctx, cluster_id)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/rpc/client.py", line 217, in cluster_check
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall params=params))
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/rpc/client.py", line 50, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return client.call(ctxt, method, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 413, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return self.prepare().call(ctxt, method, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall retry=self.retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall timeout=timeout, retry=retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall retry=retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 461, in _send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall raise result
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall ValueError: Field `user' cannot be None
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall Traceback (most recent call last):
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall incoming.message))
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return self._do_dispatch(endpoint, method, ctxt, args)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall result = func(ctxt, **new_args)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/engine/service.py", line 68, in wrapped
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return func(self, ctx, *args, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/engine/service.py", line 1328, in cluster_check
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall consts.CLUSTER_CHECK, **params)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/engine/actions/base.py", line 282, in create
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return obj.store(context)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/engine/actions/base.py", line 187, in store
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall action = ao.Action.create(context, values)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/objects/action.py", line 52, in create
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return cls._from_db_object(context, cls(context), obj)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/opt/stack/senlin/senlin/objects/base.py", line 43, in _from_db_object
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall obj[field] = db_obj[field]
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 727, in __setitem__
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall setattr(self, name, value)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 72, in setter
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall field_value = field.coerce(self, name, value)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 190, in coerce
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall return self._null(obj, attr)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 168, in _null
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall raise ValueError(_("Field `%s' cannot be None") % attr)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall ValueError: Field `user' cannot be None
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.462 INFO senlin.engine.event [req-6ce9b961-acdf-4523-a04b-ef98d7752f85 None None] cluster1 [6f720478] CLUSTER_ATTACH_POLICY - SUCCEEDED: Policy attached.
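One plausible reading of the traceback above, offered as a guess rather than a diagnosis: the periodic _poll_cluster call passes a context (self.ctx) that carries no user, and storing the resulting action then fails because the `user' field is not nullable. The sketch below mimics that failure in plain Python. It is a simplified stand-in, not the oslo.versionedobjects or Senlin source, and StringField, RequestContext, ACTION_FIELDS and build_action are all hypothetical names.

```python
# Simplified stand-in for a non-nullable field check. This is NOT the
# oslo.versionedobjects or Senlin source; all names here are hypothetical.

class StringField(object):
    def __init__(self, nullable=False):
        self.nullable = nullable

    def coerce(self, attr, value):
        # Reject None unless the field was declared nullable.
        if value is None:
            if self.nullable:
                return None
            raise ValueError("Field `%s' cannot be None" % attr)
        return str(value)


class RequestContext(object):
    """A context that may or may not carry user and project identity."""

    def __init__(self, user=None, project=None):
        self.user = user
        self.project = project


# Ordered (name, field) pairs so that fields are validated in a fixed order.
ACTION_FIELDS = (
    ("name", StringField()),
    ("user", StringField()),
    ("project", StringField()),
)


def build_action(ctx, name):
    """Build an action record, copying identity fields from the context."""
    values = {"name": name, "user": ctx.user, "project": ctx.project}
    return dict((attr, field.coerce(attr, values[attr]))
                for attr, field in ACTION_FIELDS)
```

In this sketch, build_action(RequestContext(), "cluster_check") raises ValueError on the `user' field, while build_action(RequestContext(user="u1", project="p1"), "cluster_check") succeeds. If the real cause is similar, the question becomes how the health manager's context is supposed to be populated with credentials.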