[Openstack] [Ceilometer][Architecture] Transformers in Kilo vs Liberty (and Mitaka)

Nadya Shakhat nprivalova at mirantis.com
Tue Apr 12 13:20:01 UTC 2016


For those who want to see the picture, it should be available here:
https://docs.google.com/drawings/d/1OjnthkzKYnNLpIumYJYcs1ASFa_npbeyot8afo4LQgw/edit?usp=sharing

Thanks

On Tue, Apr 12, 2016 at 4:13 PM, Nadya Shakhat <nprivalova at mirantis.com>
wrote:

> Hello colleagues,
>
>     I'd like to discuss one question with you. Perhaps you remember that
> in Liberty we decided to get rid of transformers on polling agents [1]. I'd
> like to describe several issues we are facing now because of this decision.
> 1. pipeline.yaml inconsistency.
>     A Ceilometer pipeline consists of two basic things: a source and a
> sink. In a source, we describe how to get the data; in a sink, how to deal
> with the data. After the refactoring described in [1], polling agents apply
> only the "source" definition and notification agents apply only the "sink"
> one. This causes the problems described in the mailing thread [2]: the
> "pipe" concept is effectively broken. To make it work more or less
> correctly, the user has to make sure that a polling agent does not send
> duplicated samples. In the example below, we send the "cpu" Sample twice
> every 600 seconds from each compute agent:
>
> sources:
>     - name: meter_source
>       interval: 600
>       meters:
>           - "*"
>       sinks:
>           - meter_sink
>     - name: cpu_source
>       interval: 60
>       meters:
>           - "cpu"
>       sinks:
>           - cpu_sink
>           - cpu_delta_sink
>
> If we apply the same configuration on the notification agent, each "cpu"
> Sample will be processed by all three sinks. Please refer to the mailing
> thread [2] for more details.
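>
> For this particular example, one way to avoid the polling-side duplication
> is to exclude "cpu" from the wildcard source so that only cpu_source
> matches it. This is only a sketch, assuming the "!<meter>" exclusion
> syntax behaves the way I remember it:
>
> sources:
>     - name: meter_source
>       interval: 600
>       meters:
>           - "!cpu"    # all meters except cpu, if exclusion works as expected
>       sinks:
>           - meter_sink
>     - name: cpu_source
>       interval: 60
>       meters:
>           - "cpu"
>       sinks:
>           - cpu_sink
>           - cpu_delta_sink
>
> Of course, this only works around the underlying problem that one file is
> interpreted differently by two different services.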
>     As I understood from the specification, the main reason for [1] was
> to make the pollster code more readable. That's why I call this change a
> "refactoring". Please correct me if I'm missing anything here.
>
> 2. Coordination stuff.
>     TBH, coordination for notification agents is the most painful thing
> for me, for several reasons:
>
> a. A stateless service has become stateful. Here I'd like to note that
> tooz usage for central agents and alarm evaluators can be called
> "optional": if you want these services to scale, it is recommended to use
> tooz, i.e. to install Redis/Zookeeper. But you may leave your puppets
> unchanged and everything continues to work with one service (central agent
> or alarm evaluator) per cloud. For the notification agent this is not the
> case. You must change the deployment: either rewrite the puppets for
> notification agent deployment (to have only one notification agent per
> cloud) or make a tooz installation with Redis/Zookeeper mandatory. One more
> option is to remove transformations completely - that's what we've done by
> default in our company's product.
>
> b. High RabbitMQ utilisation. As you know, tooz does only one part of the
> coordination for a notification agent. In Ceilometer, we use the IPC queue
> mechanism to make sure that samples for one metric from one resource are
> processed by exactly one notification agent (so that it can use a local
> cache). Let me remind you that without coordination (but with [1] applied)
> each compute agent polls all of its instances and sends the result as one
> message to a notification agent. The notification agent processes all the
> samples and sends as many messages to the collector as there are sinks
> defined (2-4, not many). If [1] is not applied, one "publishing" round is
> skipped. But with [1] and coordination (the most recommended deployment),
> the number of publications increases dramatically, because we publish each
> Sample as a separate message. Instead of 3-5 "publish" calls, we do
> 1 + 2*instance_amount_on_compute publishings per compute node: roughly one
> message from the compute agent, plus one per sample into the IPC queues,
> plus one per sample towards the collector. With 50 instances on a compute
> node that is 101 publish calls per polling cycle instead of 3-5. And it's
> by design, i.e. it's not a bug but a feature.
>
> c. Sample ordering in the queues. This may be considered a corner case,
> but I'd like to describe it here too. We have a number of order-sensitive
> transformers (cpu.delta, cpu_util), but we can guarantee message ordering
> only in the "main" polling queue, not in the IPC queues. In the picture
> below (hope it will be displayed) there are 3 agents A1, A2 and A3 and 3
> time-ordered messages in the MQ. Let's assume that all 3 agents start to
> read messages from the MQ at the same time. All the messages are related
> to a single resource, so they will all go to the same IPC queue - let it
> be the IPC queue for agent A1. At this point, we cannot guarantee that the
> order will be kept, i.e. we cannot do order-sensitive transformations
> without some loss.
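>
> To illustrate why ordering matters: the cpu_sink and cpu_delta_sink from
> the example above are, if I recall the default pipeline.yaml correctly,
> defined roughly like this (details such as the scale expression may
> differ):
>
> sinks:
>     - name: cpu_sink
>       transformers:
>           # rate_of_change remembers the previously seen sample per
>           # resource and computes the change between two consecutive
>           # samples, so processing samples out of order yields wrong rates
>           - name: "rate_of_change"
>             parameters:
>                 target:
>                     name: "cpu_util"
>                     unit: "%"
>                     type: "gauge"
>                     scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
>       publishers:
>           - notifier://
>     - name: cpu_delta_sink
>       transformers:
>           # delta likewise compares against the previously seen sample;
>           # with growth_only it may simply drop out-of-order samples
>           - name: "delta"
>             parameters:
>                 target:
>                     name: "cpu.delta"
>                 growth_only: True
>       publishers:
>           - notifier://
>
> Both transformers keep per-resource state between samples, which is exactly
> why the notification agents need coordination and ordered delivery in the
> first place.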
>
>
>   Now I'd like to remind you that we need this coordination _only_ to
> support transformations. Take a look at these specs: [3], [4]
> From [3]: The issue that arises is that if we want to implement a
> pipeline to process events, we cannot guarantee what event each agent
> worker will get and because of that, we cannot enable transformers which
> aggregate/collate some relationship across similar events.
>
> We don't have event transformations. In the default pipeline.yaml we don't
> even use transformations for notification-based samples (perhaps we get
> "cpu" from instance.exists, but we can drop it without any impact). The
> most common case is transformations only for polling-based metrics. Please
> correct me if I'm wrong here.
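>
> For reference, the default event_pipeline.yaml is (from memory, so the
> exact wording may differ) essentially a pass-through with an empty
> transformers section:
>
> sources:
>     - name: event_source
>       events:
>           - "*"
>       sinks:
>           - event_sink
> sinks:
>     - name: event_sink
>       transformers:          # empty: no event transformations are defined
>       publishers:
>           - notifier://
>
> which supports the point that coordination is only needed for sample
> transformations.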
>
> tl;dr
> I suggest the following:
> 1. Return transformations to polling agents
> 2. Have a special format for pipeline.yaml on notification agents, without
> "interval" and "transformations" (a rough sketch of what such a file could
> look like is below). Notification-based transformations are better done
> "offline".
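>
> Something along these lines - just a sketch of the idea, not a worked-out
> format:
>
> # pipeline.yaml as seen by the notification agent: no interval, no
> # transformers, only the mapping from meters to publishers
> sources:
>     - name: meter_source
>       meters:
>           - "*"
>       sinks:
>           - meter_sink
> sinks:
>     - name: meter_sink
>       publishers:
>           - notifier://
>
> while the polling agents would keep the full source + sink definitions,
> including intervals and transformers.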
>
>
> [1]
> https://github.com/openstack/telemetry-specs/blob/master/specs/liberty/pollsters-no-transform.rst
> [2] http://www.gossamer-threads.com/lists/openstack/dev/53983
> [3]
> https://github.com/openstack/ceilometer-specs/blob/master/specs/kilo/notification-coordiation.rst
> [4]
> https://github.com/openstack/ceilometer-specs/blob/master/specs/liberty/distributed-coordinated-notifications.rst
>
> Thanks for your attention,
> Nadya
>

