[Openstack] [Ceilometer][Architecture] Transformers in Kilo vs Liberty (and Mitaka)

Nadya Shakhat nprivalova at mirantis.com
Tue Apr 12 13:13:50 UTC 2016


Hello colleagues,

    I'd like to discuss one question with you. Perhaps, you remember that
in Liberty we decided to get rid of transformers on polling agents [1]. I'd
like to describe several issues we are facing now because of this decision.
1. pipeline.yaml inconsistency.
    The Ceilometer pipeline consists of two basic things: a source and a
sink. In the source we describe how to get the data; in the sink, what to
do with it. After the refactoring described in [1], polling agents apply
only the "source" definition and notification agents apply only the "sink"
one. This causes the problems described in the mailing thread [2]: the
"pipe" concept is effectively broken. To make it work more or less
correctly, the user has to make sure that the polling agents do not send
duplicated samples. In the example below, a compute agent sends the "cpu"
sample twice every 600 seconds (once from each source):

sources:
    - name: meter_source
      interval: 600
      meters:
          - "*"
      sinks:
          - meter_sink
    - name: cpu_source
      interval: 60
      meters:
          - "cpu"
      sinks:
          - cpu_sink
          - cpu_delta_sink

If we apply the same configuration on the notification agent, each "cpu"
sample will be processed by all three sinks. Please refer to the mailing
thread [2] for more details.
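
To make the fan-out concrete, here is a toy Python sketch (made-up
structures, not the real pipeline code) of what happens when only the
"sink" side of the matching is applied on the notification agent:

import fnmatch

# Toy model of the pipeline.yaml above; on the notification agent every
# source whose meter pattern matches contributes all of its sinks.
sources = [
    {'name': 'meter_source', 'meters': ['*'], 'sinks': ['meter_sink']},
    {'name': 'cpu_source', 'meters': ['cpu'],
     'sinks': ['cpu_sink', 'cpu_delta_sink']},
]

def sinks_for(meter_name):
    matched = []
    for source in sources:
        if any(fnmatch.fnmatch(meter_name, p) for p in source['meters']):
            matched.extend(source['sinks'])
    return matched

# One "cpu" sample fans out to all three sinks - and the polling agent,
# applying only the "source" side, already sent that sample twice.
print(sinks_for('cpu'))  # ['meter_sink', 'cpu_sink', 'cpu_delta_sink']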
    As I understood from the specification, the main reason for [1] was to
make the pollster code more readable. That's why I call this change a
"refactoring". Please correct me if I'm missing anything here.

2. Coordination stuff.
    TBH, coordination for notification agents is the most painful thing for
me, for several reasons:

a. A stateless service has become stateful. Here I'd like to note that tooz
usage for central agents and alarm evaluators may be called "optional". If
you want these services to be scalable, it is recommended to use tooz,
i.e. install Redis/Zookeeper. But you may leave your puppets unchanged and
everything continues to work with one service (central agent or
alarm evaluator) per cloud. For the notification agent this is not the
case. You must change the deployment: either rewrite the puppets for
notification agent deployment (to have only one notification agent per
cloud) or make a tooz installation with Redis/Zookeeper mandatory. One more
option: remove transformations completely - that's what we've done in our
company's product by default.
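
Just to illustrate what "stateful" means operationally, here is a minimal
tooz sketch (the backend URL, group and member names are hypothetical; the
agent wires this up internally):

from tooz import coordination

# A Redis or Zookeeper backend now has to exist somewhere for the
# notification agents to scale out.
coordinator = coordination.get_coordinator('redis://127.0.0.1:6379',
                                            b'notification-agent-1')
coordinator.start()

group = b'ceilometer.notification'
try:
    coordinator.create_group(group).get()
except coordination.GroupAlreadyExist:
    pass
coordinator.join_group(group).get()

# Membership must be kept alive with heartbeats; if it flaps, the workload
# partitioning (and the local transformer state) becomes wrong.
coordinator.heartbeat()
print(coordinator.get_members(group).get())

coordinator.leave_group(group).get()
coordinator.stop()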

b. High RabbitMQ utilisation. As you know, tooz does only one part of the
coordination for a notification agent. In Ceilometer, we use an IPC queue
mechanism to make sure that samples for a given metric and resource are
processed by exactly one notification agent (so that a local cache can be
used). I'd like to remind you that without coordination (but with [1]
applied) each compute agent polls all of its instances and sends the result
as one message to a notification agent. The notification agent processes
all the samples and sends as many messages to the collector as there are
sinks defined (2-4, not many). If [1] is not applied, one "publishing"
round is skipped. But with [1] and coordination (the most recommended
deployment), the number of publications increases dramatically, because we
publish each sample as a separate message. Instead of 3-5 "publish" calls,
we do 1+2*instance_amount_on_compute publishings per compute node. And it's
by design, i.e. it's not a bug but a feature.
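
As a rough worked example (the instance count is hypothetical):

# Publish-call arithmetic from the paragraph above, for a hypothetical
# compute node with 50 instances and 3 sinks defined.
instances = 50
sinks = 3

batched = sinks                   # roughly one batched publish per sink
per_sample = 1 + 2 * instances    # the 1+2*instance_amount_on_compute case

print(batched, per_sample)        # 3 vs 101 publish calls per compute node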

c. Sample ordering in the queues. It may be considered a corner case, but
I'd like to describe it here too. We have a number of order-sensitive
transformers (cpu.delta, cpu_util), but we can guarantee message ordering
only in the "main" polling queue, not in the IPC queues. Consider 3 agents
A1, A2 and A3 and 3 time-ordered messages in the MQ, and assume that all 3
agents start to read messages from the MQ at the same time. All the
messages are related to a single resource, so they will all go to the same
IPC queue - let it be the IPC queue for agent A1. At this point we cannot
guarantee that the order will be kept, i.e. we cannot do order-sensitive
transformations without some loss.
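
A toy example of why ordering matters for a cumulative metric like cpu
(simplified Python, not the real transformer code):

# Toy cumulative -> delta transformation: each output is the current value
# minus the previous one, so the result depends on the processing order.
def deltas(samples):
    previous = None
    result = []
    for timestamp, cumulative in samples:
        if previous is not None:
            result.append(cumulative - previous)
        previous = cumulative
    return result

in_order = [(1, 100), (2, 150), (3, 225)]
reordered = [(1, 100), (3, 225), (2, 150)]  # t3 overtook t2 in the IPC queue

print(deltas(in_order))    # [50, 75]
print(deltas(reordered))   # [125, -75] -> a spurious negative delta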


  Now I'd like to remind you that we need this coordination _only_ to
support transformations. Take a look at these specs: [3], [4].
From [3]: "The issue that arises is that if we want to implement a pipeline
to process events, we cannot guarantee what event each agent worker will
get and because of that, we cannot enable transformers which
aggregate/collate some relationship across similar events."

We don't have event transformations. In the default pipeline.yaml we don't
even use transformations for notification-based samples (perhaps we get cpu
from instance.exists, but we can drop it without any impact). The most
common case is transformations only for polling-based metrics. Please
correct me if I'm wrong here.

tl;dr
I suggest the following:
1. Return transformations to polling agents.
2. Have a special format for pipeline.yaml on notification agents, without
"interval" and "transformers". Notification-based transformations are
better done "offline".


[1]
https://github.com/openstack/telemetry-specs/blob/master/specs/liberty/pollsters-no-transform.rst
[2] http://www.gossamer-threads.com/lists/openstack/dev/53983
[3]
https://github.com/openstack/ceilometer-specs/blob/master/specs/kilo/notification-coordiation.rst
[4]
https://github.com/openstack/ceilometer-specs/blob/master/specs/liberty/distributed-coordinated-notifications.rst

Thanks for your attention,
Nadya