[openstack-dev] [oslo] Asyncio and oslo.messaging

Eoghan Glynn eglynn at redhat.com
Sun Jul 6 13:28:47 UTC 2014



> This is an attempt to summarize a really useful discussion that Victor,
> Flavio and I have been having today. At the bottom are some background
> links - basically what I have open in my browser right now thinking
> through all of this.

Thanks for the detailed summary; it puts more flesh on the bones
than a brief conversation on the fringes of the Paris mid-cycle.

Just a few clarifications and suggestions inline to add into the
mix.

> We're attempting to take baby-steps towards moving completely from
> eventlet to asyncio/trollius. The thinking is for Ceilometer to be the
> first victim.

First beneficiary, I hope :)
 
> Ceilometer's code is run in response to various I/O events like REST API
> requests, RPC calls, notifications received, etc. We eventually want the
> asyncio event loop to be what schedules Ceilometer's code in response to
> these events. Right now, it is eventlet doing that.

Yes.

And there is one other class of stimulus, also related to eventlet,
that is very important for triggering the execution of ceilometer
logic. That would be the timed tasks that drive polling of:

 * REST APIs provided by other openstack services 
 * the local hypervisor running on each compute node
 * the SNMP daemons running at host-level etc.

and also trigger periodic alarm evaluation.

IIUC these tasks are all mediated via the oslo threadgroup's
usage of eventlet.greenpool[1]. Would this logic also be replaced
as part of this effort?
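
To make the timer-driven case concrete, here's a minimal sketch of
what a ThreadGroup.add_timer-style polling task might look like on
the asyncio event loop. The names (poll_sources, poll_hypervisor)
are hypothetical, and I'm using the modern async/await syntax for
brevity; a trollius port would use @coroutine and yield From(...):

```python
import asyncio

async def poll_sources(interval, poll_fn, cycles):
    # Rough analogue of ThreadGroup.add_timer: invoke poll_fn every
    # `interval` seconds, scheduled by the asyncio event loop rather
    # than an eventlet greenthread.
    samples = []
    for _ in range(cycles):
        samples.append(poll_fn())
        await asyncio.sleep(interval)
    return samples

def poll_hypervisor():
    # Hypothetical stand-in for a real compute-agent pollster
    return {"cpu_util": 42}

samples = asyncio.run(poll_sources(0.01, poll_hypervisor, 3))
```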

> Now, because we're using eventlet, the code that is run in response to
> these events looks like synchronous code that makes a bunch of
> synchronous calls. For example, the code might do some_sync_op() and
> that will cause a context switch to a different greenthread (within the
> same native thread) where we might handle another I/O event (like a REST
> API request)

Just to make the point that most of the agents in the ceilometer
zoo tend to react to just a single type of stimulus, as opposed
to a mix of dispatching from both message bus and the REST API.

So to classify, we'd have:

 * compute-agent: timer tasks for polling
 * central-agent: timer tasks for polling
 * notification-agent: dispatch of "external" notifications from
   the message bus
 * collector: dispatch of "internal" metering messages from the
   message bus
 * api-service: dispatch of REST API calls
 * alarm-evaluator: timer tasks for alarm evaluation
 * alarm-notifier: dispatch of "internal" alarm notifications

IIRC, the only case where there's a significant mix of trigger
styles is the partitioned alarm evaluator, where assignments of
alarm subsets for evaluation is driven over RPC, whereas the
actual thresholding is triggered by a timer.
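
On the implicit-vs-explicit point quoted above, the contrast is
easy to show in miniature. Under eventlet, some_sync_op() *looks*
blocking but may context-switch to another greenthread; under
asyncio the only switch points are the explicit awaits. The
function names here are hypothetical, and I'm using async/await
syntax for brevity (trollius would spell it yield From(...)):

```python
import asyncio

async def some_async_op():
    # Hypothetical stand-in for an RPC call or DB read; the
    # `await` below is the explicit switch point where the event
    # loop may run other tasks.
    await asyncio.sleep(0)
    return "reply"

async def handle_event():
    # Control may pass to another handler while we're suspended
    # here -- but only here, never at an ordinary function call.
    return await some_async_op()

result = asyncio.run(handle_event())
```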

> Porting from eventlet's implicit async approach to asyncio's explicit
> async API will be seriously time consuming and we need to be able to do
> it piece-by-piece.

Yes, I agree, a step-wise approach is the key here.

So I'd love to have some sense of the time horizon for this
effort. It clearly feels like a multi-cycle effort, so the main
question in my mind right now is whether we should be targeting
the first deliverables for juno-3.

That would provide a proof-point in advance of the K* summit,
where I presume the task would be to get wider buy-in for the idea.

If it makes sense to go ahead and aim the first baby steps for
juno-3, then we'd need to have a ceilometer-spec detailing these
changes. This would need to be proposed by, say, end of week and
then landed before the spec acceptance deadline for juno
(~July 21st).

We could use this spec proposal to dig into the perceived benefits
of this effort:

 * the obvious win around getting rid of the eventlet black-magic
 * plus possibly other benefits such as code clarity and ease of
   maintenance

and OTOH get a heads-up on the risks:

 * possible immaturity in the new framework?
 * overhead involved in contributors getting to grips with the
   new coroutine model

> The question then becomes what do we need to do in order to port a
> single oslo.messaging RPC endpoint method in Ceilometer to asyncio's
> explicit async approach?
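
To make that porting question concrete, one per-method shape could
be a dispatcher that awaits endpoint methods which have already
been ported, while invoking the rest as before. Everything below
is a hypothetical sketch (the class, method names and dispatch
helper are mine, not oslo.messaging's API), using async/await for
brevity where trollius would use @coroutine and yield From(...):

```python
import asyncio
import inspect

class Endpoint:
    # Legacy endpoint method: still synchronous, dispatched unchanged.
    def record_metering_data(self, data):
        return len(data)

    # Ported endpoint method: an explicit coroutine the dispatcher
    # can await.
    async def evaluate_alarms(self, alarm_ids):
        await asyncio.sleep(0)   # explicit switch point
        return [(a, "ok") for a in alarm_ids]

async def dispatch(endpoint, method, *args):
    # Hypothetical dispatcher: await coroutine endpoints, call the
    # rest synchronously (a real port might push them to an executor).
    fn = getattr(endpoint, method)
    if inspect.iscoroutinefunction(fn):
        return await fn(*args)
    return fn(*args)

ep = Endpoint()
alarms = asyncio.run(dispatch(ep, "evaluate_alarms", ["a1", "a2"]))
count = asyncio.run(dispatch(ep, "record_metering_data", [1, 2, 3]))
```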

One approach would be to select one well-defined area of ceilometer
as an initial test-bed for these ideas.

And one potential candidate for that would be the partitioned alarm
evaluator, which uses:

 1. fan-out RPC for the heartbeats underpinning master-slave
    coordination
 2. RPC calls for alarm allocations and assignments

I spoke to Cyril Roelandt at the mid-cycle, who is interested in:

 * replacing #1 with the tooz distributed co-ordination library[2]
 * and also possibly replacing #2 with taskflow[3]

The benefit of using taskflow for "sticky" task assignments isn't
100% clear, so it may actually make better sense to just use tooz
for the leadership election, and the new asyncio model for #2.

Starting there would have the advantage of being off to the side
of the main ceilometer pipeline.

However, if we do decide to go ahead with taskflow, then we could
find another good starting point for asyncio as an alternative.

>   - when all of ceilometer has been ported over to asyncio coroutines,
>     we can stop monkey patching, stop using greenio and switch to the
>     asyncio event loop

... kick back and light a cigar! :)
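
For what the end state might look like once the monkey patching
and the greenio bridge are gone: ported service coroutines running
directly on the stock asyncio loop. The service names here are
hypothetical placeholders, and again this uses async/await where
trollius would use yield From(...):

```python
import asyncio

# Hypothetical end state: no eventlet.monkey_patch(), no greenio;
# each service is just a coroutine scheduled by the stock loop.

async def collector():
    await asyncio.sleep(0)
    return "metering stored"

async def api_service():
    await asyncio.sleep(0)
    return "200 OK"

async def main():
    # Run both services concurrently on the same event loop.
    return await asyncio.gather(collector(), api_service())

results = asyncio.run(main())
```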

Cheers,
Eoghan

[1] https://github.com/openstack/oslo-incubator/blob/master/openstack/common/threadgroup.py#L72
[2] https://github.com/stackforge/tooz
[3] https://wiki.openstack.org/wiki/TaskFlow
