[openstack-dev] [oslo][versionedobjects][ceilometer] explain the benefits of ceilometer+versionedobjects

Alec Hothan (ahothan) ahothan at cisco.com
Wed Sep 2 15:25:09 UTC 2015

On 9/1/15, 11:31 AM, "gord chung" <gord at live.ca> wrote:

>
>On 28/08/2015 5:18 PM, Alec Hothan (ahothan) wrote:
>>
>> On 8/28/15, 11:39 AM, "gord chung" <gord at live.ca> wrote:
>>
>>> i should start by saying i re-read my subject line and it arguably comes
>>> off aggressive -- i should probably have dropped 'explain' :)
>>>
>>> On 28/08/15 01:47 PM, Alec Hothan (ahothan) wrote:
>>>> On 8/28/15, 10:07 AM, "gord chung" <gord at live.ca> wrote:
>>>>
>>>>> On 28/08/15 12:18 PM, Roman Dobosz wrote:
>>>>>> So imagine we have new versions of the schema for the events, alarms or
>>>>>> samples in ceilometer introduced in Mitaka release while you have all
>>>>>> your ceilo services on Liberty release. To upgrade ceilometer you'll
>>>>>> have to stop all services to avoid data corruption. With
>>>>>> versionedobjects you can do this one by one without disrupting
>>>>>> telemetry jobs.
>>>>> are versions checked for every single message? has anyone considered the
>>>>> overhead of validating each message? since ceilometer is queue based, we
>>>>> could technically just publish to a new queue when the schema changes...
>>>>> and the consuming services would listen to the queue they know of.
>>>>>
>>>>> i.e. our notification service changes schema, so it will now publish to a
>>>>> v2 queue, the existing collector service consumes the v1 queue until
>>>>> done at which point you can upgrade it and it will listen to v2 queue.
>>>>>
>>>>> this way there is no need to validate/convert anything and you can still
>>>>> take services down one at a time. this support doesn't exist currently
>>>>> (i just randomly thought of it) but assuming there's no flaw in my idea
>>>>> (which there may be) isn't this more efficient?
>>>> If high performance is a concern for ceilometer (and it should be), then
>>>> there might be better options than JSON.
>>>> JSON is great for many applications but can be a poor fit for more
>>>> demanding ones.
>>>> There are other popular open source encoding options that yield much more
>>>> compact wire payloads and more efficient encoding/decoding, and that handle
>>>> versioning to a reasonable extent.
>>> i should clarify. we let oslo.messaging serialise our dictionary however it
>>> does by default... i believe it's JSON. i'd be interested in switching to
>>> something more efficient. maybe it's time we revive the msgpacks patch[1] or are
>>> there better alternatives? (hoping i didn't just unleash a storm of
>>> 'this is better' replies)
>> I'd be curious to know whether there are any benchmarks of the oslo serializer with msgpack, and how it compares to JSON.
>> More important is to make sure we're optimizing in the right area.
>> Do we have a good understanding of where ceilometer needs to improve to scale, or is that still not clear-cut?
>
>re: serialisation, that probably isn't the biggest concern for 
>Ceilometer performance. the main items are storage -- to be addressed by 
>Gnocchi/tsdb, and polling load. i just thought i'd point out an existing 
>serialisation patch since we were on the topic :-)

Is there any data measuring the polling load on large-scale deployments?
Is there a plan to bring the polling load down to an acceptable level? If so, could you provide any pointers?
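On the JSON vs msgpack question above, a rough micro-benchmark along these lines could be a starting point. This is a minimal, hypothetical harness: the sample payload and its field names are made up for illustration (not Ceilometer's actual sample schema), it measures the codecs in isolation rather than through oslo.messaging's serializer, and msgpack is only benchmarked if the third-party package is installed.

```python
import json
import timeit

# Hypothetical telemetry-like payload; field names are illustrative only.
SAMPLE = {
    "counter_name": "cpu_util",
    "counter_type": "gauge",
    "counter_unit": "%",
    "counter_volume": 12.5,
    "resource_id": "3f1c7a52-0c1e-4b8e-9a3b-2d6f0e9d1a77",
    "timestamp": "2015-09-02T15:25:09Z",
    "resource_metadata": {"host": "compute-01", "flavor": "m1.small"},
}


def available_codecs():
    """Return (name, encode, decode) triples for the codecs that can be imported."""
    codecs = [(
        "json",
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    )]
    try:
        import msgpack  # optional third-party package, benchmarked only if installed
    except ImportError:
        return codecs
    codecs.append((
        "msgpack",
        msgpack.packb,
        lambda data: msgpack.unpackb(data, raw=False),
    ))
    return codecs


def benchmark(sample=SAMPLE, number=10000):
    """Measure wire size and total encode/decode time over `number` iterations."""
    results = {}
    for name, encode, decode in available_codecs():
        payload = encode(sample)
        # A codec that loses information is not a fair comparison.
        if decode(payload) != sample:
            raise ValueError(f"{name} does not round-trip the sample")
        results[name] = {
            "bytes": len(payload),
            "encode_s": timeit.timeit(lambda: encode(sample), number=number),
            "decode_s": timeit.timeit(lambda: decode(payload), number=number),
        }
    return results


if __name__ == "__main__":
    for name, r in benchmark().items():
        print(f"{name:8s} {r['bytes']:5d} bytes  "
              f"encode {r['encode_s']:.3f}s  decode {r['decode_s']:.3f}s "
              f"per 10000 messages")
```

Payload size and per-message codec time are only part of the picture; as noted above, serialization may not be where ceilometer's scaling problems actually are.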


>
>>
>>>> Queue-based versioning might have less runtime overhead per message, but
>>>> at the expense of potentially complex queue version management (which can
>>>> become tricky if you have more than 2 versions).
>>>> I think Neutron was considering using versioned queues as well for its
>>>> rolling upgrade (along with versioned objects), and I already pointed out
>>>> that managing the queues could be tricky.
>>>>
>>>> In general, trying to provide a versioning framework that allows arbitrary
>>>> changes between versions is quite difficult (and often bound to fail).
>>>>
>>> yeah, so that's what a lot of the devs are debating right now.
>>> performance is our key driver, so if we do something we think/know will
>>> negatively impact performance, it better bring a whole lot more of
>>> something else. if queue based versioning offers comparable
>>> functionality, i'd personally be more interested in exploring that route
>>> first. is there a thread/patch/log that we could read to see what
>>> Neutron discovered when they looked into it?
>> The versioning comments are buried in this mega patch if you are brave enough to dig in:
>>
>> https://review.openstack.org/#/c/190635
>>
>> The (offline) conclusion was that this was WIP and deserved more discussion (need to check back with Miguel and Ihar from the Neutron team).
>> One option considered in that discussion was to use oslo messaging topics to manage flows of messages with different versions (while still using versionedobjects). So if you have 3 versions in your cloud, you'd end up with 3 topics (and as many queues in the case of RabbitMQ). What is complex is managing the queue/topic names (how to name them), discovering them, and dealing with all the corner cases (a new node coming in with an arbitrary version, nodes going away at any moment, downgrades).
>
>conceptually, i would think only the consumers need to know about all
>the queues, and even then, they should only really need to know about the
>ones they understand. the producers (polling agents) can just fire off to
>the correct versioned queue and be done... thanks for the above link 
>(it'll help with discussion/spec design).

When everything goes according to plan, any solution can work, but that is rarely the case in production, especially at scale. Here are a few questions that may help the discussion:
- how are versioned queues named?
- who creates a versioned queue (producer or consumer?), and who deletes it when no entity of that version is running anymore?
- how do we make sure a producer is not publishing to a queue that has no consumer? (a messaging infrastructure like RabbitMQ is designed to decouple producers from consumers)
- what happens to the queues in all the corner cases where entities (consumers or producers) pop up with a newer or older version, or terminate (gracefully or not), during the upgrade/downgrade?
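To make the third question concrete, here is a toy, in-memory sketch of per-version queues. The `<topic>.v<N>` naming convention and the `ToyBroker` class are hypothetical and do not model oslo.messaging or RabbitMQ semantics; the point is only that once an upgraded producer publishes to a version nobody consumes yet, its messages pile up unless something tracks which versions have live consumers.

```python
from collections import defaultdict, deque


def queue_name(topic, version):
    # Hypothetical naming convention: one queue per (topic, schema version).
    return f"{topic}.v{version}"


class ToyBroker:
    """In-memory stand-in for a message broker (not oslo.messaging or RabbitMQ)."""

    def __init__(self):
        self.queues = defaultdict(deque)   # queue name -> pending messages
        self.consumers = defaultdict(int)  # queue name -> number of live consumers

    def publish(self, topic, version, payload):
        # In this toy, the producer only knows its own version and gets no
        # signal about whether anyone consumes the corresponding queue.
        self.queues[queue_name(topic, version)].append(payload)

    def add_consumer(self, topic, version):
        self.consumers[queue_name(topic, version)] += 1

    def remove_consumer(self, topic, version):
        self.consumers[queue_name(topic, version)] -= 1

    def consume(self, topic, version):
        pending = self.queues[queue_name(topic, version)]
        return pending.popleft() if pending else None

    def stranded(self):
        # Queues holding messages that no live consumer will read.
        return {name: len(msgs) for name, msgs in self.queues.items()
                if msgs and self.consumers.get(name, 0) <= 0}


if __name__ == "__main__":
    broker = ToyBroker()
    broker.add_consumer("samples", 1)        # collector not upgraded yet
    broker.publish("samples", 2, {"v": 2})   # polling agent already upgraded
    broker.publish("samples", 1, {"v": 1})   # polling agent not upgraded yet
    print(broker.stranded())                 # {'samples.v2': 1}
```

Even this trivial version has to answer who calls `add_consumer`/`remove_consumer`, and what to do with a stranded queue once the last consumer of a version goes away.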

IMHO, using a simple communication scheme (1 topic/queue for all versions) with in-band message versioning is a much less complex proposition than juggling versioned queues (which is not to say the former is simple to do). With versioned queues you're essentially trading per-message versioning for per-queue versioning, but at the expense of:
- complex queue management (if you want to do it right)
- per-queue message decoding that is no less complex (since the consumer still needs to know how to decode and interpret every message depending on the version of the queue it comes from)
- a harder debugging environment (multiple queues are harder to debug than 1 queue)
- added stress on oslo messaging (due to the use of transient queues)
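For comparison, a minimal sketch of in-band versioning over a single queue: each message carries its schema version, and the consumer upgrades older payloads step by step to the version it understands. The envelope format, the field names and the v1 to v3 schema history are hypothetical, and this does not use the oslo.versionedobjects API.

```python
import json

CURRENT_VERSION = 3

# Hypothetical schema history for a telemetry sample:
#   v1 -> v2: "volume" renamed to "counter_volume"
#   v2 -> v3: "unit" field added, defaulting to None


def _v1_to_v2(data):
    data = dict(data)
    data["counter_volume"] = data.pop("volume")
    return data


def _v2_to_v3(data):
    data = dict(data)
    data.setdefault("unit", None)
    return data


# Maps a version to the function that upgrades a payload to the next version.
UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}


def encode(data, version=CURRENT_VERSION):
    # Every message carries its schema version in-band, so all versions
    # can share a single topic/queue.
    return json.dumps({"version": version, "data": data})


def decode(raw):
    envelope = json.loads(raw)
    version, data = envelope["version"], envelope["data"]
    if version > CURRENT_VERSION:
        # A producer newer than this consumer: needs an explicit policy
        # (reject, requeue, or have producers send an older version).
        raise ValueError(f"unsupported future version {version}")
    while version < CURRENT_VERSION:
        data = UPGRADES[version](data)
        version += 1
    return data


if __name__ == "__main__":
    old = encode({"volume": 12.5}, version=1)  # from a not-yet-upgraded producer
    new = encode({"counter_volume": 3.0, "unit": "%"})
    print(decode(old))  # {'counter_volume': 12.5, 'unit': None}
    print(decode(new))  # {'counter_volume': 3.0, 'unit': '%'}
```

In this sketch the per-message cost of the check itself is an integer comparison; conversions only run for messages from older producers during the upgrade window. Messages from newer producers still need a policy, which is one of the corner cases either approach has to handle.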


Regards,

  Alec