[openstack-dev] [ceilometer] resource_metadata and metaquery

Sandy Walsh sandy.walsh at rackspace.com
Thu Jan 24 12:28:55 UTC 2013


On 01/24/2013 05:52 AM, Julien Danjou wrote:
> On Thu, Jan 24 2013, Sandy Walsh wrote:
>
>> This seems like a very inefficient schema requiring multiple sub-queries.
>>
>> Other than the naming, is it really any different than the current
>> Metadata table when it comes to db performance?
>
> There's no metadata table currently, there's a metadata *column*.
> You can't do any filtering based on that column in pure SQL.

Sorry, I should have said the proposed metadata table.

>> I think a better approach would be to offer different Metric types
>> (extensible) which can control their own mapping to the db.
>
> I can't see how you can do that and support a large number of different
> metric types while staying generic.

I'll throw together a proposal. I think it can be done with a set of two
extensions (rough sketch below):

1. the parser part of the event consumer (please excuse my terminology
abuse; "agent", perhaps?)
2. the database portion, which would need to deal with migration, CRUD
and advanced query. That's the hard part.
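
Very roughly, and purely as a strawman (none of these class or method
names exist anywhere yet; they're only meant to show the shape of the
two extension points):

    import abc


    class EventParser(object):
        """Extension #1: turn a raw notification into a typed datapoint."""
        __metaclass__ = abc.ABCMeta

        @abc.abstractmethod
        def parse(self, notification):
            """Return a datapoint (see structure below) or None to skip."""


    class DatatypeStore(object):
        """Extension #2: owns migration, CRUD and advanced query for one
        datatype."""
        __metaclass__ = abc.ABCMeta

        @abc.abstractmethod
        def migrate(self, connection):
            """Create/upgrade whatever schema this datatype needs."""

        @abc.abstractmethod
        def record(self, datapoint):
            """Persist a single datapoint."""

        @abc.abstractmethod
        def query(self, filters):
            """Return datapoints matching the given attribute filters."""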

We would have to agree on a common format for passing these data
structures around. Probably just some base attributes + "extra" bits. It
would likely look like the metadata/dimension structure, but could be
handled efficiently under the hood. This structure would also
have a tag that would identify the "handler" needed to deal with it. A
datatype name, if you will.
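
As a strawman, a datapoint might look something like this (every field
name below is invented):

    datapoint = {
        # base attributes every component understands:
        'datatype': 'nova.instance',          # tag naming the handler
        'timestamp': '2013-01-24T12:28:55Z',
        'tenant_id': 'abc123',
        'resource_id': 'instance-0001',
        # ... plus the "extra" bits that only the handler knows how
        # to store and query efficiently:
        'extra': {
            'cell': 'cell-1',
            'host': 'compute-42',
            'request_id': 'req-77',
        },
    }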

The UI, API, aggregation, etc. would all work with these generic data
structures.

Honestly, I don't think there would be a whole lot of them; likely just
one datatype per system (cinder, nova, quantum, etc.).

The aggregation system (aka multi-publisher) could listen for data types
it's interested in for roll-ups.
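
Hypothetically, a roll-up publisher would just declare the types it
cares about:

    class InstanceRollups(object):
        """Hypothetical multi-publisher plugin: per-tenant instance
        roll-ups."""
        handled_types = ('nova.instance',)

        def handle(self, datapoint):
            if datapoint['datatype'] not in self.handled_types:
                return
            # accumulate per-tenant counts/durations here for roll-up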

The potential downside is that we could end up with one "monster
datatype" that unions all the important attributes across all systems
(cinder, nova, quantum, etc.). I think we're going to end up with one of
these anyway once we get into the multi-publisher/aggregation layers,
e.g. "Instance" or "Tenant".

I think I should do up a little video showing the type of db data
structures we've found useful in StackTach. They're small, but
non-trivial. It should really illustrate what multi-publisher is going
to need.

> But I think what you may want is to implement a dynamic SQL engine
> backend that creates and indexes the columns you want to query on.
> That's a solution, but we're trying to stay generic with the default
> sqlalchemy backend.

Wouldn't the end effect be the same (without the large impact of an
index creation hit on first request)? How would we police the growth of
db indices?

>> I'd be curious to see how the metadata table approach performs when you
>> are querying on multiple keys (like Event Name + Cell + Host + Request
>> ID, for example) with a large number of rows. Has anyone tried this?
>
> I don't think someone did. This blueprint draft was just something we
> talked about back then with Nick and we wrote some ideas to not forget
> it and have some things to discuss.
>
> The problem is that metadata are EAV and that plays badly with SQL (and
> especially with SQL lowered down to basics thanks to the ORM abstraction
> of SQLAlchemy). It's not clear that splitting the metadata into another
> table is going to be more efficient, even if the data are indexed. It
> may be faster to use SQL indexes to retrieve matching events as-is, and
> do the final metadata filtering at the application level (i.e. in
> storage.impl_sqlalchemy).

Yep, I agree EAV is bad; that's why I'm proposing a largely denormalized
table for the raw/underlying data types: something easily queried, but
extensible.
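
For nova it might look something like this (column names are purely
illustrative; the Event Name + Cell + Host + Request ID query from
earlier becomes a plain indexed lookup):

    from sqlalchemy import (Column, DateTime, Integer, MetaData, String,
                            Table, Text)

    metadata = MetaData()

    # One wide, indexed table per datatype; rarely-queried attributes
    # still ride along in an opaque 'extra' column.
    nova_events = Table(
        'nova_events', metadata,
        Column('id', Integer, primary_key=True),
        Column('event_name', String(255), index=True),
        Column('cell', String(255), index=True),
        Column('host', String(255), index=True),
        Column('request_id', String(255), index=True),
        Column('generated_at', DateTime, index=True),
        Column('extra', Text),
    )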

>
> As you said, that should probably be tested.
>
> FTR I've created a blueprint on this:
>
> https://blueprints.launchpad.net/ceilometer/+spec/sqlalchemy-metadata-query

Thanks. We (RAX) are likely to be using MongoDB as our backend storage
system as well. Perhaps there's merit in having a discussion about
sticking with one or the other (SQL vs. NoSQL)?

Having one datatype per collection would certainly make things easier on
#2 mentioned above (especially around the migration side).
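
For example, with pymongo (collection and field names invented):

    from pymongo import MongoClient

    db = MongoClient()['ceilometer']

    # one collection per datatype; the multi-key filter from earlier
    # is just:
    matches = db['nova_events'].find({
        'event_name': 'compute.instance.create.end',
        'cell': 'cell-1',
        'host': 'compute-42',
        'request_id': 'req-77',
    })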

Thinking out loud: if we push the storage into the data-type driver, we
could likely have a different storage system per data type? (not sure
whether that's a good thing or not)
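
Again hypothetical, building on the DatatypeStore sketch above (the
store classes here are just stubs):

    class SqlalchemyEventStore(object):
        def __init__(self, url):
            self.url = url

        def record(self, datapoint):
            pass  # write to the wide per-datatype table shown earlier


    class MongoEventStore(object):
        def __init__(self, url):
            self.url = url

        def record(self, datapoint):
            pass  # insert into the per-datatype collection


    # map each datatype to whatever store suits it best
    STORES = {
        'nova.instance': SqlalchemyEventStore('mysql://...'),
        'cinder.volume': MongoEventStore('mongodb://...'),
    }


    def record(datapoint):
        STORES[datapoint['datatype']].record(datapoint)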

-S


