[openstack-dev] [ceilometer] resource_metadata and metaquery

Doug Hellmann doug.hellmann at dreamhost.com
Thu Jan 24 22:09:31 UTC 2013


On Thu, Jan 24, 2013 at 1:37 PM, Sandy Walsh <sandy.walsh at rackspace.com> wrote:

>
>
> On 01/24/2013 01:41 PM, Doug Hellmann wrote:
> >
> >
> > On Thu, Jan 24, 2013 at 7:28 AM, Sandy Walsh <sandy.walsh at rackspace.com> wrote:
> >
> >     On 01/24/2013 05:52 AM, Julien Danjou wrote:
> >     > On Thu, Jan 24 2013, Sandy Walsh wrote:
> >     >
> >     >> This seems like a very inefficient schema requiring multiple
> >     >> sub-queries.
> >     >>
> >     >> Other than the naming, is it really any different than the current
> >     >> Metadata table when it comes to db performance?
> >     >
> >     > There's no metadata table currently, there's a metadata *column*.
> >     > You can't do any filtering based on that current column in pure
> >     > SQL.
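
(For anyone following along, a rough sketch of why that is -- illustrative
only, not the exact ceilometer model definition:

    # hypothetical, simplified model -- the real one lives in
    # ceilometer's sqlalchemy storage driver
    from sqlalchemy import Column, Integer, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Resource(Base):
        __tablename__ = 'resource'
        id = Column(Integer, primary_key=True)
        resource_id = Column(String(255), index=True)
        # the whole metadata dict, json.dumps()'d into one column;
        # a WHERE clause can't address individual keys in here
        resource_metadata = Column(Text)

so a metaquery like {'metadata.host': 'compute-01'} has nothing it can
filter against in SQL.)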
> >
> >     Sorry, I should have said proposed metadata table.
> >
> >     >> I think a better approach would be to offer different Metric types
> >     >> (extensible) which can control their own mapping to the db.
> >     >
> >     > I can't see how you can do that and still support a large number
> >     > of different metric types while staying generic.
> >
> >     I'll throw together a proposal. I think it can be done with a set
> >     of two extensions:
> >
> >     1. the parser part of the event consumer (please excuse my
> >     terminology abuse. "agent" perhaps?)
> >     2. the database portion (which would need to deal with migration,
> >     CRUD and advanced query). The hard part.
> >
> >     We would have to agree on a common format for passing these data
> >     structures around. Probably just some base attributes + "extra"
> >     bits. It would likely look like the metadata/dimension structure,
> >     but under-the-hood could be handled efficiently. This structure
> >     would also have a tag that would identify the "handler" needed to
> >     deal with it. A datatype name, if you will.
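
(Sketching what I think you mean here, with invented names -- purely
hypothetical:

    # base attributes shared by every event, free-form "extra" bits,
    # and a tag naming the handler/datatype that knows how to deal
    # with the extra bits
    event = {
        'datatype': 'nova.instance',      # the "handler" tag
        'when': '2013-01-24T17:28:00Z',   # base attributes...
        'tenant_id': '8e4f0c1d',
        'resource_id': 'instance-0042',
        'extra': {                        # handler-specific bits
            'cell': 'cell-a',
            'host': 'compute-01',
            'request_id': 'req-deadbeef',
        },
    }

something like that?)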
> >
> >     UI, API, aggregation, etc would all work with these generic data
> >     structures.
> >
> >     Honestly I don't think there would be a whole lot of them. Likely
> >     just one datatype per system (cinder, nova, quantum, etc).
> >
> >     The aggregation system (aka multi-publisher) could listen for data
> >     types it's interested in for roll-ups.
> >
> >     The potential downside is that we could end up with one "monster
> >     datatype" which is a most-common-denominator of all the important
> >     attributes across all systems (cinder, nova, quantum, etc). I think
> >     we're going to end up with one of these anyway once we get into the
> >     multi-publisher/aggregation layers. eg: "Instance" or "Tenant"
> >
> >     I think I should do up a little video showing the type of db data
> >     structures we've found useful in StackTach. They're small, but
> >     non-trivial. It should really illustrate what multi-publisher is
> >     going to need.
> >
> >     > But I think what you may want is to implement a dynamic SQL
> >     > engine backend that creates and indexes the columns you want to
> >     > query on. That's a solution, but we're trying to stay generic
> >     > with the default sqlalchemy backend.
> >
> >     Wouldn't the end effect be the same (without the large impact of
> >     an index creation hit on first request)? How would we police the
> >     growth of db indices?
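
(For concreteness, I read that suggestion as something like this --
hypothetical code, nothing like it exists in the tree:

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql://ceilometer@localhost/ceilometer')

    def promote_metadata_key(key):
        """Lift a metadata key into its own indexed column on first use."""
        col = 'meta_%s' % key
        with engine.begin() as conn:
            conn.execute(text(
                'ALTER TABLE meter ADD COLUMN %s VARCHAR(255)' % col))
            conn.execute(text(
                'CREATE INDEX ix_%s ON meter (%s)' % (col, col)))

every new key someone queries on would grow the table and its indices,
hence the policing question.)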
> >
> >     >> I'd be curious to see how the metadata table approach performs
> >     >> when you are querying on multiple keys (like Event Name + Cell
> >     >> + Host + Request ID, for example) with a large number of rows.
> >     >> Has anyone tried this?
> >     >
> >     > I don't think anyone has. This blueprint draft was just something
> >     > we talked about back then with Nick; we wrote down some ideas so
> >     > we wouldn't forget them and would have something to discuss.
> >     >
> >     > The problem is that metadata is EAV, and that plays badly with
> >     > SQL (especially with SQL lowered down to basics thanks to the
> >     > ORM abstraction and SQLAlchemy). It's not clear that splitting
> >     > the metadata into another table is going to be more efficient,
> >     > even if the data is indexed. It may be faster to use SQL indexes
> >     > to retrieve matching events as it is, and do the final metadata
> >     > filtering at the application level (i.e. in
> >     > storage.impl_sqlalchemy).
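
(That two-phase approach might look roughly like this -- illustrative
pseudo-implementation, not the actual storage.impl_sqlalchemy code;
'Resource' is the hypothetical model sketched earlier in the thread:

    import json

    def get_resources(session, filters, metaquery):
        # phase 1: indexed columns do the heavy lifting in SQL
        query = session.query(Resource).filter_by(**filters)
        # phase 2: the metadata blob is opaque to SQL, so unpack it
        # and filter in Python
        for row in query:
            metadata = json.loads(row.resource_metadata or '{}')
            if all(metadata.get(k) == v for k, v in metaquery.items()):
                yield row

the worry being that phase 2 pulls every candidate row back to the
application first.)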
> >
> >     Yep, I agree EAV is bad, that's why I'm proposing a largely
> >     denormalized table for the raw/underlying data types. Something
> >     easily queried on, but extensible.
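
(i.e. something like this? -- hypothetical sketch, column names invented:

    from sqlalchemy import Column, DateTime, Integer, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class RawEvent(Base):
        __tablename__ = 'raw_event'
        id = Column(Integer, primary_key=True)
        # the common, heavily queried attributes get real, indexed
        # columns...
        datatype = Column(String(64), index=True)   # the handler tag
        event_name = Column(String(255), index=True)
        when = Column(DateTime, index=True)
        tenant_id = Column(String(64), index=True)
        host = Column(String(255), index=True)
        request_id = Column(String(64), index=True)
        # ...and anything handler-specific stays in an opaque blob
        extra = Column(Text)

queries on the common attributes hit indexes, and the blob just comes
along for the ride.)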
> >
> >     >
> >     > As you said, that should probably be tested.
> >     >
> >     > FTR I've created a blueprint on this:
> >     >
> >     > https://blueprints.launchpad.net/ceilometer/+spec/sqlalchemy-metadata-query
> >
> >     Thanks. We (RAX) are likely to be using mongodb as our backend
> >     storage system as well. Perhaps there's merit in having a
> >     discussion about sticking with one or the other (sql vs no-sql)?
> >
> >     Having one datatype per collection would certainly make things
> >     easier on #2 mentioned above (especially around the migration
> >     side).
> >
> >     Thinking out loud: if we push the storage into the data type
> >     driver, we could likely have different storage systems per data
> >     type? (not sure if that's a good thing or not)
> >
> >
> > When you say "one datatype per collection" do you mean one type of
> > measurement?
>
> Yes. Sorry if I'm abusing the terminology here (not covered in
> http://docs.openstack.org/developer/ceilometer/glossary.html )
>
> Reading that paragraph again, I think I could have said it better. I was
> trying to say that having a no-sql schema would make things easier all
> around.
>

Yeah, I've found that to be the case, too. We've had some people express
reluctance to deploy on anything other than MySQL, though, so we're trying
to support SQL as well.


>
> But to that point, each new data type (a metric? a measure? a counter?)
>

The terminology confusion is definitely an issue, but the fault is ours,
not yours. Angus wasn't around to help with naming things at that point in
the project, so we can blame him. :-)

When you query the meter API you get back individual measurements that are
called "samples" now, so let's use those terms (meter == the name of the
thing measured and sample == the measurement). As we finish up the V2 API I
expect that we'll update the glossary.


> would have its own driver associated with it and get stored in mongo
> under a separate collection. Certainly joins would be costly. They could
> go under different keys in a single collection too.
>

What does that buy us? Does it make the indexing more efficient somehow, if
the records all have more or less the same schema?
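
Just to be sure we're picturing the same two layouts -- a hypothetical
pymongo sketch, not real code:

    from pymongo import MongoClient

    db = MongoClient().ceilometer

    # layout A: one collection per data type
    db['nova.instance'].insert({'when': '...', 'host': 'compute-01'})
    db['cinder.volume'].insert({'when': '...', 'size_gb': 20})

    # layout B: a single collection, discriminated (and, down the
    # road, maybe sharded) by the datatype key
    db.events.insert({'datatype': 'nova.instance',
                      'when': '...', 'host': 'compute-01'})
    db.events.insert({'datatype': 'cinder.volume',
                      'when': '...', 'size_gb': 20})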

Doug


>
> (it's most likely no one would deploy in that fashion, just thinking
> ahead a little where the shard key would be dependent on what's
> important in the data type)


> -S
>
>
>
> > Doug
> >
> >
> >
> >     -S