[openstack-dev] [ceilometer] resource_metadata and metaquery

Monty Taylor mordred at inaugust.com
Thu Jan 24 22:29:16 UTC 2013



On 01/25/2013 09:09 AM, Doug Hellmann wrote:
> 
> 
> On Thu, Jan 24, 2013 at 1:37 PM, Sandy Walsh
> <sandy.walsh at rackspace.com> wrote:
> 
> 
> 
>     On 01/24/2013 01:41 PM, Doug Hellmann wrote:
>     >
>     >
>     > On Thu, Jan 24, 2013 at 7:28 AM, Sandy Walsh
>     > <sandy.walsh at rackspace.com> wrote:
>     >
>     >     On 01/24/2013 05:52 AM, Julien Danjou wrote:
>     >     > On Thu, Jan 24, 2013, Sandy Walsh wrote:
>     >     >
>     >     >> This seems like a very inefficient schema requiring
>     >     >> multiple sub-queries.
>     >     >>
>     >     >> Other than the naming, is it really any different than the
>     >     >> current Metadata table when it comes to db performance?
>     >     >
>     >     > There's no metadata table currently, there's a metadata
>     >     > *column*. You can't do any filtering on that column in
>     >     > pure SQL.
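
A minimal sketch of the situation being described, with hypothetical
names (this is not Ceilometer's actual model, just an illustration of
a serialized-metadata column):

    import json

    from sqlalchemy import Column, Integer, String, Text
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Resource(Base):
        __tablename__ = 'resource'
        id = Column(Integer, primary_key=True)
        resource_id = Column(String(255), index=True)
        # All metadata lives in one opaque JSON blob, e.g.
        # '{"host": "compute-1", "cell": "cell-2"}'.
        resource_metadata = Column(Text)

    def metadata_matches(row, metaquery):
        # A metaquery like {'host': 'compute-1'} can only be answered
        # by deserializing the blob in Python; there is no column for
        # a SQL WHERE clause to use.
        metadata = json.loads(row.resource_metadata or '{}')
        return all(metadata.get(k) == v for k, v in metaquery.items())
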
>     >
>     >     Sorry, I should have said proposed metadata table.
>     >
>     >     >> I think a better approach would be to offer different
>     >     >> Metric types (extensible) which can control their own
>     >     >> mapping to the db.
>     >     >
>     >     > I can't see how you can do that and still support a large
>     >     > number of different metric types while staying generic.
>     >
>     >     I'll throw together a proposal. I think it can be done with a
>     >     set of two extensions:
>     >
>     >     1. the parser part of the event consumer (please excuse my
>     >     terminology abuse. "agent" perhaps?)
>     >     2. the database portion (which would need to deal with
>     >     migration, CRUD and advanced query). The hard part.
>     >
>     >     We would have to agree on a common format for passing these
>     >     data structures around. Probably just some base attributes +
>     >     "extra" bits. It would likely look like the metadata/dimension
>     >     structure, but under-the-hood could be handled efficiently.
>     >     This structure would also have a tag that would identify the
>     >     "handler" needed to deal with it. A datatype name, if you will.
>     >
>     >     UI, API, aggregation, etc. would all work with these generic
>     >     data structures.
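
A rough sketch of the kind of structure being proposed; all names here
are illustrative, not from any spec:

    from dataclasses import dataclass, field

    @dataclass
    class GenericSample:
        # Base attributes common to every datatype, plus a "handler"
        # tag naming the driver that knows how to store/migrate/query
        # it, and free-form "extra" bits.
        handler: str               # datatype name, e.g. "nova.instance"
        resource_id: str
        timestamp: str
        value: float
        extra: dict = field(default_factory=dict)

    sample = GenericSample(handler='nova.instance',
                           resource_id='instance-0001',
                           timestamp='2013-01-24T17:28:00Z',
                           value=1.0,
                           extra={'cell': 'cell-2', 'host': 'compute-1'})
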
>     >
>     >     Honestly I don't think there would be a whole lot of them.
>     >     Likely just one datatype per system (cinder, nova, quantum,
>     >     etc).
>     >
>     >     The aggregation system (aka multi-publisher) could listen for
>     >     data types it's interested in for roll-ups.
>     >
>     >     The potential downside is that we could end up with one
>     >     "monster datatype" which is a most-common-denominator of all
>     >     the important attributes across all systems (cinder, nova,
>     >     quantum, etc). I think we're going to end up with one of these
>     >     anyway once we get into the multi-publisher/aggregation
>     >     layers. eg: "Instance" or "Tenant"
>     >
>     >     I think I should do up a little video showing the type of db
>     >     data structures we've found useful in StackTach. They're
>     >     small, but non-trivial. It should really illustrate what
>     >     multi-publisher is going to need.
>     >
>     >     > But I think what you may want is to implement a dynamic SQL
>     >     > engine backend, creating and indexing the columns you want
>     >     > to query on. That's a solution, but we're trying to stay
>     >     > generic with the default sqlalchemy backend.
>     >
>     >     Wouldn't the end effect be the same (without the large impact
>     >     of an index-creation hit on the first request)? How would we
>     >     police the growth of db indices?
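
For illustration only, the "dynamic indexing" idea might look roughly
like this with SQLAlchemy (a hypothetical helper; adding the column
itself would still need a migration tool):

    from sqlalchemy import Index

    def ensure_index(engine, table, column_name):
        # Create an index the first time a key is queried; this is the
        # "index-creation hit on the first request" mentioned above.
        idx = Index('ix_%s_%s' % (table.name, column_name),
                    table.c[column_name])
        idx.create(bind=engine, checkfirst=True)
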
>     >
>     >     >> I'd be curious to see how the metadata table approach
>     >     >> performs when you are querying on multiple keys (like Event
>     >     >> Name + Cell + Host + Request ID, for example) with a large
>     >     >> number of rows. Has anyone tried this?
>     >     >
>     >     > I don't think anyone has. This blueprint draft was just
>     >     > something we talked about back then with Nick; we wrote down
>     >     > some ideas so as not to forget them and to have something to
>     >     > discuss.
>     >     >
>     >     > The problem is that metadata is EAV, and that plays badly
>     >     > with SQL (especially with SQL lowered down to basics by the
>     >     > ORM abstraction and SQLAlchemy). It's not clear that
>     >     > splitting the metadata into another table is going to be
>     >     > more efficient, even if the data is indexed. It may be
>     >     > faster to use SQL indexes to retrieve matching events as it
>     >     > is, and do the final metadata filtering at the application
>     >     > level (i.e. in storage.impl_sqlalchemy).
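
A minimal sketch of what that application-level filtering could look
like, reusing the hypothetical Resource model sketched earlier (this is
not the actual storage.impl_sqlalchemy code):

    import json

    def get_resources(session, resource_id, metaquery):
        # Let the SQL index narrow the candidate rows first...
        query = session.query(Resource).filter(
            Resource.resource_id == resource_id)
        # ...then apply the metaquery in Python on the JSON blob.
        for resource in query:
            metadata = json.loads(resource.resource_metadata or '{}')
            if all(metadata.get(k) == v for k, v in metaquery.items()):
                yield resource

    # e.g. get_resources(session, 'instance-0001',
    #                    {'host': 'compute-1', 'cell': 'cell-2'})
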
>     >
>     >     Yep, I agree EAV is bad; that's why I'm proposing a largely
>     >     denormalized table for the raw/underlying data types.
>     >     Something easily queried on, but extensible.
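
Purely illustrative again (reusing Base from the earlier sketch): a
"largely denormalized" table might promote the commonly queried
attributes (the Event Name + Cell + Host + Request ID example above) to
real, indexable columns and keep only the long tail in a blob:

    from sqlalchemy import Column, DateTime, Float, Integer, String, Text

    class InstanceSample(Base):  # hypothetical: one table per datatype
        __tablename__ = 'instance_sample'
        id = Column(Integer, primary_key=True)
        event_name = Column(String(255), index=True)
        cell = Column(String(255), index=True)
        host = Column(String(255), index=True)
        request_id = Column(String(255), index=True)
        timestamp = Column(DateTime, index=True)
        value = Column(Float)
        extra = Column(Text)  # anything not worth its own column
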
>     >
>     >     >
>     >     > As you said, that should probably be tested.
>     >     >
>     >     > FTR I've created a blueprint on this:
>     >     >
>     >     > https://blueprints.launchpad.net/ceilometer/+spec/sqlalchemy-metadata-query
>     >     >
>     >
>     >     Thanks. We (RAX) are likely to be using mongodb as our backend
>     >     storage system as well. Perhaps there's merit in having a
>     >     discussion about sticking with one or the other (sql vs no-sql)?
>     >
>     >     Having one datatype per collection would certainly make things
>     >     easier on #2 mentioned above (especially around the migration
>     >     side).
>     >
>     >     Thinking out loud: if we push the storage into the data type
>     >     driver, we could likely have different storage systems per
>     >     data type? (not sure if that's a good thing or not)
>     >
>     >
>     > When you say "one datatype per collection" do you mean one type of
>     > measurement?
> 
>     Yes. Sorry if I'm abusing the terminology here (not covered in
>     http://docs.openstack.org/developer/ceilometer/glossary.html )
> 
>     Reading that paragraph again, I think I could have said it better. I was
>     trying to say that having a no-sql schema would make things easier all
>     around.
> 
> 
> Yeah, I've found that to be the case, too. We've had some people express
> reluctance to deploy on anything other than MySQL, though, so we're
> trying to support SQL as well.

MongoDB is super problematic in some situations. It's not a tech
judgement on mongo itself - but the AGPL is untouchable by some folks.

OTOH - if you run into MySQL issues - I might know some people who know
a lot about it. :)

>     But to that point, each new data type (a metric? a measure? a counter?)
> 
> 
> The terminology confusion is definitely an issue, but the fault is ours,
> not yours. Angus wasn't around to help with naming things at that point
> in the project, so we can blame him. :-)
> 
> When you query the meter API you get back individual measurements that
> are called "samples" now, so let's use those terms (meter == the name of
> the thing measured and sample == the measurement). As we finish up the
> V2 API I expect that we'll update the glossary.
>  
> 
>     would have its own driver associated with it and get stored in mongo
>     under a separate collection. Certainly joins would be costly. They
>     could go under different keys in a single collection too.
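
A sketch of the per-datatype-collection idea with pymongo; the
collection and field names are hypothetical:

    import pymongo

    client = pymongo.MongoClient()
    db = client['ceilometer']

    # One collection per datatype; each can be indexed (or sharded)
    # on whatever matters most for that datatype.
    instance_samples = db['instance_samples']
    instance_samples.create_index(
        [('cell', pymongo.ASCENDING), ('host', pymongo.ASCENDING)])

    instance_samples.insert_one({
        'resource_id': 'instance-0001',
        'cell': 'cell-2',
        'host': 'compute-1',
        'timestamp': '2013-01-24T17:28:00Z',
        'value': 1.0,
    })
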
> 
> 
> What does that buy us? Does it make the indexing more efficient somehow,
> if the records all have more or less the same schema?
> 
> Doug
>  
> 
> 
>     (it's most likely no one would deploy in that fashion; just thinking
>     ahead a little, since the shard key would be dependent on what's
>     important in the data type)
> 
> 
>     -S
> 
> 
> 
>     > Doug
>     >
>     >
>     >
>     >     -S
>     >