<br><br><div class="gmail_quote">On Thu, Jan 24, 2013 at 1:37 PM, Sandy Walsh <span dir="ltr"><<a href="mailto:sandy.walsh@rackspace.com" target="_blank">sandy.walsh@rackspace.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
<br>
On 01/24/2013 01:41 PM, Doug Hellmann wrote:<br>
><br>
><br>
> On Thu, Jan 24, 2013 at 7:28 AM, Sandy Walsh <<a href="mailto:sandy.walsh@rackspace.com">sandy.walsh@rackspace.com</a><br>
</div><div><div class="h5">> <mailto:<a href="mailto:sandy.walsh@rackspace.com">sandy.walsh@rackspace.com</a>>> wrote:<br>
><br>
> On 01/24/2013 05:52 AM, Julien Danjou wrote:> On Thu, Jan 24 2013, Sandy<br>
> Walsh wrote:<br>
> ><br>
> >> This seems like a very inefficient schema requiring multiple sub-queries.<br>
> >><br>
> >> Other than the naming, is it really any different than the current<br>
> >> Metadata table when it comes to db performance?<br>
> ><br>
> > There's no metadata table currently, there's a metadata *column*.<br>
> > You can't do any filtering based on that column in pure SQL.<br>
><br>
> Sorry, I should have said proposed metadata table.<br>
><br>
> >> I think a better approach would be to offer different Metric types<br>
> >> (extensible) which can control their own mapping to the db.<br>
> ><br>
> > I can't see how you can do that and support a large number of<br>
> > different metric types while staying generic.<br>
><br>
> I'll throw together a proposal. I think it can be done with a set of two<br>
> extensions:<br>
><br>
> 1. the parser part of the event consumer (please excuse my terminology<br>
> abuse. "agent" perhaps?)<br>
> 2. the database portion (which would need to deal with migration, CRUD<br>
> and advanced query). The hard part.<br>
><br>
> We would have to agree on a common format for passing these data<br>
> structures around. Probably just some base attributes + "extra" bits. It<br>
> would likely look like the metadata/dimension structure, but<br>
> under-the-hood could be handled efficiently. This structure would also<br>
> have a tag that would identify the "handler" needed to deal with it. A<br>
> datatype name, if you will.<br>
><br>
> UI, API, aggregation, etc would all work with these generic data<br>
> structures.<br>
><br>
> Honestly I don't think there would be a whole lot of them. Likely, just<br>
> one datatype per system (cinder, nova, quantum, etc).<br>
><br>
> The aggregation system (aka multi-publisher) could listen for data types<br>
> it's interested in for roll-ups.<br>
><br>
> The potential downside is that we could end up with one "monster<br>
> datatype" which is a most-common-denominator of all the important<br>
> attributes across all systems (cinder, nova, quantum, etc). I think<br>
> we're going to end up with one of these anyway once we get into the<br>
> multi-publisher/aggregation layers. eg: "Instance" or "Tenant"<br>
><br>
> I think I should do up a little video showing the type of db data<br>
> structures we've found useful in StackTach. They're small, but<br>
> non-trivial. It should really illustrate what multi-publisher is going<br>
> to need.<br>
><br>
> > But I think that what you may want is to implement a dynamic SQL engine<br>
> > backend that creates and indexes the columns you want to query on. That's a<br>
> > solution, but we're trying to be generic with the default sqlalchemy<br>
> > backend.<br>
><br>
> Wouldn't the end effect be the same (without the large impact of an<br>
> index creation hit on first request)? How would we police the growth of<br>
> db indices?<br>
><br>
> >> I'd be curious to see how the metadata table approach performs when you<br>
> >> are querying on multiple keys (like Event Name + Cell + Host + Request<br>
> >> ID, for example) with a large number of rows. Has anyone tried this?<br>
> ><br>
> > I don't think anyone has. This blueprint draft was just something we<br>
> > talked about back then with Nick; we wrote down some ideas so we would not<br>
> > forget them and would have something to discuss.<br>
> ><br>
> > The problem is that metadata are EAV and that plays badly with SQL (and<br>
> > especially with SQL lowered down to basics thanks to ORM abstraction and<br>
> > SQLAlchemy). It's not clear that splitting the metadata into another<br>
> > table is going to be more efficient, even if the data are indexed. It may be<br>
> > faster to use SQL indexes to retrieve matching events as it is, and do<br>
> > the final metadata filtering at application level (i.e. in<br>
> > storage.impl_sqlalchemy).<br>
><br>
> Yep, I agree EAV is bad, that's why I'm proposing a largely denormalized<br>
> table for the raw/underlying data types. Something easily queried on,<br>
> but extensible.<br>
><br>
> ><br>
> > As you said, that should probably be tested.<br>
> ><br>
> > FTR I've created a blueprint on this:<br>
> ><br>
> ><br>
> <a href="https://blueprints.launchpad.net/ceilometer/+spec/sqlalchemy-metadata-query" target="_blank">https://blueprints.launchpad.net/ceilometer/+spec/sqlalchemy-metadata-query</a><br>
> ><br>
><br>
> Thanks. We (RAX) are likely to be using mongodb as our backend storage<br>
> system as well. Perhaps there's merit in having a discussion about<br>
> sticking with one or the other (sql vs no-sql)?<br>
><br>
> Having one datatype per collection would certainly make things easier on<br>
> #2 mentioned above (especially around the migration side).<br>
><br>
> Thinking out loud: If we push the storage into the data type driver we<br>
> could likely have different storage systems per data type? (not sure if<br>
> that's a good thing or not)<br>
><br>
><br>
> When you say "one datatype per collection" do you mean one type of<br>
> measurement?<br>
<br>
</div></div>Yes. Sorry if I'm abusing the terminology here (not covered in<br>
<a href="http://docs.openstack.org/developer/ceilometer/glossary.html" target="_blank">http://docs.openstack.org/developer/ceilometer/glossary.html</a> )<br>
<br>
Reading that paragraph again, I think I could have said it better. I was<br>
trying to say that having a no-sql schema would make things easier all<br>
around.<br></blockquote><div><br></div><div>Yeah, I've found that to be the case, too. We've had some people express reluctance to deploy on anything other than MySQL, though, so we're trying to support SQL as well.</div>
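<div><br></div><div>To make the SQL side concrete, here is a rough sketch of the application-level filtering Julien described above: let an SQL index narrow the rows on a fixed column, then check the metadata keys in Python. Everything in it is an assumption for illustration (the <code>event</code> table and its columns, the <code>find_events</code> and <code>make_demo_db</code> helpers, JSON-encoded metadata, and sqlite3 standing in for MySQL); it is not the actual storage.impl_sqlalchemy code.</div>
<pre>
```python
import json
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY, meter TEXT, metadata TEXT)"
    )
    # Only the fixed column is indexed; the metadata blob is opaque to SQL.
    conn.execute("CREATE INDEX ix_event_meter ON event (meter)")
    demo_rows = [
        ("instance", {"host": "c1", "cell": "east"}),
        ("instance", {"host": "c2", "cell": "east"}),
        ("volume", {"host": "c1", "cell": "east"}),
    ]
    conn.executemany(
        "INSERT INTO event (meter, metadata) VALUES (?, ?)",
        [(meter, json.dumps(metadata)) for meter, metadata in demo_rows],
    )
    return conn


def find_events(conn, meter, wanted):
    """Select by the indexed column in SQL, then filter metadata in Python."""
    rows = conn.execute(
        "SELECT id, meter, metadata FROM event WHERE meter = ?", (meter,)
    )
    matches = []
    for event_id, name, raw in rows:
        metadata = json.loads(raw)
        # Keep the row only if every requested key has the requested value.
        if all(metadata.get(key) == value for key, value in wanted.items()):
            matches.append((event_id, name, metadata))
    return matches
```
</pre>
<div>Calling <code>find_events(make_demo_db(), "instance", {"host": "c1", "cell": "east"})</code> returns only the first demo row. The cost is that every row matching the indexed column is pulled back to the application before filtering, which is the part that would need to be measured against the separate metadata table.</div>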
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
But to that point, each new data type (a metric? a measure? a counter?)<br></blockquote><div><br></div><div>The terminology confusion is definitely an issue, but the fault is ours, not yours. Angus wasn't around to help with naming things at that point in the project, so we can blame him. :-)</div>
<div><br></div><div>When you query the meter API you get back individual measurements that are called "samples" now, so let's use those terms (meter == the name of the thing measured and sample == the measurement). As we finish up the V2 API I expect that we'll update the glossary.</div>
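<div><br></div><div>As a rough illustration of those two terms, a sample can be thought of as a small record that carries the name of its meter. Apart from that relationship, the field names below (<code>resource_id</code>, <code>volume</code>, <code>metadata</code>) and the <code>group_by_meter</code> helper are placeholders of mine, not the V2 API schema.</div>
<pre>
```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One measurement. Field names are illustrative placeholders."""
    meter: str        # name of the thing measured, e.g. "cpu_util"
    resource_id: str  # hypothetical identifier of the measured resource
    volume: float     # the measured value
    metadata: dict = field(default_factory=dict)


def group_by_meter(samples):
    """Collect samples under the meter name they belong to."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample.meter].append(sample)
    return dict(grouped)
```
</pre>
<div>So a meter is just the key, and the samples are the values recorded under it.</div>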
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
would have its own driver associated with it and get stored in mongo under<br>
a separate collection. Certainly joins would be costly. They could go<br>
under different keys in a single collection too.<br></blockquote><div><br></div><div>What does that buy us? Does it make the indexing more efficient somehow, if the records all have more or less the same schema?</div><div><br>
</div><div>Doug</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
(it's most likely no one would deploy in that fashion, just thinking<br>
ahead a little where the shard key would be dependent on what's<br>
important in the data type) </blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
-S<br>
<div class="im"><br>
<br>
<br>
> Doug<br>
><br>
><br>
><br>
> -S<br>
><br>
> _______________________________________________<br>
> OpenStack-dev mailing list<br>
> <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
</div>> <mailto:<a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a>><br>
> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<div class="HOEnZb"><div class="h5">><br>
</div></div></blockquote></div><br>