[openstack-dev] Thoughts on the patch test failure rate and moving forward

Joshua Harlow harlowja at outlook.com
Thu Jul 24 20:19:52 UTC 2014


On Jul 24, 2014, at 12:54 PM, Sean Dague <sean at dague.net> wrote:

> On 07/24/2014 02:51 PM, Joshua Harlow wrote:
>> A potentially brilliant idea ;-)
>> 
>> Aren't all the machines the gate runs tests on VMs running via OpenStack APIs?
>> 
>> OpenStack supports snapshotting (last time I checked). So instead of providing back a whole bunch of log files, provide back a snapshot of the machine(s) that ran the tests; let the person who wants that snapshot download it (and then they can boot it up in virtualbox, qemu, their own OpenStack cloud...) and investigate all the log files they desire. 
>> 
>> Are we really being so conservative on space that we couldn't do this? I find it hard to believe that space is a concern for anything anymore (if it really matters, store the snapshots in ceph, glusterfs, swift, or something else... which should dedup the blocks). This is pretty common with how people use snapshots and what they back them with anyway, so it would be nice if infra exposed the same thing...
>> 
>> Would something like that be possible? I'm not so familiar with all the inner workings of the infra project; but if it eventually boots VMs using an OpenStack cloud, it would seem reasonable that it could provide the same mechanisms we are all already used to using...
>> 
>> Thoughts?
> 
> There are actual space concerns, especially when we're talking about 20k
> runs / week. At that point snapshots are probably in the neighborhood
> of 10G, so we're talking about 200 TB / week of storage. Plus there's the
> technical detail that the glance endpoints are really quite beta in the
> clouds we use. Remember, our test runs aren't pets, they are cattle; we
> need to figure out the right distillation of data and move on, as there
> isn't enough space or time to keep everything around.

Sure, not pets... then save only the failing ones (the broken cattle)?
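
To make that concrete, here's a rough sketch of what a post-run hook could
look like, only snapshotting the node when the run failed. This is purely
illustrative: the NODE_ID environment variable, the image naming, and the
idea that infra would wire this into its cleanup step are my assumptions,
not how things actually work today.

    # Illustrative only: snapshot the test node when (and only when) the
    # run failed.  NODE_ID and the image naming scheme are hypothetical.
    import os
    from novaclient import client as nova_client

    def snapshot_if_failed(run_succeeded, change_id):
        if run_succeeded:
            return None  # healthy cattle, nothing worth keeping
        nova = nova_client.Client(
            '2',
            os.environ['OS_USERNAME'],
            os.environ['OS_PASSWORD'],
            os.environ['OS_TENANT_NAME'],
            os.environ['OS_AUTH_URL'])
        server = nova.servers.get(os.environ['NODE_ID'])
        # Ask the cloud to snapshot the instance into glance; the result
        # is a normal image someone could later download and boot.
        return nova.servers.create_image(
            server, 'failed-run-%s' % change_id)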

Is 200 TB/week really how much would actually be stored once ceph or another backend applies data deduplication? Do Rackspace or HP (the VM providers for infra, afaik) use that or a similar deduping technology for storing snapshots?

I agree about the right distillation, and maybe it's not always needed, but it would be nice to have a button on gerrit that you could activate within a certain amount of time after the run to fetch the images the VMs used during the tests (yes, the download would likely be huge) if you really want to set up the exact same environment the test failed with. Maybe have that button expire after a week (then you only need 200 TB of *expiring* space).
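
And the "expiring" part doesn't even need a reaper process: swift already
supports per-object expiry via the X-Delete-After header. A rough sketch,
assuming the snapshot has already been exported to a local file (the
container and file names here are made up):

    # Illustrative only: upload an exported snapshot to swift and have it
    # auto-delete after a week via X-Delete-After.
    import os
    from swiftclient import client as swift_client

    conn = swift_client.Connection(
        authurl=os.environ['OS_AUTH_URL'],
        user=os.environ['OS_USERNAME'],
        key=os.environ['OS_PASSWORD'],
        tenant_name=os.environ['OS_TENANT_NAME'],
        auth_version='2')

    with open('failed-run-12345.qcow2', 'rb') as image:
        conn.put_object(
            'failed-run-snapshots',        # container (hypothetical)
            'failed-run-12345.qcow2',      # object name (hypothetical)
            contents=image,
            headers={'X-Delete-After': str(7 * 24 * 3600)})  # gone in a week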

> Also portability of system images is... limited between hypervisors.
> 
> If this is something you'd like to see if you could figure out the hard
> parts of, I invite you to dive in on the infra side. It's very easy to
> say it's easy. :) Actually coming up with a workable solution requires a
> ton more time and energy.

Of course, that goes without saying.

I guess I thought this was a ML for discussions and thoughts (hence the 'Thoughts' part of the subject line), and that it need not be a complete solution right off the bat.

Just an idea anyway...

> 
> 	-Sean
> 
> -- 
> Sean Dague
> http://dague.net
