[openstack-dev] Gate proposal - drop Postgresql configurations in the gate

Sean Dague sean at dague.net
Fri Jun 13 11:06:17 UTC 2014


On 06/12/2014 10:18 PM, Dan Prince wrote:
> On Thu, 2014-06-12 at 09:24 -0700, Joe Gordon wrote:
>>
>> On Jun 12, 2014 8:37 AM, "Sean Dague" <sean at dague.net> wrote:
>>>
>>> On 06/12/2014 10:38 AM, Mike Bayer wrote:
>>>>
>>>> On 6/12/14, 8:26 AM, Julien Danjou wrote:
>>>>> On Thu, Jun 12 2014, Sean Dague wrote:
>>>>>
>>>>>> That's not catchable in unit or functional tests?
>>>>> Not in an accurate manner, no.
>>>>>
>>>>>> Keeping jobs alive based on the theory that they might one day be useful
>>>>>> is something we just don't have the liberty to do any more. We've not
>>>>>> seen an idle node in zuul in 2 days... and we're only at j-1. j-3 will
>>>>>> be at least +50% of this load.
>>>>> Sure, I'm not saying we don't have a problem. I'm just saying it's not a
>>>>> good solution to fix that problem IMHO.
>>>>
>>>> Just my 2c without having a full understanding of all of OpenStack's CI
>>>> environment, Postgresql is definitely different enough that MySQL
>>>> "strict mode" could still allow issues to slip through quite easily, and
>>>> also as far as capacity issues, this might be longer term but I'm hoping
>>>> to get database-related tests to be lots faster if we can move to a
>>>> model that spends much less time creating databases and schemas.
>>>
>>> This is what I mean by functional testing. If we were directly hitting a
>>> real database on a set of in tree project tests, I think you could
>>> discover issues like this. Neutron was headed down that path.
>>>
>>> But if we're talking about a devstack / tempest run, it's not really
>>> applicable.
>>>
>>> If someone can point me to a case where we've actually found this kind
>>> of bug with tempest / devstack, that would be great. I've just *never*
>>> seen it. I was the one that did most of the fixing for pg support in
>>> Nova, and have helped other projects as well, so I'm relatively familiar
>>> with the kinds of fails we can discover. The ones that Julien pointed
>>> really aren't likely to be exposed in our current system.
>>>
>>> Which is why I think we're mostly just burning cycles on the existing
>>> approach for no gain.
>>
>> Given all the points made above, I think dropping PostgreSQL is the
>> right choice; if only we had infinite cloud that would be another
>> story.
>>
>> What about converting one of our existing jobs (grenade partial ncpu,
>> large ops, regular grenade, tempest with nova network etc.) into a
>> PostgreSQL-only job? We could get some level of PostgreSQL testing
>> without any additional jobs, although this is a tradeoff, obviously.
> 
> I'd be fine with this tradeoff if it allows us to keep PostgreSQL in the
> mix.

The problem isn't just testing, it's people looking at the failures in
the different configurations.

I'm glad everyone loves having lots of configurations. :)

I'm less glad we've got a 24hr merge queue in the gate because very few
people are actually sifting through the failed results to figure out why
and fix them. :(

If we had more people looking through failures then it would just be a
machine capacity problem. But it's not, it's also a people capacity problem.

It's just not sustainable as a project. Pleading with people to help on
the failure side has not worked over the last year. So I really think
we're at a point where we need to start throwing out jobs until we reduce
the failure rate to one where we can actually make forward progress.

Right now we typically can't land fixes for the race conditions in any
timely manner, because they get stomped by other races. I've got a giant
set of outstanding patches to make some of this stuff clearer, and they
are all stuck.

So if we can't evolve the system back towards health, we need to just
cut a bunch of stuff off until we can.

	-Sean

-- 
Sean Dague
http://dague.net
