[openstack-dev] [gate] concurrent workers are overwhelming postgresql in the gate - bug 1338841

Matt Riedemann mriedem at linux.vnet.ibm.com
Wed Jul 9 19:59:56 UTC 2014


Bug 1338841 [1] started showing up yesterday and I first noticed it on 
the change to set osapi_volume_workers equal to the number of CPUs 
available by default.  Similar patches for trove (api/conductor workers) 
and glance (api/registry workers) have landed in the last week also, and 
nova has been running with multiple api/conductor workers by default 
since Icehouse.

It looks like the cinder change pushed that job past the default
postgresql max_connections limit, and we started getting asynchronous
connection failures in that job. [2]

It's also worth noting that the postgresql job is the only one that runs
the nova api-metadata service, which has its own workers.

The VMs the jobs are running on have 8 VCPUs, so that's at least 88
workers (11 worker-spawning services x 8 VCPUs) between nova (3),
cinder (1), glance (2), trove (2), neutron, heat and ceilometer.

So osapi_volume_workers (8) + n-api-meta workers (8) seems to have 
tipped it over.
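
To make the arithmetic above concrete, here is a rough sketch of the
worker count. The per-project service counts are the ones listed above;
counting neutron, heat and ceilometer as one worker-spawning service each
is an assumption, as is the stock max_connections value of 100 (the usual
PostgreSQL default, which may differ on a given install).

```python
# Rough estimate of worker processes on an 8-VCPU gate node.
VCPUS = 8

# Number of worker-spawning services per project. The counts for neutron,
# heat and ceilometer are assumed to be 1 each.
services_per_project = {
    "nova": 3,
    "cinder": 1,
    "glance": 2,
    "trove": 2,
    "neutron": 1,
    "heat": 1,
    "ceilometer": 1,
}

# Assumed stock PostgreSQL limit; check the actual postgresql.conf.
PG_DEFAULT_MAX_CONNECTIONS = 100


def estimate_workers(services, vcpus):
    """Each service is assumed to fork one worker per VCPU."""
    return sum(services.values()) * vcpus


total_workers = estimate_workers(services_per_project, VCPUS)
headroom = PG_DEFAULT_MAX_CONNECTIONS - total_workers

print("estimated workers:", total_workers)
print("connections left under the assumed default:", headroom)
```

This is only a lower bound on connection usage, since a single worker
can hold more than one database connection at a time.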

The first attempt at a fix is to simply double the default 
max_connections value [3].
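
As an illustration only (this is not the code in [3]), the kind of
change involved could be sketched as a small helper that rewrites one
setting in postgresql.conf-style text. The helper name set_pg_option,
the sample text and the target value of 200 (double an assumed default
of 100) are all hypothetical.

```python
import re


def set_pg_option(conf_text, name, value):
    """Return conf_text with `name = value` set.

    Replaces the first existing line for `name` (commented out or not),
    dropping any trailing comment on that line; appends a new line if the
    option is not present at all.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(name) + r"[ \t]*=.*$",
        re.MULTILINE,
    )
    new_line = "{} = {}".format(name, value)
    if pattern.search(conf_text):
        # A function replacement avoids backslash escapes being
        # interpreted in the replacement string.
        return pattern.sub(lambda match: new_line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


# Hypothetical excerpt of a postgresql.conf file.
sample = "#fsync = on\nmax_connections = 100\t\t# (change requires restart)\n"
doubled = set_pg_option(sample, "max_connections", 200)
print(doubled)
```

Changing max_connections only takes effect after the postgresql server
is restarted.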

While looking up the postgresql configuration docs, I also read a bit on
synchronous_commit=off and fsync=off. It sounds like we might want to
think about using one of those in devstack runs, since they are
supposed to be more performant if you don't care about disaster recovery
(which we don't in gate runs on VMs).
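
For reference, a minimal sketch of what such an override might look
like when rendered as a postgresql.conf fragment. The function name
render_pg_overrides is made up for illustration, and which of the two
settings to use is exactly the open question here. Per the postgresql
docs, synchronous_commit=off can lose the most recent transactions on a
crash but does not risk corrupting the database, while fsync=off can
leave the database unrecoverable after a crash.

```python
def render_pg_overrides(overrides):
    """Render a dict of settings as postgresql.conf-style lines."""
    return "".join(
        "{} = {}\n".format(name, value) for name, value in overrides.items()
    )


# The milder of the two options: only recent commits are at risk.
sync_commit_fragment = render_pg_overrides({"synchronous_commit": "off"})

# The more aggressive option: no durability guarantees at all.
fsync_fragment = render_pg_overrides({"fsync": "off"})

print(sync_commit_fragment, end="")
print(fsync_fragment, end="")
```

Either fragment would only make sense for throwaway databases like the
ones in gate runs.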

Anyway, bumping max_connections might fix the gate. I'm just sending
this out to see if there are any postgresql experts out there with
additional tips or insights on things we can tweak or look for,
including whether it might be worthwhile to set
synchronous_commit=off or fsync=off for gate runs.

[1] https://bugs.launchpad.net/nova/+bug/1338841
[2] http://goo.gl/yRBDjQ
[3] https://review.openstack.org/#/c/105854/

-- 

Thanks,

Matt Riedemann



