Perhaps, but not so fast. I still need to investigate further.
On Thu, 8 Aug 2019 at 10:36 PM, Mark Goddard <mark@stackhpc.com> wrote:
On Thu, 8 Aug 2019 at 11:39, Eddie Yen <missile0407@gmail.com> wrote:
Hi Mark, thanks for the suggestion.
I think so, too. cinder-api may be fine, but HAProxy could be very busy with one controller down. I'll try increasing the cinder-api timeout value.
Will you be proposing this fix upstream?
On Wed, 7 Aug 2019 at 12:06 AM, Mark Goddard <mark@stackhpc.com> wrote:
On Tue, 6 Aug 2019 at 16:33, Matt Riedemann <mriedemos@gmail.com> wrote:
On 8/6/2019 7:18 AM, Mark Goddard wrote:
We do use a larger timeout for glance-api (haproxy_glance_api_client_timeout and haproxy_glance_api_server_timeout, both 6h). Perhaps we need something similar for cinder-api.
A 6 hour timeout for cinder API calls would be nuts IMO. The thing that was failing was a volume attachment delete/create from what I recall, which is the newer version (as of Ocata?) for the old initialize_connection/terminate_connection APIs. These are synchronous RPC calls from cinder-api to cinder-volume to do things on the storage backend and we have seen them take longer than 60 seconds in the gate CI runs with the lvm driver. I think the investigation normally turned up lvchange taking over 60 seconds on some concurrent operation locking out the RPC call, which eventually results in the MessagingTimeout from oslo.messaging.

That's unrelated to your gateway timeout from HAProxy, but the point is, yeah, you likely want to bump up those timeouts since cinder-api has these synchronous calls to the cinder-volume service. I just don't think you need to go to 6 hours :). I think the keystoneauth1 default http response timeout is 10 minutes, so maybe try that.
Yeah, wasn't advocating for 6 hours - just showing which knobs are available :)
--
Thanks,
Matt
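
To make the layering discussed above concrete, here is a rough Python sketch of which timeout would trip first for a slow volume attachment call. It is only a model, not anything taken from kolla-ansible or cinder: the Layer type, the first_to_expire helper, and every number in it are assumptions made up for illustration (the 600 second value just mirrors the 10 minute suggestion above, and the 120 second RPC value is arbitrary). Check your own deployment for the real settings.

```python
from typing import NamedTuple, Optional, Sequence


class Layer(NamedTuple):
    """One hop in the request path and the timeout it enforces (hypothetical model)."""
    name: str
    timeout_s: float
    symptom: str


def first_to_expire(layers: Sequence[Layer], duration_s: float) -> Optional[Layer]:
    """Return the layer whose timeout is hit first, or None if the call finishes in time."""
    expired = [layer for layer in layers if duration_s > layer.timeout_s]
    if not expired:
        return None
    # The shortest expired timeout is the one the caller actually sees.
    return min(expired, key=lambda layer: layer.timeout_s)


# Illustrative values only; these are NOT confirmed defaults of any project.
BEFORE = [
    Layer("haproxy", 60, "gateway timeout from HAProxy"),
    Layer("rpc", 120, "MessagingTimeout from oslo.messaging"),
]

# Same path after raising only the HAProxy timeout to 10 minutes.
AFTER = [
    Layer("haproxy", 600, "gateway timeout from HAProxy"),
    Layer("rpc", 120, "MessagingTimeout from oslo.messaging"),
]

if __name__ == "__main__":
    for label, layers in (("before", BEFORE), ("after", AFTER)):
        for duration in (90, 150):
            hit = first_to_expire(layers, duration)
            outcome = hit.symptom if hit else "completes"
            print(f"{label}: {duration}s call -> {outcome}")
```

In this model, raising the HAProxy timeout lets a 90 second call through, but a 150 second call then runs into the RPC limit instead, which is why the two timeouts are worth looking at separately.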