[OpenStack-Infra] Your Gerrit Account has been temporarily disabled

Jeremy Stanley fungi at yuggoth.org
Fri Jun 5 15:51:56 UTC 2015


On 2015-06-05 15:12:42 +0000 (+0000), Znoinski, Waldemar wrote:
[...]
> I'd like to know more about what you saw and/or what was causing
> (or you think was) the problem.

Specifically, we saw all available Gerrit stream-events worker
threads busy servicing your connections, and all other stream-events
tasks queued and waiting for an available worker thread.
Unfortunately, when Gerrit gets into this situation, it seems that
merely killing the tasks being serviced does not wake up the waiting
tasks, so all the other stream-events connections get no new
updates until we restart the entire Gerrit service.

> * Was it too many connections spawn in a given amount of time?

No, it looked like they had been opened at different times over the
course of at least several hours.

> * Were the connections long lasting (possible lack of closing the
> connections)?

I think this may be the problem: not that the connections are long
lasting, but that they stay open after they are actually defunct.
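
If defunct-but-open connections do turn out to be the cause, one
possible client-side precaution is to have the SSH client send
periodic keepalives, so that idle firewall state is less likely to
expire and the client notices a dead connection. A minimal sketch,
assuming the OpenSSH client and Gerrit's default SSH port 29418; the
helper name and the "example-ci" account are made up for
illustration, and whether this frees anything on the Gerrit side is
not established:

```python
def stream_events_ssh_argv(user, host="review.openstack.org", port=29418,
                           alive_interval=30, alive_count_max=3):
    """Build an argv list for `ssh ... gerrit stream-events` with keepalives.

    ServerAliveInterval and ServerAliveCountMax are OpenSSH client options:
    the client sends a keepalive after `alive_interval` idle seconds and
    gives up after `alive_count_max` unanswered keepalives.
    The user name is a placeholder; port 29418 is assumed (Gerrit default).
    """
    return [
        "ssh",
        "-o", f"ServerAliveInterval={alive_interval}",
        "-o", f"ServerAliveCountMax={alive_count_max}",
        "-p", str(port),
        f"{user}@{host}",
        "gerrit", "stream-events",
    ]


# Hypothetical account name, for illustration only.
argv = stream_events_ssh_argv("example-ci")
```

The resulting list can be handed to subprocess.Popen; the values 30
and 3 are arbitrary starting points, not recommendations from the
Gerrit documentation.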

> * Was the command inside the ssh session not finishing/hanging (or
> long running) ?

It's unfortunately hard to tell from what little detail we get in
Gerrit logs and thread dumps.

> What I see my side for last 24h period is ~10 connections to
> review.openstack.org which were hanging and not closed my side,
> yet not doing anything as far as I can tell. From your description
> of the problem that may be it - Gerrit threads consumed
> unnecessarily. If you have connection details of the problematic
> ssh sessions (source port at least) it would be great.

I don't, but we might be able to recreate this problem now that we
have a little better idea of the surrounding circumstances.

> As I understand listening of Gerrit event stream is not causing
> the issue but the second part (to run 'gerrit query' over ssh) is
> - correct me if I'm wrong.
[...]

It's actually the gerrit stream-events connections that are the
problem, not gerrit query, as far as we can tell. Was there anything
unusual about your open stream-events connections from your end, as
far as you know? I'm sort of wondering whether connections which get
uncleanly terminated at the client (firewall drops an existing state
and doesn't spoof a FIN or RST or send a relevant ICMP error) cause
the socket buffer to fill up and then the worker threads block on
write once that happens. Speculation for now, but once we can nail
this down hopefully we'll be able to provide an actionable bug
report to the Gerrit developer community.
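
To illustrate the mechanism I'm speculating about (this is not
Gerrit's code, just a local analogy): when the far end of a stream
socket stops consuming data, the sender's buffer eventually fills
and further writes can no longer proceed. The sketch below uses a
local socket pair whose reader never reads, and a non-blocking
writer so that the point where a blocking write would stall shows up
as an exception instead of a hang. The function name and the size
limits are made up for the example.

```python
import socket


def bytes_until_write_would_block(chunk_size=65536, max_bytes=1 << 28):
    """Write to a peer that never reads.

    Returns (bytes_accepted, would_block). would_block is True once the
    kernel refuses more data, which is the point where a blocking writer
    (such as a worker thread) would stall.
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)  # surface the stall as BlockingIOError
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while sent < max_bytes:
            try:
                sent += writer.send(chunk)
            except BlockingIOError:
                # The buffers are full and nobody is draining them.
                return sent, True
        return sent, False
    finally:
        writer.close()
        reader.close()


sent, would_block = bytes_until_write_would_block()
print(f"accepted {sent} bytes; would block: {would_block}")
```

The number of bytes accepted depends on the platform's buffer sizes.
In the real case the peer is a remote TCP endpoint that has silently
vanished rather than a local reader that is idle, so this shows only
the buffer-full half of the hypothesis.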
-- 
Jeremy Stanley


