[Third-party-announce] Hitachi Manila HNAS CI has been disabled

Clark Boylan cboylan at sapwetik.org
Fri Mar 17 18:07:11 UTC 2017


On Fri, Mar 17, 2017, at 06:35 AM, Erlon Cruz wrote:
> Hi Clark,
> 
> We have two CI masters in our infra. One of them has 1 account configured
> and the other has 3. That day (March 13) we were running a test on a second
> server: we cloned the server to test the new scripts, so we ended up with 7
> Zuul threads connecting for requests and posting updates. This test only ran
> that afternoon, and I believe that is what put us on your radar.
> 
> We have checked the traffic generated[1] (now with only the 4 accounts
> activated) and it looks normal. This is the tcpdump from our gateway (last
> octet 193). In that log, the 172.24.45.118 server has 4 accounts and the
> .124 server only one. That seems like a normal flow, right?
> 
> 
> [1] http://paste.openstack.org/show/603133/
> 
> On Tue, Mar 14, 2017 at 7:03 PM, Clark Boylan <cboylan at sapwetik.org>
> wrote:
> 
> > On Tue, Mar 14, 2017, at 02:25 PM, Clark Boylan wrote:
> > > On Tue, Mar 14, 2017, at 02:12 PM, Clark Boylan wrote:
> > > > This account was found to be creating thousands of connections to
> > > > Gerrit, effectively DoSing it. As a result the account was disabled.
> > > >
> > > > For anyone looking to reenable the account in the future, the account
> > > > number is 17623. The operators of this CI account will want to
> > > > determine why this was happening and correct it before we reenable
> > > > the account, though.
> > >
> > > I now have more info. It appears there are at least two more CI
> > > accounts coming from this IP address (last octet is 193). I have thus
> > > disabled Hitachi HBSD2 CI (16660) and Hitachi Manila HSP CI (22236) as
> > > well. We will need to sort out what is causing all of these connections
> > > to sit on Gerrit and correct that. Please let us know if you need more
> > > help debugging (though info on our end is fairly sparse: it just shows
> > > these accounts associated with the IP, not which specific account is
> > > creating all of the connections).
> > >
> >
> > And now for another update. Disabling these three accounts didn't affect
> > the new connections coming in from that IP address. So instead we have
> > blocked the IP at the firewall and I have reenabled the CI accounts.
> > This means the CI accounts won't work until the connection issue is
> > sorted out, but once that's done and the firewall rule is removed, we
> > should be good to go again. Perhaps you can help track down who is
> > coming from your IP address and leaving thousands of stale connections
> > open?
> >
> > Thank you,
> > Clark

To follow up on this and close the thread, the root cause was a
misconfigured Zuul server that couldn't access its private key for SSH
communication with Gerrit. Once that key's permissions were updated, Zuul
started running happily.
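
For operators who hit something similar, the check below is a minimal sketch (not part of Zuul) of verifying that a private key file exists, is readable by the user the service runs as, and is not open to group or other users; OpenSSH's ssh client refuses private keys with such loose permissions. The function name `private_key_problems` and the example key path are hypothetical.

```python
import os
import stat


def private_key_problems(path):
    """Return reasons an SSH private key at `path` may be unusable (hypothetical helper)."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    problems = []
    # Must be readable by the user this process runs as (e.g. the zuul user).
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by uid {os.getuid()}")
    # Private keys should be owner-only; OpenSSH rejects group/other access.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            f"{path} has mode {oct(mode)}; remove group/other access (e.g. chmod 600)"
        )
    return problems
```

Run it as the service user, e.g. `private_key_problems("/var/lib/zuul/.ssh/id_rsa")` (hypothetical path); an empty list means neither problem was detected.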

I wrote https://review.openstack.org/447066 to hopefully avoid this
problem in Zuul entirely as well. Deployments that include that change
should avoid DoSing Gerrit if they end up in this situation.
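
The thread does not describe what that change does, so the following is only an illustration of the general idea, not the contents of 447066: a client that closes each connection whose authentication fails and backs off before retrying, so a bad key cannot leave thousands of stale connections open on the server. `connect_with_backoff`, `open_conn` and `authenticate` are hypothetical names; `open_conn` stands in for whatever opens the SSH session, and the connection is only assumed to have a `close()` method.

```python
import time
from typing import Callable, Protocol, TypeVar


class Closable(Protocol):
    def close(self) -> None: ...


C = TypeVar("C", bound=Closable)


def connect_with_backoff(
    open_conn: Callable[[], C],
    authenticate: Callable[[C], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> C:
    """Open and authenticate a connection, closing every failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        # In this sketch only authentication failures are retried;
        # an error from open_conn() itself propagates immediately.
        conn = open_conn()
        try:
            authenticate(conn)
        except Exception:
            # Never leave a half-open connection sitting on the server.
            conn.close()
            if attempt == max_attempts:
                raise
            # Wait before retrying, doubling the delay up to max_delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)
        else:
            return conn
    raise AssertionError("unreachable")
```

The design choice that matters here is the `close()` on failure; the backoff only limits how fast new connections are attempted.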

Clark



More information about the Third-party-announce mailing list