[all] Re: review.opendev.org Downtime March 20, 2020 at 1400UTC

Clark Boylan cboylan at sapwetik.org
Mon Mar 30 22:33:28 UTC 2020


On Fri, Mar 20, 2020, at 7:48 AM, Sean Mooney wrote:
> On Fri, 2020-03-20 at 14:28 +0000, Jeremy Stanley wrote:
> > On 2020-03-20 14:14:57 +0000 (+0000), Sean Mooney wrote:
> > > on a related note i noticed i was getting different content from
> > > https://review.openstack.org/p/openstack/nova and
> > > ssh://sean-k-mooney@review.openstack.org:29418/openstack/nova.git
> > > last week.
> > > 
> > > these were my remotes in my nova repo
> > > gerrit	ssh://sean-k-mooney@review.openstack.org:29418/openstack/nova.git (fetch)
> > > gerrit	ssh://sean-k-mooney@review.openstack.org:29418/openstack/nova.git (push)
> > > origin	https://review.openstack.org/p/openstack/nova (fetch)
> > > origin	https://review.openstack.org/p/openstack/nova (push)
> > 
> > I definitely don't recommend cloning from review.o.o, better to rely
> > on https://opendev.org/openstack/nova in your origin remotes.
> ya i normally set it up against either github or
> https://opendev.org/openstack/nova
> the review.o.o is much larger and takes up way more space on disk so i
> normally avoid it but i think i originally created this repo using
> gertty for reviews.
> 
> this was on my local laptop and i normally only use that repo for doing 
> quick fixes.
> 
> > 
> > > the gerrit remote when i did a fetch was up to date with
> > > https://opendev.org/openstack/nova and with
> > > https://github.com/openstack/nova but
> > > https://review.openstack.org/p/openstack/nova was behind both of
> > > them by a few hours maybe a day i dont remember but i just
> > > remember thinking it was odd.
> > 
> > That https://review.openstack.org/p/openstack/nova URL is "just
> > another replica" like opendev.org and github.com (the /p is directed
> > to a local copy that Gerrit replicates to on the server's
> > filesystem).
> > 
> > > is this something ye were aware of that could happen? it was as if
> > > the redirect from review.openstack.org was pointing at an out of sync
> > > gerrit backend.
> > 
> > [...]
> > 
> > Gerrit can't guarantee consistency for its replication tasks, so it
> > can certainly happen. I think we had plans to remove that /p replica
> > anyway (it's going to potentially conflict with some URLs for newer
> > Gerrit versions), but generally if you notice a discrepancy between
> > ssh://sean-k-mooney@review.openstack.org:29418/openstack/nova.git
> > and https://opendev.org/openstack/nova do let us know, we've been
> > trying to get on top of a race condition in our Gitea updates where
> > Gerrit will silently fail to replicate refs to some of the servers
> > while they're in the middle of restarting.
> 
> ya if i notice an issue between those two i'll let you know but in general
> they seem to mostly be in sync.
> 

To follow up on this, we think we addressed the major issue with https://review.opendev.org/#/c/711130/. We think the important bit is ensuring that Gerrit can detect that replication is going to fail, which we do by stopping the ssh daemon before the other Gitea services and starting it last.

Gerrit will happily retry replication when it detects a failure. Unfortunately, the old restart logic stopped everything together, so the ssh side would happily accept data even if the actual services were down, and that led to problems.
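
As a rough illustration of that ordering (not the actual change in 711130), here is a minimal Python sketch. The service names are made up for the example and the stop/start callables are placeholders for whatever really manages the services. The only point is that the ssh daemon is first down and last up, so a push from Gerrit during a restart fails visibly and gets retried.

```python
from typing import Callable, List, Sequence

# Hypothetical service names, for illustration only; the real Gitea
# deployment may name and split its services differently.
SSH_SERVICE = "gitea-ssh"
OTHER_SERVICES = ("gitea-web", "gitea-db")


def shutdown_order(ssh: str = SSH_SERVICE,
                   others: Sequence[str] = OTHER_SERVICES) -> List[str]:
    """Stop the ssh daemon first so replication pushes are refused outright."""
    return [ssh, *others]


def startup_order(ssh: str = SSH_SERVICE,
                  others: Sequence[str] = OTHER_SERVICES) -> List[str]:
    """Start the ssh daemon last, once everything behind it is running."""
    return [*others, ssh]


def restart(stop: Callable[[str], None], start: Callable[[str], None]) -> None:
    """Restart all services in an order that keeps failures visible to Gerrit."""
    for name in shutdown_order():
        stop(name)
    for name in startup_order():
        start(name)


if __name__ == "__main__":
    # Dry run: print the order instead of touching any real service.
    restart(lambda n: print("stop ", n), lambda n: print("start", n))
```

With this ordering there is no window where ssh accepts a push while the services behind it are unavailable.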

Again, do let us know if you notice this in the future, but we're hopeful this recent change addresses the problem.
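
If it helps when checking for this, one way to compare two remotes is to diff their `git ls-remote` output. The following is only a sketch: the helper names are invented for the example, and it assumes `git` is on your PATH and that you can reach both URLs (the Gerrit ssh URL needs your own account).

```python
import subprocess
import sys
from typing import Dict, Optional, Tuple


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Turn `git ls-remote` output ("<sha>\\t<ref>" per line) into {ref: sha}."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def diff_refs(a: Dict[str, str],
              b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return refs whose sha differs between a and b (None means missing)."""
    return {
        ref: (a.get(ref), b.get(ref))
        for ref in sorted(set(a) | set(b))
        if a.get(ref) != b.get(ref)
    }


def ls_remote(url: str, pattern: str = "refs/heads/*") -> Dict[str, str]:
    """Fetch the branch refs advertised by a remote."""
    result = subprocess.run(
        ["git", "ls-remote", url, pattern],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_refs.py <url-a> <url-b>
    for ref, (sha_a, sha_b) in diff_refs(ls_remote(sys.argv[1]),
                                         ls_remote(sys.argv[2])).items():
        print(ref, sha_a, sha_b)
```

A brief difference right after a change merges is expected while replication catches up; a ref that stays different is the kind of thing worth reporting.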

Clark


