[OpenStack-Infra] afs02 r/o volume mirrors - ongoing incident
Ian Wienand
iwienand at redhat.com
Thu May 24 07:40:58 UTC 2018
Hi,
We were notified of an issue around 22:45 GMT [1] with the volumes backing
the storage on afs02.dfw.o.o, which holds R/O mirrors for our AFS
volumes.
It seems that during this time there were a number of "vos release"s
in flight, or started, that ended up with volumes in a range of
unreliable states that made them un-releaseable (essentially halting
mirror updates).
Several of the volumes were recoverable with a manual "vos unlock" and
re-releasing the volume. However, others were not.
To keep it short, fairly extensive debugging took place [2], but we
had corrupt volumes and deadlocked transactions between afs01 & afs02
with no reasonable solution.
In an effort to resolve this, the afs01 & 02 servers were restarted to
clear all old transactions, and for the affected mirrors I essentially
removed their read-only copies and re-added them with:
k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos unlock $MIRROR
k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos remove -server afs02.dfw.openstack.org -partition a -id $MIRROR.readonly
k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v $MIRROR
k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos addsite -server afs02.dfw.openstack.org -partition a -id $MIRROR
The following volumes needed to be recovered:
mirror.fedora
mirror.pypi
mirror.ubuntu
mirror.ubuntu-ports
mirror.debian
(these are the largest repositories, so perhaps it's no surprise that
they were the ones to become corrupt?)
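For anyone repeating this, the four commands above can be wrapped in a small function and looped over the affected volumes. This is only a rough sketch, not what was actually run: run_vos, recover_mirror and the DRY_RUN switch are hypothetical names, and ignoring the result of "vos unlock" while stopping on any later failure is an assumption about sensible behaviour.

```shell
#!/bin/bash
# Sketch only: wraps the unlock / remove / release / addsite sequence above.
# run_vos, recover_mirror and DRY_RUN are hypothetical, not from the original procedure.

DRY_RUN="${DRY_RUN:-1}"          # default: print the commands instead of running them
AFS_SERVER="afs02.dfw.openstack.org"
AFS_PARTITION="a"

# Run a vos subcommand with the afsadmin keytab, or just print it when DRY_RUN=1.
run_vos() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "vos $*"
    else
        k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos "$@"
    fi
}

# Drop and re-create the read-only copy of one mirror volume on afs02.
recover_mirror() {
    local mirror="$1"
    # Result ignored (assumption): the volume may or may not still be locked.
    run_vos unlock "$mirror"
    run_vos remove -server "$AFS_SERVER" -partition "$AFS_PARTITION" \
        -id "$mirror.readonly" || return 1
    run_vos release -v "$mirror" || return 1
    run_vos addsite -server "$AFS_SERVER" -partition "$AFS_PARTITION" \
        -id "$mirror" || return 1
}

for m in mirror.fedora mirror.pypi mirror.ubuntu mirror.ubuntu-ports mirror.debian; do
    recover_mirror "$m" || echo "recovery of $m failed" >&2
done
```

With the default DRY_RUN=1 this only prints the vos commands it would issue; setting DRY_RUN=0 on a host with the keytab would run them for real.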
I have placed mirror-update.o.o in the emergency file, and commented
out all cron jobs on it.
Right now, I am running a script in a screen as the root user on
mirror-update.o.o to "vos release" these in sequence
(/root/release.sh). Hopefully, this brings things back into sync by
recreating the volumes. If not, more debugging will be required :/
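The contents of /root/release.sh are not reproduced here. As a rough illustration only, a sequential release loop could look something like the sketch below. The VOS variable, the release_in_sequence function, the timestamped log line, the ordering of the volumes and the stop-on-first-failure behaviour are all assumptions, not a description of the real script.

```shell
#!/bin/bash
# Hypothetical sketch of a sequential "vos release" loop; NOT the real /root/release.sh.
# VOS defaults to "echo vos" so the loop can be tried without a keytab or AFS cell.
# For real use one might set (assumption):
#   VOS="k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos"
VOS="${VOS:-echo vos}"

# Release each named volume in turn, stopping at the first failure so a
# broken volume is noticed rather than skipped over.
release_in_sequence() {
    local mirror
    for mirror in "$@"; do
        echo "$(date -u +%FT%TZ) releasing $mirror"
        # $VOS is deliberately unquoted so it splits into command and arguments.
        if ! $VOS release -v "$mirror"; then
            echo "release of $mirror failed; stopping" >&2
            return 1
        fi
    done
}

release_in_sequence mirror.fedora mirror.pypi mirror.ubuntu \
    mirror.ubuntu-ports mirror.debian
```

Running the releases one at a time, rather than in parallel, keeps only a single transaction in flight between afs01 and afs02, which seems the safer choice given the deadlocks described above.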
Please feel free to check in on this; otherwise I will post an update
tomorrow, .au time.
-i
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-05-23.log.html#t2018-05-23T22:43:46
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-05-24.log.html#t2018-05-24T04:01:21