[OpenStack-Infra] Gitea next steps
James E. Blair
corvus at inaugust.com
Mon Feb 4 19:04:34 UTC 2019
Hi,
At the last infra team meeting, we talked about whether and how to
proceed with Gitea. I'd like to summarize that quickly and make sure
we're all on board with it.
* We will continue to deploy our own Kubernetes using the
k8s-on-openstack Ansible playbook that Monty found. Since that's
developed by a third-party, we will use it by checking out the
upstream source from GitHub, but pinning to a known sha so that we
don't encounter surprises.
* We discussed deploying with a new version of rook which does not
require the flex driver, but it turns out I was a bit ahead of things
-- that hasn't landed yet. So we can probably keep our current
deployment.
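The pin-to-a-known-sha workflow from the first bullet can be sketched as
follows. This uses a throwaway local repository standing in for the
upstream GitHub repo, so the steps are reproducible; paths and commit
messages are hypothetical:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the upstream repo (e.g. k8s-on-openstack on GitHub).
git init -q upstream
cd upstream
git -c user.email=a@b -c user.name=ci commit -q --allow-empty -m 'known-good state'
pin=$(git rev-parse HEAD)          # the sha we have vetted
git -c user.email=a@b -c user.name=ci commit -q --allow-empty -m 'surprise upstream change'
cd ..

# Deployment side: clone, then check out the pinned sha rather than the
# branch tip, so later upstream changes don't surprise us.
git clone -q upstream deploy
cd deploy
git checkout -q "$pin"
git log -1 --format=%s             # -> known-good state
```

Moving the pin forward is then an explicit, reviewable change rather
than an implicit one.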
Ian raised two new issues:
1) We should verify that the system still functions if our single-master
Kubernetes loses its master.
Monty and I tried this -- it doesn't. The main culprit here seems to be
DNS. The single master is responsible for intra-(and extra!)-cluster
DNS. This makes gitea unhappy for three reasons: a) if its SQL
connections have gone idle and been terminated, it cannot re-establish
them; b) it is unable to resolve remote hostnames for avatars, which can
greatly slow down page loads; and c) the replication receiver is not a
long-running process -- it's just run over SSH -- so it can't connect to
the database either, and therefore replication fails.
The obvious solution, a multi-master setup, apparently has issues if
k8s is deployed in a cloud with LoadBalancer objects (which we are
using).
Kubernetes does have support for scale-out DNS, though it's not clear
whether that still has a SPOF. Monty is experimenting with this.
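For what it's worth, the scale-out experiment could look roughly like
the fragment below -- a sketch assuming the cluster runs CoreDNS as a
Deployment in kube-system (the default in recent Kubernetes), with
anti-affinity so the replicas land on different nodes:

```yaml
# Hypothetical sketch: run more than one cluster-DNS pod, spread
# across nodes so a single node failure doesn't take DNS down.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: kube-dns    # label CoreDNS pods carry by default
            topologyKey: kubernetes.io/hostname
```

Whether this removes the SPOF depends on where the DNS Service's
traffic is steered when the master is down, which is exactly what the
experiment needs to answer.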
If that doesn't improve things, we may still want to proceed, since the
system should still mostly work for browsing and git clones if the
master fails, and full operation will resume when the master comes back
online.
2) Rook is difficult to upgrade.
This appears to be the case. When it does come time to upgrade rook, we
may want to simply build a new Kubernetes cluster for the system.
Presumably by that point, it won't require the flexvolume driver, which
will be a good reason to make a new cluster anyway, and perhaps further
upgrades after that won't be as complicated.
Once we conclude the investigation into issue #1, I think these are the next
steps:
* Land the patches to manage the opendev k8s cluster with Ansible.
* Pin the k8s-on-openstack repo to the current sha.
* Add HTTPS termination to the cluster.
* Update opendev.org DNS to point to the cluster.
* Treat this as a soft-launch of the production service. Do not
publicise it or encourage people to switch to it yet, but continue to
observe it as we complete the rest of the tasks in [1].
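The DNS step above amounts to a record change along these lines -- a
hypothetical zone fragment, with placeholder addresses from the
documentation ranges standing in for the cluster's HTTPS-terminating
load balancer:

```
; Hypothetical: point opendev.org at the cluster's load balancer.
opendev.org.  300  IN  A     203.0.113.10
opendev.org.  300  IN  AAAA  2001:db8::10
```

A short TTL like this keeps the soft-launch reversible if we need to
point the name elsewhere quickly.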
[1] http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html#work-items
-Jim