Open Stack

Wed Jul 6 22:10:12 UTC 2016

Hi everyone,

I brought this up a few meetings ago, but I wanted to collect the
thoughts in one place to more easily get infra team input on the
status of work toward a translations checksite for the i18n team. For
background, the i18n team wrote a specification a while back, which we
approved and which folks can read here:
http://specs.openstack.org/openstack-infra/infra-specs/specs/translation_check_site.html

The original assignees were mostly i18n people who have since been
pulled off to other things. As one of the primary infra liaisons with
the i18n team I've been pulled into helping, but my ability to help is
limited by time and by the need to collaborate with other infra folks
on some decisions. So here I am emailing the rest of the team for
help. We also wanted to move the conversations about roadblocks, which
have been happening privately, into the open so that I don't continue
to be a blocker here.

Over the past several months Frank Kloeker worked to write a
preliminary Puppet module for us in puppet-translation_checksite (now
merged) and he has an outstanding corresponding system-config patch:
https://review.openstack.org/#/c/276466/

As the spec outlines, the assumption was that we'd run this on a
long-lived server in some way, updating the translation strings
directly from Zanata daily, and re-installing DevStack once a week.
We've run into a few issues with this, and I'd appreciate your thoughts
on them to help me evaluate how to move forward.
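As a rough illustration of the schedule the spec describes (refresh translations daily, reinstall DevStack weekly), the decision a periodic job would have to make can be sketched in Python. The function name `plan_actions`, the action names, and the choice to reload translations right after every rebuild are hypothetical assumptions, not part of the spec or of Frank's module:

```python
from datetime import datetime, timedelta

# Intervals from the spec's plan: translations daily, DevStack weekly.
TRANSLATION_INTERVAL = timedelta(days=1)
REBUILD_INTERVAL = timedelta(weeks=1)


def plan_actions(now, last_translation_update, last_rebuild):
    """Return the maintenance actions due at `now` (hypothetical names)."""
    actions = []
    rebuild_due = now - last_rebuild >= REBUILD_INTERVAL
    if rebuild_due:
        actions.append("rebuild_devstack")
    # Assumption: a fresh DevStack should get the latest strings from
    # Zanata immediately rather than waiting for the next daily run.
    if rebuild_due or now - last_translation_update >= TRANSLATION_INTERVAL:
        actions.append("update_translations")
    return actions


if __name__ == "__main__":
    now = datetime(2016, 7, 6, 22, 0)
    # Translations last refreshed 25 hours ago, DevStack built 3 days ago.
    print(plan_actions(now, now - timedelta(hours=25), now - timedelta(days=3)))
    # → ['update_translations']
```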

1. The Puppet module is really fragile. In theory it works, and Frank
did a good job with it, but almost every time I run it I hit another
problem. Sometimes it's a DevStack error (a couple of the times I tried
to run it there was a known problem), sometimes it's trouble with my
environment (DevStack doesn't fail gracefully if a dependency isn't
satisfied due to a network timeout or similar), and sometimes it's
just a change in our infra that breaks things (yesterday it was an
unexpected problem with the puppet apt module).

The module itself doesn't yet have any recovery for any of this. If
DevStack has been running well for a week and the next weekly rebuild
fails, we're stuck with a broken system and no notification that it's
broken. We could spend time building fault tolerance and build failure
alerts into it, but I want to make sure we're on the right track first.
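One possible starting point for the build failure alerts mentioned above is a wrapper that runs the build with a hard timeout and reports the outcome, instead of silently leaving a half-built system. This is a minimal Python sketch, not part of the existing module; `run_build`, `rebuild_or_alert`, and the `notify` callback (which could, for example, post to IRC or send email) are hypothetical, and the 3600-second default simply mirrors the module's current timeout:

```python
import subprocess
import sys


def run_build(cmd, timeout_s=3600):
    """Run a build command and return (ok, message) instead of raising."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills only the direct child here; anything that
        # stack.sh spawned may still need cleaning up separately.
        return False, f"timed out after {timeout_s} s"
    except OSError as exc:
        # For example, the script is missing or not executable.
        return False, f"could not start: {exc}"
    if result.returncode != 0:
        # Keep only the last few stderr lines; stack.sh output is long.
        tail = result.stderr.strip().splitlines()[-5:]
        return False, f"exit code {result.returncode}: " + " | ".join(tail)
    return True, "ok"


def rebuild_or_alert(cmd, notify, timeout_s=3600):
    """Run the rebuild and call `notify` with a message if it fails."""
    ok, message = run_build(cmd, timeout_s)
    if not ok:
        notify(f"checksite DevStack rebuild failed: {message}")
    return ok


if __name__ == "__main__":
    # Stand-in for ["./stack.sh"]: a command that exits with status 1.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    rebuild_or_alert(failing, notify=print)
```

Returning a status rather than raising keeps the alerting decision in one place, so the same wrapper could be reused whether the rebuild runs from cron or from a config-management run.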

2. We don't actually have a solution for running a "new" DevStack once
a week. Some options:

 - Treat the once-a-week rebuild as known downtime for the checksite,
with a cron job that runs ./unstack.sh and deletes /opt/devstack?
 - Get to a place where we're auto-building new servers, then build a
new one once a week and swap DNS once we know the new server is also
running properly, using something like a health script that must pass
 - Something else?
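For the second option, the health script that must pass could be as simple as requesting the new server's Horizon page a few times and only allowing the DNS swap if every request succeeds. A minimal Python sketch follows; the function names and the retry count and interval are hypothetical, and the DNS change itself is left out because it depends on how our DNS is managed:

```python
import http.client
import time
import urllib.request


def horizon_is_healthy(url, timeout_s=10):
    """Return True if `url` answers with HTTP 200 (after redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (both OSError subclasses), refused
        # connections, timeouts, and malformed HTTP responses.
        return False


def ready_to_swap(url, attempts=3, interval_s=30, check=horizon_is_healthy):
    """Require `attempts` consecutive successful checks before a DNS swap."""
    for i in range(attempts):
        if not check(url):
            return False
        if i < attempts - 1:
            time.sleep(interval_s)
    return True
```

A weekly job would call something like `ready_to_swap("https://new-checksite.example.org/")` (a placeholder address) after the new server finishes building, and only update DNS if it returns True; otherwise the old server keeps serving and someone gets alerted.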

3. It takes a long time to run DevStack's stack.sh, which this module
does. The current timeout is 3600 seconds (1 hour), but I have to bump
it up to run the module locally in my tests. Even at an hour, this will
really gum up the works if it's part of system-config and running
alongside all our other Ansible+Puppet runs, even if DevStack is only
rebuilt once a week. Is this acceptable to us?

4. While we will have i18n team members logging into the Horizon
interface to see the progress of their translations work (that's the
whole point), the translations checksite is essentially read-only and
we have a pretty good mechanism in place for spinning up daily
DevStack instances for all our tests. Maybe we should backpedal and
somehow leverage this tooling instead?

Thanks everyone.

-- 
Elizabeth Krumbach Joseph || Lyz || pleia2

[OpenStack-Infra] Work toward a translations checksite and call for help