[openstack-dev] Zuul v3 - What's Coming: What to expect with the Zuul v3 Rollout

Monty Taylor mordred at inaugust.com
Tue Feb 28 23:26:12 UTC 2017

Hi everybody!

This content can also be found at
http://inaugust.com/posts/whats-coming-zuulv3.html - but I've pasted it
in here directly because I know that some folks don't like clicking links.

tl;dr - At last week's OpenStack PTG, the OpenStack Infra team ran the
first Zuul v3 job, so it's time to start getting everybody ready for
what's coming.

**Don't Panic!** Awesome changes are coming, but you are NOT on the hook
for rewriting all of your project's gate jobs or anything crazy like
that. Now grab a seat by the fire, pour yourself a drink while I spin a
yarn about days gone by and days yet to come.

First, some background

The OpenStack Infra team has been hard at work for quite some time on a
new version of Zuul (where by 'quite some time' I mean that Jim Blair
and I had our first Zuul v3 design whiteboarding session in 2014). As
you might be able to guess given the amount of time, there are some big
things coming that will have a real and visible impact on the OpenStack
community and beyond. Since we have a running Zuul v3 now [1], it seemed
like the time to start getting folks up to speed on what to expect.

There is other deep-dive information on architecture and rationale if
you're interested [2], but for now we'll focus on what's relevant for end
users. We're also going to start sending out a bi-weekly "Status of Zuul
v3" email to the openstack-dev at lists.openstack.org mailing list ... so
stay tuned!

**Important Note** This post includes some code snippets - but v3 is
still a work in progress. We know of at least one breaking change that
is coming to the config format, so please treat this not as a tutorial,
but as a conceptual overview. Syntax is subject to change.

The Big Ticket Items

While there are a bunch of changes behind the scenes, there are a
reasonably tractable number of user-facing differences.

* Self-testing In-Repo Job Config
* Ansible Job Content
* First-class Multi-node Jobs
* Improved Job Reuse
* Support for non-OpenStack Code and Node Systems
* and Much, Much More

Self-testing In-Repo Job Config

This is probably the biggest deal. There are a lot of OpenStack devs
(around 2k in Ocata) and a lot of repositories (1,689). There are a lot
fewer folks on the project-config-core team, who are the ones who review
all of the job config changes (please everyone thank Andreas Jaeger next
time you see him). That's not awesome.

Self-testing in-repo job config is awesome.

Many systems out there these days have an in-repo job config system.
Travis CI has had it since day one, and Jenkins has recently added
support for a Jenkinsfile inside of git repos. With Zuul v3, we'll have
it too.

Once we roll out v3 to everyone, as a supplement to jobs defined in our
central config repositories, each project will be able to add a
zuul.yaml file to their own repo:

- job:
    name: my_awesome_job
    nodes:
      - name: controller
        label: centos-7

- project:
    name: openstack/awesome_project
    check:
      jobs:
        - my_awesome_job

It's a small file, but there is a lot going on, so let's unpack it.

First we define a job to run. It's named my_awesome_job and it needs one
node. That node will be named controller and will be based on the
centos-7 base node in nodepool.

In the next section, we say that we want to run that job in the check
pipeline, which in OpenStack is the pipeline whose jobs run when
patchsets are proposed.
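To make the structure of that file concrete, here is a small Python sketch that treats the zuul.yaml above as already-parsed data (the list of dicts a YAML loader would produce, written out by hand so no YAML library is needed) and works out which jobs a project runs in a given pipeline and which nodes each job asks for. The helper `jobs_for` is hypothetical and not part of Zuul, and the key names (`nodes`, `check`, `jobs`) follow the prose above but are assumptions about the final syntax.

```python
# The example zuul.yaml, written out as the structure a YAML loader
# would return: a list of single-key dicts.
CONFIG = [
    {"job": {
        "name": "my_awesome_job",
        "nodes": [{"name": "controller", "label": "centos-7"}],
    }},
    {"project": {
        "name": "openstack/awesome_project",
        "check": {"jobs": ["my_awesome_job"]},
    }},
]


def jobs_for(config, project_name, pipeline):
    """Hypothetical helper: map each job a project runs in `pipeline`
    to the {node name: nodepool label} mapping that job requests."""
    jobs = {}
    projects = {}
    for item in config:
        if "job" in item:
            jobs[item["job"]["name"]] = item["job"]
        elif "project" in item:
            projects[item["project"]["name"]] = item["project"]
    wanted = projects[project_name].get(pipeline, {}).get("jobs", [])
    return {
        name: {node["name"]: node["label"] for node in jobs[name].get("nodes", [])}
        for name in wanted
    }


print(jobs_for(CONFIG, "openstack/awesome_project", "check"))
# {'my_awesome_job': {'controller': 'centos-7'}}
```

A pipeline the project doesn't mention (say, "gate") simply yields an empty mapping.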

And it's also self-testing!

Everyone knows the fun game of writing a patch to the test jobs, getting
it approved, then hoping it works once it starts running. With Zuul v3
in-repo jobs, if there is a change to job definitions in a proposed
patch, that patch will be tested with those changes applied. And since
it's Zuul, Depends-On footers are honored as well - so iteration on
getting a test job right becomes just like iterating on any other patch
or sequence of patches.
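Conceptually, the configuration used to test a change is the job config that has already merged plus whatever the change (and anything it Depends-On) proposes. The Python sketch below models that with plain dicts keyed by job name. The function `speculative_config` and its "later definition wins" rule are simplifying assumptions for illustration, not Zuul's actual merge logic.

```python
def speculative_config(merged, *changes):
    """Return the job definitions a change would be tested with.

    `changes` are applied in order (Depends-On dependencies first, the
    change under test last). A job that a change redefines replaces the
    earlier definition of the same name; this override rule is an
    assumption, not Zuul's real merge logic.
    """
    effective = dict(merged)  # copy so the merged config is never mutated
    for change in changes:
        effective.update(change)
    return effective


merged = {"my_awesome_job": {"nodes": ["centos-7"]}}
# Hypothetical Depends-On change that adds a new job...
dependency = {"my_new_job": {"nodes": ["centos-7"]}}
# ...and the change under test, which moves my_awesome_job to Xenial.
change = {"my_awesome_job": {"nodes": ["ubuntu-xenial"]}}

effective = speculative_config(merged, dependency, change)
print(sorted(effective))                     # ['my_awesome_job', 'my_new_job']
print(effective["my_awesome_job"]["nodes"])  # ['ubuntu-xenial']
print(merged["my_awesome_job"]["nodes"])     # ['centos-7']
```

The merged config is left untouched, which is the point: the proposed job changes only exist for the duration of that test run until the patch lands.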

Ansible Job Content

The job my_awesome_job isn't very useful if it doesn't define any
content. That's done in the repo as well, in playbooks/my_awesome_job.yaml:

- hosts: controller
  tasks:
    - name: Run make tests
      shell: make distcheck

As previously mentioned, the job content is now defined in Ansible
rather than using our Jenkins Job Builder tool. This playbook is going
to run a task on a host called controller, which you may remember we
requested in the job definition. On that host, it will run make
distcheck. Pretty much anything you can do in Ansible, you can do in a
Zuul job now, and the playbooks should also be re-usable outside of a
testing context.

First Class Multi-Node Jobs

The previous example was for running a job on a node. What if you want
to do multi-node?

- job:
    name: my_awesome_job
    nodes:
      - name: controller
        label: ubuntu-xenial
      - name: compute
        label: centos-7

- project:
    name: openstack/awesome_project
    check:
      jobs:
        - my_awesome_job

As you may have surmised, nodes is a list, so you can have more than
one. Then, since Ansible is naturally multi-node aware, you use that to
write the multi-node content:

- hosts: controller
  tasks:
    - name: Install Keystone
      shell: pip install {{ zuul.git_root }}/openstack/keystone
- hosts: compute
  tasks:
    - name: Install Nova
      shell: pip install {{ zuul.git_root }}/openstack/nova
- hosts: all
  tasks:
    - name: Install CloudKitty
      shell: pip install {{ zuul.git_root }}/openstack/cloudkitty

That will install Keystone on controller, Nova on compute and CloudKitty
on both.
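Ansible decides which nodes each play runs on from its `hosts` pattern. The Python sketch below mimics that for the three plays above, using the node names from the multi-node job definition. It only understands a literal host name and the special `all` group, which is a tiny subset of Ansible's real pattern language. The names `match_hosts` and `install_plan` are hypothetical.

```python
# Node name -> nodepool label, as requested by the multi-node job.
NODES = {"controller": "ubuntu-xenial", "compute": "centos-7"}

# (hosts pattern, project installed by that play)
PLAYS = [
    ("controller", "openstack/keystone"),
    ("compute", "openstack/nova"),
    ("all", "openstack/cloudkitty"),
]


def match_hosts(pattern, nodes):
    """Resolve a (greatly simplified) Ansible host pattern to node names."""
    if pattern == "all":
        return sorted(nodes)
    return [pattern] if pattern in nodes else []


def install_plan(plays, nodes):
    """Map each node to the projects that would be pip-installed on it."""
    plan = {name: [] for name in nodes}
    for pattern, project in plays:
        for host in match_hosts(pattern, nodes):
            plan[host].append(project)
    return plan


print(install_plan(PLAYS, NODES))
```

The result puts Keystone and CloudKitty on controller, and Nova and CloudKitty on compute, matching the description above.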

Improved Job Reuse

In our current system, because of some details about how Jenkins works
and the fact that our CI system used to be based on Jenkins, we have a
ton of templated jobs that lead both to magically long job names and a
bunch of cargo culting of job definitions.

In the new system, a lot of the duplication goes away. So instead of
having gate-nova-python27 and gate-swift-python27 and
gate-keystone-python27, there will just be a job called "python27" and
each of the projects can use it. Similarly, for more complex job content
like devstack-gate, since Ansible is a fully-fledged system on its own
that was designed for modularity and re-use, we can compose things into
roles that take parameters and can be reused without copy/paste.

(ssssh! In fact, the python27 job will almost certainly be a job that
uses an extremely small playbook that itself uses a role called tox. But
also, the tox role, the python27 playbook and the python27 job
definition will all be things we define centrally in a standard library
of pieces, so as a user of the system you should be able to just choose
to run "python27" and not worry about it - unless you want to dig in and
learn more.)
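Here is a rough Python model of the layering described in that aside: one centrally defined `python27` job that calls a parameterized `tox` role, and any number of projects that simply list the job by name. The names `tox_role`, `ROLES`, `STANDARD_LIBRARY`, `commands_for` and the `envlist` parameter are all made up for illustration. The real role and job definitions will live in the central standard library and may look quite different.

```python
def tox_role(envlist):
    """Hypothetical parameterized 'tox' role: returns the command it would run."""
    return f"tox -e {envlist}"


ROLES = {"tox": tox_role}

# Defined once, centrally, instead of gate-nova-python27,
# gate-swift-python27, gate-keystone-python27, and so on.
STANDARD_LIBRARY = {
    "python27": {"role": "tox", "vars": {"envlist": "py27"}},
}

# Each project just lists the shared job by name.
PROJECTS = {
    "openstack/nova": ["python27"],
    "openstack/swift": ["python27"],
    "openstack/keystone": ["python27"],
}


def commands_for(project):
    """Resolve a project's jobs to the commands their roles would run."""
    commands = []
    for job_name in PROJECTS[project]:
        job = STANDARD_LIBRARY[job_name]
        commands.append(ROLES[job["role"]](**job["vars"]))
    return commands


for project in PROJECTS:
    print(project, commands_for(project))
```

Adding a new project means adding one line to its own config. Changing how python27 works means editing one central definition rather than hundreds of copies.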

Support for non-OpenStack Code and Node Systems

Zuul was originally written to support the OpenStack project, but since
then we've grown more people who have interest in running Zuul. Since we
wrote it the first time to solve our problems of extremely massive
scale, we didn't put a ton of effort into making it easily consumable
elsewhere. That hasn't stopped people, and there are tons of Zuul
installations out there ... but that doesn't mean life is easy for those
folks. With Zuul v3 we've also been explicitly focused on making it much
more easily reconsumable.

Part of supporting friends in other communities means embracing support
of tools that OpenStack does not use. The fine folks at GoodData wrote a
set of patches adding support for GitHub, which they have been using for
a while. We'll be landing those, which should allow us to add jobs to
the system that check things like "will this pull request to pip break
devstack". There is also work from the CentOS community via a tool
called linch-pin that we're looking at incorporating into Nodepool that
should allow creating build nodes on any system Ansible knows how to
talk to. Those features are intended to follow quickly on after we get
OpenStack migrated.

What's Next?

1) Zuul on Zuul
2) Infra Repos
3) Job conversion Script
4) OpenStack Migration

We currently have a Zuul v3 running against changes to the Zuul repo.
We're using it to iterate on job content and other features. There is a
change coming to the job definition syntax to allow job dependencies to
be a graph instead of just a tree which will be fairly invasive, so
we're keeping the affected surface area small until that's ready.
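The difference between a tree and a graph matters once a job needs to wait on more than one other job. In a tree each job has at most one parent, while in a graph (a DAG) a job such as an integration test can depend on two separate build jobs. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) to group a small, invented set of jobs into batches that could run in parallel. The job names and the dependency mapping are hypothetical, since the v3 syntax for this had not landed at the time of writing.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on. "integration" has two
# parents, which a strict tree of job dependencies cannot express.
DEPENDENCIES = {
    "build-docs": set(),
    "build-tarball": set(),
    "integration": {"build-docs", "build-tarball"},
    "publish": {"integration"},
}


def run_order(dependencies):
    """Group jobs into batches: every job in a batch can start once all
    jobs in earlier batches have finished."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = run_order(DEPENDENCIES)
print(batches)
# [['build-docs', 'build-tarball'], ['integration'], ['publish']]
```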

Once we're happy with how things are running, we'll move the rest of the
Infra repos over, probably in chunks. Although Infra test jobs are
typically a bit different than the jobs in the rest of OpenStack, we do
have enough representative examples that we should be able to work out
the kinks before we throw things at other folks (shade and nodepool,
for instance, both do integration testing on devstack-gate jobs).

While we work on Infra migration, we'll be developing a conversion
script to convert the existing jobs. A good portion of that will be
fully automatable. For instance, mapping everyone's
gate-{project}-python27 to a reference to the python27 job is easy for a
computer to do. However, there are still a ton of snowflake jobs whose
content we'll likely wind up converting as-is, then refactoring over
time to make them more efficient or otherwise better.
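The "easy for a computer" part of that conversion can be sketched as a pattern match on the old job names. The regular expression and the `convert_job_name` function below are assumptions for illustration, not the actual conversion script. Anything that doesn't match the template passes through unchanged, mirroring the plan to carry snowflake jobs over as-is and refactor them later.

```python
import re

# Old-style templated names look like gate-{project}-python27.
TEMPLATED_PY27 = re.compile(r"gate-.+-python27")


def convert_job_name(old_name):
    """Map gate-{project}-python27 to the shared python27 job;
    leave any other (snowflake) job name untouched."""
    if TEMPLATED_PY27.fullmatch(old_name):
        return "python27"
    return old_name


old_jobs = [
    "gate-nova-python27",
    "gate-swift-python27",
    "gate-keystone-python27",
    "gate-nova-some-special-job",  # hypothetical snowflake job
]
converted = sorted({convert_job_name(name) for name in old_jobs})
print(converted)
# ['gate-nova-some-special-job', 'python27']
```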

Then the Big Day will arrive. When the conversion script is as good as
we can get it and we're satisfied with the stability of the job language,
security and scalability, there will be a Big Bang cutover of all of the
rest of OpenStack. If all goes well, most developers should just
notice that a bunch of job names got shorter and that it's a user named
Zuul commenting on patches. Folks who have patches to project-config
in flight at that time will need to rework them, but the conversion
script should hopefully make that a minimal burden.

and Much, Much More

There are far too many new and exciting things in Zuul v3 to cover in a
single post, and many of the topics (such as Ansible Jobs, or Job
Inheritance and Reuse) are deep topics we can dive into over time. The
long and short of it is that Zuul v3 is coming soon to an OpenStack
Infra near you, so expect more and more communication about what that
means over the next few months.


[1] OpenStack is not running Zuul v3 in production at the moment. We
have an instance running and only responding to events from the Zuul v3
repo. As of the time of this writing, OpenStack is still running 2.5 in
production. Believe me - when we hit production, you'll know it.

[2] Links to deeper information:
 * "There is no Jenkins, only Zuul" post about Zuul 2.5
 * Jim Blair's Zuul v3 Talk from OpenStack Summit Barcelona
 * Zuul v3 Spec
 * Pre-PTG Zuul v3 Conceptual Deep Dive

