[OpenStack-Infra] Thoughts on evolving Zuul

Jay Pipes jaypipes at gmail.com
Sat Feb 28 16:41:40 UTC 2015


Jim, great stuff. A couple of suggestions inline :)

On 02/26/2015 09:59 AM, James E. Blair wrote:
> A tenant may optionally specify repos from which it may derive its
> configuration.  In this manner, a repo may keep its Zuul configuration
> within its own repo.  This would only happen if the main configuration
> file specified that it is permitted::
>
>    ### main.yaml (continued)
>    - tenant:
>        name: random-stackforge-project
>        include:
>          - global_config.yaml
>        repos:
>          - stackforge/random  # Specific project config is in-repo

Might I suggest that, instead of a repos: YAML block, the include: 
YAML block allow URIs? So, to support some random Zuul config in a 
stackforge repo, you could do:

include:
  - global_config.yaml
  - https://git.openstack.org/stackforge/random/tools/zuul.yml

That would make the configuration simpler, I think.

> Jobs defined in-repo may not have access to the full feature set
> (including some authorization features).  They also may not override
> existing jobs.
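
Makes sense. So, if I understand correctly, an in-repo .zuul.yaml 
entry like this (hypothetical, just to illustrate) would be rejected 
on both counts -- it overrides an existing job and it tries to define 
auth:

- job:
    name: python27  # Overrides the global python27: not allowed
    parent: base
    auth:           # Auth may only be defined centrally: not allowed
      swift:
        - container: logs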
>
> Job definitions continue to have the features in the current Zuul
> layout, but they also take on some of the responsibilities currently
> handled by the Jenkins (or other worker) definition::
>
>    ### global_config.yaml
>    # Every tenant in the system has access to these jobs (because their
>    # tenant definition includes it).
>    - job:
>        name: base
>        timeout: 30m
>        node: precise   # Just a variable for later use
>        nodes:  # The operative list of nodes
>          - name: controller
>            image: {node}  # Substitute the variable
>        auth:  # Auth may only be defined in central config, not in-repo
>          swift:
>            - container: logs
>        pre-run:  # These specify what to run before and after the job
>          - zuul-cloner
>        post-run:
>          - archive-logs

++

> Jobs have inheritance, and the above definition provides a base level
> of functionality for all jobs.  It sets a default timeout, requests a
> single node (of type precise), and requests swift credentials to
> upload logs.  Further jobs may extend and override these parameters::
>
>    ### global_config.yaml (continued)
>    # The python 2.7 unit test job
>    - job:
>        name: python27
>        parent: base
>        node: trusty

Yes, this is great :)

> Our use of job names specific to projects is a holdover from when we
> wanted long-lived slaves on jenkins to efficiently re-use workspaces.
> This hasn't been necessary for a while, though we have used this to
> our advantage when collecting stats and reports.  However, job
> configuration can be simplified greatly if we simply have a job that
> runs the python 2.7 unit tests which can be used for any project.  To
> the degree that we want to know how often this job failed on nova, we
> can add that information back in when reporting statistics.  Jobs may
> have multiple aspects to accommodate differences among branches, etc.::
>
>    ### global_config.yaml (continued)
>    # Version that is run for changes on stable/icehouse
>    - job:
>        name: python27
>        parent: base
>        branch: stable/icehouse
>        node: precise
>
>    # Version that is run for changes on stable/juno
>    - job:
>        name: python27
>        parent: base
>        branch: stable/juno  # Could be combined into previous with regex
>        node: precise        # if concept of "best match" is defined
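
FWIW, if that "best match" concept were defined, I'd imagine the two 
stable variants above collapsing into a single entry, something like 
this (hypothetical regex syntax):

- job:
    name: python27
    parent: base
    branch: ^stable/(icehouse|juno)$
    node: precise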
>
> Jobs may specify that they require more than one node::
>
>    ### global_config.yaml (continued)
>    - job:
>        name: devstack-multinode
>        parent: base
>        node: trusty  # could do same branch mapping as above
>        nodes:
>          - name: controller
>            image: {node}
>          - name: compute
>            image: {node}
>
> Jobs defined centrally (i.e., not in-repo) may specify auth info::
>
>    ### global_config.yaml (continued)
>    - job:
>        name: pypi-upload
>        parent: base
>        auth:
>          password:
>            pypi-password: pypi-password
>            # This looks up 'pypi-password' from an encrypted yaml file
>            # and adds it into variables for the job
>
> Pipeline definitions are similar to the current syntax, except that it
> supports specifying additional information for jobs in the context of
> a given project and pipeline.  For instance, rather than specifying
> that a job is globally non-voting, you may specify that it is
> non-voting for a given project in a given pipeline::
>
>    ### openstack.yaml
>    - project:
>        name: openstack/nova
>        gate:
>          queue: integrated  # Shared queues are manually built
>          jobs:
>            - python27  # Runs version of job appropriate to branch
>            - devstack
>            - devstack-deprecated-feature:
>                branch: stable/juno  # Only run on stable/juno changes
>                voting: false  # Non-voting
>        post:
>          jobs:
>            - tarball:
>                jobs:
>                  - pypi-upload
>
> Currently unique job names are used to build shared change queues.
> Since job names will no longer be unique, shared queues must be
> manually constructed by assigning them a name.  Projects with the same
> queue name for the same pipeline will have a shared queue.
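
For example (a sketch following your semantics), another project would 
join nova's shared queue just by naming the same queue in the same 
pipeline:

- project:
    name: openstack/glance
    gate:
      queue: integrated  # Same queue name as nova -> shared queue
      jobs:
        - python27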
>
> A subset of functionality is available to projects that are permitted to
> use in-repo configuration::
>
>    ### stackforge/random/.zuul.yaml
>    - job:
>        name: random-job
>        parent: base      # From global config; gets us logs
>        node: precise
>
>    - project:
>        name: stackforge/random
>        gate:
>          jobs:
>            - python27    # From global config
>            - random-job  # From local config

Again, here I would support URI-based job config directives. Why? Well, 
let's say that a project has a separate repository that contains job and 
test configuration files. You'd be able to set a URI here and continue 
to keep your job and test configurations separate from the code base...
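
Something like this, say (the separate config repo and file path are 
made up purely for illustration):

include:
  - https://git.openstack.org/stackforge/random-ci/zuul-jobs.yaml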

> The executable content of jobs should be defined as ansible playbooks.
> Playbooks can be fairly simple and might consist of little more than
> "run this shell script" for those who are not otherwise interested in
> ansible::
>
>    ### stackforge/random/playbooks/random-job.yaml
>    ---
>    hosts: controller
>    tasks:
>      - shell: run_some_tests.sh
>
> Global jobs may define ansible roles for common functions::
>
>    ### openstack-infra/zuul-playbooks/python27.yaml
>    ---
>    hosts: controller
>    roles:
>      - tox:
>          env: py27
>
> Because ansible has well-articulated multi-node orchestration
> features, this permits very expressive job definitions for multi-node
> tests.  A playbook can specify different roles to apply to the
> different nodes that the job requested::
>
>    ### openstack-infra/zuul-playbooks/devstack-multinode.yaml
>    ---
>    hosts: controller
>    roles:
>      - devstack
>    ---
>    hosts: compute
>    roles:
>      - devstack-compute
>
> Additionally, if a project is already defining ansible roles for its
> deployment, then those roles may be easily applied in testing, making
> CI even closer to CD.  Finally, to make Zuul more useful for CD, Zuul
> may be configured to run a job (i.e., ansible role) on a specific node.
>
> The pre- and post-run entries in the job definition might also apply
> to ansible playbooks and can be used to simplify job setup and
> cleanup::
>
>    ### openstack-infra/zuul-playbooks/zuul-cloner.yaml
>    ---
>    hosts: all
>    roles:
>      - zuul-cloner: {{zuul}}
>
> Where the zuul variable is a dictionary containing all the information
> currently transmitted in the ZUUL_* environment variables.  Similarly,
> the log archiving script can copy logs from the host to swift.
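
To make the zuul variable concrete, I'd expect it to carry roughly the 
current ZUUL_* values, e.g. (key names guessed from the environment 
variable names):

zuul:
  project: openstack/nova   # ZUUL_PROJECT
  branch: master            # ZUUL_BRANCH
  pipeline: gate            # ZUUL_PIPELINE
  change: '12345'           # ZUUL_CHANGE
  patchset: '2'             # ZUUL_PATCHSET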
>
> A new Zuul component would be created to execute jobs.  Rather than
> running a worker process on each node (which requires installing
> software on the test node, and establishing and maintaining network
> connectivity back to Zuul, and the ability to coordinate actions across
> nodes for multi-node tests), this new component will accept jobs from
> Zuul, and for each one, write an ansible inventory file with the node
> and variable information, and then execute the ansible playbook for that
> job.  This means that the new Zuul component will maintain ssh
> connections to all hosts currently running a job.  This could become a
> bottleneck, but ansible and ssh have been known to scale to a large
> number of simultaneous hosts, and this component may be scaled
> horizontally.  It should be simple enough that it could even be
> automatically scaled if needed.  In turn, however, this does make node
> configuration simpler (test nodes need only have an ssh public key
> installed) and makes tests behave more like deployment.
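
And presumably the inventory file it writes for, say, the 
devstack-multinode job above could be as simple as (addresses 
invented):

controller ansible_ssh_host=198.51.100.10
compute    ansible_ssh_host=198.51.100.11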

+100 on the Ansible-related suggested changes. :)

Thanks!
-jay


