[OpenStack-Infra] Thoughts on evolving Zuul
Zaro
zaro0508 at gmail.com
Thu Feb 26 22:41:11 UTC 2015
Thanks Jim. This makes a lot of sense and will hopefully make things
simpler and more robust.
Just a few questions:
1. It looks like Zuul can request a specific set of nodes for a job. Do
you envision the typical ansible playbook installing additional things
required for the job, or would Zuul always need to request a node that is
already suitable for the job?
2. Would there be a way to share environment variables across multiple
shell tasks? For example, would it be possible to reference a variable
defined in the job yaml file from inside a shell script?
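Something along these lines is what I'm picturing (just a sketch; the
variable names are hypothetical and assume the job yaml variables get
exposed to the play):

    ---
    - hosts: controller
      vars:
        test_env: py27                    # hypothetical variable from the job yaml
      tasks:
        - shell: ./run_some_tests.sh
          environment:
            TEST_ENV: "{{ test_env }}"    # visible to the script as $TEST_ENV
        - shell: echo "second task still sees $TEST_ENV"
          environment:
            TEST_ENV: "{{ test_env }}"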
-Khai
On Thu, Feb 26, 2015 at 8:59 AM, James E. Blair <corvus at inaugust.com> wrote:
> Hi,
>
> I've been wanting to make some structural changes to Zuul to round it
> out into a coherent system. I don't want to change it too much, but I'd
> also like a clean break with some of the baggage we've been carrying
> around from earlier decisions, and I want it to be able to continue to
> scale up (the config in particular is getting hard to manage with >500
> projects).
>
> I've batted a few ideas around with Monty, and I've written up my
> thoughts below. This is mostly a narrative exploration of what I think
> it should look like. This is not exhaustive, but I think it explores
> most of the major ideas. The next step is to turn this into a spec and
> start iterating on it and getting more detailed.
>
> I'm posting this here first for discussion to see if there are any
> major conceptual things that we should address before we get into more
> detailed spec review. Please let me know what you think.
>
> -Jim
>
> =======
> Goals
> =======
>
> Make Zuul scale to thousands of projects.
> Make Zuul more multi-tenant friendly.
> Make it easier to express complex scenarios in layout.
> Make nodepool more useful for non-virtual nodes.
> Make nodepool more efficient for multi-node tests.
> Remove the need for long-running slaves.
> Make it easier to use Zuul for continuous deployment.
>
> To accomplish this, changes to Zuul's configuration syntax are
> proposed, making it simpler to manage large numbers of jobs and
> projects, along with a new method of describing and running jobs, and
> a new system for node distribution with Nodepool.
>
> =====================
> Changes To Nodepool
> =====================
>
> Nodepool should be made to support explicit node requests and
> releases. That is to say, it should act more like its name -- a node
> pool.
>
> Rather than having servers add themselves to the pool by waiting for
> them (or Jenkins on their behalf) to register with gearman, nodepool
> should instead define functions to supply nodes on demand. For
> example it might define the gearman functions "get-nodes" and
> "put-nodes". Zuul might request a node for a job by submitting a
> "get-nodes" job with the node type (eg "precise") as an argument. It
> could request two nodes together (in the same AZ) by supplying more
> than one node type in the same call. When complete, it could call
> "put-nodes" with the node identifiers to instruct nodepool to return
> them (nodepool might then delete, rebuild, etc).
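>
> For illustration only (the payload format is not defined here and the
> field names are hypothetical), the arguments for these calls might look
> something like::
>
> # get-nodes arguments (request two nodes in the same AZ)
> node-types:
>   - precise
>   - precise
> same-az: true
>
> # put-nodes arguments (return the nodes when the job is done)
> nodes:
>   - id: precise-1234
>   - id: precise-1235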
>
> This model is much more efficient for multi-node tests, where we will
> no longer need to have special multinode labels. Instead the
> multinode configuration can be much more ad-hoc and vary per job.
>
> The testenv broker used by tripleo behaves somewhat in this manner
> (though it only supports static sets of resources). It also has logic
> to deal with the situation where Zuul might exit unexpectedly and not
> return nodes (though it should strive to do so). This feature in the
> broker should be added to nodepool. Additionally, nodepool should
> support fully static resources (they should become just another node
> type) so that it can handle the use case of the test broker.
>
> =================
> Changes To Zuul
> =================
>
> Zuul is currently fundamentally a single-tenant application. Some
> folks want to use it in a multi-tenant environment. Even within
> OpenStack, we have use for multitenancy. OpenStack might be one
> tenant, and each stackforge project might be another. Even if the big
> tent discussion renders that thinking obsolete, we may still want the
> kind of separation multi-tenancy can provide. The proposed
> implementation is flexible enough to run Zuul completely single tenant
> with shared everything, completely multi-tenant with shared nothing, and
> everything in-between. Being able to adjust just how much is shared or
> required, and how much can be left to individual projects will be very
> useful.
>
> To support this, the main configuration should define tenants, and
> tenants should specify config files to include. These include files
> should define pipelines, jobs, and projects, all of which are
> namespaced to the tenant (so different tenants may have different jobs
> with the same names)::
>
> ### main.yaml
> - tenant:
>     name: openstack
>     include:
>       - global_config.yaml
>       - openstack.yaml
>
> Files may be included by more than one tenant, so common items can be
> placed in a common file and referenced globally. This means that for,
> eg, OpenStack, we can define pipelines and our base job definitions
> (with logging info, etc) once, and include them in all of our tenants::
>
> ### main.yaml (continued)
> - tenant:
>     name: openstack-infra
>     include:
>       - global_config.yaml
>       - infra.yaml
>
> A tenant may optionally specify repos from which it may derive its
> configuration. In this manner, a project may keep its Zuul configuration
> within its own repo. This would only happen if the main configuration
> file specified that it is permitted::
>
> ### main.yaml (continued)
> - tenant:
>     name: random-stackforge-project
>     include:
>       - global_config.yaml
>     repos:
>       - stackforge/random  # Specific project config is in-repo
>
> Jobs defined in-repo may not have access to the full feature set
> (including some authorization features). They also may not override
> existing jobs.
>
> Job definitions continue to have the features in the current Zuul
> layout, but they also take on some of the responsibilities currently
> handled by the Jenkins (or other worker) job definition::
>
> ### global_config.yaml
> # Every tenant in the system has access to these jobs (because their
> # tenant definition includes it).
> - job:
>     name: base
>     timeout: 30m
>     node: precise      # Just a variable for later use
>     nodes:             # The operative list of nodes
>       - name: controller
>         image: {node}  # Substitute the variable
>     auth:  # Auth may only be defined in central config, not in-repo
>       swift:
>         - container: logs
>     pre-run:  # These specify what to run before and after the job
>       - zuul-cloner
>     post-run:
>       - archive-logs
>
> Jobs have inheritance, and the above definition provides a base level
> of functionality for all jobs. It sets a default timeout, requests a
> single node (of type precise), and requests swift credentials to
> upload logs. Further jobs may extend and override these parameters::
>
> ### global_config.yaml (continued)
> # The python 2.7 unit test job
> - job:
>     name: python27
>     parent: base
>     node: trusty
>
> Our use of job names specific to projects is a holdover from when we
> wanted long-lived slaves on jenkins to efficiently re-use workspaces.
> This hasn't been necessary for a while, though we have used this to
> our advantage when collecting stats and reports. However, job
> configuration can be simplified greatly if we simply have a job that
> runs the python 2.7 unit tests which can be used for any project. To
> the degree that we want to know how often this job failed on nova, we
> can add that information back in when reporting statistics. Jobs may
> have multiple aspects to accommodate differences among branches, etc.::
>
> ### global_config.yaml (continued)
> # Version that is run for changes on stable/icehouse
> - job:
>     name: python27
>     parent: base
>     branch: stable/icehouse
>     node: precise
>
> # Version that is run for changes on stable/juno
> - job:
>     name: python27
>     parent: base
>     branch: stable/juno  # Could be combined into previous with regex
>     node: precise        # if concept of "best match" is defined
>
> Jobs may specify that they require more than one node::
>
> ### global_config.yaml (continued)
> - job:
>     name: devstack-multinode
>     parent: base
>     node: trusty  # could do same branch mapping as above
>     nodes:
>       - name: controller
>         image: {node}
>       - name: compute
>         image: {node}
>
> Jobs defined centrally (i.e., not in-repo) may specify auth info::
>
> ### global_config.yaml (continued)
> - job:
>     name: pypi-upload
>     parent: base
>     auth:
>       password:
>         pypi-password: pypi-password
>         # This looks up 'pypi-password' from an encrypted yaml file
>         # and adds it into variables for the job
>
> Pipeline definitions are similar to the current syntax, except that they
> support specifying additional information for jobs in the context of
> a given project and pipeline. For instance, rather than specifying
> that a job is globally non-voting, you may specify that it is
> non-voting for a given project in a given pipeline::
>
> ### openstack.yaml
> - project:
>     name: openstack/nova
>     gate:
>       queue: integrated  # Shared queues are manually built
>       jobs:
>         - python27       # Runs version of job appropriate to branch
>         - devstack
>         - devstack-deprecated-feature:
>             branch: stable/juno  # Only run on stable/juno changes
>             voting: false        # Non-voting
>     post:
>       jobs:
>         - tarball:
>             jobs:
>               - pypi-upload
>
> Currently unique job names are used to build shared change queues.
> Since job names will no longer be unique, shared queues must be
> manually constructed by assigning them a name. Projects with the same
> queue name for the same pipeline will have a shared queue.
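>
> As an illustration (a hypothetical second project, reusing the syntax
> above), another project would join nova's shared gate queue simply by
> naming the same queue::
>
> ### openstack.yaml (continued)
> - project:
>     name: openstack/cinder
>     gate:
>       queue: integrated  # Same queue name as nova, so a shared queue
>       jobs:
>         - python27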
>
> A subset of functionality is available to projects that are permitted to
> use in-repo configuration::
>
> ### stackforge/random/.zuul.yaml
> - job:
>     name: random-job
>     parent: base   # From global config; gets us logs
>     node: precise
>
> - project:
>     name: stackforge/random
>     gate:
>       jobs:
>         - python27    # From global config
>         - random-job  # From local config
>
> The executable content of jobs should be defined as ansible playbooks.
> Playbooks can be fairly simple and might consist of little more than
> "run this shell script" for those who are not otherwise interested in
> ansible::
>
> ### stackforge/random/playbooks/random-job.yaml
> ---
> - hosts: controller
>   tasks:
>     - shell: run_some_tests.sh
>
> Global jobs may define ansible roles for common functions::
>
> ### openstack-infra/zuul-playbooks/python27.yaml
> ---
> - hosts: controller
>   roles:
>     - role: tox
>       env: py27
>
> Because ansible has well-articulated multi-node orchestration
> features, this permits very expressive job definitions for multi-node
> tests. A playbook can specify different roles to apply to the
> different nodes that the job requested::
>
> ### openstack-infra/zuul-playbooks/devstack-multinode.yaml
> ---
> - hosts: controller
>   roles:
>     - devstack
>
> - hosts: compute
>   roles:
>     - devstack-compute
>
> Additionally, if a project is already defining ansible roles for its
> deployment, then those roles may be easily applied in testing, making
> CI even closer to CD. Finally, to make Zuul more useful for CD, Zuul
> may be configured to run a job (ie, ansible role) on a specific node.
>
> The pre- and post-run entries in the job definition might also apply
> to ansible playbooks and can be used to simplify job setup and
> cleanup::
>
> ### openstack-infra/zuul-playbooks/zuul-cloner.yaml
> ---
> - hosts: all
>   roles:
>     - role: zuul-cloner
>       zuul: "{{ zuul }}"  # pass the zuul dict through to the role
>
> Where the zuul variable is a dictionary containing all the information
> currently transmitted in the ZUUL_* environment variables. Similarly,
> the log archiving script can copy logs from the host to swift.
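>
> For example, the zuul dictionary might carry roughly the same fields as
> today's parameters (the exact keys are not defined here; these are only
> indicative)::
>
> zuul:
>   project: openstack/nova      # ZUUL_PROJECT
>   branch: master               # ZUUL_BRANCH
>   pipeline: gate               # ZUUL_PIPELINE
>   change: '123456'             # ZUUL_CHANGE
>   patchset: '2'                # ZUUL_PATCHSET
>   ref: refs/zuul/master/Zabc   # ZUUL_REF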
>
> A new Zuul component would be created to execute jobs. Rather than
> running a worker process on each node (which requires installing
> software on the test node, and establishing and maintaining network
> connectivity back to Zuul, and the ability to coordinate actions across
> nodes for multi-node tests), this new component will accept jobs from
> Zuul, and for each one, write an ansible inventory file with the node
> and variable information, and then execute the ansible playbook for that
> job. This means that the new Zuul component will maintain ssh
> connections to all hosts currently running a job. This could become a
> bottleneck, but ansible and ssh have been known to scale to a large
> number of simultaneous hosts, and this component may be scaled
> horizontally. It should be simple enough that it could even be
> automatically scaled if needed. In turn, however, this does make node
> configuration simpler (test nodes need only have an ssh public key
> installed) and makes tests behave more like deployment.
>