[OpenStack-Infra] Thoughts on evolving Zuul
Monty Taylor
mordred at inaugust.com
Thu Feb 26 23:04:11 UTC 2015
On 02/26/2015 05:41 PM, Zaro wrote:
> Thanks Jim. This makes a lot of sense and will hopefully make things
> simpler and more robust.
>
> Just a few questions:
I am not Jim - but I'm going to answer anyway ...
> 1. It looks like zuul can request a specific set of nodes for a job. Do
> you envision the typical ansible playbook installing additional things
> required for the jobs? Or would zuul always need to request a suitable
> node for the job?
I think we're leaning towards fewer types of images - so I'd expect
playbooks to specialize an image as step one. This is, of course,
similar to what many jobs do today already, with installation steps
before revoke-sudo is run.
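For instance (purely a sketch, with hypothetical package and script
names), a job playbook might start with tasks that specialize the
image before the test tasks run:

  ---
  - hosts: controller
    tasks:
      # hypothetical: install extra deps the generic image lacks
      - apt: name=libxml2-dev state=present
        sudo: yes
      # ... then run the actual tests
      - shell: ./run_tests.sh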
> 2. Would there be a way to share environment variables across multiple
> shell tasks? For example would it be possible to reference a variable
> defined in the job yaml file from inside of a shell script?
Yes - although it might not be specifically environment variables.
Sharing variables from one task to another or taking the output
variables from one task and referencing them as part of a subsequent
task are well supported. (I can even show you examples of doing this in
the recent launch_node work if you wanna see what it looks like.)
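For example, ansible's register keyword covers this today; a minimal
sketch (task content hypothetical):

  ---
  - hosts: controller
    tasks:
      # capture the output of one task ...
      - shell: git rev-parse HEAD
        register: commit
      # ... and reference it in a subsequent one
      - shell: echo "tested {{ commit.stdout }}" >> test-log.txt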
> -Khai
>
>
> On Thu, Feb 26, 2015 at 8:59 AM, James E. Blair <corvus at inaugust.com> wrote:
>
>> Hi,
>>
>> I've been wanting to make some structural changes to Zuul to round it
>> out into a coherent system. I don't want to change it too much, but I'd
>> also like a clean break with some of the baggage we've been carrying
>> around from earlier decisions, and I want it to be able to continue to
>> scale up (the config in particular is getting hard to manage with >500
>> projects).
>>
>> I've batted a few ideas around with Monty, and I've written up my
>> thoughts below. This is mostly a narrative exploration of what I think
>> it should look like. This is not exhaustive, but I think it explores
>> most of the major ideas. The next step is to turn this into a spec and
>> start iterating on it and getting more detailed.
>>
>> I'm posting this here first for discussion to see if there are any
>> major conceptual things that we should address before we get into more
>> detailed spec review. Please let me know what you think.
>>
>> -Jim
>>
>> =======
>> Goals
>> =======
>>
>> Make Zuul scale to thousands of projects.
>> Make Zuul more multi-tenant friendly.
>> Make it easier to express complex scenarios in a layout.
>> Make nodepool more useful for non-virtual nodes.
>> Make nodepool more efficient for multi-node tests.
>> Remove the need for long-running slaves.
>> Make it easier to use Zuul for continuous deployment.
>>
>> To accomplish this, changes to Zuul's configuration syntax are
>> proposed, making it simpler to manage large numbers of jobs and
>> projects, along with a new method of describing and running jobs, and
>> a new system for node distribution with Nodepool.
>>
>> =====================
>> Changes To Nodepool
>> =====================
>>
>> Nodepool should be made to support explicit node requests and
>> releases. That is to say, it should act more like its name -- a node
>> pool.
>>
>> Rather than having servers add themselves to the pool by registering
>> with gearman (either directly or via Jenkins on their behalf), nodepool
>> should instead define functions to supply nodes on demand. For
>> example, it might define the gearman functions "get-nodes" and
>> "put-nodes". Zuul might request a node for a job by submitting a
>> "get-nodes" job with the node type (eg "precise") as an argument. It
>> could request two nodes together (in the same AZ) by supplying more
>> than one node type in the same call. When complete, it could call
>> "put-nodes" with the node identifiers to instruct nodepool to return
>> them (nodepool might then delete, rebuild, etc).
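>>
>> For illustration, a two-node request might look something like this
>> (field names hypothetical, nothing settled)::
>>
>> ### hypothetical "get-nodes" job argument
>> node-types:
>>   - precise
>>   - precise
>> same-az: true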
>>
>> This model is much more efficient for multi-node tests, where we will
>> no longer need to have special multinode labels. Instead the
>> multinode configuration can be much more ad-hoc and vary per job.
>>
>> The testenv broker used by tripleo behaves somewhat in this manner
>> (though it only supports static sets of resources). It also has logic
>> to deal with the situation where Zuul might exit unexpectedly and not
>> return nodes (though it should strive to do so). This feature in the
>> broker should be added to nodepool. Additionally, nodepool should
>> support fully static resources (they should become just another node
>> type) so that it can handle the use case of the test broker.
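>>
>> For example, a set of static hosts might be described as just another
>> label with something like (syntax entirely hypothetical)::
>>
>> ### hypothetical nodepool label backed by static hosts
>> labels:
>>   - name: tripleo-testenv
>>     static: true
>>     nodes:
>>       - te-env-01.example.org
>>       - te-env-02.example.org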
>>
>> =================
>> Changes To Zuul
>> =================
>>
>> Zuul is currently fundamentally a single-tenant application. Some
>> folks want to use it in a multi-tenant environment. Even within
>> OpenStack, we have use for multitenancy. OpenStack might be one
>> tenant, and each stackforge project might be another. Even if the big
>> tent discussion renders that thinking obsolete, we may still want the
>> kind of separation multi-tenancy can provide. The proposed
>> implementation is flexible enough to run Zuul completely single tenant
>> with shared everything, completely multi-tenant with shared nothing, and
>> everything in-between. Being able to adjust just how much is shared or
>> required, and how much can be left to individual projects will be very
>> useful.
>>
>> To support this, the main configuration should define tenants, and
>> tenants should specify config files to include. These include files
>> should define pipelines, jobs, and projects, all of which are
>> namespaced to the tenant (so different tenants may have different jobs
>> with the same names)::
>>
>> ### main.yaml
>> - tenant:
>>     name: openstack
>>     include:
>>       - global_config.yaml
>>       - openstack.yaml
>>
>> Files may be included by more than one tenant, so common items can be
>> placed in a common file and referenced globally. This means that, in
>> OpenStack's case, we can define pipelines and our base job definitions
>> (with logging info, etc.) once, and include them in all of our
>> tenants::
>>
>> ### main.yaml (continued)
>> - tenant:
>>     name: openstack-infra
>>     include:
>>       - global_config.yaml
>>       - infra.yaml
>>
>> A tenant may optionally specify repos from which it may derive its
>> configuration. In this manner, a project may keep its Zuul
>> configuration within its own repo. This happens only if the main
>> configuration file specifies that it is permitted::
>>
>> ### main.yaml (continued)
>> - tenant:
>>     name: random-stackforge-project
>>     include:
>>       - global_config.yaml
>>     repos:
>>       - stackforge/random  # Specific project config is in-repo
>>
>> Jobs defined in-repo may not have access to the full feature set
>> (including some authorization features). They also may not override
>> existing jobs.
>>
>> Job definitions continue to have the features in the current Zuul
>> layout, but they also take on some of the responsibilities currently
>> handled by the Jenkins (or other worker) definition::
>>
>> ### global_config.yaml
>> # Every tenant in the system has access to these jobs (because their
>> # tenant definition includes it).
>> - job:
>>     name: base
>>     timeout: 30m
>>     node: precise  # Just a variable for later use
>>     nodes:  # The operative list of nodes
>>       - name: controller
>>         image: {node}  # Substitute the variable
>>     auth:  # Auth may only be defined in central config, not in-repo
>>       swift:
>>         - container: logs
>>     pre-run:  # These specify what to run before and after the job
>>       - zuul-cloner
>>     post-run:
>>       - archive-logs
>>
>> Jobs have inheritance, and the above definition provides a base level
>> of functionality for all jobs. It sets a default timeout, requests a
>> single node (of type precise), and requests swift credentials to
>> upload logs. Further jobs may extend and override these parameters::
>>
>> ### global_config.yaml (continued)
>> # The python 2.7 unit test job
>> - job:
>>     name: python27
>>     parent: base
>>     node: trusty
>>
>> Our use of job names specific to projects is a holdover from when we
>> wanted long-lived slaves on jenkins to efficiently re-use workspaces.
>> This hasn't been necessary for a while, though we have used this to
>> our advantage when collecting stats and reports. However, job
>> configuration can be simplified greatly if we simply have a job that
>> runs the python 2.7 unit tests which can be used for any project. To
>> the degree that we want to know how often this job failed on nova, we
>> can add that information back in when reporting statistics. Jobs may
>> have multiple aspects to accommodate differences among branches, etc.::
>>
>> ### global_config.yaml (continued)
>> # Version that is run for changes on stable/icehouse
>> - job:
>>     name: python27
>>     parent: base
>>     branch: stable/icehouse
>>     node: precise
>>
>> # Version that is run for changes on stable/juno
>> - job:
>>     name: python27
>>     parent: base
>>     branch: stable/juno  # Could be combined into previous with regex
>>     node: precise        # if concept of "best match" is defined
>>
>> Jobs may specify that they require more than one node::
>>
>> ### global_config.yaml (continued)
>> - job:
>>     name: devstack-multinode
>>     parent: base
>>     node: trusty  # could do same branch mapping as above
>>     nodes:
>>       - name: controller
>>         image: {node}
>>       - name: compute
>>         image: {node}
>>
>> Jobs defined centrally (i.e., not in-repo) may specify auth info::
>>
>> ### global_config.yaml (continued)
>> - job:
>>     name: pypi-upload
>>     parent: base
>>     auth:
>>       password:
>>         pypi-password: pypi-password
>>         # This looks up 'pypi-password' from an encrypted yaml file
>>         # and adds it into variables for the job
>>
>> Pipeline definitions are similar to the current syntax, except that
>> they support specifying additional information for jobs in the context
>> of a given project and pipeline. For instance, rather than specifying
>> that a job is globally non-voting, you may specify that it is
>> non-voting for a given project in a given pipeline::
>>
>> ### openstack.yaml
>> - project:
>>     name: openstack/nova
>>     gate:
>>       queue: integrated  # Shared queues are manually built
>>       jobs:
>>         - python27  # Runs version of job appropriate to branch
>>         - devstack
>>         - devstack-deprecated-feature:
>>             branch: stable/juno  # Only run on stable/juno changes
>>             voting: false  # Non-voting
>>     post:
>>       jobs:
>>         - tarball:
>>             jobs:
>>               - pypi-upload
>>
>> Currently unique job names are used to build shared change queues.
>> Since job names will no longer be unique, shared queues must be
>> manually constructed by assigning them a name. Projects with the same
>> queue name for the same pipeline will have a shared queue.
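>>
>> For example, another project could join nova's queue simply by naming
>> it (the project shown is illustrative)::
>>
>> ### openstack.yaml (continued)
>> - project:
>>     name: openstack/neutron
>>     gate:
>>       queue: integrated  # Same queue name as nova; shared queue
>>       jobs:
>>         - python27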
>>
>> A subset of functionality is available to projects that are permitted to
>> use in-repo configuration::
>>
>> ### stackforge/random/.zuul.yaml
>> - job:
>>     name: random-job
>>     parent: base  # From global config; gets us logs
>>     node: precise
>>
>> - project:
>>     name: stackforge/random
>>     gate:
>>       jobs:
>>         - python27    # From global config
>>         - random-job  # From local config
>>
>> The executable content of jobs should be defined as ansible playbooks.
>> Playbooks can be fairly simple and might consist of little more than
>> "run this shell script" for those who are not otherwise interested in
>> ansible::
>>
>> ### stackforge/random/playbooks/random-job.yaml
>> ---
>> - hosts: controller
>>   tasks:
>>     - shell: run_some_tests.sh
>>
>> Global jobs may define ansible roles for common functions::
>>
>> ### openstack-infra/zuul-playbooks/python27.yaml
>> ---
>> - hosts: controller
>>   roles:
>>     - { role: tox, env: py27 }
>>
>> Because ansible has well-articulated multi-node orchestration
>> features, this permits very expressive job definitions for multi-node
>> tests. A playbook can specify different roles to apply to the
>> different nodes that the job requested::
>>
>> ### openstack-infra/zuul-playbooks/devstack-multinode.yaml
>> ---
>> - hosts: controller
>>   roles:
>>     - devstack
>>
>> - hosts: compute
>>   roles:
>>     - devstack-compute
>>
>> Additionally, if a project is already defining ansible roles for its
>> deployment, then those roles may be easily applied in testing, making
>> CI even closer to CD. Finally, to make Zuul more useful for CD, Zuul
>> may be configured to run a job (i.e., an ansible role) on a specific
>> node.
>>
>> The pre- and post-run entries in the job definition might also apply
>> to ansible playbooks and can be used to simplify job setup and
>> cleanup::
>>
>> ### openstack-infra/zuul-playbooks/zuul-cloner.yaml
>> ---
>> - hosts: all
>>   roles:
>>     - { role: zuul-cloner, zuul: "{{ zuul }}" }
>>
>> Where the zuul variable is a dictionary containing all the information
>> currently transmitted in the ZUUL_* environment variables. Similarly,
>> the log archiving script can copy logs from the host to swift.
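>>
>> For example, that dictionary might contain keys mirroring today's
>> environment variables (names and values illustrative)::
>>
>> zuul:
>>   uuid: 4aedd9ab07fd44f9b62965ab0bd9ccca
>>   project: openstack/nova
>>   pipeline: gate
>>   branch: master
>>   ref: refs/zuul/master/Z4aedd9ab07fd44f9
>>   url: http://zuul.example.org/p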
>>
>> A new Zuul component would be created to execute jobs. Rather than
>> running a worker process on each node (which requires installing
>> software on the test node, and establishing and maintaining network
>> connectivity back to Zuul, and the ability to coordinate actions across
>> nodes for multi-node tests), this new component will accept jobs from
>> Zuul, and for each one, write an ansible inventory file with the node
>> and variable information, and then execute the ansible playbook for that
>> job. This means that the new Zuul component will maintain ssh
>> connections to all hosts currently running a job. This could become a
>> bottleneck, but ansible and ssh have been known to scale to a large
>> number of simultaneous hosts, and this component may be scaled
>> horizontally. It should be simple enough that it could even be
>> automatically scaled if needed. In return, however, this approach
>> makes node configuration simpler (test nodes need only have an ssh
>> public key installed) and makes tests behave more like deployment.
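>>
>> To sketch the idea, the inventory written for the devstack-multinode
>> job above might look like (addresses invented)::
>>
>> ### hypothetical generated inventory
>> controller ansible_ssh_host=10.1.2.3
>> compute ansible_ssh_host=10.1.2.4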