[openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs

Zane Bitter zbitter at redhat.com
Fri Aug 22 15:01:20 UTC 2014


On 22/08/14 08:33, Thierry Carrez wrote:
> Hi everyone,
>
> We all know being a project PTL is an extremely busy job. That's because
> in our structure the PTL is responsible for almost everything in a project:
>
> - Release management contact
> - Work prioritization
> - Keeping bugs under control
> - Communicate about work being planned or done
> - Make sure the gate is not broken
> - Team logistics (run meetings, organize sprints)
> - ...
>
> They end up being completely drowned in those day-to-day operational
> duties, miss the big picture, can't help in development that much
> anymore, get burnt out. Since you're either "the PTL" or "not the PTL",
> you're very alone and succession planning is not working that great either.

Succession planning works as well as you want it to IMO. In Kilo, Heat 
will have its 5th PTL in 5 release cycles, and all of those successions 
except the first were planned. We have multiple potential candidates for 
the next election who I think would be more than capable of doing a 
great job; the harder thing is finding people who are able to commit the 
time.

> There have been a number of experiments to solve that problem. John
> Garbutt has done an incredible job at helping successive Nova PTLs
> handling the release management aspect. Tracy Jones took over Nova bug
> management. Doug Hellmann successfully introduced the concept of Oslo
> liaisons to get clear points of contact for Oslo library adoption in
> projects. It may be time to generalize that solution.

+1

My goal as the Heat PTL has been to try to identify all of the places 
where the PTL can be a single point of failure and work toward 
eliminating them, as a first step toward eliminating PTLs altogether.

BTW the big two remaining are scheduling sessions for the design summit 
and approving/targeting blueprints in Launchpad - the specs repos helped 
somewhat with the latter, but it won't be completely fixed until 
Launchpad goes away.

> The issue is one of responsibility: the PTL is ultimately responsible
> for everything in a project. If we can more formally delegate that
> responsibility, we can avoid getting up to the PTL for everything, we
> can rely on a team of people rather than just one person.

First off, the PTL is not responsible for everything in a project. 
*Everyone* is responsible for everything in a project.

The PTL is *accountable* for everything in a project. PTLs are the 
mechanism the TC uses to ensure that programs remain accountable to the 
wider community.

> Enter the Czar system: each project should have a number of liaisons /
> official contacts / delegates that are fully responsible to cover one
> aspect of the project. We need to have Bugs czars, which are responsible
> for getting bugs under control. We need to have Oslo czars, which serve
> as liaisons for the Oslo program but also as active project-local oslo
> advocates. We need Security czars, which the VMT can go to to progress
> quickly on plugging vulnerabilities. We need release management czars,
> to handle the communication and process with that painful OpenStack
> release manager. We need Gate czars to serve as first-line-of-contact
> getting gate issues fixed... You get the idea.

+1

Rather than putting it all on one person, we should enumerate the ways 
in which we want projects to be accountable to the wider community, and 
allow the projects themselves to determine who is accountable for each 
particular function. Furthermore, we should allow them to do so on their 
own cadence, rather than only at 6-month intervals.

> Some people can be czars of multiple areas. PTLs can retain some czar
> activity if they wish. Czars can collaborate with their equivalents in
> other projects to share best practices. We just need a clear list of
> areas/duties and make sure each project has a name assigned to each.

Exactly. Maybe we'd have a wiki page or something listing the contact 
person for each area in each project.

> Now, why czars ? Why not rely on informal activity ? Well, for that
> system to work we'll need a lot of people to step up and sign up for
> more responsibility. Making them "czars" makes sure that effort is
> recognized and gives them something back. Also if we don't formally
> designate people, we can't really delegate and the PTL will still be
> directly held responsible. The Release management czar should be able to
> sign off release SHAs without asking the PTL. The czars and the PTL
> should collectively be the new "project drivers".
>
> At that point, why not also get rid of the PTL ?

+1 :)

> And replace him with a
> team of czars ? If the czar system is successful, the PTL should be
> freed from the day-to-day operational duties and will be able to focus
> on the project health again.

I don't see that as something the wider OpenStack community needs to 
dictate. We have a heavyweight election process for PTLs once every 
cycle because that used to be the process for electing the TC. Now that 
it no longer serves this dual purpose, PTL elections have outlived their 
usefulness.

If projects want to have a designated tech lead, let them. If they want 
to have the lead elected in a form of representative democracy, let 
them. But there's no need to impose that process on every project. If 
they want to rotate the tech lead every week instead of every 6 months, 
why not let them? We'll soon see from experimentation which models work. 
Let a thousand flowers bloom, &c.

We see all these articles in the trade press about how OpenStack needs a 
single dictator, and we rightly call them out as the BS that they are. 
And yet we impose that model from above on the individual projects in 
OpenStack. Why?

Think of the core teams as the "technical committees" of individual 
projects. We trust the OpenStack TC (once elected) to spontaneously 
self-organise and build consensus. Why do we believe that the core teams 
are incapable of doing the same?

> We still need someone to keep an eye on the
> project-wide picture and coordinate the work of the czars.

s/someone/everyone/

> We need
> someone to pick czars, in the event multiple candidates sign up.

I don't think this is anywhere _near_ as big a problem as you think.

Honestly, if there are people who can't find a way to share some work 
even in a situation where alternating every week is a completely valid 
option, we should probably send them back to kindergarten until they 
figure it out.

(The bigger problem is actually when _no_ candidates sign up, but I 
think we can rely on a sense of 'noblesse oblige' amongst core team 
members to pressure them into it.)

> We also
> still need someone to have the final say in case of deadlocked issues.

-1 we really don't.

> People say we don't have that many deadlocks in OpenStack for which the
> PTL ultimate power is needed, so we could get rid of them. I'd argue
> that the main reason we don't have that many deadlocks in OpenStack is
> precisely *because* we have a system to break them if they arise.

s/that many/any/ IME, and I think that threatening to break a deadlock by 
fiat is just as bad as actually doing it. And by 'bad' I mean 
community-poisoningly, trust-destroyingly bad.

> That
> encourages everyone to find a lazy consensus. That part of the PTL job
> works. Let's fix the part that doesn't work (scaling/burnout).

Let's allow projects to decide for themselves what works. Not every 
project is the same.

cheers,
Zane.


