[openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs
thierry at openstack.org
Fri Aug 22 12:33:27 UTC 2014
We all know being a project PTL is an extremely busy job. That's because
in our structure the PTL is responsible for almost everything in a project:
- Release management contact
- Work prioritization
- Keeping bugs under control
- Communicate about work being planned or done
- Make sure the gate is not broken
- Team logistics (run meetings, organize sprints)
They end up being completely drowned in those day-to-day operational
duties, miss the big picture, can't help in development that much
anymore, get burnt out. Since you're either "the PTL" or "not the PTL",
you're very alone and succession planning is not working that great either.
There have been a number of experiments to solve that problem. John
Garbutt has done an incredible job at helping successive Nova PTLs
handling the release management aspect. Tracy Jones took over Nova bug
management. Doug Hellmann successfully introduced the concept of Oslo
liaisons to get clear point of contacts for Oslo library adoption in
projects. It may be time to generalize that solution.
The issue is one of responsibility: the PTL is ultimately responsible
for everything in a project. If we can more formally delegate that
responsibility, we can avoid getting up to the PTL for everything, we
can rely on a team of people rather than just one person.
Enter the Czar system: each project should have a number of liaisons /
official contacts / delegates that are fully responsible to cover one
aspect of the project. We need to have Bugs czars, which are responsible
for getting bugs under control. We need to have Oslo czars, which serve
as liaisons for the Oslo program but also as active project-local oslo
advocates. We need Security czars, which the VMT can go to to progress
quickly on plugging vulnerabilities. We need release management czars,
to handle the communication and process with that painful OpenStack
release manager. We need Gate czars to serve as first-line-of-contact
getting gate issues fixed... You get the idea.
Some people can be czars of multiple areas. PTLs can retain some czar
activity if they wish. Czars can collaborate with their equivalents in
other projects to share best practices. We just need a clear list of
areas/duties and make sure each project has a name assigned to each.
Now, why czars ? Why not rely on informal activity ? Well, for that
system to work we'll need a lot of people to step up and sign up for
more responsibility. Making them "czars" makes sure that effort is
recognized and gives them something back. Also if we don't formally
designate people, we can't really delegate and the PTL will still be
directly held responsible. The Release management czar should be able to
sign off release SHAs without asking the PTL. The czars and the PTL
should collectively be the new "project drivers".
At that point, why not also get rid of the PTL ? And replace him with a
team of czars ? If the czar system is successful, the PTL should be
freed from the day-to-day operational duties and will be able to focus
on the project health again. We still need someone to keep an eye on the
project-wide picture and coordinate the work of the czars. We need
someone to pick czars, in the event multiple candidates sign up. We also
still need someone to have the final say in case of deadlocked issues.
People say we don't have that many deadlocks in OpenStack for which the
PTL ultimate power is needed, so we could get rid of them. I'd argue
that the main reason we don't have that many deadlocks in OpenStack is
precisely *because* we have a system to break them if they arise. That
encourages everyone to find a lazy consensus. That part of the PTL job
works. Let's fix the part that doesn't work (scaling/burnout).
Thierry Carrez (ttx)
More information about the OpenStack-dev