[openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs

Flavio Percoco flavio at redhat.com
Fri Aug 22 14:18:40 UTC 2014


On 08/22/2014 02:33 PM, Thierry Carrez wrote:
> Hi everyone,
> 
> We all know being a project PTL is an extremely busy job. That's because
> in our structure the PTL is responsible for almost everything in a project:
> 
> - Release management contact
> - Work prioritization
> - Keeping bugs under control
> - Communicate about work being planned or done
> - Make sure the gate is not broken
> - Team logistics (run meetings, organize sprints)
> - ...
> 
> They end up being completely drowned in those day-to-day operational
> duties, miss the big picture, can't help in development that much
> anymore, get burnt out. Since you're either "the PTL" or "not the PTL",
> you're very alone and succession planning is not working that great either.
> 
> There have been a number of experiments to solve that problem. John
> Garbutt has done an incredible job at helping successive Nova PTLs
> handling the release management aspect. Tracy Jones took over Nova bug
> management. Doug Hellmann successfully introduced the concept of Oslo
> liaisons to get clear point of contacts for Oslo library adoption in
> projects. It may be time to generalize that solution.
> 
> The issue is one of responsibility: the PTL is ultimately responsible
> for everything in a project. If we can more formally delegate that
> responsibility, we can avoid getting up to the PTL for everything, we
> can rely on a team of people rather than just one person.
> 
> Enter the Czar system: each project should have a number of liaisons /
> official contacts / delegates that are fully responsible to cover one
> aspect of the project. We need to have Bugs czars, which are responsible
> for getting bugs under control. We need to have Oslo czars, which serve
> as liaisons for the Oslo program but also as active project-local oslo
> advocates. We need Security czars, which the VMT can go to to progress
> quickly on plugging vulnerabilities. We need release management czars,
> to handle the communication and process with that painful OpenStack
> release manager. We need Gate czars to serve as first-line-of-contact
> getting gate issues fixed... You get the idea.
> 
> Some people can be czars of multiple areas. PTLs can retain some czar
> activity if they wish. Czars can collaborate with their equivalents in
> other projects to share best practices. We just need a clear list of
> areas/duties and make sure each project has a name assigned to each.
> 
> Now, why czars ? Why not rely on informal activity ? Well, for that
> system to work we'll need a lot of people to step up and sign up for
> more responsibility. Making them "czars" makes sure that effort is
> recognized and gives them something back. Also if we don't formally
> designate people, we can't really delegate and the PTL will still be
> directly held responsible. The Release management czar should be able to
> sign off release SHAs without asking the PTL. The czars and the PTL
> should collectively be the new "project drivers".
> 
> At that point, why not also get rid of the PTL ? And replace him with a
> team of czars ? If the czar system is successful, the PTL should be
> freed from the day-to-day operational duties and will be able to focus
> on the project health again. We still need someone to keep an eye on the
> project-wide picture and coordinate the work of the czars. We need
> someone to pick czars, in the event multiple candidates sign up. We also
> still need someone to have the final say in case of deadlocked issues.
> 
> People say we don't have that many deadlocks in OpenStack for which the
> PTL ultimate power is needed, so we could get rid of them. I'd argue
> that the main reason we don't have that many deadlocks in OpenStack is
> precisely *because* we have a system to break them if they arise. That
> encourages everyone to find a lazy consensus. That part of the PTL job
> works. Let's fix the part that doesn't work (scaling/burnout).
> 


+1

FWIW, this is how things have worked in Zaqar since the very beginning.
We've split our duties throughout the team, which helped clearing some
of the PTL duties. So far, we have people responsible for:

* Bugs for the server
* Bugs for the client
* Gate and Tests
* Client releases

The areas not listed there are still part of the PTLs responsibilities.

Cheers,
Flavio

-- 
@flaper87
Flavio Percoco



More information about the OpenStack-dev mailing list