[openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs

Russell Bryant rbryant at redhat.com
Fri Aug 22 13:40:28 UTC 2014


On 08/22/2014 08:33 AM, Thierry Carrez wrote:
> Hi everyone,
> 
> We all know being a project PTL is an extremely busy job. That's because
> in our structure the PTL is responsible for almost everything in a project:
> 
> - Release management contact
> - Work prioritization
> - Keeping bugs under control
> - Communicate about work being planned or done
> - Make sure the gate is not broken
> - Team logistics (run meetings, organize sprints)
> - ...
> 
> They end up being completely drowned in those day-to-day operational
> duties, miss the big picture, can't help in development that much
> anymore, get burnt out. Since you're either "the PTL" or "not the PTL",
> you're very alone and succession planning is not working that great either.
> 
> There have been a number of experiments to solve that problem. John
> Garbutt has done an incredible job at helping successive Nova PTLs
> handling the release management aspect. Tracy Jones took over Nova bug
> management. Doug Hellmann successfully introduced the concept of Oslo
> liaisons to get clear point of contacts for Oslo library adoption in
> projects. It may be time to generalize that solution.
> 
> The issue is one of responsibility: the PTL is ultimately responsible
> for everything in a project. If we can more formally delegate that
> responsibility, we can avoid getting up to the PTL for everything, we
> can rely on a team of people rather than just one person.
> 
> Enter the Czar system: each project should have a number of liaisons /
> official contacts / delegates that are fully responsible to cover one
> aspect of the project. We need to have Bugs czars, which are responsible
> for getting bugs under control. We need to have Oslo czars, which serve
> as liaisons for the Oslo program but also as active project-local oslo
> advocates. We need Security czars, which the VMT can go to to progress
> quickly on plugging vulnerabilities. We need release management czars,
> to handle the communication and process with that painful OpenStack
> release manager. We need Gate czars to serve as first-line-of-contact
> getting gate issues fixed... You get the idea.
> 
> Some people can be czars of multiple areas. PTLs can retain some czar
> activity if they wish. Czars can collaborate with their equivalents in
> other projects to share best practices. We just need a clear list of
> areas/duties and make sure each project has a name assigned to each.
> 
> Now, why czars ? Why not rely on informal activity ? Well, for that
> system to work we'll need a lot of people to step up and sign up for
> more responsibility. Making them "czars" makes sure that effort is
> recognized and gives them something back. Also if we don't formally
> designate people, we can't really delegate and the PTL will still be
> directly held responsible. The Release management czar should be able to
> sign off release SHAs without asking the PTL. The czars and the PTL
> should collectively be the new "project drivers".
> 
> At that point, why not also get rid of the PTL ? And replace him with a
> team of czars ? If the czar system is successful, the PTL should be
> freed from the day-to-day operational duties and will be able to focus
> on the project health again. We still need someone to keep an eye on the
> project-wide picture and coordinate the work of the czars. We need
> someone to pick czars, in the event multiple candidates sign up. We also
> still need someone to have the final say in case of deadlocked issues.
> 
> People say we don't have that many deadlocks in OpenStack for which the
> PTL ultimate power is needed, so we could get rid of them. I'd argue
> that the main reason we don't have that many deadlocks in OpenStack is
> precisely *because* we have a system to break them if they arise. That
> encourages everyone to find a lazy consensus. That part of the PTL job
> works. Let's fix the part that doesn't work (scaling/burnout).
> 

+1 on czars.  That's what was working best for me to start scaling
things in Nova, especially through my 2nd term (Icehouse).  John and
Tracy were a big help as you mentioned as examples.  There were others
that were stepping up, too.  I think it's been working well enough to
formalize it.

Another area worth calling out is a gate czar.  Having someone who
understands infra and QA quite well and is regularly on top of the
status of the project in the gate is helpful and quite important.

-- 
Russell Bryant



More information about the OpenStack-dev mailing list