[openstack-dev] [tc] [all] [glance] On operating a high throughput or otherwise team
Clint Byrum
clint at fewbar.com
Mon May 16 07:23:07 UTC 2016
Excerpts from Nikhil Komawar's message of 2016-05-14 17:42:16 -0400:
> Hi all,
>
>
> Lately I have been involved in discussions that have given people
> the wrong idea about the approach I take in operating the (Glance)
> team(s). While my approach is consistency, coherency and agility in
> getting things done (especially including the short, mid as well as
> long term plans), it appears that this wasn't evident. So, I have
> decided to write this email so that I can collectively gather feedback
> and share my thoughts on the right(eous) approach.
>
I find it rather odd that you or anyone believes there is a "right"
approach that would work for over 1500 active developers and 200+
companies.
We can definitely improve upon the aspects of what we have now, by
incremental change or revolution. But I doubt we'll ever have a
community that is "right" for everyone.
>
> My experience has been that OpenStack is relatively slow. In fact the
> feedback I get from people who are secondary (short span contributors)
> is that it's very slow. There's a genuine reason for that, and it's not
> as simple as being an Open Source/Community project, or people being
> unreasonable, or there being a lot of bike-shedding, etc.
>
It's slow like a freight train. Sure, time from point A to point B for
any one interest can be agonizingly slow. But the aggregate number of
changes (both in design, as well as code) is _staggering_ given the
number of competing interests involved.
>
> We are developing something that is usable, operationally friendly and
> easy to contribute to and maintain, but many strong influencers are
> missing the most important need for OpenStack -- an efficient way of
> communicating. I think we have the tools and the right approach on paper,
> and we've mandated it in the charter too, but that's not enough to
> operate things. Also, many people like to work on the assumption that all
> the tools of communication are equivalent or useful and that there are
> never any side-effects of using them. I strongly disagree. Please find my
> reasoning below:
>
I'd be interested to see evidence of anyone believing something close
to that, much less "many people".
I do believe people don't take into account everyone's perspective and
communication style when choosing how to communicate. But we can't really
know all of the ways anything we do in a distributed system affects all
of the parts. We can reason about it, and I think you've done a fine job
of reasoning through some of the points. But you can't know, nor can I,
and I don't think anyone is laboring under the illusion that they can
know this.
>
> Let me start from scratch:
>
>
> * What is code really?
>
> Code is nothing but a way to communicate your decisions. These decisions
> (if, then, else, while, etc.) are nothing but a way to consistently
> produce a repeatable output using a machine. (
> https://en.wikipedia.org/wiki/Turing_machine )
>
We'll have to agree to disagree here. _YES_ code does as much
communicating with humans as it does controlling computers. However,
there's a huge difference between the way communication works
(influence) and the way computing works (control).
>
> * If it's that simple, why is there even a problem?
>
> Decisions, when taken in tandem or in parallel, can result in a more
> complex phenomenon that is not perceptibly evident. That results in
> assumptions.
>
Isn't it funny how our communication system has the same problems as our
software? [1]
[1] https://en.wikipedia.org/wiki/Conway's_law
>
> * So, what can be the blocker?
>
> Nothing in itself; working with these assumptions is really the blocker.
> That is exactly why many people in their feedback say we have a "people
> problem" in OpenStack. But it's not really a people problem, it is an
> assumption problem.
>
> Assumptions are very very bad:
>
> With 'n' problems in a domain and 'm' people working on all those
> problems individually, we have an assumption problem of the order of
> O((m*e)^n), where you can think of 'e' as the convergence factor: the
> ability of a group to come to an agreement of the order of 'agree to
> agree' or 'agree to disagree' (add percentages to each for more
> granularity). There is also another assumption (for the convergence
> factor) that everyone wants to work in the best interest of solving the
> problems in that domain.
>
>
rAmen brother. We can't assume to know the motivations of anyone, though
we can at least decide how much to trust what people say. So if they say
that they're interested in solving the problems in a domain, I certainly
will give them space to prove that right or wrong.
> * How do I attempt to solve this situation?
>
> I think the first and foremost step is understanding the 'intent' behind
> every step -- whether it is a proposal, code, email, etc.
>
> Another important step is to reduce the communication gap -- be it
> meetings, emails, chats, etc. I think the distinguishing factors of each
> of these modes of communication should be taken into account while
> communicating. For example, for the speaker the process of communication
> involves intent, thought, ability to communicate, and language
> barriers/restrictions; for the audience it is the other way around:
> language barriers/restrictions, ability to comprehend, internalizing
> (giving it a shape in your thoughts) and then catching the intent. This
> is a long process behind each and every step of the communication,
> whether it is one sentence or a long review. So, when the intent is
> important to communicate, we need to use the medium of communication
> that is most suitable for it; among the recommended tools, that is IRC
> on a regular basis, and otherwise meetups. We sometimes use video/audio
> conference calls etc. High bandwidth communication is extremely
> important to increase the convergence factor and solve the problem.
> Let's start using it more and more. Please.
>
I see high bandwidth communication as something we use to correct errors
in the communication process that is _far_ too high scale and diverse
to be synchronous.
> I think people prefer to use the ML a lot, and I am not a great fan of
> it. It is a multi-cast way of communication and it carries assumptions
> around time, space, the intent of the audience and their intent to
> actually read the messages. The same is true for gerrit/etherpad.
>
This is pretty much a required reality for distributed communications. One
must evaluate what they might be racing with when posting an asynchronous
message to the ML. But in return, one can get a very high hit rate for a
very low cost. Done right, one might even program their readers to watch
out for whatever topic they've posted on, so that the readers can alert
the author to the subject at the next synchronous high-bandwidth
communications event.
> The same applies to broadcast media too, but to a smaller extent, as
> that content is static and focuses on one thing.
>
> A multi-cast medium of communication is more disruptive, as it involves
> a possibility of divergence from the topic and strongly polarizing
> opinions due to the small chance of catching the intent. So, let us use
> it 'judiciously' and preferably only as a newspaper.
>
This is only really possible by scaling down the community, or siloing
sections of the community off from each other. That's the exact opposite
of what I'd like to see happen in OpenStack. We need way more people who
are able to cross the silo boundaries and work at a high level to solve
cross-cutting problems, not fewer.
> Another step is to arrange/show up to meetings; yes, this is tedious but
> extremely vital. This is the place where you can actually determine
> whether the convergence factor is high or low. I find that a lot of
> people take meetings lightly and their approach doesn't establish
> deterministic behavior in the team. Many times, it becomes disruptive
> behavior and the convergence decreases significantly.
>
This isn't just tedious; it's simply not a possibility for the community
as a whole. Sure, some people can make meetings at 0800 Pacific-US time.
But I cannot, because I have a family that needs me at that time.
> I have always been a proponent of mid-cycles, as they have helped the
> Glance team (whoever was present) come to a state of agreeing to agree
> or at least agreeing to disagree. Both of these are enough to remain
> 'unstuck' and focus on getting things done.
>
Sure, the mid-cycle is a place to correct errors and create new source
material for asynchronous change management. But we can't all be at all
of the mid-cycles, so this isn't really a great thing to center large
scale communication around.
> Team bonding, one-to-one feedback, etc. processes are adopted as well.
> But OpenStack is one team, and the makeup of any given team can be
> highly dynamic, so all of that is relatively less important.
>
> This cycle I am experimenting with the focus approach (thanks to Doug
> for setting a good example with the [release] tag). I anticipate that
> focusing on specific things each week will get people to have more
> synchronous communication and less loss of context.
>
You're right on this. It will help. Providing clear references and
reducing the time between asynchronous mail and synchronous
error-correcting chats is vital to reducing waste. But eliminating waste
just isn't possible, and that's just life.
> Though, I think every team needs to be synchronous about their approach
> and not use delayed mechanisms like ML or gerrit.
>
>
>
> Another important point I wish to raise:
>
> * I find it is very important for people who actually have a strong say
> in things to focus less on code or even individual reviews and focus
> more on awareness, collaboration and establishing convergence factor.
> Leave the nitty gritty details to those who have more bandwidth in hand
> for it.
>
I believe what you just meant to say was "I believe we need an
architecture team." I agree, let's get the ball rolling on that. In all
seriousness, we don't really have one, and as a result, we don't really
have a well thought out overall system, even though the individual
components are nicely thought out.
> * Let's abolish statistical evaluation of all things, or at the very
> least make it less important when it comes to contribution. It
> results in a bad experience of lack of focus and loss of context across
> different problems, and doesn't really show the real contribution
> someone is making to the problem solving. We are all adults; we do not
> need teachers (even if it's a computer) to give us grades.
>
This will punish the non-native English speakers. We've seen that these
contributors consistently do reviews, and provide value, but struggle to
attain core reviewer status and in general get recognition in the
community because they struggle with the "intangibles" of group
communication. The numbers are there for those of us who don't know that
person to look and be able to say 'oh look they do a lot of stuff with
X'. Does that mean they're actually awesome at what they do? We can't
tell, but we at least have common ground to build on.
> * I think if someone has a very strong say then they need to keep a
> (near to) synchronous communication to the development process. We need
> to keep the context, keep our convergence intact and move forward with a
> common understanding. Otherwise it is VERY disruptive for someone
> investing their time, money, energy and interest in OpenStack.
>
I think this does happen with specs. If you have something strong to
say, write it down, submit it, and discuss as it lands. And when it
lands, your name is on it, and you will be approached about it. I see
this process working reasonably well already, so I'm not sure it needs
change.
> * Also, one very important thing that I keep hearing is "I do not like
> that", without any other information, as an argument to disregard
> technical proposals. I think it is a very disruptive and irrational way
> to express arguments. We are not buying flowers in OpenStack; we need to
> keep rationality in check when we express our opinions. It reduces the
> convergence factor and increases dubiety among the developers &
> reviewers. Then we have an ecosystem where people do not understand why
> we do things the way we do. We should not stop businesses just
> because someone doesn't like something, please no. Lack of rationale can
> actually do that.
>
>
+1! Qualify or quantify your objections, but please don't just give us
your unfiltered opinions.
> I think the most important thing is to have belief in our practices. For
> that we need to enforce our standards and ensure people follow them.
> Once we are stricter about reducing disruption, we will have more
> confidence in moving forward faster. Today we only have a governance
> that is merely a guideline (Constitution); what we really need is a
> judiciary.
>
>
I love that you're thinking about this, and I thank you for writing this
all down. I know it means a lot to you. However, I don't believe we need
a judiciary. We are a meritocracy, and so, we favor those who accomplish
things. Introducing a lot of formal rules and process on top of that is
just awkward.
As a counterpoint, I think we mostly need more understanding of our
distributed nature, and to let go of the idea that we can control any
of it. To anyone involved I say: Wield your influence, and measure your
success, but don't expect 1500 people to do what you tell them to do,
because they might just have 1500 different ideas of what you actually
meant.