[Openstack-operators] Openstack team size vs's deployment size

Mathieu Gagné mgagne at calavera.ca
Fri Sep 9 22:41:35 UTC 2016


On Wed, Sep 7, 2016 at 6:59 PM, Kris G. Lindgren <klindgren at godaddy.com> wrote:
>
> I was hoping to poll other operators to see what their average team size
> vs’s deployment size is, as I am trying to use this in an internal company
> discussion.

It is difficult to come up with numbers without context, as not all
teams are created equal. But I think we are currently facing the same
situation, where you feel the team can't keep up with the amount of
work. I will describe our life as the operational "OpenStack" team and
how we plan to address our issues. Maybe you will find useful
information.

Our team is composed of ~7 people. This team is responsible for
everything "IT" except "corporate" services such as Exchange, Active
Directory or telephony.
We also have multiple development teams, one being more or less
dedicated to everything related to baremetal: network automation,
Ironic development, etc.
We also have a dedicated network team, hands and eyes,
assembly/provisioning team, etc. (we don't wire/rack the servers
ourselves)

So if you ask "How many people per compute node?", as you can see,
the number might be very low as we aren't 100% dedicated to OpenStack.

In our team, each member has a set of competences/specialities
(hardware, backup, monitoring, OpenStack, HA and more). This means
there are "experts" in our team who are responsible for those
domains of expertise. This also means that there might be only one
guy knowing how to package OpenStack or one fully understanding how
the logging system is installed.

We have a lot of OpenStack related tasks:
* Maintain the "deployer" (mainly written in Puppet)
* OpenStack packages (based on Ubuntu packages)
* OpenStack bug fixes and features
* Debug delivery issues (baremetal, performance, misconfiguration, etc.)
* Capacity management (new compute nodes, storage nodes)
* New deployments (regions)
* Maintain other related systems such as monitoring, logging, etc.
* POC with new OpenStack projects

Our team also happens to be exposed to multiple stakeholders who all
want us to work on their needs/problems right away. This includes
other developers, network team, customer support, provisioning, and
many more. It is difficult to prioritize all those requests. This
causes a lot of planning challenges which we have yet to fully
address. This is something we will be working on very shortly.

The main challenge that comes out of all this is the difficulty of
keeping up with technical debt payment.

Technical debt makes everything above much more difficult:
* your logging system is not set up as you would like it, rendering it useless
* your monitoring system has stability issues which you can't find
time to address, resulting in even less sleep
* upgrading OpenStack becomes a task you postpone as long as you can
due to other higher priorities

So you want to address those to make your life easier and hopefully
improve your quality of life.

Based on my experience, the business will never give you 6 months
full-time to fix your technical debt. They will want features, new
regions, more capacity, etc. And operational problems still need to be
addressed right away when they happen.

What we are going through now is putting our pain points (technical
debt) on paper and communicating those to upper management. As long
as you cannot quantify and describe your "misery" (technical debt),
it's hard for the business to justify hiring new people, to offer you
the opportunity to work on your internal tools, or to fully understand
your situation.

To summarize:
* Identify why you would want more people (technical debt, too many
ops tasks, etc.)
* See if your team can address those issues themselves
* Or use that information to justify hiring more people to work on those items
* Or delegate operational tasks to other teams if possible
* Or build tools to make the above possible.

Cheers

--
Mathieu


