[Openstack-operators] OpenStack team size vs. deployment size

Tim Bell Tim.Bell at cern.ch
Thu Sep 8 16:29:43 UTC 2016

It is a difficult calculation for me to make:

- Shared services like Facilities/Hardware repair/Network…

- Expectations for new functions once upstream has made them available

- User support effort, which has increased significantly as more users come online

- Application cloud-readiness is another factor: are the applications able to handle infrastructure outages?

This would be a very interesting topic for an Ops session at Barcelona. I suspect the larger deployments are aiming to approach O(1) staffing, i.e. no problem adding another X% of resources to the cloud with only a << X% increase in staff. Understanding the best practices, along with some advice (such as "< 1 FTE will be challenging"), would be helpful.
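The sublinear staffing goal above can be sketched with a toy power-law model. To be clear, the function and the constants `k` and `alpha` are entirely hypothetical, chosen only to illustrate what "add X% resources with << X% more staff" looks like numerically:

```python
# Hypothetical illustration of sublinear ("approaching O(1)") staffing.
# Assume staff scales as k * nodes**alpha with alpha < 1; k and alpha are
# invented constants, not measurements from any real deployment.
def staff_needed(nodes: float, k: float = 0.25, alpha: float = 0.5) -> float:
    """Staff required for a given node count under the assumed power law."""
    return k * nodes ** alpha

base_nodes = 1000.0
grown_nodes = base_nodes * 1.5  # add 50% more compute resources
staff_growth = staff_needed(grown_nodes) / staff_needed(base_nodes) - 1
print(f"{staff_growth:.0%}")    # staff grows ~22%, well under the 50% capacity growth
```

With `alpha = 0.5`, a 50% capacity increase needs only about a 22% staff increase; any `alpha < 1` gives the same qualitative "<< X%" behaviour.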


From: "Kris G. Lindgren" <klindgren at godaddy.com>
Date: Thursday 8 September 2016 at 17:52
To: "Van Leeuwen, Robert" <rovanleeuwen at ebay.com>, openstack-operators <openstack-operators at lists.openstack.org>
Subject: Re: [Openstack-operators] OpenStack team size vs. deployment size

I completely agree about the general rule of thumb. I am only looking at the team that specifically supports OpenStack. For us, frontend support for public clouds is handled by another team/org altogether.

So what I am looking at is the people in charge of the care and feeding of the OpenStack system specifically.
Which means: people who do the dev work on OpenStack, community participation, integration work, capacity monitoring and server additions, responding to alarms/monitoring around OpenStack, POCs of new features, any automation work specifically around OpenStack, and any testing done specifically against OpenStack.

For us we have both private and public clouds in 4 different regions.
We try to maintain either N or N-1 release cadence.
We build our own packages.
We have lots of existing legacy systems that we need to integrate with.
We have a number of local patches that we carry (the majority around cellsv1).

So, given that, would you be willing to share your compute-node-per-engineer ratio?

Kris Lindgren
Senior Linux Systems Engineer

From: "Van Leeuwen, Robert" <rovanleeuwen at ebay.com>
Date: Thursday, September 8, 2016 at 1:33 AM
To: "Kris G. Lindgren" <klindgren at godaddy.com>, OpenStack Operators <openstack-operators at lists.openstack.org>
Subject: Re: [Openstack-operators] OpenStack team size vs. deployment size

> I was hoping to poll other operators to see what their average team size vs. deployment size is,
> as I am trying to use this in an internal company discussion.
> Right now we are on the order of ~260 compute servers per OpenStack dev/engineer.
> So I am trying to see how we compare with other OpenStack installs, particularly those running more than a few hundred compute nodes.
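The metric quoted above is simple to state precisely. A minimal sketch (only the ~260 figure comes from the thread; the 1040-nodes/4-engineers split below is a hypothetical example that happens to produce it):

```python
# Compute-node-per-engineer ratio, the metric being compared in this thread.
def nodes_per_engineer(compute_nodes: int, engineers: float) -> float:
    """Return the compute-node-to-engineer ratio for a deployment."""
    if engineers <= 0:
        raise ValueError("engineers must be positive")
    return compute_nodes / engineers

# Hypothetical deployment: 1040 compute nodes run by a 4-person OpenStack team.
print(nodes_per_engineer(1040, 4))  # 260.0
```

Note that "engineers" here counts only the OpenStack-specific team, per the scoping Kris describes above, so ratios are only comparable across sites that draw the team boundary the same way.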

In my opinion it depends on too many things to give a general rule of thumb.
Just a few things that I think would impact the required team size:
* How many regions you have: setting up and managing a region usually takes more time than adding computes to an existing region
* How often you want/need to upgrade
* Whether you are offering more than core IaaS services, e.g. Designate/Trove/…
* What supporting things you need around your cloud and who manages them, e.g. networking, DNS, repositories, authentication systems, etc.
* What kind of SDN you are using and how it needs to be integrated with existing networks
* What kind of hardware you are rolling out and the average size of the instances: hosting 1000 tiny instances on a 768 GB / 88-core hypervisor will probably create more support tickets than 10 large instances on a low-spec hypervisor
* How you handle storage: Ceph, SAN, or local
* Whether you need live migration when doing maintenance or are allowed to bring down an availability zone
* Whether you build your own packages or use vendor packages
* The level of support the users expect and which team takes care of that

In my private-cloud experience, rolling out compute nodes and controllers is not the bulk of the work.
The time goes into all the things you need around the cloud and into customizations; that is what takes time.

It might be a bit different for public cloud providers, where you might deliver the cloud as-is and not need any integrations.
But you might need other things, like very reliable billing and good automation around misbehaving users.

Robert van Leeuwen