[openstack-dev] [infra] Infra cloud: infra running a cloud for nodepool

James E. Blair corvus at inaugust.com
Tue Feb 24 21:18:52 UTC 2015

A group of folks from HP is interested in starting an effort to run a
cloud as part of the Infrastructure program with the purpose of
providing resources to nodepool for OpenStack testing.  HP is supplying
two racks of machines, and we will operate each as an independent cloud.
I think this is a really good idea, and will do a lot for OpenStack.

Here's what we would get out of it:

1) More test resources.  The primary goal of this cloud will be to
provide more instances to nodepool.  This would extend our pool to
include a third provider, which makes us more resilient to service
disruptions and increases our aggregate capacity, so we can perform
more testing more quickly.  It's hard to say for certain until we have
something spun up that we can benchmark, but we are hoping for an
additional 50% to 100% of our current capacity.

2) Closing the loop between OpenStack developers and ops.  This cloud
will be deployed as often as we are able (perhaps daily, perhaps less
often, depending on technology), so that if it is not behaving the way
developers would like, they can fix it fairly quickly.

3) A fully open deployment.  The infra team already runs a large
logstash and elasticsearch system for finding issues in devstack runs.
We will deploy the same technology (and perhaps more) to make sure that
anyone who wants to inspect the operational logs from the running
production cloud is able to do so.  We can even run the same
elastic-recheck queries to see if known bugs are visible in production.
The cloud will be deployed using the same tools and processes as the
rest of the project infrastructure, meaning anyone can edit the modules
that deploy the cloud to make changes.

How is this different from the TripleO cloud?

The primary goal of the TripleO cloud is to provide test infrastructure
so that the TripleO project can run tests that require real hardware and
complex environments.  The primary purpose of the infra cloud will be to
run a production service that will stand alongside other cloud providers
to supply virtual machines to nodepool.

What about the infra team's aversion to real hardware?

It's true that all of our current resources are virtual, and this would
be adding the first real bare-metal machines to the infra project.
However, there are a number of reasons I feel we're ready to take that
step now:

* This cloud will stand alongside two others to provide resources to
  nodepool.  If it completely fails, infra will continue to operate, so
  we don't need to be overly concerned with uptime and being on-call.

* The deployment and operation of the cloud will use the same technology
  and processes as the infra project currently uses, so there should be
  minimal barriers for existing team members.

* A bunch of new people will be joining the team to help with this.  We
  expect them to become fully integrated with the rest of infra, so that
  they are able to help out in other areas and the whole team expands
  its collective capacity and expertise.

If this works well, it may become a template for incorporating other
hardware contributions into the system.

Next steps:

We've started the process of identifying the steps to make this happen,
as well as answering some deployment questions (specifics about
technology, topology, etc).  There is a StoryBoard story for the effort:


And some notes that we took at a recent meeting to bootstrap the effort:


I think one of the next steps is to actually write all that up and push
it up as a change to the system-config documentation.  Once we're
certain we agree on all of that, it should be safe to divide up many of
the remaining tasks.

