Open Stack

Mon Aug 20 20:47:45 UTC 2018

As we start looking at how TripleO will address next-generation deployment
needs such as Edge, multi-site, and multi-cloud, I'd like to kick off a
discussion around how TripleO can evolve and adapt to meet these new
challenges.

What are these challenges? I think the OpenStack Edge Whitepaper does a good
job summarizing some of them:

https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf

They include:

- management of distributed infrastructure
- massive scale (thousands instead of hundreds)
- limited network connectivity
- isolation of distributed sites
- orchestration of federated services across multiple sites

We already have a lot of ongoing work that directly or indirectly starts to
address some of these challenges. That work includes things like
config-download, split-controlplane, metalsmith integration, validations,
all-in-one, and standalone.

I laid out some initial ideas in a previous message:

http://lists.openstack.org/pipermail/openstack-dev/2018-July/132398.html

I'll be reviewing some of that here and going into a bit more detail.

These are some of the high-level ideas I'd like to see TripleO start to
address:

- More separation between planning and deploying (likely to be further defined
  in spec discussion). We've had these concepts for a while, but we need to do
  a better job of surfacing them to users as deployments grow in size and
  complexity.

  With config-download, we can more easily separate the phases of rendering,
  downloading, validating, and applying the configuration. As we increase in
  scale to managing many deployments, we should take advantage of what each of
  those phases offers.

  The separation also makes the deployment more portable, as we should
  eliminate any restrictions that force the undercloud to be the control node
  applying the configuration.

- Management of multiple deployments from a single undercloud. This is of
  course already possible today, but we need better docs and polish and more
  testing to flush out any bugs.

- Plan and template management in git.

  This could be an iterative step towards eliminating Swift in the undercloud.
  Swift seemed like a natural choice at the time because it was an existing
  OpenStack service.  However, I think git would do a better job at tracking
  history and comparing changes and is much more lightweight than Swift. We've
  been managing the config-download directory as a git repo, and I like this
  direction. For now, we are just putting the whole git repo in Swift, but I
  wonder if it makes sense to consider eliminating Swift entirely. We need to
  consider the scale of managing thousands of plans for separate edge
  deployments.

  I also think this would be a step towards undercloud simplification.

- Orchestration between plans. I think there's general agreement around scaling
  up the undercloud to be more effective at managing and deploying multiple
  plans.

  The plans could be different OpenStack deployments potentially sharing some
  resources. Or, they could be deployments of different software stacks
  (Kubernetes/OpenShift, Ceph, etc).

  We'll need to develop some common interfaces for some basic orchestration
  between plans. It could include dependencies, ordering, and sharing parameter
  data (such as passwords or connection info). There is already some ongoing
  discussion about some of this work:

  http://lists.openstack.org/pipermail/openstack-dev/2018-August/133247.html

  I would suspect this would start out as collecting specific use cases, and
  then figuring out the right generic interfaces.

- Multiple deployments of a single plan. This could be useful for doing many
  deployments that are all the same. Of course, some info might be different,
  such as network IPs, hostnames, and node-specific details. We could have
  some generic input interfaces for those sorts of things without having to
  create new Heat stacks, which would allow re-using the same plan/stack for
  multiple deployments. When scaling to hundreds/thousands of edge deployments
  this could be really effective at side-stepping managing hundreds/thousands
  of Heat stacks.

  We may also need further separation between a plan and its deployment state
  to have this modularity.

- Distributed management/application of configuration. Even though the
  configuration is portable (config-download), we may still want some
  automation around applying the deployment when not using the undercloud as a
  control node. I think things like ansible-runner or Ansible AWX could help
  here, or perhaps mistral-executor agents, or "mistral as a library". This
  would also make our workflows more portable.

- New documentation highlighting some or all of the above features and how to
  take advantage of them for new use cases (thousands of edge deployments, etc).
  I see this as a sort of "TripleO Edge Deployment Guide" that would highlight
  how to take advantage of TripleO for Edge/multi-site use cases.
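
To make the "orchestration between plans" idea above a bit more concrete, here
is a minimal sketch of dependency ordering and parameter sharing between plans.
This is not an existing TripleO interface: the plan names, the
PLAN_DEPENDENCIES mapping, and the deployment_order() and shared_inputs()
helpers are all hypothetical and only illustrate the kind of generic interface
we might collect use cases for. It uses graphlib from the Python standard
library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical plans: each key is a plan, each value lists the plans it
# depends on. None of these names come from TripleO itself.
PLAN_DEPENDENCIES = {
    "central-controlplane": [],
    "ceph-central": [],
    "edge-site-1": ["central-controlplane", "ceph-central"],
    "edge-site-2": ["central-controlplane"],
}


def deployment_order(dependencies):
    """Return the plans in an order where dependencies come first."""
    return list(TopologicalSorter(dependencies).static_order())


def shared_inputs(plan, dependencies, outputs):
    """Merge the data exported by the plans that `plan` depends on.

    `outputs` maps a plan name to a dict of values it exports, such as
    connection info. Later dependencies override earlier ones on key clashes.
    """
    merged = {}
    for dependency in dependencies[plan]:
        merged.update(outputs.get(dependency, {}))
    return merged


if __name__ == "__main__":
    # Assumed example data; a real deployment would export this itself.
    exported = {"central-controlplane": {"keystone_url": "http://192.0.2.10:5000"}}
    for plan in deployment_order(PLAN_DEPENDENCIES):
        print(plan, shared_inputs(plan, PLAN_DEPENDENCIES, exported))
```

The exact order among independent plans is not fixed; the only guarantee is
that a plan is listed after everything it depends on. A cycle between plans
would raise graphlib.CycleError, which seems like the right failure mode for
a first pass.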

Obviously all the ideas are a lot of work, and not something I think we'll
complete in a single cycle.

I'd like to pull a squad together focused on Edge/multi-site/multi-cloud and
TripleO. On that note, this squad could also collaborate with other deployment
projects that are looking at similar use cases.

If you're interested in working on this squad, I'd see our first tasks as
being:

- Brainstorming ideas in addition to the above
- Breaking down ideas into actionable specs/blueprints for Stein (and possibly
  future releases).
- Coming up with a consistent message around direction and vision for solving
  these deployment challenges.
- Bringing together ongoing work that relates to these use cases so that we're
  all collaborating with shared vision and purpose and we can help prioritize
  reviews/CI/etc.
- Identifying any discussion items we need to work through in person at the
  upcoming Denver PTG.

I'm happy to help facilitate the squad. If you have any feedback on these ideas
or would like to join the squad, reply to the thread or sign up in the
etherpad:

https://etherpad.openstack.org/p/tripleo-edge-squad-status

I'm just referring to the squad as "Edge" for now, but we can also pick a
cooler owl-themed name :).

-- 
-- James Slagle
--

[openstack-dev] [TripleO] Addressing Edge/Multi-site/Multi-cloud deployment use cases (new squad)