[openstack-dev] [TripleO] Addressing Edge/Multi-site/Multi-cloud deployment use cases (new squad)
James Slagle
james.slagle at gmail.com
Mon Aug 20 20:47:45 UTC 2018
As we start looking at how TripleO will address next generation deployment
needs such as Edge, multi-site, and multi-cloud, I'd like to kick off a
discussion around how TripleO can evolve and adapt to meet these new
challenges.
What are these challenges? I think the OpenStack Edge Whitepaper does a good
job summarizing some of them:
https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf
They include:
- management of distributed infrastructure
- massive scale (thousands instead of hundreds)
- limited network connectivity
- isolation of distributed sites
- orchestration of federated services across multiple sites
We already have a lot of ongoing work that directly or indirectly starts to
address some of these challenges. That work includes things like
config-download, split-controlplane, metalsmith integration, validations,
all-in-one, and standalone.
I laid out some initial ideas in a previous message:
http://lists.openstack.org/pipermail/openstack-dev/2018-July/132398.html
I'll be reviewing some of that here and going into a bit more detail.
These are some of the high level ideas I'd like to see TripleO start to
address:
- More separation between planning and deploying (likely to be further defined
in spec discussion). We've had these concepts for a while, but we need to do
a better job of surfacing them to users as deployments grow in size and
complexity.
With config-download, we can more easily separate the phases of rendering,
downloading, validating, and applying the configuration. As we increase in
scale to managing many deployments, we should take advantage of what each of
those phases offers.
The separation also makes the deployment more portable, as we should
eliminate any restrictions that force the undercloud to be the control node
applying the configuration.
- Management of multiple deployments from a single undercloud. This is of
course already possible today, but we need better docs and polish and more
testing to flush out any bugs.
- Plan and template management in git.
This could be an iterative step towards eliminating Swift in the undercloud.
Swift seemed like a natural choice at the time because it was an existing
OpenStack service. However, I think git would do a better job at tracking
history and comparing changes and is much more lightweight than Swift. We've
been managing the config-download directory as a git repo, and I like this
direction. For now, we are just putting the whole git repo in Swift, but I
wonder if it makes sense to consider eliminating Swift entirely. We need to
consider the scale of managing thousands of plans for separate edge
deployments.
I also think this would be a step towards undercloud simplification.
- Orchestration between plans. I think there's general agreement around scaling
up the undercloud to be more effective at managing and deploying multiple
plans.
The plans could be different OpenStack deployments potentially sharing some
resources. Or, they could be deployments of different software stacks
(Kubernetes/OpenShift, Ceph, etc).
We'll need to develop some common interfaces for some basic orchestration
between plans. It could include dependencies, ordering, and sharing parameter
data (such as passwords or connection info). There is already some ongoing
discussion about some of this work:
http://lists.openstack.org/pipermail/openstack-dev/2018-August/133247.html
I would suspect this would start out as collecting specific use cases, and
then figuring out the right generic interfaces.
- Multiple deployments of a single plan. This could be useful for doing many
deployments that are all the same. Of course, some info might differ, such
as network IPs, hostnames, and node-specific details. We could have
some generic input interfaces for those sorts of things without having to
create new Heat stacks, which would allow re-using the same plan/stack for
multiple deployments. When scaling to hundreds/thousands of edge deployments
this could be really effective at side-stepping managing hundreds/thousands
of Heat stacks.
We may also need further separation between a plan and its deployment state
to have this modularity.
- Distributed management/application of configuration. Even though the
configuration is portable (config-download), we may still want some
automation around applying the deployment when not using the undercloud as a
control node. I think things like ansible-runner or Ansible AWX could help
here, or perhaps mistral-executor agents, or "mistral as a library". This
would also make our workflows more portable.
- New documentation highlighting some or all of the above features and how to
take advantage of them for new use cases (thousands of edge deployments, etc).
I see this as a sort of "TripleO Edge Deployment Guide" that would highlight
how to take advantage of TripleO for Edge/multi-site use cases.
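To make the "orchestration between plans" idea a bit more concrete, here is a
minimal sketch of dependency ordering and parameter sharing between plans.
Everything in it is an assumption made for illustration: the plan names, the
"depends_on" and "exports" keys, and both helper functions are hypothetical
and are not an existing TripleO interface. It only uses graphlib from the
Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plans. Names, dependencies, and exported parameters are
# illustrative only and do not reflect any existing TripleO interface.
PLANS = {
    "central": {
        "depends_on": [],
        "exports": {"keystone_url": "https://central.example.com:5000"},
    },
    "ceph": {
        "depends_on": ["central"],
        "exports": {"ceph_fsid": "example-fsid"},
    },
    "edge-site-1": {
        "depends_on": ["central", "ceph"],
        "exports": {},
    },
}


def deployment_order(plans):
    """Return plan names so that every plan comes after its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(plan["depends_on"]) for name, plan in plans.items()}
    return list(TopologicalSorter(graph).static_order())


def shared_parameters(plans, name):
    """Collect the parameters exported by a plan's direct dependencies."""
    params = {}
    for dep in plans[name]["depends_on"]:
        params.update(plans[dep]["exports"])
    return params


order = deployment_order(PLANS)
edge_inputs = shared_parameters(PLANS, "edge-site-1")
```

Here "order" lists the central plan first and the edge site last, and
"edge_inputs" holds the connection info the edge plan would receive from the
plans it depends on. A circular dependency would raise graphlib.CycleError,
which seems like a reasonable point to fail before anything is deployed.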
Obviously all the ideas are a lot of work, and not something I think we'll
complete in a single cycle.
I'd like to pull a squad together focused on Edge/multi-site/multi-cloud and
TripleO. On that note, this squad could also collaborate with other
deployment projects that are looking at similar use cases.
If you're interested in working on this squad, I'd see our first tasks as
being:
- Brainstorming ideas in addition to the above
- Breaking down ideas into actionable specs/blueprints for Stein (and possibly
future releases).
- Coming up with a consistent message around direction and vision for solving
these deployment challenges.
- Bringing together ongoing work that relates to these use cases so
that we're all collaborating with a shared vision and purpose and can help
prioritize reviews/CI/etc.
- Identifying any discussion items we need to work through in person at the
upcoming Denver PTG.
I'm happy to help facilitate the squad. If you have any feedback on these ideas
or would like to join the squad, reply to the thread or sign up in the
etherpad:
https://etherpad.openstack.org/p/tripleo-edge-squad-status
I'm just referring to the squad as "Edge" for now, but we can also pick a
cooler owl-themed name :).
--
-- James Slagle
--