[openstack-dev] [tripleo] prototype with standalone mode and remote edge compute nodes

James Slagle james.slagle at gmail.com
Fri Jul 20 19:53:07 UTC 2018


On Thu, Jul 19, 2018 at 7:13 PM, Ben Nemec <openstack at nemebean.com> wrote:
>
>
> On 07/19/2018 03:37 PM, Emilien Macchi wrote:
>>
>> Today I played a little bit with Standalone deployment [1] to deploy a
>> single OpenStack cloud without the need of an undercloud and overcloud.
>> The use-case I am testing is the following:
>> "As an operator, I want to deploy a single node OpenStack, that I can
>> extend with remote compute nodes on the edge when needed."
>>
>> We still have a bunch of things to figure out so it works out of the box,
>> but so far I was able to build something that worked, and I found it useful
>> to share it early to gather some feedback:
>> https://gitlab.com/emacchi/tripleo-standalone-edge
>>
>> Keep in mind this is a proof of concept, based on upstream documentation
>> and re-using 100% what is in TripleO today. The only thing I'm doing is to
>> change the environment and the roles for the remote compute node.
>> I plan to work on cleaning up the manual steps that I had to do to make it
>> work, like hardcoding some hiera parameters, and to figure out how to
>> override ServiceNetMap.
>>
>> Anyway, feel free to test / ask questions / provide feedback.
>
>
> What is the benefit of doing this over just using deployed server to install
> a remote server from the central management system?  You need to have
> connectivity back to the central location anyway.  Won't this become
> unwieldy with a large number of edge nodes?  I thought we told people not to
> use Packstack for multi-node deployments for exactly that reason.
>
> I guess my concern is that eliminating the undercloud makes sense for
> single-node PoC's and development work, but for what sounds like a
> production workload I feel like you're cutting off your nose to spite your
> face.  In the interest of saving one VM's worth of resources, now all of
> your day 2 operations have no built-in orchestration.  Every time you want
> to change a configuration it's "copy new script to system, ssh to system,
> run script, repeat for all systems."  So maybe this is a backdoor way to make
> Ansible our API? ;-)

I believe Emilien was looking at this POC in part because of some
input from me, so I will attempt to address your questions
constructively.

What you're looking at here is exactly a POC. The deployment is a POC
using the experimental standalone code. I think the use case as
presented by Emilien is something worth considering:

>> "As an operator, I want to deploy a single node OpenStack, that I can
>> extend with remote compute nodes on the edge when needed."

I wouldn't interpret that to mean much of anything about eliminating the
undercloud beyond what is stated in the use case. Jumping straight to
eliminating the undercloud would be an oversimplification. The goal of the
POC isn't Packstack parity, or even necessarily a Packstack-like
architecture.

One of the goals is to see if we can deploy separate disconnected
stacks for Control and Compute. The standalone work happens to be a
good way to test out some of the work around that. The use case was
written to help describe and provide an overall picture of what is
going on with this specific POC, with a focus on the edge use case.
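
To make that concrete, here's a rough sketch of what the two disconnected
stacks could look like, driven from Python purely for illustration. The
flags follow the upstream standalone docs; the roles files, environment
files, IPs, and output directories are placeholders (the edge compute role
in particular is hypothetical), not what the POC repo actually uses:

# Illustration only: two independent standalone deployments, one for the
# control plane and one for a remote edge compute node. In practice each
# invocation runs on its own node (typically with sudo); the files and
# addresses below are placeholders.
import subprocess

def standalone_deploy(roles_file, env_files, local_ip, output_dir):
    """Run a single 'openstack tripleo deploy --standalone' invocation."""
    cmd = [
        "openstack", "tripleo", "deploy",
        "--templates",
        "--standalone",
        "--local-ip", local_ip,
        "--output-dir", output_dir,
        "-r", roles_file,
    ]
    for env in env_files:
        cmd += ["-e", env]
    subprocess.run(cmd, check=True)

# On the central node: a self-contained control plane stack.
standalone_deploy("Standalone.yaml", ["standalone_parameters.yaml"],
                  "192.168.24.2/24", "/home/stack/control-stack")

# On the remote edge node: a separate, disconnected compute-only stack
# (the EdgeCompute role file name is hypothetical).
standalone_deploy("EdgeCompute.yaml", ["edge_compute_parameters.yaml"],
                  "192.168.100.2/24", "/home/stack/edge-stack")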

You make some points about centralized management and connectivity
back to the central location. Those are the exact sorts of things we
are thinking about when we consider how we will address edge
deployments. If you haven't had a chance yet, check out the Edge
Computing whitepaper from the foundation:

https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf

Particularly the challenges outlined around management and deployment
tooling. For lack of anything better I'm calling these the 3 D's:
- Decentralized
- Distributed
- Disconnected

How can TripleO address any of these?

For Decentralized, I'd like to see better separation between the
planning and application of the deployment in TripleO. TripleO has had
the concept of a plan for quite a while, and we've been using it very
effectively for our deployment, but it is somewhat hidden from the
operator. It's not entirely clear to the user that there is any
separation between the plan and the stack, or what benefit the plan even
provides.

I'd like to address some of that through API improvements around plan
management and making the plan the top level thing being managed
instead of a deployment. We're already moving in this direction with
config-download and a lot of the changes we've made during Queens.

For better or worse, some other tools like Terraform call this out as
one of their main differentiators:

https://www.terraform.io/intro/vs/cloudformation.html (3rd paragraph).

TripleO has long separated the planning and application phases. We
just need to do a better job of developing useful features around that
work. The UI has been taking advantage of it more than anything else
at this point. I'd like to focus a bit more on what benefits we get
from the plan, and how we can turn these into operator value.

Imagine a scenario where you have a plan that has been deployed, and
you want to make some changes. You upload a new plan, the plan is
processed, we update a copy of the deployed stack (or perhaps an
ephemeral stack), run config-download, and the operator gets immediate
feedback about what *would* be changed. Heat plays a role
here in giving us a way to orchestrate the plan into a deployment
model.
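
As a purely hypothetical sketch (none of these function names exist in
TripleO today), the flow described above looks roughly like this:

# Hypothetical sketch only: the names below are not real TripleO APIs.
# It just illustrates the sequence -- the plan is the top-level thing
# being managed, and the operator sees what *would* change before
# anything is applied.

def upload_plan(name, templates_dir):
    # Push new templates/environments into the named plan (hypothetical).
    return {"name": name, "templates": templates_dir}

def process_plan(plan):
    # Render jinja2, merge environments, validate (hypothetical).
    return plan

def preview_changes(plan):
    # Update a *copy* of the deployed stack (or an ephemeral stack), run
    # config-download against it, and diff the generated configuration
    # against what is currently deployed (hypothetical).
    return ["<resources and tasks that would change>"]

plan = process_plan(upload_plan("overcloud", "/home/stack/new-templates"))
for change in preview_changes(plan):
    print(change)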

Ansible also plays a role in that we could take things a step further
and run with --check to provide further feedback before anything is
ever applied or updated. Ongoing work around new baremetal management
workflows via metalsmith will give us more insight into planning the
baremetal deployment. These tools (Heat/Ansible/Metalsmith/etc.) are
technology choices; they are not architectures in and of themselves.
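
To illustrate the --check idea, here is a minimal sketch of running the
config-download output in Ansible check mode. deploy_steps_playbook.yaml
is the config-download entry point, but the directory and inventory paths
below are placeholders that depend on how the output was generated:

# Minimal sketch: run the generated playbooks with --check so changes are
# reported but never applied. Paths are placeholders.
import subprocess

CONFIG_DOWNLOAD_DIR = "/var/lib/mistral/overcloud"            # placeholder
INVENTORY = "/home/stack/tripleo-ansible-inventory.yaml"      # placeholder

subprocess.run(
    [
        "ansible-playbook",
        "-i", INVENTORY,
        "--check",                     # report what would change, don't apply
        "deploy_steps_playbook.yaml",  # config-download entry point
    ],
    cwd=CONFIG_DOWNLOAD_DIR,
    check=True,
)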

You have centralized management of the planning phase, whose output
could be a set of playbooks applied in a decentralized way: for example,
provided via an API and downloaded to a remote site where an operator is
sitting in an emergency response scenario with some "hardware in a box"
that they want to deploy local compute/storage resources onto and
connect to a local network. Connectivity back to the centralized
platform may or may not be required, depending on which services are
deployed.

For Distributed, I think of git. We have built-in git management of
the config-download output. We are discussing (further) git management
of the templates and processed plan. This gives operators some ability
to manage the output in a distributed fashion, and make new changes
outside of the centralized platform.
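
As a small sketch of what that git management already enables, assuming
the config-download output directory is the git repository we generate
(the path is a placeholder, and GitPython is just one way to read it), an
operator could see exactly what changed between the last two renders:

# Sketch only: inspect the diff between the two most recent commits of
# the git-managed config-download output. Path is a placeholder; requires
# GitPython (pip install GitPython).
from git import Repo

repo = Repo("/var/lib/mistral/overcloud")   # config-download output (placeholder)
latest = repo.head.commit
previous = latest.parents[0]                # assumes at least two renders exist

print("Latest render:", latest.hexsha[:8], latest.message.strip())
for diff in previous.diff(latest):
    print("changed:", diff.a_path or diff.b_path)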

Perhaps in the future, we could offer an API/interface for pulling any
changes an operator has made back into the plan itself. Sort of like a
pull request for the plan, but starting from the output.

Obviously, this needs a lot more definition and refinement than just
"use git". Again, these efforts are about experimenting with use cases,
not technology choices. In getting to those experiments quickly, it may
look like we are making rash decisions about using X or Y, but that's
not the driver here.

For Disconnected, it also ties into how we'd address decentralized and
distributed. The choice of tooling helps, but it's not as simple as
"use Ansible". Part of the reason we are looking at this POC, and how
to deploy it easily is to investigate questions such as what happens
to the deployed workloads if the compute loses connectivity to the
control plane or management platform. We want to make sure TripleO can
deploy something that can handle these sorts of scenarios. During
periods of disconnection at the edge or other remote sites, operators
may still need to make changes (see points about distributed above).

Using the standalone deployment can help us quickly answer these
questions and develop a "Steel Thread"[1] to build upon.

Ultimately, these are the sorts of high level designs and architectures
we are beginning to investigate. We are trying to let the use cases
and operator needs drive the design, even while the use cases are
still being better understood (see the whitepaper above). It's not about
"just use Ansible" or "rewrite the API".

[1] http://www.agiledevelopment.org/agile-talk/111-defining-acceptance-criteria-using-the-steel-thread-concept


-- 
-- James Slagle
--


