[openstack-dev] [tripleo] prototype with standalone mode and remote edge compute nodes

Ben Nemec openstack at nemebean.com
Fri Jul 20 21:43:47 UTC 2018



On 07/20/2018 02:53 PM, James Slagle wrote:
> On Thu, Jul 19, 2018 at 7:13 PM, Ben Nemec <openstack at nemebean.com> wrote:
>>
>>
>> On 07/19/2018 03:37 PM, Emilien Macchi wrote:
>>>
>>> Today I played a little bit with Standalone deployment [1] to deploy a
>>> single OpenStack cloud without the need of an undercloud and overcloud.
>>> The use-case I am testing is the following:
>>> "As an operator, I want to deploy a single node OpenStack, that I can
>>> extend with remote compute nodes on the edge when needed."
>>>
>>> We still have a bunch of things to figure out before it works out of the
>>> box, but so far I was able to build something that worked, and I found it
>>> useful to share early to gather some feedback:
>>> https://gitlab.com/emacchi/tripleo-standalone-edge
>>>
>>> Keep in mind this is a proof of concept, based on upstream documentation
>>> and re-using 100% of what is in TripleO today. The only thing I'm doing is
>>> changing the environment and the roles for the remote compute node.
>>> I plan to work on cleaning up the manual steps I had to do to make it
>>> work, like hardcoding some hiera parameters, and on figuring out how to
>>> override ServiceNetMap.
>>>
>>> Anyway, feel free to test / ask questions / provide feedback.
>>
>>
>> What is the benefit of doing this over just using the deployed-server
>> approach to install a remote server from the central management
>> system?  You need to have connectivity back to the central location
>> anyway.  Won't this become
>> unwieldy with a large number of edge nodes?  I thought we told people not to
>> use Packstack for multi-node deployments for exactly that reason.
>>
>> I guess my concern is that eliminating the undercloud makes sense for
>> single-node PoCs and development work, but for what sounds like a
>> production workload I feel like you're cutting off your nose to spite your
>> face.  In the interest of saving one VM's worth of resources, now all of
>> your day 2 operations have no built-in orchestration.  Every time you want
>> to change a configuration it's "copy new script to system, ssh to system,
>> run script, repeat for all systems."  So maybe this is a backdoor way to
>> make Ansible our API? ;-)
> 
> I believe Emilien was looking at this POC in part because of some
> input from me, so I will attempt to address your questions
> constructively.
> 
> What you're looking at here is exactly a POC. The deployment is a POC
> using the experimental standalone code. I think the use case as
> presented by Emilien is something worth considering:
> 
>>> "As an operator, I want to deploy a single node OpenStack, that I can
>>> extend with remote compute nodes on the edge when needed."
> 
> I wouldn't interpret that to mean much of anything around eliminating
> the undercloud, other than what is stated for the use case. I feel
> that jumping to eliminating the undercloud would be an
> oversimplification. The goal of the POC isn't Packstack parity, or
> even necessarily a Packstack-like architecture.

Okay, this was the main disconnect for me.  I got the impression from
the discussion up until now that eliminating the undercloud was part of
the requirements.  Looking back at Emilien's original email, I think I
conflated the standalone PoC description with the use-case description.
My bad.
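
As an aside for anyone else poking at the PoC: my understanding is that
the remote compute piece boils down to a custom role plus an environment
file carrying the hardcoded hiera bits and ServiceNetMap entries Emilien
mentions, fed to the standalone installer. A rough, untested sketch --
the file names, parameter values, and exact flags below are placeholders
from memory, not taken from the PoC repo:

  # Hypothetical environment file; the keys shown are placeholders.
  $ cat ~/remote-compute.yaml
  parameter_defaults:
    ServiceNetMap:
      NovaLibvirtNetwork: ctlplane
    ExtraConfig:
      some::hiera::key: some-value

  # Standalone deploy on the remote node, reusing the stock templates.
  $ sudo openstack tripleo deploy --standalone --templates \
      --local-ip 192.168.24.2/24 \
      -r ~/RemoteCompute-role.yaml \
      -e ~/remote-compute.yaml \
      --output-dir ~/standalone-deploy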

> 
> One of the goals is to see if we can deploy separate disconnected
> stacks for Control and Compute. The standalone work happens to be a
> good way to test out some of the work around that. The use case was
> written to help describe and provide an overall picture of what is
> going on with this specific POC, with a focus on the edge use case.
> 
> You make some points about centralized management and connectivity
> back to the central location. Those are the exact sorts of things we
> are thinking about when we consider how we will address edge
> deployments. If you haven't had a chance yet, check out the Edge
> Computing whitepaper from the foundation:
> 
> https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf
> 
> Particularly the challenges outlined around management and deployment
> tooling. For lack of anything better I'm calling these the 3 D's:
> - Decentralized
> - Distributed
> - Disconnected
> 
> How can TripleO address any of these?
> 
> For Decentralized, I'd like to see better separation between the
> planning and application of the deployment in TripleO. TripleO has had
> the concept of a plan for quite a while, and we've been using it very
> effectively for our deployment, but it is somewhat hidden from the
> operator. It's not entirely clear to the user that there is any
> separation between the plan and the stack, or what benefit there even
> is in the plan.

+1.  I was disappointed that we didn't adopt the plan as more of a
first-class citizen for CLI deployments after it was implemented.
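
For what it's worth, a plan-first CLI flow mostly exists already; from
memory (untested, command names may have shifted) it looks something
like:

  # Upload and process the templates as a named plan, without deploying.
  $ openstack overcloud plan create --templates ~/my-templates my-plan

  # See which plans the undercloud is holding.
  $ openstack overcloud plan list

  # Deploy from the stored plan instead of re-uploading templates.
  $ openstack overcloud plan deploy my-plan

It just isn't the path most CLI users take today.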

> 
> I'd like to address some of that through API improvements around plan
> management and making the plan the top level thing being managed
> instead of a deployment. We're already moving in this direction with
> config-download and a lot of the changes we've made during Queens.
> 
> For better or worse, some other tools like Terraform call this out as
> one of their main differentiators:
> 
> https://www.terraform.io/intro/vs/cloudformation.html (3rd paragraph).
> 
> TripleO has long separated the planning and application phases. We
> just need to do a better job at developing useful features around that
> work. The UI has been taking advantage of it more than anything else
> at this point. I'd like to focus a bit more on what benefits we get
> from the plan, and how we can turn these into operator value.
> 
> Imagine a scenario where you have a plan that has been deployed, and
> you want to make some changes. You upload a new plan, the plan is
> processed, we update a copy of the deployed stack (or perhaps
> ephemeral stack), run config-download, and the operator has immediate
> feedback about what *would* be changed. Heat plays a role here in
> giving us a way to orchestrate the plan into a deployment model.
> 
> Ansible also plays a role in that we could take things a step further
> and run with --check to provide further feedback before anything is
> ever applied or updated. Ongoing work around new baremetal management
> workflows via metalsmith will give us more insight into planning the
> baremetal deployment. These tools (Heat/Ansible/Metalsmith/etc.) are
> technology choices. They are not architectures in and of themselves.
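
Just to picture that feedback loop from the operator's side, I imagine
something roughly like the following (hand-waving the plan-update step;
the inventory/playbook names are the usual config-download ones and may
differ):

  # Re-render the deployment data for the updated plan...
  $ openstack overcloud config download --name my-plan --config-dir ~/preview

  # ...then dry-run it to see what *would* change, without touching nodes.
  $ cd ~/preview
  $ ansible-playbook -i inventory.yaml deploy_steps_playbook.yaml --check --diff
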
> 
> You have centralized management of the planning phase, whose output
> could be a set of playbooks applied in a decentralized way: for
> example, served via an API and downloaded to a remote site where an
> operator is sitting in an emergency-response scenario with some
> "hardware in a box" that they want to deploy local compute/storage
> resources onto and connect to a local network. Connectivity back to
> the centralized platform may or may not be required, depending on what
> services are deployed.
> 
> For Distributed, I think of git. We have built-in git management of
> the config-download output. We are discussing (further) git management
> of the templates and processed plan. This gives operators some ability
> to manage the output in a distributive fashion, and make new changes
> outside of the centralized platform.
> 
> Perhaps in the future, we could offer an API/interface around pulling
> any changes back into the represented plan based on what an operator
> had changed. Sort of like a pull request for the plan, but by starting
> with the output.
> 
> Obviously, this needs a lot more definition and refinement than just
> "use git". Again, these efforts are about experimenting with use
> cases, not technology choices. To get us to those experiments quickly,
> it may look like we are making rash decisions about using X or Y, but
> that's not the driver here.

+1 again.  I argued for using git as the storage backend for plans in
the first place. :-)  This isn't the exact use case I had in mind, but
there's definitely overlap.
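
For the curious, the git piece is already tangible: if I remember right,
the config-download output you pull down is itself git-managed, so
something like this works for inspecting (or even exporting) changes
offline -- paths here are illustrative:

  # Render the config-download output for the current plan locally.
  $ openstack overcloud config download --config-dir ~/config-download

  # The directory should be a git repo, so normal git workflows apply.
  $ cd ~/config-download
  $ git log --oneline
  $ git diff HEAD~1

  # An operator-side change could even be exported as a patch to feed
  # back into the central plan later, pull-request style.
  $ git format-patch -1 --stdout > edge-change.patch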

> 
> For Disconnected, it also ties into how we'd address decentralized and
> distributed. The choice of tooling helps, but it's not as simple as
> "use Ansible". Part of the reason we are looking at this POC, and how
> to deploy it easily is to investigate questions such as what happens
> to the deployed workloads if the compute loses connectivity to the
> control plane or management platform. We want to make sure TripleO can
> deploy something that can handle these sorts of scenarios. During
> periods of disconnection at the edge or other remote sites, operators
> may still need to make changes (see points about distributed above).

This is a requirement I was missing as well.  If you don't necessarily
have connectivity back to the mothership and need to be able to manage
the deployment anyway, then the standalone part is obviously a
necessity.  I'd be curious how this works with OpenStack in general,
but like you said this is a PoC to find out.
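
Concretely, I picture the disconnected case as: the downloaded
config-download directory (playbooks plus inventory) is all that has to
exist at the remote site, so changes can still be applied with nothing
phoning home -- assuming images and packages are mirrored locally, and
assuming the usual file names:

  # With the playbooks and inventory on local media, the apply step
  # itself doesn't need the central management plane.
  $ cd /mnt/usb/config-download
  $ ansible-playbook -i inventory.yaml deploy_steps_playbook.yaml \
      --limit edge-compute-0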

> 
> Using the standalone deployment can help us quickly answer these
> questions and develop a "Steel Thread"[1] to build upon.
> 
> Ultimately, these are the sorts of high-level designs and architectures
> we are beginning to investigate. We are trying to let the use cases
> and operator needs drive the design, even while the use cases are
> still being better understood (see the whitepaper above). It's not
> about "just use Ansible" or "rewrite the API".
> 
> [1] http://www.agiledevelopment.org/agile-talk/111-defining-acceptance-criteria-using-the-steel-thread-concept
> 