[openstack-dev] [Heat] Design summit preparation - Next steps for Heat Software Orchestration
Clint Byrum
clint at fewbar.com
Thu Apr 24 15:53:40 UTC 2014
Excerpts from Thomas Spatzier's message of 2014-04-22 09:42:14 -0700:
>
> Hi all,
>
> following up on Zane's request from end of last week, I wanted to kick off
> some discussion on the ML around a design summit session proposal titled
> "Next steps for Heat Software Orchestration". I guess there will be things
> that can be sorted out this way and others that can be refined so we can
> have a productive session in Atlanta. I am basically copying the complete
> contents of the session proposal below so we can iterate on various points.
> If it turns out that we need to split off threads, we can do that at a
> later point.
>
> The session proposal itself is here:
> http://summit.openstack.org/cfp/details/306
>
> And here are the details:
>
> With the Icehouse release, Heat includes implementation for software
> orchestration (Kudos to Steve Baker and Jun Jie Nan) which enables clean
> separation of any kind of software configuration from compute instances and
> thus enables a great new set of features. The implementation for software
> orchestration in Icehouse has probably been the major chunk of work to
> achieve a first end-to-end flow for software configuration thru scripts,
> Chef or Puppet, but there is more work to be done to enable Heat for more
> software orchestration use cases beyond the current support.
> Below are a couple of use cases, and more importantly, thoughts on design
> options of how those use cases can be addressed.
>
> #1 Enable software components for full lifecycle:
> With the current design, "software components" defined thru SoftwareConfig
> resources allow for only one config (e.g. one script) to be specified.
> Typically, however, a software component has a lifecycle that is hard to
> express in a single script. For example, software must be installed
> (created), there should be support for suspend/resume handling, and it
> should be possible to allow for deletion-logic. This is also in line with
> the general Heat resource lifecycle.
> By means of the optional 'actions' property of SoftwareConfig it is
> possible today to specify at which lifecycle action of a SoftwareDeployment
> resource the single config hook shall be executed at runtime. However, for
> modeling complete handling of a software component, this would require a
> number of separate SoftwareConfig and SoftwareDeployment resources to be
> defined which makes a template more verbose than it would have to be.
> As an optimization, SoftwareConfig could allow for providing several hooks
> to address all default lifecycle operations that would then be triggered
> thru the respective lifecycle actions of a SoftwareDeployment resource.
> Resulting SoftwareConfig definitions could then look like the one outlined
> below. I think this would fit nicely into the overall Heat resource model
> for actions beyond stack-create (suspend, resume, delete). Furthermore,
> this will also enable a closer alignment and straight-forward mapping to
> the TOSCA Simple Profile YAML work done at OASIS and the heat-translator
> StackForge project.
>
> So in a short, stripped-down version, SoftwareConfigs could look like
>
>   my_sw_config:
>     type: OS::Heat::SoftwareConfig
>     properties:
>       create_config: # the hook for software install
>       suspend_config: # hook for suspend action
>       resume_config: # hook for resume action
>       delete_config: # hook for delete action
>
First off, modeling more actions is definitely on the near-term need
list for TripleO. We need to model rebuild and replace as an action too,
so that we can evacuate/migrate work loads off of compute nodes.
I think you can prototype action handling already with StructuredConfig,
which allows you to define your own structure underneath config:.
So you'd just simply do this:

  my_sw_config:
    type: OS::Heat::StructuredConfig
    properties:
      config:
        create_config: |
          #!/bin/bash
          execute_stuff
        delete_config: |
          #!/bin/bash
          execute_delete_stuff

Then your in-instance tools would simply write these as executables and
run them when the action dictates. Suspend and resume aren't currently
exposed, but I think that would be a fairly trivial patch to enable.
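
As a rough sketch of that in-instance side: the snippet below picks a hook by action name and runs it, and skips quietly when no hook exists for the action. This is not an existing Heat or in-instance tool. The function name run_hook, the hook directory layout ("<action>_config" files), and the message text are all assumptions made up for illustration; it also assumes some other tool has already written each entry under config: to a file.

```shell
#!/bin/bash
# Hypothetical in-instance dispatcher (not part of Heat).
# Assumes each entry under "config:" was already written to
# "<hook_dir>/<action>_config" by some other tool; that layout is an
# assumption for this sketch.

run_hook() {
    local hook_dir="$1" action="$2"
    local hook="$hook_dir/${action}_config"

    # No hook for this action: the in-instance tool, not Heat,
    # decides that this means "skip".
    if [ ! -f "$hook" ]; then
        echo "no hook for action '$action', skipping"
        return 0
    fi

    # The hooks in this thread are bash scripts, so run them with bash.
    bash "$hook"
}
```

Usage would be something like `run_hook /var/lib/mystuff/hooks create`, where the directory path is again only an example. Keeping the skip decision inside the instance means Heat itself never has to know whether a given action had a hook.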
Note that IMO we should _not_ encourage users to embed executables in
templates. In addition to muddying the waters between orchestration
and configuration management, it is also a very poor model for long
term source code control. You lose things like renames and per-file
commit history. I think users will be much better served by other tools
bundling software together with a template, and I've always figured that
something like Murano or Solum would do that. So that you'd instead have
a template like this:

  my_sw_config:
    type: OS::Heat::StructuredConfig
    properties:
      config:
        create_config: {get_file: create_hook.sh}
        delete_config: {get_file: delete_hook.sh}

And if you're image-based, then you'd instead just bundle the image
creation definition and not even need to include instructions on code
injection in your template.
> When such a SoftwareConfig gets associated to a server via
> SoftwareDeployment, the SoftwareDeployment resource lifecycle
> implementation could trigger the respective hooks defined in SoftwareConfig
> (if a hook is not defined, a no-op is performed). This way, all config
> related to one piece of software is nicely defined in one place.
>
I don't believe "nicely defined in one place" is the right way to say
this. I would call it an unmanageable chunk of yaml.
One point: there is no "no-op". Heat's software config has an interface,
and in-instance tools do the work. I want to make sure we always stay
true to that. I don't want Heat to assume anything about what happens
inside instances. Heat exposes configs, and optionally waits for a
signal. Nothing more.
>
> #2 Enable ad-hoc actions on software components:
> Apart from basic resource lifecycle hooks, it would be desirable to allow
> for invocation of ad-hoc actions on software. Examples would be the ad-hoc
> creation of DB backups, application of patches, or creation of users for an
> application. Such hooks (implemented as scripts, Chef recipes or Puppet
> facts) could be defined in the same way as basic lifecycle hooks. They
> could be triggered by doing property updates on the respective
> SoftwareDeployment resources (just a thought and to be discussed during
> design sessions).
> I think this item could help bridging over to some discussions raised by
> the Murano team recently (my interpretation: being able to trigger actions
> from workflows). It would add a small feature on top of the current
> software orchestration in Heat and keep definitions in one place. And it
> would allow triggering by something or somebody else (e.g. a workflow)
> probably using existing APIs.
>
I'm a bit skeptical about anything to add to Heat that doesn't actually
change the end-goal as expressed by the user. What I mean is, backups
could certainly mean defining new volumes or swift containers and telling
servers to put data in them, but it isn't clear that it doesn't also
just mean copying all your data to a backup server already defined.
The distinction is that if we are careful to always defer to workflow
when workflow is required, then both cases have very clear points where
you change the workflow (the part that copies the data) or where you
change the orchestration (the part that manages the resources). And in
both cases, you always trigger the workflow to start a backup, not the
orchestration.
So, I'm interested to see where this goes, but I would like to make sure
that Heat stays out of the workflow business and instead continues to
provide clear integration points for workflow.
>
> #3 address known limitations of Heat software orchestration
> As of today, there already are a couple of known limitations or points where
> we have seen the need for additional discussion and design work. Below is a
> collection of such issues.
> Maybe some are already being worked on; others need more discussion.
>
> #3.1 software deployment should run just once:
> A bug has been raised because with today's implementation it can happen
> that SoftwareDeployments get executed multiple times. There has been some
> discussion around this issue but no final conclusion. An average user will
> however assume that his automation gets run only or exactly once. When
> using existing scripts, it would be an additional burden to require
> rewrites to cope with multiple invocations. Therefore, we should have a
> generic solution to the problem so that users do not have to deal with this
> complex problem.
>
This is entirely solvable by in-instance tools. Heat doesn't need to do
anything to support this. And if we spend time on it, we'll just get in
the way half the time. I mean, do we really need an API to avoid flag
files?

  [ -e /var/lib/mystuff/already.ran ] && exit 0
  # code here
  touch /var/lib/mystuff/already.ran

For more sophisticated idempotency, I suggest using one of the more
sophisticated tools, such as puppet, chef, salt, cfengine, etc.
However, if Heat allows a user to say "run this once" we have now
taken responsibility for lifecycle management and we have to ask
"once per what?". Once per filesystem lifecycle? Once per instance ID
lifecycle? Once per whole stack lifecycle?
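
To make the "once per what?" question concrete, here is a sketch of the flag-file approach with the scope made explicit. Nothing here is provided by Heat: the function name run_once, the state directory argument, and the idea of passing an instance ID or stack ID as the scope key are assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: run a command at most once per "scope".
# The scope key is whatever the user decides "once" means: an instance
# ID, a stack ID, or a fixed string for once-per-filesystem. It is used
# in a file name, so it should not contain slashes.

run_once() {
    local state_dir="$1" scope="$2"
    shift 2
    local flag="$state_dir/already.ran.$scope"

    # Flag present: this scope has already been handled.
    [ -e "$flag" ] && return 0

    mkdir -p "$state_dir"
    # Record the flag only on success, so a failed run is retried
    # the next time this is invoked.
    "$@" && touch "$flag"
}
```

For example, `run_once /var/lib/mystuff "$instance_id" ./create_hook.sh` would give once-per-instance-ID semantics, where `$instance_id` is a placeholder for however the instance learns its own ID. Swapping in a different key changes what "once" means without Heat being involved at all.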
> #3.2 dependency on heat-cfn-api:
> Some parts of current signaling still depend on the heat-cfn-api. While
> work seems underway to completely move to Heat native signaling, some
> cleanup is needed to make sure this is used throughout the code.
>
AFAIK this is done.
> #3.3 connectivity of instances to heat engine API:
> The current metadata and signaling framework has certain dependencies on
> connectivity from VMs to the Heat engine API. With some network setups, and
> in some customer environments we hit limitations of access from VMs to the
> management server. What can be done to enable additional network setups?
>
I think our friends in Marconi, Sahara and Trove will be interested in
this discussion. I have some ideas, but no time to drive them, and that
is a bit out of scope for this email.
> #3.4 number of created keystone users for deployments:
> It has been pointed out that a large number of keystone users get created
> for deployment and concerns have been raised that this could be a problem
> for large deployments.
>
Keystone doesn't currently have a way to down-scope a token. So if you
want to limit an instance to queries for a single resource in a template,
you currently need a user. I think the answer may be OAuth, but frankly,
this just means lots of OAuth tokens instead of lots of users. I'm not
sure that actually buys us any performance.
> #3.5 support of server groups:
> What would a clean model look like where software configs get deployed on
> server groups instead of single servers? What is the recommended modeling
> and semantics?
>
My thinking on this is that we need to make sure autoscaling groups are
resource groups, and that we can individually address the members of the
group, as well as the group as a whole.
> #3.6 handling of stack updates for software config:
> Stack updates are not cleanly supported with the initial software
> orchestration implementation. #1 above could address this issue, but do we
> have to do something in addition?
>
Sorry, I'm not sure I follow. Updates work fine today. What isn't clean
about them?
> #3.7 stack-abandon and stack-adopt for software-config:
> Issues have been found for stack-abandon and stack-adopt with software
> configs that need to be addressed. Can this be handled by additional hooks
> as outlined under #1?
>
As defined, abandon and adopt are supposed to be "hands-off" operations.
They're for mucking with Heat's internal model when it is inconsistent
with reality. So I'm not sure we want to add _anything_ to the template
language that acknowledges they even exist, or they'll be denied their
super powers that allow them to operate in the shadows.