[Openstack-sigs] [meta] Proposal for self-healing SIG

Adam Spiers aspiers at suse.com
Wed Sep 20 13:35:57 UTC 2017

Hi there,

If you are receiving this via Bcc, then either you have already
shown interest in the idea of a new self-healing SIG proposed below
(in which case thank you!), or I have reason to believe that you
might be :-)

Or maybe you're just subscribed to this SIG list.

Regardless of how it arrived in your inbox, it would be extremely
helpful if you could spend a couple of minutes giving your opinion
on a few simple questions so that we can decide how best to set up
the SIG (if at all):


Of course feedback by email is also welcome.

Thanks a lot!

Adam Spiers <aspiers at suse.com> wrote:
>Hi all,
>[TL;DR: we want to set up a "self-healing infrastructure" SIG.]
>One of the biggest promises of the cloud vision was the idea that all
>the infrastructure could be managed in a policy-driven fashion,
>reacting to failures and other events by automatically healing and
>optimising services.  Most of the components required to implement
>such an architecture already exist, e.g.
>  - Monasca: Monitoring
>  - Aodh: Alarming
>  - Congress: Policy-based governance
>  - Mistral: Workflow
>  - Senlin: Clustering
>  - Vitrage: Root Cause Analysis
>  - Watcher: Optimization
>  - Masakari: Compute plane HA
>  - Freezer-dr: DR and compute plane HA
>However, there is not yet a clear strategy within the community for
>how these should all tie together.
>So at the PTG last week in Denver, we held an initial cross-project
>meeting to discuss this topic.[0]  It was well-attended, with
>representation from almost all of the relevant projects, and it felt
>like a very productive session to me.  I shall do my best to summarise
>whilst trying to avoid any misrepresentation ...
>There was general agreement that the following actions would be
>worthwhile:
>  - Document reference stacks describing what use cases can already be
>    addressed with the existing projects.  (Even better if some of
>    these stacks have already been tested in the wild.)
>  - Document what integrations between the projects already exist at a
>    technical level.  (We actually began this during the meeting, by
>    placing the projects into phases of a high-level flow, and then
>    collaboratively building a Google Drawing to show that.[1])
>  - Collect real-world use cases from operators, including ones which
>    they would like to accomplish but cannot yet.
>  - From the above, perform gap analysis to help shape the future
>    direction of these projects, e.g. through specs targeting those
>    gaps.
>  - Perform overlap analysis to help ensure that the projects are
>    correctly scoped and integrate well without duplicating any
>    significant effort.[2]
>  - Set up a SIG[3] to promote further discussion across the projects
>    and with operators.  I talked to Thierry afterwards, and
>    consequently this email is the first step on that path :-)
>  - Allocate the SIG a mailing list prefix - "[self-healing]" or
>    similar.
>  - Set up a bi-weekly IRC meeting for the SIG.
>  - Continue the discussion at the Sydney Forum, since it's an ideal
>    opportunity to get developers and operators together and decide
>    what the next steps should be.
>  - Continue the discussion at the next Ops meetup in Tokyo.
>I got coerced^Wvolunteered to drive the next steps ;-)  So far I
>have created an etherpad proposing the Forum session[4], and added it
>to the Forum wiki page[5].  I'll also add it to the SIG wiki page[6].
>There were things we did not reach a concrete conclusion on:
>  - What should the SIG be called?  We felt that "self-healing" was
>    pretty darn close to capturing the intent of the topic.  However
>    as a natural pedant, I couldn't help but notice that technically
>    speaking, that would most undesirably exclude Watcher, because the
>    optimization it provides isn't *quite* "healing" - the word
>    "healing" implies that something is sick, and optimization can be
>    applied even when the cloud is perfectly healthy.  Any suggestions
>    for a name with a marginally wider scope would be gratefully
>    received.
>  - Should the SIG be scoped to only focus on self-healing (and
>    self-optimization) of OpenStack infrastructure, or should it also
>    include self-healing of workloads?  My feeling is that we should
>    keep it scoped to the infrastructure which falls under the
>    responsibility of the cloud operators; anything user-facing would
>    be very different from a process perspective.
>  - How should the SIG's governance be set up?  Unfortunately it
>    didn't occur to me to raise this question during the discussion,
>    but I've since seen that the k8s SIG managed to make some
>    decisions in this regard[7], and stealing their idea of a PTL-type
>    model with a minimum of 2 chairs sounds good to me.
>  - Which timezone should the IRC meeting be in?  As usual, there were
>    interested parties from all the usual continents, so no one time
>    would suit everyone.  I guess I can just submit a review to the
>    irc-meetings repo and we can have a voting war in Gerrit ;-/
>    Another option would be to alternate timezones every week or two.
>Feedback on any of this is of course most welcome!  After sending
>this, I'll forward it to openstack-{dev,operators} and ask for any
>feedback to be submitted here.
>  [0] https://etherpad.openstack.org/p/self-healing-queens-ptg
>  [1] https://goo.gl/Pf2KgJ
>  [2] Sampath (Masakari PTL), Saad (Freezer PTL), and I had a productive
>      follow-up discussion on how we could aim to re-scope these two
>      projects to avoid unnecessary duplication of effort.
>  [3] https://ttx.re/introducing-sigs.html
>  [4] https://etherpad.openstack.org/p/self-healing-rocky-forum
>  [5] https://wiki.openstack.org/wiki/Forum/Sydney2017
>  [6] https://wiki.openstack.org/wiki/OpenStack_SIGs
>  [7] https://etherpad.openstack.org/p/queens-ptg-sig-k8s