[Openstack-sigs] [meta] Proposal for self-healing SIG

Andrea Frittoli andrea.frittoli at gmail.com
Tue Sep 19 18:21:52 UTC 2017

On Sun, Sep 17, 2017 at 11:34 PM Adam Spiers <aspiers at suse.com> wrote:

> Hi all,
> [TL;DR: we want to set up a "self-healing infrastructure" SIG.]

> One of the biggest promises of the cloud vision was the idea that all
> the infrastructure could be managed in a policy-driven fashion,
> reacting to failures and other events by automatically healing and
> optimising services.  Most of the components required to implement
> such an architecture already exist, e.g.
>   - Monasca: Monitoring
>   - Aodh: Alarming
>   - Congress: Policy-based governance
>   - Mistral: Workflow
>   - Senlin: Clustering
>   - Vitrage: Root Cause Analysis
>   - Watcher: Optimization
>   - Masakari: Compute plane HA
>   - Freezer-dr: DR and compute plane HA
> However, there is not yet a clear strategy within the community for
> how these should all tie together.
> So at the PTG last week in Denver, we held an initial cross-project
> meeting to discuss this topic.[0]  It was well-attended, with
> representation from almost all of the relevant projects, and it felt
> like a very productive session to me.  I shall do my best to summarise
> whilst trying to avoid any misrepresentation ...

I'm sorry that I missed the session at the PTG :)

Do you have any plan / idea yet about what verification might look like for
integration between all the projects in your list and for self-healing?

During the QA sessions at the PTG we discussed HA / fault tolerance
testing. There is a proposal for a community framework for that, however
we have no plan yet about where to run / how to maintain such tests for
OpenStack. It might be a fitting use case for this rising SIG.

Andrea Frittoli (andreaf)

> There was general agreement that the following actions would be
> worthwhile:
>   - Document reference stacks describing what use cases can already be
>     addressed with the existing projects.  (Even better if some of
>     these stacks have already been tested in the wild.)
>   - Document what integrations between the projects already exist at a
>     technical level.  (We actually began this during the meeting, by
>     placing the projects into phases of a high-level flow, and then
>     collaboratively building a Google Drawing to show that.[1])
>   - Collect real-world use cases from operators, including ones which
>     they would like to accomplish but cannot yet.
>   - From the above, perform gap analysis to help shape the future
>     direction of these projects, e.g. through specs targeting those
>     gaps.
>   - Perform overlap analysis to help ensure that the projects are
>     correctly scoped and integrate well without duplicating any
>     significant effort.[2]
>   - Set up a SIG[3] to promote further discussion across the projects
>     and with operators.  I talked to Thierry afterwards, and
>     consequently this email is the first step on that path :-)
>   - Allocate the SIG a mailing list prefix - "[self-healing]" or
>     similar.
>   - Set up a bi-weekly IRC meeting for the SIG.
>   - Continue the discussion at the Sydney Forum, since it's an ideal
>     opportunity to get developers and operators together and decide
>     what the next steps should be.
>   - Continue the discussion at the next Ops meetup in Tokyo.
> I got coerced^Wvolunteered to drive the next steps ;-)  So far I
> have created an etherpad proposing the Forum session[4], and added it
> to the Forum wiki page[5].  I'll also add it to the SIG wiki page[6].
> There were things we did not reach a concrete conclusion on:
>   - What should the SIG be called?  We felt that "self-healing" was
>     pretty darn close to capturing the intent of the topic.  However
>     as a natural pedant, I couldn't help but notice that technically
>     speaking, that would most undesirably exclude Watcher, because the
>     optimization it provides isn't *quite* "healing" - the word
>     "healing" implies that something is sick, and optimization can be
>     applied even when the cloud is perfectly healthy.  Any suggestions
>     for a name with a marginally wider scope would be gratefully
>     received.
>   - Should the SIG be scoped to only focus on self-healing (and
>     self-optimization) of OpenStack infrastructure, or should it also
>     include self-healing of workloads?  My feeling is that we should
>     keep it scoped to the infrastructure which falls under the
>     responsibility of the cloud operators; anything user-facing would
>     be very different from a process perspective.
>   - How should the SIG's governance be set up?  Unfortunately it
>     didn't occur to me to raise this question during the discussion,
>     but I've since seen that the k8s SIG managed to make some
>     decisions in this regard[7], and stealing their idea of a PTL-type
>     model with a minimum of 2 chairs sounds good to me.
>   - Which timezone should the IRC meeting be in?  As usual, there were
>     interested parties from all the usual continents, so no one time
>     would suit everyone.  I guess I can just submit a review to the
>     irc-meetings repo and we can have a voting war in Gerrit ;-/
>     Another option would be to alternate timezones every week or two.
> Feedback on any of this is of course most welcome!  After sending
> this, I'll forward it to openstack-{dev,operators} and ask for any
> feedback to be submitted here.
> Thanks,
> Adam
>   [0] https://etherpad.openstack.org/p/self-healing-queens-ptg
>   [1] https://goo.gl/Pf2KgJ
>   [2] Sampath (Masakari PTL), Saad (Freezer PTL), and I had a productive
>       follow-up discussion on how we could aim to re-scope these two
>       projects to avoid unnecessary duplication of effort.
>   [3] https://ttx.re/introducing-sigs.html
>   [4] https://etherpad.openstack.org/p/self-healing-rocky-forum
>   [5] https://wiki.openstack.org/wiki/Forum/Sydney2017
>   [6] https://wiki.openstack.org/wiki/OpenStack_SIGs
>   [7] https://etherpad.openstack.org/p/queens-ptg-sig-k8s