[Openstack-sigs] [meta] Proposal for self-healing SIG
Andrea Frittoli
andrea.frittoli at gmail.com
Tue Sep 19 18:21:52 UTC 2017
On Sun, Sep 17, 2017 at 11:34 PM Adam Spiers <aspiers at suse.com> wrote:
> Hi all,
>
> [TL;DR: we want to set up a "self-healing infrastructure" SIG.]
>
Nice!
>
> One of the biggest promises of the cloud vision was the idea that all
> the infrastructure could be managed in a policy-driven fashion,
> reacting to failures and other events by automatically healing and
> optimising services. Most of the components required to implement
> such an architecture already exist, e.g.
>
> - Monasca: Monitoring
> - Aodh: Alarming
> - Congress: Policy-based governance
> - Mistral: Workflow
> - Senlin: Clustering
> - Vitrage: Root Cause Analysis
> - Watcher: Optimization
> - Masakari: Compute plane HA
> - Freezer-dr: DR and compute plane HA
>
> However, there is not yet a clear strategy within the community for
> how these should all tie together.
>
> So at the PTG last week in Denver, we held an initial cross-project
> meeting to discuss this topic.[0] It was well-attended, with
> representation from almost all of the relevant projects, and it felt
> like a very productive session to me. I shall do my best to summarise
> whilst trying to avoid any misrepresentation ...
>
I'm sorry that I missed the session at the PTG :)
Do you have any plans or ideas yet about what verification might look like
for the integration between all the projects in your list, and for
self-healing-specific scenarios?
During the QA sessions at the PTG we discussed HA / fault tolerance
testing. There is a proposal for a community framework for that, however
we have no plan yet about where to run / how to maintain such tests for
OpenStack. It might be a fitting use case for this rising SIG.
Andrea Frittoli (andreaf)
>
> There was general agreement that the following actions would be
> worthwhile:
>
> - Document reference stacks describing what use cases can already be
> addressed with the existing projects. (Even better if some of
> these stacks have already been tested in the wild.)
>
> - Document what integrations between the projects already exist at a
> technical level. (We actually began this during the meeting, by
> placing the projects into phases of a high-level flow, and then
> collaboratively building a Google Drawing to show that.[1])
>
> - Collect real-world use cases from operators, including ones which
> they would like to accomplish but cannot yet.
>
> - From the above, perform gap analysis to help shape the future
> direction of these projects, e.g. through specs targeting those
> gaps.
>
> - Perform overlap analysis to help ensure that the projects are
> correctly scoped and integrate well without duplicating any
> significant effort.[2]
>
> - Set up a SIG[3] to promote further discussion across the projects
> and with operators. I talked to Thierry afterwards, and
> consequently this email is the first step on that path :-)
>
> - Allocate the SIG a mailing list prefix - "[self-healing]" or
> similar.
>
> - Set up a bi-weekly IRC meeting for the SIG.
>
> - Continue the discussion at the Sydney Forum, since it's an ideal
> opportunity to get developers and operators together and decide
> what the next steps should be.
>
> - Continue the discussion at the next Ops meetup in Tokyo.
>
> I got coerced^Wvolunteered to drive the next steps ;-) So far I
> have created an etherpad proposing the Forum session[4], and added it
> to the Forum wiki page[5]. I'll also add it to the SIG wiki page[6].
>
> There were things we did not reach a concrete conclusion on:
>
> - What should the SIG be called? We felt that "self-healing" was
> pretty darn close to capturing the intent of the topic. However
> as a natural pedant, I couldn't help but notice that technically
> speaking, that would most undesirably exclude Watcher, because the
> optimization it provides isn't *quite* "healing" - the word
> "healing" implies that something is sick, and optimization can be
> applied even when the cloud is perfectly healthy. Any suggestions
> for a name with a marginally wider scope would be gratefully
> received.
>
> - Should the SIG be scoped to only focus on self-healing (and
> self-optimization) of OpenStack infrastructure, or should it also
> include self-healing of workloads? My feeling is that we should
> keep it scoped to the infrastructure which falls under the
> responsibility of the cloud operators; anything user-facing would
> be very different from a process perspective.
>
> - How should the SIG's governance be set up? Unfortunately it
> didn't occur to me to raise this question during the discussion,
> but I've since seen that the k8s SIG managed to make some
> decisions in this regard[7], and stealing their idea of a PTL-type
> model with a minimum of 2 chairs sounds good to me.
>
> - Which timezone should the IRC meeting be in? As usual, there were
> interested parties from all the usual continents, so no one time
> would suit everyone. I guess I can just submit a review to the
> irc-meetings repo and we can have a voting war in Gerrit ;-/
> Another option would be to alternate timezones every week or two.
>
> Feedback on any of this is of course most welcome! After sending
> this, I'll forward it to openstack-{dev,operators} and ask for any
> feedback to be submitted here.
>
> Thanks,
> Adam
>
>
> [0] https://etherpad.openstack.org/p/self-healing-queens-ptg
>
> [1] https://goo.gl/Pf2KgJ
>
> [2] Sampath (Masakari PTL), Saad (Freezer PTL), and I had a productive
> follow-up discussion on how we could aim to re-scope these two
> projects to avoid unnecessary duplication of effort.
>
> [3] https://ttx.re/introducing-sigs.html
>
> [4] https://etherpad.openstack.org/p/self-healing-rocky-forum
>
> [5] https://wiki.openstack.org/wiki/Forum/Sydney2017
>
> [6] https://wiki.openstack.org/wiki/OpenStack_SIGs
>
> [7] https://etherpad.openstack.org/p/queens-ptg-sig-k8s
>