[Openstack-sigs] [meta] Proposal for self-healing SIG

Adam Spiers aspiers at suse.com
Wed Oct 4 15:38:09 UTC 2017

So far we've received 10 replies to the Google Form survey regarding 
creation of a self-healing SIG.  All 10 respondents support the 
creation of the SIG, so I think that's a clear mandate to move 
forwards with this. 

However, other aspects were not so clear ...  6 people thought the SIG's
scope should be kept just to cloud infrastructure, vs. 3 who thought
it should also include end users and cloud applications.  With such a
small sample size, that's not a huge majority ;-)

Even more tricky is the issue of what to name the SIG.  Only 4 people 
thought that the "self-healing" name is OK; 3 had other ideas for the 
name, and 3 more wanted another name but couldn't think of anything 
better.  (In fact there would have been 4 in the last group if I'd
allowed myself to vote.)

I see no hurry to close the poll, so we can just leave it open and see 
if any brainwaves appear :-) 

That said, I'd prefer to avoid blocking progress on getting the SIG 
off the ground simply due to the lack of clear answers to these two 
questions.  Even if we don't manage to reach consensus on the name and 
scope before Sydney, hopefully it's something we could figure out in 
Sydney.  And it should be easy to rename the SIG on the wiki; 
presumably likewise with a Forum session.  The only things which would 
be a bit trickier to rename would be an IRC channel and IRC meeting, 
so maybe we should hold off on creating those at least. 

Any other thoughts or suggestions at this point?  Thanks!

Adam Spiers <aspiers at suse.com> wrote: 
>Hi there,
>
>If you are receiving this via Bcc, then either you have already
>shown interest in the idea of a new self-healing SIG proposed below
>(in which case thank you!), or I have reason to believe that you
>might be :-)
>
>Or maybe you're just subscribed to this SIG list.
>
>Regardless of how it arrived in your inbox, it would be extremely
>helpful if you could spend a couple of minutes giving your opinion
>on a few simple questions so that we can decide how best to set up
>the SIG (if at all):
>
>  https://goo.gl/forms/UBzBoHtOaD9CKsi73
>
>Of course feedback by email is also welcome.
>
>Thanks a lot!
>
>Adam Spiers <aspiers at suse.com> wrote:
>>Hi all,
>>
>>[TL;DR: we want to set up a "self-healing infrastructure" SIG.]
>>
>>One of the biggest promises of the cloud vision was the idea that all
>>the infrastructure could be managed in a policy-driven fashion,
>>reacting to failures and other events by automatically healing and
>>optimising services.  Most of the components required to implement
>>such an architecture already exist, e.g.
>> - Monasca: Monitoring
>> - Aodh: Alarming
>> - Congress: Policy-based governance
>> - Mistral: Workflow
>> - Senlin: Clustering
>> - Vitrage: Root Cause Analysis
>> - Watcher: Optimization
>> - Masakari: Compute plane HA
>> - Freezer-dr: DR and compute plane HA
>>However, there is not yet a clear strategy within the community for
>>how these should all tie together.
>>So at the PTG last week in Denver, we held an initial cross-project
>>meeting to discuss this topic.[0]  It was well-attended, with
>>representation from almost all of the relevant projects, and it felt
>>like a very productive session to me.  I shall do my best to summarise
>>whilst trying to avoid any misrepresentation ...
>>There was general agreement that the following actions would be
>>worthwhile:
>> - Document reference stacks describing what use cases can already be
>>   addressed with the existing projects.  (Even better if some of
>>   these stacks have already been tested in the wild.)
>> - Document what integrations between the projects already exist at a
>>   technical level.  (We actually began this during the meeting, by
>>   placing the projects into phases of a high-level flow, and then
>>   collaboratively building a Google Drawing to show that.[1])
>> - Collect real-world use cases from operators, including ones which
>>   they would like to accomplish but cannot yet.
>> - From the above, perform gap analysis to help shape the future
>>   direction of these projects, e.g. through specs targeting those
>>   gaps.
>> - Perform overlap analysis to help ensure that the projects are
>>   correctly scoped and integrate well without duplicating any
>>   significant effort.[2]
>> - Set up a SIG[3] to promote further discussion across the projects
>>   and with operators.  I talked to Thierry afterwards, and
>>   consequently this email is the first step on that path :-)
>> - Allocate the SIG a mailing list prefix - "[self-healing]" or
>>   similar.
>> - Set up a bi-weekly IRC meeting for the SIG.
>> - Continue the discussion at the Sydney Forum, since it's an ideal
>>   opportunity to get developers and operators together and decide
>>   what the next steps should be.
>> - Continue the discussion at the next Ops meetup in Tokyo.
>>I got coerced^Wvolunteered to drive the next steps ;-)  So far I
>>have created an etherpad proposing the Forum session[4], and added it
>>to the Forum wiki page[5].  I'll also add it to the SIG wiki page[6].
>>There were things we did not reach a concrete conclusion on:
>> - What should the SIG be called?  We felt that "self-healing" was
>>   pretty darn close to capturing the intent of the topic.  However
>>   as a natural pedant, I couldn't help but notice that technically
>>   speaking, that would most undesirably exclude Watcher, because the
>>   optimization it provides isn't *quite* "healing" - the word
>>   "healing" implies that something is sick, and optimization can be
>>   applied even when the cloud is perfectly healthy.  Any suggestions
>>   for a name with a marginally wider scope would be gratefully
>>   received.
>> - Should the SIG be scoped to only focus on self-healing (and
>>   self-optimization) of OpenStack infrastructure, or should it also
>>   include self-healing of workloads?  My feeling is that we should
>>   keep it scoped to the infrastructure which falls under the
>>   responsibility of the cloud operators; anything user-facing would
>>   be very different from a process perspective.
>> - How should the SIG's governance be set up?  Unfortunately it
>>   didn't occur to me to raise this question during the discussion,
>>   but I've since seen that the k8s SIG managed to make some
>>   decisions in this regard[7], and stealing their idea of a PTL-type
>>   model with a minimum of 2 chairs sounds good to me.
>> - Which timezone should the IRC meeting be in?  As usual, there were
>>   interested parties from all the usual continents, so no one time
>>   would suit everyone.  I guess I can just submit a review to the
>>   irc-meetings repo and we can have a voting war in Gerrit ;-/
>>   Another option would be to alternate timezones every week or two.
>>Feedback on any of this is of course most welcome!  After sending
>>this, I'll forward it to openstack-{dev,operators} and ask for any
>>feedback to be submitted here.
>> [0] https://etherpad.openstack.org/p/self-healing-queens-ptg
>> [1] https://goo.gl/Pf2KgJ
>> [2] Sampath (Masakari PTL), Saad (Freezer PTL), and I had a productive
>>     follow-up discussion on how we could aim to re-scope these two
>>     projects to avoid unnecessary duplication of effort.
>> [3] https://ttx.re/introducing-sigs.html
>> [4] https://etherpad.openstack.org/p/self-healing-rocky-forum
>> [5] https://wiki.openstack.org/wiki/Forum/Sydney2017
>> [6] https://wiki.openstack.org/wiki/OpenStack_SIGs
>> [7] https://etherpad.openstack.org/p/queens-ptg-sig-k8s
>>Openstack-sigs mailing list
>>Openstack-sigs at lists.openstack.org
