[Openstack-sigs] [self-healing] Dublin PTG summary

Adam Spiers aspiers at suse.com
Wed Mar 14 15:49:31 UTC 2018


Hi all,

Thanks to everyone who made it to the self-healing SIG session at the 
Dublin PTG!  We had around 30 people in the end, some really 
productive conversations, and even a group photo!  Although some 
people preferred to stay in the nice warm meeting room rather than 
risk the freezing temperatures of the "Beast from the East" ;-) 

   https://www.dropbox.com/sh/dtei3ovfi7z74vo/AAB3g-QiXB-TBvvZMltPtpcUa/Self%20Healing%20SIG?dl=0&preview=DSC_4333.JPG

You can see the notes from the session here: 

   https://etherpad.openstack.org/p/self-healing-ptg-rocky

but below is a summary of the highlights. 

Documentation of capabilities of existing projects
==================================================

    https://storyboard.openstack.org/#!/story/2001430

Prior to the PTG we started to collect information on all the existing 
integration points available between different projects: 

    https://etherpad.openstack.org/p/self-healing-project-integrations

I showed my initial attempts at visualising the relationships, but we 
concluded that trying to visualise all integrations on a single graph 
is probably too hard to do in a useful way, so it makes more sense to 
visualise and document architectures use case by use case. 
Nevertheless, having a complete list of integration points should 
serve as a useful component in future documentation. 

Documentation repository
========================

We agreed to start documenting use cases and specs in our self-healing 
sig repository: 

    https://git.openstack.org/cgit/openstack/self-healing-sig

Formerly this repository was empty and only existed so that we could 
have a StoryBoard project (this is currently a requirement due to the 
way that StoryBoard projects are set up in OpenStack), but we realised 
that it was a great place to collaborate on this information, which 
will pretty much always span a few projects, but not as many as 
OpenStack's traditional cross-project initiatives do. 

As a result I have populated it using the -specs cookiecutter 
template, but still need to draft a template for use cases and tidy up 
the index:

    https://storyboard.openstack.org/#!/story/2001628

Discussions on various use cases
================================

Some people are already keen to start work on use cases immediately, 
which is great news.  I don't remember all the details, but one 
example involved monitoring NICs on compute nodes, moving VMs away 
from any compute node seen to be in a sub-optimal state (IIRC an 
extension to Neutron's API might be needed to help with detecting 
this), and then potentially handing over to an operator for manual 
remediation after the automated self-healing phase. 
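
As a rough illustration only (this was not presented at the session), 
the decision flow in that example could be sketched in Python as 
below.  All names here (NodeReport, Action, plan_action, plan) are 
hypothetical, and the NIC counts stand in for whatever monitoring data 
would really be available; no real Nova or Neutron API is used. 

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of the automated self-healing phase."""
    NONE = "none"
    MIGRATE_VMS = "migrate_vms"
    ESCALATE_TO_OPERATOR = "escalate_to_operator"


@dataclass
class NodeReport:
    """Hypothetical snapshot of one compute node (not an OpenStack type)."""
    host: str
    nics_up: int
    nics_total: int
    vm_count: int


def plan_action(report: NodeReport) -> Action:
    """Pick the next step for a node based on its NIC health."""
    if report.nics_up >= report.nics_total:
        # All NICs are up, so there is nothing to heal.
        return Action.NONE
    if report.vm_count > 0:
        # Sub-optimal node still hosting VMs: move them away first.
        return Action.MIGRATE_VMS
    # Sub-optimal node already drained: hand over for manual remediation.
    return Action.ESCALATE_TO_OPERATOR


def plan(reports):
    """Map each host name to the action chosen for it."""
    return {r.host: plan_action(r) for r in reports}


if __name__ == "__main__":
    reports = [
        NodeReport("compute-1", nics_up=2, nics_total=2, vm_count=5),
        NodeReport("compute-2", nics_up=1, nics_total=2, vm_count=3),
        NodeReport("compute-3", nics_up=1, nics_total=2, vm_count=0),
    ]
    for host, action in plan(reports).items():
        print(host, action.value)
```

The point of the sketch is just the ordering: detect, move VMs away 
automatically, and only then involve an operator.  How "sub-optimal" 
is actually detected is exactly the open question mentioned above. 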

I touched on this briefly during the presentation I gave on the 
self-healing SIG to the London OpenStack Meetup on Monday night: 

    https://aspiers.github.io/openstack-meetup-london-march-2018-self-healing/#/use-case-1

Stakeholders contact list
=========================

One challenge we might face when working on self-healing use cases 
involving multiple projects is simply getting stuck with an issue 
relating to a specific project and not knowing the best person to ask 
for help.  Of course it is always possible (and indeed advisable) to 
ask on IRC / mailing lists, but we decided that it would be helpful to 
know in advance the names of individuals in each project who have 
already declared an interest in self-healing and volunteered to help 
out.  So I built this etherpad: 

    https://etherpad.openstack.org/p/self-healing-contacts

Please consider signing up to help answer questions relating to your 
area of expertise! 

Health-check API
================

    https://storyboard.openstack.org/#!/story/2001439

We had a discussion with members of the API SIG about the new 
health-check API initiative:

    https://review.openstack.org/#/c/531456/

There seemed to be consensus that the API SIG would drive the 
implementation of the health-check API, whereas the self-healing SIG 
would own the subsequent work to determine and document the various 
use cases for effectively consuming the API.  So the self-healing SIG 
would be the "customer" of the API SIG, which seems to make sense as a 
nice way of structuring the work organisationally. 

Automated testing
=================

There was broad agreement that any implementations of self-healing use 
cases (including standard HA functionality) should have corresponding 
automated tests.  OPNFV has already done a lot of good work in this 
area, and there were a few folks from the OPNFV world present, so 
hopefully the connections established will lead to collaboration and 
reuse of existing work in the future. 

Since the PTG we have had a few more people express a desire to join 
this initiative, which is great news: 

    http://lists.openstack.org/pipermail/openstack-dev/2018-March/128020.html
    https://etherpad.openstack.org/p/extreme-testing-contacts

Operator feedback
=================

It's *crucial* that we hear feedback from operators on their 
real-world experiences and opinions about how self-healing could help 
them, since ultimately making operators' lives easier is the primary 
goal of the SIG. 

Unfortunately there were no operators present in Dublin, but since 
then we have managed to collect some feedback from the Tokyo Ops 
Meetup and London OpenStack Meetup, and hopefully the Vancouver Forum 
will be an ideal opportunity to hear more. 

If you are an operator reading this, please don't be shy!  Let us know 
what problems you would like to see the self-healing SIG focus on! 

On that note, I'll end with a final plea: 

Please join us!
===============

All participation is very welcome, no matter how small.  At this stage 
we're especially interested in hearing about people's experiences, 
opinions and ideas: 

  - Are you already implementing any self-healing functionality in
    your OpenStack clouds?  If so, how are you doing it, and what's
    working and not working?

  - What self-healing functionality do you miss and would like to
    see implemented in the future?

You can find us on: 

  - #openstack-self-healing on Freenode IRC

  - this openstack-sigs at lists.openstack.org mailing list
    (please use the [self-healing] tag)
