[Openstack-sigs] [self-healing] Dublin PTG summary
Tim Bell
Tim.Bell at cern.ch
Wed Mar 14 17:33:15 UTC 2018
Adam,
Happy to come along in Vancouver as soon as you have a slot. Thanks for all the work that's going into this SIG.
Tim
-----Original Message-----
From: Adam Spiers <aspiers at suse.com>
Reply-To: "openstack-sigs at lists.openstack.org" <openstack-sigs at lists.openstack.org>
Date: Wednesday, 14 March 2018 at 16:50
To: OpenStack SIGs list <openstack-sigs at lists.openstack.org>
Subject: [Openstack-sigs] [self-healing] Dublin PTG summary
Hi all,
Thanks to everyone who made it to the self-healing SIG session at the
Dublin PTG! We had around 30 people in the end, some really
productive conversations, and even a group photo! Although some
people preferred to stay in the nice warm meeting room rather than
risk the freezing temperatures of the "Beast from the East" ;-)
https://www.dropbox.com/sh/dtei3ovfi7z74vo/AAB3g-QiXB-TBvvZMltPtpcUa/Self%20Healing%20SIG?dl=0&preview=DSC_4333.JPG
You can see the notes from the session here:
https://etherpad.openstack.org/p/self-healing-ptg-rocky
but below is a summary of the highlights.
Documentation of capabilities of existing projects
==================================================
https://storyboard.openstack.org/#!/story/2001430
Prior to the PTG we started to collect information on all the existing
integration points available between different projects:
https://etherpad.openstack.org/p/self-healing-project-integrations
I showed my initial attempts at visualising the relationships, but we
concluded that trying to visualise all integrations on a single graph
is probably too hard to do in a useful way, so it makes more sense to
visualise and document architectures use case by use case.
Nevertheless, having a complete list of integration points should
serve as a useful component in future documentation.
Documentation repository
========================
We agreed to start documenting use cases and specs in our self-healing
sig repository:
https://git.openstack.org/cgit/openstack/self-healing-sig
Formerly this repository was empty and only existed so that we could
have a StoryBoard project (this is currently a requirement due to the
way that StoryBoard projects are set up in OpenStack), but we realised
that it was a great place to collaborate on this information which
will pretty much always span a few projects, but not as many as
OpenStack's traditional cross-project initiatives do.
As a result I have populated it using the -specs cookiecutter
template, but still need to draft a template for use cases and tidy up
the index:
https://storyboard.openstack.org/#!/story/2001628
Discussions on various use cases
================================
Some people are already keen to start work on use cases immediately,
which is great news. I don't remember all the details, but one
example involved monitoring NICs on compute nodes, moving VMs away
from any compute node seen to be in a sub-optimal state (IIRC an
extension to Neutron's API might be needed to help with detecting
this), and then potentially handing over to an operator for manual
remediation after the automated self-healing phase.
I touched on this briefly during the presentation I gave on the
self-healing SIG to the London OpenStack Meetup on Monday night:
https://aspiers.github.io/openstack-meetup-london-march-2018-self-healing/#/use-case-1
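The NIC-monitoring example above can be sketched roughly as follows.
This is only an illustration of the flow (detect a degraded node, move
VMs away, then hand over to an operator), not an agreed design: every
name here (ComputeNode, NicState, plan_remediation) is hypothetical, no
real OpenStack API is called, and the round-robin placement is just an
assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum


class NicState(Enum):
    """Hypothetical NIC health states as reported by some monitor."""
    HEALTHY = "healthy"
    DEGRADED = "degraded"


@dataclass
class ComputeNode:
    """Hypothetical view of a compute node and the VMs it hosts."""
    name: str
    nic_state: NicState
    vms: list = field(default_factory=list)


def plan_remediation(nodes):
    """Return (migrations, operator_tickets) for a list of compute nodes.

    migrations is a list of (vm, source_node, target_node) tuples;
    operator_tickets lists the nodes handed over for manual follow-up.
    """
    healthy = [n for n in nodes if n.nic_state is NicState.HEALTHY]
    migrations = []
    tickets = []
    for node in nodes:
        if node.nic_state is NicState.HEALTHY:
            continue
        # Automated phase: spread the VMs over healthy nodes (round-robin).
        # If no healthy node exists, the VMs stay where they are.
        for i, vm in enumerate(node.vms):
            if healthy:
                target = healthy[i % len(healthy)]
                migrations.append((vm, node.name, target.name))
        # Manual phase: always flag the degraded node for an operator.
        tickets.append(node.name)
    return migrations, tickets
```

In a real deployment the detection step and the migrations would of
course be driven by the relevant project APIs rather than in-memory
objects.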
Stakeholders contact list
=========================
One challenge we might face when working on self-healing use cases
involving multiple projects is simply getting stuck with an issue
relating to a specific project and not knowing the best person to ask
for help. Of course it is always possible (and indeed advisable) to
ask on IRC / mailing lists, but we decided that it would be helpful to
know in advance the names of individuals in each project who have
already declared an interest in self-healing and volunteered to help
out. So I built this etherpad:
https://etherpad.openstack.org/p/self-healing-contacts
Please consider signing up to help answer questions relating to your
area of expertise!
Health-check API
================
https://storyboard.openstack.org/#!/story/2001439
Discussion with members of the API SIG on the new health-check API
initiative:
https://review.openstack.org/#/c/531456/
There seemed to be consensus that the API SIG would drive the
implementation of the health-check API, whereas the self-healing SIG
would own the subsequent work to determine and document the various
use cases for effectively consuming the API. So the self-healing SIG
would be the "customer" of the API SIG, which seems to make sense as a
nice way of structuring the work organisationally.
Automated testing
=================
There was broad agreement that any implementations of self-healing use
cases (including standard HA functionality) should have corresponding
automated tests. OPNFV has already done a lot of good work in this
area, and there were a few folks from the OPNFV world present, so
hopefully the connections established will lead to collaboration and
reuse of existing work in the future.
Since the PTG we have had a few more people express a desire to join
this initiative, which is great news:
http://lists.openstack.org/pipermail/openstack-dev/2018-March/128020.html
https://etherpad.openstack.org/p/extreme-testing-contacts
Operator feedback
=================
It's *crucial* that we hear feedback from operators on their
real-world experiences and opinions about how self-healing could help
them, since ultimately making operators' lives easier is the primary
goal of the SIG.
Unfortunately there were no operators present in Dublin, but since
then we have managed to collect some feedback from the Tokyo Ops
Meetup and London OpenStack Meetup, and hopefully the Vancouver Forum
will be an ideal opportunity to hear more.
If you are an operator reading this, please don't be shy! Let us know
what problems you would like to see the self-healing SIG focus on!
On that note, I'll end with a final plea:
Please join us!
===============
All participation is very welcome, no matter how small. At this stage
we're especially interested in hearing about people's experiences,
opinions and ideas:
- Are you already implementing any self-healing functionality in
  your OpenStack clouds? If so, how are you doing it, and what's
  working and not working?
- What self-healing functionality do you miss and would like to
  see implemented in the future?
You can find us on:
- #openstack-self-healing on Freenode IRC
- this openstack-sigs at lists.openstack.org mailing list
  (please use the [self-healing] tag)