[Openstack-sigs] [self-healing] Dublin PTG summary

Tim Bell Tim.Bell at cern.ch
Wed Mar 14 17:33:15 UTC 2018


Adam,

Happy to come along in Vancouver as soon as you have a slot. Thanks for all the work that's going into this SIG.

Tim

-----Original Message-----
From: Adam Spiers <aspiers at suse.com>
Reply-To: "openstack-sigs at lists.openstack.org" <openstack-sigs at lists.openstack.org>
Date: Wednesday, 14 March 2018 at 16:50
To: OpenStack SIGs list <openstack-sigs at lists.openstack.org>
Subject: [Openstack-sigs] [self-healing] Dublin PTG summary

    Hi all,
    
    Thanks to everyone who made it to the self-healing SIG session at the 
    Dublin PTG!  We had around 30 people in the end, some really 
    productive conversations, and even a group photo!  Although some 
    people preferred to stay in the nice warm meeting room rather than 
    risk the freezing temperatures of the "Beast from the East" ;-) 
    
       https://www.dropbox.com/sh/dtei3ovfi7z74vo/AAB3g-QiXB-TBvvZMltPtpcUa/Self%20Healing%20SIG?dl=0&preview=DSC_4333.JPG
    
    You can see the notes from the session here: 
    
       https://etherpad.openstack.org/p/self-healing-ptg-rocky
    
    but below is a summary of the highlights. 
    
    Documentation of capabilities of existing projects
    ==================================================
    
        https://storyboard.openstack.org/#!/story/2001430
    
    Prior to the PTG we started to collect information on all the existing 
    integration points available between different projects: 
    
        https://etherpad.openstack.org/p/self-healing-project-integrations
    
    I showed my initial attempts at visualising the relationships, but we 
    concluded that trying to visualise all integrations on a single graph 
    is probably too hard to do in a useful way, so it makes more 
    sense to visualise and document architectures use case by use case. 
    Nevertheless, having a complete list of integration points should 
    serve as a useful component in future documentation. 
    
    Documentation repository
    ========================
    
    We agreed to start documenting use cases and specs in our self-healing 
    sig repository: 
    
        https://git.openstack.org/cgit/openstack/self-healing-sig
    
    Formerly this repository was empty and only existed so that we could 
    have a StoryBoard project (this is currently a requirement due to the 
    way that StoryBoard projects are set up in OpenStack), but we realised 
    that it was a great place to collaborate on this information which 
    will pretty much always span a few projects, but not as many as 
    OpenStack's traditional cross-project initiatives do. 
    
    As a result I have populated it using the -specs cookiecutter 
    template, but still need to draft a template for use cases and tidy up 
    the index:
    
        https://storyboard.openstack.org/#!/story/2001628
    
    Discussions on various use cases
    ================================
    
    Some people are already keen to start work on use cases immediately, 
    which is great news.  I don't remember all the details, but one 
    example involved monitoring NICs on compute nodes, moving VMs away 
    from any compute node seen to be in a sub-optimal state (IIRC an 
    extension to Neutron's API might be needed to help with detecting 
    this), and then potentially handing over to an operator for manual 
    remediation after the automated self-healing phase. 
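    
    As a very rough illustration of that flow (detect, evacuate, then 
    hand over), here is a minimal sketch.  It assumes the NIC state has 
    already been detected somehow; every name in it (NicState, 
    ComputeNode, heal) is hypothetical, and it does not call any real 
    Nova or Neutron API: 

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class NicState(Enum):
    """Hypothetical NIC health states reported by some monitoring source."""
    HEALTHY = "healthy"
    DEGRADED = "degraded"


@dataclass
class ComputeNode:
    """Hypothetical in-memory stand-in for a compute node and its VMs."""
    name: str
    nic_state: NicState
    vms: List[str] = field(default_factory=list)


def heal(nodes: List[ComputeNode]) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Move VMs off degraded nodes, then list nodes for operator follow-up.

    Returns (migrations, escalations), where each migration is a
    (vm, source_node, target_node) tuple.
    """
    healthy = [n for n in nodes if n.nic_state is NicState.HEALTHY]
    migrations = []
    escalations = []
    for node in nodes:
        if node.nic_state is not NicState.DEGRADED:
            continue
        if healthy:
            # Automated phase: place each VM on the least-loaded healthy node.
            for vm in list(node.vms):
                target = min(healthy, key=lambda n: len(n.vms))
                node.vms.remove(vm)
                target.vms.append(vm)
                migrations.append((vm, node.name, target.name))
        # Manual phase: the degraded node itself still needs an operator.
        escalations.append(node.name)
    return migrations, escalations
```

    In a real deployment the detection and migration steps would be 
    delegated to the relevant OpenStack services; the sketch only shows 
    the order of the phases. 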
    
    I touched on this briefly during the presentation I gave on the 
    self-healing SIG to the London OpenStack Meetup on Monday night: 
    
        https://aspiers.github.io/openstack-meetup-london-march-2018-self-healing/#/use-case-1
    
    Stakeholders contact list
    =========================
    
    One challenge we might face when working on self-healing use cases 
    involving multiple projects is simply getting stuck with an issue 
    relating to a specific project and not knowing the best person to ask 
    for help.  Of course it is always possible (and indeed advisable) to 
    ask on IRC / mailing lists, but we decided that it would be helpful to 
    know in advance the names of individuals in each project who have 
    already declared an interest in self-healing and volunteered to help 
    out.  So I built this etherpad: 
    
        https://etherpad.openstack.org/p/self-healing-contacts
    
    Please consider signing up to help answer questions relating to your 
    area of expertise! 
    
    Health-check API
    ================
    
        https://storyboard.openstack.org/#!/story/2001439
    
    We discussed the new health-check API initiative with members of the 
    API SIG:
    
        https://review.openstack.org/#/c/531456/
    
    There seemed to be consensus that the API SIG would drive the 
    implementation of the health-check API, whereas the self-healing SIG 
    would own the subsequent work to determine and document the various 
    use cases for effectively consuming the API.  So the self-healing SIG 
    would be the "customer" of the API SIG, which seems to make sense as a 
    nice way of structuring the work organisationally. 
    
    Automated testing
    =================
    
    There was broad agreement that any implementations of self-healing use 
    cases (including standard HA functionality) should have corresponding 
    automated tests.  OPNFV has already done a lot of good work in this 
    area, and there were a few folks from the OPNFV world present, so 
    hopefully the connections established will lead to collaboration and 
    reuse of existing work in the future. 
    
    Since the PTG we have had a few more people express a desire to join 
    this initiative, which is great news: 
    
        http://lists.openstack.org/pipermail/openstack-dev/2018-March/128020.html
        https://etherpad.openstack.org/p/extreme-testing-contacts
    
    Operator feedback
    =================
    
    It's *crucial* that we hear feedback from operators on their 
    real-world experiences and opinions about how self-healing could help 
    them, since ultimately making operators' lives easier is the primary 
    goal of the SIG. 
    
    Unfortunately there were no operators present in Dublin, but since 
    then we have managed to collect some feedback from the Tokyo Ops 
    Meetup and London OpenStack Meetup, and hopefully the Vancouver Forum 
    will be an ideal opportunity to hear more. 
    
    If you are an operator reading this, please don't be shy!  Let us know 
    what problems you would like to see the self-healing SIG focus on! 
    
    On that note, I'll end with a final plea: 
    
    Please join us!
    ===============
    
    All participation is very welcome, no matter how small.  At this stage 
    we're especially interested in hearing about people's experiences, 
    opinions and ideas: 
    
      - Are you already implementing any self-healing functionality in
        your OpenStack clouds?  If so, how are you doing it, and what's
        working and not working?
    
      - What self-healing functionality do you miss and would like to
        see implemented in the future?
    
    You can find us on: 
    
      - #openstack-self-healing on Freenode IRC
    
      - this openstack-sigs at lists.openstack.org mailing list
        (please use the [self-healing] tag)
    
    


