[Openstack-sigs] [meta] Proposal for self-healing SIG

Bogdan Dobrelya bdobreli at redhat.com
Wed Sep 20 09:05:13 UTC 2017

On 19.09.2017 22:53, Adam Spiers wrote:
> Hi Andrea,
> Andrea Frittoli <andrea.frittoli at gmail.com> wrote:
>> On Sun, Sep 17, 2017 at 11:34 PM Adam Spiers <aspiers at suse.com> wrote:
>>> [TL;DR: we want to set up a "self-healing infrastructure" SIG.]
>> Nice!
>>> One of the biggest promises of the cloud vision was the idea that all
>>> the infrastructure could be managed in a policy-driven fashion,
>>> reacting to failures and other events by automatically healing and
>>> optimising services.  Most of the components required to implement
>>> such an architecture already exist, e.g.
> [snipped]
>>> However, there is not yet a clear strategy within the community for
>>> how these should all tie together.
>>> So at the PTG last week in Denver, we held an initial cross-project
>>> meeting to discuss this topic.[0]  It was well-attended, with
>>> representation from almost all of the relevant projects, and it felt
>>> like a very productive session to me.  I shall do my best to summarise
>>> whilst trying to avoid any misrepresentation ...
>> I'm sorry that I missed the session at the PTG :)
> Sorry that I didn't think to invite QA ;-)
>> Do you have any plan / idea yet about what verification might look
>> like for the integration between all the projects in your list and
>> for self-healing specific scenarios?
> That's a great question!  But honestly - not yet.  It's a little early
> for that, as we've hardly started identifying the various use cases
> yet - that would be one of the first steps after formally establishing
> the SIG.

IMO, we should leverage existing specialized frameworks, like Jepsen
[0]. Its Nemesis component supports *many* failure modes. The example
test scenarios and associated publications cover many existing database
and messaging solutions. The community could contribute there as well,
for example by adding RabbitMQ RPC test cases to the repo (or at least
to a fork), but I think the Nemesis "disruptor" would benefit us the
most.

[0] https://github.com/jepsen-io/jepsen
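
To make the "disruptor" idea a bit more concrete, here is a rough
Python sketch of the general pattern: inject a fault, observe the
system, then heal. This is only an illustration under stated
assumptions and is NOT Jepsen's API (Jepsen itself is written in
Clojure). FakeCluster, disrupt and run_scenario are hypothetical names
invented for this example, and the "majority of nodes up" health rule
is an assumed toy criterion.

```python
import random
from contextlib import contextmanager


class FakeCluster:
    """Toy stand-in for a set of service nodes (hypothetical, not a real API)."""

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}

    def stop(self, node):
        self.up[node] = False

    def start(self, node):
        self.up[node] = True

    def healthy(self):
        # Assumed toy rule: available while a strict majority of nodes is up.
        return sum(self.up.values()) > len(self.up) // 2


@contextmanager
def disrupt(cluster, victims):
    """Inject a fault by stopping the victims; always heal them on exit."""
    for node in victims:
        cluster.stop(node)
    try:
        yield victims
    finally:
        for node in victims:
            cluster.start(node)


def run_scenario(nodes, n_victims, seed=0):
    """Return (healthy_during_fault, healthy_after_heal) for one injection."""
    rng = random.Random(seed)
    cluster = FakeCluster(nodes)
    victims = rng.sample(list(nodes), n_victims)
    with disrupt(cluster, victims):
        during = cluster.healthy()
    return during, cluster.healthy()
```

A real test would replace FakeCluster with calls that actually stop
services or partition the network, and replace healthy() with checks
against the deployed cloud.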

>> During the QA sessions at the PTG we discussed HA / fault tolerance
>> testing. There is a proposal for a community framework for that, however
>> we have no plan yet about where to run / how to maintain such tests for
>> OpenStack.
> I'm guessing you are referring to this?
>    https://review.openstack.org/#/c/443504/
>> It might be a fitting use case for this rising SIG.
> Yes, indeed!  I hope you can join us and help with automating testing
> when it's needed :-)

Best regards,
Bogdan Dobrelya,
Irc #bogdando
