[openstack-dev] Cloud Reliability and Resilience for OpenStack (Fault Injection, Chaos Engineering, and Google SRE)

Ilya Shakhat ishakhat at mirantis.com
Thu Sep 1 08:30:53 UTC 2016


Hi Jorge,

Reliability testing automation is what the OpenStack Performance team
[1] is working on right now. The whole solution will consist of:
 * A fault injection library, os-failures [2], that provides an API for
injecting different kinds of faults into different OpenStack clouds. The
library is currently under active development; our primary goal is to
support Fuel-based clouds first and then DevStack-based ones.
 * Rally as an engine to run different types of test scenarios. As Boris
mentioned, the main feature is called "hooks": it allows executing
arbitrary code at predefined points of a scenario. In our case we will
have a plugin that uses the os-failures library.
 * A set of scenarios and their reports, which will go into the OpenStack
Performance Docs [3].
 * A Rally plugin for results processing. Basically, we are interested in
calculating the following metrics:
     * Error count - the number of errors that appeared during scenario
execution (e.g. number of failed requests)
     * Performance degradation - performance (e.g. operation duration)
after the failure compared against sample data collected before it
     * MTTR - how long it takes for all errors to disappear and how long
it takes for performance to return to normal
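
To make the three metrics above more concrete, here is a minimal Python
sketch of how they could be computed from per-iteration results. This is
only an illustration, not the actual Rally plugin: the Sample record, the
reliability_metrics function and the tolerance factor are all hypothetical
names, and it assumes there is at least one successful sample before the
fault to serve as the baseline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One scenario iteration (hypothetical record, not Rally's schema)."""
    timestamp: float  # seconds since scenario start
    duration: float   # operation duration in seconds
    error: bool       # True if the request failed


def reliability_metrics(samples: List[Sample], fault_time: float,
                        tolerance: float = 1.5) -> Dict[str, Optional[float]]:
    """Compute error count, degradation and MTTR around one injected fault.

    `tolerance` is an assumed threshold: a sample counts as "slow" when
    its duration exceeds tolerance * baseline.
    """
    before = [s for s in samples if s.timestamp < fault_time]
    after = [s for s in samples if s.timestamp >= fault_time]

    # Baseline: mean duration of successful operations before the fault.
    baseline = mean(s.duration for s in before if not s.error)

    # Error count: failed requests observed after the fault.
    error_count = sum(1 for s in after if s.error)

    # Degradation: ratio of mean successful duration after vs. before.
    ok_after = [s.duration for s in after if not s.error]
    degradation = mean(ok_after) / baseline if ok_after else None

    # MTTR (errors): time from the fault to the last observed error.
    error_times = [s.timestamp for s in after if s.error]
    mttr_errors = max(error_times) - fault_time if error_times else 0.0

    # MTTR (performance): time from the fault to the last failed or slow
    # sample, i.e. the point after which durations are back to normal.
    bad_times = [s.timestamp for s in after
                 if s.error or s.duration > tolerance * baseline]
    mttr_performance = max(bad_times) - fault_time if bad_times else 0.0

    return {
        "error_count": error_count,
        "degradation": degradation,
        "mttr_errors": mttr_errors,
        "mttr_performance": mttr_performance,
    }


if __name__ == "__main__":
    # Made-up data: fault injected at t=2.5, two failures, slow recovery.
    data = [
        Sample(0, 1.0, False), Sample(1, 1.0, False), Sample(2, 1.0, False),
        Sample(3, 0.0, True), Sample(4, 0.0, True),
        Sample(5, 3.0, False), Sample(6, 2.0, False),
        Sample(7, 1.0, False), Sample(8, 1.0, False),
    ]
    print(reliability_metrics(data, fault_time=2.5))
```

Here MTTR is approximated from the last bad sample, so its precision is
limited by how often the scenario issues requests.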

If you are interested in contributing, we have meetings on Tuesdays at
15:30 UTC in the #openstack-performance IRC channel.

Thanks,
Ilya

[1] https://wiki.openstack.org/wiki/Performance_Team
[2] https://github.com/openstack/os-failures
[3] http://docs.openstack.org/developer/performance-docs/


2016-09-01 10:33 GMT+03:00 Boris Pavlovic <bpavlovic at mirantis.com>:

> Hi Jorge,
>
> The Rally team is working on a feature called "Hooks".
> "Hooks" will allow using Rally to run workloads and inject arbitrary
> actions (including via existing Chaos frameworks).
>
> Here is the patch: https://review.openstack.org/#/c/352276/
> Here is the merged spec:
> https://github.com/openstack/rally/blob/master/doc/specs/in-progress/hook_section.rst
>
>
> You are very welcome to join this effort and help the Rally team deliver
> it faster.
>
> Thanks!
>
> Best regards,
> Boris Pavlovic
>
> On Wed, Aug 31, 2016 at 11:55 PM, Jorge Cardoso (Cloud Operations and
> Analytics, IT R&D Division) <Jorge.Cardoso at huawei.com> wrote:
>
>>
>>
>> Hi all,
>>
>>
>>
>> Is there any work being done on Reliability for OpenStack using e.g.
>> fault-injection, Chaos Engineering from Netflix, and Site Reliability
>> Engineering principles from Google?
>>
>>
>>
>> I only found this page in the documentation:
>> http://docs.openstack.org/developer/performance-docs/test_results/reliability/index.html#openstack-reliability-testing
>>
>>
>>
>> I am working on Cloud Reliability and Resilience and I would like to
>> explore this area for OpenStack.
>>
>> You can check some of my interests and work at:
>> http://jorge-cardoso.github.io/research/
>>
>>
>>
>> Any interest from you guys?
>>
>> Any suggestions on how to proceed?
>>
>>
>>
>>
>>
>> Best Regards,
>>
>> Jorge Cardoso
>>
>>
>>
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>