[openstack-dev] [fuel] [HA] How long we need to wait for cloud recovery after some destructive scenarios?

Timur Nurlygayanov tnurlygayanov at mirantis.com
Wed Jun 3 20:05:18 UTC 2015


Anastasia, thank you!

On Wed, Jun 3, 2015 at 1:55 PM, Anastasia Urlapova <aurlapova at mirantis.com>
wrote:

> Timur,
> you can find some numbers and developers' recommendations at link [0]. It is
> our HA Guide; feel free to contribute.
>
> Nastya.
>
> [0]
> https://wiki.openstack.org/wiki/HAGuideImprovements/TOC#HA_Intro_and_Concepts
>
> On Wed, Jun 3, 2015 at 1:06 PM, Timur Nurlygayanov <
> tnurlygayanov at mirantis.com> wrote:
>
>> Looks like I forgot to add the link to [1] in the first email:
>>
>> [1] https://github.com/stackforge/haos
>>
>> On Wed, Jun 3, 2015 at 12:50 PM, Timur Nurlygayanov <
>> tnurlygayanov at mirantis.com> wrote:
>>
>>> Hi team,
>>>
>>> I'm working on automated HA / destructive / recovery tests [1] for
>>> OpenStack clouds, and I would like to collect expectations from users,
>>> operators and developers about how quickly OpenStack should recover after
>>> destructive actions.
>>> For example, how long should the cluster be unavailable if one of three
>>> controllers is destroyed? I think the right answer is '0 seconds,
>>> no downtime': users shouldn't see anything strange when we lose one
>>> controller in our cloud (if it is a 'true' HA configuration).
>>> In the real world, however, I see that the cloud needs some time to recover
>>> from such destructive scenarios (1-15 minutes, depending on the case), and I
>>> would like to hear your expectations or requirements.
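To make "downtime" something a test can measure, one possible approach (an assumption for illustration, not necessarily how HAOS does it) is to run a periodic health probe against a cloud API while the destructive action happens, and report the longest run of failed probes as the outage. This minimal sketch works on already-collected `(timestamp, ok)` samples; the function name `longest_outage` is hypothetical.

```python
from typing import List, Optional, Tuple


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Return the longest contiguous outage, in seconds.

    `samples` is a time-ordered list of (timestamp_seconds, ok) pairs from a
    periodic health probe (hypothetical input format). An outage starts at the
    first failed probe and ends at the next successful one; an outage that is
    still open at the end of the data is measured up to the last sample.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # first failed probe opens an outage
        elif ok and outage_start is not None:
            # first successful probe after failures closes the outage
            longest = max(longest, float(ts - outage_start))
            outage_start = None
    if outage_start is not None:
        # the cloud never recovered within the observed window
        longest = max(longest, float(samples[-1][0] - outage_start))
    return longest


if __name__ == "__main__":
    probes = [(0, True), (5, False), (10, False), (15, True), (20, False), (25, True)]
    print(longest_outage(probes))  # prints 10.0
```

The result is only as precise as the probe interval: the real outage may differ from the reported value by up to one interval, so a short interval matters when the target is close to zero.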
>>>
>>> How fast can / should we fully recover the cloud in the following cases:
>>> 1. Restart RabbitMQ services
>>> 2. Restart MySQL / Galera services
>>> 3. Restart Neutron services (like L3 agents)
>>> 4. Hard shutdown of any OpenStack controllers
>>> 5. Shutdown of the Ethernet interfaces on the management / data networks
>>>
>>> Of course, it depends on the configuration, but we can describe some
>>> common, 'expected' acceptance values (SLA) for downtime in different
>>> destructive cases and use them to verify clouds today and in the future.
>>> We will use these values in the HAOS project [1], which will allow us to
>>> validate any cloud with the same scenarios and the same recovery-time
>>> SLA.
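A minimal sketch of how agreed SLA values could be applied to measured downtimes, assuming one budget (in seconds) per scenario from the list above. The scenario keys, `SLA_SECONDS` and `check_sla` are hypothetical names; only the 0-second target for losing one controller reflects the expectation stated in this email, and the other budgets are left as `None` because no values have been agreed yet.

```python
from typing import Dict, Optional

# Hypothetical recovery-time budgets in seconds. Only the 0-second target for a
# controller loss comes from the email; None means "no agreed SLA yet".
SLA_SECONDS: Dict[str, Optional[float]] = {
    "restart_rabbitmq": None,
    "restart_galera": None,
    "restart_neutron_agents": None,
    "hard_shutdown_controller": 0.0,
    "down_mgmt_data_interfaces": None,
}


def check_sla(measured: Dict[str, float],
              sla: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Compare measured downtime (seconds) per scenario against its budget."""
    results: Dict[str, str] = {}
    for scenario, downtime in measured.items():
        budget = sla.get(scenario)  # unknown scenarios also have no budget
        if budget is None:
            results[scenario] = "NO_SLA"
        elif downtime <= budget:
            results[scenario] = "PASS"
        else:
            results[scenario] = "FAIL"
    return results


if __name__ == "__main__":
    print(check_sla({"hard_shutdown_controller": 42.0, "restart_rabbitmq": 30.0},
                    SLA_SECONDS))
    # prints {'hard_shutdown_controller': 'FAIL', 'restart_rabbitmq': 'NO_SLA'}
```

Keeping the budgets in one table, separate from the check, is what would let the same scenarios and the same thresholds be reused across different clouds.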
>>>
>>> Any comments are welcome :)
>>> Thank you!
>>>
>>> --
>>>
>>> Timur,
>>> Senior QA Engineer
>>> OpenStack Projects
>>> Mirantis Inc
>>>
>>
>>
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>



