Open Stack

Mon Aug 29 16:52:01 UTC 2011

I would think we have enough tracking information to support the goal of
identifying failures. In any scenario, some of the failures will simply be
unrecoverable. 

Regarding the process crashing, who's to say the retry process wouldn't
also crash? We could argue endlessly that the arbiter/watchdog processes
will crash at each tier. As such, I think it's better to say we need a
simpler mechanism for identifying failures and perhaps a best-effort
retry.

Retrying can be scary, to say the least. You can't possibly handle all of
the possible failure scenarios, and some of the ones you think you can
might be different in subtle ways such that retrying them only causes more
issues.
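
To make "identify failures, best-effort retry" concrete, here is a minimal
sketch in Python. It is not nova code: TransientError, best_effort, and the
max_attempts default are hypothetical names and values chosen for
illustration. The idea is to retry only failures explicitly marked as safe,
a bounded number of times, and to log and give up on everything else.

```python
import logging

log = logging.getLogger("retry_sketch")


class TransientError(Exception):
    """Hypothetical marker for failures believed safe to retry."""


def best_effort(operation, max_attempts=2, retryable=(TransientError,)):
    """Run operation, retrying only known-safe failures a bounded number of times.

    Returns (ok, value): value is the result on success, or the last
    exception on failure. Every failure is logged so it can be identified.
    """
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return True, operation()
        except retryable as exc:
            # Explicitly allowlisted failure: record it and try again.
            last_exc = exc
            log.warning("attempt %d/%d failed: %r", attempt, max_attempts, exc)
        except Exception as exc:
            # Anything else is treated as unrecoverable: record it, no retry.
            log.error("unrecoverable failure, not retrying: %r", exc)
            return False, exc
    return False, last_exc
```

Note that this does nothing for the case where the process itself crashes;
it only covers failures the process survives long enough to record.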

I agree with Lamar that we could make things significantly more reliable,
and I think that's where we should start. We may find that, after some
stabilization work, the failure rate is acceptably low and any retry
mechanism is no longer required.

On 8/29/11 11:24 AM, "Kevin L. Mitchell" <kevin.mitchell at rackspace.com>
wrote:

>On Fri, 2011-08-26 at 23:10 +0000, Monsyne Dragon wrote:
>> First off, I think it would be better if whatever had the failure
>> responded by sending a request somewhere (a cast) to say "Hey, this
>> bombed. Retry it. "
>
>What if the failure was due to the process crashing, so that it can't
>possibly send a request/cast off for retry?
>-- 
>Kevin L. Mitchell <kevin.mitchell at rackspace.com>
>
>This email may include confidential information. If you received it in
>error, please delete it.
>_______________________________________________
>Mailing list: https://launchpad.net/~openstack
>Post to     : openstack at lists.launchpad.net
>Unsubscribe : https://launchpad.net/~openstack
>More help   : https://help.launchpad.net/ListHelp


[Openstack] New nova service proposal
