[openstack-dev] [puppet][Fuel] OpenstackLib Client Provider Better Exception Handling
Matt Fischer
matt at mattfischer.com
Thu Oct 15 01:42:26 UTC 2015
On Thu, Oct 8, 2015 at 5:38 AM, Vladimir Kuklin <vkuklin at mirantis.com>
wrote:
> Hi, folks
>
> * Intro
>
> Per our discussion at Meeting #54 [0], I would like to propose a uniform
> approach to exception handling for all puppet-openstack providers that
> access any type of OpenStack API.
>
> * Problem Description
>
> While working on Fuel deployments of multi-node HA-aware environments,
> we have faced many intermittent operational issues, e.g.:
>
> - 401/403 authentication failures while scaling OpenStack controllers,
>   due to a difference in hashing view between keystone instances
> - 503/502/504 errors due to temporary connectivity issues
> - non-idempotent operations like deletion or creation - e.g. if you are
>   deleting an endpoint and someone else deletes it on another node first,
>   you get a 404 and should treat it as success instead of failing; a 409
>   Conflict error should likewise signal us to re-fetch the resource's
>   parameters and then decide what to do with them (see the sketch below)
>
> Obviously, it is not optimal to rerun puppet to correct such errors when
> we can just handle an exception properly.
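>
> For illustration, here is a rough sketch of what such handling might look
> like inside a provider (all names here are hypothetical, not an existing
> API):
>
>     # Hypothetical helpers; assume request() performs the HTTP call and
>     # raises these exceptions based on the response status code.
>     class NotFoundError < StandardError; end  # HTTP 404
>     class ConflictError < StandardError; end  # HTTP 409
>
>     def delete_endpoint(id)
>       request(:delete, "/endpoints/#{id}")
>     rescue NotFoundError
>       # 404: another node already deleted it - treat as success
>       nil
>     end
>
>     def create_endpoint(params)
>       request(:post, '/endpoints', params)
>     rescue ConflictError
>       # 409: the resource already exists - re-fetch its parameters so
>       # the caller can decide whether they match what was requested
>       request(:get, "/endpoints/#{params[:name]}")
>     end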
>
> * Current State of the Art
>
> There is some exception handling, but it does not cover all the
> aforementioned use cases.
>
> * Proposed solution
>
> Introduce a library of exception-handling methods shared by all
> puppet-openstack providers, since these exceptions appear to be generic.
> Then, for each provider, we can introduce a provider-specific library
> that inherits from this common one.
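>
> As a rough illustration of the shape this could take (module and method
> names below are hypothetical):
>
>     # A shared base library that maps HTTP status codes to a uniform
>     # set of exceptions and retries the transient ones.
>     module PuppetOpenstack
>       class ApiError     < StandardError; end
>       class Unauthorized < ApiError; end  # 401/403
>       class Unavailable  < ApiError; end  # 502/503/504
>
>       module ExceptionHandling
>         MAX_ATTEMPTS = 3
>
>         # Retry transient failures with simple backoff; re-authenticate
>         # on 401/403; let anything unknown propagate to the caller.
>         def with_api_retries
>           attempts = 0
>           begin
>             yield
>           rescue Unauthorized
>             raise if (attempts += 1) >= MAX_ATTEMPTS
>             refresh_token  # hook implemented by each provider library
>             retry
>           rescue Unavailable
>             raise if (attempts += 1) >= MAX_ATTEMPTS
>             sleep(2**attempts)
>             retry
>           end
>         end
>       end
>     end
>
>     # A provider-specific library then builds on the generic one:
>     class KeystoneHandler
>       include PuppetOpenstack::ExceptionHandling
>       def refresh_token
>         # keystone-specific re-authentication would go here
>       end
>     end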
>
> Our mos-puppet team could add this to their backlog and work on it
> upstream, or develop it downstream and then propose it upstream.
>
> What do you think about this, puppet folks?
>
> [0]
> http://eavesdrop.openstack.org/meetings/puppet_openstack/2015/puppet_openstack.2015-10-06-15.00.html
>
I think we should look into some solutions here, as I'm generally for
something we can solve once and re-use. Currently we handle some of this at
TWC by serializing our deploys and disabling puppet site-wide while we do
so. This avoids the issue of Keystone on one node removing an endpoint
while the other nodes (which still have the old code) keep trying to add it
back. For connectivity issues, especially after service restarts, we're
using puppet-healthcheck [0], and I'd like to discuss that more in Tokyo as
an alternative to explicit retries and delays. It's in the etherpad, so
hopefully you can attend.
[0] - https://github.com/puppet-community/puppet-healthcheck
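
For anyone unfamiliar with it, a minimal sketch of the kind of check that
module provides (using its http_conn_validator type; the host/port/test_url
values here are just illustrative):

    # Block the catalog until the Keystone API actually answers after a
    # restart, instead of adding explicit retries/delays to providers.
    http_conn_validator { 'keystone-api':
      host     => 'localhost',
      port     => '5000',
      use_ssl  => false,
      test_url => '/v3',
    }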