[openstack-dev] [puppet][Fuel] OpenstackLib Client Provider Better Exception Handling
Gilles Dubreuil
gilles at redhat.com
Fri Oct 16 00:44:53 UTC 2015
On 15/10/15 21:10, Vladimir Kuklin wrote:
> Gilles,
>
> 5xx errors like 503 and 502/504 can often be intermittent operational
> issues. For example, when you access your keystone backends through some
> proxy and there is a connectivity issue between the proxy and the
> backends which disappears in 10 seconds, you do not need to rerun puppet
> completely - just retry the request.
>
Look, I don't have much experience with those errors in real-case
scenarios. And this is just a detail for my understanding: those
errors come from a running HTTP service, therefore this is not a
connectivity issue to the service itself but something wrong beyond it.
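
That said, if we do go down the retry route for transient 5xx
responses, a minimal sketch with Net::HTTP could look like this (host,
attempt count and delay are placeholders):

    require 'net/http'

    def get_with_retries(uri, attempts = 3, delay = 10)
      attempts.times do |i|
        response = Net::HTTP.get_response(uri)
        # Net::HTTPServerError covers the whole 5xx range
        return response unless response.is_a?(Net::HTTPServerError)
        sleep(delay) unless i == attempts - 1
      end
      raise "giving up after #{attempts} attempts against #{uri}"
    end

    get_with_retries(URI('http://keystone.example.com:5000/v3'))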
> Regarding "REST interfaces for all OpenStack APIs" - this is very close
> to another topic that I raised ([0]) - using a native Ruby client and
> handling the exceptions. Otherwise, whenever an OpenStack client
> (generic or neutron/glance/etc. one) sends us a message like '[111]
> Connection refused', that message is very much determined by the
> framework that OpenStack is using for clients within this release. It
> could be `requests` or any other framework, and each sends a different
> text message depending on its version. So it is very bothersome to write
> a bunch of 'if' clauses or gigantic regexps instead of handling a simple
> Ruby exception. So I agree with you here - we need to work with the API
> directly. And, by the way, if you also support switching to a native
> Ruby OpenStack API client, please feel free to support the movement
> towards it in the thread [0].
>
Yes, I totally agree with you on that approach (native Ruby lib).
That is why I mentioned it here: for me, the exception handling would
be solved at once.
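
For instance, with a native client the '[111] Connection refused' case
becomes a typed exception instead of a regexp over CLI output. A rough
sketch (retry_or_fail is a hypothetical handler):

    require 'net/http'

    begin
      Net::HTTP.get_response(URI('http://127.0.0.1:35357/v3'))
    rescue Errno::ECONNREFUSED, Net::OpenTimeout => e
      # A typed exception: stable across client versions, no regexp.
      retry_or_fail(e)  # hypothetical handler
    end

Compare that with pattern-matching openstackclient's stderr, whose
wording shifts with the underlying Python framework and its version.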
> Matt and Gilles,
>
> Regarding puppet-healthcheck - I do not think that puppet-healthcheck
> handles exactly what I am mentioning here - it does not run at exactly
> the same time as we run the request.
>
> E.g. 10 seconds ago everything was OK, then we had a temporary
> connectivity issue, then everything is ok again in 10 seconds. Could you
> please describe how puppet-healthcheck can help us solve this problem?
>
> Or another example - there was an issue with keystone accessing the
> token database when you have several keystone instances running, or
> there was some desync between these instances, e.g. you fetched the
> token from keystone #1 and then you verify it against keystone #2.
> Keystone #2 fails to verify it, not because the token is bad, but
> because keystone #2 itself has a problem. We would get a 401 error, and
> instead of rerunning puppet we would just need to handle the issue
> locally by retrying the request.
>
> [0] http://permalink.gmane.org/gmane.comp.cloud.openstack.devel/66423
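
That 401 case could indeed be absorbed locally with a single re-auth
retry, along these lines (fetch_fresh_token is a hypothetical helper):

    require 'net/http'

    # Replay a request once with a fresh token if the first attempt
    # comes back 401, e.g. after a desync between keystone instances.
    def request_with_reauth(http, req)
      response = http.request(req)
      if response.is_a?(Net::HTTPUnauthorized)
        req['X-Auth-Token'] = fetch_fresh_token  # hypothetical helper
        response = http.request(req)
      end
      response
    end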
>
> On Thu, Oct 15, 2015 at 12:23 PM, Gilles Dubreuil
> <gilles at redhat.com> wrote:
>
>
>
> On 15/10/15 12:42, Matt Fischer wrote:
> >
> >
> > On Thu, Oct 8, 2015 at 5:38 AM, Vladimir Kuklin
> > <vkuklin at mirantis.com> wrote:
> >
> > Hi, folks
> >
> > * Intro
> >
> > Per our discussion at Meeting #54 [0] I would like to propose a
> > uniform approach to exception handling for all puppet-openstack
> > providers accessing any type of OpenStack API.
> >
> > * Problem Description
> >
> > While working on Fuel during deployment of multi-node HA-aware
> > environments we faced many intermittent operational issues, e.g.:
> >
> > - 401/403 authentication failures when scaling OpenStack controllers,
> > due to differences in hashing view between keystone instances
> > - 503/502/504 errors due to temporary connectivity issues
>
> The 5xx errors are not connectivity issues:
>
> 500 Internal Server Error
> 501 Not Implemented
> 502 Bad Gateway
> 503 Service Unavailable
> 504 Gateway Timeout
> 505 HTTP Version Not Supported
>
> I believe nothing should be done to trap them.
>
> The connectivity issues are a different matter (to be addressed as
> mentioned by Matt).
>
> > - non-idempotent operations like deletion or creation - e.g. if you
> > are deleting an endpoint, someone else deletes it on another node,
> > and you get a 404 - you should continue with success instead of
> > failing. A 409 Conflict error should also signal us to re-fetch the
> > resource parameters and then decide what to do with them.
> >
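A minimal sketch of that idempotency rule, using Net::HTTP response
classes (the endpoint and refetch_and_reconcile are hypothetical):

    require 'net/http'

    http = Net::HTTP.new('192.0.2.10', 35357)  # placeholder endpoint
    response = http.request(Net::HTTP::Delete.new('/v3/endpoints/abc123'))
    case response
    when Net::HTTPSuccess, Net::HTTPNotFound
      # already gone is as good as deleted
    when Net::HTTPConflict
      refetch_and_reconcile  # hypothetical: re-read resource, decide again
    else
      response.error!  # raise for anything else
    end
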
> > Obviously, it is not optimal to rerun puppet to correct such errors
> > when we can just handle an exception properly.
> >
> > * Current State of the Art
> >
> > There is some exception handling, but it does not cover all the
> > aforementioned use cases.
> >
> > * Proposed solution
> >
> > Introduce a library of exception handling methods which should be
> > the same for all puppet openstack providers, as these exceptions seem
> > to be generic. Then, for each of the providers, we can introduce
> > provider-specific libraries that inherit from this one.
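
A sketch of what such a shared layer might look like (all names here
are hypothetical, just to illustrate the idea):

    require 'net/http'

    # Generic exception type shared by all puppet-openstack providers.
    class OpenstackProviderError < StandardError; end

    # Common layer: wrap any API call and translate low-level errors
    # into the shared hierarchy. Provider-specific modules could then
    # inherit from or refine this mapping.
    module OpenstackExceptionHandling
      def with_api_error_handling
        yield
      rescue Errno::ECONNREFUSED, Net::OpenTimeout => e
        raise OpenstackProviderError, "transient connection failure: #{e}"
      end
    end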
> >
> > Our mos-puppet team could add this to their backlog and work on it
> > upstream, or start downstream and then propose it upstream.
> >
> > What do you think of that, puppet folks?
> >
>
> The real issue is that we're dealing with openstackclient, a CLI tool
> and not an API. Therefore no error propagation is to be expected.
>
> Using REST interfaces for all OpenStack APIs would expose all the HTTP
> errors:
>
> Check for "HTTP Response Classes" in
> http://ruby-doc.org/stdlib-2.2.3/libdoc/net/http/rdoc/Net/HTTP.html
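
Those classes map status ranges onto an inheritance tree, so a provider
can branch on response class instead of parsing message text, e.g.
(address is a placeholder):

    require 'net/http'

    response = Net::HTTP.get_response(URI('http://192.0.2.10:5000/v3'))
    case response
    when Net::HTTPSuccess     then :ok           # 2xx
    when Net::HTTPClientError then :our_fault    # 4xx
    when Net::HTTPServerError then :their_fault  # 5xx
    end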
>
>
> > [0] http://eavesdrop.openstack.org/meetings/puppet_openstack/2015/puppet_openstack.2015-10-06-15.00.html
> >
> >
> > I think that we should look into some solutions here as I'm generally
> > for something we can solve once and re-use. Currently we solve some of
> > this at TWC by serializing our deploys and disabling puppet site-wide
> > while we do so. This avoids the issue of Keystone on one node removing
> > an endpoint while the other nodes (which still have old code) keep
> > trying to add it back.
> >
> > For connectivity issues, especially after service restarts, we're using
> > puppet-healthcheck [0] and I'd like to discuss that more in Tokyo as an
> > alternative to explicit retries and delays. It's in the etherpad so
> > hopefully you can attend.
>
> +1
>
> >
> > [0] - https://github.com/puppet-community/puppet-healthcheck
> >
> >
> >
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com
> www.mirantis.ru
> vkuklin at mirantis.com