<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 8, 2015 at 5:38 AM, Vladimir Kuklin <span dir="ltr">&lt;<a href="mailto:vkuklin@mirantis.com" target="_blank">vkuklin@mirantis.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Hi, folks<div><br></div><div>* Intro</div><div><br></div><div>Per our discussion at Meeting #54 [0], I would like to propose a uniform approach to exception handling for all puppet-openstack providers that access any type of OpenStack API.</div><div><br></div><div>* Problem Description</div><div><br></div><div>While working on Fuel, during deployment of multi-node HA-aware environments, we faced many intermittent operational issues, e.g.:</div><div><br></div><div>401/403 authentication failures while scaling OpenStack controllers, due to differences in the hashing view between keystone instances</div><div>503/502/504 errors due to temporary connectivity issues</div><div>non-idempotent operations like deletion or creation - e.g. if you are deleting an endpoint while someone is deleting it on another node and you get a 404, you should continue with success instead of failing. A 409 Conflict error should also signal us to re-fetch the resource parameters and then decide what to do with them.</div><div><br></div><div>Obviously, it is not optimal to rerun puppet to correct such errors when we can just handle an exception properly.</div><div><br></div><div>* Current State of the Art</div><div><br></div><div>There is some exception handling, but it does not cover all the aforementioned use cases.</div><div><br></div><div>* Proposed solution</div><div><br></div><div>Introduce a library of exception handling methods, which should be the same for all puppet-openstack providers, as these exceptions seem to be generic. 
Then, for each of the providers, we can introduce provider-specific libraries that inherit from this one.</div><div><br></div><div>Our mos-puppet team could add this to their backlog and could either work on it upstream, or work on it downstream and then propose it upstream.</div><div><br></div><div>What do you think about that, puppet folks?<br clear="all"><div><br></div><div>[0] <a href="http://eavesdrop.openstack.org/meetings/puppet_openstack/2015/puppet_openstack.2015-10-06-15.00.html" target="_blank">http://eavesdrop.openstack.org/meetings/puppet_openstack/2015/puppet_openstack.2015-10-06-15.00.html</a></div></div></div></blockquote><div><br></div><div>I think we should look into solutions here, as I'm generally in favor of something we can solve once and reuse. Currently we solve some of this at TWC by serializing our deploys and disabling puppet site-wide while we do so. This avoids the issue of Keystone on one node removing an endpoint while the other nodes (which still have old code) keep trying to add it back.</div><div><br></div><div>For connectivity issues, especially after service restarts, we're using puppet-healthcheck [0], and I'd like to discuss that more in Tokyo as an alternative to explicit retries and delays. It's in the etherpad, so hopefully you can attend.</div><div><br></div><div>[0] - <a href="https://github.com/puppet-community/puppet-healthcheck">https://github.com/puppet-community/puppet-healthcheck</a></div></div><br></div></div>
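<div dir="ltr">For reference, the status-code policy described in the quoted proposal could be sketched roughly as below. This is only an illustration, not code from puppet-openstack: it is written in Python rather than as a Ruby provider, and the names ApiError, RETRYABLE and call_with_policy, as well as the retry count and delay, are all hypothetical. It treats a 404 on delete as success, hands a 409 to a caller-supplied re-fetch, and retries the 401/403 and 502/503/504 cases a bounded number of times.</div>

```python
import time

# Status codes the thread describes as intermittent (assumption: all are retried alike).
RETRYABLE = {401, 403, 502, 503, 504}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed API call."""

    def __init__(self, status):
        super().__init__("HTTP %d" % status)
        self.status = status


def call_with_policy(action, operation, refetch=None, retries=3, delay=0.0,
                     sleep=time.sleep):
    """Run action() and apply the status-code policy from the proposal.

    operation is a label such as "create" or "delete"; refetch is an optional
    callable used to re-read the resource after a 409 Conflict.
    """
    attempt = 0
    while True:
        try:
            return action()
        except ApiError as err:
            if err.status == 404 and operation == "delete":
                # Someone else already deleted the resource: count it as success.
                return None
            if err.status == 409 and refetch is not None:
                # Conflict: re-read the resource and let the caller decide.
                return refetch()
            if err.status in RETRYABLE and attempt != retries:
                # Intermittent failure: wait, then try again.
                attempt += 1
                sleep(delay)
                continue
            # Anything else, or retries exhausted: surface the error.
            raise
```

<div dir="ltr">A provider-specific wrapper could reuse the same function and only change which codes it retries, which is roughly the "inherit from the generic library" idea in the proposal.</div>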