[openstack-dev] [Trove] Guest prepare call polling mechanism issue

Denis Makogon dmakogon at mirantis.com
Wed Jul 23 21:16:16 UTC 2014


On Wed, Jul 23, 2014 at 7:33 PM, Tim Simpson <tim.simpson at rackspace.com>
wrote:

>  To summarize, this is a conversation about the following LaunchPad bug:
> https://launchpad.net/bugs/1325512
> and Gerrit review: https://review.openstack.org/#/c/97194/6
>
>  You are saying the function "_service_is_active" in addition to polling
> the datastore service status also polls the status of the Nova resource. At
> first I thought this wasn't the case, however looking at your pull request
> I was surprised to see that line 320 (
> https://review.openstack.org/#/c/97194/6/trove/taskmanager/models.py)
> polls Nova using the "get" method (which I wish was called "refresh" as to
> me it sounds like a lazy-loader or something despite making a full GET
> request each time).
> So moving this polling out of there into the two respective
> "create_server" methods as you have done is not only going to be useful for
> Heat and avoid the issue of calling Nova 99 times you describe but it will
> actually help operations teams to see more clearly that the issue was with
> a server that didn't provision. We actually had an issue in Staging the
> other day that took us forever to figure out because the
>

Agreed. I guess I need to update the bug report with more information
about this issue, but I'm really glad to hear that the proposed change
would be useful. I also agree that it would be useful for operations and
support teams to be able to track provisioning issues that have nothing to
do with Trove but are tied to the infrastructure.
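
To make the split concrete, here is a minimal sketch of the idea, not the
code from the review. All names in it (wait_for_server_active,
wait_for_guest_running, FakeStatusSource, the status strings and the
timeout values) are hypothetical and only illustrate the flow: the compute
status is polled once during server creation and fails fast, and the guest
prepare polling afterwards never touches Nova.

```python
import time


class ServerProvisioningError(Exception):
    """Raised when the compute instance cannot reach ACTIVE."""


# Assumption: statuses that mean the server will never become ACTIVE.
FAILED_STATES = ("ERROR", "FAILED")


def wait_for_server_active(refresh_status, timeout=60.0, interval=1.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll the compute status until ACTIVE, failing fast on a bad state.

    refresh_status is a hypothetical callable that performs one GET
    against Nova and returns the server status string.
    """
    deadline = clock() + timeout
    while True:
        status = refresh_status()
        if status == "ACTIVE":
            return
        if status in FAILED_STATES:
            raise ServerProvisioningError(
                "server entered %s state" % status)
        if clock() >= deadline:
            raise ServerProvisioningError("timed out waiting for ACTIVE")
        sleep(interval)


def wait_for_guest_running(get_guest_status, attempts=100, interval=1.0,
                           sleep=time.sleep):
    """Poll only the guest status; no Nova calls are made here."""
    for _ in range(attempts):
        if get_guest_status() == "RUNNING":  # hypothetical guest status
            return True
        sleep(interval)
    return False


class FakeStatusSource(object):
    """Test double that replays statuses and counts how often it is asked."""

    def __init__(self, statuses):
        self.statuses = list(statuses)
        self.calls = 0

    def get_status(self):
        self.calls += 1
        # Replay the scripted statuses, then keep returning the last one.
        if len(self.statuses) > 1:
            return self.statuses.pop(0)
        return self.statuses[0]


def no_sleep(seconds):
    """Skip real waiting so the example runs instantly."""


nova = FakeStatusSource(["BUILD", "BUILD", "ACTIVE"])
guest = FakeStatusSource(["NEW", "BUILDING", "RUNNING"])

# Phase 1: part of server creation. Phase 2: guest prepare polling.
wait_for_server_active(nova.get_status, sleep=no_sleep)
guest_ready = wait_for_guest_running(guest.get_status, sleep=no_sleep)
print(nova.calls, guest.calls, guest_ready)

# A server that ends up in ERROR is reported right away.
broken = FakeStatusSource(["BUILD", "ERROR"])
try:
    wait_for_server_active(broken.get_status, sleep=no_sleep)
    failed_fast = False
except ServerProvisioningError:
    failed_fast = True
```

With this shape Nova is asked only until the server settles, and a failed
server surfaces as a provisioning error instead of showing up later as
something unrelated, such as a DNS error.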


> server wasn't provisioning, but before anything checked that it was ACTIVE
> the DNS code detected the server had no ip address (never mind it was in a
> FAILED state) so the logs surfaced this as a DNS error. This change should
> help us avoid such issues.
>
>  Thanks,
>
>  Tim
>
>
>  ------------------------------
> *From:* Denis Makogon [dmakogon at mirantis.com]
> *Sent:* Wednesday, July 23, 2014 7:30 AM
> *To:* OpenStack Development Mailing List
> *Subject:* [openstack-dev] [Trove] Guest prepare call polling mechanism
> issue
>
>    Hello, Stackers.
>
>
>  I’d like to discuss an issue with the guestagent prepare call polling
> mechanism (see [1]).
>
>  Let me first describe why this is actually an issue and why it should be
> fixed. Those of you who are familiar with Trove know that Trove can
> provision instances through the Nova API and the Heat API (see [2] and [3]).
>
>
>
>     What’s the difference between these two ways (in general)? The answer
> is simple:
>
> - The Heat-based provisioning method has a polling mechanism that verifies
> that stack provisioning completed successfully (see [4]), which means
> that all stack resources are in the ACTIVE state.
>
> - The Nova-based provisioning method doesn’t do any polling, which is
> wrong: an instance can’t fail as fast as possible because the
> Trove taskmanager service doesn’t verify that the launched server has
> reached the ACTIVE state. That’s issue #1 - the compute instance state is
> unknown right after provisioning, whereas resources delivered by Heat
> are already in the ACTIVE state.
>
>  Once method [2] or [3] has finished, the taskmanager prepares data for
> the guest (see [5]) and then tries to send the prepare call to the guest
> (see [6]). Here comes issue #2 - the polling mechanism makes at least 100
> API calls to Nova to determine the compute instance status.
>
> The taskmanager also makes almost the same number of calls to the Trove
> backend to discover the guest status, which is totally normal.
>
>      So, here comes the question: why should I call Nova 99 more times for
> the same value if the value returned the first time was completely
> acceptable?
>
>
>
>     There’s only one way to fix it. Since Heat-based provisioning
> delivers an instance with a status validation procedure, the same thing
> should be done for Nova-based provisioning: we should extract the compute
> instance status polling from the guest prepare polling mechanism and
> integrate it into [2], leaving only guest status discovery in the guest
> prepare polling mechanism.
>
>
>
>
>  Benefits? The proposed fix would make it possible to fail fast for
> corrupted instances, and it would reduce the number of redundant Nova API
> calls made while attempting to discover the guest status.
>
>
>  Proposed fix for this issue - [7].
>
>  [1] - https://launchpad.net/bugs/1325512
>
> [2] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215
>
> [3] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197
>
> [4] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429
>
> [5] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L256
>
> [6] -
> https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L254-L266
>
> [7] - https://review.openstack.org/#/c/97194/
>
>
>  Thoughts?
>
>  Best regards,
>
> Denis Makogon
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>