[openstack-dev] [Trove] Guest prepare call polling mechanism issue

Denis Makogon dmakogon at mirantis.com
Wed Jul 23 12:30:29 UTC 2014


Hello, Stackers.


I’d like to discuss an issue with the guestagent prepare call polling
mechanism (see [1]).

Let me first describe why this is actually an issue and why it should be
fixed. Those of you who are familiar with Trove know that Trove can
provision instances through the Nova API and the Heat API (see [2] and [3]).



    What’s the difference between these two methods (in general)? The
answer is simple:

- The Heat-based provisioning method has a polling mechanism that verifies
that stack provisioning completed successfully (see [4]), which means that
all stack resources are in the ACTIVE state.

- The Nova-based provisioning method doesn’t do any polling. This is wrong:
the instance can’t fail as fast as possible, because the Trove-taskmanager
service doesn’t verify that the launched server has reached the ACTIVE
state. That’s issue #1: with Nova the compute instance state is unknown
right after provisioning, whereas resources delivered by Heat are already
in the ACTIVE state.
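
To illustrate issue #1, here is a minimal sketch of what a fail-fast status
check for a Nova-provisioned server could look like. It is not Trove code:
wait_for_active, ServerFailed, and the get_status callable are hypothetical
names, and the status lookup is simulated by an iterator rather than a real
Nova client call.

```python
import time


class ServerFailed(Exception):
    """Raised when the compute instance reports a failure status."""


def wait_for_active(get_status, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until it returns ACTIVE.

    Fails immediately on ERROR instead of waiting for the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status == "ERROR":
            raise ServerFailed("compute instance entered ERROR state")
        if time.monotonic() >= deadline:
            raise TimeoutError("compute instance did not become ACTIVE in time")
        sleep(interval)


# Stand-in for a Nova status lookup (assumption): BUILD twice, then ACTIVE.
statuses = iter(["BUILD", "BUILD", "ACTIVE"])
result = wait_for_active(lambda: next(statuses), sleep=lambda _: None)
print(result)
```

The point of the sketch is the early exit on ERROR: a broken server is
reported as soon as its status is seen, not when some later timeout expires.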

Once either method ([2] or [3]) has finished, taskmanager prepares data for
the guest (see [5]) and then tries to send the prepare call to the guest
(see [6]). Here comes issue #2: the polling mechanism makes at least 100
API calls to Nova to determine the compute instance status.

Taskmanager also makes almost the same number of calls to the Trove backend
to discover the guest status, which is totally normal.

    So, here comes the question: why should I call Nova 99 more times for
the same value if the value returned the first time was completely
acceptable?



    There’s only one way to fix it. Since Heat-based provisioning delivers
an instance with a status validation procedure, the same thing should be
done for Nova-based provisioning: we should extract compute instance status
polling from the guest prepare polling mechanism and integrate it into [2],
leaving only guest status discovery in the guest prepare polling mechanism.
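
A rough simulation of the difference, under stated assumptions: the
ScriptedBackend class, both poll functions, and the guest status values
"BUILDING" and "READY" are hypothetical and are not taken from Trove. The
server is ACTIVE on the first lookup and the guest becomes ready on the
100th lookup. Real code would also sleep between attempts; that is left out
here to keep the call counts easy to follow.

```python
class ScriptedBackend:
    """Simulated status source that counts how often it is queried."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = 0

    def get(self):
        # Return the next scripted value; repeat the last one once exhausted.
        index = min(self.calls, len(self.values) - 1)
        self.calls += 1
        return self.values[index]


def combined_poll(nova, guest, max_attempts=1000):
    """Combined loop: every iteration queries both Nova and the guest."""
    for _ in range(max_attempts):
        server_status = nova.get()
        guest_status = guest.get()
        if server_status == "ERROR":
            raise RuntimeError("compute instance entered ERROR state")
        if server_status == "ACTIVE" and guest_status == "READY":
            return
    raise TimeoutError("guest did not become ready")


def split_poll(nova, guest, max_attempts=1000):
    """Split loops: confirm the server first, then poll only the guest."""
    for _ in range(max_attempts):
        server_status = nova.get()
        if server_status == "ERROR":
            raise RuntimeError("compute instance entered ERROR state")
        if server_status == "ACTIVE":
            break
    else:
        raise TimeoutError("compute instance did not become ACTIVE")
    for _ in range(max_attempts):
        if guest.get() == "READY":
            return
    raise TimeoutError("guest did not become ready")


guest_script = ["BUILDING"] * 99 + ["READY"]

nova_a, guest_a = ScriptedBackend(["ACTIVE"]), ScriptedBackend(guest_script)
combined_poll(nova_a, guest_a)
print(nova_a.calls, guest_a.calls)  # 100 100

nova_b, guest_b = ScriptedBackend(["ACTIVE"]), ScriptedBackend(guest_script)
split_poll(nova_b, guest_b)
print(nova_b.calls, guest_b.calls)  # 1 100
```

In this scenario the number of guest status lookups is unchanged, while the
Nova lookups drop from 100 to 1.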




Benefits? The proposed fix will allow corrupted instances to fail fast, and
it will reduce the number of redundant Nova API calls made while attempting
to discover the guest status.


Proposed fix for this issue - [7].

[1] - https://launchpad.net/bugs/1325512

[2] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L198-L215

[3] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L190-L197

[4] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L420-L429

[5] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L217-L256

[6] -
https://github.com/openstack/trove/blob/master/trove/taskmanager/models.py#L254-L266

[7] - https://review.openstack.org/#/c/97194/


Thoughts?

Best regards,

Denis Makogon