[openstack-dev] [mistral] bugfix for "Fix concurrency issues by using READ_COMMITTED" unveils / creates a different bug
ELISHA, Moshe (Moshe)
moshe.elisha at alcatel-lucent.com
Mon Dec 7 16:29:44 UTC 2015
Hi all,
The current bugfix I am working on[1] have unveiled / created a bug.
Test "WorkflowResumeTest.test_resume_different_task_states" sometimes fails because "task4" is executed twice instead of once (See unit test output and workflow below).
This happens because task2 on-complete is running task4 as expected but also task3 executes task4 by mistake.
It is not consistent but it happens quite often - This happens if the unit test resumes the WF and updates action execution of task2 and finishes task2 before task3 is finished.
Scenario:
1. Task2 in method on_action_complete - changes task2 state to RUNNING.
2. Task3 in method on_action_complete - changes task2 state to RUNNING (before task2 calls _on_task_state_change).
3. Task3 in "_on_task_state_change" > "continue_workflow" > "DirectWorkflowController ._find_next_commands" - it finds task2 because task2 is in SUCCESS and processed = False and "_find_next_commands_for_task(task2)" returns task4.
4. Task3 executes command to RunTask task4.
5. Task2 in "_on_task_state_change" > "continue_workflow" > "DirectWorkflowController ._find_next_commands" - it finds task2 because task2 is in SUCCESS and processed = False and "_find_next_commands_for_task(task2)" returns task4.
6. Task2 executes command to RunTask task4.
[1] - https://review.openstack.org/#/c/253819/
If I am not mistaken - this issue also exists in the current code and my bugfix only made it much more often. Can you confirm?
I don't have enough knowledge on how to fix this issue...
For now - I have modified the test_resume_different_task_states unit test to wait for task3 to be processed before updating the action execution of task2.
If you agree this bug exist today as well - we can proceed with my bugfix and open a different bug for that issue.
Thanks.
[stack at melisha-devstack mistral(keystone_admin)]$ tox -e py27 -- WorkflowResumeTest.test_resume_different_task_states
...
======================================================================
FAIL: mistral.tests.unit.engine.test_workflow_resume.WorkflowResumeTest.test_resume_different_task_states
tags: worker-0
----------------------------------------------------------------------
pythonlogging:'': {{{WARNING [oslo_db.sqlalchemy.utils] Id not in sort_keys; is sort_keys unique?}}}
stderr: {{{
/opt/stack/mistral/.tox/py27/local/lib/python2.7/site-packages/novaclient/v2/client.py:109: UserWarning: 'novaclient.v2.client.Client' is not designed to be initialized directly. It is inner class of novaclient. Please, use 'novaclient.client.Client' instead. Related lp bug-report: 1493576
_LW("'novaclient.v2.client.Client' is not designed to be "
}}}
stdout: {{{
Engine test case exception occurred: 4 != 5
Exception type: <class 'testtools.matchers._impl.MismatchError'>
Printing workflow executions...
wb.wf1 [state=SUCCESS, output={u'__execution': {u'params': {}, u'id': u'2807dd99-ca6f-49d7-886d-7d3b79e1c49e', u'spec': {u'type': u'direct', u'name': u'wf1', u'tasks': {u'task4': {u'type': u'direct', u'name': u'task4', u'version': u'2.0', u'action': u'std.echo output="Task 4"'}, u'task2': {u'type': u'direct', u'name': u'task2', u'on-complete': [u'task4'], u'version': u'2.0', u'action': u'std.mistral_http url="http://google.com"'}, u'task3': {u'type': u'direct', u'name': u'task3', u'version': u'2.0', u'action': u'std.echo output="Task 3"'}, u'task1': {u'type': u'direct', u'name': u'task1', u'on-complete': [u'task3', u'pause'], u'version': u'2.0', u'action': u'std.echo output="Hi!"'}}, u'version': u'2.0'}, u'input': {}}, u'task4': u'Task 4', u'task3': u'Task 3', u'__tasks': {u'848c6e92-b1b1-4d54-b11d-c93cfb4fc88f': u'task2', u'00a546e7-8da9-4603-b6be-54d58b14c625': u'task1'}}]
task2 [id=848c6e92-b1b1-4d54-b11d-c93cfb4fc88f, state=SUCCESS, published={}]
task1 [id=00a546e7-8da9-4603-b6be-54d58b14c625, state=SUCCESS, published={}]
task3 [id=8ce20324-7fba-4424-bcd2-1e0c9b27fd4a, state=SUCCESS, published={}]
task4 [id=3758de43-9bc3-4ac9-b3f3-29eb543b16ef, state=SUCCESS, published={}]
task4 [id=f12ee464-0ba5-48c7-8423-9f425a00e675, state=SUCCESS, published={}]
}}}
Traceback (most recent call last):
File "mistral/tests/unit/engine/test_workflow_resume.py", line 321, in test_resume_different_task_states
self.assertEqual(4, len(wf_ex.task_executions))
File "/opt/stack/mistral/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
self.assertThat(observed, matcher, message)
File "/opt/stack/mistral/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 4 != 5
Ran 1 tests in 15.517s (+2.591s)
FAILED (id=301, failures=1 (+1))
error: testr failed (1)
ERROR: InvocationError: '/opt/stack/mistral/.tox/py27/bin/python setup.py testr --slowest --testr-args=WorkflowResumeTest.test_resume_different_task_states'
___________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________
ERROR: py27: commands failed
Unit test workflow:
---
version: '2.0'
name: wb
workflows:
wf1:
type: direct
tasks:
task1:
action: std.echo output="Hi!"
on-complete:
- task3
- pause
task2:
action: std.mistral_http url="http://google.com"
# This one won't be finished when execution is already PAUSED.
on-complete:
- task4
task3:
action: std.echo output="Task 3"
task4:
action: std.echo output="Task 4"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20151207/163ff4a4/attachment.html>
More information about the OpenStack-dev
mailing list