[openstack-dev] [mistral] bugfix for "Fix concurrency issues by using READ_COMMITTED" unveils / creates a different bug

KOFFMAN, Noa (Noa) noa.koffman at alcatel-lucent.com
Wed Dec 9 20:07:08 UTC 2015


Hi,

I have reproduced this issue multiple times before your fix was merged.

In order to reproduce I used a workflow with multiple async actions, and resumed all of them at the same time.

I just created a ticket in launchpad [1], with the workflow used and the mistral engine logs.

[1] - https://bugs.launchpad.net/mistral/+bug/1524477

If anyone could take a look and confirm the bug it would be great.

Thanks
Noa Koffman

________________________________
From: ELISHA, Moshe (Moshe) [moshe.elisha at alcatel-lucent.com]
Sent: Monday, December 07, 2015 6:29 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [mistral] bugfix for "Fix concurrency issues by using READ_COMMITTED" unveils / creates a different bug

Hi all,

The current bugfix I am working on[1] have unveiled / created a bug.
Test “WorkflowResumeTest.test_resume_different_task_states” sometimes fails because “task4” is executed twice instead of once (See unit test output and workflow below).

This happens because task2 on-complete is running task4 as expected but also task3 executes task4 by mistake.

It is not consistent but it happens quite often – This happens if the unit test resumes the WF and updates action execution of task2 and finishes task2 before task3 is finished.
Scenario:


1.       Task2 in method on_action_complete – changes task2 state to RUNNING.

2.       Task3 in method on_action_complete – changes task2 state to RUNNING (before task2 calls _on_task_state_change).

3.       Task3 in “_on_task_state_change” > “continue_workflow” > “DirectWorkflowController ._find_next_commands” – it finds task2 because task2 is in SUCCESS and processed = False and “_find_next_commands_for_task(task2)” returns task4.

4.       Task3 executes command to RunTask task4.

5.       Task2 in “_on_task_state_change” > “continue_workflow” > “DirectWorkflowController ._find_next_commands” – it finds task2 because task2 is in SUCCESS and processed = False and “_find_next_commands_for_task(task2)” returns task4.

6.       Task2 executes command to RunTask task4.


[1] - https://review.openstack.org/#/c/253819/


If I am not mistaken – this issue also exists in the current code and my bugfix only made it much more often. Can you confirm?
I don’t have enough knowledge on how to fix this issue…
For now – I have modified the test_resume_different_task_states unit test to wait for task3 to be processed before updating the action execution of task2.
If you agree this bug exist today as well – we can proceed with my bugfix and open a different bug for that issue.

Thanks.



[stack at melisha-devstack mistral(keystone_admin)]$ tox -e py27 -- WorkflowResumeTest.test_resume_different_task_states
...
======================================================================
FAIL: mistral.tests.unit.engine.test_workflow_resume.WorkflowResumeTest.test_resume_different_task_states
tags: worker-0
----------------------------------------------------------------------
pythonlogging:'': {{{WARNING [oslo_db.sqlalchemy.utils] Id not in sort_keys; is sort_keys unique?}}}
stderr: {{{
/opt/stack/mistral/.tox/py27/local/lib/python2.7/site-packages/novaclient/v2/client.py:109: UserWarning: 'novaclient.v2.client.Client' is not designed to be initialized directly. It is inner class of novaclient. Please, use 'novaclient.client.Client' instead. Related lp bug-report: 1493576
  _LW("'novaclient.v2.client.Client' is not designed to be "
}}}

stdout: {{{
Engine test case exception occurred: 4 != 5
Exception type: <class 'testtools.matchers._impl.MismatchError'>

Printing workflow executions...

wb.wf1 [state=SUCCESS, output={u'__execution': {u'params': {}, u'id': u'2807dd99-ca6f-49d7-886d-7d3b79e1c49e', u'spec': {u'type': u'direct', u'name': u'wf1', u'tasks': {u'task4': {u'type': u'direct', u'name': u'task4', u'version': u'2.0', u'action': u'std.echo output="Task 4"'}, u'task2': {u'type': u'direct', u'name': u'task2', u'on-complete': [u'task4'], u'version': u'2.0', u'action': u'std.mistral_http url="http://google.com"'}, u'task3': {u'type': u'direct', u'name': u'task3', u'version': u'2.0', u'action': u'std.echo output="Task 3"'}, u'task1': {u'type': u'direct', u'name': u'task1', u'on-complete': [u'task3', u'pause'], u'version': u'2.0', u'action': u'std.echo output="Hi!"'}}, u'version': u'2.0'}, u'input': {}}, u'task4': u'Task 4', u'task3': u'Task 3', u'__tasks': {u'848c6e92-b1b1-4d54-b11d-c93cfb4fc88f': u'task2', u'00a546e7-8da9-4603-b6be-54d58b14c625': u'task1'}}]
     task2 [id=848c6e92-b1b1-4d54-b11d-c93cfb4fc88f, state=SUCCESS, published={}]
     task1 [id=00a546e7-8da9-4603-b6be-54d58b14c625, state=SUCCESS, published={}]
     task3 [id=8ce20324-7fba-4424-bcd2-1e0c9b27fd4a, state=SUCCESS, published={}]
     task4 [id=3758de43-9bc3-4ac9-b3f3-29eb543b16ef, state=SUCCESS, published={}]
     task4 [id=f12ee464-0ba5-48c7-8423-9f425a00e675, state=SUCCESS, published={}]
}}}

Traceback (most recent call last):
  File "mistral/tests/unit/engine/test_workflow_resume.py", line 321, in test_resume_different_task_states
    self.assertEqual(4, len(wf_ex.task_executions))
  File "/opt/stack/mistral/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/opt/stack/mistral/.tox/py27/local/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 4 != 5
Ran 1 tests in 15.517s (+2.591s)
FAILED (id=301, failures=1 (+1))
error: testr failed (1)
ERROR: InvocationError: '/opt/stack/mistral/.tox/py27/bin/python setup.py testr --slowest --testr-args=WorkflowResumeTest.test_resume_different_task_states'
___________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________
ERROR:   py27: commands failed






Unit test workflow:


---
version: '2.0'

name: wb

workflows:
  wf1:
    type: direct

    tasks:
      task1:
        action: std.echo output="Hi!"
        on-complete:
          - task3
          - pause

      task2:
        action: std.mistral_http url="http://google.com"
        # This one won't be finished when execution is already PAUSED.
        on-complete:
          - task4

      task3:
        action: std.echo output="Task 3"

      task4:
        action: std.echo output="Task 4"







-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20151209/c38ea6eb/attachment.html>


More information about the OpenStack-dev mailing list