<div dir="ltr">To be specific, we hit this issue when the node running our service was rebooted.<div>Our solution is designed so that every job is a Celery task, and inside each Celery task we create a TaskFlow flow.</div><div><br></div><div>We enabled late acknowledgement (acks_late) in Celery, which uses RabbitMQ as the message broker, so if our service or node goes down, another healthy worker can pick up the job and complete it.</div><div>This works fine, but we just hit a rare case where the node was rebooted at the moment TaskFlow was writing an update to the database.</div><div><br></div><div>In this case, an exception is raised and the job is marked as failed. Since the job counts as complete (with a failure), the message is removed from RabbitMQ and no other worker can process it.</div><div>Can TaskFlow handle such I/O errors gracefully, or should the application catch this exception? If the application has to handle it, what happens to the database transaction that failed just as the node was rebooted? 
Who will retry this transaction?</div><div><br></div><div>Thanks,</div><div>Kanthi</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 27, 2016 at 5:39 PM, pnkk <span dir="ltr"><<a href="mailto:pnkk2016@gmail.com" target="_blank">pnkk2016@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>While the TaskFlow engine was executing a job, the execution failed with an I/O error (traceback pasted below).</div><div><br></div><div>2016-05-25 19:45:21.717 7119 ERROR taskflow.engines.action_engine.engine 127.0.1.1 [-]  Engine execution has failed, something bad must of happened (last 10 machine transitions were [('SCHEDULING', 'WAITING'), ('WAITING', 'ANALYZING'), ('ANALYZING', 'SCHEDULING'), ('SCHEDULING', 'WAITING'), ('WAITING', 'ANALYZING'), ('ANALYZING', 'SCHEDULING'), ('SCHEDULING', 'WAITING'), ('WAITING', 'ANALYZING'), ('ANALYZING', 'GAME_OVER'), ('GAME_OVER', 'FAILURE')])<br></div><div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine Traceback (most recent call last):</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 269, in run_iter</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     failure.Failure.reraise_if_any(memory.failures)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/types/failure.py", line 336, in reraise_if_any</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     failures[0].reraise()</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File 
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/types/failure.py", line 343, in reraise</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     six.reraise(*self._exc_info)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/scheduler.py", line 94, in schedule</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     futures.add(scheduler.schedule(atom))</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/scheduler.py", line 67, in schedule</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return self._task_action.schedule_execution(task)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/actions/task.py", line 99, in schedule_execution</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     self.change_state(task, states.RUNNING, progress=0.0)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/actions/task.py", line 67, in change_state</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     self._storage.set_atom_state(<a href="http://task.name" target="_blank">task.name</a>, state)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File 
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/fasteners/lock.py", line 85, in wrapper</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return f(self, *args, **kwargs)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/storage.py", line 486, in set_atom_state</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     self._with_connection(self._save_atom_detail, source, clone)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/storage.py", line 341, in _with_connection</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return functor(conn, *args, **kwargs)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/storage.py", line 471, in _save_atom_detail</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     original_atom_detail.update(conn.update_atom_details(atom_detail))</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/persistence/backends/impl_sqlalchemy.py", line 427, in update_atom_details</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     row = conn.execute(q).first()</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute</div><div>2016-05-25 
19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return meth(self, multiparams, params)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return connection._execute_clauseelement(self, multiparams, params)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1003, in _execute_clauseelement</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     inline=len(distilled_params) > 1)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "<string>", line 1, in <lambda></div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 494, in compile</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return self._compiler(dialect, bind=bind, **kw)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 500, in _compiler</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return dialect.statement_compiler(dialect, self, **kw)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 392, in __init__</div><div>2016-05-25 19:45:21.717 7119 TRACE 
taskflow.engines.action_engine.engine     Compiled.__init__(self, dialect, statement, **kwargs)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 190, in __init__</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     self.string = self.process(self.statement, **compile_kwargs)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 213, in process</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return obj._compiler_dispatch(self, **kwargs)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 81, in _compiler_dispatch</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     return meth(self, **kw)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 1579, in visit_select</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     for name, column in select._columns_plus_names</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 1347, in _label_select_column</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     add_to_result_map=add_to_result_map</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine  
 File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/celery/apps/worker.py", line 288, in _handle_request</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     safe_say('worker: {0} shutdown (MainProcess)'.format(how))</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine   File "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/celery/apps/worker.py", line 73, in safe_say</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine     print('\n{0}'.format(msg), file=sys.__stderr__)</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine IOError: [Errno 5] Input/output error</div><div>2016-05-25 19:45:21.717 7119 TRACE taskflow.engines.action_engine.engine</div></div><div><br></div><div>A transient network issue may have prevented TaskFlow from reaching the MySQL node.<br></div><div>Can you please suggest a graceful way of handling this so that execution can continue?<br></div><div><br></div><div>Thanks,</div><div>Kanthi</div><div><br></div></div>
</blockquote></div><br></div>
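<div>For reference, the late-acknowledgement behaviour described in this thread can be illustrated with a small self-contained sketch. It does not use Celery, RabbitMQ, or TaskFlow; LateAckQueue and run_job are hypothetical names for a toy in-memory model, assuming only that a message is acknowledged after the handler finishes (successfully or not) and that unacknowledged messages are redelivered when the consumer disappears.</div>

```python
import collections


class LateAckQueue:
    """Toy in-memory stand-in for a broker queue with late acknowledgement.

    Hypothetical helper for illustration; not a Celery or RabbitMQ API.
    """

    def __init__(self, messages):
        self._ready = collections.deque(messages)
        self._unacked = {}  # delivery tag mapped to its message
        self._next_tag = 0

    def get(self):
        """Deliver the next message; it stays unacked until ack() is called."""
        message = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Remove the message for good; no other worker will see it again."""
        del self._unacked[tag]

    def requeue_unacked(self):
        """Model the broker noticing that the consumer's connection is gone."""
        self._ready.extend(self._unacked.values())
        self._unacked.clear()

    def pending(self):
        """Number of messages waiting to be delivered."""
        return len(self._ready)


def run_job(queue, handler):
    """Run one job and ack it once the handler has finished.

    The ack happens whether the handler succeeded or raised, which is the
    situation described in this thread: a job that completes with a failure
    is still removed from the queue.
    """
    tag, message = queue.get()
    try:
        handler(message)
        outcome = "done"
    except Exception:
        outcome = "failed"
    queue.ack(tag)
    return outcome


def failing_handler(message):
    # Stand-in for the IOError seen in the traceback above.
    raise IOError(5, "Input/output error")


# Case 1: the node dies before the ack, so the job is redelivered.
crashed = LateAckQueue(["job-1"])
crashed.get()               # worker received the job ...
crashed.requeue_unacked()   # ... and the node went away before acking
redelivered = crashed.pending()

# Case 2: the handler raises, the job is acked as failed and is gone.
failed = LateAckQueue(["job-2"])
outcome = run_job(failed, failing_handler)
lost = failed.pending()
```

<div>In this model the first case leaves one message pending for another worker, while the second leaves none, which matches the behaviour we observed when the exception was raised during the reboot.</div>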