<div dir="ltr">I am working on NFV orchestrator based on MANO<div><br></div><div>Regards,</div><div>Kanthi</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 2, 2016 at 3:00 AM, Joshua Harlow <span dir="ltr"><<a href="mailto:harlowja@fastmail.com" target="_blank">harlowja@fastmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Interesting way to combine taskflow + celery.<br>
<br>
I didn't expect it to be used like this, but the more power to you!<br>
<br>
Taskflow itself has some similar capabilities via <a href="http://docs.openstack.org/developer/taskflow/workers.html#design" rel="noreferrer" target="_blank">http://docs.openstack.org/developer/taskflow/workers.html#design</a> but anyway what u've done is pretty neat as well.<br>
<br>
I am assuming this isn't an openstack project (due to usage of celery), any details on what's being worked on (am curious here)?<br>
<br>
pnkk wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
Thanks for the nice documentation.<br>
<br>
To my knowledge celery is widely used for distributed task processing.<br>
This fits our requirement perfectly where we want to return immediate<br>
response to the user from our API server and run long running task in<br>
background. Celery also gives flexibility with the worker<br>
types(process(can overcome GIL problems too)/evetlet...) and it also<br>
provides nice message brokers(rabbitmq,redis...)<br>
<br>
We used both celery and taskflow for our core processing to leverage the<br>
benefits of both. Taskflow provides nice primitives like(execute,<br>
revert, pre,post stuf) which takes off the load from the application.<br>
<br>
As far as the actual issue is concerned, I found one way to solve it by<br>
using celery "retry" option. This along with late_acks makes the<br>
application highly fault tolerant.<br>
<br>
<a href="http://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry" rel="noreferrer" target="_blank">http://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry</a><br>
<br>
Regards,<br>
Kanthi<br>
<br>
<br>
On Sat, May 28, 2016 at 1:51 AM, Joshua Harlow <<a href="mailto:harlowja@fastmail.com" target="_blank">harlowja@fastmail.com</a><br></span><div><div class="h5">
<mailto:<a href="mailto:harlowja@fastmail.com" target="_blank">harlowja@fastmail.com</a>>> wrote:<br>
<br>
Seems like u could just use<br>
<a href="http://docs.openstack.org/developer/taskflow/jobs.html" rel="noreferrer" target="_blank">http://docs.openstack.org/developer/taskflow/jobs.html</a> (it appears<br>
that you may not be?); the job itself would when failed be then<br>
worked on by a different job consumer.<br>
<br>
Have u looked at those? It almost appears that u are using celery as<br>
a job distribution system (similar to the jobs.html link mentioned<br>
above)? Is that somewhat correct (I haven't seen anyone try this,<br>
wondering how u are using it and the choices that directed u to<br>
that, aka, am curious)?<br>
<br>
-Josh<br>
<br>
pnkk wrote:<br>
<br>
To be specific, we hit this issue when the node running our<br>
service is<br>
rebooted.<br>
Our solution is designed in a way that each and every job is a<br>
celery<br>
task and inside celery task, we create taskflow flow.<br>
<br>
We enabled late_acks in celery(uses rabbitmq as message broker),<br>
so if<br>
our service/node goes down, other healthy service can pick the<br>
job and<br>
completes it.<br>
This works fine, but we just hit this rare case where the node was<br>
rebooted just when taskflow is updating something to the database.<br>
<br>
In this case, it raises an exception and the job is marked<br>
failed. Since<br>
it is complete(with failure), message is removed from the<br>
rabbitmq and<br>
other worker would not be able to process it.<br>
Can taskflow handle such I/O errors gracefully or should<br>
application try<br>
to catch this exception? If application has to handle it what would<br>
happen to that particular database transaction which failed just<br>
when<br>
the node is rebooted? Who will retry this transaction?<br>
<br>
Thanks,<br>
Kanthi<br>
<br>
On Fri, May 27, 2016 at 5:39 PM, pnkk <<a href="mailto:pnkk2016@gmail.com" target="_blank">pnkk2016@gmail.com</a><br>
<mailto:<a href="mailto:pnkk2016@gmail.com" target="_blank">pnkk2016@gmail.com</a>><br></div></div><div><div class="h5">
<mailto:<a href="mailto:pnkk2016@gmail.com" target="_blank">pnkk2016@gmail.com</a> <mailto:<a href="mailto:pnkk2016@gmail.com" target="_blank">pnkk2016@gmail.com</a>>>> wrote:<br>
<br>
Hi,<br>
<br>
When taskflow engine is executing a job, the execution<br>
failed due to<br>
IO error(traceback pasted below).<br>
<br>
2016-05-25 19:45:21.717 7119 ERROR<br>
taskflow.engines.action_engine.engine 127.0.1.1 [-] Engine<br>
execution has failed, something bad must of happened (last 10<br>
machine transitions were [('SCHEDULING', 'WAITING'),<br>
('WAITING',<br>
'ANALYZING'), ('ANALYZING', 'SCHEDULING'), ('SCHEDULING',<br>
'WAITING'), ('WAITING', 'ANALYZING'), ('ANALYZING', 'SCHEDULING'),<br>
('SCHEDULING', 'WAITING'), ('WAITING', 'ANALYZING'),<br>
('ANALYZING',<br>
'GAME_OVER'), ('GAME_OVER', 'FAILURE')])<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine Traceback (most<br>
recent call last):<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py",<br>
line 269, in run_iter<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
failure.Failure.reraise_if_any(memory.failures)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/types/failure.py",<br>
line 336, in reraise_if_any<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine failures[0].reraise()<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/types/failure.py",<br>
line 343, in reraise<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
six.reraise(*self._exc_info)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/scheduler.py",<br>
line 94, in schedule<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
futures.add(scheduler.schedule(atom))<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/scheduler.py",<br>
line 67, in schedule<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return<br>
self._task_action.schedule_execution(task)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/actions/task.py",<br>
line 99, in schedule_execution<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
self.change_state(task,<br>
states.RUNNING, progress=0.0)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/actions/task.py",<br>
line 67, in change_state<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
self._storage.set_atom_state(<a href="http://task.name" rel="noreferrer" target="_blank">task.name</a> <<a href="http://task.name" rel="noreferrer" target="_blank">http://task.name</a>><br>
<<a href="http://task.name" rel="noreferrer" target="_blank">http://task.name</a>>, state)<br>
<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/fasteners/lock.py",<br>
line 85, in wrapper<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return f(self, *args,<br>
**kwargs)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/storage.py",<br>
line 486, in set_atom_state<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
self._with_connection(self._save_atom_detail, source, clone)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/storage.py",<br>
line 341, in _with_connection<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return functor(conn,<br>
*args, **kwargs)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/storage.py",<br>
line 471, in _save_atom_detail<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
<br>
original_atom_detail.update(conn.update_atom_details(atom_detail))<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/persistence/backends/impl_sqlalchemy.py",<br>
line 427, in update_atom_details<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine row =<br>
conn.execute(q).first()<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py",<br>
line 914, in execute<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return meth(self,<br>
multiparams, params)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py",<br>
line 323, in _execute_on_connection<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return<br>
connection._execute_clauseelement(self, multiparams, params)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py",<br>
line 1003, in _execute_clauseelement<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
inline=len(distilled_params) > 1)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File "<string>",<br>
line 1, in<br>
<lambda><br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py",<br>
line 494, in compile<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return<br>
self._compiler(dialect, bind=bind, **kw)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py",<br>
line 500, in _compiler<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return<br>
dialect.statement_compiler(dialect, self, **kw)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py",<br>
line 392, in __init__<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
Compiled.__init__(self,<br>
dialect, statement, **kwargs)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py",<br>
line 190, in __init__<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine self.string =<br>
self.process(self.statement, **compile_kwargs)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py",<br>
line 213, in process<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return<br>
obj._compiler_dispatch(self, **kwargs)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py",<br>
line 81, in _compiler_dispatch<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine return meth(self,<br>
**kw)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py",<br>
line 1579, in visit_select<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine for name, column in<br>
select._columns_plus_names<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py",<br>
line 1347, in _label_select_column<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
add_to_result_map=add_to_result_map<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/celery/apps/worker.py",<br>
line 288, in _handle_request<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine safe_say('worker: {0}<br>
shutdown (MainProcess)'.format(how))<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine File<br>
"/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/celery/apps/worker.py",<br>
line 73, in safe_say<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
print('\n{0}'.format(msg),<br>
file=sys.__stderr__)<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine IOError: [Errno 5]<br>
Input/output error<br>
2016-05-25 19:45:21.717 7119 TRACE<br>
taskflow.engines.action_engine.engine<br>
<br>
There could be a transient network issue which prevents<br>
taskflow<br>
from reaching the mysql node.<br>
Can you please suggest a graceful way of handling it and<br>
continue<br>
processing the execution?<br>
<br>
Thanks,<br>
Kanthi<br>
<br>
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe:<br>
<a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br></div></div>
<<a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a>><span class=""><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<br>
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe:<br>
<a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br></span>
<<a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a>><span class=""><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<br>
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</span></blockquote><div class="HOEnZb"><div class="h5">
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</div></div></blockquote></div><br></div>