[openstack-dev] [Mistral] Proposal for the Resume Feature

BORTMAN, Limor (Limor) limor.bortman at alcatel-lucent.com
Tue Jun 16 05:25:12 UTC 2015


+1,
I just have one question. Do we want to be able to resume a WF in an error state?
I mean, that isn't a real "resume"; it should be more of a rerun, don't you think?
So in an error state we would create a new executor and just re-run it.
Thanks, Limor



-----Original Message-----
From: Lingxian Kong [mailto:anlin.kong at gmail.com] 
Sent: Tuesday, June 16, 2015 5:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Mistral] Proposal for the Resume Feature

Thanks Winson for the write-up, very detailed information. (the format was good)

I'm totally in favor of your idea. Actually, I really think your proposal is complementary to my proposal in https://etherpad.openstack.org/p/vancouver-2015-design-summit-mistral,
please see the 'Workflow rollback/recovery' section.

What I want to do is configure some 'checkpoints' throughout the workflow, so that if some task fails, we can roll back the execution to a checkpoint and resume the whole workflow after we have fixed the problem, as if the execution had never failed.

It's just an initial idea. I'm waiting for our discussion to see if it really makes sense to users and to get feedback; then we can talk about the implementation and cooperation.
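
As a rough illustration of the checkpoint idea above, here is a minimal sketch. It assumes a strictly linear workflow, and every name in it (Execution, rollback_to_checkpoint, the state strings) is hypothetical and made up for this example; none of it is existing Mistral code.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Hypothetical stand-in for a workflow execution (not a Mistral class)."""
    tasks: list        # ordered task names of a linear workflow
    checkpoints: set   # task names that were marked as checkpoints
    state: str = "RUNNING"
    completed: list = field(default_factory=list)

    def run(self, actions):
        """Run the remaining tasks in order and stop at the first failure."""
        self.state = "RUNNING"
        for name in self.tasks[len(self.completed):]:
            try:
                actions[name]()
            except Exception:
                self.state = "ERROR"
                return
            self.completed.append(name)
        self.state = "SUCCESS"

    def rollback_to_checkpoint(self):
        """Discard progress made after the most recent completed checkpoint."""
        while self.completed and self.completed[-1] not in self.checkpoints:
            self.completed.pop()
```

After a failure, calling rollback_to_checkpoint() and then run() again re-executes everything after the last checkpoint, which is the "as if it had never failed" behaviour described above. A real implementation would also have to restore task outputs and handle branching, which this sketch ignores.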

On Tue, Jun 16, 2015 at 7:51 AM, W Chan <m4d.coder at gmail.com> wrote:
> Resending to see if this fixes the formatting for outlines below.
>
>
> I want to continue the discussion on the workflow "resume" feature.
>
>
> Resuming from our last conversation @
> http://lists.openstack.org/pipermail/openstack-dev/2015-March/060265.html.
> I don't think we should limit how users resume. There may be different
> possible scenarios. The user can fix the environment or condition that
> led to the failure of the current task and then just want to re-run the
> failed task. Or the user can fix the environment/condition, which
> includes fixing what the task was doing, and then just want to continue
> with the next set of task(s).
>
>
> The following is a list of proposed changes.
>
>
> 1. A new CLI operation to resume WF (i.e. mistral workflow-resume).
>
>     A. If no additional info is provided, assume this WF is manually
>        paused and there are no task/action execution errors. The WF state
>        is updated to RUNNING. Update using the put method @
>        ExecutionsController. The put method checks that there are no
>        task/action execution errors.
>
>     B. If WF is in an error state
>
>         i. To resume from a failed task, the workflow-resume command
>            requires the WF execution ID, task name, and/or task input.
>
>         ii. To resume from a failed with-items task:
>
>             a. Re-running the entire task (all items) requires the WF
>                execution ID, task name, and/or task input.
>
>             b. Re-running a single item requires the WF execution ID, task
>                name, with-items index, and/or task input for the item.
>
>             c. Re-running selected items requires the WF execution ID, task
>                name, with-items indices, and/or task input for each item.
>
>         iii. To resume from the next task(s), the workflow-resume
>              command requires the WF execution ID, failed task name, output
>              for the failed task, and a flag to skip the failed task.
>
>
> 2. Make ERROR -> RUNNING a valid state transition @
> is_valid_transition function.
>
>
> 3. Add a comments field to Execution model. Add a note that indicates 
> the execution is launched by workflow-resume. Auto-populated in this case.
>
>
> 4. Resume from failed task.
>
>     A. Re-run task with the same task inputs >> POST new action 
> execution for the task execution @ ActionExecutionsController
>
>     B. Re-run task with different task inputs >> POST new action
> execution for the task execution, allowing different input @
> ActionExecutionsController
>
>
> 5. Resume from next task(s).
>
>     A. Inject a noop task execution or noop action execution 
> (undecided yet) for the failed task with appropriate output.  The spec 
> is an adhoc spec that copies conditions from the failed task. This 
> provides some audit functionality and should trigger the next set of 
> task executions (in case of branching and such).
>
>
>
>
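
To make items 1 and 2 of the quoted proposal a bit more concrete, here is a minimal sketch of how the ERROR -> RUNNING transition and the argument requirements of workflow-resume could fit together. The transition table is a simplified, hypothetical one, classify_resume and its parameters are invented for this example, and the function only mirrors the name is_valid_transition from the proposal; it is not Mistral's actual code.

```python
# Hypothetical, simplified transition table; Mistral's real table may differ.
VALID_TRANSITIONS = {
    "IDLE": {"RUNNING", "ERROR"},
    "RUNNING": {"PAUSED", "SUCCESS", "ERROR"},
    "PAUSED": {"RUNNING", "ERROR"},
    "ERROR": {"RUNNING"},  # the addition proposed in item 2
    "SUCCESS": set(),
}


def is_valid_transition(from_state, to_state):
    """Return True if moving from from_state to to_state is allowed."""
    return to_state in VALID_TRANSITIONS.get(from_state, set())


def classify_resume(wf_state, task_name=None, item_indices=None,
                    skip_failed=False, task_output=None):
    """Decide which resume case of item 1 a request falls into.

    All parameter names are assumptions made for this sketch.
    """
    if wf_state not in ("PAUSED", "ERROR"):
        raise ValueError("only PAUSED or ERROR executions can be resumed")
    if not is_valid_transition(wf_state, "RUNNING"):
        raise ValueError("transition %s -> RUNNING is not allowed" % wf_state)

    if wf_state == "PAUSED":
        # Case 1.A: manually paused, no extra info needed.
        return "resume-paused"

    # Case 1.B: the WF is in an error state, so the failed task must be named.
    if task_name is None:
        raise ValueError("task name is required to resume a failed WF")
    if skip_failed:
        # Resume from the next task(s): output for the failed task is needed.
        if task_output is None:
            raise ValueError("output for the failed task is required")
        return "skip-failed-task"
    if item_indices:
        # Re-run one or more items of a with-items task.
        return "rerun-items"
    # Re-run the whole failed task (all items for a with-items task).
    return "rerun-task"
```

For example, classify_resume("ERROR", task_name="task1", item_indices=[0, 2]) falls into the "re-run selected items" case, while calling it with wf_state="SUCCESS" is rejected.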



--
Regards!
-----------------------------------
Lingxian Kong

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev