[openstack-dev] About Sahara EDP New Ideas for Liberty

Trevor McKay tmckay at redhat.com
Wed Apr 22 13:48:17 UTC 2015

Hi Ken,

  responses inline

On Wed, 2015-04-22 at 12:36 +0000, Chen, Ken wrote:
> Hi Trevor, 
> I saw the items below in the Proposed Sprint Topics for Sahara Liberty:
> https://etherpad.openstack.org/p/sahara-liberty-proposed-sessions. I
> guess these are the EDP ideas we want to discuss at the Vancouver design
> summit. We have some comments below: 

Yes, feel free to add anything else to the pad.  We'll talk about as
much as we have time for.  I'm thinking that most of it will be covered
on Friday, or in between sessions during the week if folks are around.

> o       job scheduler (proposed by weiting) 
>     >> we already have a spec on this, please help review it and give
> your comments and ideas. https://review.openstack.org/#/c/175719/  

Great! thanks

> o       more complex workflows (job dependencies, DAGs, etc.). Do we
> rely on Oozie, or something else? 
>     >> Huichun is now looking into this. I am not sure whether you guys
> already have some detailed ideas about this. If needed, we can
> contribute some effort. If no details are ready, we can help draw up a
> draft version first. 

No work on this so far, although we have talked about it off and on for
a few cycles. Oozie has a lot of capabilities for coordination, but we
are not Oozie-only, so what do we do?  This is the central question.
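Whatever engine ends up handling coordination, the core of a job DAG is ordering: a job may start only after everything it depends on has finished. A minimal, hypothetical sketch in Python using the standard library's `graphlib`, with no Oozie involved; the job names and the `run_order`/`parallel_batches` helpers are illustrative assumptions, not Sahara code.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job name maps to the set of jobs it depends on.
# Names are illustrative only; nothing here is part of Sahara or Oozie.
workflow = {
    "ingest_logs": set(),
    "ingest_users": set(),
    "join": {"ingest_logs", "ingest_users"},
    "report": {"join"},
}


def run_order(deps):
    """Return one valid serial execution order for a job dependency DAG.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


def parallel_batches(deps):
    """Group jobs into batches whose members have no unmet dependencies.

    Jobs within a batch could be submitted concurrently. This sketch assumes
    every job succeeds; a real engine would poll each job's status before
    calling done() and stop the workflow on failure.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError on cyclic input
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(run_order(workflow))
    print(parallel_batches(workflow))
```

Here the two ingest jobs land in the same batch, `join` waits for both, and `report` runs last. Status polling and failure handling are deliberately left out.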

> o       job interface mapping
> https://blueprints.launchpad.net/sahara/+spec/unified-job-interface-map proposed in Kilo but moved to Liberty 
>        ++ high priority in my opinion.  Should be done early, awesome
> feature 
>     >> Seems interesting. We agree the EDP UI should be improved. In
> fact, our team's thinking about EDP is still unclear. Some people do
> not like the current EDP design, and think it is more like a
> "re-design" of the Oozie or Spark UI than a universal interface for
> users. However, we do not have a clear strategy for this part yet. 

Yes, Oozie had a heavy influence on EDP. This is partly historical --
EDP was written rapidly between Havana and Icehouse and was based on
Oozie, since Oozie offered job handling and support for multiple job
types out of the box. It was a quick path to EDP functionality.

However, EDP should be more of a universal interface. We only support a
few conceptual operations -- run job, cancel job, and job status. With
those three operations, we should be able to run anything. For example,
recently Telles has been working on Storm support.
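To illustrate why three operations can be enough, here is a minimal, hypothetical Python sketch of an engine-neutral interface. The class names, method signatures, and status values are assumptions for illustration (loosely Oozie-style states), not Sahara's actual engine classes; `InMemoryEngine` only records state, where a real engine would submit to Oozie, Spark, Storm, etc.

```python
import abc
import enum


class JobStatus(enum.Enum):
    # Assumed status names, loosely modelled on Oozie-style job states.
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    KILLED = "KILLED"


class JobEngine(abc.ABC):
    """Hypothetical engine-neutral interface: run, cancel, status."""

    @abc.abstractmethod
    def run_job(self, job_name, args):
        """Start a job and return an opaque handle for it."""

    @abc.abstractmethod
    def cancel_job(self, handle):
        """Request cancellation and return the resulting status."""

    @abc.abstractmethod
    def get_job_status(self, handle):
        """Return the current JobStatus for the handle."""


class InMemoryEngine(JobEngine):
    """Toy engine that only records state, to show the interface suffices."""

    def __init__(self):
        self._jobs = {}
        self._counter = 0

    def run_job(self, job_name, args):
        self._counter += 1
        handle = f"{job_name}-{self._counter}"
        self._jobs[handle] = JobStatus.RUNNING  # pretend it started
        return handle

    def cancel_job(self, handle):
        status = self._jobs[handle]  # KeyError for unknown handles
        if status is JobStatus.RUNNING:
            self._jobs[handle] = JobStatus.KILLED
        return self._jobs[handle]

    def get_job_status(self, handle):
        return self._jobs[handle]


def run_then_cancel(engine: JobEngine, job_name, args):
    """Caller code that works against any engine, not just InMemoryEngine."""
    handle = engine.run_job(job_name, args)
    before = engine.get_job_status(handle)
    after = engine.cancel_job(handle)
    return handle, before, after
```

Under this design, supporting a new framework means implementing those three methods; caller code such as `run_then_cancel` does not change.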

The job interface mapping will help generalize how arguments are passed
to jobs and allow us to remove some assumptions about jobs.  I am all
for other generalizations that will move EDP further in the direction of
a general interface.

> o       early error detection to help transient clusters -- how many
> things can we detect early that can go wrong with an EDP job so that
> we return an error before spinning up the cluster (only to find that
> the job fails once the cluster is launched?) Ex, bad swift paths 
>     >> seems easier, but may include some trivial work. 

Some of this will be folded into the job interface mapping.  Ethan has
just updated the spec to include "input_datasource" and
"output_datasource" as argument types. If we know what will be done with
a datasource, we can potentially validate it before the job runs.
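As a rough illustration of that early check, here is a hypothetical Python sketch that inspects only the arguments typed as `input_datasource` or `output_datasource` and rejects obviously malformed URLs before any cluster is launched. The `JobArg` class, the set of known schemes, and the assumed `swift://<container>/<object path>` shape are all assumptions for this sketch, not the rules Sahara actually enforces.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Argument types named after the ones in the updated spec; everything else
# here (class names, URL rules, known schemes) is an assumption.
DATASOURCE_TYPES = {"input_datasource", "output_datasource"}
KNOWN_SCHEMES = {"swift", "hdfs"}  # assumed set of supported schemes


@dataclass
class JobArg:
    name: str
    arg_type: str
    value: str


def check_datasource_url(url):
    """Return an error message for an obviously bad URL, or None."""
    parsed = urlparse(url)
    if parsed.scheme not in KNOWN_SCHEMES:
        return f"unsupported scheme in {url!r}"
    if parsed.scheme == "swift":
        # Assumed shape: swift://<container>/<object path>
        if not parsed.netloc:
            return f"missing container in {url!r}"
        if not parsed.path.strip("/"):
            return f"missing object path in {url!r}"
    return None


def early_errors(args):
    """Validate datasource-typed arguments before any cluster is launched."""
    errors = []
    for arg in args:
        if arg.arg_type in DATASOURCE_TYPES:
            problem = check_datasource_url(arg.value)
            if problem:
                errors.append(f"{arg.name}: {problem}")
    return errors
```

Arguments of other types pass through untouched; only the ones the interface mapping marks as datasources are checked, which is what makes validation possible before the cluster exists.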

> •       Spark plugins -- we have an independent Spark plugin, but we
> also have Spark supported by mapr, and in the future it will be
> supported by Ambari.  Should we continue to carry a simple Spark
> standalone plugin?  Or should we work toward shifting our Spark
> support to one or more vendor plugins? 
> Not sure what this will impact. 

Thought about this more. Overlap in the plugins is fine, as long as
there is someone in the community willing and able to support it. 

> -Ken 
