[openstack-dev] About Sahara EDP New Ideas for Liberty

Chen, Ken ken.chen at intel.com
Wed Apr 22 12:36:13 UTC 2015


Hi Trevor,
I saw the items below in the Proposed Sprint Topics for Sahara Liberty (https://etherpad.openstack.org/p/sahara-liberty-proposed-sessions). I guess these are the EDP ideas we want to discuss at the Vancouver design summit. Our comments are inline below, marked with ">>":

•	EDP Priorities in Liberty - At the last 2 summits, we've looked at possible work for EDP in the cycle and prioritized it. This is helpful, since there is probably more that could be done than can be done in a single cycle :)
o	job scheduler (proposed by weiting)
    >> We already have a spec on this: https://review.openstack.org/#/c/175719/ Please help review it and share your comments and ideas.

o	more complex workflows (job dependencies, DAGs, etc.). Do we rely on Oozie, or something else?
    >> Huichun is looking into this now. I am not sure whether you already have detailed ideas about it. If needed, we can contribute some effort. If no details are ready yet, we can help write a first draft.

o	job interface mapping https://blueprints.launchpad.net/sahara/+spec/unified-job-interface-map proposed in Kilo but moved to Liberty
	++ high priority in my opinion.  Should be done early, awesome feature
    >> This seems interesting. We agree the EDP UI should be improved. In fact, our team does not yet have a settled view on EDP. Some people do not like the current EDP design and think it is more like a re-design of the Oozie or Spark UI than a universal interface for users. However, we do not have a clear strategy on this part yet.

o	early error detection to help transient clusters -- how many things that can go wrong with an EDP job can we detect early, so that we return an error before spinning up the cluster (rather than finding out that the job fails once the cluster is launched)? For example, bad Swift paths.
    >> This seems easier, but may involve some trivial work.

•	Spark plugins -- we have an independent Spark plugin, but we also have Spark supported by MapR, and in the future it will be supported by Ambari.  Should we continue to carry a simple Spark standalone plugin?  Or should we work toward shifting our Spark support to one or more vendor plugins?
    >> We are not sure what this would impact.
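
For the early error detection item above, here is a minimal sketch of the kind of check that could run before any cluster is started. It only validates the shape of a swift:// URL (scheme, container, object path) and does not contact Swift. The function names are hypothetical and not taken from Sahara's code, and the exact URL form Sahara expects is an assumption here.

```python
from urllib.parse import urlparse


def find_swift_path_problems(url):
    """Return a list of problems found in a swift:// data source URL.

    Hypothetical helper: purely syntactic, no request is sent to Swift.
    """
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        problems.append("scheme is %r, expected 'swift'" % parsed.scheme)
        return problems
    if not parsed.netloc:
        problems.append("missing container name")
    if not parsed.path.strip("/"):
        problems.append("missing object path")
    return problems


def validate_job_inputs(urls):
    """Map each bad URL to its problems; an empty dict means all passed."""
    report = {}
    for url in urls:
        problems = find_swift_path_problems(url)
        if problems:
            report[url] = problems
    return report
```

A job submission could call validate_job_inputs() on its input and output URLs and reject the request when the returned dict is not empty, so a typo in a path is reported before a transient cluster is launched.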

-Ken


-----Original Message-----
From: Trevor McKay [mailto:tmckay at redhat.com] 
Sent: Tuesday, March 24, 2015 10:49 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] About Sahara EDP New Ideas for Liberty

Weiting, Andrew,

Agreed, great ideas!  As Andrew noted, we have discussed some of these things before and it would be great to discuss them in Vancouver.

I think that a Sahara-side workflow manager is the right approach. Oozie has a lot of capability for job coordination, but it won't work for all of our cluster and job types.
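
To make the Sahara-side idea concrete, here is a minimal sketch of how a workflow manager could order jobs by their dependencies without relying on Oozie. It uses Weiting's chain from the original mail (Job A on Cluster A, then Job B on Cluster B, then Job C on Cluster A). The workflow format and the plan_workflow name are hypothetical, not an existing Sahara API, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow description: job name -> (cluster, jobs it depends on).
WORKFLOW = {
    "job_a": ("cluster_a", []),
    "job_b": ("cluster_b", ["job_a"]),
    "job_c": ("cluster_a", ["job_b"]),
}


def plan_workflow(workflow):
    """Return (job, cluster) pairs in an order that respects dependencies.

    TopologicalSorter raises graphlib.CycleError if the jobs depend on
    each other in a cycle, so a bad workflow is rejected before it runs.
    """
    graph = {job: set(deps) for job, (_cluster, deps) in workflow.items()}
    order = TopologicalSorter(graph).static_order()
    return [(job, workflow[job][0]) for job in order]
```

Because the ordering is computed in Sahara rather than in the engine, the same plan could in principle be handed to an Oozie-based cluster or a Spark cluster; how each step is actually launched is left out of this sketch.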

Notes on Spark in particular -- when we implemented Spark EDP, we looked at various implementations for a Spark job server.  One was to extend Oozie, one was to use the Ooyala Spark job server, and one was to use ssh around spark-submit.  We chose the last; notes are here:

https://etherpad.openstack.org/p/sahara_spark_edp

We could potentially revisit the Ooyala job server.  My impression at the time was that for the functions we wanted, it was pretty heavy. But if we are going to add job coordination as a general feature, it may be appropriate. I believe it is the dominant solution for job management in the Spark community; the open source project is here:

https://github.com/spark-jobserver/spark-jobserver

As part of the Spark investigation, I posted on this JIRA, too. This is a JIRA for developing a REST API to the Spark job server, which may be enough for us to build our own coordination system:

https://issues.apache.org/jira/browse/SPARK-3644

Best,

Trevor

On Tue, 2015-03-24 at 01:55 +0000, Chen, Weiting wrote:
> Hi Andrew.
> 
>  
> 
> Thanks for the response. My replies are inline.
> 
>  
> 
> From: Andrew Lazarev [mailto:alazarev at mirantis.com]
> Sent: Saturday, March 21, 2015 12:10 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] About Sahara EDP New Ideas for Liberty
> 
>  
> 
> Hi Weiting,
> 
>  
> 
> 
> >1. Add a schedule feature to run the jobs on time:
> >This request comes from customers, who usually run a job at a
> >specific time every day. It would be great to have a
> >scheduler to help arrange these regular job runs.
> 
> Looks like a great feature, and it should be quite easy to implement.
> Feel free to create a spec for that.
> 
> 
> [Weiting] We are working on the spec and the bp has already been 
> registered in 
> https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs.
> 
>  
> 
> 
> >2. A more complex workflow design in Sahara EDP:
> >Current EDP only provides one job that runs on one cluster.
> 
> Yes. The ability to run several jobs in one Oozie workflow is
> discussed at every summit (e.g. 'coordinated jobs' at
> https://etherpad.openstack.org/p/kilo-summit-sahara-edp). But so far
> it has not been a priority.
> 
> 
>  
> 
> 
> >But in a real case it is more complex: customers usually use
> >multiple jobs to process the data and may use several different
> >types of clusters to do so.
> 
> It means that the workflow manager should be on the Sahara side. Looks
> like a complicated feature, but we would be happy to help with
> designing and implementing it. Please file a proposal for a design
> session at the upcoming summit. Are you going to Vancouver?
> 
> 
> [Weiting] I’m not sure I will be there because the plan is not
> ready yet. We are also looking for real customer use cases in the big
> data area to see how they use data processing in their current
> environments. In any case, we can follow up later with any ideas.
> 
>  
> 
> 
> >Another concern is about Spark: Spark cannot use Oozie to do this,
> >so we need to create an abstraction layer to help implement this
> >kind of scenario.
> 
> If the workflow is on the Sahara side, it should work automatically
> for all engines.
> 
> [Weiting] Yes, agree.
> 
> 
>  
> 
> 
> Thanks,
> 
> 
> Andrew.
> 
> On Sun, Mar 8, 2015 at 3:17 AM, Chen, Weiting <weiting.chen at intel.com>
> wrote:
> 
>         Hi all.
>         
>          
>         
>         We received feedback about the future of Sahara EDP from some
>         customers in China.
>         
>         Here are some ideas we would like to share; we need your input
>         on whether we can implement them in Sahara (Liberty).
>         
>          
>         
>         1. Add a schedule feature to run the jobs on time:
>         
>         This request comes from customers, who usually run a job at a
>         specific time every day. It would be great to have a
>         scheduler to help arrange these regular job runs.
>         
>          
>         
>         2. A more complex workflow design in Sahara EDP:
>         
>         Current EDP only provides one job that runs on one
>         cluster.
>         
>         But in a real case it is more complex: customers usually
>         use multiple jobs to process the data and may use several
>         different types of clusters to do so.
>         
>         For example: Raw Data -> Job A(Cluster A) -> Job B(Cluster B)
>         -> Job C(Cluster A) -> Result
>         
>         Actually, in my opinion, this kind of job could be easy to
>         implement by using Oozie as a workflow engine. But the current
>         EDP doesn’t implement this kind of complex case.
>         
>         Another concern is about Spark: Spark cannot use Oozie
>         to do this, so we need to create an abstraction layer to help
>         implement this kind of scenario.
>         
>          
>         
>         Any suggestions are welcome.
>         
>         Thanks.
>         
>          
>         
>         
>         
>         __________________________________________________________________________
>         OpenStack Development Mailing List (not for usage questions)
>         Unsubscribe:
>         OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>         
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>         
>  
> 
> 


