<div dir="ltr"><font face="arial, helvetica, sans-serif"><span id="docs-internal-guid-3c5018ef-a19d-f0eb-438d-8f386d9ac51a"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">We want to start a discussion about the Elastic Data Processing (EDP) component of Savanna. This functionality is planned for implementation in the next development phase, starting on July 15. The main questions to address are: </span></p>
<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">what kind of functionality should be implemented for EDP?</span></p></li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline">
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">what are the main components and their responsibilities?</span></p>
</li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">which existing tools like Hue or Oozie should be used?</span></p>
</li></ul><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">As a starting point, we have prepared an overview of our thoughts in the following document: <a href="https://wiki.openstack.org/wiki/Savanna/EDP">https://wiki.openstack.org/wiki/Savanna/EDP</a>. For your convenience, the text is included below. Your comments and suggestions are welcome.</span></span><br>
</font><div><span><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><font face="arial, helvetica, sans-serif"><br></font></span></span></div><div><span id="docs-internal-guid-3a543c7e-a19d-7c52-3022-0fe917131d43"><font face="arial, helvetica, sans-serif"><h1 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt">
<span style="color:rgb(0,0,0);background-color:transparent;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"><font size="4">Key Features</font></span></h1><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Starting the job:</span></p><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline">
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Simple REST API and UI</span></p></li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline">
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">TODO: mockups</span></p></li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline">
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Jobs can be entered through the UI/API or pulled from a VCS</span></p>
</li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Configurable data source</span></p>
</li></ul><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Job execution modes:</span></p><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline">
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Run the job on one of the existing clusters</span></p></li><ul style="margin-top:0pt;margin-bottom:0pt">
<li dir="ltr" style="list-style-type:circle;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Expose information on cluster load</span></p>
</li><li dir="ltr" style="list-style-type:circle;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Provide hints for optimizing data locality (TODO: more details)</span></p>
</li></ul><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Create a new transient cluster for the job</span></p>
</li></ul><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Job structure:</span></p>
<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Individual job via jar file, Pig or Hive script</span></p></li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline">
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Oozie workflow</span></p></li><ul style="margin-top:0pt;margin-bottom:0pt">
<li dir="ltr" style="list-style-type:circle;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">In the future, support import of EMR job flows</span></p>
</li></ul></ul><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Job execution tracking and monitoring</span></p>
<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Are there any existing components that can help with visualization? (</span><a href="https://github.com/twitter/ambrose" style="text-decoration:none"><span style="background-color:transparent;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap">Twitter Ambrose</span></a><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">)</span></p>
</li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Terminate job</span></p>
</li><li dir="ltr" style="list-style-type:disc;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Auto-scaling functionality</span></p>
</li></ul><br><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><h1 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"><font size="4">Main EDP Components </font></span></h1>
<h2 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><font>Data discovery component</font></span></h2>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">EDP can have several sources of data for processing. Data can be pulled from Swift, GlusterFS, or a NoSQL database such as Cassandra or HBase. To provide unified access to this data, we’ll introduce a component responsible for discovering the data location and providing the right configuration for the Hadoop cluster. It should be pluggable, so that new kinds of data sources can be added.</span></p>
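<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">To make the pluggable idea more concrete, here is a minimal Python sketch of how such a component might look. It is only an illustration for discussion: the names DataSourcePlugin, SwiftPlugin and DataDiscovery, the example URL, and the configuration keys are all hypothetical placeholders, not existing Savanna or Hadoop identifiers.</span></p>
<pre>
```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class DataSourcePlugin(ABC):
    """Hypothetical plugin interface: maps a data URL to Hadoop settings."""

    scheme = ""  # URL scheme this plugin handles, e.g. "swift"

    @abstractmethod
    def hadoop_config(self, url):
        """Return a dict of configuration properties for this URL."""


class SwiftPlugin(DataSourcePlugin):
    scheme = "swift"

    def hadoop_config(self, url):
        parsed = urlparse(url)
        # The property names below are placeholders, not real Hadoop keys.
        return {"example.input.container": parsed.netloc,
                "example.input.path": parsed.path}


class DataDiscovery:
    """Registry that selects a plugin by the URL scheme of the data."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.scheme] = plugin

    def configure(self, url):
        scheme = urlparse(url).scheme
        if scheme not in self._plugins:
            raise ValueError("no data source plugin for scheme: " + scheme)
        return self._plugins[scheme].hadoop_config(url)


discovery = DataDiscovery()
discovery.register(SwiftPlugin())
config = discovery.configure("swift://logs/2013/07")
```
</pre>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">In this sketch, supporting GlusterFS, Cassandra or HBase would mean registering one more plugin class, without changing the registry itself.</span></p>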
<h2 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><font>Job Source</font></span></h2><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Users would like to execute different types of jobs: jar files, Pig and Hive scripts, Oozie job flows, etc. The job description and source code can be supplied in different ways. Some users just want to enter a Hive script and run it. Other users want to save this script in the Savanna internal database for later use. We also need to provide the ability to run a job from source code stored in a VCS.</span></p>
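<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">One possible way to model these combinations is a small job description record that keeps the job type separate from where the job comes from. The Python sketch below is an assumption for discussion only; JobType, Origin and JobSource are hypothetical names, and the Hive query is a made-up example.</span></p>
<pre>
```python
from dataclasses import dataclass
from enum import Enum


class JobType(Enum):
    JAR = "jar"
    PIG = "pig"
    HIVE = "hive"
    OOZIE = "oozie"


class Origin(Enum):
    INLINE = "inline"  # script text submitted together with the request
    STORED = "stored"  # script saved earlier in the Savanna database
    VCS = "vcs"        # source pulled from a version control system


@dataclass(frozen=True)
class JobSource:
    job_type: JobType
    origin: Origin
    location: str  # script body, stored-job id, or repository URL

    def describe(self):
        return "{} job from {} source".format(
            self.job_type.value, self.origin.value)


adhoc = JobSource(JobType.HIVE, Origin.INLINE, "SELECT COUNT(*) FROM logs;")
print(adhoc.describe())  # prints: hive job from inline source
```
</pre>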
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><h2 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><font>Savanna Dispatcher Component</font></span></h2>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">This component is responsible for provisioning new clusters, scheduling jobs on new or existing clusters, resizing clusters, and gathering information from clusters about current jobs and utilization. It should also provide information that helps decide where to schedule a job: on a new cluster or on an existing one. Examples include the current load on each cluster and its proximity to the data location. </span></p>
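<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">As a rough illustration of the kind of hint the dispatcher could give, the Python sketch below ranks clusters by data proximity and load. Everything in it is an assumption: ClusterInfo, placement_hint, the 0.8 load threshold and the cluster names are hypothetical, and a real heuristic would likely use richer metrics.</span></p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    name: str
    load: float      # fraction of capacity in use, 0.0 to 1.0
    near_data: bool  # True if the cluster is close to the job's data


def placement_hint(clusters, max_load=0.8):
    """Suggest an existing cluster, or None to mean 'create a new one'."""
    # Ignore clusters that are already too busy.
    candidates = [c for c in clusters if max_load >= c.load]
    if not candidates:
        return None
    # Prefer clusters near the data, then the least loaded among them.
    best = min(candidates, key=lambda c: (not c.near_data, c.load))
    return best.name


clusters = [
    ClusterInfo("analytics", load=0.9, near_data=True),
    ClusterInfo("batch", load=0.4, near_data=False),
    ClusterInfo("etl", load=0.6, near_data=True),
]
hint = placement_hint(clusters)
```
</pre>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Here the hint is the "etl" cluster: "analytics" is over the load threshold, and "etl" is closer to the data than "batch". The function only suggests; the final decision stays with the user.</span></p>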
<h2 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><font>UI Component</font></span></h2><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt">
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Integration into the OpenStack Dashboard (Horizon). It should provide tools for job creation, monitoring, etc. </span></p>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Cloudera Hue already provides part of this functionality: submitting jobs (jar file, Hive, Pig, Impala) and viewing job status and output.</span></p>
<h2 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><font>Cluster Level Coordination Component</font> </span></h2>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Exposes information about jobs on a specific cluster. Possibly this component could be represented by the existing Hadoop projects Hue and Oozie.</span></p>
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"></span><h1 dir="ltr" style="line-height:1.15;margin-top:10pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"><font size="4">User Workflow</font></span></h1>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- User selects or creates a job to run</span></p>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- User chooses a data source of the appropriate type for this job</span></p>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Dispatcher provides hints to the user about the best way to schedule this job (on an existing cluster or on a new one)</span></p>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- User makes a decision based on the hint from the dispatcher</span></p>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Dispatcher (if needed) creates a new cluster or resizes an existing one, and schedules the job on it</span></p>
<span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Dispatcher periodically pulls the job status and shows it in the UI</span></font></span><span><span style="font-size:15px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><br>
</span></span></div><div><span><font face="arial, helvetica, sans-serif"><span style="color:rgb(0,0,0);background-color:transparent;vertical-align:baseline;white-space:pre-wrap"><br></span></font></span></div><div style><p class="" style="font-family:arial,sans-serif;font-size:13px">
Thanks,</p><p class="" style="font-family:arial,sans-serif;font-size:13px">Alexander Kuznetsov<br></p></div></div>