[openstack-dev] [savanna] Specific job type for streaming mapreduce? (and someday pipes)

Andrew Lazarev alazarev at mirantis.com
Mon Feb 3 22:57:59 UTC 2014


I see two points:
* having Savanna job types mapped to Oozie action types is intuitive for
Hadoop users, and this is something we would like to keep
* it is hard to distinguish different variants of a single job type

Adding a 'subtype' field will solve both problems. Making it optional will
not break backward compatibility. Adding a database migration
script is also pretty straightforward.
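
As a rough sketch of the idea: the snippet below adds a nullable
'subtype' column to a toy jobs table and validates jobs against it. The
table layout, the subtype names ("Streaming", "Pipes") and the validation
rules are assumptions made for illustration, not Savanna's actual schema
or code, and sqlite3 stands in for whatever migration tooling is used.

```python
import sqlite3

# Hypothetical mapping: which subtypes each job type accepts.
ALLOWED_SUBTYPES = {"MapReduce": {"Streaming", "Pipes"}}


def add_subtype_column(conn):
    """Migration step: add a nullable column so existing rows stay valid."""
    conn.execute("ALTER TABLE jobs ADD COLUMN subtype TEXT")


def validate_job(job_type, subtype=None, libs=()):
    """Validate a job using the optional subtype instead of extra flags."""
    allowed = ALLOWED_SUBTYPES.get(job_type, set())
    if subtype is not None and subtype not in allowed:
        raise ValueError("unknown subtype %r for %s" % (subtype, job_type))
    # Assumed rule: a plain MapReduce job needs libraries, while a
    # streaming job does not.
    if job_type == "MapReduce" and subtype is None and not libs:
        raise ValueError("MapReduce job requires at least one library")


def demo():
    conn = sqlite3.connect(":memory:")
    # Pre-migration schema with one existing job record.
    conn.execute("CREATE TABLE jobs (name TEXT NOT NULL, type TEXT NOT NULL)")
    conn.execute(
        "INSERT INTO jobs (name, type) VALUES ('legacy-wc', 'MapReduce')")
    add_subtype_column(conn)
    # Post-migration: new records may set the subtype.
    conn.execute(
        "INSERT INTO jobs (name, type, subtype) "
        "VALUES ('stream-wc', 'MapReduce', 'Streaming')")
    return conn.execute(
        "SELECT name, type, subtype FROM jobs ORDER BY name").fetchall()
```

In this sketch the record created before the migration simply reads back
with a NULL subtype, which is why an optional field keeps old clients and
old data working while the job type itself stays aligned with the Oozie
action type.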

To summarize, my vote is for the "subtype" field.

Thanks,
Andrew.


On Mon, Feb 3, 2014 at 2:10 PM, Trevor McKay <tmckay at redhat.com> wrote:

>
> I was trying my best to avoid adding extra job types to support
> mapreduce variants like streaming or mapreduce with pipes, but it seems
> that adding the types is the simplest solution.
>
> On the API side, Savanna can live without a specific job type by
> examining the data in the job record.  The presence or absence of
> certain fields, null values, etc., can adequately indicate what kind of
> mapreduce job it is, though that is a little subtle.
>
> But for the UI, it seems that explicit knowledge of what the job is
> makes things easier and better for the user.  When a user creates a
> streaming mapreduce job and the UI is aware of the type later on at job
> launch, the user can be prompted to provide the right configs (i.e., the
> streaming mapper and reducer values).
>
> The explicit job type also supports validation without having to add
> extra flags (which would impact the savanna client, the JSON, etc.). For
> example, a streaming mapreduce job does not require any specified
> libraries, so the fact that it is meant to be a streaming job needs to be
> known at job creation time.
>
> So, to that end, I propose that we add a MapReduceStreaming job type,
> and probably at some point we will have MapReducePiped too. It's
> possible that we might have other job types in the future too as the
> feature set grows.
>
> There was an effort to make Savanna job types parallel Oozie action
> types, but in this case that's just not possible without introducing a
> "subtype" field in the job record, which leads to a database migration
> script and savanna client changes.
>
> What do you think?
>
> Best,
>
> Trevor