[openstack-dev] [savanna] Specific job type for streaming mapreduce? (and someday pipes)

Trevor McKay tmckay at redhat.com
Mon Feb 3 22:10:34 UTC 2014


I was trying my best to avoid adding extra job types to support
mapreduce variants like streaming or mapreduce with pipes, but it seems
that adding the types is the simplest solution.

On the API side, Savanna can live without a specific job type by
examining the data in the job record.  The presence or absence of
certain fields, null values, etc, can provide adequate indicators of
what kind of mapreduce it is.  Maybe a little bit subtle, though.
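To make the subtlety concrete, here is a rough sketch of that kind of inference (the field and config names are illustrative, not the actual savanna schema):

```python
# Illustrative sketch: inferring the mapreduce flavor from the job
# record instead of an explicit type.  Field names ("mains", "configs",
# "edp.streaming.mapper") are assumptions for the example.
def infer_mapreduce_kind(job_record):
    configs = job_record.get("configs", {})
    # Heuristic: a streaming job carries a mapper config but no main
    # program; a plain mapreduce job supplies a jar instead.
    if not job_record.get("mains") and "edp.streaming.mapper" in configs:
        return "streaming"
    return "plain"
```

It works, but the signal is indirect; nothing in the record says "streaming" outright.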

But for the UI, explicit knowledge of the job type makes things easier
and better for the user.  When a user creates a streaming mapreduce job,
and the UI is aware of the type later on at job launch, the user can be
prompted for the right configs (i.e., the streaming mapper and reducer
values).
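For example, a type-aware launch form could prompt for something like the following (the config key names and commands here are hypothetical, just to show the shape of what the UI would collect):

```python
# Hypothetical launch-time configs a type-aware UI could prompt for on
# a streaming job.  Key names and commands are illustrative only.
streaming_job_configs = {
    "configs": {
        "edp.streaming.mapper": "/bin/cat",
        "edp.streaming.reducer": "/usr/bin/wc",
    }
}
```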

The explicit job type also supports validation without having to add
extra flags (which would impact the savanna client, the JSON, etc). For
example, a streaming mapreduce job does not require any specified
libraries, so the fact that it is meant to be a streaming job needs to
be known at job creation time.
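With an explicit type, the validation rule falls out directly.  A minimal sketch, assuming hypothetical names rather than the actual savanna code:

```python
# Sketch of type-driven validation (function and type names are
# assumptions for illustration).  A streaming job needs no libraries
# at creation time; a plain mapreduce job needs at least one.
def validate_job(job_type, mains, libs):
    if job_type == "MapReduceStreaming":
        return True  # mapper/reducer configs are supplied at launch
    if job_type == "MapReduce":
        return bool(libs)  # plain mapreduce needs a library/jar
    raise ValueError("unknown job type: %s" % job_type)
```

Without the type, the same check would need an extra flag threaded through the client and the JSON.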

So, to that end, I propose that we add a MapReduceStreaming job type,
and probably at some point we will have MapReducePiped too. We may well
add other job types in the future as the feature set grows.

There was an effort to make Savanna job types parallel Oozie action
types, but in this case that's just not possible without introducing a
"subtype" field in the job record, which leads to a database migration
script and savanna client changes.

What do you think?

Best,

Trevor




