[openstack-dev] [savanna] Specific job type for streaming mapreduce? (and someday pipes)

Trevor McKay tmckay at redhat.com
Tue Feb 4 18:47:13 UTC 2014


Thanks Andrew.

My other thought, which is in between, is to allow dotted types:
"MapReduce.streaming", for example.

This gives you the subtype flavor but keeps all the APIs the same.
We just need a wrapper function to separate them when we compare types.
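
A minimal sketch of what such a wrapper might look like, assuming the
type is stored as a plain string and that a single dot separates the base
type from the subtype. The names split_job_type and same_base_type are
hypothetical, not existing Savanna functions:

```python
def split_job_type(job_type):
    """Split a dotted job type into (base, subtype).

    Hypothetical helper: "MapReduce.streaming" -> ("MapReduce", "streaming").
    A type without a dot yields an empty subtype.
    """
    base, _sep, subtype = job_type.partition(".")
    return base, subtype


def same_base_type(job_type, expected_base):
    """Compare only the base part, so existing type checks keep working."""
    return split_job_type(job_type)[0] == expected_base


# Example: a streaming job still matches plain "MapReduce" comparisons.
print(split_job_type("MapReduce.streaming"))
print(same_base_type("MapReduce.streaming", "MapReduce"))
```

With something like this, code that only cares whether a job is
"MapReduce" would call the comparison helper, while the UI and validation
could look at the subtype part.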

Best,

Trevor

On Mon, 2014-02-03 at 14:57 -0800, Andrew Lazarev wrote:
> I see two points:
> * having Savanna types mapped to Oozie action types is intuitive for
> Hadoop users, and this is something we would like to keep
> * it is hard to distinguish different kinds of one job type
> 
> 
> Adding a 'subtype' field will solve both problems. Making it optional
> will not break backward compatibility. Adding a database migration
> script is also pretty straightforward.
> 
> 
> Summarizing, my vote is for the "subtype" field.
> 
> 
> Thanks,
> Andrew.
> 
> 
> On Mon, Feb 3, 2014 at 2:10 PM, Trevor McKay <tmckay at redhat.com>
> wrote:
>         
>         I was trying my best to avoid adding extra job types to support
>         mapreduce variants like streaming or mapreduce with pipes, but it
>         seems that adding the types is the simplest solution.
>         
>         On the API side, Savanna can live without a specific job type by
>         examining the data in the job record.  Presence/absence of certain
>         things, or null values, etc., can provide adequate indicators of
>         what kind of mapreduce it is.  Maybe a little bit subtle.
>         
>         But for the UI, it seems that explicit knowledge of what the job is
>         makes things easier and better for the user.  When a user creates a
>         streaming mapreduce job and the UI is aware of the type later on at
>         job launch, the user can be prompted to provide the right configs
>         (i.e., the streaming mapper and reducer values).
>         
>         The explicit job type also supports validation without having to
>         add extra flags (which impact the savanna client, the JSON, etc.).
>         For example, a streaming mapreduce job does not require any
>         specified libraries, so the fact that it is meant to be a streaming
>         job needs to be known at job creation time.
>         
>         So, to that end, I propose that we add a MapReduceStreaming job
>         type, and probably at some point we will have MapReducePiped too.
>         It's possible that we might have other job types in the future too
>         as the feature set grows.
>         
>         There was an effort to make Savanna job types parallel Oozie action
>         types, but in this case that's just not possible without
>         introducing a "subtype" field in the job record, which leads to a
>         database migration script and savanna client changes.
>         
>         What do you think?
>         
>         Best,
>         
>         Trevor
>         
>         
>         
>         _______________________________________________
>         OpenStack-dev mailing list
>         OpenStack-dev at lists.openstack.org
>         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 




