[openstack-dev] [savanna] Specific job type for streaming mapreduce? (and someday pipes)

Trevor McKay tmckay at redhat.com
Wed Feb 5 14:11:14 UTC 2014


Okay,

  Thanks. I'll make a draft CR that sets up Savanna for dotted names,
and one that uses dotted names with streaming.
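
For reference, here is a rough sketch of the kind of wrapper discussed
below for comparing dotted types. It only assumes that a type string
looks like "MapReduce.streaming"; the function names (split_job_type,
same_base_type) are hypothetical and not existing Savanna code.

```python
def split_job_type(job_type):
    """Split a dotted job type into (base_type, subtype).

    Hypothetical helper: "MapReduce.streaming" -> ("MapReduce", "streaming").
    A plain type such as "Pig" has no subtype, so subtype is None.
    """
    base, _, subtype = job_type.partition(".")
    return base, (subtype or None)


def same_base_type(job_type, expected):
    """Compare two job types while ignoring any dotted subtype."""
    return split_job_type(job_type)[0] == split_job_type(expected)[0]


if __name__ == "__main__":
    print(split_job_type("MapReduce.streaming"))
    print(same_base_type("MapReduce.streaming", "MapReduce"))
```

Code that only cares about the Oozie action type would call
same_base_type(), and code that needs the streaming flavor would look
at the subtype returned by split_job_type().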

Best,

Trevor

On Wed, 2014-02-05 at 15:58 +0400, Sergey Lukjanov wrote:
> I like the dot-separated name. There are several reasons for it:
> 
> 
> * it won't require changes in all Savanna subprojects;
> * eventually we'd like to use more than just Oozie for EDP (for example,
> if we support Twitter Storm), and these new tools could require
> additional 'subtypes'.
> 
> 
> Thanks for catching this.
> 
> 
> On Tue, Feb 4, 2014 at 10:47 PM, Trevor McKay <tmckay at redhat.com>
> wrote:
>         Thanks Andrew.
>         
>         My other thought, which is in between, is to allow dotted
>         types, "MapReduce.streaming" for example.
>         
>         This gives you the subtype flavor but keeps all the APIs the
>         same.
>         We just need a wrapper function to separate them when we
>         compare types.
>         
>         Best,
>         
>         Trevor
>         
>         On Mon, 2014-02-03 at 14:57 -0800, Andrew Lazarev wrote:
>         > I see two points:
>         > * having Savanna types mapped to Oozie action types is
>         > intuitive for Hadoop users and this is something we would
>         > like to keep
>         > * it is hard to distinguish different kinds of one job type
>         >
>         >
>         > Adding a 'subtype' field will solve both problems. Making it
>         > optional will not break backward compatibility. Adding a
>         > database migration script is also pretty straightforward.
>         >
>         >
>         > Summarizing, my vote is on "subtype" field.
>         >
>         >
>         > Thanks,
>         > Andrew.
>         >
>         >
>         > On Mon, Feb 3, 2014 at 2:10 PM, Trevor McKay <tmckay at redhat.com>
>         > wrote:
>         >
>         >         I was trying my best to avoid adding extra job types
>         >         to support mapreduce variants like streaming or
>         >         mapreduce with pipes, but it seems that adding the
>         >         types is the simplest solution.
>         >
>         >         On the API side, Savanna can live without a specific
>         >         job type by examining the data in the job record.
>         >         Presence/absence of certain things, or null values,
>         >         etc, can provide adequate indicators to what kind of
>         >         mapreduce it is.  Maybe a little bit subtle.
>         >
>         >         But for the UI, it seems that explicit knowledge of
>         >         what the job is makes things easier and better for
>         >         the user.  When a user creates a streaming mapreduce
>         >         job and the UI is aware of the type later on at job
>         >         launch, the user can be prompted to provide the right
>         >         configs (i.e., the streaming mapper and reducer
>         >         values).
>         >
>         >         The explicit job type also supports validation
>         >         without having to add extra flags (which impacts the
>         >         savanna client, and the JSON, etc). For example, a
>         >         streaming mapreduce job does not require any
>         >         specified libraries so the fact that it is meant to
>         >         be a streaming job needs to be known at job creation
>         >         time.
>         >
>         >         So, to that end, I propose that we add a
>         >         MapReduceStreaming job type, and probably at some
>         >         point we will have MapReducePiped too.  It's possible
>         >         that we might have other job types in the future too
>         >         as the feature set grows.
>         >
>         >         There was an effort to make Savanna job types
>         >         parallel Oozie action types, but in this case that's
>         >         just not possible without introducing a "subtype"
>         >         field in the job record, which leads to a database
>         >         migration script and savanna client changes.
>         >
>         >         What do you think?
>         >
>         >         Best,
>         >
>         >         Trevor
>         >
>         >
>         >
>         >         _______________________________________________
>         >         OpenStack-dev mailing list
>         >         OpenStack-dev at lists.openstack.org
>         >         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>         >
>         >
>         
>         
>         
>         
> 
> 
> 
> 
> -- 
> Sincerely yours,
> Sergey Lukjanov
> Savanna Technical Lead
> Mirantis Inc.




