[openstack-dev] [savanna] Specific job type for streaming mapreduce? (and someday pipes)
Trevor McKay
tmckay at redhat.com
Wed Feb 5 14:11:14 UTC 2014
Okay,
Thanks. I'll make a draft CR that sets up Savanna for dotted names,
and one that uses dotted names with streaming.
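
To illustrate the dotted-name idea from this thread, here is a minimal sketch of the kind of wrapper that could separate the base type from the subtype before comparing. The names `split_job_type` and `compare_job_type` are hypothetical, chosen for illustration only; they are not taken from the Savanna code base.

```python
# Hypothetical helpers for dotted job types; not Savanna's actual API.
JOB_TYPE_SEP = "."


def split_job_type(job_type):
    """Split 'MapReduce.streaming' into ('MapReduce', 'streaming').

    A plain type such as 'Pig' yields ('Pig', '').
    """
    base, _, subtype = job_type.partition(JOB_TYPE_SEP)
    return base, subtype


def compare_job_type(job_type, *base_types):
    """Return True if the base part of job_type is one of base_types."""
    base, _ = split_job_type(job_type)
    return base in base_types
```

With something like this, code that only cares about the Oozie action type could keep comparing against "MapReduce", while streaming-specific code looks at the subtype.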
Best,
Trevor
On Wed, 2014-02-05 at 15:58 +0400, Sergey Lukjanov wrote:
> I like the dot-separated name. There are several reasons for it:
>
>
> * it won't require changes in all Savanna subprojects;
> * eventually we'd like to use not only Oozie for EDP (for example, if
> we support Twitter Storm), and these new tools could require
> additional 'subtypes'.
>
>
> Thanks for catching this.
>
>
> On Tue, Feb 4, 2014 at 10:47 PM, Trevor McKay <tmckay at redhat.com>
> wrote:
> Thanks Andrew.
>
> My other thought, which is in between, is to allow dotted types,
> "MapReduce.streaming" for example.
>
> This gives you the subtype flavor but keeps all the APIs the same.
> We just need a wrapper function to separate them when we compare types.
>
> Best,
>
> Trevor
>
> On Mon, 2014-02-03 at 14:57 -0800, Andrew Lazarev wrote:
> > I see two points:
> > * having Savanna types mapped to Oozie action types is intuitive for
> >   hadoop users and this is something we would like to keep
> > * it is hard to distinguish different kinds of one job type
> >
> >
> > Adding 'subtype' field will solve both problems. Having it optional
> > will not break backward compatibility. Adding database migration
> > script is also pretty straightforward.
> >
> >
> > Summarizing, my vote is on "subtype" field.
> >
> >
> > Thanks,
> > Andrew.
> >
> >
> > On Mon, Feb 3, 2014 at 2:10 PM, Trevor McKay <tmckay at redhat.com>
> > wrote:
> >
> > I was trying my best to avoid adding extra job types to support
> > mapreduce variants like streaming or mapreduce with pipes, but it
> > seems that adding the types is the simplest solution.
> >
> > On the API side, Savanna can live without a specific job type by
> > examining the data in the job record. Presence/absence of certain
> > things, or null values, etc, can provide adequate indicators to what
> > kind of mapreduce it is. Maybe a little bit subtle.
> >
> > But for the UI, it seems that explicit knowledge of what the job is
> > makes things easier and better for the user. When a user creates a
> > streaming mapreduce job and the UI is aware of the type later on at
> > job launch, the user can be prompted to provide the right configs
> > (i.e., the streaming mapper and reducer values).
> >
> > The explicit job type also supports validation without having to add
> > extra flags (which impacts the savanna client, and the JSON, etc). For
> > example, a streaming mapreduce job does not require any specified
> > libraries so the fact that it is meant to be a streaming job needs to
> > be known at job creation time.
> >
> > So, to that end, I propose that we add a MapReduceStreaming job type,
> > and probably at some point we will have MapReducePiped too. It's
> > possible that we might have other job types in the future too as the
> > feature set grows.
> >
> > There was an effort to make Savanna job types parallel Oozie action
> > types, but in this case that's just not possible without introducing
> > a "subtype" field in the job record, which leads to a database
> > migration script and savanna client changes.
> >
> > What do you think?
> >
> > Best,
> >
> > Trevor
> >
> >
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> >
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
>
>
>
>
> --
> Sincerely yours,
> Sergey Lukjanov
> Savanna Technical Lead
> Mirantis Inc.