[openstack-dev] [savanna] How to handle diverging EDP job configuration settings
Andrew Lazarev
alazarev at mirantis.com
Wed Jan 29 17:08:49 UTC 2014
I like the idea of the "edp." prefix.
Andrew.
On Wed, Jan 29, 2014 at 6:23 AM, Trevor McKay <tmckay at redhat.com> wrote:
> So, assuming we go forward with this, the followup question is whether
> or not to move "main_class" and "java_opts" for Java actions into
> "edp.java.main_class" and "edp.java.java_opts" configs.
>
> I think yes.
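
To make the move concrete, here is a minimal sketch of folding the two Java
fields into 'configs'. The function name move_java_fields and the sample
values are hypothetical, not existing Savanna code; only the
"edp.java.main_class" and "edp.java.java_opts" keys come from the proposal
above.

```python
def move_java_fields(job_configs):
    """Return a copy with top-level Java fields folded into 'configs'.

    Hypothetical helper for illustration only; not part of Savanna.
    """
    result = dict(job_configs)
    # Copy the nested dict so the caller's structure is left untouched.
    configs = dict(result.get("configs", {}))
    for field in ("main_class", "java_opts"):
        if field in result:
            configs["edp.java." + field] = result.pop(field)
    result["configs"] = configs
    return result


# Sample values are made up for the example.
old_style = {
    "configs": {},
    "args": ["input", "output"],
    "main_class": "org.example.Main",
    "java_opts": "-Xmx512m",
}
new_style = move_java_fields(old_style)
```

With this shape, a Java job needs only 'configs' and 'args', the same
top-level keys as every other job type.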
>
> Best,
>
> Trevor
>
> On Wed, 2014-01-29 at 09:15 -0500, Trevor McKay wrote:
> > On Wed, 2014-01-29 at 14:35 +0400, Alexander Ignatov wrote:
> > > Thank you for bringing this up, Trevor.
> > >
> > > > EDP is getting more diverse, and it's time to change its model.
> > > > I totally agree with your proposal, with one minor comment.
> > > > Instead of the "savanna." prefix in job_configs, wouldn't it be better
> > > > to make it "edp."? I think "savanna." is too broad a term for this.
> >
> > +1, brilliant. EDP is perfect. I was worried about the scope of
> > "savanna." too.
> >
> > > > And one more bureaucratic thing... I see you have already started
> > > > implementing it [1], and it is named and tracked as the new EDP
> > > > workflow [2]. I think a new blueprint should be created for this
> > > > feature to track all code changes as well as docs updates. By docs I
> > > > mean the public Savanna docs about EDP, the REST API docs, and samples.
> >
> > > Absolutely, I can make it a new blueprint. Thanks.
> >
> > > [1] https://review.openstack.org/#/c/69712
> > > [2]
> https://blueprints.launchpad.net/openstack/?searchtext=edp-oozie-streaming-mapreduce
> > >
> > > Regards,
> > > Alexander Ignatov
> > >
> > >
> > >
> > > On 28 Jan 2014, at 20:47, Trevor McKay <tmckay at redhat.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > > In our first pass at EDP, the model for job settings was very
> > > > > consistent across all of our job types. The execution-time settings
> > > > > fit into this (superset) structure:
> > > >
> > > > > job_configs = {'configs': {},  # config settings for oozie and hadoop
> > > > >                'params': {},   # substitution values for Pig/Hive
> > > > >                'args': []}     # script args (Pig and Java actions)
> > > >
> > > > But we have some things that don't fit (and probably more in the
> > > > future):
> > > >
> > > > 1) Java jobs have 'main_class' and 'java_opts' settings
> > > > Currently these are handled as additional fields added to the
> > > > structure above. These were the first to diverge.
> > > >
> > > > 2) Streaming MapReduce (anticipated) requires mapper and reducer
> > > > settings (different than the mapred.xxxx.class settings for
> > > > non-streaming MapReduce)
> > > >
> > > > Problems caused by adding fields
> > > > --------------------------------
> > > > > The job_configs structure above is stored in the database. Each time
> > > > > we add a field to the structure above at the level of configs, params,
> > > > > and args, we force a change to the database tables, a migration script,
> > > > > and a change to the JSON validation for the REST API.
> > > >
> > > > We also cause a change for python-savannaclient and potentially other
> > > > clients.
> > > >
> > > > This kind of change seems bad.
> > > >
> > > > Proposal: Borrow a page from Oozie and add "savanna." configs
> > > > -------------------------------------------------------------
> > > > > I would like to fit divergent job settings into the structure we
> > > > > already have. One way to do this is to leverage the 'configs'
> > > > > dictionary. This dictionary primarily contains settings for hadoop,
> > > > > but there are a number of "oozie.xxx" settings that are passed to
> > > > > oozie as configs or set by oozie for the benefit of running apps.
> > > >
> > > > > What if we allow "savanna." settings to be added to configs? If we
> > > > > do that, any and all special configuration settings for specific job
> > > > > types or subtypes can be handled with no database changes and no API
> > > > > changes.
> > > >
> > > > Downside
> > > > --------
> > > > > Currently, all 'configs' are rendered in the generated oozie
> > > > > workflow. The "savanna." settings would be stripped out and processed
> > > > > by Savanna, thereby changing that behavior a bit (maybe not a big
> > > > > deal).
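
A rough sketch of the stripping step described above, assuming a plain
prefix match is enough. split_configs is a hypothetical helper, not Savanna
code, and "edp.streaming.mapper" is a made-up key used only to show the
split. The prefix is a parameter, so the same code works for "savanna." or
"edp.".

```python
def split_configs(configs, prefix):
    """Separate prefixed (internal) settings from pass-through settings.

    Hypothetical helper: 'internal' would be consumed by Savanna itself,
    'passthrough' would still be rendered into the generated workflow.
    """
    internal = {}
    passthrough = {}
    for key, value in configs.items():
        if key.startswith(prefix):
            internal[key] = value
        else:
            passthrough[key] = value
    return internal, passthrough


job_configs = {
    "configs": {
        "mapred.reduce.tasks": "2",
        "oozie.libpath": "/user/hadoop/lib",  # example value
        "edp.streaming.mapper": "/bin/cat",   # made-up key for illustration
    },
    "params": {},
    "args": [],
}
internal, passthrough = split_configs(job_configs["configs"], "edp.")
```

Only the keys in passthrough would reach the workflow; the stored
job_configs structure itself keeps its three top-level keys.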
> > > >
> > > > > We would also be mixing "savanna." configs with config_hints for
> > > > > jobs, so users would potentially see "savanna.xxxx" settings mixed
> > > > > with oozie and hadoop settings. Again, maybe not a big deal, but it
> > > > > might blur the lines a little bit. Personally, I'm okay with this.
> > > >
> > > > Slightly different
> > > > ------------------
> > > > We could also add a "'savanna-configs': {}" element to job_configs to
> > > > keep the configuration spaces separate.
> > > >
> > > > > But, now we would have 'savanna-configs' (or another name),
> > > > > 'configs', 'params', and 'args'. Really? Just how many different
> > > > > types of values can we come up with? :)
> > > >
> > > > I lean away from this approach.
> > > >
> > > > Related: breaking up the superset
> > > > ---------------------------------
> > > >
> > > > It is also the case that not every job type has every value type.
> > > >
> > > > >             Configs   Params   Args
> > > > > Hive           Y        Y       N
> > > > > Pig            Y        Y       Y
> > > > > MapReduce      Y        N       N
> > > > > Java           Y        N       Y
> > > >
> > > > > So do we make that explicit in the docs and enforce it in the API
> > > > > with errors?
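
If the answer is yes, enforcement could look roughly like the sketch below.
The mapping is copied from the table above; validate_job_configs and its
error messages are hypothetical, and treating an empty value as acceptable
is an assumption made here, not something decided in this thread.

```python
# Value types each job type accepts, taken from the table above.
ALLOWED_VALUE_TYPES = {
    "Hive": {"configs", "params"},
    "Pig": {"configs", "params", "args"},
    "MapReduce": {"configs"},
    "Java": {"configs", "args"},
}


def validate_job_configs(job_type, job_configs):
    """Raise ValueError if job_configs carries a value type the job lacks.

    Hypothetical check: empty values ({} or []) are tolerated so that
    clients sending the full superset structure keep working.
    """
    allowed = ALLOWED_VALUE_TYPES.get(job_type)
    if allowed is None:
        raise ValueError("unknown job type: %s" % job_type)
    unexpected = sorted(
        key for key, value in job_configs.items()
        if value and key not in allowed
    )
    if unexpected:
        raise ValueError(
            "%s jobs do not accept: %s" % (job_type, ", ".join(unexpected))
        )
```

For example, a Hive job submitted with a non-empty 'args' list would be
rejected, while a MapReduce job with empty 'params' and 'args' would pass.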
> > > >
> > > > Thoughts? I'm sure there are some :)
> > > >
> > > > Best,
> > > >
> > > > Trevor
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > OpenStack-dev mailing list
> > > > OpenStack-dev at lists.openstack.org
> > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > >
> > >
> > > _______________________________________________
> > > OpenStack-dev mailing list
> > > OpenStack-dev at lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>