[openstack-dev] [savanna] How to handle diverging EDP job configuration settings

Sergey Lukjanov slukjanov at mirantis.com
Wed Jan 29 15:44:16 UTC 2014


Trevor,

it sounds reasonable to move main_class and java_opts to edp.java.
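
A rough sketch of how that could look, assuming the "edp." prefix discussed
below. The helper name split_edp_configs, the example class name and the
example values are hypothetical, not existing Savanna code. The Java settings
travel inside 'configs' and are separated out before the remaining configs
are rendered into the oozie workflow:

```python
EDP_PREFIX = "edp."  # prefix proposed in this thread


def split_edp_configs(job_configs):
    """Hypothetical helper: split 'configs' into (edp settings, everything else)."""
    configs = job_configs.get("configs", {})
    edp = {k: v for k, v in configs.items() if k.startswith(EDP_PREFIX)}
    other = {k: v for k, v in configs.items() if not k.startswith(EDP_PREFIX)}
    return edp, other


job_configs = {
    "configs": {
        "edp.java.main_class": "org.example.WordCount",  # made-up class name
        "edp.java.java_opts": "-Xmx512m",                # made-up value
        "mapred.reduce.tasks": "2",
    },
    "params": {},
    "args": ["input_path", "output_path"],
}

# Only workflow_configs would be rendered into the oozie workflow;
# edp_settings would be consumed by Savanna itself.
edp_settings, workflow_configs = split_edp_configs(job_configs)
```

With this shape no extra top-level fields are needed in job_configs, which is
the point of the proposal.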

Jon,

do you mean neutron-related info for namespace support? If yes, then
neutron isn't a user-side config.
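
Trevor, on the related question quoted below about enforcing the
Configs/Params/Args table in the api: a check along these lines might be
enough. This is only a sketch; ALLOWED and check_job_configs are hypothetical
names, and the mapping is copied from the table in the original message.

```python
# Value types each job type accepts, taken from the table quoted below.
ALLOWED = {
    "Hive": {"configs", "params"},
    "Pig": {"configs", "params", "args"},
    "MapReduce": {"configs"},
    "Java": {"configs", "args"},
}


def check_job_configs(job_type, job_configs):
    """Return the non-empty sections that this job type does not accept."""
    allowed = ALLOWED[job_type]
    return sorted(
        name for name, value in job_configs.items()
        if value and name not in allowed
    )


# 'args' is not accepted for MapReduce, so it is reported.
errors = check_job_configs(
    "MapReduce",
    {"configs": {"mapred.reduce.tasks": "2"}, "params": {}, "args": ["x"]},
)
```

Empty sections are ignored here so that clients sending the full superset
structure with empty values would keep working.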

Thanks.


On Wed, Jan 29, 2014 at 6:37 PM, Jon Maron <jmaron at hortonworks.com> wrote:

> I imagine 'neutron' would follow suit as well..
>
> On Jan 29, 2014, at 9:23 AM, Trevor McKay <tmckay at redhat.com> wrote:
>
> > So, assuming we go forward with this, the followup question is whether
> > or not to move "main_class" and "java_opts" for Java actions into
> > "edp.java.main_class" and "edp.java.java_opts" configs.
> >
> > I think yes.
> >
> > Best,
> >
> > Trevor
> >
> > On Wed, 2014-01-29 at 09:15 -0500, Trevor McKay wrote:
> >> On Wed, 2014-01-29 at 14:35 +0400, Alexander Ignatov wrote:
> >>> Thank you for bringing this up, Trevor.
> >>>
> >>> EDP is getting more diverse and it's time to change its model.
> >>> I totally agree with your proposal, with one minor comment.
> >>> Instead of the "savanna." prefix in job_configs, wouldn't it be better
> >>> to make it "edp."? I think "savanna." is too broad a word for this.
> >>
> >> +1, brilliant. EDP is perfect.  I was worried about the scope of
> >> "savanna." too.
> >>
> >>> And one more bureaucratic thing... I see you have already started
> >>> implementing it [1], and it is tracked under the new EDP workflow
> >>> blueprint [2]. I think a new blueprint should be created for this
> >>> feature to track all code changes as well as docs updates.
> >>> By docs I mean the public Savanna docs about EDP, the REST api docs
> >>> and samples.
> >>
> >> Absolutely, I can make it a new blueprint.  Thanks.
> >>
> >>> [1] https://review.openstack.org/#/c/69712
> >>> [2] https://blueprints.launchpad.net/openstack/?searchtext=edp-oozie-streaming-mapreduce
> >>>
> >>> Regards,
> >>> Alexander Ignatov
> >>>
> >>>
> >>>
> >>> On 28 Jan 2014, at 20:47, Trevor McKay <tmckay at redhat.com> wrote:
> >>>
> >>>> Hello all,
> >>>>
> >>>> In our first pass at EDP, the model for job settings was very consistent
> >>>> across all of our job types. The execution-time settings fit into this
> >>>> (superset) structure:
> >>>>
> >>>> job_configs = {'configs': {}, # config settings for oozie and hadoop
> >>>>           'params': {},  # substitution values for Pig/Hive
> >>>>           'args': []}    # script args (Pig and Java actions)
> >>>>
> >>>> But we have some things that don't fit (and probably more in the
> >>>> future):
> >>>>
> >>>> 1) Java jobs have 'main_class' and 'java_opts' settings
> >>>>  Currently these are handled as additional fields added to the
> >>>> structure above.  These were the first to diverge.
> >>>>
> >>>> 2) Streaming MapReduce (anticipated) requires mapper and reducer
> >>>> settings (different than the mapred.xxxx.class settings for
> >>>> non-streaming MapReduce)
> >>>>
> >>>> Problems caused by adding fields
> >>>> --------------------------------
> >>>> The job_configs structure above is stored in the database. Each time we
> >>>> add a field to the structure above at the level of configs, params, and
> >>>> args, we force a change to the database tables, a migration script and a
> >>>> change to the JSON validation for the REST api.
> >>>>
> >>>> We also cause a change for python-savannaclient and potentially other
> >>>> clients.
> >>>>
> >>>> This kind of change seems bad.
> >>>>
> >>>> Proposal: Borrow a page from Oozie and add "savanna." configs
> >>>> -------------------------------------------------------------
> >>>> I would like to fit divergent job settings into the structure we already
> >>>> have.  One way to do this is to leverage the 'configs' dictionary.  This
> >>>> dictionary primarily contains settings for hadoop, but there are a
> >>>> number of "oozie.xxx" settings that are passed to oozie as configs or
> >>>> set by oozie for the benefit of running apps.
> >>>>
> >>>> What if we allow "savanna." settings to be added to configs?  If we do
> >>>> that, any and all special configuration settings for specific job types
> >>>> or subtypes can be handled with no database changes and no api changes.
> >>>>
> >>>> Downside
> >>>> --------
> >>>> Currently, all 'configs' are rendered in the generated oozie workflow.
> >>>> The "savanna." settings would be stripped out and processed by Savanna,
> >>>> thereby changing that behavior a bit (maybe not a big deal).
> >>>>
> >>>> We would also be mixing "savanna." configs with config_hints for jobs,
> >>>> so users would potentially see "savanna.xxxx" settings mixed with oozie
> >>>> and hadoop settings.  Again, maybe not a big deal, but it might blur the
> >>>> lines a little bit.  Personally, I'm okay with this.
> >>>>
> >>>> Slightly different
> >>>> ------------------
> >>>> We could also add a "'savanna-configs': {}" element to job_configs to
> >>>> keep the configuration spaces separate.
> >>>>
> >>>> But, now we would have 'savanna-configs' (or another name), 'configs',
> >>>> 'params', and 'args'.  Really? Just how many different types of values
> >>>> can we come up with? :)
> >>>>
> >>>> I lean away from this approach.
> >>>>
> >>>> Related: breaking up the superset
> >>>> ---------------------------------
> >>>>
> >>>> It is also the case that not every job type has every value type.
> >>>>
> >>>>            Configs   Params    Args
> >>>> Hive            Y         Y        N
> >>>> Pig             Y         Y        Y
> >>>> MapReduce       Y         N        N
> >>>> Java            Y         N        Y
> >>>>
> >>>> So do we make that explicit in the docs and enforce it in the api with
> >>>> errors?
> >>>>
> >>>> Thoughts? I'm sure there are some :)
> >>>>
> >>>> Best,
> >>>>
> >>>> Trevor
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> OpenStack-dev mailing list
> >>>> OpenStack-dev at lists.openstack.org
> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Sincerely yours,
Sergey Lukjanov
Savanna Technical Lead
Mirantis Inc.