[openstack-dev] [savanna] How to handle diverging EDP job configuration settings

Jon Maron jmaron at hortonworks.com
Wed Jan 29 14:37:11 UTC 2014


I imagine ‘neutron’ would follow suit as well.

On Jan 29, 2014, at 9:23 AM, Trevor McKay <tmckay at redhat.com> wrote:

> So, assuming we go forward with this, the followup question is whether
> or not to move "main_class" and "java_opts" for Java actions into
> "edp.java.main_class" and "edp.java.java_opts" configs.
> 
> I think yes.
> 
> Best,
> 
> Trevor
> 
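A minimal sketch of what that move could look like, assuming the two Java fields currently sit at the top level of job_configs as described further down the thread. The helper name fold_java_fields and the sample values are made up for illustration; only the "edp.java.main_class" and "edp.java.java_opts" key names come from the message above.

```python
def fold_java_fields(job_configs):
    """Return a copy of job_configs with the Java fields moved into 'configs'.

    Hypothetical helper, not Savanna code. The target key names are the
    ones proposed in this thread.
    """
    moved = {
        "main_class": "edp.java.main_class",
        "java_opts": "edp.java.java_opts",
    }
    # Copy everything except the two top-level Java fields.
    result = {k: v for k, v in job_configs.items() if k not in moved}
    # Copy 'configs' so the caller's dictionary is left untouched.
    configs = dict(job_configs.get("configs", {}))
    for old_key, new_key in moved.items():
        if old_key in job_configs:
            configs[new_key] = job_configs[old_key]
    result["configs"] = configs
    return result


old_style = {
    "configs": {"mapred.reduce.tasks": "2"},
    "args": ["input", "output"],
    "main_class": "org.example.WordCount",  # made-up class name
    "java_opts": "-Xmx512m",
}
new_style = fold_java_fields(old_style)
```

After the call, new_style has only 'configs' and 'args' at the top level, so the stored structure keeps its original shape.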
> On Wed, 2014-01-29 at 09:15 -0500, Trevor McKay wrote:
>> On Wed, 2014-01-29 at 14:35 +0400, Alexander Ignatov wrote:
>>> Thank you for bringing this up, Trevor.
>>> 
>>> EDP is getting more diverse and it's time to change its model.
>>> I totally agree with your proposal, but I have one minor comment.
>>> Instead of a "savanna." prefix in job_configs, wouldn't it be better to
>>> make it "edp."? I think "savanna." is too broad a term for this.
>> 
>> +1, brilliant. EDP is perfect.  I was worried about the scope of
>> "savanna." too.
>> 
>>> And one more bureaucratic thing... I see you have already started
>>> implementing it [1], and it is named and tracked as a new EDP workflow [2].
>>> I think a new blueprint should be created for this feature to track all code
>>> changes as well as docs updates. By docs I mean the public Savanna docs
>>> about EDP, the REST API docs, and the samples.
>> 
>> Absolutely, I can make it a new blueprint.  Thanks.
>> 
>>> [1] https://review.openstack.org/#/c/69712
>>> [2] https://blueprints.launchpad.net/openstack/?searchtext=edp-oozie-streaming-mapreduce
>>> 
>>> Regards,
>>> Alexander Ignatov
>>> 
>>> 
>>> 
>>> On 28 Jan 2014, at 20:47, Trevor McKay <tmckay at redhat.com> wrote:
>>> 
>>>> Hello all,
>>>> 
>>>> In our first pass at EDP, the model for job settings was very consistent
>>>> across all of our job types. The execution-time settings fit into this
>>>> (superset) structure:
>>>> 
>>>> job_configs = {'configs': {},  # config settings for oozie and hadoop
>>>>                'params': {},   # substitution values for Pig/Hive
>>>>                'args': []}     # script args (Pig and Java actions)
>>>> 
>>>> But we have some things that don't fit (and probably more in the
>>>> future):
>>>> 
>>>> 1) Java jobs have 'main_class' and 'java_opts' settings
>>>>  Currently these are handled as additional fields added to the
>>>> structure above.  These were the first to diverge.
>>>> 
>>>> 2) Streaming MapReduce (anticipated) requires mapper and reducer
>>>> settings (different than the mapred.xxxx.class settings for
>>>> non-streaming MapReduce)
>>>> 
>>>> Problems caused by adding fields
>>>> --------------------------------
>>>> The job_configs structure above is stored in the database. Each time we
>>>> add a field to the structure above at the level of configs, params, and
>>>> args, we force a change to the database tables, a migration script and a
>>>> change to the JSON validation for the REST api.
>>>> 
>>>> We also cause a change for python-savannaclient and potentially other
>>>> clients.
>>>> 
>>>> This kind of change seems bad.
>>>> 
>>>> Proposal: Borrow a page from Oozie and add "savanna." configs
>>>> -------------------------------------------------------------
>>>> I would like to fit divergent job settings into the structure we already
>>>> have.  One way to do this is to leverage the 'configs' dictionary.  This
>>>> dictionary primarily contains settings for hadoop, but there are a
>>>> number of "oozie.xxx" settings that are passed to oozie as configs or
>>>> set by oozie for the benefit of running apps.
>>>> 
>>>> What if we allow "savanna." settings to be added to configs?  If we do
>>>> that, any and all special configuration settings for specific job types
>>>> or subtypes can be handled with no database changes and no api changes.
>>>> 
>>>> Downside
>>>> --------
>>>> Currently, all 'configs' are rendered in the generated oozie workflow.
>>>> The "savanna." settings would be stripped out and processed by Savanna,
>>>> thereby changing that behavior a bit (maybe not a big deal)
>>>> 
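The stripping step described above can be sketched as follows. This is an illustration under assumptions, not the Savanna implementation: the function name split_configs is hypothetical, and the prefix is left as a parameter because the replies earlier in this message settle on "edp." rather than "savanna.".

```python
EDP_PREFIX = "edp."  # prefix agreed on in the replies; "savanna." was the first idea


def split_configs(configs, prefix=EDP_PREFIX):
    """Separate prefixed settings from the ones rendered into the workflow.

    Returns (internal, passthrough): 'internal' holds the keys that start
    with the prefix and would be consumed before workflow generation,
    'passthrough' holds everything else (hadoop and oozie settings).
    """
    internal = {}
    passthrough = {}
    for key, value in configs.items():
        if key.startswith(prefix):
            internal[key] = value
        else:
            passthrough[key] = value
    return internal, passthrough


configs = {
    "mapred.reduce.tasks": "2",
    "edp.java.main_class": "org.example.WordCount",  # made-up class name
}
internal, passthrough = split_configs(configs)
```

Only passthrough would then be written into the generated oozie workflow, which is the behavior change the paragraph above mentions.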
>>>> We would also be mixing "savanna." configs with config_hints for jobs,
>>>> so users would potentially see "savanna.xxxx" settings mixed with oozie
>>>> and hadoop settings.  Again, maybe not a big deal, but it might blur the
>>>> lines a little bit.  Personally, I'm okay with this.
>>>> 
>>>> Slightly different
>>>> ------------------
>>>> We could also add a "'savanna-configs': {}" element to job_configs to
>>>> keep the configuration spaces separate.
>>>> 
>>>> But, now we would have 'savanna-configs' (or another name), 'configs',
>>>> 'params', and 'args'.  Really? Just how many different types of values
>>>> can we come up with? :)
>>>> 
>>>> I lean away from this approach.
>>>> 
>>>> Related: breaking up the superset
>>>> ---------------------------------
>>>> 
>>>> It is also the case that not every job type has every value type.
>>>> 
>>>>             Configs   Params   Args
>>>> Hive           Y         Y       N
>>>> Pig            Y         Y       Y
>>>> MapReduce      Y         N       N
>>>> Java           Y         N       Y
>>>> 
>>>> So do we make that explicit in the docs and enforce it in the api with
>>>> errors?
>>>> 
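One way the enforcement asked about above might look, as a sketch only. The ALLOWED mapping is copied from the table; the function name check_job_configs, the use of ValueError, and the choice to tolerate empty values for disallowed types are all assumptions, not anything decided in this thread.

```python
# Value types each job type accepts, taken from the table above.
ALLOWED = {
    "Hive": {"configs", "params"},
    "Pig": {"configs", "params", "args"},
    "MapReduce": {"configs"},
    "Java": {"configs", "args"},
}

VALUE_TYPES = ("configs", "params", "args")


def check_job_configs(job_type, job_configs):
    """Raise ValueError if job_configs carries a value type the job type lacks.

    Assumption: an empty dict or list for a disallowed type is tolerated,
    since the stored structure is a superset shared by all job types.
    """
    if job_type not in ALLOWED:
        raise ValueError("unknown job type: %s" % job_type)
    extra = [
        name for name in VALUE_TYPES
        if job_configs.get(name) and name not in ALLOWED[job_type]
    ]
    if extra:
        raise ValueError(
            "%s jobs do not accept: %s" % (job_type, ", ".join(extra))
        )


# A Pig job may use all three value types.
check_job_configs("Pig", {"configs": {"a": "b"}, "params": {"X": "1"}, "args": ["in"]})
```

A Hive job submitted with a non-empty 'args' list would raise ValueError under this sketch, which a REST layer could turn into a validation error.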
>>>> Thoughts? I'm sure there are some :)
>>>> 
>>>> Best,
>>>> 
>>>> Trevor
>>>> 
>>>> _______________________________________________
>>>> OpenStack-dev mailing list
>>>> OpenStack-dev at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

