[Openstack] [Sahara] Jobs get marked as "failed" immediately on Spark cluster
Jeremy Freudberg
jfreud at bu.edu
Tue Aug 9 15:15:25 UTC 2016
Hey Vitaly,
I solved the issue. As you pointed out,
https://github.com/openstack/sahara/blob/master/sahara/service/edp/job_manager.py#L124
was quite relevant.
However, you linked to the version of this file on the master branch. The
file on the Liberty branch looks a little different:
https://github.com/openstack/sahara/blob/stable/liberty/sahara/service/edp/job_manager.py#L115
So the fix already exists, but only in the master and Mitaka branches.
Maybe we can backport it to the Liberty EOL release?
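
For anyone hitting the same log line, here is a minimal Python sketch of
one way a failure reason of exactly '__deepcopy__' can arise while the job
itself keeps running. It is an illustration under assumptions, not Sahara
code: DictBackedResource and run_job are hypothetical names, and whether
this is the precise mechanism in the Liberty branch is not confirmed here.
The idea is that copy.deepcopy() looks up "__deepcopy__" with getattr(),
which only swallows AttributeError, so an object whose __getattr__ raises
KeyError makes the copy fail after the launch step has already happened.

```python
import copy


class DictBackedResource:
    """Hypothetical object whose attribute lookup falls through to a dict."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails. Unknown names raise
        # KeyError here rather than AttributeError.
        return self._data[name]


def run_job(job, launch):
    """Hypothetical runner: launch the job first, then copy its record."""
    launch(job)  # the job is already submitted at this point
    try:
        copy.deepcopy(job)  # getattr(job, "__deepcopy__", None) raises KeyError
    except Exception as exc:
        # The job is reported as failed even though the launch succeeded.
        return "FAILED", str(exc)
    return "RUNNING", None


launched = []
job = DictBackedResource({"name": "wordcount"})
status, reason = run_job(job, launched.append)
print(status, reason, len(launched))
```

In this sketch the launch callback runs exactly once before the copy
fails, which mirrors the symptom in this thread: a "failed" status in the
API while the results still show up on the cluster.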
Thanks for your help,
Jeremy Freudberg
On Mon, Aug 8, 2016 at 6:05 PM, Vitaly Gridnev <vgridnev at mirantis.com> wrote:
> Hello.
>
> I haven't seen issues like that. Can you explain more precisely how you are
> running the job: what configs and arguments are passed, which job binaries
> are used, and so on? A minimal example for reproducing this issue would be
> the best option.
>
> Moreover, this looks like an absolutely strange issue (if the job was
> executed successfully). It means that the launch command was successful
> (see [0]). But there are no copy operations after the launch command.
>
> [0]
> https://github.com/openstack/sahara/blob/stable/liberty/sahara/service/edp/spark/engine.py#L339
>
> On Mon, Aug 8, 2016 at 11:02 PM, Jeremy Freudberg <jfreud at bu.edu> wrote:
>>
>> Hi all, I am experiencing a strange bug running jobs on Sahara (Red
>> Hat Liberty).
>>
>> When submitting a job to a Spark 1.3.1 cluster, I get the following
>> error immediately:
>>
>> 2016-08-08 15:56:09.546 20949 WARNING sahara.service.edp.job_manager
>> [req-fb5b4722-861a-4063-bc22-5e96b417376c ] [instance: none,
>> job_execution: ee747ffb-9be5-45b0-aa0b-c719668a43aa]
>> Can't run job execution (reason: '__deepcopy__')
>>
>> However, even though the job is marked as failed in Sahara API and
>> dashboard, the job still runs and succeeds on the cluster. (i.e. I see
>> the results in Swift/HDFS).
>>
>> I only experience this behavior on Spark clusters (no other plugins)
>> but it does affect all job types. (Even simple ones like Shell).
>>
>> Any help is greatly appreciated.
>>
>> Thanks,
>> Jeremy Freudberg
>>
>
> --
> Best Regards,
> Vitaly Gridnev,
> Project Technical Lead of OpenStack DataProcessing Program (Sahara)
> Mirantis, Inc