[Openstack] [Sahara] Jobs get marked as "failed" immediately on Spark cluster
vgridnev at mirantis.com
Mon Aug 8 22:05:45 UTC 2016
I haven't seen issues like that. Can you explain more precisely how you are
running the job: what configs and arguments are passed, which job binaries, and so
on? A minimal example that reproduces this issue would be the best option.
Moreover, this looks like an absolutely strange issue (if the job was executed
successfully). It means that the launch command was successful (see ),
but there are no copy operations after the launch command.
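For what it's worth, a bare '__deepcopy__' message often comes from copy.deepcopy
probing an object for a __deepcopy__ hook. The sketch below is hypothetical (the
JobConfig class is invented for illustration, not taken from Sahara): if any object
reachable from the job execution defines __getattr__ so that unknown attributes
raise something other than AttributeError (here a KeyError from a backing dict),
getattr(x, "__deepcopy__", None) inside deepcopy cannot suppress it, and the
exception's message is exactly the attribute name:

```python
import copy


class JobConfig(object):
    """Hypothetical config holder whose __getattr__ raises KeyError
    instead of AttributeError for unknown attributes."""

    def __init__(self, data):
        # bypass __getattr__ for the backing dict itself
        self.__dict__["_data"] = dict(data)

    def __getattr__(self, name):
        # Missing keys raise KeyError(name); getattr()'s default only
        # suppresses AttributeError, so deepcopy's probe for the
        # "__deepcopy__" hook lets this KeyError escape.
        return self._data[name]


cfg = JobConfig({"master": "spark://example:7077"})
try:
    copy.deepcopy(cfg)
    error = None
except KeyError as exc:
    error = str(exc)

print(error)  # '__deepcopy__'
```

If something like this is the cause, the full traceback in sahara-engine logs
(not just the "Can't run job execution (reason: ...)" line) should show which
object deepcopy was walking when it failed.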
On Mon, Aug 8, 2016 at 11:02 PM, Jeremy Freudberg <jfreud at bu.edu> wrote:
> Hi all, I am experiencing a strange bug running jobs on Sahara (Red
> Hat Liberty).
> When submitting a job to a Spark 1.3.1 cluster, I get the following
> error immediately:
> 2016-08-08 15:56:09.546 20949 WARNING sahara.service.edp.job_manager
> 22-861a-4063-bc22-5e96b417376c ] [instance: none, job_execution:
> 5b0-aa0b-c719668a43aa] Can't run job execution (reason: '__deepcopy__')
> However, even though the job is marked as failed in the Sahara API and
> dashboard, the job still runs and succeeds on the cluster (i.e., I see
> the results in Swift/HDFS).
> I only experience this behavior on Spark clusters (no other plugins),
> but it does affect all job types, even simple ones like Shell.
> Any help is greatly appreciated.
> Jeremy Freudberg
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/
> Post to : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/
Project Technical Lead of OpenStack DataProcessing Program (Sahara)