[Openstack] [Sahara] Jobs get marked as "failed" immediately on Spark cluster
vgridnev at mirantis.com
Mon Aug 8 22:05:45 UTC 2016
I haven't seen issues like that. Can you explain more precisely how you are
running the job: what configs and arguments are passed, which job binaries, and so
on? A minimal example that reproduces this issue would be the best option.
Moreover, this looks like an absolutely strange issue (if the job was executed
successfully). It means that the launch command was successful (see ),
but there are no copy operations after the launch command.
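For what it's worth, a bare '__deepcopy__' message often comes from copy.deepcopy
probing an object for a __deepcopy__ hook. The sketch below is hypothetical (the
JobConfig class is invented for illustration, not taken from Sahara): if any object
reachable from the job execution defines __getattr__ so that unknown attributes
raise something other than AttributeError (here a KeyError from a backing dict),
getattr(x, "__deepcopy__", None) inside deepcopy cannot suppress it, and the
exception's message is exactly the attribute name:

```python
import copy


class JobConfig(object):
    """Hypothetical config holder whose __getattr__ raises KeyError
    instead of AttributeError for unknown attributes."""

    def __init__(self, data):
        # bypass __getattr__ for the backing dict itself
        self.__dict__["_data"] = dict(data)

    def __getattr__(self, name):
        # Missing keys raise KeyError(name); getattr()'s default only
        # suppresses AttributeError, so deepcopy's probe for the
        # "__deepcopy__" hook lets this KeyError escape.
        return self._data[name]


cfg = JobConfig({"master": "spark://example:7077"})
try:
    copy.deepcopy(cfg)
    error = None
except KeyError as exc:
    error = str(exc)

print(error)  # '__deepcopy__'
```

If something like this is the cause, the full traceback in sahara-engine logs
(not just the "Can't run job execution (reason: ...)" line) should show which
object deepcopy was walking when it failed.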
On Mon, Aug 8, 2016 at 11:02 PM, Jeremy Freudberg <jfreud at bu.edu> wrote:
> Hi all, I am experiencing a strange bug running jobs on Sahara (Red
> Hat Liberty).
> When submitting a job to a Spark 1.3.1 cluster, I get the following
> error immediately:
> 2016-08-08 15:56:09.546 20949 WARNING sahara.service.edp.job_manager
> 22-861a-4063-bc22-5e96b417376c ] [instance: none, job_execution:
> 5b0-aa0b-c719668a43aa] Can't run job execution (reason: '__deepcopy__')
> However, even though the job is marked as failed in the Sahara API and
> dashboard, the job still runs and succeeds on the cluster (i.e., I see
> the results in Swift/HDFS).
> I only experience this behavior on Spark clusters (no other plugins),
> but it does affect all job types, even simple ones like Shell.
> Any help is greatly appreciated.
> Jeremy Freudberg
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/
> Post to : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/
Project Technical Lead of OpenStack DataProcessing Program (Sahara)