[Openstack] [Sahara] Jobs get marked as "failed" immediately on Spark cluster

Jeremy Freudberg jfreud at bu.edu
Mon Aug 8 20:02:02 UTC 2016


Hi all, I am experiencing a strange bug when running jobs on Sahara
(Red Hat Liberty).

When submitting a job to a Spark 1.3.1 cluster, I get the following
error immediately:

2016-08-08 15:56:09.546 20949 WARNING sahara.service.edp.job_manager [req-fb5b4722-861a-4063-bc22-5e96b417376c ] [instance: none, job_execution: ee747ffb-9be5-45b0-aa0b-c719668a43aa] Can't run job execution (reason: '__deepcopy__')
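
For context on that reason string: I haven't dug into Sahara's code
here, but a bare '__deepcopy__' usually means copy.deepcopy() was
called on an object whose custom __getattr__ raises KeyError instead
of AttributeError, so deepcopy's internal probe for a __deepcopy__
method blows up. A minimal standalone sketch of the gotcha (BadProxy
is hypothetical, not a real Sahara class):

import copy

class BadProxy(object):
    # Hypothetical stand-in for a config/proxy object whose
    # __getattr__ raises KeyError for unknown attributes instead
    # of the AttributeError that Python expects.
    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        return self._data[name]   # raises KeyError(name)

try:
    copy.deepcopy(BadProxy({}))
except KeyError as e:
    # getattr(x, '__deepcopy__', None) inside copy.deepcopy only
    # swallows AttributeError, so the KeyError escapes with the
    # attribute name as its message:
    print("reason: %s" % e)      # prints: reason: '__deepcopy__'

If something like that is happening in the job manager, the failure is
in bookkeeping rather than in submission itself, which might explain
why the job still runs and succeeds despite being marked failed.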

However, even though the job is marked as failed in the Sahara API and
dashboard, the job still runs and succeeds on the cluster (i.e., I see
the results in Swift/HDFS).

I only see this behavior on Spark clusters (no other plugins), but it
affects all job types, even simple ones like Shell.

Any help is greatly appreciated.

Thanks,
Jeremy Freudberg



