[Openstack] [Sahara] Jobs get marked as "failed" immediately on Spark cluster

Jeremy Freudberg jfreud at bu.edu
Mon Aug 8 20:02:02 UTC 2016


Hi all, I am experiencing a strange bug when running jobs on Sahara
(Red Hat Liberty).

When submitting a job to a Spark 1.3.1 cluster, I get the following
error immediately:

2016-08-08 15:56:09.546 20949 WARNING sahara.service.edp.job_manager [req-fb5b4722-861a-4063-bc22-5e96b417376c ] [instance: none, job_execution: ee747ffb-9be5-45b0-aa0b-c719668a43aa] Can't run job execution (reason: '__deepcopy__')
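
For context on that reason string: I haven't dug into Sahara's code
here, but a bare '__deepcopy__' usually means copy.deepcopy() was
called on an object whose custom __getattr__ raises KeyError instead
of AttributeError, so deepcopy's internal probe for a __deepcopy__
method blows up. A minimal standalone sketch of the gotcha (BadProxy
is hypothetical, not a real Sahara class):

import copy

class BadProxy(object):
    # Hypothetical stand-in for a config/proxy object whose
    # __getattr__ raises KeyError for unknown attributes instead
    # of the AttributeError that Python expects.
    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        return self._data[name]   # raises KeyError(name)

try:
    copy.deepcopy(BadProxy({}))
except KeyError as e:
    # getattr(x, '__deepcopy__', None) inside copy.deepcopy only
    # swallows AttributeError, so the KeyError escapes with the
    # attribute name as its message:
    print("reason: %s" % e)      # prints: reason: '__deepcopy__'

If something like that is happening in the job manager, the failure is
in bookkeeping rather than in submission itself, which might explain
why the job still runs and succeeds despite being marked failed.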

However, even though the job is marked as failed in the Sahara API and
dashboard, the job still runs and succeeds on the cluster (i.e., I see
the results in Swift/HDFS).

I only see this behavior on Spark clusters (no other plugins), but it
affects all job types, even simple ones like Shell.

Any help is greatly appreciated.

Thanks,
Jeremy Freudberg



