[openstack-dev] [sahara] Spark CDH followup and questions related to DIB
Trevor McKay
tmckay at redhat.com
Mon Feb 2 18:04:32 UTC 2015
Hello all,
I tried a Spark image with the cdh5 element Daniele describes below,
but it did not fix the jackson version issue. The Spark assembly still
bundles inconsistent jackson versions.
Looking into the Spark git repository a little more, I discovered that in
the cdh5-1.2.0_5.3.0 branch the jackson version is settable. I built Spark
on this branch with jackson 1.9.13 and was able to run Spark EDP without
any classpath manipulations. However, this branch does not appear to have
been released yet.
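For anyone who wants to check an assembly for this kind of mismatch, here is
a minimal sketch. It is an illustration only, not something in Sahara or
Spark. The helper names are hypothetical, and it assumes the assembly jar
still carries the Maven metadata of the bundled artifacts under
META-INF/maven/<groupId>/<artifactId>/pom.properties. An assembly that strips
those files will simply report nothing.

```python
import re
import zipfile

# Hypothetical helper names; nothing here is part of Sahara or Spark.
# Assumption: bundled artifacts keep their Maven metadata inside the jar at
# META-INF/maven/<groupId>/<artifactId>/pom.properties.
POM_PROPS = re.compile(
    r"^META-INF/maven/org\.codehaus\.jackson/([^/]+)/pom\.properties$"
)


def jackson_versions(jar_path):
    """Return {artifactId: version} for org.codehaus.jackson artifacts in the jar."""
    versions = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            match = POM_PROPS.match(name)
            if not match:
                continue
            text = jar.read(name).decode("utf-8", "replace")
            for line in text.splitlines():
                if line.startswith("version="):
                    versions[match.group(1)] = line.split("=", 1)[1].strip()
    return versions


def is_consistent(versions):
    """True when every jackson artifact found has the same version (or none were found)."""
    return len(set(versions.values())) <= 1
```

Calling jackson_versions() on the assembly jar and passing the result to
is_consistent() gives a quick yes/no answer. An empty result means the
metadata was not present, not that the jar is clean.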
A couple questions come out of this:
1) When do we move to cdh5.3 for spark images? Do we try to do this in
Kilo?
The work is already started, as noted below. Daniele has done initial
work using cdh5 for the spark plugin and the Intel folks are working on
cdh5 and cdh5.3 for the CDH plugin.
2) Do we carry a Spark assembly for Sahara ourselves, or wait for a
release tarball from CDH that uses this branch and sets a consistent
jackson version?
I asked on the Apache Spark users list about any plans to release a
tarball from this branch and am waiting for a response.
One alternative is for us to host our own spark build that we can use in
sahara-image-elements. The other idea is for us to wait for a release
tarball at http://archive.apache.org/dist/spark/ and continue to use the
classpath workaround in spark EDP for the time being.
3) Do we fix up sahara-image-elements to support multiple spark
versions?
Historically sahara-image-elements has supported only a single version for
Spark images. This is different from the other plugins. Since we have
agreed to carry support for older versions for one release cycle after
introducing a new one, should we support both cdh4 and cdh5.x? This will
require changes in diskimage_create.sh.
4) Like #3, do we fix up the spark plugin in Sahara to handle multiple
versions? This is similar to the work the Intel folks are doing now to
separate cdh5 and cdh5.3 code in the cdh plugin.
I am wondering whether the four issues above add up to too much work for
kilo-3. Do we make an incremental improvement over Juno (spark-swift
integration in EDP on cdh4, with no other changes) and address the issues
above in L, or do we push on and try to resolve it all for Kilo?
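For reference, the classpath workaround mentioned in question 2 amounts to
passing one extra option to spark-submit. The sketch below is only an
illustration of that idea, not the code in the reviews. The function name and
defaults are hypothetical; the jar path is the one from the quoted thread
below, and --class and --driver-class-path are standard spark-submit options.

```python
# Hypothetical helper for illustration; this is not the Sahara EDP code.
# The jar path comes from the quoted thread; everything else is a placeholder.
JACKSON_CORE_JAR = "/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar"


def spark_submit_args(app_jar, main_class,
                      extra_driver_classpath=(JACKSON_CORE_JAR,),
                      app_args=()):
    """Build a spark-submit argument list with an optional extra driver classpath."""
    args = ["spark-submit", "--class", main_class]
    if extra_driver_classpath:
        # Entries are joined with ':' as on any Java classpath.
        args += ["--driver-class-path", ":".join(extra_driver_classpath)]
    args.append(app_jar)
    args.extend(app_args)
    return args
```

If a Spark build with a consistent jackson version becomes available, the
same call with extra_driver_classpath=() drops the option entirely.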
Best regards,
Trevor
On Wed, 2015-01-28 at 11:57 -0500, Trevor McKay wrote:
> Daniele,
>
> Excellent! I'll have to keep a closer eye on bigfoot activity :) I'll
> pursue this.
>
> Best,
>
> Trevor
>
> On Wed, 2015-01-28 at 17:40 +0100, Daniele Venzano wrote:
> > Hello everyone,
> >
> > there is already some code in our repository:
> > https://github.com/bigfootproject/savanna-image-elements
> >
> > I made the necessary changes to have the Spark element use the cdh5
> > element. I also updated to Spark 1.2. The old cloudera HDFS-only
> > element is still needed for generating cdh4 images (but cdh4
> > support can probably be thrown away).
> >
> > Unfortunately I do not have the time to do the necessary
> > testing/validation and submit for review. I also changed the CDH
> > element so that it can install only HDFS, if so required.
> > The changes I made are simple and all contained in the last commit on
> > the master branch of that repo.
> >
> > The image generated with this code runs in Sahara without any further
> > changes. Feel free to take the code, clean it up and submit for review.
> >
> > Dan
> >
> > On Wed, Jan 28, 2015 at 10:43:30AM -0500, Trevor McKay wrote:
> > > Intel folks,
> > >
> > > Belated welcome to Sahara! Thank you for your recent commits.
> > >
> > > Moving this thread to openstack-dev so others may contribute, cc'ing
> > > Daniele and Pietro who pioneered the Spark plugin.
> > >
> > > I'll respond with another email about Oozie work, but I want to
> > > address the Spark/Swift issue in CDH since I have been working
> > > on it and there is a task which still needs to be done -- that
> > > is to upgrade the CDH version in the spark image and see if
> > > the situation improves (see below)
> > >
> > > Relevant reviews are here:
> > >
> > > https://review.openstack.org/146659
> > > https://review.openstack.org/147955
> > > https://review.openstack.org/147985
> > >
> > > In the first review, you can see that we set an extra driver
> > > classpath to pull in '/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar'.
> > >
> > > This is because the spark-assembly JAR in CDH4 contains classes from
> > > jackson-mapper-asl-1.8.8 and jackson-core-asl-1.9.x. When the
> > > hadoop-swift.jar dereferences a Swift path, it calls into code
> > > from jackson-mapper-asl-1.8.8 which uses JsonClass. But JsonClass
> > > was removed in jackson-core-asl-1.9.x, so there is an exception.
> > >
> > > Therefore, we need to use the classpath to either upgrade the version of
> > > jackson-mapper-asl to 1.9.x or downgrade the version of jackson-core-asl
> > > to 1.8.8 (both work in my testing). However, the first of these options
> > > requires us to bundle an extra jar. Since /usr/lib/hadoop already
> > > contains jackson-core-asl-1.8.8, it is easier to just add that to the
> > > classpath and downgrade the jackson version.
> > >
> > > Note that there are some references to this problem on the Spark mailing
> > > list; we are not the only ones to encounter it.
> > >
> > > However, I am not completely comfortable with mixing versions and
> > > patching the classpath this way. It looks to me like the Spark assembly
> > > used in CDH5 has consistent versions, and I would like to try updating
> > > the CDH version in sahara-image-elements to CDH5 for Spark. If this fixes
> > > the problem and removes the need for the extra classpath, that would be
> > > great.
> > >
> > > Would someone like to take on this change? (modifying sahara-image-elements
> > > to use CDH5 for Spark images) I can make a blueprint for
> > > it.
> > >
> > > More to come about Oozie topics.
> > >
> > > Best regards,
> > >
> > > Trevor
> > >
> > > On Thu, 2015-01-15 at 15:34 +0000, Chen, Weiting wrote:
> > > > Hi Mckay.
> > > >
> > > >
> > > >
> > > > We are the Intel team contributing to the OpenStack Sahara project.
> > > >
> > > > We are new to Sahara and would like to contribute more to this
> > > > project.
> > > >
> > > > So far, we are focusing on the Sahara CDH plugin.
> > > >
> > > > If there are any issues related to it, please feel free to discuss
> > > > them with us.
> > > >
> > > >
> > > >
> > > > During the IRC meeting, you mentioned two issues that we would
> > > > like to discuss with you.
> > > >
> > > > 1. Oozie Workflow Support:
> > > >
> > > > Do you have a plan you could share with us?
> > > >
> > > > In our case, we are testing a java action job with HBase library
> > > > support and are also facing some problems with Oozie
> > > > support.
> > > >
> > > > So it would be good to share our experience with each other.
> > > >
> > > >
> > > >
> > > > 2. Spark CDH Issues:
> > > >
> > > > Could you provide more information about this issue? In the CDH plugin,
> > > > we have used CDH 5 to complete the swift test, so it should be fine to
> > > > upgrade from CDH 4 to 5.
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev