@rickyChen
2016-07-27T10:05:48.000000Z
Spark
Hadoop 2.5.0-cdh5.3.2
IntelliJ IDEA 2016.3
java version "1.8.0_45"
scala 2.11.8
Pull the prebuilt package spark-2.0.0-preview-bin-hadoop2.7.tgz from the official site into /usr/local, extract it, and create a symlink:
cd /usr/local
wget http://mirror.bit.edu.cn/apache/spark/spark-2.0.0-preview/spark-2.0.0-preview-bin-hadoop2.7.tgz
tar -xvf spark-2.0.0-preview-bin-hadoop2.7.tgz
ln -s spark-2.0.0-preview-bin-hadoop2.7 spark
Edit the Spark configuration files and point Spark at the Hadoop YARN configuration directory:
cd /usr/local/spark/conf
mv spark-defaults.conf.template spark-defaults.conf
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf
Pull the Spark Streaming Kafka connector assembly, spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar, from http://search.maven.org/, then submit the example with it on the classpath:
cd /usr/local/spark
bin/spark-submit --jars jars/spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar examples/src/main/python/streaming/direct_kafka_wordcount.py broker Topic
1.4.1:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.4.1</version>
</dependency>
2.0.0-preview:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.10</artifactId>
    <version>2.0.0-preview</version>
</dependency>
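Note that once the whole system is unified on Scala 2.11.8 (as described below), the artifact suffix must match the Scala version as well. A sketch of the corresponding dependency, assuming the _2.11 artifact published for 2.0.0-preview:

```xml
<!-- Assumed coordinates for the Scala 2.11 build of the connector -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.0.0-preview</version>
</dependency>
```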
In 1.4.1, the trait is public:
trait Logging {}
In 2.0.0-preview, it is package-private:
private[spark] trait Logging { ... }
The original project extended org.apache.spark.Logging and defined custom log-printing methods on top of it. In 2.0.0-preview, however, org.apache.spark.Logging no longer exists: it has moved to org.apache.spark.internal.Logging. As the source above shows, in 2.0.0-preview it is no longer a DeveloperApi.
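One common workaround (my suggestion, not from the original post) is to stop depending on Spark's now-internal trait and define a thin Logging trait of your own over SLF4J, which Spark already ships with. A minimal sketch:

```scala
import org.slf4j.{Logger, LoggerFactory}

// A minimal stand-in for the removed public org.apache.spark.Logging trait.
// Mix this into your own classes instead of extending Spark internals.
trait Logging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(getClass.getName)

  // By-name parameters avoid building the message string when the level is off.
  protected def logInfo(msg: => String): Unit =
    if (log.isInfoEnabled) log.info(msg)

  protected def logWarning(msg: => String): Unit =
    if (log.isWarnEnabled) log.warn(msg)

  protected def logError(msg: => String, t: Throwable = null): Unit =
    if (t == null) log.error(msg) else log.error(msg, t)
}
```

This keeps the project source-compatible with the old `extends Logging` pattern without reaching into the `private[spark]` trait.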
Approach:
Unify the Scala and JDK versions across the whole system, and swap in a new Spark package.
Troubleshooting:
1. The Spark version previously running in production was spark-1.4.1-bin-hadoop2.4, so I first pulled the prebuilt spark-2.0.0-preview-bin-hadoop2.4 package from the Spark site. Running DirectKafkaWordCount.scala produced the error above. Suspecting a Scala version mismatch, I unified the system on Scala 2.11.8 and JDK 1.8.0_45, but the same error was still reported.
2. Wrote a SparkPi program in Scala, packaged and uploaded it to the server, and ran it to verify that Spark itself was installed correctly.
3. Ran the official example direct_kafka_wordcount.py as a test; the error was still present:
bin/spark-submit examples/src/main/python/streaming/direct_kafka_wordcount.py broker topic
4. Pulled the spark-2.0.0-preview-bin-hadoop2.7 build instead, and the error disappeared; the exact cause is still unclear.
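The SparkPi check in step 2 can be sketched roughly as follows, a minimal version modeled on the stock example and using the SparkSession API introduced in 2.0 (the constants here are illustrative):

```scala
import scala.math.random
import org.apache.spark.sql.SparkSession

// Minimal Monte Carlo Pi estimate, used only to verify the Spark installation.
object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SparkPi").getOrCreate()
    val n = 100000 * 2
    // Sample random points in the unit square; count those inside the circle.
    val count = spark.sparkContext.parallelize(1 until n, 2).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
```

If this job runs to completion on the cluster, the Spark package and YARN configuration are sound, which isolates the Logging error to the Kafka connector dependency rather than the installation.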