@tsing1226 · 2016-01-14 · spark
- Extract the JDK:
tar -zxf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
- sudo vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
Verify that the JDK was installed successfully:
$ java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
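To double-check that the shell now resolves Java from this JDK rather than a system-wide copy, a minimal sanity check (not in the original post; paths assume the layout above) is:
# expect /opt/modules/jdk1.7.0_67 and /opt/modules/jdk1.7.0_67/bin/java
echo $JAVA_HOME
which java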
- Extract and install Maven:
tar -zxf apache-maven-3.0.5-bin.tar.gz -C /opt/modules/
- sudo vi /etc/profile
## MAVEN_HOME
export MAVEN_HOME=/opt/modules/apache-maven-3.0.5
export PATH=$MAVEN_HOME/bin:$PATH
source /etc/profile
mvn -version
Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 05:51:28-0800)
Maven home: /opt/modules/apache-maven-3.0.5
Java version: 1.7.0_67, vendor: Oracle Corporation
Java home: /opt/modules/jdk1.7.0_67/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"
Configure a mirror for the Maven central repository:
- vim /opt/modules/apache-maven-3.0.5/conf/settings.xml
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexus osc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
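Note that the <mirror> entry must be nested inside the <mirrors> element of settings.xml; the surrounding structure of a standard Maven settings file (shown here as a sketch, with unrelated sections elided) looks like:
<settings>
  <!-- ... other sections such as <proxies> and <profiles> ... -->
  <mirrors>
    <mirror>
      <id>nexus-osc</id>
      <mirrorOf>*</mirrorOf>
      <name>Nexus osc</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
    </mirror>
  </mirrors>
</settings>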
- Extract and install Scala:
tar -zxf scala-2.10.4.tgz -C /opt/modules/
- Configure the environment variables:
sudo vi /etc/profile
- Add:
##SCALA_HOME
export SCALA_HOME=/opt/modules/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
- Reload the profile and check that the installation succeeded:
source /etc/profile
scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
Download the Spark source: spark-1.3.0.tar.gz
tar -zxf spark-1.3.0.tar.gz -C /opt/modules/
Set MAVEN_OPTS first, otherwise the Maven build can fail with PermGen OutOfMemoryError on JDK 7:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024M -XX:ReservedCodeCacheSize=1024m"
build/mvn clean package -DskipTests -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pyarn -Phive -Phive-thriftserver
During the build you may run into a warning and an error like these:
[WARNING] The requested profile "hadoop-2.5.0" could not be activated because it does not exist.
[ERROR] Failed to execute goal on project spark-streaming-mqtt_2.10: Could not resolve dependencies for project org.apache.spark:spark-streaming-mqtt_2.10:jar:1.3.0: Failure to find org.eclipse.paho:org.eclipse.paho.client.mqttv3:jar:1.0.1 in http://maven.oschina.net/content/groups/public/ was cached in the local repository, resolution will not be reattempted until the update interval of nexus-osc has elapsed or updates are forced -> [Help 1]
Maven caches these lookups in the local repository, under /home/{user.name}/.m2/repository/org/apache/spark/spark-parent_2.10/1.3.0 and /home/{user.name}/.m2/repository/org/eclipse/paho/org.eclipse.paho.client.mqttv3/1.0.2. Since version 1.0.1 cannot be downloaded from the mirror, we choose version 1.0.2 instead: download the jar by hand and place it in the latter directory.
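For example, a manual download from Maven Central could look like the following (a sketch; the URL assumes the artifact is hosted on repo1.maven.org, so verify it against your mirror before relying on it):
# create the local-repository path and fetch the jar into it
mkdir -p ~/.m2/repository/org/eclipse/paho/org.eclipse.paho.client.mqttv3/1.0.2
cd ~/.m2/repository/org/eclipse/paho/org.eclipse.paho.client.mqttv3/1.0.2
wget https://repo1.maven.org/maven2/org/eclipse/paho/org.eclipse.paho.client.mqttv3/1.0.2/org.eclipse.paho.client.mqttv3-1.0.2.jar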
- Then change the dependency version in the pom.xml under spark-1.3.0-src/external/mqtt:
<dependency>
<groupId>org.eclipse.paho</groupId>
<artifactId>org.eclipse.paho.client.mqttv3</artifactId>
<version>1.0.2</version>
</dependency>
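Alternatively (not part of the original walkthrough, but a standard Maven remedy for "was cached in the local repository"), delete the cached resolution-failure markers and force Maven to re-check the remote repository with -U:
# remove the *.lastUpdated markers that record the failed download, then retry with forced updates
find ~/.m2/repository/org/eclipse/paho -name "*.lastUpdated" -delete
build/mvn clean package -DskipTests -Phadoop-2.4 -Dhadoop.version=2.5.0 -Pyarn -Phive -Phive-thriftserver -U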
We want the components we build against to match the versions in our Hadoop stack, so edit the corresponding properties in the top-level pom.xml:
<java.version>1.7</java.version>
<sbt.project.name>spark</sbt.project.name>
<scala.macros.version>2.0.1</scala.macros.version>
<hadoop.version>2.5.0</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.98.6-hadoop2</hbase.version>
<hbase.artifact>hbase</hbase.artifact>
<flume.version>1.5.0</flume.version>
<zookeeper.version>3.4.5</zookeeper.version>
<hive.group>org.spark-project.hive</hive.group>
<!-- Version used in Maven Hive dependency -->
<hive.version>0.13.1a</hive.version>
<!-- Version used for internal directory structure -->
<hive.version.short>0.13.1</hive.version.short>
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
<snappy.version>1.1.1</snappy.version>
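A quick way to confirm the edits took effect (a trivial check, not from the original post):
# print the version properties we just edited in the top-level pom.xml
grep -E '<(hadoop|hive|scala|zookeeper|hbase|flume)\.version>' pom.xml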
After the source is compiled you could run Spark directly from the build directory with a bit of configuration, but that directory is very large and inconvenient to deploy, so we generate a deployment package instead. The script make-distribution.sh in the Spark source root produces one. Its options (as documented for earlier Spark releases; see the note after the examples) are:
- --hadoop VERSION: the Hadoop version to build against; defaults to 1.0.4 if omitted.
- --with-yarn: whether to support Hadoop YARN; without this flag YARN support is off.
- --with-hive: whether Spark SQL supports Hive; without this flag Hive support is off.
- --skip-java-test: whether to skip the Java tests during the build; without this flag the tests are not skipped.
- --with-tachyon: whether to support the Tachyon in-memory filesystem; without this flag Tachyon support is off.
- --tgz: generate spark-$VERSION-bin.tgz in the root directory; without this flag no tgz file is produced, only the /dist directory.
- --name NAME: together with --tgz, generate a package named spark-$VERSION-bin-$NAME.tgz; without this flag NAME defaults to the Hadoop version.
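Because the accepted flags changed across Spark releases, it is worth printing the usage text of the copy of the script you actually have (a sketch; recent versions of the script print usage for --help):
# show the options supported by this version of make-distribution.sh
./make-distribution.sh --help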
To generate a package that supports YARN on Hadoop 2.2.0, copy the source to a working directory, cd into it, and run:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --tgz
To generate a package that supports YARN and Hive, run:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-hive --tgz
To generate a package that supports YARN, Hadoop 2.2.0, and Tachyon, run:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-tachyon --tgz
The generated package sits in the root directory, with a name like spark-1.0.0-bin-2.2.0.tgz.
Note that make-distribution.sh already runs the Maven build itself, so there is no need to compile first and then package. Also be aware that in Spark 1.3.0 the legacy --hadoop/--with-yarn/--with-hive flags above are no longer accepted; the script instead passes Maven profiles and properties (-Pyarn, -Phive, -Dhadoop.version=...) straight through to Maven, which is why our command below takes that form.
First, set the component version numbers in make-distribution.sh:
VERSION=1.3.0
SPARK_HADOOP_VERSION=2.5.0
SPARK_HIVE=1
SCALA_VERSION=2.10.4
Then generate the deployment package for our own versions:
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -Phive -Phive-thriftserver
The generated package is placed in the source root directory, named spark-1.3.0-bin-2.5.0.tgz.
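As a quick smoke test of the package (a sketch, not from the original post; the examples jar name depends on your build, hence the glob):
# unpack the distribution and run the bundled SparkPi example locally
tar -zxf spark-1.3.0-bin-2.5.0.tgz -C /opt/modules/
cd /opt/modules/spark-1.3.0-bin-2.5.0
bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[2] lib/spark-examples-*.jar 10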
References:
1. http://spark.apache.org/docs/1.3.0/building-spark.html
2. http://www.aboutyun.com/thread-8398-1-1.html