@tsing1226 · 2016-01-15

spark

Spark 1.3.0 Deployment and Application Examples

Spark 1.3.0 Environment Setup

Install the JDK

  • Extract the package
tar -zxf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
  • Edit the profile: sudo vi /etc/profile, and add
#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH
  • Apply the changes: source /etc/profile

  • Verify the JDK installation

> $ java -version

java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Install Scala (2.10.4)

  • Extract Scala
tar -zxf scala-2.10.4.tgz -C /opt/modules/
  • Configure environment variables
sudo vi /etc/profile
  • Add
##SCALA_HOME
export SCALA_HOME=/opt/modules/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
  • Verify the installation
scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

Install Hadoop 2.5.0

Reference: https://www.zybuluo.com/tsing1226/note/255903

Spark 1.3.0 Installation and Deployment

Extract

tar -zxf spark-1.3.0-bin-2.5.0.tar.gz -C /opt/cdh3.5.6/

Configure environment variables

export SPARK_HOME=/opt/cdh3.5.6/spark-1.3.0-bin-2.5.0

Configuration files

  • spark-defaults.conf
spark.master  spark://hadoop-senior02.grc.com:7077
  • slaves
hadoop-senior02.grc.com
  • spark-env.sh
JAVA_HOME=/opt/modules/jdk1.7.0_67
SCALA_HOME=/opt/modules/scala-2.10.4
HADOOP_CONF_DIR=/opt/cdh3.5.6/hadoop-2.5.0-cdh5.3.6/etc/hadoop
SPARK_MASTER_IP=hadoop-senior02.grc.com
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1

Start

> sbin/start-master.sh 
> sbin/start-slaves.sh
  • Run against the standalone cluster (spark.master is already set in spark-defaults.conf, so no flag is needed)
./bin/spark-shell
  • Run in local mode, where k is the number of worker threads (e.g. local[2])
./bin/spark-shell --master local[k]

Verify

> jps
> WebUI http://hadoop-senior02.grc.com:8080/
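
Inside spark-shell, the context itself can confirm the connection to the standalone master (a quick check; the values shown are what the configuration above should produce):

// The SparkContext reports the master it connected to
sc.master      // res: spark://hadoop-senior02.grc.com:7077
sc.appName     // res: Spark shell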

Running WordCount on Spark

Create the input file (columns are tab-separated, since the code below splits on "\t")

>$ touch wordcount.txt

hadoop  mapreduce
yarn    spark
Hadoop  MapReduce
hello   like

Create the HDFS input directory

 bin/hdfs dfs -mkdir -p spark/wordcount/input

Upload the file to the input directory

bin/hdfs dfs -put /opt/datas/wordcount.txt spark/wordcount/input

Load the file

val rdd=sc.textFile("hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/wordcount/input/wordcount.txt")
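
textFile is lazy: it only records the lineage, and nothing is read from HDFS until an action runs. A quick action confirms the file is reachable (the sample file above has four lines):

// count() is an action, so it triggers the actual read
rdd.count()    // res: Long = 4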

Transform the data

val kvrdd=rdd.flatMap(line=>line.split("\t")).map(word=>(word,1)).reduceByKey((a,b)=>(a+b))
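
Before writing to HDFS, it can help to pull the counts back to the driver for a quick look; collect() is only safe here because the data set is tiny:

// Bring all (word, count) pairs to the driver and print them
kvrdd.collect().foreach(println)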

Save the results (the output directory must not already exist)

 kvrdd.saveAsTextFile("hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/wordcount/output")

View the results

bin/hdfs dfs -text spark/wordcount/output/p*
(MapReduce,1)
(mapreduce,1)
(hello,1)
(yarn,1)
(spark,1)
(hadoop,1)
(like,1)
(Hadoop,1)

Spark history server

Configuration

  • Create the log directory on HDFS
bin/hdfs dfs -mkdir -p /user/grc/spark/logs
  • Server side: configure spark-env.sh
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/logs"
  • Client side: configure spark-defaults.conf

spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/logs
spark.eventLog.compress true

  • Start the Spark history server (its web UI listens on port 18080 by default); a configuration check follows below
./sbin/start-history-server.sh
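
A new spark-shell can read the client-side settings back from its own configuration to confirm event logging is active (a sanity check, not a required step):

// Read the event-log settings back from the running context
sc.getConf.getBoolean("spark.eventLog.enabled", false)   // res: true
sc.getConf.get("spark.eventLog.dir")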

Run a test

//create spark context
val rdd=sc.textFile("hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/wordcount/input/wordcount.txt")

//rdd transformation

val kvrdd=rdd.flatMap(line=>line.split("\t")).map(word=>(word,1)).reduceByKey((a,b)=>(a+b))

//save file (use a fresh output directory; it must not already exist)
kvrdd.saveAsTextFile("hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/wordcount/output2")
//close spark context
sc.stop()
Submitting Applications
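
Beyond the interactive shell, the same WordCount logic can be packaged as a standalone application and handed to spark-submit. The sketch below is illustrative: the object name, jar name, and output path are assumptions, not part of the setup above.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal standalone WordCount (illustrative sketch).
// Build a jar (e.g. with sbt package) and submit it with:
//   bin/spark-submit --class WordCount \
//     --master spark://hadoop-senior02.grc.com:7077 \
//     wordcount_2.10-0.1.jar
object WordCount {
  def main(args: Array[String]): Unit = {
    // spark.master and the event-log settings come from spark-defaults.conf,
    // so only the application name is set here
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/wordcount/input/wordcount.txt")
    val kvrdd = rdd.flatMap(_.split("\t")).map((_, 1)).reduceByKey(_ + _)

    // Use a fresh output directory; saveAsTextFile fails if it already exists
    kvrdd.saveAsTextFile("hdfs://hadoop-senior02.grc.com:8020/user/grc/spark/wordcount/output-app")

    sc.stop()
  }
}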

Spark Learning Resources

> [https://databricks.com/spark/about](https://databricks.com/spark/about)
> [http://spark.apache.org/](http://spark.apache.org/)
> [https://github.com/apache/spark](https://github.com/apache/spark)