@xtccc 2016-01-03T11:19:17.000000Z

Spark Streaming


Fault-Tolerant Processing

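If the driver host of a Spark Streaming application crashes, the application loses all data that it has received but not yet processed. To guard against this, Spark can write the data it receives to HDFS, so that the data can be recovered after the application crashes. This feature is called Spark Streaming Recovery, and it is production-ready starting with CDH 5.4.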
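The idea behind logging received data before processing it can be illustrated with a toy model. The sketch below is not Spark's implementation; the class name `ToyWriteAheadLog` and its methods are hypothetical, and it writes to a local file rather than HDFS. It only shows the principle: each record is appended to a log file as soon as it is received, so a restarted process can replay whatever was logged.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.collection.JavaConverters._

// Toy model only: NOT how Spark implements its receiver write-ahead log.
// Assumes records are single-line strings (no embedded newlines).
class ToyWriteAheadLog(logFile: Path) {

  // Append the record to the log file before it is handed on for processing.
  def append(record: String): Unit = {
    Files.write(
      logFile,
      (record + "\n").getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE,
      StandardOpenOption.APPEND)
  }

  // After a crash, read back every record that was logged, in order.
  def recover(): Seq[String] =
    if (Files.exists(logFile))
      Files.readAllLines(logFile, StandardCharsets.UTF_8).asScala.toList
    else
      Seq.empty
}
```

Creating a second `ToyWriteAheadLog` over the same file and calling `recover()` plays the role of a restarted driver: it sees every record appended before the "crash".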

Steps to enable this feature:

Set “spark.streaming.receiver.writeAheadLog.enable” to true:

sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")

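Create a “StreamingContext” instance from the “SparkConf” above and specify a checkpoint directory.
Call the “getOrCreate” method of “StreamingContext” to either create a new “StreamingContext” instance or restore the previous “StreamingContext” instance from the checkpoint directory.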

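import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

def createContext(): StreamingContext = {
    val conf = new SparkConf()
    conf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
    // A batch interval is required; Seconds(1) is a placeholder, not a value from the original note
    val ssc = new StreamingContext(conf, Seconds(1)) // new context
    val kafkaStream = KafkaUtils.createStream(...)
    // Do some transformations on the stream ... and write it out etc.
    ssc.checkpoint(checkPointDirectory) // set checkpoint dir
    ssc
}
// Get StreamingContext from checkpoint data or create a new one
val ssc = StreamingContext.getOrCreate(checkPointDirectory, createContext _)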
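For reference, the steps above can be put together into a complete application. This is a minimal sketch under stated assumptions, not the original author's program: the object name, the checkpoint path, the host and port, and the 5-second batch interval are all hypothetical, and a socket receiver stands in for the Kafka stream so that no Kafka dependency is needed. The master URL is expected to come from `spark-submit`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableStreamingApp {
  // Hypothetical values, chosen only for illustration.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"
  val sourceHost = "localhost"
  val sourcePort = 9999

  // Step 1: enable the receiver write-ahead log.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("RecoverableStreamingApp")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

  // Step 2: called only when no checkpoint exists, so all stream
  // setup has to happen inside this function.
  def createContext(): StreamingContext = {
    val ssc = new StreamingContext(buildConf(), Seconds(5))
    // Receiver-based source. With the write-ahead log enabled, a
    // non-replicated storage level avoids keeping redundant copies.
    val lines = ssc.socketTextStream(
      sourceHost, sourcePort, StorageLevel.MEMORY_AND_DISK_SER)
    lines.count().print() // print the number of lines in each batch
    ssc.checkpoint(checkpointDir)
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Step 3: restore from the checkpoint directory, or build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()            // begin receiving and processing
    ssc.awaitTermination() // block until the job is stopped or fails
  }
}
```

On the first run the checkpoint directory is empty, so `createContext` is invoked. After a driver restart, `getOrCreate` rebuilds the context from the checkpoint data instead and `createContext` is not called.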
