@xtccc
2016-01-03T19:19:17.000000Z
字数 947
阅读 2457
SparkStreaming
如果一个Spark Streaming app driver host崩溃了,那么它就丢失所有已经收到但是还未被处理的数据。为了应对这种情况,Spark可以将它收到的数据写入到HDFS中,这样可以在Spark app崩溃时恢复数据。这个特性称为 Spark Streaming Recovery ,在CDH 5.4开始可以用于生产环境。
开启该特性的步骤:
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
def createContext(): StreamingContext = {
val conf = new SparkConf()
conf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf) // new cobtext
val kafkaStream = KafkaStream.createStream(...)
// Do some transformations on the stream ... and write it out etc
ssc.checkpoint(checkPointDirectory) // set checkpoint dir
ssc
}
// Get StreamingContext from checkpoing data or create a new one
val ssc = StreamingContext.getOrCreate(checkPointDirectory, createContext _)