@w1992wishes 2018-11-20T16:00:47.000000Z

[Spark] Spark Startup Message Communication




1. Preface

Having covered the Spark RPC framework in broad strokes, this post analyzes the communication that takes place during Spark startup, using the Standalone deployment mode.

Startup communication happens mainly between the Master and the Workers. A Worker first sends a registration message to the Master, and the Master replies with either a success or a failure message. If the Worker receives a success reply, it begins sending heartbeat messages to the Master on a fixed schedule.
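The exchange above can be sketched as a small message protocol. The case classes below are simplified, hypothetical stand-ins for the real ones in org.apache.spark.deploy.DeployMessages (which carry more fields), and handleRegister mirrors the Master's three-way reply logic analyzed later in this post:

```scala
// Simplified sketch of the startup protocol; names mirror Spark's deploy
// messages, but the fields and the handler below are illustrative only.
sealed trait DeployMessage
case class RegisterWorker(id: String, host: String, port: Int,
                          cores: Int, memoryMb: Int) extends DeployMessage

sealed trait RegisterWorkerResponse
case class RegisteredWorker(masterUrl: String) extends RegisterWorkerResponse
case class RegisterWorkerFailed(reason: String) extends RegisterWorkerResponse
case object MasterInStandby extends RegisterWorkerResponse

// Toy Master-side handler: standby check, duplicate-ID check, then accept.
def handleRegister(req: RegisterWorker, standby: Boolean,
                   knownIds: Set[String]): RegisterWorkerResponse =
  if (standby) MasterInStandby
  else if (knownIds.contains(req.id)) RegisterWorkerFailed("Duplicate worker ID")
  else RegisteredWorker("spark://master:7077")
```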

2. Detailed Process

Starting a Spark cluster usually means running the start-all.sh script, which starts the Master first and then the Workers.

start-master.sh ultimately runs the org.apache.spark.deploy.master.Master class on the Master node, while start-slaves.sh ultimately runs org.apache.spark.deploy.worker.Worker on each slave node.

The following walks through the startup communication step by step.

2.1 The Worker sends a registration message to the Master

```scala
def main(argStrings: Array[String]) {
  Utils.initDaemon(log)
  val conf = new SparkConf
  val args = new WorkerArguments(argStrings, conf)
  val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
    args.memory, args.masters, args.workDir, conf = conf)
  rpcEnv.awaitTermination()
}
```

Starting from Worker's main method: it first calls startRpcEnvAndEndpoint to create the RpcEnv and register the endpoint.

Inside startRpcEnvAndEndpoint, RpcEnv.create(systemName, host, port, conf, securityMgr) is called. As covered earlier, RpcEnv.create() builds a NettyRpcEnv. Creating the NettyRpcEnv initializes the Dispatcher, which in turn creates an Inbox. When the Inbox is initialized, it puts an OnStart message into its messages queue; the MessageLoop then calls inbox.process() to consume that message, and after the match/case dispatch this ends up invoking endpoint.onStart():

NettyRpcEnv -> Dispatcher -> EndpointData -> Inbox -> messages.add(OnStart) -> MessageLoop -> data.inbox.process(Dispatcher.this) -> Worker.onStart()
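The chain above can be illustrated with a minimal, self-contained sketch (this is not Spark's code, just the pattern): the inbox enqueues a synthetic OnStart message at construction time, so the endpoint's start hook is guaranteed to run before any real RPC message is processed:

```scala
import java.util.concurrent.LinkedBlockingQueue

// Sketch of the Inbox/MessageLoop pattern; names and signatures are
// simplified stand-ins, not Spark's actual API.
sealed trait InboxMessage
case object OnStart extends InboxMessage
case class Rpc(content: String) extends InboxMessage

class Inbox(onStartHook: () => Unit, onRpc: String => Unit) {
  private val messages = new LinkedBlockingQueue[InboxMessage]()
  messages.add(OnStart) // enqueued exactly once, at construction

  def post(m: InboxMessage): Unit = messages.add(m)

  // A real MessageLoop runs this on a dedicated thread; here we drain inline.
  def process(): Unit = while (!messages.isEmpty) messages.take() match {
    case OnStart => onStartHook()
    case Rpc(c)  => onRpc(c)
  }
}
```

Because OnStart sits at the head of the queue, any message posted afterwards is handled only after the start hook has run.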

So the method to focus on is onStart():

```scala
override def onStart() {
  assert(!registered)
  logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
    host, port, cores, Utils.megabytesToString(memory)))
  logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
  logInfo("Spark home: " + sparkHome)
  createWorkDir()
  shuffleService.startIfEnabled()
  webUi = new WorkerWebUI(this, workDir, webUiPort)
  webUi.bind()
  workerWebUiUrl = s"http://$publicAddress:${webUi.boundPort}"
  registerWithMaster()
  metricsSystem.registerSource(workerSource)
  metricsSystem.start()
  // Attach the worker metrics servlet handler to the web ui after the metrics system is started.
  metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
}
```

The registration logic lives in registerWithMaster():

```scala
private def registerWithMaster() {
  // onDisconnected may be triggered multiple times, so don't attempt registration
  // if there are outstanding registration attempts scheduled.
  registrationRetryTimer match {
    case None =>
      registered = false
      registerMasterFutures = tryRegisterAllMasters()
      connectionAttemptCount = 0
      registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
        new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            Option(self).foreach(_.send(ReregisterWithMaster))
          }
        },
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        TimeUnit.SECONDS))
    case Some(_) =>
      logInfo("Not spawning another attempt to register with the master, since there is an" +
        " attempt scheduled already.")
  }
}
```

Next, look at tryRegisterAllMasters():

```scala
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  masterRpcAddresses.map { masterAddress =>
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = {
        try {
          logInfo("Connecting to master " + masterAddress + "...")
          val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
          registerWithMaster(masterEndpoint)
        } catch {
          case ie: InterruptedException => // Cancelled
          case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
        }
      }
    })
  }
}
```

Because there may be multiple Masters (in an HA deployment), the Worker needs to send a registration message to each of them. It does so through the registerMasterThreadPool thread pool, whose size equals the number of Masters, so every registration attempt can run on its own thread.
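A stripped-down sketch of that idea (the master addresses are made up, and the registration call is replaced by recording the attempt; the real run body does setupEndpointRef plus ask):

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, Future => JFuture}

// Hypothetical HA setup with two masters; the pool is sized to match,
// so a slow or unreachable master never blocks the other attempts.
val masterAddresses = Seq("master1:7077", "master2:7077")
val registerMasterThreadPool = Executors.newFixedThreadPool(masterAddresses.size)
val attempted = ConcurrentHashMap.newKeySet[String]()

val futures: Seq[JFuture[_]] = masterAddresses.map { addr =>
  registerMasterThreadPool.submit(new Runnable {
    // Stand-in for: rpcEnv.setupEndpointRef(...) + registerWithMaster(...)
    override def run(): Unit = attempted.add(addr)
  })
}
futures.foreach(_.get()) // the real code keeps these futures so retries can be cancelled
registerMasterThreadPool.shutdown()
```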

As the code shows, registration first obtains a reference to the Master with val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME), then calls registerWithMaster(masterEndpoint):

```scala
private def registerWithMaster(masterEndpoint: RpcEndpointRef): Unit = {
  masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(
    workerId, host, port, self, cores, memory, workerWebUiUrl))
    .onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) =>
        Utils.tryLogNonFatalError {
          handleRegisterResponse(msg)
        }
      case Failure(e) =>
        logError(s"Cannot register with master: ${masterEndpoint.address}", e)
        System.exit(1)
    }(ThreadUtils.sameThread)
}
```

In registerWithMaster, the Worker uses the Master reference to call its ask method (whose flow was analyzed in an earlier post): it sends the registration message and waits for the reply. On success it proceeds to handleRegisterResponse; on failure it logs the error and exits.
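The ask pattern boils down to a Future-typed request with a success/failure branch. Below is a toy version under invented names (ask here just simulates the Master being up or down; the real code runs its callback via onComplete on ThreadUtils.sameThread and exits on failure, which a sketch obviously should not do):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// Hypothetical stand-in for masterEndpoint.ask[RegisterWorkerResponse](...)
def ask(masterUp: Boolean): Future[String] =
  Future {
    if (masterUp) "RegisteredWorker" else throw new RuntimeException("master unreachable")
  }

def register(masterUp: Boolean): String = {
  val reply = ask(masterUp)
  Await.ready(reply, 5.seconds) // block here only to make the sketch deterministic
  reply.value.get match {
    case Success(msg) => s"handleRegisterResponse($msg)" // success path
    case Failure(e)   => s"fatal: ${e.getMessage}"       // real code logs and calls System.exit(1)
  }
}
```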

2.2 The Master receives the Worker's registration message

The previous post on the Spark RPC framework analyzed the sequence diagram of ask. The Worker's registration message is sent out through a TransportClient. On the Master side, the message eventually reaches Dispatcher.postMessage, which drops it into an Inbox; Inbox.process() then handles it, and after the match/case dispatch it lands in the Master's receiveAndReply() method:

```scala
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterWorker(
      id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl) =>
    logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
      workerHost, workerPort, cores, Utils.megabytesToString(memory)))
    if (state == RecoveryState.STANDBY) {
      context.reply(MasterInStandby)
    } else if (idToWorker.contains(id)) {
      context.reply(RegisterWorkerFailed("Duplicate worker ID"))
    } else {
      val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
        workerRef, workerWebUiUrl)
      if (registerWorker(worker)) {
        persistenceEngine.addWorker(worker)
        context.reply(RegisteredWorker(self, masterWebUiUrl))
        schedule()
      } else {
        val workerAddress = worker.endpoint.address
        logWarning("Worker registration failed. Attempted to re-register worker at same " +
          "address: " + workerAddress)
        context.reply(RegisterWorkerFailed("Attempted to re-register worker at same address: "
          + workerAddress))
      }
    }
  ...
}
```

Walking through this code: if the Master is in STANDBY state, it replies with MasterInStandby; if the Worker's ID is already in the registration table, it replies with a RegisterWorkerFailed message.

Otherwise, registerWorker adds the worker to workers, and the Master replies to the Worker with a RegisteredWorker success message.

Here is the registerWorker() method:

```scala
private def registerWorker(worker: WorkerInfo): Boolean = {
  // There may be one or more refs to dead workers on this same node (w/ different ID's),
  // remove them.
  workers.filter { w =>
    (w.host == worker.host && w.port == worker.port) && (w.state == WorkerState.DEAD)
  }.foreach { w =>
    workers -= w
  }
  val workerAddress = worker.endpoint.address
  if (addressToWorker.contains(workerAddress)) {
    val oldWorker = addressToWorker(workerAddress)
    if (oldWorker.state == WorkerState.UNKNOWN) {
      // A worker registering from UNKNOWN implies that the worker was restarted during recovery.
      // The old worker must thus be dead, so we will remove it and accept the new worker.
      removeWorker(oldWorker)
    } else {
      logInfo("Attempted to re-register worker at same address: " + workerAddress)
      return false
    }
  }
  workers += worker
  idToWorker(worker.id) = worker
  addressToWorker(workerAddress) = worker
  if (reverseProxy) {
    webUi.addProxyTargets(worker.id, worker.webUiAddress)
  }
  true
}
```

2.3 After receiving the Master's success reply, the Worker sends heartbeats on a schedule

```scala
private def registerWithMaster(masterEndpoint: RpcEndpointRef): Unit = {
  masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(
    workerId, host, port, self, cores, memory, workerWebUiUrl))
    .onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) =>
        Utils.tryLogNonFatalError {
          handleRegisterResponse(msg)
        }
      case Failure(e) =>
        logError(s"Cannot register with master: ${masterEndpoint.address}", e)
        System.exit(1)
    }(ThreadUtils.sameThread)
}

private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
  msg match {
    case RegisteredWorker(masterRef, masterWebUiUrl) =>
      logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
      registered = true
      changeMaster(masterRef, masterWebUiUrl)
      forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
          self.send(SendHeartbeat)
        }
      }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
      if (CLEANUP_ENABLED) {
        logInfo(
          s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            self.send(WorkDirCleanup)
          }
        }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
      }
      val execs = executors.values.map { e =>
        new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
      }
      masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))
    case RegisterWorkerFailed(message) =>
      if (!registered) {
        logError("Worker registration failed: " + message)
        System.exit(1)
      }
    case MasterInStandby =>
      // Ignore. Master not yet ready.
  }
}

// Send a heartbeat every (heartbeat timeout) / 4 milliseconds
private val HEARTBEAT_MILLIS = conf.getLong("spark.worker.timeout", 60) * 1000 / 4
```

As the code shows, the heartbeat period is determined by HEARTBEAT_MILLIS, which is derived from the spark.worker.timeout setting: the heartbeat is sent every quarter of the timeout, and the timeout defaults to 60 seconds.
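The derivation is a one-liner worth spelling out, since the unit conversion is easy to miss (timeout is in seconds, the schedule is in milliseconds). With the default timeout of 60 seconds, a heartbeat goes out every 15 seconds:

```scala
// spark.worker.timeout is in seconds; the heartbeat period is a quarter
// of it, converted to milliseconds (mirrors HEARTBEAT_MILLIS above).
def heartbeatMillis(workerTimeoutSeconds: Long): Long =
  workerTimeoutSeconds * 1000 / 4
```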
