(Custom Edition) Lesson 18: Handling Empty RDDs in Spark Streaming and Gracefully Stopping a Streaming Application

In this lesson:

1. Handling empty RDDs in Spark Streaming

2. Ways to stop a StreamingContext

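A Spark Streaming application generates RDDs at the Batch Duration we configure. Some of the generated RDDs have partitions that contain no data, yet foreachPartition still runs on them: it acquires compute resources and performs a computation, which wastes cluster resources. The program should therefore filter out these cases at run time, as in the following code: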

package com.dt.spark.sparkstreaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OnlineForeachRDD2DB {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf() // Create the SparkConf object
    conf.setAppName("OnlineForeachRDD2DB") // Application name, visible in the monitoring UI while the program runs
    conf.setMaster("spark://Master:7077") // Run the program on the Spark cluster

    /**
     * Set the batchDuration interval, which controls how often Jobs are generated,
     * and create the entry point for Spark Streaming execution.
     */
    val ssc = new StreamingContext(conf, Seconds(300))
    val lines = ssc.socketTextStream("Master", 9999)
    val words = lines.flatMap(line => line.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

    wordCounts.foreachRDD { rdd =>
      /**
       * What goes wrong when the rdd is empty?
       * Even with no elements, foreachPartition still runs, and so does the code that
       * writes to the database or to HDFS. The rdd holds no records, but compute
       * resources are still acquired and a computation is still performed. That is
       * pure waste, so empty rdds must be handled explicitly.
       *
       * Option: rdd.count() > 0, but rdd.count() triggers a Job.
       *
       * Option: rdd.isEmpty(), but its take(1) also triggers a Job:
       *
       *   def isEmpty(): Boolean = withScope {
       *     partitions.length == 0 || take(1).length == 0
       *   }
       *
       * rdd.partitions.isEmpty only checks whether length equals 0, that is,
       * whether the rdd has any partitions at all:
       *
       *   def isEmpty: Boolean = { length == 0 }
       *
       * Note: rdd.isEmpty() and rdd.partitions.isEmpty are two different concepts.
       */
      if (rdd.partitions.length > 0) {
        rdd.foreachPartition { partitionOfRecord =>
          if (partitionOfRecord.hasNext) { // Check whether this partition actually holds data
            // ConnectionPool is not defined in this article; it is assumed to be the
            // application's own connection-pool helper class.
            val connection = ConnectionPool.getConnection()
            partitionOfRecord.foreach { record =>
              val sql = "insert into streaming_itemcount(item,rcount) values('" + record._1 + "'," + record._2 + ")"
              val stmt = connection.createStatement()
              stmt.executeUpdate(sql)
              stmt.close()
            }
            ConnectionPool.returnConnection(connection)
          }
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
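
The difference between the two emptiness checks described in the code comments can be observed in a small local experiment. The following is only a minimal sketch, assuming Spark is on the classpath and a `local[2]` master is acceptable; the object name `EmptyRddChecks` and the helper `describe` are made up for illustration and are not part of the course code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object EmptyRddChecks {
  /** Returns (has no partitions, has no records) for the given RDD. */
  def describe(rdd: RDD[Int]): (Boolean, Boolean) =
    (rdd.partitions.isEmpty, // driver-side check of the partition array, no Job
     rdd.isEmpty())          // runs take(1) when partitions exist, which triggers a Job

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EmptyRddChecks").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val noPartitions = sc.emptyRDD[Int]                     // zero partitions
      val emptyPartitions = sc.parallelize(Seq.empty[Int], 4) // four partitions, no records
      println(describe(noPartitions))    // (true,true)
      println(describe(emptyPartitions)) // (false,true)
    } finally {
      sc.stop()
    }
  }
}
```

The second case is the one that matters for the program above: an RDD can have partitions and still hold no records, which is why the per-partition `hasNext` check is needed in addition to the partition-count check.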

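2. Ways to stop a Spark Streaming application

The first way stops the application immediately, regardless of whether the received data has been processed.

The second way stops the application only after all received data has been processed. The second way is generally preferred.

The first way to stop: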

/**
 * Stop the execution of the streams immediately (does not wait for all received data
 * to be processed). By default, if `stopSparkContext` is not specified, the underlying
 * SparkContext will also be stopped. This implicit behavior can be configured using the
 * SparkConf configuration spark.streaming.stopSparkContextByDefault.
 *
 * @param stopSparkContext If true, stops the associated SparkContext. The underlying SparkContext
 *                         will be stopped regardless of whether this StreamingContext has been
 *                         started.
 */
def stop(
    stopSparkContext: Boolean = conf.getBoolean("spark.streaming.stopSparkContextByDefault", true)
   ): Unit = synchronized {
  stop(stopSparkContext, false)
}
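
To illustrate the `spark.streaming.stopSparkContextByDefault` setting mentioned in the doc comment, here is a minimal sketch, assuming a `local[2]` master; the object name `StopKeepsSparkContext` and its `run` method are hypothetical. It sets the option to `false`, calls the one-argument `stop()` with its default, and then reuses the SparkContext.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object, for illustration only.
object StopKeepsSparkContext {
  /** Stops a StreamingContext with stop() and then runs a small job on its SparkContext. */
  def run(): Long = {
    val conf = new SparkConf()
      .setAppName("StopKeepsSparkContext")
      .setMaster("local[2]")
      // With this set to false, stop() without arguments leaves the SparkContext running
      .set("spark.streaming.stopSparkContextByDefault", "false")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sc = ssc.sparkContext
    try {
      // Immediate stop; here it behaves like stop(stopSparkContext = false, stopGracefully = false)
      ssc.stop()
      // The SparkContext is still usable after the StreamingContext has been stopped
      sc.parallelize(1 to 3).count()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```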

The second way to stop:

/**
 * Stop the execution of the streams, with option of ensuring all received data
 * has been processed.
 *
 * (Note: with stopGracefully = true, the streams are stopped only after all
 * received data has been processed.)
 *
 * @param stopSparkContext if true, stops the associated SparkContext. The underlying SparkContext
 *                         will be stopped regardless of whether this StreamingContext has been
 *                         started.
 * @param stopGracefully if true, stops gracefully by waiting for the processing of all
 *                       received data to be completed
 */
def stop(stopSparkContext: Boolean, stopGracefully: Boolean): Unit = {
  var shutdownHookRefToRemove: AnyRef = null
  if (AsynchronousListenerBus.withinListenerThread.value) {
    throw new SparkException("Cannot stop StreamingContext within listener thread of" +
      " AsynchronousListenerBus")
  }
  synchronized {
    try {
      state match {
        case INITIALIZED =>
          logWarning("StreamingContext has not been started yet")
        case STOPPED =>
          logWarning("StreamingContext has already been stopped")
        case ACTIVE =>
          scheduler.stop(stopGracefully)
          // Removing the streamingSource to de-register the metrics on stop()
          env.metricsSystem.removeSource(streamingSource)
          uiTab.foreach(_.detach())
          StreamingContext.setActiveContext(null)
          waiter.notifyStop()
          if (shutdownHookRef != null) {
            shutdownHookRefToRemove = shutdownHookRef
            shutdownHookRef = null
          }
          logInfo("StreamingContext stopped successfully")
      }
    } finally {
      // The state should always be Stopped after calling `stop()`, even if we haven't started yet
      state = STOPPED
    }
  }
  if (shutdownHookRefToRemove != null) {
    ShutdownHookManager.removeShutdownHook(shutdownHookRefToRemove)
  }
  // Even if we have already stopped, we still need to attempt to stop the SparkContext because
  // a user might stop(stopSparkContext = false) and then call stop(stopSparkContext = true).
  if (stopSparkContext) sc.stop()
}
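
One possible way to call the two-argument `stop` from a running application is to poll for an external signal instead of blocking forever in `awaitTermination()`. The sketch below is an assumption for illustration, not the course's method: the object `GracefulStop`, the method `awaitOrStopOnMarker`, and the idea of a local marker file are all hypothetical. It relies on `awaitTerminationOrTimeout`, which returns `true` once the context has stopped and `false` if the timeout elapsed first.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.streaming.StreamingContext

// Hypothetical helper object, for illustration only.
object GracefulStop {
  /**
   * Blocks until the context terminates on its own or the marker file appears,
   * in which case the context is stopped gracefully.
   */
  def awaitOrStopOnMarker(ssc: StreamingContext, marker: Path, pollMs: Long = 10000L): Unit = {
    var terminated = false
    while (!terminated) {
      // true once the context has stopped, false if the timeout elapsed first
      terminated = ssc.awaitTerminationOrTimeout(pollMs)
      if (!terminated && Files.exists(marker)) {
        // Wait for all received data to be processed, then stop the SparkContext as well
        ssc.stop(stopSparkContext = true, stopGracefully = true)
        terminated = true
      }
    }
  }
}
```

In the program from the first part, this call would take the place of `ssc.awaitTermination()`, for example `GracefulStop.awaitOrStopOnMarker(ssc, java.nio.file.Paths.get("/tmp/stop-streaming"))`, where the marker path is an arbitrary placeholder. Spark also provides the configuration `spark.streaming.stopGracefullyOnShutdown`, which makes the JVM shutdown hook stop the StreamingContext gracefully.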
