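The figure below shows the whole Spark ecosystem. The bottom layer is the resource manager: a cluster manager such as Mesos or Yarn, or Spark's own Standalone mode. The underlying storage is a file system or another kind of storage system such as HBase. Spark acts as the computing framework and serves a variety of applications above it. Graphx and MLBase provide data mining services such as graph computation and iterative mining computation. Shark provides SQL query service, is compatible with Hive syntax, and is reported to run up to 50 times faster than Hive. BlinkDB is an interactive SQL query engine that trades data accuracy for shorter query response time. Both can be used for interactive queries. Spark Streaming breaks a streaming computation into a series of short batch computations and provides high reliability and throughput.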
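2. Basic principles of Spark
The Spark runtime framework is shown in the figure below. First there is the cluster resource management service (Cluster Manager) and the nodes that run job tasks (Worker Node). Then there is the Driver, the task control node of each application, and on each machine node the process that executes the actual tasks (Executor).
Compared with the MapReduce (MR) computing framework, the Executor has two advantages. First, it runs tasks in multiple threads rather than using a process model as MR does, which reduces task startup overhead. Second, each Executor has a BlockManager storage module, similar to a KV system that uses memory and disk together as storage. When a computation iterates over many rounds, intermediate data can be put in this store and read back directly the next time it is needed, without reading and writing a file system such as hdfs. In interactive query scenarios, tables can be cached in this store in advance to improve read/write IO performance. In addition, when Spark shuffles, it drops the unnecessary Sort step in scenarios such as Groupby and Join. And whereas MapReduce has only the two modes Map and Reduce, Spark offers a richer and more complete set of operations such as filter, groupby, and join.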
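The idea of a key-value block store backed by both memory and disk can be sketched in a few lines of plain Python. This is only a toy model and not Spark's BlockManager: the class name `ToyBlockStore`, the `max_memory_blocks` limit, and the block ids are all hypothetical, chosen for illustration.

```python
import os
import pickle
import tempfile


class ToyBlockStore:
    """Toy key-value block store: keep blocks in memory, spill to disk when full."""

    def __init__(self, max_memory_blocks=2):
        self.max_memory_blocks = max_memory_blocks  # hypothetical capacity limit
        self.memory = {}                            # block_id -> value
        self.disk_dir = tempfile.mkdtemp(prefix="toy_blocks_")

    def _disk_path(self, block_id):
        return os.path.join(self.disk_dir, f"{block_id}.pkl")

    def put(self, block_id, value):
        # Prefer memory; fall back to a file on disk when memory is full.
        if block_id in self.memory or len(self.memory) < self.max_memory_blocks:
            self.memory[block_id] = value
        else:
            with open(self._disk_path(block_id), "wb") as f:
                pickle.dump(value, f)

    def get(self, block_id):
        if block_id in self.memory:
            return self.memory[block_id]
        path = self._disk_path(block_id)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        return None

    def location(self, block_id):
        if block_id in self.memory:
            return "memory"
        if os.path.exists(self._disk_path(block_id)):
            return "disk"
        return None


# Store three intermediate blocks; the third no longer fits in memory.
store = ToyBlockStore(max_memory_blocks=2)
for i in range(3):
    store.put(f"block_{i}", list(range(i, i + 3)))
```

A later iteration can call `store.get("block_2")` and get the data back without knowing whether it came from memory or from disk.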
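Notes: In cluster mode, the Cluster Manager runs in one jvm process and each worker runs in another jvm process. In local cluster mode, these jvm processes are all on the same machine. In a real standalone, Mesos, or Yarn cluster, the workers and the master may be distributed across different hosts.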
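Job generation and execution
A job is generated roughly as follows:
1. The application first creates an instance of SparkContext, say sc.
2. The SparkContext instance is used to create an RDD.
3. Through a series of transformation operations, the original RDD is converted into RDDs of other types.
4. When an action is applied to the transformed RDD, the runJob method of SparkContext is called.
5. The call to sc.runJob is the starting point of the chain of reactions that follows; the key transition happens here.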
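The five steps above can be mimicked with a small pure-Python model. It is a sketch, not Spark code: `ToyContext`, `ToyRDD`, and `run_job` are hypothetical stand-ins for SparkContext, RDD, and runJob, and the only point is that transformations build new objects while the action is what reaches `run_job`.

```python
class ToyContext:
    """Stand-in for SparkContext; counts how often run_job is called."""

    def __init__(self):
        self.jobs_run = 0

    def parallelize(self, data):
        return ToyRDD(self, lambda: iter(data))

    def run_job(self, rdd):
        self.jobs_run += 1
        return list(rdd.compute())


class ToyRDD:
    def __init__(self, context, compute):
        self.context = context
        self.compute = compute  # zero-argument function returning an iterator

    # Transformations: only build a new ToyRDD, nothing is computed yet.
    def map(self, f):
        return ToyRDD(self.context, lambda: (f(x) for x in self.compute()))

    def filter(self, pred):
        return ToyRDD(self.context, lambda: (x for x in self.compute() if pred(x)))

    # Actions: hand the RDD to the context's run_job.
    def collect(self):
        return self.context.run_job(self)

    def count(self):
        return len(self.context.run_job(self))


sc = ToyContext()                                        # step 1
numbers = sc.parallelize(range(10))                      # step 2
evens_squared = (numbers.filter(lambda x: x % 2 == 0)    # step 3
                        .map(lambda x: x * x))
jobs_before_action = sc.jobs_run                         # no job has run so far
result = evens_squared.collect()                         # steps 4 and 5
```

Before `collect()` is called, `sc.jobs_run` is still 0; each action afterwards increments it by one.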
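The call path is roughly as follows:
1. sc.runJob->dagScheduler.runJob->submitJob
2. DAGScheduler::submitJob creates a JobSubmitted event and sends it to the inner class eventProcessActor
3. On receiving JobSubmitted, eventProcessActor calls the processEvent handler
4. The job is converted into stages: finalStage is generated and submitted for execution; the key call is submitStage
5. submitStage computes the dependencies between stages; dependencies are of two kinds, wide and narrow
6. If the computation finds that the current stage has no dependencies, or that all of its dependencies are ready, the tasks are submitted
7. Tasks are submitted by calling submitMissingTasks
8. Which worker a task actually runs on is managed by TaskScheduler; that is, submitMissingTasks above calls TaskScheduler::submitTasks
9. TaskSchedulerImpl creates the backend that matches Spark's current run mode; for a single-machine run it creates LocalBackend
10. LocalBackend receives the ReviveOffers event passed in by TaskSchedulerImpl
11. reviveOffers->executor.launchTask->TaskRunner.run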
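Steps 4 to 7 of the call path, where a stage is submitted only once its parent stages are ready, can be illustrated with a simplified recursive sketch. The names `ToyStage` and `ToyDAGScheduler` are hypothetical, and the model runs synchronously: it marks a stage finished as soon as its tasks are submitted, which the real scheduler does not do.

```python
class ToyStage:
    def __init__(self, name, num_partitions, parents=()):
        self.name = name
        self.num_partitions = num_partitions
        self.parents = list(parents)
        self.finished = False


class ToyDAGScheduler:
    def __init__(self):
        self.submitted = []  # (stage name, task names) in submission order

    def submit_stage(self, stage):
        if stage.finished:
            return
        # Parents that are not ready yet must be submitted first.
        for parent in stage.parents:
            if not parent.finished:
                self.submit_stage(parent)
        # No missing dependencies remain, so the tasks can be submitted.
        self.submit_missing_tasks(stage)

    def submit_missing_tasks(self, stage):
        # One task per partition of the stage.
        tasks = [f"{stage.name}-task-{i}" for i in range(stage.num_partitions)]
        self.submitted.append((stage.name, tasks))
        stage.finished = True  # simplification: treat submission as completion


map_a = ToyStage("map_a", num_partitions=2)
map_b = ToyStage("map_b", num_partitions=1)
final_stage = ToyStage("final", num_partitions=2, parents=[map_a, map_b])

scheduler = ToyDAGScheduler()
scheduler.submit_stage(final_stage)
order = [name for name, _ in scheduler.submitted]
```

Although only the final stage is submitted explicitly, `order` lists both parent stages before it.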
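Spark is written in Scala. Scala has a natural advantage in expressing functions, so it expresses complex machine learning algorithms more powerfully than other languages while staying simple and easy to understand. Spark provides a variety of operation functions for building the DAG computation model of RDDs. Every operation is treated as constructing an RDD, and an RDD represents a data set distributed over multiple machines that can carry various operation functions, as shown in the figure below:
First, text is read from an hdfs file to build an RDD. The filter() operation then filters that RDD, the map() operation takes the first field of each record, and finally the result is cached in memory so that later operations can work on the cached data. The whole process forms a DAG computation graph. Every operation step has a fault-tolerance mechanism, and data that is needed several times can be cached for later iterations.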
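The read, filter, map, cache sequence just described can be modelled in plain Python to show what caching buys. This is a toy under stated assumptions: `ToyRDD` is a hypothetical class, `read_lines` stands in for reading an hdfs file, and the tab-separated sample records are invented for the example.

```python
class ToyRDD:
    def __init__(self, compute):
        self._compute = compute   # zero-argument function returning a list
        self._use_cache = False
        self._cached = None

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def map(self, f):
        return ToyRDD(lambda: [f(x) for x in self.collect()])

    def cache(self):
        self._use_cache = True    # only marks the RDD; nothing is computed here
        return self

    def collect(self):
        if not self._use_cache:
            return self._compute()
        if self._cached is None:
            self._cached = self._compute()
        return self._cached


reads = {"count": 0}


def read_lines():
    # Stand-in for reading a text file; counts how often the source is read.
    reads["count"] += 1
    return [
        "host1\tERROR\tdisk full",
        "host2\tINFO\tstarted",
        "host3\tERROR\ttimeout",
    ]


lines = ToyRDD(read_lines)
hosts = (lines.filter(lambda line: "\tERROR\t" in line)   # keep error records
              .map(lambda line: line.split("\t")[0])      # first field
              .cache())

first = hosts.collect()    # computes the pipeline and fills the cache
second = hosts.collect()   # served from the cache, the source is not read again
```

After both calls `reads["count"]` is 1; without `.cache()` the source would be read once per `collect()`.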
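3. How Shark works
Shark is a SQL execution engine built on the Spark computing framework and compatible with Hive syntax. Because the underlying computation uses Spark, it is generally several times faster than Hive on MapReduce, faster again for SQL computed purely in memory, and more than 10 times faster when all the data is loaded in memory. Shark can therefore be used as an interactive query service.
The figure above shows the overall Shark framework. Compared with other SQL engines, besides the features it gets from Spark, Shark is fully compatible with Hive's syntax, table structures, and UDF functions, so existing HiveSql can be migrated to Shark directly.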
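Compared with Hive, Shark has the following features:
1. Tasks run as an online service, which avoids the overhead of starting and destroying task processes. In MapReduce each task normally runs by starting and then shutting down a process. In Shark, once the Server is running, all worker nodes start with it and then keep accepting tasks from the Server as resident services.
2. Groupby and Join operations do not need a Sort. When the data fits in memory, computation proceeds as the data arrives. In Hive, every operation has to Sort by Key on the way from Map to Reduce.
3. For tables with higher performance requirements, a distributed Cache system loads the table data into memory in advance. Later queries access the in-memory data directly and no longer pay disk overhead.
4. Many other Spark features are available: variables and small data can be broadcast using Torrent, the execution plan is sent directly to the Task, and intermediate data in the DAG does not need to be written to the Hdfs file system.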
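Point 2 in the list above contrasts aggregating while data arrives with sorting by key first. The difference can be sketched as follows; both functions are illustrative only and are not taken from Shark or Hive, and the sample records are invented.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_aggregate(records):
    """Sum values per key as records stream in; no sort is needed."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def sort_aggregate(records):
    """Sort all records by key first, then sum each run of equal keys."""
    ordered = sorted(records, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


records = [("b", 2), ("a", 1), ("b", 3), ("c", 4), ("a", 5)]
hashed = hash_aggregate(iter(records))   # works on a one-pass stream
sorted_result = sort_aggregate(records)  # needs the full data set before it can start
```

Both approaches give the same totals, but the hash-based version can consume a one-pass iterator, while the sort-based version must hold and sort all records before it produces anything.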
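The Spark architecture uses the Master-Slave model of distributed computing. The Master is the node in the cluster that hosts the Master process (ClusterManager), and a Slave is a node in the cluster that hosts a Worker process. The Master is the controller of the whole cluster and is responsible for keeping it running normally. A Worker is a compute node that receives commands from the master node and reports its status. The Executor executes tasks. The Client submits applications on behalf of the user, and the Driver controls the execution of one application, as shown in the figure below:
Spark architecture diagram
After a Spark cluster is deployed, the Master process and the Worker processes must be started on the master node and the slave nodes respectively to control the whole cluster. During the execution of a Spark application, the Driver and the Workers are the two important roles. The Driver program is the starting point of the application logic and is responsible for job scheduling, that is, for distributing Tasks. The Workers manage the compute nodes and create Executors to process tasks in parallel. In the execution phase, the Driver serializes each Task together with the files and jars it depends on and passes them to the corresponding Worker machine, where the Executor processes the task for the corresponding data partition.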
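Basic components of the Spark architecture:
ClusterManager: in Standalone mode this is the Master (master node), which controls the whole cluster and monitors the Workers. In YARN mode it is the resource manager.
Worker: a slave node that controls a compute node and starts the Executor or Driver. In YARN mode it is the NodeManager, which controls the compute node.
Driver: runs the main() function of the Application and creates the SparkContext.
Executor: the component that executes tasks on a worker node; it starts a thread pool to run them. Each Application has its own independent set of Executors.
SparkContext: the context of the whole application, which controls the application's life cycle.
RDD: the basic computation unit of Spark. A group of RDDs can form an executable directed acyclic graph, the RDD Graph.
DAG Scheduler: breaks a Spark job into one or more Stages. For each Stage, the number of Tasks is determined by the number of Partitions of the RDD, and the resulting Task set is handed to the TaskScheduler.
TaskScheduler: distributes tasks (Task) to Executors for execution.
Stage: a Spark job generally contains one or more Stages.
Task: a Stage contains one or more Tasks; running multiple Tasks is what provides parallelism.
Transformations: operations such as map, filter, groupBy, and join. Transformations are lazy: an operation that turns one RDD into another is not executed immediately. When Spark meets a Transformation it only records that the operation is needed, and the computation actually starts only when an Action is called.
Actions: operations such as count, collect, and save. An Action returns a result or writes RDD data to a storage system. Actions are what trigger Spark to start computing.
SparkEnv: a thread-level context that stores references to important runtime components.
SparkEnv creates and holds references to the following important components.
MapOutputTracker: stores Shuffle metadata.
BroadcastManager: controls broadcast variables and stores their metadata.
BlockManager: handles storage management, and creates and looks up blocks.
MetricsSystem: monitors runtime performance metrics.
SparkConf: stores configuration information.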
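The Executor, DAG Scheduler, and Task entries above describe one task per partition, run on a thread pool. A minimal sketch of that relationship, using only the Python standard library, might look like this. The helper names `make_tasks` and `run_task`, the sample partitions, and the pool size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tasks(partitions, func):
    """Create one task per partition: (partition index, function, data)."""
    return [(index, func, part) for index, part in enumerate(partitions)]


def run_task(task):
    index, func, part = task
    return index, func(part)


partitions = [[1, 2, 3], [4, 5], [6]]   # three partitions give three tasks
tasks = make_tasks(partitions, sum)

# The thread pool plays the role of an Executor running tasks in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(run_task, tasks))
```

The number of tasks follows the number of partitions, not the number of threads: here three tasks share two worker threads.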
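Spark execution logic diagram
In a Spark application, the whole execution flow logically forms a directed acyclic graph (DAG). Once an Action operator is triggered, all the accumulated operators are assembled into a DAG, and the scheduler then schedules the tasks on that graph for computation. Spark's scheduling differs from that of MapReduce. Spark splits the graph into different stages (Stage) according to the dependencies between RDDs, and a stage contains a pipeline of function executions. In the figure, A, B, C, D, E, and F stand for different RDDs, and the boxes inside an RDD stand for partitions. Data enters Spark from HDFS and forms RDD A and RDD C. A map operation on RDD C converts it into RDD D. RDD B and RDD E are joined to produce F, and a Shuffle is performed while B and E are joined into F. Finally, RDD F is written out through the saveAsSequenceFile function and saved to HDFS or Hbase.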
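How dependencies cut a lineage into stages can be sketched with a small grouping routine: RDDs linked by narrow dependencies share a stage, and every wide (shuffle) dependency is a stage boundary. The graph below follows the description above where it can, but several edges are assumptions because the text does not specify them: that B derives from A through a wide dependency, that E derives from D through a narrow one, and that both inputs of the join are shuffled. The function `split_into_stages` is hypothetical and is not Spark's algorithm.

```python
def split_into_stages(dependencies):
    """Group RDDs into stages.

    dependencies: list of (parent, child, kind), kind is "narrow" or "wide".
    RDDs connected by narrow dependencies end up in the same stage.
    """
    rdds = sorted({r for parent, child, _ in dependencies for r in (parent, child)})
    stage_of = {r: r for r in rdds}   # union-find parent pointers

    def find(r):
        while stage_of[r] != r:
            r = stage_of[r]
        return r

    for parent, child, kind in dependencies:
        if kind == "narrow":
            stage_of[find(parent)] = find(child)   # merge into one stage

    stages = {}
    for r in rdds:
        stages.setdefault(find(r), []).append(r)
    return sorted(stages.values())


dependencies = [
    ("A", "B", "wide"),     # assumption: not specified in the text
    ("C", "D", "narrow"),   # map
    ("D", "E", "narrow"),   # assumption: not specified in the text
    ("B", "F", "wide"),     # join with shuffle (assumed wide on this side)
    ("E", "F", "wide"),     # join with shuffle (assumed wide on this side)
]
stages = split_into_stages(dependencies)
```

Under these assumptions C, D, and E are pipelined in one stage, while A, B, and F each sit on their own side of a shuffle boundary. Changing an edge from "wide" to "narrow" merges the two stages it connects.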