
Setting Up a Spark Environment: Windows Edition

Repost: Setting Up a Spark Environment (Windows Edition)

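I have been reading up on Scala lately, which led me to Spark, so yesterday afternoon at the office I set up a Spark environment. During installation I ran into a problem for which I could not find a solution online, so I tracked down the cause myself. These are my notes.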

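1. Spark can be downloaded from the official site: http://spark.incubator.apache.org/downloads.html . I installed the latest version available at the time of writing: 0.9.

2. After downloading, extract the archive to a directory of your choice, for example d:/programs.

3. Install Scala and set the SCALA_HOME path. See the official Scala site for installation instructions.

4. Following the official Spark installation guide, run the following in the extracted directory: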

sbt/sbt package

That command is all that is needed.

However, it is written for Linux and OS X. Running it on Windows fails with:

not a valid command

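The cause is that the sbt script bundled with Spark cannot run on Windows. Download a Windows build of sbt (http://www.scala-sbt.org/), copy its files into the sbt directory under the Spark directory, and run the command again. The build will then succeed.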
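The two prerequisites above (SCALA_HOME being set, and a Windows sbt launcher sitting in Spark's sbt directory) can be checked before running the build. The following is only a minimal sketch under stated assumptions: the object name `CheckSparkBuildEnv` and the helper `problems` are hypothetical, and the launcher file name `sbt.bat` is an assumption about what the Windows sbt download contains, so adjust it to whatever files you actually copied.

```scala
import java.io.File

// Hypothetical helper, not part of Spark or sbt.
object CheckSparkBuildEnv {

  // Returns a list of human-readable problems; an empty list means both checks passed.
  def problems(env: Map[String, String], sparkDir: File): List[String] = {
    // Check 1: SCALA_HOME must be set and point at an existing directory.
    val scalaHomeIssue = env.get("SCALA_HOME").filter(_.trim.nonEmpty) match {
      case None => Some("SCALA_HOME is not set")
      case Some(p) if !new File(p).isDirectory => Some("SCALA_HOME is not a directory: " + p)
      case _ => None
    }

    // Check 2: a Windows launcher should exist in <sparkDir>/sbt.
    // Assumption: the copied launcher is named sbt.bat.
    val sbtBat = new File(new File(sparkDir, "sbt"), "sbt.bat")
    val sbtIssue =
      if (sbtBat.isFile) None
      else Some("no Windows sbt launcher found at " + sbtBat.getPath)

    List(scalaHomeIssue, sbtIssue).flatten
  }

  def main(args: Array[String]): Unit = {
    // The Spark directory is the first argument, defaulting to the current directory.
    val sparkDir = new File(args.headOption.getOrElse("."))
    val found = problems(sys.env, sparkDir)
    if (found.isEmpty) println("Environment looks ready for sbt/sbt package")
    else found.foreach(p => println("Problem: " + p))
  }
}
```

Run it with the Spark directory as the only argument. It reports problems but does not change anything on disk.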

Trying out spark-shell

scala> val textFile = sc.textFile("README.md")
14/02/14 16:38:12 INFO MemoryStore: ensureFreeSpace(35480) called with curMem=177376, maxMem=308713881
14/02/14 16:38:12 INFO MemoryStore: Block broadcast_5 stored as values to memory (estimated size 34.6 KB, free 294.2 MB)

textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[16] at textFile at <console>:12

scala> textFile.count
14/02/14 16:38:14 INFO FileInputFormat: Total input paths to process : 1
14/02/14 16:38:14 INFO SparkContext: Starting job: count at <console>:15
14/02/14 16:38:14 INFO DAGScheduler: Got job 7 (count at <console>:15) with 1 output partitions (allowLocal=false)
14/02/14 16:38:14 INFO DAGScheduler: Final stage: Stage 7 (count at <console>:15)
14/02/14 16:38:14 INFO DAGScheduler: Parents of final stage: List()
14/02/14 16:38:14 INFO DAGScheduler: Missing parents: List()
14/02/14 16:38:14 INFO DAGScheduler: Submitting Stage 7 (MappedRDD[16] at textFile at <console>:12), which has no missing parents
14/02/14 16:38:14 INFO DAGScheduler: Submitting 1 missing tasks from Stage 7 (MappedRDD[16] at textFile at <console>:12)
14/02/14 16:38:14 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
14/02/14 16:38:14 INFO TaskSetManager: Starting task 7.0:0 as TID 5 on executor localhost: localhost (PROCESS_LOCAL)
14/02/14 16:38:14 INFO TaskSetManager: Serialized task 7.0:0 as 1560 bytes in 1 ms
14/02/14 16:38:14 INFO Executor: Running task ID 5
14/02/14 16:38:14 INFO BlockManager: Found block broadcast_5 locally
14/02/14 16:38:14 INFO HadoopRDD: Input split: file:/D:/program/spark-0.9.0-incubating/README.md:0+4491
14/02/14 16:38:14 INFO Executor: Serialized size of result for 5 is 563
14/02/14 16:38:14 INFO Executor: Sending result for 5 directly to driver
14/02/14 16:38:14 INFO Executor: Finished task ID 5
14/02/14 16:38:14 INFO TaskSetManager: Finished TID 5 in 6 ms on localhost (progress: 0/1)
14/02/14 16:38:14 INFO DAGScheduler: Completed ResultTask(7, 0)
14/02/14 16:38:14 INFO TaskSchedulerImpl: Remove TaskSet 7.0 from pool
14/02/14 16:38:14 INFO DAGScheduler: Stage 7 (count at <console>:15) finished in 0.009 s
14/02/14 16:38:14 INFO SparkContext: Job finished: count at <console>:15, took 0.012329265 s
res10: Long = 119

scala> textFile.first
14/02/14 16:38:24 INFO SparkContext: Starting job: first at <console>:15
14/02/14 16:38:24 INFO DAGScheduler: Got job 8 (first at <console>:15) with 1 output partitions (allowLocal=true)
14/02/14 16:38:24 INFO DAGScheduler: Final stage: Stage 8 (first at <console>:15)
14/02/14 16:38:24 INFO DAGScheduler: Parents of final stage: List()
14/02/14 16:38:24 INFO DAGScheduler: Missing parents: List()
14/02/14 16:38:24 INFO DAGScheduler: Computing the requested partition locally
14/02/14 16:38:24 INFO HadoopRDD: Input split: file:/D:/program/spark-0.9.0-incubating/README.md:0+4491
14/02/14 16:38:24 INFO SparkContext: Job finished: first at <console>:15, took 0.002671379 s
res11: String = # Apache Spark

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = FilteredRDD[17] at filter at <console>:14

scala> textFile.filter(line=> line.contains("spark")).count
14/02/14 16:38:37 INFO SparkContext: Starting job: count at <console>:15
14/02/14 16:38:37 INFO DAGScheduler: Got job 9 (count at <console>:15) with 1 output partitions (allowLocal=false)
14/02/14 16:38:37 INFO DAGScheduler: Final stage: Stage 9 (count at <console>:15)
14/02/14 16:38:37 INFO DAGScheduler: Parents of final stage: List()
14/02/14 16:38:37 INFO DAGScheduler: Missing parents: List()
14/02/14 16:38:37 INFO DAGScheduler: Submitting Stage 9 (FilteredRDD[18] at filter at <console>:15), which has no missing parents
14/02/14 16:38:37 INFO DAGScheduler: Submitting 1 missing tasks from Stage 9 (FilteredRDD[18] at filter at <console>:15)
14/02/14 16:38:37 INFO TaskSchedulerImpl: Adding task set 9.0 with 1 tasks
14/02/14 16:38:37 INFO TaskSetManager: Starting task 9.0:0 as TID 6 on executor localhost: localhost (PROCESS_LOCAL)
14/02/14 16:38:37 INFO TaskSetManager: Serialized task 9.0:0 as 1642 bytes in 0 ms
14/02/14 16:38:37 INFO Executor: Running task ID 6
14/02/14 16:38:37 INFO BlockManager: Found block broadcast_5 locally
14/02/14 16:38:37 INFO HadoopRDD: Input split: file:/D:/program/spark-0.9.0-incubating/README.md:0+4491
14/02/14 16:38:37 INFO Executor: Serialized size of result for 6 is 563
14/02/14 16:38:37 INFO Executor: Sending result for 6 directly to driver
14/02/14 16:38:37 INFO Executor: Finished task ID 6
14/02/14 16:38:37 INFO TaskSetManager: Finished TID 6 in 10 ms on localhost (progress: 0/1)
14/02/14 16:38:37 INFO DAGScheduler: Completed ResultTask(9, 0)
14/02/14 16:38:37 INFO TaskSchedulerImpl: Remove TaskSet 9.0 from pool
14/02/14 16:38:37 INFO DAGScheduler: Stage 9 (count at <console>:15) finished in 0.010 s
14/02/14 16:38:37 INFO SparkContext: Job finished: count at <console>:15, took 0.020335125 s
res12: Long = 7
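To see what the three calls in the session compute without the logging noise, here is a rough plain-Scala analogue that uses an in-memory sequence instead of an RDD. It is a sketch, not Spark code: the object name `ReadmeOps` is hypothetical and the four sample lines are invented stand-ins for README.md, so the numbers differ from the 119 and 7 reported above.

```scala
// Hypothetical stand-in for the spark-shell session; uses only the Scala standard library.
object ReadmeOps {

  // Invented sample lines standing in for the contents of README.md.
  val lines: Seq[String] = Seq(
    "# Apache Spark",
    "",
    "Spark is a fast and general cluster computing system.",
    "Run ./bin/spark-shell to start the interactive shell."
  )

  // Analogue of textFile.count: the number of lines.
  def lineCount: Int = lines.size

  // Analogue of textFile.first: the first line.
  def firstLine: String = lines.head

  // Analogue of textFile.filter(line => line.contains(word)).count.
  // String.contains is case-sensitive.
  def countContaining(word: String): Int = lines.count(_.contains(word))

  def main(args: Array[String]): Unit = {
    println(lineCount)
    println(firstLine)
    println(countContaining("Spark")) // capital S
    println(countContaining("spark")) // lowercase s
  }
}
```

Because `contains` is case-sensitive, the two filters select different lines. The same applies to the session above: `linesWithSpark` is defined with "Spark", but the count that returns 7 filters on lowercase "spark", so that result does not describe `linesWithSpark`.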

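The official Spark site also provides four introductory screencasts, but YouTube is blocked in mainland China and cannot be viewed there, so I uploaded the four videos to Tudou for anyone who wants to watch them.

Spark Screencast 1 – Setting up a Spark environment

Spark Screencast 2 – Overview of the Spark documentation

Spark Screencast 3 – Transformations and caching

Spark Screencast 4 – A standalone job in Scala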


Posted on 2014-02-14 16:21 by Chan Chen. Categories: Scala / Java
