Spark in Practice: Writing a WordCount Program in Scala

I. Add the pom dependencies:

pom.xml

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <id>compile-scala</id>
                    <phase>compile</phase>
                    <goals>
                        <goal>add-source</goal>
                        <goal>compile</goal>
                    </goals>
                </execution>
                <execution>
                    <id>test-compile-scala</id>
                    <phase>test-compile</phase>
                    <goals>
                        <goal>add-source</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <scalaVersion>2.11.4</scalaVersion>
            </configuration>
        </plugin>
    </plugins>
</build>
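
The _2.11 suffix on the spark-core artifact must match the 2.11.x scalaVersion configured for the plugin. If in doubt about which Scala version a given environment runs, a one-line check (for example in the spark-shell):

// Prints the running Scala version, e.g. "version 2.11.8";
// it should line up with the _2.11 artifact suffix above.
println(scala.util.Properties.versionString)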

II. Write the code:

1. Local mode:

WordCount.scala

package WordCoutScala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object WordCount {

  // Entry point
  def main(args: Array[String]): Unit = {

    // Create the SparkConf object.
    // If the master is "local", the job runs in local mode and can be
    // launched directly from the IDE.
    // When submitting to a cluster, do not set the master here;
    // spark-submit supplies it.

    // Cluster mode:
    //val conf = new SparkConf().setAppName("My Scala Word Count")

    // Local mode:
    val conf = new SparkConf().setAppName("My Scala Word Count").setMaster("local")

    // Create the SparkContext object
    val sc = new SparkContext(conf)

    val result = sc.textFile("hdfs://192.168.1.120:9000/TestFile/test_WordCount.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    result.foreach(println)

    // Cluster mode:
    // val result = sc.textFile(args(0))
    //   .flatMap(_.split(" "))
    //   .map((_, 1))
    //   .reduceByKey(_ + _)
    //   .saveAsTextFile(args(1))

    sc.stop()
  }
}

Result: (screenshot of the printed word counts)
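
A note on local mode: setMaster("local") runs the job in a single thread, while "local[*]" would use all available cores. If no HDFS cluster is at hand, textFile can also read from the local filesystem; a minimal sketch, assuming a hypothetical sample file at /tmp/test_WordCount.txt:

// Hypothetical local-file variant for quick experiments:
val localResult = sc.textFile("file:///tmp/test_WordCount.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

// collect() brings the counts back to the driver before printing,
// which also works when the job is not running in local mode.
localResult.collect().foreach(println)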

2. Cluster mode:

(1) Write WordCount.scala:
package WordCoutScala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object WordCount {

  // Entry point
  def main(args: Array[String]): Unit = {

    // Create the SparkConf object.
    // If the master is "local", the job runs in local mode and can be
    // launched directly from the IDE.
    // When submitting to a cluster, do not set the master here;
    // spark-submit supplies it.

    // Cluster mode:
    val conf = new SparkConf().setAppName("My Scala Word Count")

    // Local mode:
    //val conf = new SparkConf().setAppName("My Scala Word Count").setMaster("local")

    // Create the SparkContext object
    val sc = new SparkContext(conf)

    // Local mode:
    // val result = sc.textFile("hdfs://192.168.1.120:9000/TestFile/test_WordCount.txt")
    //   .flatMap(_.split(" "))
    //   .map((_, 1))
    //   .reduceByKey(_ + _)
    //
    // result.foreach(println)

    // Cluster mode: read from args(0), write the counts to args(1)
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))

    sc.stop()
  }
}
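
Because the cluster version takes both paths from args, it is worth failing fast when they are missing; otherwise args(0) throws an ArrayIndexOutOfBoundsException inside the job. A minimal guard (a sketch, not part of the original code) placed at the top of main:

// Sketch: validate the two expected arguments before any Spark work
if (args.length < 2) {
  System.err.println("Usage: WordCount <hdfs input path> <hdfs output path>")
  System.exit(1)
}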
(2) Package:

With the scala-maven-plugin bound to the compile phases above, running mvn package compiles the Scala sources and produces the jar (here ScalaProject-1.0-SNAPSHOT.jar) under target/.

(3) Upload to a Spark node:

Copy ScalaProject-1.0-SNAPSHOT.jar to the node, e.g. to /opt/TestFile/, the path used by the submit command below.

(4) Run:

Submit the jar with spark-submit. The two trailing arguments become args(0) (the HDFS input file) and args(1) (the HDFS output directory); note that saveAsTextFile fails if the output directory already exists, so use a fresh path on each run.

bin/spark-submit --master spark://hadoop:7077 \
  --class WordCoutScala.WordCount \
  /opt/TestFile/ScalaProject-1.0-SNAPSHOT.jar \
  hdfs://hadoop:9000/TestFile/test_WordCount.txt \
  hdfs://hadoop:9000/output/1209/demo1

(5) Result:

The word counts are written as part-00000-style text files under the output directory given in args(1).
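
If a single output file is preferred over one part file per partition, the save step could coalesce first; a sketch (with the caveat that coalesce(1) funnels all data through a single task, so it only suits small results):

sc.textFile(args(0))
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .coalesce(1) // collapse to one partition, hence one part file
  .saveAsTextFile(args(1))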
