Spark SQL in Practice: Creating a DataFrame with SparkSession and Running SQL

1. Goal:

Write code in IDEA that creates a DataFrame and executes a SQL query against it.

2. Data source:

(1) student.txt (fields are tab-separated, matching the split("\t") in the code below):
1	tom	15
2	lucy	20
3	mike	18

3. Code:

(1) Add the dependencies:

pom.xml

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
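
If you build with sbt instead of Maven, the equivalent dependencies should look roughly like this (assuming scalaVersion is set to 2.11.x, so that %% resolves to the _2.11 artifacts above):

// build.sbt -- sbt equivalent of the Maven dependencies above
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",  // %% appends the Scala binary version
  "org.apache.spark" %% "spark-sql"  % "2.1.0"
)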
(2) Demo1.scala:
package day1209

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

/**
 * Create a DataFrame and run SQL against it.
 */
object Demo1 {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession
    val spark = SparkSession.builder().master("local").appName("UnderstandSparkSession").getOrCreate()

    // Read the file from the given path and split each line on tabs, producing an RDD of field arrays
    val personRDD = spark.sparkContext.textFile("/users/macbook/TestInfo/student.txt").map(_.split("\t"))

    // Define the schema
    val schema = StructType(
      List(
        StructField("id", IntegerType),
        StructField("name", StringType),
        StructField("age", IntegerType)))

    // Convert the RDD of field arrays into an RDD of Rows
    val rowRDD = personRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))

    // Create the DataFrame by pairing the row RDD with the schema
    val personDataFrame = spark.createDataFrame(rowRDD, schema)

    // Register a temporary view
    personDataFrame.createOrReplaceTempView("t_person")

    // Run the SQL query
    val df = spark.sql("select * from t_person order by age desc")

    // Print the result
    df.show()

    // Stop the SparkSession
    spark.stop()
  }
}
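
As an aside, Spark can also infer the schema by reflection from a case class, which avoids building the StructType by hand. Below is a minimal sketch under the same assumptions (same file path, tab-separated fields); the names Person and Demo1CaseClass are illustrative choices, not part of the original post:

package day1209

import org.apache.spark.sql.SparkSession

// Schema is inferred from this case class instead of a hand-built StructType
case class Person(id: Int, name: String, age: Int)

object Demo1CaseClass {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Demo1CaseClass").getOrCreate()
    import spark.implicits._ // brings rdd.toDF() into scope

    val personDF = spark.sparkContext
      .textFile("/users/macbook/TestInfo/student.txt")
      .map(_.split("\t"))
      .map(p => Person(p(0).toInt, p(1).trim, p(2).toInt))
      .toDF()

    personDF.createOrReplaceTempView("t_person")
    spark.sql("select * from t_person order by age desc").show()
    spark.stop()
  }
}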

4. Result:
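
Given the sample data above, df.show() should print something like:

+---+----+---+
| id|name|age|
+---+----+---+
|  2|lucy| 20|
|  3|mike| 18|
|  1| tom| 15|
+---+----+---+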
