The code is shown below; each step is described in the code comments:

    # -*- coding: utf-8 -*-
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql import SQLContext
    from pyspark import SparkContext

    # Initialize the data
    # Initialize a pandas DataFrame
    df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                      index=['row1', 'row2'],
                      columns=['c1', 'c2', 'c3'])
    # Print the data
    print(df)

    # Initialize a Spark DataFrame
    sc = SparkContext()
    if __name__ == "__main__":
        spark = SparkSession\
            .builder\
            .appName("testDataFrame")\
            .getOrCreate()

        sentenceData = spark.createDataFrame([
            (0.0, "I like Spark"),
            (1.0, "Pandas is useful"),
            (2.0, "They are coded by Python ")
        ], ["label", "sentence"])
        # Show the data
        sentenceData.select("label").show()

        # Convert pandas.DataFrame to spark.DataFrame
        # (SQLContext is the legacy entry point; spark.createDataFrame(df) works as well)
        sqlContext = SQLContext(sc)
        spark_df = sqlContext.createDataFrame(df)
        # Show the data
        spark_df.select("c1").show()

        # Convert spark.DataFrame to pandas.DataFrame
        pandas_df = sentenceData.toPandas()
        # Print the data
        print(pandas_df)

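
One detail the listing above does not make visible: a Spark DataFrame has no row index, so the pandas index (`row1`, `row2`) is not carried over by `createDataFrame`. Below is a minimal sketch of one way to keep it, by turning the index into an ordinary column first. The helper names `index_to_column` and `to_spark_keeping_index` and the column name `row_id` are assumptions made up for this example, not part of the original code or of the PySpark API.

```python
import pandas as pd


def index_to_column(pdf, index_name="row_id"):
    """Return a copy of `pdf` whose index has become a regular column.

    `index_name` is a hypothetical column name chosen for this sketch.
    """
    # rename_axis names the index; reset_index then moves it into the columns.
    return pdf.rename_axis(index_name).reset_index()


def to_spark_keeping_index(spark, pdf, index_name="row_id"):
    """Convert pandas -> Spark while preserving the pandas index as a column.

    `spark` is expected to be an existing SparkSession.
    """
    return spark.createDataFrame(index_to_column(pdf, index_name))


# Same sample data as in the listing above.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=['row1', 'row2'],
                  columns=['c1', 'c2', 'c3'])

# Pure pandas step; no Spark session is needed to inspect the result.
flat = index_to_column(df)
print(flat)
```

With an active session, `to_spark_keeping_index(spark, df).show()` should then list `row_id` next to `c1`, `c2` and `c3`. Going the other way, `toPandas()` returns a frame with a default integer index, so `set_index("row_id")` can restore the labels if they are needed.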
Program output: [output screenshot not included]
Original article: https://blog.csdn.net/zhurui_idea/article/details/72981715