3 files changed, +51 −1 lines changed

@@ -29,6 +29,7 @@ execute arbitrary CQL queries in your Spark applications.
 - Filters rows on the server side via the CQL `WHERE` clause
 - Allows for execution of arbitrary CQL statements
 - Plays nice with Cassandra Virtual Nodes
+- Works with PySpark DataFrames

## Version Compatibility

@@ -75,6 +76,7 @@ See [Building And Artifacts](doc/12_building_and_artifacts.md)
 - [Building And Artifacts](doc/12_building_and_artifacts.md)
 - [The Spark Shell](doc/13_spark_shell.md)
 - [DataFrames](doc/14_data_frames.md)
+- [Python](doc/15_python.md)
 - [Frequently Asked Questions](doc/FAQ.md)

## Community
@@ -144,4 +144,6 @@ df.write
   .format("org.apache.spark.sql.cassandra")
   .options(Map("table" -> "words_copy", "keyspace" -> "test"))
   .save()
-```
+```
+
+[Next - Python DataFrames](15_python.md)
New file: doc/15_python.md

# Documentation

## PySpark with Data Frames - Experimental

With the inclusion of the Cassandra Data Source, PySpark can now be used with the Connector to
access Cassandra data. This does not require DataStax Enterprise, but you are limited to
DataFrame-only operations.

### Setup

To enable Cassandra access, the Spark Cassandra Connector assembly jar must be included on both the
driver and executor classpath for the PySpark Java Gateway. This can be done by starting the PySpark
shell similarly to how the Spark shell is started:

```bash
./bin/pyspark \
  --driver-class-path spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar \
  --jars spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar
```

### Loading a DataFrame in Python

A DataFrame that links to Cassandra can be created by using the `org.apache.spark.sql.cassandra`
source and specifying keyword arguments for `keyspace` and `table`:

```python
sqlContext.read\
    .format("org.apache.spark.sql.cassandra")\
    .options(table="kv", keyspace="test")\
    .load().show()
```

```
+-+-+
|k|v|
+-+-+
|5|5|
|1|1|
|2|2|
|4|4|
|3|3|
+-+-+
```

The options and parameters are identical to the Scala DataFrames API, so please see
[DataFrames](14_data_frames.md) for more information.