A library for reading and writing data from and to [Redis](http://redis.io) with [Apache Spark](http://spark.apache.org/), for Spark SQL and DataFrames.
2
3
3
-
Spark-Redis is a connector for reading/writing from Redis cluster or non-cluster directly via Spark.
4
-
It supports all the types of Redis structures: Plain Key/Value, Hash, ZSet, Set, List.
5
-
In Spark, the data from Redis is represented as an RDD with the tolerance of reshard and down of nodes.
4
+
Spark-Redis provides access to all of Redis' data structures - String, Hash, List, Set and Sorted Set - from Spark as RDDs. The library works with both stand-alone and clustered Redis databases. When used with Redis cluster, Spark-Redis is aware of its partitioning scheme and adjusts in response to resharding and node failure events.
6
5
7
-
Integrating Redis and Spark gives us a system that combines the best of both worlds.
In order to add the Spark-Redis jar file to Spark, you can use the --jars command line option.
59
-
For example, to include it when starting the spark-shell:
51
+
52
+
### Jedis and read-only Redis cluster slave nodes
53
+
Jedis' current version - v2.7 - does not support reading from Redis cluster's slave nodes. This functionality will only be included in its upcoming version, v2.8.
54
+
55
+
To use Spark-Redis with Redis cluster's slave nodes, the library's source includes a pre-release of Jedis v2.8 under the `with-slaves` branch. Switch to that branch by entering the following before running `mvn clean install`:
56
+
```
57
+
git checkout with-slaves
58
+
```
59
+
60
+
## Using the library
61
+
Add Spark-Redis to Spark with the `--jars` command line option. For example, to use it from spark-shell, include it in the following manner:
Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_79)
72
-
Type in expressions to have them evaluated.
73
-
Type :help for more information.
74
74
...
75
75
```
76
-
To read data from Redis Server, you can use the library by loading the implicits from `com.redislabs.provider.redis._` .
77
76
78
-
In the example we can see how to read from Redis Server.
77
+
The following sections contain code snippets that demonstrate the use of Spark-Redis. To use the sample code, you'll need to replace `your.redis.server` and `6379` with your Redis database's IP address or hostname and port, respectively.
78
+
79
+
### The keys RDD
80
+
Since data access in Redis is based on keys, to use Spark-Redis you'll first need a keys RDD. The following example shows how to read key names from Redis into an RDD:
79
81
```
80
82
import com.redislabs.provider.redis._
81
-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
82
-
#keyPattern should be a plain string or a RedisRegex.
83
-
#keysRDD is a RDD holds all the keys of keyPattern of the redis server.
84
-
#keysRDD is divided into 5(default 3) partitions by hash slots.
83
+
val keysRDD = sc.fromRedisKeyPattern(("your.redis.server", 6379), "foo*", 5)
85
84
```
86
85
87
-
Using Redis' Key/Values
86
+
The above example populates the keys RDD by retrieving the key names from Redis that match the given pattern (`foo*`). Furthermore, it overrides the default setting of 3 partitions in the RDD with a new value of 5. Each partition consists of a set of Redis cluster hashslots that contain the matched key names.
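To make the relationship between key names, hashslots and partitions more concrete, here is a minimal, self-contained Scala sketch. It computes a key's hashslot the way Redis cluster defines it (CRC16 of the key, or of its `{hash tag}` if one is present, modulo 16384) and splits the slot space into contiguous ranges. The `HashSlotSketch` object and the even split in `slotRanges` are illustrative assumptions, not necessarily how Spark-Redis assigns hashslots to partitions.

```scala
object HashSlotSketch {
  // Redis cluster divides the key space into 16384 hashslots.
  val SlotCount = 16384

  // CRC16 (XMODEM variant): polynomial 0x1021, initial value 0x0000.
  def crc16(bytes: Array[Byte]): Int = {
    var crc = 0
    for (b <- bytes) {
      crc ^= (b & 0xFF) << 8
      for (_ <- 0 until 8) {
        crc =
          if ((crc & 0x8000) != 0) ((crc << 1) ^ 0x1021) & 0xFFFF
          else (crc << 1) & 0xFFFF
      }
    }
    crc
  }

  // If the key contains a non-empty {hash tag}, only the tag is hashed.
  def hashTagOrKey(key: String): String = {
    val open = key.indexOf('{')
    val close = if (open >= 0) key.indexOf('}', open + 1) else -1
    if (close > open + 1) key.substring(open + 1, close) else key
  }

  def slotOf(key: String): Int =
    crc16(hashTagOrKey(key).getBytes("UTF-8")) % SlotCount

  // Assumption for illustration: split the slots evenly into contiguous,
  // inclusive (start, end) ranges, one range per partition.
  def slotRanges(partitions: Int): Seq[(Int, Int)] = {
    require(partitions > 0, "partitions must be positive")
    (0 until partitions).map { i =>
      (i * SlotCount / partitions, (i + 1) * SlotCount / partitions - 1)
    }
  }
}
```

With 5 partitions, `HashSlotSketch.slotRanges(5)` yields five ranges that together cover slots 0 through 16383. Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, map to the same slot.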
87
+
88
+
89
+
### Reading data
90
+
91
+
Each of Redis' data types can be read into an RDD. The following snippet demonstrates reading Redis Strings.
92
+
93
+
#### Strings
94
+
88
95
```
89
96
import com.redislabs.provider.redis._
90
-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
91
-
val kvRDD = keysRDD.getKV
92
-
#kvRDD is a RDD holds all the k/v pairs whose k's pattern is keyPattern, and k must be of 'string' type in redis-server.
97
+
val keysRDD = sc.fromRedisKeyPattern(("your.redis.server", 6379), "keyPattern", 5)
98
+
99
+
val stringRDD = keysRDD.getKV
93
100
```
94
101
95
-
Using Redis' Hash
102
+
Once run, `stringRDD` will contain the string values of all keys whose names are provided in `keysRDD`. To read other data types, replace the last line in the example above with one of the following lines, according to the actual type that's used.
103
+
104
+
#### Hashes
96
105
```
97
-
import com.redislabs.provider.redis._
98
-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
99
106
val hashRDD = keysRDD.getHash
100
-
#hashRDD is a RDD holds all the dicts' contents, and the dicts' names must be of keyPattern and exists in the redis-server.
101
107
```
102
108
103
-
Using Redis' ZSet
104
-
```
105
-
import com.redislabs.provider.redis._
106
-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
107
-
val zsetRDD = keysRDD.getZSet
108
-
#zsetRDD is a RDD holds all the zsets' contents(key, score), and the zsets' names must be of keyPattern and exists in the redis-server.
109
-
```
109
+
This will populate `hashRDD` with the fields and values of the Redis Hashes given by `keysRDD`.
110
110
111
-
Using Redis' List
111
+
#### Lists
112
112
```
113
-
import com.redislabs.provider.redis._
114
-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
115
113
val listRDD = keysRDD.getList
116
-
#listRDD is a RDD holds all the lists' contents, and the lists' names must be of keyPattern and exists in the redis-server.
117
114
```
115
+
The contents (members) of the Redis Lists in `keysRDD` will be stored in `listRDD`.
118
116
119
-
Using Redis' Set
117
+
#### Sets
120
118
```
121
-
import com.redislabs.provider.redis._
122
-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
123
119
val setRDD = keysRDD.getSet
124
-
#setRDD is a RDD holds all the sets' contents, score), and the sets' names must be of keyPattern and exists in the redis-server.
125
120
```
126
121
127
-
*****
122
+
The Redis Sets' members will be written to `setRDD`.
128
123
129
-
To write data to Redis Server, you can use the library by loading the implicits from `com.redislabs.provider.redis._` .
124
+
#### Sorted Sets
125
+
```
126
+
val zsetRDD = keysRDD.getZSet
127
+
```
130
128
131
-
In the example we can see how to write to Redis Server.
129
+
Using `getZSet` populates `zsetRDD` with the members and their scores from the Redis Sorted Sets in `keysRDD`.
130
+
131
+
### Writing data
132
+
To write data from Spark to Redis, you'll need to prepare an RDD that suits the Redis data type in which the data will be stored.
133
+
134
+
#### Strings
135
+
For String values, your RDD should consist of the key-value pairs that are to be written. Assuming that the strings RDD is called `stringRDD`, use the following snippet for writing it to Redis:
132
136
133
-
Saving as Redis' Key/Values
134
137
```
135
-
import com.redislabs.provider.redis._
136
-
val kvRDD = ...
137
-
sc.toRedisKV(kvRDD, ("127.0.0.1", 7000))
138
-
#kvRDD is a RDD holds k/v pairs, we will store all the k/v pairs of kvRDD to the redis-server
138
+
...
139
+
sc.toRedisKV(stringRDD, ("your.redis.server", 6379))
139
140
```
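As a rough illustration of what such key-value pairs might look like before they become an RDD, the sketch below builds them from a plain Scala `Map`. The `StringPairsSketch` object, the sample data and the `foo:` prefix are hypothetical, and the `(String, String)` element type is an assumption about what the library expects.

```scala
object StringPairsSketch {
  // Hypothetical in-memory data: word counts keyed by word.
  val wordCounts: Map[String, Int] = Map("spark" -> 3, "redis" -> 5)

  // Redis Strings hold text, so each count is converted with toString.
  // A shared prefix keeps the keys easy to match later with a pattern such as `foo:*`.
  def toKeyValuePairs(counts: Map[String, Int], prefix: String): Seq[(String, String)] =
    counts.toSeq
      .sortBy(_._1) // sort by word for a deterministic order
      .map { case (word, n) => (prefix + word, n.toString) }

  val pairs: Seq[(String, String)] = toKeyValuePairs(wordCounts, "foo:")
}
```

In spark-shell, the resulting sequence could be turned into an RDD with `sc.parallelize(StringPairsSketch.pairs)`.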
140
141
141
-
Saving as Redis' Hash
142
+
#### Hashes
143
+
To store a Redis Hash, the RDD should consist of its field-value pairs. If the RDD is called `hashRDD`, the following should be used for storing it in the key name specified by `hashName`:
The above example demonstrates storing data in Redis in a Sorted Set. The `zsetRDD` in the example should contain pairs of members and their scores, whereas `zsetName` is the name for that key.
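Because every member of a Redis Sorted Set is unique, it may help to de-duplicate members before building the pairs. The following sketch uses plain Scala collections and keeps the highest score for each member. The `ZSetPairsSketch` object and its sample data are hypothetical, and representing scores as `Double` is an assumption; check the element type that the library expects.

```scala
object ZSetPairsSketch {
  // Hypothetical raw observations; "alice" appears twice with different scores.
  val raw: Seq[(String, Double)] =
    Seq(("alice", 10.0), ("bob", 7.5), ("alice", 12.0))

  // Keep one pair per member (the highest score), ordered by ascending score,
  // which is the default ordering of a Redis Sorted Set.
  def dedupe(pairs: Seq[(String, Double)]): Seq[(String, Double)] =
    pairs
      .groupBy(_._1)
      .map { case (member, entries) => (member, entries.map(_._2).max) }
      .toSeq
      .sortBy(_._2)

  val pairs: Seq[(String, Double)] = dedupe(raw)
}
```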