Commit c2a5a08

README edits
1 parent 1e2b145 commit c2a5a08

1 file changed

+92
-89
lines changed

README.md

@@ -1,38 +1,30 @@
11
# Spark-Redis
2+
A library for reading and writing data from and to [Redis](http://redis.io) with [Apache Spark](http://spark.apache.org/), for Spark SQL and DataFrames.
23

3-
Spark-Redis is a connector for reading/writing from Redis cluster or non-cluster directly via Spark.
4-
It supports all the types of Redis structures: Plain Key/Value, Hash, ZSet, Set, List.
5-
In Spark, the data from Redis is represented as an RDD with the tolerance of reshard and down of nodes.
4+
Spark-Redis provides access to all of Redis' data structures - String, Hash, List, Set and Sorted Set - from Spark as RDDs. The library can be used with both stand-alone and clustered Redis databases. When used with Redis cluster, Spark-Redis is aware of its partitioning scheme and adjusts in response to resharding and node failure events.
65

7-
Integrating Redis and Spark gives us a system that combines the best of both worlds.
6+
## Minimal requirements
7+
You'll need the following to use Spark-Redis:
88

9-
## Requirements
9+
- Apache Spark v1.4.0
10+
- Scala v2.10.4
11+
- Jedis v2.7
12+
- Redis v2.8.12 or v3.0.3
1013

11-
This library requires Apache Spark 1.4+, Scala 2.10.4+, Jedis 2.7+, Redis 2.8+
14+
## Known limitations
1215

13-
## Current Limitations
14-
* No Java or Python API bindings
15-
* Only tested with the following configurations:
16-
- Redis 2.8+
17-
- Scala 2.10
18-
- Spark 1.4.0
19-
- Jedis 2.7
16+
* Java, Python and R API bindings are not provided at this time
17+
* The package was only tested with the following stack:
18+
- Apache Spark v1.4.0
19+
- Scala v2.10.4
20+
- Jedis v2.7 and v2.8 pre-release (see [below](#jedis-and-read-only-redis-cluster-slave-nodes) for details)
21+
- Redis v2.8.12 and v3.0.3
2022

21-
## Enable Slaves For Reading
22-
As jedis-2.7 doesn't support `readonly` command. We must wait for the release of jedis-2.8.
23-
The pre-build jedis-2.8.0 is included in `with-slaves` branch. We can enable slaves for reading by
23+
## Additional considerations
24+
This library is a work in progress, so the API may change before the official release.
2425

25-
`git checkout with-slaves`
26-
27-
after the `git clone` in **Using the library** field
28-
29-
## Warnings
30-
* The APIs will probably change several times before an official release
31-
32-
## Using the library
33-
There are two ways of using Spark-Redis library:
34-
35-
You can use it as a maven dependency:
26+
## Getting the library
27+
You can use the Spark-Redis library by adding it as a Maven dependency to your `pom.xml` file:
3628
```
3729
<repositories>
3830
<repository>
@@ -50,13 +42,23 @@ You can use it as a maven dependency:
5042
</dependencies>
5143
```
5244

53-
There also exists the possibility of downloading the project by doing:
45+
Alternatively, you can simply download the library's source and build it:
5446
```
5547
git clone https://github.com/RedisLabs/spark-redis.git
48+
cd spark-redis
5649
mvn clean install
5750
```
58-
In order to add the Spark-Redis jar file to Spark, you can use the --jars command line option.
59-
For example, to include it when starting the spark-shell:
51+
52+
### Jedis and read-only Redis cluster slave nodes
53+
Jedis' current version - v2.7 - does not support reading from Redis cluster's slave nodes. This functionality will only be included in its upcoming version, v2.8.
54+
55+
To use Spark-Redis with Redis cluster's slave nodes, the library's source includes a pre-release of Jedis v2.8 under the `with-slaves` branch. Switch to that branch by entering the following before running `mvn clean install`:
56+
```
57+
git checkout with-slaves
58+
```
59+
60+
## Using the library
61+
Add Spark-Redis to Spark with the `--jars` command line option. For example, to use it from spark-shell, include it in the following manner:
6062

6163
```
6264
$ bin/spark-shell --jars <path-to>/spark-redis-<version>.jar,<path-to>/jedis-<version>.jar
@@ -69,103 +71,104 @@ Welcome to
6971
/_/
7072
7173
Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_79)
72-
Type in expressions to have them evaluated.
73-
Type :help for more information.
7474
...
7575
```
76-
To read data from Redis Server, you can use the library by loading the implicits from `com.redislabs.provider.redis._` .
7776

78-
In the example we can see how to read from Redis Server.
77+
The following sections contain code snippets that demonstrate the use of Spark-Redis. To use the sample code, you'll need to replace `your.redis.server` and `6379` with your Redis database's IP address or hostname and port, respectively.
78+
79+
### The keys RDD
80+
Since data access in Redis is based on keys, to use Spark-Redis you'll first need a keys RDD. The following example shows how to read key names from Redis into an RDD:
7981
```
8082
import com.redislabs.provider.redis._
81-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
82-
#keyPattern should be a plain string or a RedisRegex.
83-
#keysRDD is a RDD holds all the keys of keyPattern of the redis server.
84-
#keysRDD is divided into 5(default 3) partitions by hash slots.
83+
val keysRDD = sc.fromRedisKeyPattern(("your.redis.server", 6379), "foo*", 5)
8584
```
8685

87-
Using Redis' Key/Values
86+
The above example populates the keys RDD by retrieving the key names from Redis that match the given pattern (`foo*`). Furthermore, it overrides the default setting of 3 partitions in the RDD with a new value of 5. Each partition consists of a set of Redis cluster hash slots that contain the matched key names.
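
As a rough illustration of the partitioning described above, the sketch below splits Redis cluster's 16384 hash slots into contiguous ranges, one per partition, and computes a key's slot with CRC16, the function the Redis Cluster specification uses for key distribution. The names `HashSlotSketch`, `crc16`, `keySlot`, `slotRanges` and `partitionOf` are hypothetical helpers written for this example. Hash tags (`{...}`) are ignored, and this is not Spark-Redis' actual implementation, which may divide the slots differently.

```scala
object HashSlotSketch {
  // Redis cluster always exposes 16384 hash slots.
  val TotalSlots = 16384

  // CRC16, XMODEM variant (polynomial 0x1021, initial value 0).
  def crc16(bytes: Array[Byte]): Int = {
    var crc = 0
    for (b <- bytes) {
      crc ^= (b & 0xFF) << 8
      for (_ <- 0 until 8) {
        val shifted = crc << 1
        crc = (if ((crc & 0x8000) != 0) shifted ^ 0x1021 else shifted) & 0xFFFF
      }
    }
    crc
  }

  // Hash slot of a key; hash tags are not handled in this sketch.
  def keySlot(key: String): Int = crc16(key.getBytes("UTF-8")) % TotalSlots

  // Split all slots into numPartitions contiguous, inclusive (start, end) ranges.
  def slotRanges(numPartitions: Int): Seq[(Int, Int)] =
    (0 until numPartitions).map { i =>
      val start = i * TotalSlots / numPartitions
      val end = (i + 1) * TotalSlots / numPartitions - 1
      (start, end)
    }

  // Index of the range (partition) that contains the given slot.
  def partitionOf(slot: Int, ranges: Seq[(Int, Int)]): Int =
    ranges.indexWhere { case (start, end) => slot >= start && slot <= end }
}
```

For example, `HashSlotSketch.slotRanges(5)` yields five ranges that together cover slots 0 through 16383, and `HashSlotSketch.partitionOf(HashSlotSketch.keySlot("foo:1"), HashSlotSketch.slotRanges(5))` gives the index of the range that key would fall into under this scheme.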
87+
88+
89+
### Reading data
90+
91+
Each of Redis' data types can be read to an RDD. The following snippet demonstrates reading Redis Strings.
92+
93+
#### Strings
94+
8895
```
8996
import com.redislabs.provider.redis._
90-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
91-
val kvRDD = keysRDD.getKV
92-
#kvRDD is a RDD holds all the k/v pairs whose k's pattern is keyPattern, and k must be of 'string' type in redis-server.
97+
val keysRDD = sc.fromRedisKeyPattern(("your.redis.server", 6379), "keyPattern", 5)
98+
99+
val stringRDD = keysRDD.getKV
93100
```
94101

95-
Using Redis' Hash
102+
Once run, `stringRDD` will contain the string values of all keys whose names are provided in `keysRDD`. To read other data types, replace the last line in the example above with one of the following lines according to the actual type that's used.
103+
104+
#### Hashes
96105
```
97-
import com.redislabs.provider.redis._
98-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
99106
val hashRDD = keysRDD.getHash
100-
#hashRDD is a RDD holds all the dicts' contents, and the dicts' names must be of keyPattern and exists in the redis-server.
101107
```
102108

103-
Using Redis' ZSet
104-
```
105-
import com.redislabs.provider.redis._
106-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
107-
val zsetRDD = keysRDD.getZSet
108-
#zsetRDD is a RDD holds all the zsets' contents(key, score), and the zsets' names must be of keyPattern and exists in the redis-server.
109-
```
109+
This will populate `hashRDD` with the fields and values of the Redis Hashes given by `keysRDD`.
110110

111-
Using Redis' List
111+
#### Lists
112112
```
113-
import com.redislabs.provider.redis._
114-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
115113
val listRDD = keysRDD.getList
116-
#listRDD is a RDD holds all the lists' contents, and the lists' names must be of keyPattern and exists in the redis-server.
117114
```
115+
The contents (members) of the Redis Lists in `keysRDD` will be stored in `listRDD`.
118116

119-
Using Redis' Set
117+
#### Sets
120118
```
121-
import com.redislabs.provider.redis._
122-
val keysRDD = sc.fromRedisKeyPattern(("127.0.0.1", 7000), "keyPattern", 5)
123119
val setRDD = keysRDD.getSet
124-
#setRDD is a RDD holds all the sets' contents, score), and the sets' names must be of keyPattern and exists in the redis-server.
125120
```
126121

127-
*****
122+
The Redis Sets' members will be written to `setRDD`.
128123

129-
To write data to Redis Server, you can use the library by loading the implicits from `com.redislabs.provider.redis._` .
124+
#### Sorted Sets
125+
```
126+
val zsetRDD = keysRDD.getZSet
127+
```
130128

131-
In the example we can see how to write to Redis Server.
129+
Using `getZSet` populates `zsetRDD` with the members and scores of the Redis Sorted Sets in `keysRDD`.
130+
131+
### Writing data
132+
To write data from Spark to Redis, you'll need to prepare an RDD whose contents match the Redis data type you want to store the data in.
133+
134+
#### Strings
135+
For String values, your RDD should consist of the key-value pairs that are to be written. Assuming that the strings RDD is called `kvRDD`, use the following snippet for writing it to Redis:
132136

133-
Saving as Redis' Key/Values
134137
```
135-
import com.redislabs.provider.redis._
136-
val kvRDD = ...
137-
sc.toRedisKV(kvRDD, ("127.0.0.1", 7000))
138-
#kvRDD is a RDD holds k/v pairs, we will store all the k/v pairs of kvRDD to the redis-server
138+
...
139+
sc.toRedisKV(kvRDD, ("your.redis.server", 6379))
139140
```
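
The snippet above elides how `kvRDD` is built. A minimal sketch of one way to prepare it follows, assuming the data starts as an in-memory collection of string pairs. The object name `StringWriteSketch`, the value name `pairs` and the sample keys and values are made up for illustration. The Spark calls are left as comments so that the block compiles without a cluster, and they only make sense inside spark-shell with the Spark-Redis and Jedis jars loaded.

```scala
object StringWriteSketch {
  // Sample key-value pairs to be stored as Redis Strings (illustrative data only).
  val pairs: Seq[(String, String)] = Seq(
    "foo:1" -> "bar",
    "foo:2" -> "baz"
  )

  // In spark-shell, with a reachable Redis server, the pairs could then be
  // turned into an RDD and written:
  //   import com.redislabs.provider.redis._
  //   val kvRDD = sc.parallelize(pairs)
  //   sc.toRedisKV(kvRDD, ("your.redis.server", 6379))
}
```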
140141

141-
Saving as Redis' Hash
142+
#### Hashes
143+
To store a Redis Hash, the RDD should consist of its field-value pairs. If the RDD is called `hashRDD`, the following should be used for storing it in the key name specified by `hashName`:
144+
142145
```
143-
import com.redislabs.provider.redis._
144-
val hashRDD = ...
145-
sc.toRedisHASH(hashRDD, hashName, ("127.0.0.1", 7000))
146-
#hashRDD is a RDD holds k/v pairs, we will store all the k/v pairs of hashRDD to a dict named hashName to the redis-server
146+
...
147+
sc.toRedisHASH(hashRDD, hashName, ("your.redis.server", 6379))
147148
```
148149

149-
Saving as Redis' ZSet
150+
#### Lists
151+
Use the following to store an RDD in a Redis List:
152+
150153
```
151-
import com.redislabs.provider.redis._
152-
val zsetRDD = ...
153-
sc.toRedisZSET(zsetRDD, zsetName, ("127.0.0.1", 7000))
154-
#zsetRDD is a RDD holds k/v pairs, we will store all the k/v pairs of zsetRDD to a zset named zsetName to the redis-server
154+
sc.toRedisLIST(listRDD, listName, ("your.redis.server", 6379))
155155
```
156156

157-
Saving as Redis' List
157+
The `listRDD` is an RDD that contains all of the list's string elements in order, and `listName` is the list's key name.
158+
159+
160+
#### Sets
161+
For storing data in a Redis Set, use `toRedisSET` as follows:
162+
158163
```
159-
import com.redislabs.provider.redis._
160-
val listRDD = ...
161-
sc.toRedisLIST(listRDD, listName, ("127.0.0.1", 7000))
162-
#listRDD is a RDD holds strings, we will store all the strings of listRDD to a list named listName to the redis-server
164+
sc.toRedisSET(setRDD, setName, ("your.redis.server", 6379))
163165
```
164166

165-
Saving as Redis' Set
167+
Where `setRDD` is an RDD with the set's string elements and `setName` is the name of the key for that set.
168+
169+
#### Sorted Sets
166170
```
167-
import com.redislabs.provider.redis._
168-
val setRDD = ...
169-
sc.toRedisSET(setRDD, setName, ("127.0.0.1", 7000))
170-
#setRDD is a RDD holds strings, we will store all the unique strings of setRDD to a set named setName to the redis-server
171+
sc.toRedisZSET(zsetRDD, zsetName, ("your.redis.server", 6379))
171172
```
173+
174+
The above example demonstrates storing data in Redis in a Sorted Set. The `zsetRDD` in the example should contain pairs of members and their scores, whereas `zsetName` is the name for that key.
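
As a sketch of what such member-score pairs might look like, the block below converts a map of numeric scores into pairs of strings. It assumes that `toRedisZSET` accepts scores as strings in the same way as the other key-value style RDDs; this is not confirmed by this README and should be checked against the library's source. `SortedSetWriteSketch`, `scores`, `memberScorePairs` and the `leaderboard` key name are hypothetical, and the Spark calls are left as comments because they require spark-shell and a reachable Redis server.

```scala
object SortedSetWriteSketch {
  // Illustrative members and their numeric scores.
  val scores: Map[String, Double] = Map("alice" -> 42.0, "bob" -> 17.5)

  // (member, score) pairs, with the score rendered as a string (assumption).
  val memberScorePairs: Seq[(String, String)] =
    scores.toSeq.map { case (member, score) => (member, score.toString) }

  // In spark-shell, the pairs could then be written to a Sorted Set:
  //   import com.redislabs.provider.redis._
  //   val zsetRDD = sc.parallelize(memberScorePairs)
  //   sc.toRedisZSET(zsetRDD, "leaderboard", ("your.redis.server", 6379))
}
```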
