
Commit 19f03e1

Updated files to reflect feedback
1 parent 3232de8 commit 19f03e1

18 files changed: +13185 -58 lines
+112
@@ -0,0 +1,112 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Join Operations Exercise"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Different types of join:\n",
    "\n",
    "- Stream-stream joins\n",
    "- Stream-dataset joins\n",
    "\n",
    "DEMO: Do a demo with stream-stream joins\n",
    "DEMO: Do a demo with stream-dataset joins\n",
    "EXERCISE: Give an exercise with stream-stream joins or stream-dataset joins\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Join Operations\n",
    "\n",
    "Finally, it's worth highlighting how easily you can perform different kinds of joins in Spark Streaming.\n",
    "\n",
    "### Stream-stream joins\n",
    "\n",
    "Streams can be very easily joined with other streams.\n",
    "```python\n",
    "stream1 = ...\n",
    "stream2 = ...\n",
    "joinedStream = stream1.join(stream2)\n",
    "```\n",
    "Here, in each batch interval, the RDD generated by `stream1` will be joined with the RDD generated by `stream2`. You can also do `leftOuterJoin`, `rightOuterJoin`, and `fullOuterJoin`. Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.\n",
    "```python\n",
    "windowedStream1 = stream1.window(20)\n",
    "windowedStream2 = stream2.window(60)\n",
    "joinedStream = windowedStream1.join(windowedStream2)\n",
    "```\n",
    "\n",
    "### Stream-dataset joins\n",
    "\n",
    "This has already been shown earlier while explaining the `DStream.transform` operation. Here is yet another example, joining a windowed stream with a dataset.\n",
    "```python\n",
    "dataset = ... # some RDD\n",
    "windowedStream = stream.window(20)\n",
    "joinedStream = windowedStream.transform(lambda rdd: rdd.join(dataset))\n",
    "```\n",
    "In fact, you can also dynamically change the `dataset` you want to join against: the function provided to `transform` is evaluated every batch interval and will therefore use whichever dataset the `dataset` reference currently points to.\n",
    "\n",
    "The complete list of DStream transformations is available in the API documentation. For the Python API, see [DStream](https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference\n",
    "1. https://spark.apache.org/docs/latest/streaming-programming-guide.html#join-operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
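Since the exercise cell in this notebook ships empty, here is one possible starting point: a minimal, self-contained sketch of the windowed stream-stream join the cell above describes, using `queueStream` to feed in-memory test data. The app name, batch interval, sample records, and timeout are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical starter for the exercise: windowed stream-stream join.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "JoinExercise")
ssc = StreamingContext(sc, 5)  # 5-second batch interval (assumption)

# queueStream feeds in-memory RDDs as batches -- handy for testing.
clicks = ssc.queueStream([sc.parallelize([("user1", "click"), ("user2", "click")])])
views = ssc.queueStream([sc.parallelize([("user1", "view"), ("user3", "view")])])

# Join a 20s window of one stream with a 60s window of the other,
# mirroring the snippet in the notebook above.
joined = clicks.window(20).join(views.window(60))
joined.pprint()  # e.g. ('user1', ('click', 'view'))

ssc.start()
ssc.awaitTerminationOrTimeout(30)  # run briefly, then shut down
ssc.stop(stopSparkContext=True, stopGraceFully=False)
```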
+112
@@ -0,0 +1,112 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Stream-DataSet Join Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Different types of join:\n",
    "\n",
    "- Stream-stream joins\n",
    "- Stream-dataset joins\n",
    "\n",
    "DEMO: Do a demo with stream-stream joins\n",
    "DEMO: Do a demo with stream-dataset joins\n",
    "EXERCISE: Give an exercise with stream-stream joins or stream-dataset joins\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Join Operations\n",
    "\n",
    "Finally, it's worth highlighting how easily you can perform different kinds of joins in Spark Streaming.\n",
    "\n",
    "### Stream-stream joins\n",
    "\n",
    "Streams can be very easily joined with other streams.\n",
    "```python\n",
    "stream1 = ...\n",
    "stream2 = ...\n",
    "joinedStream = stream1.join(stream2)\n",
    "```\n",
    "Here, in each batch interval, the RDD generated by `stream1` will be joined with the RDD generated by `stream2`. You can also do `leftOuterJoin`, `rightOuterJoin`, and `fullOuterJoin`. Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.\n",
    "```python\n",
    "windowedStream1 = stream1.window(20)\n",
    "windowedStream2 = stream2.window(60)\n",
    "joinedStream = windowedStream1.join(windowedStream2)\n",
    "```\n",
    "\n",
    "### Stream-dataset joins\n",
    "\n",
    "This has already been shown earlier while explaining the `DStream.transform` operation. Here is yet another example, joining a windowed stream with a dataset.\n",
    "```python\n",
    "dataset = ... # some RDD\n",
    "windowedStream = stream.window(20)\n",
    "joinedStream = windowedStream.transform(lambda rdd: rdd.join(dataset))\n",
    "```\n",
    "In fact, you can also dynamically change the `dataset` you want to join against: the function provided to `transform` is evaluated every batch interval and will therefore use whichever dataset the `dataset` reference currently points to.\n",
    "\n",
    "The complete list of DStream transformations is available in the API documentation. For the Python API, see [DStream](https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Demo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference\n",
    "1. https://spark.apache.org/docs/latest/streaming-programming-guide.html#join-operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
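The Demo cell in this notebook is also empty; one possible concrete filling is the stream-dataset join sketch below, where `transform` exposes each batch as an RDD so an ordinary RDD join against a static pair RDD works. The sample data, app name, and timings are assumptions for illustration.

```python
# Hypothetical demo: join each batch of a stream against a static RDD.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamDatasetJoinDemo")
ssc = StreamingContext(sc, 5)  # 5-second batch interval (assumption)

# The static "dataset" side of the join: a plain pair RDD.
dataset = sc.parallelize([("a", "lookup-A"), ("b", "lookup-B")])

# Feed one test batch of (key, value) records through the stream.
stream = ssc.queueStream([sc.parallelize([("a", 1), ("b", 2), ("c", 3)])])

# transform() exposes each batch as an RDD, so a plain RDD-to-RDD join works.
joined = stream.transform(lambda rdd: rdd.join(dataset))
joined.pprint()  # e.g. ('a', (1, 'lookup-A')), ('b', (2, 'lookup-B'))

ssc.start()
ssc.awaitTerminationOrTimeout(15)
ssc.stop(stopSparkContext=True, stopGraceFully=False)
```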
+112
@@ -0,0 +1,112 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Stream-Stream Join Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Different types of join:\n",
    "\n",
    "- Stream-stream joins\n",
    "- Stream-dataset joins\n",
    "\n",
    "DEMO: Do a demo with stream-stream joins\n",
    "DEMO: Do a demo with stream-dataset joins\n",
    "EXERCISE: Give an exercise with stream-stream joins or stream-dataset joins\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Join Operations\n",
    "\n",
    "Finally, it's worth highlighting how easily you can perform different kinds of joins in Spark Streaming.\n",
    "\n",
    "### Stream-stream joins\n",
    "\n",
    "Streams can be very easily joined with other streams.\n",
    "```python\n",
    "stream1 = ...\n",
    "stream2 = ...\n",
    "joinedStream = stream1.join(stream2)\n",
    "```\n",
    "Here, in each batch interval, the RDD generated by `stream1` will be joined with the RDD generated by `stream2`. You can also do `leftOuterJoin`, `rightOuterJoin`, and `fullOuterJoin`. Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.\n",
    "```python\n",
    "windowedStream1 = stream1.window(20)\n",
    "windowedStream2 = stream2.window(60)\n",
    "joinedStream = windowedStream1.join(windowedStream2)\n",
    "```\n",
    "\n",
    "### Stream-dataset joins\n",
    "\n",
    "This has already been shown earlier while explaining the `DStream.transform` operation. Here is yet another example, joining a windowed stream with a dataset.\n",
    "```python\n",
    "dataset = ... # some RDD\n",
    "windowedStream = stream.window(20)\n",
    "joinedStream = windowedStream.transform(lambda rdd: rdd.join(dataset))\n",
    "```\n",
    "In fact, you can also dynamically change the `dataset` you want to join against: the function provided to `transform` is evaluated every batch interval and will therefore use whichever dataset the `dataset` reference currently points to.\n",
    "\n",
    "The complete list of DStream transformations is available in the API documentation. For the Python API, see [DStream](https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Demo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference\n",
    "1. https://spark.apache.org/docs/latest/streaming-programming-guide.html#join-operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
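For this notebook's empty Demo cell, a minimal per-batch stream-stream join sketch could look like the following; the keys present in both streams within a batch interval survive the inner join. Again, the app name, batch interval, and test records are illustrative assumptions.

```python
# Hypothetical demo: per-batch inner join of two pair DStreams.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamStreamJoinDemo")
ssc = StreamingContext(sc, 5)  # 5-second batch interval (assumption)

# Two test streams of (key, value) pairs, one batch each.
stream1 = ssc.queueStream([sc.parallelize([("a", 1), ("b", 2)])])
stream2 = ssc.queueStream([sc.parallelize([("a", "x"), ("c", "z")])])

# Inner join per batch interval; only keys on both sides are emitted.
# Swap in leftOuterJoin/rightOuterJoin/fullOuterJoin to keep unmatched keys.
joined = stream1.join(stream2)
joined.pprint()  # e.g. ('a', (1, 'x'))

ssc.start()
ssc.awaitTerminationOrTimeout(15)
ssc.stop(stopSparkContext=True, stopGraceFully=False)
```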
