Commit bc959b0

removing redundant files not used in lessons
1 parent 105f0df commit bc959b0

File tree

454 files changed (+4740, -34644 lines)


11 binary files not shown.

1_getting_started_with_spark_streaming/spark_app.py

-24 lines. This file was deleted.

1_getting_started_with_spark_streaming/twitter_app.py

-47 lines. This file was deleted.
@@ -0,0 +1,132 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# First Spark Streaming Example\n",
    "_____"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simple Local Example\n",
    "\n",
    "This demonstration will contain two parts. The first will be a "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import findspark"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "findspark.init('/home/matthew/spark-2.1.0-bin-hadoop2.7')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark import SparkContext\n",
    "from pyspark.streaming import StreamingContext\n",
    "\n",
    "# Create a local StreamingContext with two working threads and a batch interval of 1 second\n",
    "sc = SparkContext(\"local[2]\", \"LocalWordCount\")\n",
    "ssc = StreamingContext(sc, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a DStream that will connect to hostname:port, like localhost:9999\n",
    "# Firewalls might block this!\n",
    "lines = ssc.socketTextStream(\"localhost\", 9999)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split each line into words\n",
    "words = lines.flatMap(lambda line: line.split(\" \"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Count each word in each batch\n",
    "pairs = words.map(lambda word: (word, 1))\n",
    "wordCounts = pairs.reduceByKey(lambda x, y: x + y)\n",
    "\n",
    "# Print the first ten elements of each RDD generated in this DStream to the console\n",
    "wordCounts.pprint()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we open up a Unix terminal and type:\n",
    "\n",
    "    $ nc -lk 9999\n",
    "    hello world any text you want\n",
    "\n",
    "With this running, run the cell below, then press Ctrl+C to terminate it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ssc.start()             # Start the computation\n",
    "ssc.awaitTermination()  # Wait for the computation to terminate"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
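The pipeline the notebook builds (`flatMap` to split lines, `map` to pair each word with 1, `reduceByKey` to sum) can be sketched per micro-batch in plain Python, with no Spark cluster required. This is a hypothetical illustration of what each 1-second batch computes, not Spark's actual distributed execution; the function name `batch_word_count` is invented for the sketch:

```python
from collections import Counter

def batch_word_count(lines):
    """Mimic the notebook's DStream pipeline on a single micro-batch:
    flatMap(split) -> map(word, 1) -> reduceByKey(+)."""
    # flatMap: split every line in the batch into words
    words = [word for line in lines for word in line.split(" ")]
    # map + reduceByKey: pair each word with 1, then sum the 1s per key
    return dict(Counter(words))

# One simulated micro-batch, as if received on the socket
batch = ["hello world", "hello spark streaming"]
print(batch_word_count(batch))
# {'hello': 2, 'world': 1, 'spark': 1, 'streaming': 1}
```

`Counter` collapses the explicit pair-then-reduce steps into one pass, but the result is the same per-batch word count that `wordCounts.pprint()` displays.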

0 commit comments
