  # Apache Spark Standalone Cluster on Docker
+
  > The project just got its [own article](https://towardsdatascience.com/apache-spark-cluster-on-docker-ft-a-juyterlab-interface-418383c95445) at Towards Data Science Medium blog! :sparkles:

  This project gives you an **Apache Spark** cluster in standalone mode with a **JupyterLab** interface built on top of **Docker**.
- Learn Apache Spark through its Scala and Python API (PySpark) by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.
+ Learn Apache Spark through its **Scala**, **Python** (PySpark) and **R** (SparkR) API by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.

  <p align="center"><img src="docs/image/cluster-architecture.png"></p>

@@ -13,6 +14,7 @@ Learn Apache Spark through its Scala and Python API (PySpark) by running the Jup
  ![docker-compose-file-version](https://img.shields.io/badge/docker--compose-v1.10.0%2B-blue)
  ![spark-scala-api](https://img.shields.io/badge/spark%20api-scala-red)
  ![spark-pyspark-api](https://img.shields.io/badge/spark%20api-pyspark-red)
+ ![spark-sparkr-api](https://img.shields.io/badge/spark%20api-sparkr-red)

  ## TL;DR

@@ -33,12 +35,12 @@ docker-compose up

  ### Cluster overview

- | Application            | URL                                      | Description                                                 |
- | ---------------------- | ---------------------------------------- | ----------------------------------------------------------- |
- | JupyterLab             | [localhost:8888](http://localhost:8888/) | Cluster interface with Scala and PySpark built-in notebooks |
- | Apache Spark Master    | [localhost:8080](http://localhost:8080/) | Spark Master node                                           |
- | Apache Spark Worker I  | [localhost:8081](http://localhost:8081/) | Spark Worker node with 1 core and 512m of memory (default)  |
- | Apache Spark Worker II | [localhost:8082](http://localhost:8082/) | Spark Worker node with 1 core and 512m of memory (default)  |
+ | Application            | URL                                      | Description                                                |
+ | ---------------------- | ---------------------------------------- | ---------------------------------------------------------- |
+ | JupyterLab             | [localhost:8888](http://localhost:8888/) | Cluster interface with built-in Jupyter notebooks          |
+ | Apache Spark Master    | [localhost:8080](http://localhost:8080/) | Spark Master node                                          |
+ | Apache Spark Worker I  | [localhost:8081](http://localhost:8081/) | Spark Worker node with 1 core and 512m of memory (default) |
+ | Apache Spark Worker II | [localhost:8082](http://localhost:8082/) | Spark Worker node with 1 core and 512m of memory (default) |

  ### Prerequisites

@@ -54,7 +56,7 @@ docker-compose up
  docker-compose up
  ```

- 4. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala and PySpark examples;
+ 4. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala, PySpark and SparkR examples;
  5. Stop the cluster by typing `ctrl+c`.

  ### Build from your local machine
@@ -82,7 +84,7 @@ chmod +x build.sh ; ./build.sh
  docker-compose up
  ```

- 7. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala and PySpark examples;
+ 7. Run Apache Spark code using the provided Jupyter [notebooks](build/workspace/) with Scala, PySpark and SparkR examples;
  8. Stop the cluster by typing `ctrl+c`.

  ## <a name="tech-stack"></a>Tech Stack
@@ -93,15 +95,17 @@ docker-compose up
  | -------------- | ------- |
  | Docker Engine  | 1.13.0+ |
  | Docker Compose | 1.10.0+ |
- | Python         | 3.7     |
- | Scala          | 2.12    |
+ | Python         | 3.7.3   |
+ | Scala          | 2.12.11 |
+ | R              | 3.5.2   |

  - Jupyter Kernels

- | Component      | Version | Provider                        |
- | -------------- | ------- | ------------------------------- |
- | Python         | 2.1.4   | [Jupyter](https://jupyter.org/) |
- | Scala          | 0.10.0  | [Almond](https://almond.sh/)    |
+ | Component      | Version | Provider                                |
+ | -------------- | ------- | --------------------------------------- |
+ | Python         | 2.1.4   | [Jupyter](https://jupyter.org/)         |
+ | Scala          | 0.10.0  | [Almond](https://almond.sh/)            |
+ | R              | 1.1.1   | [IRkernel](https://irkernel.github.io/) |

  - Applications

@@ -110,18 +114,22 @@ docker-compose up
  | Apache Spark | 2.4.0 \| 2.4.4 \| 3.0.0 | **\<spark-version>**-hadoop-2.7 |
  | JupyterLab | 2.1.4 | **\<jupyterlab-version>**-spark-**\<spark-version>** |

+ > Apache Spark R API (SparkR) is only supported on version **2.4.4**. The full list of SparkR versions can be found [here](https://cran.r-project.org/src/contrib/Archive/SparkR/).
+
  ## <a name="docker-hub-metrics"></a>Docker Hub Metrics

- | Image | Latest Version Size | Downloads |
- | -------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
- | [JupyterLab](https://hub.docker.com/r/andreper/jupyterlab) | ![docker-size](https://img.shields.io/docker/image-size/andreper/jupyterlab/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/jupyterlab) |
- | [Spark Master](https://hub.docker.com/r/andreper/spark-master) | ![docker-size](https://img.shields.io/docker/image-size/andreper/spark-master/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-master) |
- | [Spark Worker](https://hub.docker.com/r/andreper/spark-worker) | ![docker-size](https://img.shields.io/docker/image-size/andreper/spark-worker/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-worker) |
+ | Image | Size | Downloads |
+ | -------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
+ | [JupyterLab](https://hub.docker.com/r/andreper/jupyterlab) | ![docker-size-jupyterlab](https://img.shields.io/docker/image-size/andreper/jupyterlab/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/jupyterlab) |
+ | [Spark Master](https://hub.docker.com/r/andreper/spark-master) | ![docker-size-master](https://img.shields.io/docker/image-size/andreper/spark-master/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-master) |
+ | [Spark Worker](https://hub.docker.com/r/andreper/spark-worker) | ![docker-size-worker](https://img.shields.io/docker/image-size/andreper/spark-worker/latest) | ![docker-pull](https://img.shields.io/docker/pulls/andreper/spark-worker) |

  ## <a name="contributing"></a>Contributing

  We'd love some help. To contribute, please read [this file](CONTRIBUTING.md).

+ > Starring us on GitHub is also an awesome way to show your support :star:
+
  ## <a name="contributors"></a>Contributors

- - **André Perez** - [dekoperez](https://twitter.com/dekoperez) - andre.marcos.perez@gmail.com
+ - **André Perez** - [dekoperez](https://twitter.com/dekoperez) - andre.marcos.perez@gmail.com
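
Once `docker-compose up` is running, the URLs listed in the cluster overview table can be probed from a script. The following is a minimal sketch and not part of the project: it uses only the Python standard library, the `ENDPOINTS` mapping is copied from the table, and the helper names `is_up` and `cluster_status` are hypothetical. It assumes each service answers a plain HTTP GET on its root URL.

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen

# URLs copied from the cluster overview table in the README.
ENDPOINTS = {
    "JupyterLab": "http://localhost:8888/",
    "Apache Spark Master": "http://localhost:8080/",
    "Apache Spark Worker I": "http://localhost:8081/",
    "Apache Spark Worker II": "http://localhost:8082/",
}


def is_up(url, timeout=3.0, opener=urlopen):
    """Return True if a GET request to `url` yields a 2xx or 3xx response.

    `opener` is injectable so the function can be exercised without a
    running cluster; it defaults to urllib's `urlopen`.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or malformed reply.
        return False


def cluster_status(endpoints=ENDPOINTS, opener=urlopen):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: is_up(url, opener=opener) for name, url in endpoints.items()}
```

Calling `cluster_status()` after the cluster has started returns one boolean per service. A `False` entry only means the HTTP request failed, so it may simply indicate that the container is still starting.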