Commit 448a192

Merge pull request #100 from cluster-apps-on-docker/feature/refactor-compose
[Feature] Refactor compose file
2 parents 0fd9f69 + da351d3 commit 448a192

10 files changed

Lines changed: 2216 additions & 30 deletions

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -23,4 +23,4 @@ parallel computing in distributed environments through our projects. :sparkles:
2323
- [x] JupyterLab R kernel;
2424
- [x] Jupyter notebook with Apache Spark R API examples;
2525
- [ ] Test coverage;
26-
- [ ] Ever growing examples.
26+
- [ ] Ever-growing examples.

README.md

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
This project gives you an **Apache Spark** cluster in standalone mode with a **JupyterLab** interface built on top of **Docker**.
1010
Learn Apache Spark through its **Scala**, **Python** (PySpark) and **R** (SparkR) API by running the Jupyter [notebooks](build/workspace/) with examples on how to read, process and write data.
1111

12-
<p align="center"><img src="docs/image/cluster-architecture.png"></p>
12+
<p align="center"><img alt="cluster-architecture" src="docs/image/cluster-architecture.png"></p>
1313

1414
![build-master](https://github.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/workflows/build-master/badge.svg)
1515
![sponsor](https://img.shields.io/badge/patreon-sponsor-ff69b4)
@@ -22,7 +22,7 @@ Learn Apache Spark through its **Scala**, **Python** (PySpark) and **R** (SparkR
2222
## TL;DR
2323

2424
```bash
25-
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
25+
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/assets/docker-compose.yml
2626
docker-compose up
2727
```
2828

@@ -53,13 +53,13 @@ docker-compose up
5353

5454
### Download from Docker Hub (easier)
5555

56-
1. Download the [docker compose](docker-compose.yml) file;
56+
1. Download the [docker compose](assets/docker-compose.yml) file;
5757

5858
```bash
59-
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
59+
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/assets/docker-compose.yml
6060
```
6161

62-
2. Edit the [docker compose](docker-compose.yml) file with your favorite tech stack version, check **apps** [supported versions](#tech-stack);
62+
2. Edit the [docker compose](assets/docker-compose.yml) file with your favorite tech stack version, check **apps** [supported versions](#tech-stack);
6363
3. Start the cluster;
6464

6565
```bash
@@ -110,16 +110,16 @@ docker-compose up
110110

111111
- Languages
112112

113-
| Spark | Hadoop | Scala | [Scala Kernel](https://almond.sh/) | Python | R |
114-
|-------|--------|---------|------------------------------------|--------|-------|
115-
| 3.5.7 | 3 | 2.12.20 | 0.14.2 | 3.12.3 | 4.3.3 |
113+
| Spark | Hadoop | Scala | Python | R |
114+
|-------|--------|---------|--------|-------|
115+
| 3.5.7 | 3 | 2.12.20 | 3.12.3 | 4.3.3 |
116116

117117
- Apps
118118

119-
| Component | Version | Docker Tag |
120-
|--------------|---------|------------------------------------------------------|
121-
| Apache Spark | 3.5.7 | **\<spark-version>** |
122-
| JupyterLab | 4.4.10 | **\<jupyterlab-version>**-spark-**\<spark-version>** |
119+
| Component | Version | Docker Tag |
120+
|--------------|---------|--------------------|
121+
| Apache Spark | 3.5.7 | 3.5.7 |
122+
| JupyterLab | 4.4.10 | 4.4.10-spark-3.5.7 |
123123

124124
## <a name="metrics"></a>Metrics
125125

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,22 +8,25 @@ volumes:
88
driver: local
99
services:
1010
jupyterlab:
11+
pull_policy: missing
1112
image: andreper/jupyterlab:4.4.10-spark-3.5.7
1213
container_name: jupyterlab
1314
ports:
1415
- "8888:8888"
16+
- "4040:4040"
1517
volumes:
1618
- shared-workspace:/opt/workspace
1719
spark-master:
20+
pull_policy: missing
1821
image: andreper/spark-master:3.5.7
1922
container_name: spark-master
2023
ports:
2124
- "8080:8080"
2225
- "7077:7077"
23-
- "4040:4040"
2426
volumes:
2527
- shared-workspace:/opt/workspace
2628
spark-worker-1:
29+
pull_policy: missing
2730
image: andreper/spark-worker:3.5.7
2831
container_name: spark-worker-1
2932
environment:
@@ -36,6 +39,7 @@ services:
3639
depends_on:
3740
- spark-master
3841
spark-worker-2:
42+
pull_policy: missing
3943
image: andreper/spark-worker:3.5.7
4044
container_name: spark-worker-2
4145
environment:
@@ -47,4 +51,4 @@ services:
4751
- shared-workspace:/opt/workspace
4852
depends_on:
4953
- spark-master
50-
...
54+
...
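
Note on the `4040` move: Spark serves its application UI on port 4040 from the driver process, so publishing that port from `jupyterlab` instead of `spark-master` fits a setup where the notebooks run the driver (an assumption about this cluster; the commit does not state its reason). The sketch below is not part of the repository. It checks a compose fragment for host ports that are published more than once; the temporary file and the `DUPLICATE_HOST_PORTS` variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: a trimmed copy of the post-commit port mappings.
compose_fragment="$(mktemp)"
cat > "$compose_fragment" <<'EOF'
services:
  jupyterlab:
    ports:
      - "8888:8888"
      - "4040:4040"
  spark-master:
    ports:
      - "8080:8080"
      - "7077:7077"
EOF

# Pull every quoted "host:container" pair, keep the host side,
# and list any host port that appears more than once.
DUPLICATE_HOST_PORTS="$(grep -oE '"[0-9]+:[0-9]+"' "$compose_fragment" \
  | tr -d '"' | cut -d: -f1 | sort | uniq -d)"

if [ -z "$DUPLICATE_HOST_PORTS" ]; then
  echo "no host port is published twice"
else
  echo "published more than once: $DUPLICATE_HOST_PORTS"
fi

rm -f "$compose_fragment"
```

The pattern only matches the quoted short syntax used in this file; long-form `ports:` entries would need a real YAML parser.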

build/build.sh

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,12 +12,11 @@ SHOULD_BUILD_BASE="$(grep -m 1 build_base build.yml | sed -E 's/.*"([^"]*)".*/\1
1212
SHOULD_BUILD_SPARK="$(grep -m 1 build_spark build.yml | sed -E 's/.*"([^"]*)".*/\1/')"
1313
SHOULD_BUILD_JUPYTERLAB="$(grep -m 1 build_jupyter build.yml | sed -E 's/.*"([^"]*)".*/\1/')"
1414

15-
SPARK_VERSION="$(grep -m 1 spark build.yml | sed -E 's/.*"([^"]*)".*/\1/')"
16-
JUPYTERLAB_VERSION="$(grep -m 1 jupyterlab build.yml | sed -E 's/.*"([^"]*)".*/\1/')"
17-
15+
SPARK_VERSION="3.5.7"
1816
HADOOP_VERSION="3"
1917
SCALA_VERSION="2.12.20"
2018
SCALA_KERNEL_VERSION="0.14.2"
19+
JUPYTERLAB_VERSION="4.4.10"
2120

2221
# ----------------------------------------------------------------------------------------------------------------------
2322
# -- Functions----------------------------------------------------------------------------------------------------------
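
With the versions now hardcoded in `build.sh`, the image tags used in the compose files follow directly from these two variables. A minimal sketch of that relationship, assuming the tag pattern from the pre-commit README table (`<jupyterlab-version>-spark-<spark-version>`); `SPARK_TAG` and `JUPYTERLAB_TAG` are hypothetical names and do not appear in the script:

```shell
#!/usr/bin/env bash
# Values as set in build.sh after this commit.
SPARK_VERSION="3.5.7"
JUPYTERLAB_VERSION="4.4.10"

# Hypothetical variables: compose the Docker tags from the versions.
SPARK_TAG="${SPARK_VERSION}"
JUPYTERLAB_TAG="${JUPYTERLAB_VERSION}-spark-${SPARK_VERSION}"

echo "spark-master:${SPARK_TAG}"
echo "spark-worker:${SPARK_TAG}"
echo "jupyterlab:${JUPYTERLAB_TAG}"
```

The printed names match the `image:` entries in `build/docker-compose.yml`.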

build/build.yml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,3 @@
1-
applications:
2-
spark: "3.5.7"
3-
jupyterlab: "4.4.10"
41
build:
52
build_base: "true"
63
build_spark: "true"
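
For reference, `build.sh` still reads the remaining `build_*` flags from this file with a `grep -m 1 ... | sed -E ...` pipeline. The sketch below shows what that pipeline extracts, run against a throwaway copy of the post-commit file. `read_quoted_value` is a hypothetical helper that wraps the same commands; it is not defined in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: take the first line of a file that contains the key
# and print the double-quoted value on that line.
read_quoted_value() {
  local key="$1" file="$2"
  grep -m 1 "$key" "$file" | sed -E 's/.*"([^"]*)".*/\1/'
}

# Illustration only: a temporary file mirroring the visible part of build.yml.
tmp_yml="$(mktemp)"
cat > "$tmp_yml" <<'EOF'
build:
  build_base: "true"
  build_spark: "true"
EOF

SHOULD_BUILD_BASE="$(read_quoted_value build_base "$tmp_yml")"
SHOULD_BUILD_SPARK="$(read_quoted_value build_spark "$tmp_yml")"
echo "build_base=${SHOULD_BUILD_BASE} build_spark=${SHOULD_BUILD_SPARK}"

rm -f "$tmp_yml"
```

Because `grep -m 1` stops at the first line containing the key as a substring, a pattern such as `spark` would match whichever line mentions it first; hardcoding the versions in `build.sh` sidesteps that ordering dependency.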

build/docker-compose.yml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,22 +8,25 @@ volumes:
88
driver: local
99
services:
1010
jupyterlab:
11+
pull_policy: never
1112
image: jupyterlab:4.4.10-spark-3.5.7
1213
container_name: jupyterlab
1314
ports:
1415
- "8888:8888"
16+
- "4040:4040"
1517
volumes:
1618
- shared-workspace:/opt/workspace
1719
spark-master:
20+
pull_policy: never
1821
image: spark-master:3.5.7
1922
container_name: spark-master
2023
ports:
2124
- "8080:8080"
2225
- "7077:7077"
23-
- "4040:4040"
2426
volumes:
2527
- shared-workspace:/opt/workspace
2628
spark-worker-1:
29+
pull_policy: never
2730
image: spark-worker:3.5.7
2831
container_name: spark-worker-1
2932
environment:
@@ -36,6 +39,7 @@ services:
3639
depends_on:
3740
- spark-master
3841
spark-worker-2:
42+
pull_policy: never
3943
image: spark-worker:3.5.7
4044
container_name: spark-worker-2
4145
environment:

build/docker/base/Dockerfile

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,12 +22,14 @@ RUN mkdir -p ${shared_workspace}/data && \
2222

2323
RUN apt-get -y update && \
2424
apt-get -y install curl python3 r-base && \
25-
apt-get -y clean
25+
apt-get -y clean && \
26+
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
2627

2728
RUN curl -L https://github.com/scala/scala/releases/download/v${scala_version}/scala-${scala_version}.deb -o scala.deb && \
2829
apt-get -y install ./scala.deb && \
2930
rm -rf scala.deb && \
30-
apt-get -y clean
31+
apt-get -y clean && \
32+
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
3133

3234
ENV SCALA_HOME="/usr/bin/scala"
3335
ENV PATH=${PATH}:${SCALA_HOME}/bin

build/docker/jupyterlab/Dockerfile

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -23,19 +23,20 @@ ARG jupyterlab_version
2323
RUN apt-get -y update && \
2424
apt-get -y install python3-pip python3-dev && \
2525
pip install wget==3.2 pyspark==${spark_version} jupyterlab==${jupyterlab_version} --break-system-packages && \
26-
apt-get -y clean
26+
apt-get -y clean && \
27+
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
2728

2829
# -- Layer: Scala kernel for Spark
2930

3031
ARG scala_version
3132
ARG scala_kernel_version
3233

33-
RUN apt-get -y install ca-certificates-java --no-install-recommends && \
34-
curl -Lo coursier https://git.io/coursier-cli && \
34+
RUN curl -Lo coursier https://git.io/coursier-cli && \
3535
chmod +x coursier && \
3636
./coursier launch --fork almond:${scala_kernel_version} --scala ${scala_version} -- --display-name "Scala ${scala_version}" --install && \
3737
rm -f coursier && \
38-
apt-get -y clean
38+
apt-get -y clean && \
39+
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
3940

4041
# -- Layer: R kernel for SparkR
4142

@@ -45,11 +46,12 @@ RUN apt-get -y install r-base-dev && \
4546
curl -L https://archive.apache.org/dist/spark/spark-${spark_version}/SparkR_${spark_version}.tar.gz -o sparkr.tar.gz && \
4647
R CMD INSTALL sparkr.tar.gz && \
4748
rm -f sparkr.tar.gz && \
48-
apt-get -y clean
49+
apt-get -y clean && \
50+
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
4951

5052
# -- Runtime
5153

52-
EXPOSE 8888
54+
EXPOSE 8888 4040
5355

5456
WORKDIR ${SHARED_WORKSPACE}
5557
CMD jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root --NotebookApp.token=
