nabilm/pyspark_mongodb_nb

Dockerfile for building a container that contains: PySpark, mongodb-hadoop, and Jupyter Notebook

Docker image contents:

  • PySpark 2.1
  • Conda
  • Jupyter Notebook
  • MongoDB Spark connector (mongo-spark-connector_2.11:2.0.0)

How to start the container?

1- Clone the repo

2- Build the Docker image from the repo directory:
        $ sudo docker build -t pyspark_mongo_nb .

3- Create a shared directory on your host:
        $ sudo mkdir /pyspark

4- Run the container:
        $ sudo docker run -d -p 8888:8888 -p 4040:4040 -p 4041:4041 -v /pyspark/:/pyspark --name pyspark_mongo_nb pyspark_mongo_nb

Note:
    - You can access the Jupyter notebook at http://localhost:8888
    - You can access the Spark UI at http://localhost:4040

To access the container as a normal user, execute:

$ sudo docker exec -it pyspark_mongo_nb bash

To access the container as the root user, execute:

$ sudo docker exec -i -u root -t pyspark_mongo_nb bash

Download from Docker Hub

$ sudo docker pull phawzy/pyspark_mongo_nb

Test that things are working

To use Spark in a Python 3 notebook, add the following code at the start of the notebook:

import os
# make sure pyspark tells workers to use python3
os.environ['PYSPARK_PYTHON'] = '/opt/conda/bin/python3'

Run the Using Spark Local Mode tutorial to verify that everything is working.
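
If that tutorial notebook is not at hand, a minimal sanity check run in a fresh cell also confirms that the local Spark context works (a sketch; any small computation will do):

from pyspark import SparkContext

# start a local Spark context and run a trivial job
sc = SparkContext(master='local[*]', appName='sanity-check')
rdd = sc.parallelize(range(1000))
print(rdd.sum())   # expect 499500
sc.stop()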

Note: the image is based on jupyter/pyspark-notebook.
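
Since the image bundles the mongo-spark connector, you can also read a MongoDB collection into a DataFrame from a notebook. The sketch below assumes a MongoDB instance reachable from the container at mongodb://mongo:27017 with a testdb.mycoll collection; both are placeholders, so point the URIs at your own deployment:

from pyspark.sql import SparkSession

# host, database, and collection below are placeholders -- adjust to your MongoDB
spark = (SparkSession.builder
         .appName('mongo-test')
         .config('spark.mongodb.input.uri', 'mongodb://mongo:27017/testdb.mycoll')
         .config('spark.mongodb.output.uri', 'mongodb://mongo:27017/testdb.mycoll')
         .getOrCreate())

# read the collection into a DataFrame via the mongo-spark connector
df = spark.read.format('com.mongodb.spark.sql.DefaultSource').load()
df.printSchema()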
