In the following tutorial I will use a four-node Raspberry Pi cluster as an example.
(And I'll use my preferred choice among the many similar options.)
i.e. the memory allocation settings are sized for a Raspberry Pi 3 with 1 GB of RAM.
After setting up the environment, I'll deploy some popular distributed computing ecosystems on it, try to write quick-start scripts for them, and maybe add some example demos.
Usage in Detail!! (Manual)
Important!! First check the user settings in configure.yaml (for deeper settings, check out the User Settings part of fabfile.py).
```bash
# Install local dependencies
python3 -m pip install -r requirements.txt

fab update-and-upgrade  # Bring apt-get up to date (this can also be done from the first-login GUI of Raspbian Buster)
fab env-setup           # Quick install of basic utilities
fab set-hostname        # Set the hostname for each node (a reboot will be needed)
fab hosts-config        # Record every node's hostname-to-IP mapping on each Raspberry Pi (otherwise they can't find each other by hostname)
```
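For reference, hosts-config boils down to every node knowing every other node's name-to-IP mapping. A minimal sketch of the resulting /etc/hosts entries (the hostnames and addresses here are made-up placeholders; the real values come from configure.yaml):

```bash
# Hypothetical /etc/hosts entries appended on every node
# (names and addresses are placeholders -- take the real ones from configure.yaml)
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.0.101 raspberrypi1
192.168.0.102 raspberrypi2
192.168.0.103 raspberrypi3
192.168.0.104 raspberrypi4
EOF
```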
```bash
fab ssh-config     # Generate an ssh key and distribute it to all nodes
fab change-passwd  # Change the password for better security (remember to also update fabfile.py if you change pi's password)
fab expand-swap    # Expand the swap (default 1024 MB; use --size=MEMSIZE to match your needs; the system default is 100 MB)
```
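On Raspbian, swap is managed by dphys-swapfile, so a manual equivalent of expand-swap might look like the following (a sketch of the underlying steps, not necessarily the task's exact implementation):

```bash
# Resize the swap file to 1024 MB via dphys-swapfile (Raspbian's swap manager)
sudo dphys-swapfile swapoff
sudo sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=1024/' /etc/dphys-swapfile
sudo dphys-swapfile setup   # Recreate the swap file at the new size
sudo dphys-swapfile swapon
```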
Regularly used functions (make sure you've generated an ssh key, or move your own ssh key to ./connection/id_rsa):

```bash
fab ssh-connect NODE_NUM          # Connect to any node by its index without a password (use the -h flag to connect as the hadoop user)
fab uploadfile file_or_dir -s -p  # Upload a file or folder to the remote nodes (use the -n=NODE_NUM flag for a specific node)
```

If you changed the default hostnames in fabfile.py or configure.yaml, make sure you also change the Hadoop configuration files in ./Files.
(If you're using cloud servers, make sure you've opened the ports that Hadoop needs.)
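For reference, the ports you will most likely need are the Hadoop 3 defaults below (a sketch using ufw; the actual ports depend on your configuration files):

```bash
sudo ufw allow 9000/tcp  # HDFS NameNode RPC (fs.defaultFS)
sudo ufw allow 9870/tcp  # NameNode web UI
sudo ufw allow 9864/tcp  # DataNode web UI
sudo ufw allow 8088/tcp  # YARN ResourceManager web UI
sudo ufw allow 8042/tcp  # YARN NodeManager web UI
```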
```bash
fab install-hadoop      # A one-button setup of the Hadoop environment on all nodes!!!
fab update-hadoop-conf  # Whenever you update a configuration file locally, push it to all nodes at once
```

(The key of the hadoop user is stored in ./connection/hadoopSSH.)
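After install-hadoop finishes and the daemons are up, a quick sanity check from the master node might look like this (standard Hadoop commands, assuming you are the hadoop user and the Hadoop binaries are on PATH):

```bash
jps                    # Should list NameNode / ResourceManager (master) or DataNode / NodeManager (workers)
hdfs dfsadmin -report  # Should show one live DataNode per worker
```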
Utility functions:

```bash
fab start-hadoop
fab restart-hadoop
fab stop-hadoop
fab status-hadoop   # Monitor Hadoop's behavior
fab example-hadoop  # If everything is done, you can play around with some official Hadoop examples
```
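For instance, the examples jar that ships with Hadoop can be run directly; a typical invocation (the jar path assumes a standard Hadoop 3 layout under $HADOOP_HOME):

```bash
# Estimate pi with 10 map tasks and 100 samples per map
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100
```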
If you changed the default hostnames in fabfile.py or configure.yaml, make sure you also change the Spark configuration files in ./Files.

```bash
fab install-spark
```

There are lots of utility functions, just like the ones for Hadoop. Check them out with `fab --list`.
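Once Spark is installed, a quick smoke test could be the bundled SparkPi example (a sketch, assuming a standard Spark layout under $SPARK_HOME):

```bash
spark-submit --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100
```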
This will be installed as the hadoop user:

```bash
fab install-jupyter
fab install-docker
fab install-codeserver
```

| Subject | Ecosystem | Purpose |
|---|---|---|
| MapReduce Practice | Hadoop | MapReduce practice with Hadoop Streaming (see the sketch after this table) |
| Spark Practice | Spark | |
| Inverted Index | | Focus on multiple inverted-index strategies for search |
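As referenced in the table, MapReduce jobs can be written in Python and run via Hadoop Streaming; a minimal sketch (mapper.py and reducer.py are hypothetical scripts, and the jar path assumes a standard Hadoop 3 layout):

```bash
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files mapper.py,reducer.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input /user/hadoop/input \
  -output /user/hadoop/output
```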
A step-by-step record of how I built this system.
- Preparation
  - Hardware purchase
  - Software packages and dependencies (PC/Laptop)
    - Python > 3.6
    - Fabric 2.X
- Assemble the hardware
- Follow the steps in Quick Setup
  - Make sure to:
    - (set up the locale)
    - update and upgrade
    - set up the environment
      - git
      - Java (JDK)
    - set up hostnames (for each node, and between each other)
    - set up ssh keys
    - expand the swap (if using a Raspberry Pi 3, or a Raspberry Pi 4 with little RAM)
- Set up Fabric (brief notes): execute shell commands remotely over SSH on all hosts at once!
  - I built some utility functions first and then moved on to setting up Hadoop.
  - Whenever a general-purpose operation is needed, I add it.
- Set up Docker Swarm - TODO
- Set up Kubernetes - TODO
- Set up Distributed TensorFlow - TODO
  - on Hadoop
  - on Kubernetes
- Set up VSCode code-server - TODO
Algorithm
Links
- Chameleon Cloud Training
- fffaraz/awesome-selfhosted-aws: A curated list of awesome self-hosted alternatives to Amazon Web Services (AWS)
Distributed TensorFlow
High Performance Computing (HPC)
Resource Manager
Intel has updated their DevCloud system; it is currently called oneAPI.
- Deal with the PySpark and Jupyter Notebook problem
- More friendly documentation
- Hadoop utility function introduction
- Dynamic configuration based on different hardware (maybe with a GUI) and saving multiple settings
  - Set up hardware details, e.g. RAM size
  - Read and write *.xml
- List some alternative notes
  - `pdsh` == `fab CMD` (see the sketch after this list)
  - `ssh-copy-id` == `ssh-config`
- Hive, HBase, Pig, ...
- Maybe a Git server
- 14+ Raspberry Pi Server Projects
- Change `apt-get` to `apt`?!
- MPI
- Dask
  - Deploy Dask Clusters — Dask documentation
  - Dask-MPI
  - Cluster managers: PBS, SLURM, LSF, SGE
  - Configuring a Distributed Dask Cluster
- Fabric alternatives
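Regarding the pdsh and ssh-copy-id notes above, the standard-tool equivalents look roughly like this (hostnames are placeholders):

```bash
pdsh -w raspberrypi[1-4] uptime  # roughly `fab CMD`: run a command on all nodes at once
ssh-copy-id pi@raspberrypi1      # roughly `fab ssh-config`, but one node at a time
```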
