
Commit 6590365

Update README.md
1 parent 80d7edc commit 6590365

File tree

1 file changed: README.md (+8 −66 lines)
@@ -1,70 +1,12 @@
-# Distributed-Something
-Run encapsulated docker containers that do... something in the Amazon Web Services (AWS) infrastructure.
-We are interested in scientific image analysis so we have used it for [CellProfiler](https://github.com/CellProfiler/Distributed-CellProfiler), [Fiji](https://github.com/CellProfiler/Distributed-Fiji), and [BioFormats2Raw](https://github.com/CellProfiler/Distributed-OmeZarrMaker).
-You can use it for whatever you want!
-
-## Documentation
-Full documentation is available on our [Documentation Website](https://distributedscience.github.io/Distributed-Something).
-
-## Overview
-
-This code is an example of how to use AWS distributed infrastructure for running anything Dockerized.
-The configuration of the AWS resources is done using boto3 and the AWS CLI.
-The worker is written in Python and is encapsulated in a Docker container.
-There are four AWS components that are minimally needed to run distributed jobs:
-
-
-1. An SQS queue
-2. An ECS cluster
-3. An S3 bucket
-4. A spot fleet of EC2 instances
-
-
-All of them can be managed individually through the AWS Management Console.
-However, this code helps to get started quickly and run a job autonomously if all the configuration is correct.
-The code runs a script that links all these components and prepares the infrastructure to run a distributed job.
-When the job is completed, the code is also able to stop resources and clean up components.
-It also adds logging and alarms via CloudWatch, helping the user troubleshoot runs and destroy stuck machines.
-
-## Running the code
-
-### Step 1
-Edit the config.py file with all the relevant information for your job.
-Then, start creating the basic AWS resources by running the following script:
+# Distributed-HelloWorld

-$ python3 run.py setup
-
-This script initializes the resources in AWS.
-Notice that the docker registry is built separately and you can modify the worker code to build your own.
-Any time you modify the worker code, you need to update the docker registry using the Makefile script inside the worker directory.
-
-### Step 2
-After the first script runs successfully, the job can now be submitted with the following command:
-
-$ python3 run.py submitJob files/exampleJob.json
-
-Running the script uploads the tasks that are configured in the json file.
-You have to customize the exampleJob.json file with information that makes sense for your project.
-You'll want to figure out which information is generic and which is the information that makes each job unique.
-
-### Step 3
-After submitting the job to the queue, we can add computing power to process all tasks in AWS.
-This code starts a fleet of spot EC2 instances which will run the worker code.
-The worker code is encapsulated in Docker containers, and the code uses ECS services to inject them into EC2.
-All this is automated with the following command:
-
-$ python3 run.py startCluster files/exampleFleet.json
-
-After the cluster is ready, the code informs you that everything is set up, and saves the spot fleet identifier in a file for further reference.
-
-### Step 4
-When the cluster is up and running, you can monitor progress using the following command:
-
-$ python3 run.py monitor files/APP_NAMESpotFleetRequestId.json
+[Distributed-Something](https://github.com/DistributedScience/Distributed-Something) is an app to run encapsulated docker containers that do... something in the Amazon Web Services (AWS) infrastructure.
+We are interested in scientific image analysis so we have used it for [CellProfiler](https://github.com/DistributedScience/Distributed-CellProfiler), [Fiji](https://github.com/DistributedScience/Distributed-Fiji), and [BioFormats2Raw](https://github.com/DistributedScience/Distributed-OmeZarrMaker).
+You can use it for whatever you want!

-The file APP_NAMESpotFleetRequestId.json is created after the cluster is set up in step 3.
-It is important to keep this monitor running if you want to automatically shut down computing resources when there are no more tasks in the queue (recommended).
+Here, as an example, we have used it to make an app that lets you say hello to the world, as well as list some of your favorite things. The full code changes are available [here](https://github.com/DistributedScience/Distributed-HelloWorld/pull/1/files).

-See our [full documentation](https://distributedscience.github.io/Distributed-Something) for more information about each step of the process.
+Happy Distributing!

-![Distributed-Something](documentation/DS-documentation/images/Distributed-Something_chronological_overview.png)
+## Documentation
+Full documentation is available on our [Documentation Website](https://distributedscience.github.io/Distributed-Something).
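
For quick reference, the four-step workflow that this commit removes from the README is still covered in the full documentation; the commands below are copied from the removed text. This is only a sketch: files/exampleJob.json and files/exampleFleet.json are the example files it mentions, and APP_NAME presumably comes from your config.py settings.

```sh
# Step 1: initialize the AWS resources (SQS queue, ECS cluster, etc.) described in config.py
python3 run.py setup

# Step 2: upload the tasks defined in your customized job file to the queue
python3 run.py submitJob files/exampleJob.json

# Step 3: start the spot fleet of EC2 instances that run the Dockerized worker via ECS
python3 run.py startCluster files/exampleFleet.json

# Step 4: keep this running to watch progress and shut down resources once the queue is empty
python3 run.py monitor files/APP_NAMESpotFleetRequestId.json
```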

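The removed text also notes that the worker's Docker image is built separately and must be updated with the Makefile in the worker directory whenever the worker code changes. A minimal sketch, assuming the Makefile's default target performs the build and push (check worker/Makefile for the actual target names):

```sh
# Rebuild the worker image and update the Docker registry after editing the worker code.
# Using the default target is an assumption; the real targets are defined in worker/Makefile.
cd worker
make
```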