
Before you get started: setting up

ErinWeisbart edited this page Oct 26, 2020 · 1 revision

Distributed-Fiji runs many jobs in parallel on EC2 instances that are managed automatically by ECS. To start jobs, you need a control node from which to submit them and monitor their progress. This section describes what you need to set up in AWS and on the control node before you begin.

1. AWS Configuration

Most of the AWS resources used by Distributed-Fiji can be configured through the AWS Web Console. The architecture of Distributed-Fiji is based on the worker pattern (https://aws.amazon.com/blogs/compute/better-together-amazon-ecs-and-aws-lambda/) for distributed systems. We have adapted and simplified that architecture for high-throughput image processing with Fiji.
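To make the worker pattern concrete, here is a small local analogy in shell. It illustrates the idea under stated assumptions and is not how Distributed-Fiji itself is implemented. A temporary directory plays the role of the job queue, and each file is one job message. Background processes play the role of workers: each one claims a job, processes it, and deletes it. The job names are hypothetical.

```shell
#!/usr/bin/env bash
# Local analogy of the worker pattern (illustration only, not Distributed-Fiji's code):
# a directory stands in for the job queue, background processes stand in for workers.
# Job names are hypothetical.
set -u
shopt -s nullglob            # an empty queue directory expands to nothing

QUEUE_DIR=$(mktemp -d)       # pending jobs ("messages")
CLAIMED_DIR=$(mktemp -d)     # jobs currently held by a worker
DONE_DIR=$(mktemp -d)        # results

# Control node: submit one message per job.
for job in plate1_A01 plate1_A02 plate1_B01 plate1_B02 plate1_C01; do
  echo "$job" > "$QUEUE_DIR/$job"
done

# Worker: claim a job, process it, delete it; exit when nothing is left to claim.
worker() {
  local id="$1" f job
  while true; do
    job=""
    for f in "$QUEUE_DIR"/*; do
      # mv within one filesystem is an atomic rename, so only one worker wins each job
      if mv "$f" "$CLAIMED_DIR/" 2>/dev/null; then
        job=$(basename "$f")
        break
      fi
    done
    [ -n "$job" ] || return 0                          # queue is empty
    echo "processed by worker $id" > "$DONE_DIR/$job"  # stand-in for the real work
    rm "$CLAIMED_DIR/$job"                             # like deleting the message after success
  done
}

# Start three workers in parallel and wait for all of them to finish.
for id in 1 2 3; do
  worker "$id" &
done
wait
ls "$DONE_DIR"
```

The workers never coordinate with each other. Each one only competes to claim the next message, so adding workers shortens the total run time. This independence is the property the worker pattern relies on to scale out.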

You need an active AWS account to proceed. Log in to your AWS account and make sure the following resources have been created:

1.1 Access keys

  • Get security credentials (an access key ID and a secret access key) for your account. Store them in a safe place that you can access later.
  • You will probably need an SSH key to log in to your EC2 instances (control or worker nodes). Generate an SSH key and store it in a safe place for later use. Alternatively, you can generate a new key pair while creating the control node; make sure to chmod 600 the private key when you download it.
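ssh ignores a private key that other users can read, which is why the chmod step matters. The minimal sketch below locks down a downloaded key. The helper name `secure_key` and the example paths are hypothetical, and `stat -c` assumes GNU coreutils, as on Linux.

```shell
#!/usr/bin/env bash
# Sketch: restrict a private key so that ssh will accept it.
# secure_key and the example paths are hypothetical, not part of Distributed-Fiji.

secure_key() {
  local key="$1"
  if [ ! -f "$key" ]; then
    echo "key file not found: $key" >&2
    return 1
  fi
  chmod 600 "$key"        # read/write for the owner only
  stat -c '%a' "$key"     # GNU stat: print the resulting mode
}

# Example usage (key name and host are placeholders; the login user depends on the AMI,
# e.g. ec2-user on Amazon Linux or ubuntu on Ubuntu):
#   secure_key ~/Downloads/df-key.pem
#   ssh -i ~/Downloads/df-key.pem ec2-user@<instance-public-dns>
```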

1.2 Roles and permissions

  • You can use your default VPC, subnet, and security groups; you should add an inbound rule to your security group that allows SSH connections from your IP address.
  • Create an ecsInstanceRole with the appropriate permissions: an S3 bucket access policy plus the CloudWatchFullAccess, CloudWatchActionsEC2Access, and AmazonEC2ContainerServiceforEC2Role policies, with ec2.amazonaws.com as a Trusted Entity.
  • Create an aws-ec2-spot-fleet-tagging-role with the appropriate permissions (it only needs AmazonEC2SpotFleetTaggingRole); make sure the "Trust Relationships" tab lists "spotfleet.amazonaws.com" rather than "ec2.amazonaws.com" (edit it if necessary). In the current interface, it's easiest to click "Create role", select "EC2" from the main service list, then select "EC2 - Spot Fleet Tagging".
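If you prefer the command line to the console, the two roles can be sketched with the AWS CLI as follows. The managed-policy ARNs are written as we understand them, so check them in the IAM console before running. The trust-policy file names are hypothetical, and your own S3 bucket access policy still has to be attached separately. By default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Sketch: create ecsInstanceRole and aws-ec2-spot-fleet-tagging-role with the AWS CLI.
# Prints the commands by default (DRY_RUN=1); run with DRY_RUN=0 to execute them.
# Trust-policy file names are hypothetical.
set -eu
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Trust policy: lets the given AWS service assume the role.
write_trust_policy() {
  local service="$1" out="$2"
  cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "$service" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

write_trust_policy ec2.amazonaws.com ecs-trust.json
write_trust_policy spotfleet.amazonaws.com spotfleet-trust.json

# ecsInstanceRole: trusted by EC2, with the ECS and CloudWatch managed policies attached.
run aws iam create-role --role-name ecsInstanceRole \
  --assume-role-policy-document file://ecs-trust.json
for arn in \
  arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role \
  arn:aws:iam::aws:policy/CloudWatchFullAccess \
  arn:aws:iam::aws:policy/CloudWatchActionsEC2Access; do
  run aws iam attach-role-policy --role-name ecsInstanceRole --policy-arn "$arn"
done
# (Attach your own S3 bucket access policy to ecsInstanceRole as well.)

# Unlike the console, the CLI does not create a matching instance profile automatically.
run aws iam create-instance-profile --instance-profile-name ecsInstanceRole
run aws iam add-role-to-instance-profile --instance-profile-name ecsInstanceRole \
  --role-name ecsInstanceRole

# aws-ec2-spot-fleet-tagging-role: trusted by spotfleet.amazonaws.com, not ec2.amazonaws.com.
run aws iam create-role --role-name aws-ec2-spot-fleet-tagging-role \
  --assume-role-policy-document file://spotfleet-trust.json
run aws iam attach-role-policy --role-name aws-ec2-spot-fleet-tagging-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole
```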

1.3 Auxiliary Resources

1.4 Primary Resources

The following five resources are ones you will interact with constantly while working with Distributed-Fiji. Although you don't need to create anything in them at this point, you may want to open each console in a separate browser tab to keep them handy and monitor Distributed-Fiji's behavior.

1.5 Spot Limits

AWS initially limits the number of Spot Instances you can run at one time; you can request an increase through the process described in the linked documentation.
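The limit can also be checked and raised through the Service Quotas CLI, as in the sketch below. The quota code `L-34B43A08` is, to our knowledge, the code for "All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests"; confirm it in the Service Quotas console. The desired value of 256 is only an example, so check the unit that `get-service-quota` reports before choosing one. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: look up and request an increase of the Spot Instance quota.
# Prints the commands by default (DRY_RUN=1); set DRY_RUN=0 to run them.
# QUOTA_CODE is an assumption to verify; 256 is an example value.
set -eu
DRY_RUN="${DRY_RUN:-1}"
QUOTA_CODE="${QUOTA_CODE:-L-34B43A08}"
DESIRED_VALUE="${1:-256}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Current quota in the region configured with `aws configure`
run aws service-quotas get-service-quota --service-code ec2 --quota-code "$QUOTA_CODE"

# Ask AWS to raise it
run aws service-quotas request-service-quota-increase --service-code ec2 \
  --quota-code "$QUOTA_CODE" --desired-value "$DESIRED_VALUE"
```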

2. The Control Node

The control node can be your local machine, if it is configured properly, or a small instance in AWS. It needs the following tools to run Distributed-Fiji successfully. Here we assume you are using the command line on a Linux machine, but you are free to try other operating systems too. Install the following packages on the control node.

2.1 Make your own

2.1.1 Clone this repo

You will need local copies of the Distributed-Fiji scripts on your control node.

    sudo apt-get install git
    git clone https://github.com/bethac07/Distributed-Fiji.git
    cd Distributed-Fiji/
    git pull

2.1.2 Python 2.7 or higher

Most scripts are written in Python, so you need to make sure your control node can run it without trouble. The following commands install Python on Ubuntu:

    sudo apt-get update
    sudo apt-get install build-essential checkinstall
    # Python interpreter, headers and setuptools (build-essential is installed above)
    sudo apt-get install python python-dev python-setuptools
    sudo easy_install pip
    sudo apt-get install fabric
    sudo pip install --upgrade setuptools

After Python has been installed, install the requirements for Distributed-Fiji by running the following from the directory that contains your clone of the repository:

    cd Distributed-Fiji/files
    sudo pip install -r requirements.txt

2.1.3 AWS CLI

The command line interface is the main way the control node interacts with the resources in AWS. You need to install awscli for Distributed-Fiji to work properly:

    sudo pip install awscli --ignore-installed six
    sudo pip install --upgrade awscli
    aws configure

When running the last step, you will need to enter your AWS credentials. Make sure to set the region correctly (i.e. a region such as us-west-1 or us-east-1, not an Availability Zone such as eu-west-2a), and set the default output format to json.
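A common mistake is entering an Availability Zone instead of a region. The sketch below checks the name before setting the defaults with the non-interactive `aws configure set` command. The helpers `is_region` and `set_defaults` are hypothetical. The pattern covers the usual shape of region names, including names like us-gov-west-1, but it may not match every region AWS adds in the future.

```shell
#!/usr/bin/env bash
# Sketch: reject Availability Zone names (e.g. eu-west-2a) where a region (e.g. eu-west-2)
# is expected. is_region and set_defaults are hypothetical helpers.

is_region() {
  [[ "$1" =~ ^[a-z]{2}(-gov)?-[a-z]+-[0-9]+$ ]]
}

set_defaults() {
  local region="$1"
  if ! is_region "$region"; then
    echo "'$region' does not look like a region name (is it an Availability Zone?)" >&2
    return 1
  fi
  # Non-interactive equivalents of the region and output-format prompts of `aws configure`
  aws configure set region "$region"
  aws configure set output json
}

# Example usage:
#   set_defaults us-east-1
```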

2.1.4 cloud-image-utils

This package is used to create the boothook that lets you specify how much disk space the Docker container gets for temporary files. By default, each Docker container uses 10GB of disk space, which is often sufficient but can run out when a single job writes out many measurements.

    sudo apt-get install cloud-image-utils
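A boothook of this kind might look like the sketch below. It is an assumption modeled on the cloud-boothook approach AWS has documented for ECS hosts that use Docker's devicemapper storage driver, and it is not Distributed-Fiji's own code. The helper `make_boothook`, the file names and the 20G size are hypothetical. `write-mime-multipart`, which comes from cloud-image-utils, wraps the hook as EC2 user data.

```shell
#!/usr/bin/env bash
# Sketch (assumption): a cloud-boothook that raises Docker's per-container base size
# on devicemapper-backed hosts, wrapped as user data with write-mime-multipart.
# make_boothook, file names and the size are hypothetical.
set -eu

make_boothook() {
  local size_gb="$1" out="$2"
  # \${OPTIONS} stays literal here; it is expanded later, on the instance
  cat > "$out" <<EOF
#cloud-boothook
#!/bin/bash
cloud-init-per once docker_options echo 'OPTIONS="\${OPTIONS} --storage-opt dm.basesize=${size_gb}G"' >> /etc/sysconfig/docker
EOF
}

make_boothook 20 boothook.txt

if command -v write-mime-multipart >/dev/null 2>&1; then
  write-mime-multipart --output=userdata.txt boothook.txt:text/cloud-boothook
else
  echo "write-mime-multipart not found; install cloud-image-utils" >&2
fi
```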

2.1.5 Parallel (optional)

Parallel (GNU Parallel) is an optional tool you can install on your control node to help generate job files.

    sudo apt-get install parallel
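One way Parallel helps here is by enumerating every combination of inputs, such as plates and wells, as raw material for a job file. The sketch below shows this. The plate and well names and the `list_jobs` helper are hypothetical, and the actual job-file format is not shown. The helper falls back to plain nested loops, which produce the same order, when GNU Parallel is not installed.

```shell
#!/usr/bin/env bash
# Sketch: list every plate/well combination, one per line, as input for a job file.
# list_jobs and the plate/well names are hypothetical.

list_jobs() {
  local plates="$1" wells="$2"
  if command -v parallel >/dev/null 2>&1 && parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Two ::: sources give every combination; -k keeps input order; {1}/{2} are the sources
    parallel -k --will-cite echo '{1},{2}' ::: $plates ::: $wells
  else
    local p w
    for p in $plates; do
      for w in $wells; do
        echo "$p,$w"
      done
    done
  fi
}

list_jobs "Plate1 Plate2" "A01 A02"
```

Either branch prints `Plate1,A01`, `Plate1,A02`, `Plate2,A01` and `Plate2,A02`, one per line.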

2.1.6 s3fs-fuse (optional)

s3fs-fuse allows you to mount your S3 bucket as a pseudo-file system. It does not perform as well as a real file system, but it lets you easily access all the files in your S3 bucket. Follow the instructions at the link to mount your bucket.
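As a rough outline of what those instructions involve, the sketch below does two things. It writes an s3fs credentials file, which holds one `ACCESS_KEY_ID:SECRET_ACCESS_KEY` line and must not be readable by other users, and it then mounts a bucket with the `passwd_file` option. The helper names, bucket name and mount point are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: store s3fs credentials and mount a bucket.
# write_s3fs_credentials, mount_bucket, the bucket and mount point are hypothetical.
set -eu

write_s3fs_credentials() {
  local key_id="$1" secret="$2" file="${3:-$HOME/.passwd-s3fs}"
  ( umask 077; printf '%s:%s\n' "$key_id" "$secret" > "$file" )  # ACCESS_KEY_ID:SECRET_ACCESS_KEY
  chmod 600 "$file"    # s3fs rejects credential files that other users can read
}

mount_bucket() {
  local bucket="$1" mountpoint="$2" file="${3:-$HOME/.passwd-s3fs}"
  mkdir -p "$mountpoint"
  s3fs "$bucket" "$mountpoint" -o passwd_file="$file"
}

# Example usage:
#   write_s3fs_credentials "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
#   mount_bucket my-bucket "$HOME/bucket"
#   ls "$HOME/bucket"              # browse the bucket like a directory
#   fusermount -u "$HOME/bucket"   # unmount when done
```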

2.1.7 Create Control Node AMI (optional)

These tools can be installed on an EC2 instance running in AWS for ease of access and configuration. To log in to an EC2 machine you need an SSH key, which can be generated in the web console. Each time you launch an EC2 instance, you have to confirm that you have this key (a .pem file). This machine is needed only for submitting jobs and has no special computational requirements, so a micro instance is enough to run the basic scripts. Once you've set up the other software (and gotten a job running, so you know everything is set up correctly), you can use Amazon's web console to save this machine as an Amazon Machine Image, or AMI. The AMI captures the current state of the hard drive so that you can more easily create new control nodes.
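The same two steps, saving the AMI and launching a new control node from it, can be sketched with the AWS CLI. The instance ID, AMI ID, key name and AMI name below are hypothetical placeholders, and by default the script only prints the commands. Note that `create-image` reboots the instance unless you pass `--no-reboot`.

```shell
#!/usr/bin/env bash
# Sketch: save a configured control node as an AMI, then launch a new control node from it.
# Prints the commands by default (DRY_RUN=1); set DRY_RUN=0 to execute.
# All IDs and names below are hypothetical placeholders.
set -eu
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"   # your working control node
AMI_ID="${AMI_ID:-ami-0123456789abcdef0}"           # the ID returned by create-image
KEY_NAME="${KEY_NAME:-df-key}"                      # the key pair from section 1.1

# 1. Save the current disk state as an AMI (reboots the instance unless --no-reboot is given)
run aws ec2 create-image --instance-id "$INSTANCE_ID" --name "distributed-fiji-control-node"

# 2. Later, launch a fresh control node from that AMI; a micro instance is enough
run aws ec2 run-instances --image-id "$AMI_ID" --instance-type t2.micro \
  --key-name "$KEY_NAME" --count 1
```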

2.2 Use a pre-made AMI

You can use our Cytominer-VM and add your own security keys; it has extra things you may not need, such as R, but it can be very handy!