A more scalable way to run the baseline evaluations in containers as Jobs on a Kubernetes cluster.
- A running Kubernetes Cluster
- An accessible Redis instance
NOTE: All of these scripts must be run from the repository root
Prerequisites:
To run a local Kubernetes cluster with Docker, you can run the following:
export K8S_VERSION=$(curl -sS https://storage.googleapis.com/kubernetes-release/release/stable.txt)
export ARCH=amd64
docker run -d \
--volume=/sys:/sys:rw \
--volume=/var/lib/docker/:/var/lib/docker:rw \
--volume=/var/lib/kubelet/:/var/lib/kubelet:rw,shared \
--volume=/var/run:/var/run:rw \
--net=host \
--pid=host \
--privileged \
--name=kubelet \
gcr.io/google_containers/hyperkube-${ARCH}:${K8S_VERSION} \
/hyperkube kubelet \
--hostname-override=127.0.0.1 \
--api-servers=http://localhost:8080 \
--kubeconfig=/etc/kubernetes/manifests \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--allow-privileged --v=2
You can then download the kubectl
binary for accessing your cluster using the following command:
export K8S_VERSION=$(curl -sS https://storage.googleapis.com/kubernetes-release/release/stable.txt)
export GOOS=linux
export GOARCH=amd64
curl -sSL "https://storage.googleapis.com/kubernetes-release/release/${K8S_VERSION}/bin/${GOOS}/${GOARCH}/kubectl" > /usr/bin/kubectl
chmod +x /usr/bin/kubectl
Verify that everything is working with the following command:
ubuntu@integration-3:~/biocaddie$ kubectl get nodes
NAME STATUS AGE
127.0.0.1 Ready 6m
Prerequisites:
- Multiple VMs, or cloud/local resources to provision multiple VMs
- A shared volume mounted to every node (i.e. NFS, GlusterFS, Quobyte, etc)
Options:
- If you already have multiple VMs provisioned, you can bring up a cluster with a single command using kubeadm.
- To deploy multiple VMs to AWS, GCE, Azure, OpenStack, or Baremetal, you can use kargo.
- To deploy multiple VMs to your local laptop (development only), you can use Vagrant.
- To deploy multiple VMs in OpenStack at NCSA or SDSC (with varying support for other sites), you can use ndslabs-deploy-tools.
- For more deployment options, check out the constantly changing list.
<placeholder for instructions on how to more-easily deploy our gluster (yuck) or some equivalent shared NFS mount>
Once your cluster is running, you'll need to deploy an instance of Redis using the kubectl
command:
kubectl create -f ./kubernetes/redis.yaml
This will start up an instance of Redis on your cluster. Execute kubectl get pods
to retrieve the name of the Redis pod:
ubuntu@integration-3:~/biocaddie$ kubectl get svc
core@biocaddie-master1 ~ $ kubectl get pods
NAME READY STATUS RESTARTS AGE
redis-0p8vk 1/1 Running 0 15d
Acquire a shell into this container using kubectl exec
:
export REDIS_NAME=$(kubectl get pods | grep redis | grep -v Terminating | awk '{print $1}')
kubectl exec -it ${REDIS_NAME} bash
Once you're inside of the Redis container, you should see that your current working directory is biocaddie
and it already contains the source needed to run the baseline scripts:
core@biocaddie-master1 ~ $ kubectl exec -it $(kubectl get pods | grep redis | grep -v Terminating | awk '{print $1}') bash
root@redis-0p8vk:~/biocaddie# ls -al kubernetes/
total 38
drwxr-xr-x. 2 redis root 4096 Jun 28 16:34 .
drwxr-xr-x. 18 redis root 4096 Jul 20 16:28 ..
-rw-r--r--. 1 redis root 11626 Jun 27 18:10 README.md
-rwxr-xr-x. 1 redis root 839 Jun 27 18:10 dir-krovetz.sh
-rwxr-xr-x. 1 redis root 814 Jun 27 18:10 dir.sh
-rw-r--r--. 1 redis root 635 Jun 27 18:10 indri.yaml
-rwxr-xr-x. 1 redis root 840 Jun 27 18:10 jm.sh
-rw-r--r--. 1 redis root 1066 Jun 27 18:10 job.yaml
-rwxr-xr-x. 1 redis root 996 Jun 27 18:10 okapi.sh
-rw-r--r--. 1 redis root 803 Jun 27 18:10 redis.yaml
-rwxr-xr-x. 1 redis root 1556 Jun 27 18:37 reindex.sh
-rwxr-xr-x. 1 redis root 1213 Jun 27 18:10 rm3-krovetz.sh
-rwxr-xr-x. 1 redis root 1220 Jun 27 18:10 rm3-stopped.sh
-rwxr-xr-x. 1 redis root 1169 Jun 27 18:10 rm3.sh
-rwxr-xr-x. 1 redis root 1216 Jun 27 18:10 run_all.sh
-rwxr-xr-x. 1 redis root 936 Jun 27 18:10 tfidf.sh
-rwxr-xr-x. 1 redis root 947 Jun 27 18:10 two.sh
-rw-r--r--. 1 redis root 810 Jun 28 16:34 worker.yaml
You are now ready to start running the baseline scripts on Kubernetes!
To run a given model on a specific <collection>
-<topics>
pair, execute the following from the repository root:
./kubernetes/<model>.sh <topics> <collection>
Accepted parameters:
<model>
:dir
,dir-krovetz
,jm
,okapi
,rm3
,rm3-krovetz
,rm3-stopped
,tfidf
, ortwo
<topics>
:short
,stopped
, ororig
<collection>
:combined
,train
, ortest
For example:
ubuntu@integration-3:~/biocaddie$ ./kubernetes/dir.sh short combined
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5
(integer) 6
(integer) 7
job "dir-combined-short" created
Job started - to run multiple workers for this Job in parallel, use "kubectl scale"
The following command will list all existing jobs, along with their worker pods:
ubuntu@integration-3:~/biocaddie$ kubectl get jobs,pods -a
NAME DESIRED SUCCESSFUL AGE
jobs/dir-combined-short <none> 0 20s
jobs/rm3-combined-short <none> 6 1d
jobs/tfidf-combined-short <none> 2 1d
jobs/two-combined-short <none> 2 1d
NAME READY STATUS RESTARTS AGE
po/dir-combined-short-bfpgx 1/1 Running 0 20s
po/dir-combined-short-g3xpf 1/1 Running 0 20s
po/redis-dp58n 1/1 Running 0 5d
po/rm3-combined-short-5nsng 0/1 Completed 0 1d
po/rm3-combined-short-743g1 0/1 Completed 0 1d
po/rm3-combined-short-7vh8g 0/1 Completed 0 1d
po/rm3-combined-short-wl5kt 0/1 Completed 0 1d
po/rm3-combined-short-xf4qx 0/1 Completed 0 1d
po/rm3-combined-short-znq89 0/1 Completed 0 1d
po/temp-2077333550-cxl1g 1/1 Running 2 5d
po/tfidf-combined-short-s2cb2 0/1 Completed 0 1d
po/tfidf-combined-short-zsk58 0/1 Completed 0 1d
po/two-combined-short-qqhtw 0/1 Completed 0 1d
po/two-combined-short-s5jn0 0/1 Completed 0 1d
NOTE: including -a
will also list Completed
pods in the output.
You can then use kubectl logs -f <pod_name>
to view the logs of a Running
or Completed
worker pod.
Once the job has completed, you will see the number of successful replicas under kubectl get jobs
has been incremented. You should then be able to check the output/
folder for the output of the baseline runs:
ubuntu@integration-3:~/biocaddie$ du -h output/
62M output/tfidf/combined/short
62M output/tfidf/combined
62M output/tfidf
50M output/two/combined/short
50M output/two/combined
50M output/two
4.4M output/dir/combined/orig
4.6M output/dir/combined/short
8.9M output/dir/combined
8.9M output/dir
6.7M output/jm/combined/orig
7.1M output/jm/combined/short
14M output/jm/combined
14M output/jm
800M output/rm3/combined/short
800M output/rm3/combined
800M output/rm3
934M output/
All jobs, by default, run 2 worker replicas to pull work out of the job queue in Redis.
It is easy, however, to tell Kubernetes to run more than 2 workers:
kubectl scale job <job_name> --replicas=<N>
<job_name>
is a value from theNAME
column ofkubectl get jobs
<N>
is a non-negative integer number of worker pods to run concurrently to execute this job
NOTE: Setting <N>
to 0 effectively pauses a running job.
For example:
ubuntu@integration-3:~/biocaddie$ kubectl get jobs
NAME DESIRED SUCCESSFUL AGE
dir-combined-short <none> 0 31s
ubuntu@integration-3:~/biocaddie$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dir-combined-short-k6nks 1/1 Running 0 16s
dir-combined-short-lrzxg 1/1 Running 0 16s
redis-dp58n 1/1 Running 0 5d
temp-2077333550-cxl1g 1/1 Running 2 5d
ubuntu@integration-3:~/biocaddie$ kubectl scale job dir-combined-short --replicas=5
job "dir-combined-short" scaled
ubuntu@integration-3:~/biocaddie$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dir-combined-short-h5pvk 1/1 Running 0 4s
dir-combined-short-hz2nj 1/1 Running 0 4s
dir-combined-short-k6nks 1/1 Running 0 31s
dir-combined-short-lrzxg 1/1 Running 0 31s
dir-combined-short-wl9l3 1/1 Running 0 4s
redis-dp58n 1/1 Running 0 5d
temp-2077333550-cxl1g 1/1 Running 2 5d
We also provide a run_all.sh
that has the capability to run multiple baselines in bulk using the same parameters and a similar syntax to the above scripts.
For example, to run dir.sh
and jm.sh
against combined
-short
and combined
-orig
, you could use:
./kubernetes/run_all.sh "dir jm" "short orig" "combined"
For example:
ubuntu@integration-3:~/biocaddie$ ./kubernetes/run_all.sh "dir jm" "short orig" "combined"
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5
(integer) 6
(integer) 7
job "dir-combined-short" created
Job started - to run multiple workers for this Job in parallel, use "kubectl scale"
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5
(integer) 6
(integer) 7
job "dir-combined-orig" created
Job started - to run multiple workers for this Job in parallel, use "kubectl scale"
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5
(integer) 6
(integer) 7
(integer) 8
(integer) 9
(integer) 10
(integer) 11
job "jm-combined-short" created
Job started - to run multiple workers for this Job in parallel, use "kubectl scale"
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5
(integer) 6
(integer) 7
(integer) 8
(integer) 9
(integer) 10
(integer) 11
job "jm-combined-orig" created
Job started - to run multiple workers for this Job in parallel, use "kubectl scale"
ubuntu@integration-3:~/biocaddie$ kubectl get jobs,pods
NAME DESIRED SUCCESSFUL AGE
jobs/dir-combined-orig <none> 0 5s
jobs/dir-combined-short <none> 0 6s
jobs/jm-combined-orig <none> 0 5s
jobs/jm-combined-short <none> 0 5s
jobs/rm3-combined-short <none> 6 1d
jobs/tfidf-combined-short <none> 2 1d
jobs/two-combined-short <none> 2 1d
NAME READY STATUS RESTARTS AGE
po/dir-combined-orig-5fkqm 0/1 ContainerCreating 0 5s
po/dir-combined-orig-w5snl 0/1 ContainerCreating 0 5s
po/dir-combined-short-536z9 0/1 ContainerCreating 0 6s
po/dir-combined-short-lglwx 0/1 ContainerCreating 0 6s
po/jm-combined-orig-cbb8t 0/1 ContainerCreating 0 5s
po/jm-combined-orig-r47r6 0/1 ContainerCreating 0 5s
po/jm-combined-short-2g0dw 0/1 ContainerCreating 0 5s
po/jm-combined-short-f01f4 0/1 ContainerCreating 0 5s
po/redis-dp58n 1/1 Running 0 5d
po/temp-2077333550-cxl1g 1/1 Running 2 5d
The state of the framework can be reset with a few simple commands:
kubectl delete jobs --all # Stop all running jobs and clean up their worker pods
redis-cli -h ${REDIS_SERVICE_HOST} flushall # Clear out all work queues in Redis
rm -rf output/* # Delete existing output files