
clojurecup2014/cloujera

Cloujera

Cloujera lets you run fine-grained searches for spoken words in Coursera's videos. It does this by performing full-text searches on the video transcripts.

Local Setup

  1. Bring up Vagrant (elasticsearch + redis): vagrant up

  2. Compile the ClojureScript (make sure you have Java > 1.7): lein cljsbuild once

  3. Start the app: lein run

  4. On the first run, visit http://127.0.0.1:8080/burglar/go to seed the database. If you skip this step, the app errors out with an IndexMissingException from Elasticsearch.

Testing dockerized cloujera inside Vagrant VM

$ vagrant ssh
$ cd /vagrant
$ ./scripts/deploy.sh

NOTE: the address to access the dockerized cloujera is http://127.0.0.1:8081 (see Vagrantfile)

Testing uberjar inside Vagrant

$ vagrant ssh
$ cd /vagrant
$ source ./scripts/prod-env.sh
$ lein uberjar
$ java -jar ./target/uberjar/cloujera-*-standalone.jar

NOTE: the uberjarred cloujera listens on port 8080 inside the VM; from the host, access it at http://127.0.0.1:8082 (see Vagrantfile)

Scraping courses

Visiting http://cloujera.whatever/burglar/go scrapes about 10 courses to get you started.

To scrape another course, you need to:

  1. Visit the Coursera sessions API at https://api.coursera.org/api/catalog.v1/sessions and choose a course

  2. Manually sign up for the course and agree to the honour code as the vise890+cloujera@gmail.com user

  3. Find the video lecture URL (videoLecturesURL)

  4. Send an HTTP POST to http://cloujera.whatever/burglar/raid with this JSON payload:

    { "url": videoLecturesURL }
    

    For example:

    { "url": "https://class.coursera.org/apcalcpart1-001/lecture" }
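
    The POST in step 4 could be sent from a shell roughly as follows. This is a minimal sketch, not part of cloujera: CLOUJERA_URL, raid_payload and raid_course are hypothetical names, the Content-Type header is an assumption about what the endpoint expects, and the URL is not escaped, so it must not contain double quotes.

    ```shell
    # Sketch only: CLOUJERA_URL and both function names are assumptions.
    CLOUJERA_URL="${CLOUJERA_URL:-http://127.0.0.1:8080}"

    # Build the JSON payload for /burglar/raid from a videoLecturesURL.
    raid_payload() {
      printf '{ "url": "%s" }' "$1"
    }

    # POST the payload to the raid endpoint; -f makes curl fail on HTTP errors.
    raid_course() {
      curl -fsS -X POST \
        -H 'Content-Type: application/json' \
        -d "$(raid_payload "$1")" \
        "$CLOUJERA_URL/burglar/raid"
    }

    # Usage:
    #   raid_course "https://class.coursera.org/apcalcpart1-001/lecture"
    ```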
    

Deployment

Provisioning (The first time)

$ ssh user@cloudmachine
$ git clone https://github.com/vise890/cloujera
$ cd cloujera
$ sudo ./scripts/provision.sh

(Re-)Deploying cloujera

# in the cloujera directory...
$ ./scripts/deploy.sh

NOTE: deploy.sh pulls the most recent version of cloujera from the repo

Troubleshooting

Ensure that all the containers are running in the VM:

$ vagrant ssh
$ sudo docker ps -a

You should see redis, elasticsearch and cloujera running.
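
To avoid eyeballing the docker ps output, the check could be scripted along these lines. This is a sketch under assumptions: missing_containers is a hypothetical helper, and it assumes the containers are named exactly redis, elasticsearch and cloujera (only the cloujera name is confirmed by the docker logs command in this README).

```shell
# Reads running container names on stdin (one per line) and prints
# every expected name that is absent. Expected names are an assumption.
missing_containers() {
  mc_running="$(cat)"
  for mc_name in redis elasticsearch cloujera; do
    # -x matches the whole line, so "redis" does not match "redis-old"
    printf '%s\n' "$mc_running" | grep -qx "$mc_name" || echo "$mc_name"
  done
}

# Usage (inside the VM):
#   sudo docker ps --format '{{.Names}}' | missing_containers
```

No output means all three containers are running. Note that docker ps without -a lists only running containers, which is what this check wants.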

Checking the cloujera logs

$ vagrant ssh
$ sudo docker logs cloujera

Checking Elasticsearch health

Visit http://localhost:9200/. You should see status: 200 in the response.
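
The same check can be done from a terminal. The sketch below assumes the root endpoint returns a JSON body containing a numeric "status" field, as the line above describes; es_status is a hypothetical helper, not something shipped with cloujera or Elasticsearch.

```shell
# Reads the JSON body of the Elasticsearch root endpoint on stdin and
# prints the numeric value of the first "status" field it finds.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage:
#   curl -fsS http://localhost:9200/ | es_status
```

If the command prints nothing, either Elasticsearch is unreachable or the response has no "status" field.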

Checking if Redis is running

redis-cli will drop you into a Redis shell. Some useful commands are: INFO, MONITOR, HELP, HELP @server.

NOTE: this works from the host as well as in the Vagrant VM
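
For a non-interactive check, redis-cli ping replies PONG when the server is reachable. A minimal sketch, where redis_alive and the REDIS_CLI override are hypothetical names introduced only for illustration:

```shell
# Succeeds when `redis-cli ping` answers PONG.
# REDIS_CLI is a hypothetical override for the client binary.
redis_alive() {
  [ "$("${REDIS_CLI:-redis-cli}" ping 2>/dev/null)" = "PONG" ]
}

# Usage:
#   if redis_alive; then echo "redis is up"; else echo "redis is down"; fi
```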

Dropping into a shell inside a container

$ vagrant ssh   # or, on the cloud machine: ssh user@cloudbox
$ sudo docker exec -i -t cloujera bash

BUGS

  • lein run doesn't give any output initially
  • lein run doesn't reload
