
Commit d084368

bring README up to date
1 parent 8a24e7c commit d084368

1 file changed: README.md (+9 lines, -47 lines)
````diff
@@ -5,20 +5,25 @@
 - docker
 - heroku-cli

-## Commands
+## Commands for local development & deployment

 We have a Makefile with common commands.

-Do `make pip` once to install dependencies (using pipenv)
+Do `make pip` once to install dependencies (using pipenv). Repeat every time the dependencies change.
+Create a copy of `env.example` like so:
+`cp env.example .env`
+and populate `.env` with the correct values for OPENHUMANS and GITHUB credentials.

 Every time you want to run locally, do `make deps` and then `make local`. The app will be available at `127.0.0.1:5000`

+To deploy the current version to heroku, do `make deploy`.
+

 # The OH Github integration

 <!-- [![Build Status](https://travis-ci.org/OpenHumans/oh-moves-source.svg?branch=master)](https://travis-ci.org/OpenHumans/oh-moves-source) -->

-This repository provides a `Django` application that interfaces both with the `Open Humans` API and the `Github` API to collect GPS track data from `Github` and uploading it into `Open Humans`. It is based on the https://github.com/OpenHumans/oh-data-demo-template repository.
+This repository provides a `Django` application that interfaces both with the `Open Humans` API and the `Github` API to collect commit data from `Github` and uploading it into `Open Humans`. It is based on the https://github.com/OpenHumans/oh-data-demo-template repository.

 For a user the workflow is the following:

````
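The hunk above adds setup steps that reference OPENHUMANS and GITHUB credentials in `.env`. As a rough sketch of how a Django settings module typically picks such values up from the environment (the variable names below are assumptions, not taken from this repository):

```python
# Hypothetical sketch: illustrates reading the OPENHUMANS and GITHUB credentials
# from the environment / the .env file. Variable names are assumptions.
import os

OPENHUMANS_CLIENT_ID = os.getenv("OPENHUMANS_CLIENT_ID")
OPENHUMANS_CLIENT_SECRET = os.getenv("OPENHUMANS_CLIENT_SECRET")
GITHUB_CLIENT_ID = os.getenv("GITHUB_CLIENT_ID")
GITHUB_CLIENT_SECRET = os.getenv("GITHUB_CLIENT_SECRET")

# Failing early beats a confusing OAuth error at the first request.
_missing = [name for name in ("OPENHUMANS_CLIENT_ID", "OPENHUMANS_CLIENT_SECRET",
                              "GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET")
            if not os.getenv(name)]
if _missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(_missing)}")
```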
````diff
@@ -27,7 +32,7 @@ For a user the workflow is the following:
 3. This redirects the user back to this Github-integration website
 4. The user is redirected starts the authorization with `Github`. For this they are redirected to the Github page
 5. After a user has authorized both `Open Humans` & `Github` their `Github` data will be requested and ultimately saved as a file in Open Humans.
-6. Regular updates of the data should be automatically triggered to keep the data on Open Humans up to date.
+6. Regular updates of the data are triggered daily to keep the data on Open Humans up to date.

 Getting the data from `Github` and uploading it to Open Humans has a couple of challenges:
 1. The `Github` API uses rate limits, which need to be respected and going over the rate limit would not yield more data but just errors
````
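The last context line above mentions the `Github` API rate limits. For illustration, a small sketch of how a client can check its remaining quota via Github's public `/rate_limit` endpoint (standard Github REST API behaviour, not code from this repository; the app itself handles the limit by re-enqueuing its Celery task, as described in the next hunk):

```python
# Sketch only (not from this repository): query the Github /rate_limit endpoint,
# which reports how many requests remain and when the quota resets.
import time

import requests


def remaining_core_quota(token):
    """Return (remaining requests, epoch second at which the quota resets)."""
    resp = requests.get(
        "https://api.github.com/rate_limit",
        headers={"Authorization": f"token {token}"},
    )
    resp.raise_for_status()
    core = resp.json()["resources"]["core"]
    return core["remaining"], core["reset"]


def wait_until_reset(token):
    remaining, reset = remaining_core_quota(token)
    if remaining == 0:
        # Sleep until Github's window resets, plus a small safety margin.
        time.sleep(max(0, reset - time.time()) + 5)
```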
````diff
@@ -48,49 +53,6 @@ By registering a `realm` we set up a namespace for the github requests and speci
 ## setup for Celery
 The settings for Celery can be found in `datauploader/celery.py`. These settings apply globally for our application. The Celery task itself can be found in `datauploader/tasks.py`. The main task for requesting & processing the github data is `process_github()` in that file.

-## `process_github()`
-This task solves both the problem of hitting API limits as well as the import of existing data.
-The rough workflow is
-
-```
-get_existing_github(…)
-get_start_date(…)
-remove_partial_data(…)
-try:
-    while *no_error* and still_new_data:
-        get more data
-except:
-    process_github.async_apply(…,countdown=wait_period)
-finally:
-    replace_github(…)
-```
-
-### `get_existing_github`
-This step just checks whether there is already older `Github` data on Open Humans. If there is data
-it will download the old data and import it into our current workflow. This way we already know which dates we don't have to re-download from `Github` again.
-
-### `get_start_date`
-This function checks what the last dates are for which we have downloaded data before. This tells us from which date in the past we have to start downloading more data.
-
-### `remove_partial_data`
-The Github download works on a ISO-week basis. E.g. we request data for `Calendar Week 18`. But if we request week 18 on a Tuesday we will miss out on all of the data from Wednesday to Sunday. For that reason we make sure to drop the last week during which we already downloaded data and re-download that completely.
-
-### getting more data.
-Here we just run a while loop over our date range beginning from our `start_date` until we hit `today`.
-
-### `except`
-When we hit the Github API rate limit we can't make any more requests and the exception will be raised. When this happens we put a new `process_github` for this user into our `Celery` queue. With the `countdown` parameter we can specify for how long the job should at least be idle before starting again. Ultimately this serves as a cooldown period so that we are allowed new API calls to the `Github API`.
-
-### `finally: replace_github`
-No matter whether we hit the API limit or not: We always want to upload the new data we got from the Github API back to Open Humans. This way we can incrementally update the data on Open Humans, even if we regularly hit the API limits.
-
-### Example flow for `process_github`
-1. We want to download new data for user A and `get_existing_github` etc. tells us we need data for the weeks 01-10.
-2. We start our API calls and in Week 6 we hit the API limit. We now enqueue a new `process_github()` task with `Celery`.
-3. We then upload our existing data from week 1-5 to Open Humans. This way a user has at least some data already available
-4. After the countdown has passed our in `2` enqueued `process_github` task starts.
-5. This new task downloads the data from Open Humans and finds it already has data for weeks 1-5. So our new task only needs to download the data for week 5-10. It can now start right in week 5 and either finish without hitting a limit again, or it will at least make it through some more weeks before crashing again, which in turn will trigger yet another new `process_github` task for later.
-
 ## Doing automatic updates of the Github data
 This can be done by regularly enqueuing `process_github` tasks with `Celery`. As `Heroku` does not offer another cheap way of doing it we can use a `management task` for this that will be called daily by the `heroku scheduler`.

````
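The context lines above point to `datauploader/celery.py` for the Celery configuration. For readers unfamiliar with that wiring, a minimal sketch of what such a module conventionally contains in a Django project (the settings module name is a placeholder, not taken from this repository):

```python
# datauploader/celery.py, roughly: the conventional Celery/Django wiring.
# The settings module name below is a placeholder.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "main.settings")

app = Celery("datauploader")
# Pull CELERY_* options out of the Django settings.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Discover tasks.py modules in installed apps, e.g. datauploader/tasks.py.
app.autodiscover_tasks()
```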
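The removed `process_github()` walkthrough describes a try/except/finally pattern: fetch the data week by week, re-enqueue the task with a `countdown` when the rate limit is hit, and always upload whatever was collected. A compressed sketch of that pattern follows; the helper names mirror the removed pseudocode and are placeholders, not this repository's real functions, and Celery's actual method is `apply_async` rather than the `async_apply` written in the removed block:

```python
# Sketch of the rate-limit handling pattern described above. The helper functions
# mirror the removed pseudocode and are NOT defined here; only the Celery API is real.
from celery import shared_task


class RateLimitError(Exception):
    """Raised by the (hypothetical) fetch helper when Github refuses more calls."""


@shared_task
def process_github(oh_member_id):
    data = get_existing_github(oh_member_id)      # reuse data already on Open Humans
    start_week = get_start_date(data)             # first week we still need
    data = remove_partial_data(data, start_week)  # re-fetch the incomplete last week
    try:
        for week in weeks_until_today(start_week):     # hypothetical iterator
            data += fetch_week(oh_member_id, week)     # may raise RateLimitError
    except RateLimitError:
        # Cooldown: enqueue the same task again once the rate limit window has passed.
        process_github.apply_async(args=[oh_member_id], countdown=3600)
    finally:
        replace_github(oh_member_id, data)        # always upload whatever we have
```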
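The remaining section relies on a management task that the `heroku scheduler` calls daily. A hypothetical sketch of such a Django management command (the command name, file path, and member model are assumptions; only the `BaseCommand` and Celery `.delay()` APIs are real):

```python
# Hypothetical datauploader/management/commands/update_data.py.
# Command name, path, and the member model are assumptions.
from django.core.management.base import BaseCommand

from datauploader.tasks import process_github
from main.models import OpenHumansMember  # assumed model holding linked members


class Command(BaseCommand):
    help = "Enqueue a github data update for every linked Open Humans member."

    def handle(self, *args, **options):
        for member in OpenHumansMember.objects.all():
            process_github.delay(member.oh_id)
            self.stdout.write(f"queued github update for {member.oh_id}")
```

With the `heroku scheduler` add-on, the daily job would then simply run `python manage.py update_data` (assuming that command name).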