WIP: Topic/abstract storage #144

Merged
merged 35 commits into from
Mar 17, 2020
35 commits
a6b8575
print command when --dry is passed (#143)
breezykermo Mar 5, 2020
b136b73
WIP: begins to separate out storage layer
breezykermo Mar 6, 2020
15ec7cb
WIP: rewrite selector to use disk abstraction
Mar 7, 2020
0483ae0
WIP fix test_selector_errors
Mar 7, 2020
50c4430
WIP: test_mtmodule passing, write_logs added
Mar 7, 2020
6c3c088
WIP: getting analyser to work.... slowly
Mar 7, 2020
864db8e
WIP: getting the analyser tests working slowly...
Mar 8, 2020
7443a9e
all tests pass! much work to do though
Mar 8, 2020
11bcca5
all tests passing
breezykermo Mar 11, 2020
e643b7b
local selector working in new format
breezykermo Mar 11, 2020
317a804
restructure
breezykermo Mar 11, 2020
a482d26
refmt modules and get_module
breezykermo Mar 11, 2020
1100b33
refmt ExtractTypes
breezykermo Mar 11, 2020
84d8b8f
audio analysers, and multiple analysers from config working
breezykermo Mar 11, 2020
6294089
demos, etype tests, started on module rewrites
breezykermo Mar 12, 2020
ea41df8
implement cast
breezykermo Mar 14, 2020
f2cc292
add custom etypes, lint, and fix build tests
breezykermo Mar 14, 2020
210888a
Rename CvJson.py to cvjson.py
breezykermo Mar 14, 2020
b81bcac
clean and rewrite Youtube selector
breezykermo Mar 15, 2020
825273d
convert Twitter selector
breezykermo Mar 15, 2020
a019e3d
WIP: converting KerasPretrained
breezykermo Mar 15, 2020
cbd0836
KerasPretrained complete
breezykermo Mar 16, 2020
a82fca1
port ranking->Rank, add optional return value from post_analyse
breezykermo Mar 16, 2020
e29d22e
fmt
breezykermo Mar 16, 2020
a454bb0
cleaning...
breezykermo Mar 16, 2020
8af3dda
restructure docs/tutorial
breezykermo Mar 16, 2020
6eafdbc
ignorecase false
breezykermo Mar 16, 2020
cc16bbf
update testing paths
breezykermo Mar 16, 2020
88990fb
capitals
breezykermo Mar 17, 2020
cc14b66
abs path
breezykermo Mar 17, 2020
2e94ba1
complete tutorial 1
breezykermo Mar 17, 2020
a3a9bb6
label other tutorials TODO
breezykermo Mar 17, 2020
b131af5
lint
breezykermo Mar 17, 2020
7f15e10
fix build tests
breezykermo Mar 17, 2020
4a2ccbd
add update
breezykermo Mar 17, 2020
5 changes: 4 additions & 1 deletion .gitignore
@@ -21,10 +21,13 @@ credentials/**
.env

# other data
data/**
tags*
logfile.log

blacklists/**
whitelists/**
config/**

data/demo/3video/dancingonmyown.mov

data/demo/3video/info.json
24 changes: 19 additions & 5 deletions commands.py
@@ -12,6 +12,7 @@

 def __run(cmd, cli_args, *args):
     if cli_args.dry:
+        print(" ".join(cmd))
         return cmd
     try:
         returncode = sp.call(cmd)
@@ -184,6 +185,17 @@ def develop(args):
     CONT_NAME = "mtriage_developer"
     TAG_NAME = "{}-gpu".format(args.tag) if args.gpu else args.tag

+    volumes = [
+        "-v",
+        "{}:/mtriage".format(DIR_PATH),
+        "-v",
+        "{}/.config/gcloud:/root/.config/gcloud".format(HOME_PATH),
+    ]
+
+    if args.yaml is not None:
+        yaml_path = os.path.abspath(args.yaml)
+        volumes += ["-v", "{}:/run_args.yaml".format(yaml_path)]
+
     # --runtime only exists on nvidia docker, so we pass a bubblegum flag when not available
     # so that the call arguments are well formed.
     return __run(
@@ -199,10 +211,7 @@ def develop(args):
             "BASE_DIR=/mtriage",
             get_env_config(),
             "--privileged",
-            "-v",
-            "{}:/mtriage".format(DIR_PATH),
-            "-v",
-            "{}/.config/gcloud:/root/.config/gcloud".format(HOME_PATH),
+            *volumes,
             "{}:{}".format(NAME, TAG_NAME),
             "/bin/bash",
         ],
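
The change above builds the `-v` mounts as a list first and then splices it into the docker command with `*volumes`, so the YAML mount can be optional. A minimal sketch of that pattern follows; the helper names, paths, and the trimmed-down docker command are hypothetical and not taken from commands.py.

```python
import os


def build_volume_args(dir_path, home_path, yaml_path=None):
    """Return docker `-v` arguments, adding the YAML mount only when one is given."""
    volumes = [
        "-v",
        "{}:/mtriage".format(dir_path),
        "-v",
        "{}/.config/gcloud:/root/.config/gcloud".format(home_path),
    ]
    if yaml_path is not None:
        # Docker bind mounts need an absolute host path.
        volumes += ["-v", "{}:/run_args.yaml".format(os.path.abspath(yaml_path))]
    return volumes


def build_develop_cmd(image, volumes):
    # Hypothetical, shortened command: the real call passes more flags.
    return ["docker", "run", "-it", "--privileged", *volumes, image, "/bin/bash"]


base = build_volume_args("/src/mtriage", "/home/user")
with_yaml = build_volume_args("/src/mtriage", "/home/user", "/tmp/run.yaml")
print(" ".join(build_develop_cmd("mtriage:dev", with_yaml)))
```

Keeping the mounts in one list means every optional mount is a single `volumes += [...]` line rather than a second copy of the whole command.
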
@@ -248,7 +257,7 @@ def run(args):
             "-v",
             "{}/media:/mtriage/media".format(DIR_PATH),
             "-v",
-            "{}/credentials:/mtriage/credentials".format(DIR_PATH),
+            "{}/data:/mtriage/data".format(DIR_PATH),
             "-v",
             "{}:/run_args.yaml".format(yaml_path),
             "-v",
@@ -274,6 +283,9 @@ def run(args):
         args,
     )

+    if not args.persist:
+        clean(args)
+

 def parse_args(cli_args):
     parser = argparse.ArgumentParser(description="mtriage dev scripts")
@@ -285,6 +297,7 @@ def parse_args(cli_args):
     run_p.add_argument("--gpu", action="store_true")
     run_p.add_argument("--dry", action="store_true")
     run_p.add_argument("--dev", action="store_true")
+    run_p.add_argument("--persist", action="store_true")

     dev_p = subparsers.add_parser("dev")
     dev_p.add_argument("--whitelist")
@@ -293,6 +306,7 @@ def parse_args(cli_args):
     dev_p.add_argument("--gpu", action="store_true")
     dev_p.add_argument("--dry", action="store_true")
     dev_p.add_argument("--verbose", action="store_true")
+    dev_p.add_argument("--yaml", type=str2yamlfile)
     dev_p.add_argument(
         "command",
         choices=["develop", "build", "test", "clean"],
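
The new `--persist` flag makes cleanup after `run` the default and keeping the container an opt-in. The sketch below isolates that behaviour with `argparse`; the `clean` callback and the placeholder command are assumptions for illustration, and the real `run` in commands.py does considerably more.

```python
import argparse


def parse_args(cli_args):
    parser = argparse.ArgumentParser(description="sketch of the run flags")
    subparsers = parser.add_subparsers(dest="base")
    run_p = subparsers.add_parser("run")
    run_p.add_argument("--dry", action="store_true")
    run_p.add_argument("--persist", action="store_true")
    return parser.parse_args(cli_args)


def run(args, clean):
    cmd = ["docker", "run", "mtriage"]  # hypothetical placeholder command
    if args.dry:
        # Mirror the --dry behaviour: show the command that would be executed.
        print(" ".join(cmd))
    if not args.persist:
        # Cleanup is the default; --persist skips it.
        clean(args)
    return cmd


cleaned = []
run(parse_args(["run"]), cleaned.append)               # cleans up
run(parse_args(["run", "--persist"]), cleaned.append)  # keeps the container
print(len(cleaned))  # prints 1
```
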
1 change: 1 addition & 0 deletions credentials/.gitignore → data/.gitignore
@@ -1,2 +1,3 @@
 **/*
 !.gitignore
+!demo/
1 change: 1 addition & 0 deletions data/demo/1local/1.txt
@@ -0,0 +1 @@
This is a simple text file.
3 changes: 3 additions & 0 deletions data/demo/1local/2.md
@@ -0,0 +1,3 @@
# Markdown example

The __tiniest__ bit less simple than a txt file.
1 change: 1 addition & 0 deletions data/demo/1local/3.jpg
Sorry, this file is invalid so it cannot be displayed.
Binary file added data/demo/2audio/coffee.m4a
Binary file not shown.
15 changes: 9 additions & 6 deletions docs/components/youtube.md
@@ -1,8 +1,11 @@
-# Selector: `youtube`
+# Configuring the Youtube selector

-In order to run the youtube selector, mtriage requires a Google Cloud Platform service account.
+In order to run the Youtube selector, mtriage requires a Google Cloud Platform
+API key.

-1. Create a service account and download a 'credentials.json' from the [credentials page](https://console.cloud.google.com/apis/credentials) on the Google Cloud console by creating a service account, and downloading a JSON version of the service acount key.
-2. Move the downloaded JSON file to 'credentials/google.json. This file is
-gitignored, and so will not be pushed to any remotes.
-3. In the service account API settings on Google Cloud Console in the browser, enable the "Youtube Data v3 API".
+1. Create a new project in GCP, and in the [credentials
+page](https://console.cloud.google.com/apis/credentials), enable the
+'Youtube Data V3' API.
+2. Create a new API key, ensuring that it has access to the Youtube V3 API.
+3. In the '.env' file in mtriage's root folder, add the line
+`GOOGLE_API_KEY=xxxxx`, replacing 'xxxxx' with your downloaded API key.
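
For anyone checking their setup, the sketch below shows one way a `KEY=value` file like `.env` could be read and the `GOOGLE_API_KEY` entry looked up. It is an illustration only: how mtriage itself consumes `.env` is not shown in this diff, and both helper names are hypothetical.

```python
import os


def parse_env_line(line):
    """Parse one `KEY=value` line; return None for blanks, comments, and malformed lines."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, value = line.split("=", 1)
    return key.strip(), value.strip()


def load_api_key(env_text, environ=None):
    """Return GOOGLE_API_KEY, preferring values already set in the environment."""
    environ = dict(os.environ if environ is None else environ)
    for line in env_text.splitlines():
        parsed = parse_env_line(line)
        if parsed is not None:
            # setdefault keeps an existing environment value over the file's value.
            environ.setdefault(*parsed)
    key = environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return key


print(load_api_key("# mtriage secrets\nGOOGLE_API_KEY=xxxxx\n", environ={}))
```
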
102 changes: 0 additions & 102 deletions docs/getting-started.md

This file was deleted.

34 changes: 34 additions & 0 deletions docs/install.md
@@ -0,0 +1,34 @@
# Install

Mtriage is a tool developed at [Forensic Architecture](https://forensic-architecture.org) to orchestrate complex workflows that download media of various kinds, analyse them, and visualise results. To understand what mtriage can do, this tutorial will briefly outline the different components of an mtriage workflow, and then show you how to create one that analyses Youtube videos frame-by-frame with a Resnet50 object detection classifier pretrained on ImageNet. To conclude, we'll briefly touch on how much more mtriage is capable of, and show how easy it is to extend mtriage to analyse other kinds of media.

## Architecture
Mtriage has three kinds of components: selectors, analysers, and viewers. Each component manages a different stage of a workflow:
* **Selector** - indexes and then downloads media from a source, such as Youtube. Selectors can implement their own web scraping techniques, or simply leverage the search functionality of online platforms in order to return results. The Youtube selector, for example, takes as input a search term and two dates (start and end), and returns all the videos that Youtube lists for that search term that were uploaded between the two dates.
* **Analyser** - analyses media that has been downloaded from a selector, producing *derived* media that contain the analysis results. An analyser may produce media that is of a different kind than its input media. The frames analyser, for example, takes a video as input and produces a set of images (one frame for each second, say) as output.
* **Viewer** - visualises derived media in an interactive website. Viewers make mtriage an end-to-end tool for analysing media, as they mean that you can present results, and even create interactive workstations, directly from analysis results.
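
To make the three stages concrete, here is a minimal sketch of a selector, an analyser, and a viewer chained together. Every name in it (`Element`, `local_selector`, `extension_analyser`, `text_viewer`) is hypothetical and chosen for illustration; it is not mtriage's actual component API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """One unit of media moving through the workflow."""
    id: str
    media: List[str] = field(default_factory=list)


def local_selector(paths):
    """Selector: index a source and return one element per item."""
    return [Element(id=str(i), media=[p]) for i, p in enumerate(paths)]


def extension_analyser(element, exts):
    """Analyser: derive a new element that keeps only media with the given extensions."""
    kept = [m for m in element.media if m.rsplit(".", 1)[-1] in exts]
    return Element(id=element.id, media=kept)


def text_viewer(elements):
    """Viewer: present derived elements, here as plain text rather than a website."""
    return "\n".join(
        "{}: {}".format(e.id, ", ".join(e.media) or "(none)") for e in elements
    )


selected = local_selector(["1.txt", "2.md", "3.jpg"])
derived = [extension_analyser(e, {"txt", "md"}) for e in selected]
print(text_viewer(derived))
```

Each stage only consumes the output of the previous one, which is why an analyser can produce a different kind of media than it received.
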

## Downloading Mtriage
Start by cloning the source code:

```bash
git clone https://github.com/forensic-architecture/mtriage.git
```

Mtriage has two primary dependencies: [Python](https://www.python.org/) 3 and [Docker CE](https://docs.docker.com/install/). Mtriage will _probably_ work with Python 2.x as well, but it's untested. If you have a CUDA GPU, you can use [Nvidia Docker](https://github.com/NVIDIA/nvidia-docker) instead of Docker to make certain analysers more performant.

Once you have Python and Docker installed, install the three dependencies in requirements.txt. Two of these are for testing (pytest and black); the only runtime dependency is [pyyaml](https://pyyaml.org/).
```
cd mtriage
pip install -r requirements.txt
```

Run the test suite to ensure that everything is working. This command may take a while, as the first time you run mtriage it will download the [latest Docker image](https://cloud.docker.com/u/forensicarchitecture/repository/docker/forensicarchitecture/mtriage). Mtriage commands will run much faster after this first one:

```bash
./mtriage dev test
```

Assuming this command completed and all the tests passed, you are now ready to run mtriage workflows!


2 changes: 1 addition & 1 deletion docs/overview.md
@@ -8,4 +8,4 @@ We developed mtriage to address the insufficiency in machine learning tooling fo

 Mtriage is open source and in active development. This means that everyone can not only use mtriage in their own research, but also that community contributions (of a new classifier, or a new media source) can potentially be made available to all other users as upstream contributions.

-To get started with mtriage, check out [Getting Started](docs/getting-started.md).
+To get started with mtriage, check out [Getting Started](docs/getting-started.md).
2 changes: 1 addition & 1 deletion docs/testing.md
@@ -11,7 +11,7 @@ using the locally installed Python environment.

 To run all tests, use the following command:
 ```
-./mtriage run test
+./mtriage dev test
 ```

 See docs/custom-components.md for more information on how to write tests for
6 changes: 6 additions & 0 deletions docs/tutorial/1/1a.yaml
@@ -0,0 +1,6 @@
folder: media/demo_official/1
select:
  name: Local
  config:
    source: data/demo/1local
    # aggregate: true
9 changes: 9 additions & 0 deletions docs/tutorial/1/1b.yaml
@@ -0,0 +1,9 @@
folder: media/demo_official/1
elements_in:
  - Local
analyse:
  name: ExtractTypes
  config:
    exts:
      - txt
      - md
11 changes: 11 additions & 0 deletions docs/tutorial/1/1c.yaml
@@ -0,0 +1,11 @@
folder: media/demo_official/1
select:
  name: Local
  config:
    source: data/demo/1local
analyse:
  name: ExtractTypes
  config:
    exts:
      - txt
      - md
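
The three tutorial configs differ only in which phases they declare: select only (1a), analyse only with `elements_in` (1b), or both (1c). The sketch below expresses those shapes as Python dicts and checks them with a small validator. The rules are inferred from these three files and are an assumption, not mtriage's own validation logic; `validate_config` is a hypothetical name.

```python
def validate_config(cfg):
    """Return the phases a config declares, raising ValueError on shapes not seen in the tutorial."""
    if "folder" not in cfg:
        raise ValueError("config needs a 'folder'")
    phases = [p for p in ("select", "analyse") if p in cfg]
    if not phases:
        raise ValueError("config needs a 'select' or 'analyse' phase")
    if phases == ["analyse"] and "elements_in" not in cfg:
        # Assumption: an analyse-only run must say which selected elements to read.
        raise ValueError("analyse without select needs 'elements_in'")
    for phase in phases:
        if "name" not in cfg[phase]:
            raise ValueError("phase '{}' needs a 'name'".format(phase))
    return phases


select_cfg = {"name": "Local", "config": {"source": "data/demo/1local"}}
analyse_cfg = {"name": "ExtractTypes", "config": {"exts": ["txt", "md"]}}
folder = "media/demo_official/1"

cfg_1a = {"folder": folder, "select": select_cfg}
cfg_1b = {"folder": folder, "elements_in": ["Local"], "analyse": analyse_cfg}
cfg_1c = {"folder": folder, "select": select_cfg, "analyse": analyse_cfg}

for cfg in (cfg_1a, cfg_1b, cfg_1c):
    print(validate_config(cfg))
```
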