-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype for storage API #7
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fastapi is a uncharted territory for me as well. Let's keep researching on it.
For reference, this PR addresses the "storage" side of clamsproject/aapb-evaluations#50 . |
…ters (hashed) in the nested directory structure.
…tted representation of the pipeline you're searching for
…ch a full filepath within the database) and cleaned up other parts of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally, looks good, I left some additional feature request/suggestion in the comments.
prototype/storage_api.py
Outdated
from typing import List, Dict | ||
from typing_extensions import Annotated | ||
import os | ||
import yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this yaml is used only for a limited purpose, namely to load config file. However, the existing code is based on .env
file and dotenv
module, so let's migrate to environment variable-based configuration to match the existing code.
prototype/storage_api.py
Outdated
return {"message": "Storage api for pipelined mmif files"} | ||
|
||
|
||
@app.route("/upload_mmif/", methods=["POST"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's think about the routings here. Existing "baapb resolver" codebase (api/__init__.py
) uses many routings, but only one of them (/searchapi/
) is a pure API landing point (others are attached to some web page with minimal front end implementation).
Eventually, we want to run a single server app that can handle "assets" (video files) as well as can handle "mmif", meaning this storage_api.py
should be merged into the api/
pacakge. To that end, I think we need some basic rules for naming these routing paths.
Since /searchapi
is already used in production server (for baapb://
resolution), how about
/upload_mmif
>>/storeapi/mmif
/retrieve
>>/searchapi/mmif
?
Note that changes in the routing names will also impact the client-side implementation (that you're currently working on).
prototype/storage_api.py
Outdated
# and dump the param dicts | ||
os.makedirs(directory, exist_ok=True) | ||
for path in param_path_dict: | ||
file_path = os.path.join(path, 'parameters.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out, and figured that it would be more convenient for clients if we just store .mmif
files only in this directory, and have an additional api to return the entire absolute path of the directory (just like how baapb://
is resolved), so that when we run some code that uses pre-stored mmif files on the same machine where those files are, the client doesn't have to "download" the mmif files, but directly can access the directory (if exists) and use the files under.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we store the parameter files in the "appversion" directory, and keeping only mmif files in the "hash" directory, clients can just take *
glob to read all mmif files, without worrying about any additional metadata-like files.
So instead of
storage-root/
├── app1
│ ├── v1
│ │ ├── hash1
│ │ │ ├── guid1.mmif
│ │ │ └── params.json
│ │ ├── hash2
│ │ │ ├── guid1.mmif
│ │ │ └── params.json
│ │ └── hash3
│ │ ├── guid1.mmif
│ │ └── params.json
│ └── v2
│ └── hash1
│ ├── guid1.mmif
│ └── params.json
└── app2
├── v1
│ └── hash1
│ ├── guid1.mmif
│ └── params.json
└── v2
└── hash1
├── guid1.mmif
└── params.json
,
storage-root/
├── app1
│ ├── v1
│ │ ├── hash1
│ │ │ └── guid1.mmif
│ │ ├── hash1.json
│ │ ├── hash2
│ │ │ └── guid1.mmif
│ │ ├── hash2.json
│ │ ├── hash3
│ │ │ └── guid1.mmif
│ │ └── hash3.json
│ └── v2
│ ├── hash1
│ │ └── guid1.mmif
│ └── hash1.json
└── app2
├── v1
│ ├── hash1
│ │ └── guid1.mmif
│ └── hash1.json
└── v2
├── hash1
│ └── guid1.mmif
└── hash1.json
prototype/storage_api.py
Outdated
pipeline = pipeline_from_param_json(data) | ||
# get number of views for rewind if necessary | ||
num_views = len(data['pipeline']) | ||
guid = data.get('guid') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mentioned you want to work on (or already on it?) multi-guid query scenario. As mentioned in other comment, how about also adding zero-guid query, to return just the full directory path?
…uration, changed route names, added "zero-guid" functionality to return just the absolute path for the pipeline, and changed directory level for storing parameter hashes (from hash dir to version number dir)
A few additional suggestions after using it for uploading in recent days.
|
…eprint) such that it can run in the same app as the baapb resolver.
Basic skeleton for a prototype storage api for MMIF files that creates nested subdirectories based on the
views
and correspondingmetadata
present.