-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gene retrieval via API #7
base: main
Are you sure you want to change the base?
Conversation
add loguru and argparse retrieval script
quantize model before move to GPU
remove drugbank data
move protein embeddings loading to startup update logger messages
retrieval script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Vinnie, thanks so much for helping us out by sharing your changes! This seems like a great way for users to interact with the pre-trained model with reduced overhead for ongoing experimentation.
I left a couple small comments and questions, but only one other high-level ask: Would it be possible to add a README.md to the procyon/app
folder containing some basic instructions? I think this could essentially be exactly the same text as the main docstring in procyon/app/main/py
, just in a more easily discoverable place.
Otherwise, looks great! I think you may need to rebase on top of some our more recent changes (let me know if you have any questions about any merge conflicts), then should be good to squash merge into main!
def startup_retrieval( | ||
inference_bool: bool = True, | ||
) -> Tuple[ | ||
Union[UnifiedProCyon, None], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd been using e.g. Optional[UnifiedProCyon]
, so googled what's preferred, and it appears the current best practice for a value that could be None
is to express it as e.g. UnifiedProCyon | None
|
||
logger.info("Now quantizing the model to a smaller precision") | ||
model.bfloat16() # Quantize the model to a smaller precision | ||
logger.info("Done quantizing the model to a smaller precision") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly personal opinion, but the logging seems quite verbose. It might be nice to log some of these messages at the debug level instead and add a command-line option to turn them on. I haven't used loguru
before but it looks like some combination of logger.add
or logger.remove
should do it
retrieval=True, | ||
aaseq_type="protein", | ||
) | ||
# The script can run up to here without a GPU, but the following line requires a GPU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if running this on the GPU is a headache. If so, we can change get_proteins_from_embedding
to operate on a specified device, I don't think it has to run on the GPU
task_desc_infile (Path): The path to the file containing the task description. | ||
disease_desc_infile (Path): The path to the file containing the disease description. | ||
instruction_source_dataset (str): Dataset source for instructions - either "disgenet" or "omim" | ||
inference_bool (bool): OPTIONAL; choose this if you do not intend to do inference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand, what's the use case for inference_bool
being False here? I may just be missing something, but not really seeing what benefit it has other than perhaps testing that the CLI works
if __name__ == "__main__": | ||
""" | ||
This API endpoint will allow users to perform protein retrieval for a given disease description using the | ||
pre-trained ProCyon model Procyon-Full. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-trained ProCyon model Procyon-Full. | |
pre-trained ProCyon model ProCyon-Full. |
Copy code from protein retrieval example notebook into importable methods.
Wrap methods in fastapi app
retrieval can also be performed from new script that uses the same methods