Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gene retrieval via API #7

Open
wants to merge 49 commits into
base: main
Choose a base branch
from
Open

Conversation

Vincent-Ustach
Copy link

Copy code from protein retrieval example notebook into importable methods.

Wrap methods in fastapi app

retrieval can also be performed from new script that uses the same methods

Copy link
Collaborator

@rcalef rcalef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Vinnie, thanks so much for helping us out by sharing your changes! This seems like a great way for users to interact with the pre-trained model with reduced overhead for ongoing experimentation.

I left a couple small comments and questions, but only one other high-level ask: Would it be possible to add a README.md to the procyon/app folder containing some basic instructions? I think this could essentially be exactly the same text as the main docstring in procyon/app/main/py, just in a more easily discoverable place.

Otherwise, looks great! I think you may need to rebase on top of some our more recent changes (let me know if you have any questions about any merge conflicts), then should be good to squash merge into main!

def startup_retrieval(
inference_bool: bool = True,
) -> Tuple[
Union[UnifiedProCyon, None],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd been using e.g. Optional[UnifiedProCyon], so googled what's preferred, and it appears the current best practice for a value that could be None is to express it as e.g. UnifiedProCyon | None


logger.info("Now quantizing the model to a smaller precision")
model.bfloat16() # Quantize the model to a smaller precision
logger.info("Done quantizing the model to a smaller precision")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly personal opinion, but the logging seems quite verbose. It might be nice to log some of these messages at the debug level instead and add a command-line option to turn them on. I haven't used loguru before but it looks like some combination of logger.add or logger.remove should do it

retrieval=True,
aaseq_type="protein",
)
# The script can run up to here without a GPU, but the following line requires a GPU
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if running this on the GPU is a headache. If so, we can change get_proteins_from_embedding to operate on a specified device, I don't think it has to run on the GPU

task_desc_infile (Path): The path to the file containing the task description.
disease_desc_infile (Path): The path to the file containing the disease description.
instruction_source_dataset (str): Dataset source for instructions - either "disgenet" or "omim"
inference_bool (bool): OPTIONAL; choose this if you do not intend to do inference
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand, what's the use case for inference_bool being False here? I may just be missing something, but not really seeing what benefit it has other than perhaps testing that the CLI works

if __name__ == "__main__":
"""
This API endpoint will allow users to perform protein retrieval for a given disease description using the
pre-trained ProCyon model Procyon-Full.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pre-trained ProCyon model Procyon-Full.
pre-trained ProCyon model ProCyon-Full.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants