Code repository for the paper "Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic"

Bornelov-lab/Camformer

Camformer

Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic

Problem: Let $S = \{A,C,G,T,N\}^{110}$ denote the set of promoter sequences of length $110$, where $A$, $C$, $G$, $T$ are the four nucleotides and $N$ represents an unknown nucleotide. The gene expression prediction task is then to learn a mapping $f: S \to \mathbb{R}$ from a sequence to its expression level.
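To make the problem statement concrete, here is a minimal sketch of how an element of $S$ could be turned into numeric input and passed to a mapping $f$. The names `one_hot` and `toy_f` are hypothetical and do not come from this repository. Two choices are assumptions made for illustration only: `N` is encoded as 0.25 for each of the four bases, and `toy_f` returns the GC fraction as a stand-in for a real model.

```python
# Illustrative only: the repository's own encoding may differ.
ALPHABET = "ACGT"   # the four nucleotides; N is handled separately
SEQ_LEN = 110       # promoter length from the problem statement


def one_hot(seq):
    """Encode a promoter as a list of SEQ_LEN rows with 4 columns (A, C, G, T)."""
    seq = seq.upper()
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nucleotides, got {len(seq)}")
    rows = []
    for base in seq:
        if base == "N":
            # Assumption: an unknown base is spread evenly over A, C, G, T.
            rows.append([0.25, 0.25, 0.25, 0.25])
        elif base in ALPHABET:
            rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
        else:
            raise ValueError(f"unexpected character: {base!r}")
    return rows


def toy_f(encoded):
    """Stand-in for f: S -> R. Returns the GC fraction, not a real prediction."""
    return sum(row[1] + row[2] for row in encoded) / len(encoded)


example = "ACGTN" * 22          # 110 characters
encoded = one_hot(example)
score = toy_f(encoded)          # a single real number per sequence
```

A trained model would replace `toy_f`; the point is only that each 110-character sequence maps to one real-valued output.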

Graphical abstract

Data: We use data from the DREAM Challenge, consisting of 7 million random promoter sequences, each paired with a measured yellow fluorescent protein (YFP) level. We then evaluate our trained model(s) on the official test set from the challenge.
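This section does not say which score is computed on the test set. As an assumption, a correlation between predicted and measured expression is one common way to score a regression task like this. The sketch below computes Pearson's r in plain Python. The function name `pearson_r` and the toy numbers are hypothetical and are not taken from the challenge data.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured expression values."""
    if len(predicted) != len(measured):
        raise ValueError("inputs must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Toy values, for illustration only.
toy_predicted = [1.0, 2.0, 3.0, 4.0]
toy_measured = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(toy_predicted, toy_measured)
```

For the official scoring procedure, see the challenge's evaluation repository listed under References.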

Model: A residual convolutional neural network whose hyperparameters were optimised by automated tuning.

Search for a model

The figure above shows the structure of the original (large) model variant, which has 16M parameters. A smaller variant performs almost as well with roughly 90% fewer parameters (1.4M). See the associated manuscript (preprint) for more details.

Assessment: Predictive, comparative

Evaluating a trained model

Assessment: Explanatory, Scientific discovery

Evaluating a trained model for explanatory assessment

File information

The purpose of each file is as follows:

| File | Purpose |
| --- | --- |
| gen_figs.ipynb | A notebook to show (re-generate) some figures in the manuscript. |
| train_rep.py | Program to train several replicates of a Camformer model using training data. |
| score_rep.py | Program to test several replicates of a trained Camformer model on test data. |

Directory structure

| Directory | Contents |
| --- | --- |
| analysis | Contains some basic analysis of results. Contents may be updated. |
| base | Contains core codebase, utility functions, auxiliary helper files etc. |
| manuscript_figures | Contains data, script and figures present in the manuscript. |
| readme_figs | Images used to prepare this nice-looking README file. |
| saved_models | Saved model weights and example code to run. |

References

Relevant resources and previous Camformer repositories.

  1. Camformer repository (2022 version): DREAM2022 Submission
  2. DREAM 2022 Challenge Wiki Page
  3. Rafi et al., 2024: Paper Preprint
  4. Rafi et al., 2024: Data and Official Evaluation GitHub
