Predicting and explaining the impact of genetic disruptions and interactions on cell and organismal viability

This repository contains all the source code necessary to reproduce the results of our paper, "Predicting and explaining the impact of genetic disruptions and interactions on cell and organismal viability".

Live GI Prediction Tool

File Description

Dataset Curation

The following files are responsible for extracting features and tasks from raw bioinformatic data.

create_ppc.py creates the protein-protein interaction networks for the budding yeast, fission yeast, human, and fruit fly.
create_tasks.py generates the single-, double-, and triple-mutant tasks studied in the paper. It assumes that create_ppc.py has already been executed.
create_features.py creates the single and pairwise gene features for all four organisms. It requires the owltool application from geneontology.org to be present in ../tools, as well as an NCBI BLAST+ installation (on Ubuntu, this can be installed via sudo apt install ncbi-blast+).
create_datasets.py combines features and tasks into one CSV file per task. The GI and triple GI datasets include only the pairwise features, because also including the features of each individual gene would make the files too large. The GI and triple GI models therefore require the single-gene feature files as well.
create_pseudo_triplets_task.py creates randomly sampled pseudo triplets within- and across-complexes.
explore_hybrid_costanzo.ipynb examines the overlap between the Costanzo and BioGRID datasets.
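Before running create_features.py, it can help to confirm that the two external prerequisites mentioned above are in place. The following is a minimal sketch, not part of the repository. It assumes that the owltool launcher is a file named owltool directly under ../tools and that the BLAST+ executable of interest is blastp; both names are assumptions, so adjust them to your setup.

```python
import shutil
from pathlib import Path


def check_prerequisites(tools_dir="../tools", owltool_name="owltool", blast_binary="blastp"):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # ASSUMPTION: owltool is a single file called `owltool` inside ../tools.
    owltool_path = Path(tools_dir) / owltool_name
    if not owltool_path.is_file():
        problems.append(f"owltool not found at {owltool_path}")

    # ASSUMPTION: `blastp` (shipped with NCBI BLAST+) is the binary to look for.
    if shutil.which(blast_binary) is None:
        problems.append(f"{blast_binary} is not on PATH (is NCBI BLAST+ installed?)")

    return problems


def report_prerequisites():
    """Print the outcome of the checks using the default locations."""
    problems = check_prerequisites()
    for problem in problems:
        print("MISSING:", problem)
    if not problems:
        print("All prerequisites found.")
```

Calling report_prerequisites() from the repository root prints one line per missing prerequisite.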

The above-mentioned files require the original third-party data files (e.g., datasets such as BioGRID and UniProt) to be present in the ../data-sources directory. Because many original data files are required, and downloading them and arranging them in the right directory structure is difficult, we provide a zip file containing all the processed data necessary to replicate the analyses and modeling experiments. The user therefore does not need to deal with the original third-party data. The file can be downloaded here.

After downloading, unzip the contents of the file into the ../generated-data directory.
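The extraction step can also be done from Python. This is a small sketch under stated assumptions: the archive name processed-data.zip is hypothetical (use whatever name the downloaded file has), and the sketch extracts the archive as-is without assuming anything about its internal folder layout.

```python
import zipfile
from pathlib import Path


def extract_processed_data(zip_path, target_dir="../generated-data"):
    """Extract every member of the archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create ../generated-data if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Hypothetical usage; "processed-data.zip" is a placeholder file name:
# extract_processed_data("processed-data.zip")
```

If the archive already contains a top-level generated-data folder, extract into the parent directory instead so the files do not end up nested one level too deep.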

Creating Additional Datasets

After downloading and extracting the processed data files, the following scripts should be executed.

create_mn_datasets.py creates datasets for the MN models, based on those generated by create_datasets.py.
create_splits.py creates all the cross-validation splits for all tasks studied in the paper. This includes the development/test splits for yeast.
figures.ipynb produces all the non-modeling figures in the paper.
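The two Python scripts above can be run one after the other from a small driver. The sketch below is an illustration rather than something shipped with the repository; it assumes the scripts are run from the repository root in the order they are listed, and it stops at the first failure. figures.ipynb is a notebook and is meant to be opened in Jupyter separately.

```python
import subprocess
import sys

# Order taken from the list above (assumed to be the intended execution order).
POST_DOWNLOAD_SCRIPTS = ["create_mn_datasets.py", "create_splits.py"]


def run_in_order(scripts, python=sys.executable, cwd=None):
    """Run each script with the given interpreter; raise CalledProcessError on failure."""
    for script in scripts:
        # check=True aborts the sequence as soon as one script exits with an error.
        subprocess.run([python, script], check=True, cwd=cwd)
    return list(scripts)


# Usage from the repository root:
# run_in_order(POST_DOWNLOAD_SCRIPTS)
```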

Experiment Scripts

The following files reproduce the results of the modeling experiments of the paper. They can be run directly; there are no arguments or parameters to pass.

exp_optimize_hyperparams.py runs the hyperparameter optimization experiments on the single- and double-gene neural network models. Produces Supplementary Table 1.
exp_feature_selection.py runs feature selection experiments on the development portion of the budding yeast datasets. Produces supplementary tables 2, 4, 7.
exp_yeast_smf.py evaluates the S-Full, S-Refined, S-MN, and null models on the development portion (CV) and test portions of the yeast SMF dataset. Produces Figure 1A.
exp_yeast_gi_hybrid.py evaluates the D-Full, D-Refined, D-MN, and null models on the development (CV) and test portions of the yeast hybrid GI dataset. Produces Figure 2A.
exp_yeast_tgi.py evaluates the T-Full, T-Refined, T-MN, and null models on the development (CV) and test portions of the yeast triple mutant GI dataset. Produces Figure 3A.
exp_smf_binary.py evaluates the S-Refined, S-MN, and null models on the SMF datasets of all four organisms, as well as the multi-organismal lethal (MO) vs. viable (V) dataset of humans and fruit flies. Training and evaluation are done using CV. Produces Figure 4.
exp_gi_binary.py evaluates the D-Refined, D-MN (with and without slim GO terms), and the null models on the GI datasets of all four organisms. Produces Figure 5.
exp_gi_costanzo_pombe.py evaluates the 4-way D-Full, D-Refined, D-MN, and null models on the yeast Costanzo GI dataset, and the D-Refined, D-MN, and null models on the pombe GI dataset. Produces Supplementary Figure 5.
exp_smf_other_orgs.py evaluates the 3-way S-Refined, S-MN, and null models on the pombe, human, and fruit fly SMF datasets. Produces Supplementary Figure 7.
exp_smf_ca_mo_v.py evaluates S-Refined, S-MN, and null models on the task of predicting cellular autonomous lethality (CA) vs multi-organismal lethality (MO) vs viability (V) in humans and fruit flies. Produces Supplementary Figure 8.
exp_lit.py compares the binary S-MN and D-MN models to other models from the literature on the yeast SMF and hybrid GI datasets. Produces Supplementary Figure 9.
exp_cross_prediction.py trains the D-MN model on the GI prediction task on the yeast hybrid GI dataset, and evaluates it on the task of predicting GI, coprecipitation, phosphorylation, and transcription. Produces Supplementary Figure 10.
exp_mn_feature_contribution.py computes the drop in balanced accuracy when each feature of the S-MN, D-MN, and T-MN models is removed. Produces Supplementary Figure 11.
exp_generalization.py trains S-MN and D-MN models on the yeast SMF and hybrid GI datasets, and evaluates them on the other organisms' datasets. Produces Supplementary Figure 12.
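Several of the scripts above report balanced accuracy, and exp_mn_feature_contribution.py reports the drop in balanced accuracy when a feature is removed. As a reminder of what that quantity measures, here is a minimal, dependency-free sketch. It is not the repository's implementation, and how the scripts actually remove a feature (for example by retraining without it) is not specified here, so the sketch simply takes two sets of predictions as given.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def balanced_accuracy_drop(y_true, pred_full, pred_ablated):
    """Balanced accuracy of the full model minus that of the model lacking one feature."""
    return balanced_accuracy(y_true, pred_full) - balanced_accuracy(y_true, pred_ablated)


# Toy example with an imbalanced binary task (three negatives, one positive).
y_true = [0, 0, 0, 1]
always_negative = [0, 0, 0, 0]
print(balanced_accuracy(y_true, always_negative))  # 0.5, although plain accuracy is 0.75
```

The toy example shows why balanced accuracy is a useful metric on imbalanced tasks: a classifier that always predicts the majority class scores no better than chance.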

Model Configuration Files

These reside under cfgs/ and specify the configuration of the NN and MN models used in the paper. Note that these configuration files specify the models in their "full" form; the refined variants are created dynamically from these files in the experiment scripts above.

Models

TensorFlow model classes reside under models/. In addition to the neural network and MN models, a helper module, train_and_evaluate.py, is provided to carry out CV training and evaluation. The module takes advantage of multiple cores to run several splits at the same time.
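The general idea of running several CV splits at the same time can be sketched with the standard library. This is an illustration of the pattern only and does not reproduce train_and_evaluate.py; the names run_splits and evaluate_split are hypothetical, and evaluate_split stands for any function that trains and scores a model on one split.

```python
from concurrent.futures import ProcessPoolExecutor


def run_splits(evaluate_split, splits, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Apply evaluate_split to every split in parallel and return results in split order.

    With the default process pool, evaluate_split must be a picklable,
    module-level function and each split must be picklable as well.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order, so results[i] belongs to splits[i].
        return list(pool.map(evaluate_split, splits))
```

A typical call would be run_splits(evaluate_split, splits, max_workers=4), with max_workers chosen to match the number of available cores. When using a process pool on platforms that spawn new interpreters, the call should sit under an if __name__ == "__main__": guard.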

Dependencies

The code was tested with the following modules:

sklearn 1.0.2
dcor 0.5.3
Bio 1.79
igraph 0.9.10
matplotlib 3.5.1
networkx 2.8
numpy 1.22.3
obonet 0.3.0
pandas 1.4.2
scipy 1.8.0
seaborn 0.11.2
statsmodels 0.13.2
tensorflow 2.8.0
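To compare a local environment against the versions listed above, a quick check along the following lines can be used. It is a sketch that assumes the names in the list are importable module names and that each module exposes a __version__ attribute; modules that do not are reported as "unknown".

```python
import importlib

# Versions copied from the list above, keyed by import name.
TESTED_VERSIONS = {
    "sklearn": "1.0.2", "dcor": "0.5.3", "Bio": "1.79", "igraph": "0.9.10",
    "matplotlib": "3.5.1", "networkx": "2.8", "numpy": "1.22.3",
    "obonet": "0.3.0", "pandas": "1.4.2", "scipy": "1.8.0",
    "seaborn": "0.11.2", "statsmodels": "0.13.2", "tensorflow": "2.8.0",
}


def installed_versions(expected):
    """Map each module name to (tested version, installed version or None if missing)."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (wanted, None)
            continue
        # ASSUMPTION: the module exposes __version__; fall back to "unknown" otherwise.
        report[name] = (wanted, getattr(module, "__version__", "unknown"))
    return report


# Usage:
# for name, (wanted, found) in installed_versions(TESTED_VERSIONS).items():
#     print(f"{name}: tested with {wanted}, found {found}")
```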

KISRDevelopment/cell_viability_paper
