This code is used to benchmark model performance for high throughput cell line screens.
drugfeats.pkl
contains mOrdred descritpors without 3D features, that are scaled and imputedimputer.pkl
contains a dictionary {imputer, scaler} to be used for on the fly feature generationrnaseq.pkl
contains columns with label combo_auc.DRUG, combo_auc.AUC, combo_auc.CELLextended_combined_smiles
which matches smiles to combo_auc.DRUGcellpickle.pkl
contains RNAseq data frame with label lincs.CELL for mergingextended_combined_smiles
contains smiles matching combo_auc.DRUGextended_combined_mordred_descriptors
contains precomputed mordred descriptors for combo_auc.DRUGtestsmiles.smi
is a SMILES formated file with 100k random sample smiles for testing
Training is done either with --mode [graph, desc, image] (RNN SMILES coming soon). Use python train.py -h
for options.
For this benchmark the following commands were used:
python train.py --mode graph -o saved_models/graph_model.pt -w 32 -s cell
python train.py --mode desc -o saved_models/desc_model.pt -w 32 -s cell
python train.py --mode image -o saved_models/image_model.pt -w 32 -s cell
Again use python infer.py -h
to see all options.
For this benchmark the following commands were used:
python infer.py --mode graph -o saved_models/graph_model.pt -w 32 -g 2 --smiles_file data/testsmiles.smi --output_file saved_models/graph_infers.txt
python infer.py --mode desc -o saved_models/desac_model.pt -w 32 -g 2 --smiles_file data/testsmiles.smi --output_file saved_models/desc_infers.txt
python infer.py --mode image -o saved_models/image_model.pt -w 32 -g 2 --smiles_file data/testsmiles.smi --output_file saved_models/image_infers.txt