autoencoder to compress methylation data and cancer cell classifier
This project consists of 3 main parts:
- loading data with the `load_data` folder: load methylation data from the GDC database
- preprocessing data with `preprocessor.py`: remove features with `NA` data, remove low-variance features, and separate normal and cancer data into separate files
- classifier with the `model` folder: compress data and classify whether it is cancer or not
  - `autoencoder.py`: train an `encoder` model used for compression
  - `classifier.py`: train a classifier for compressed data
  - `predict.py`: uses the trained classifier and encoder to classify new data
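
The preprocessing step can be illustrated with a small sketch. This is not the code in preprocessor.py, only a dependency-free approximation of the three operations it is described as doing. The function name `preprocess`, the `min_variance` threshold, the probe IDs, and the in-memory return value (instead of writing separate files) are all assumptions made for illustration.

```python
import math
from statistics import pvariance


def preprocess(feature_names, rows, labels, min_variance=0.001):
    """Filter features, then split the samples by label.

    feature_names: column names (hypothetical probe IDs)
    rows: one list of floats per sample, NaN marking a missing value
    labels: "normal" or "cancer" for each row
    min_variance: assumed cut-off; the real threshold is not documented here
    """
    keep = []
    for j in range(len(feature_names)):
        column = [row[j] for row in rows]
        # Step 1: drop the feature if any sample is missing it.
        if any(math.isnan(v) for v in column):
            continue
        # Step 2: drop the feature if it barely varies across samples.
        if pvariance(column) < min_variance:
            continue
        keep.append(j)

    kept_names = [feature_names[j] for j in keep]

    # Step 3: separate normal and cancer samples.
    split = {"normal": [], "cancer": []}
    for row, label in zip(rows, labels):
        split[label].append([row[j] for j in keep])
    return kept_names, split


# Toy data: cg02 has a missing value, cg03 is constant.
names = ["cg01", "cg02", "cg03"]
rows = [
    [0.10, float("nan"), 0.50],
    [0.90, 0.20, 0.50],
    [0.15, 0.25, 0.50],
]
labels = ["normal", "cancer", "normal"]

kept_names, split = preprocess(names, rows, labels)
print(kept_names)
print(len(split["normal"]), len(split["cancer"]))
```

Only `cg01` survives both filters in this toy input. On real methylation matrices, which have far more features than samples, a vectorized implementation would likely be preferable to these Python loops.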