Dimensionality Reduction with a Stacked Contractive Autoencoder
To overcome resource limitations when training convolutional neural networks, reducing the dimensionality of high-dimensional data such as infrared microscopic images is a useful approach. Here, the stacked autoencoder can have N layers, where each layer is trained separately, one autoencoder at a time. The weights of each trained encoder are then transferred to a fully connected encoder that performs the dimensionality reduction of the whole dataset. The autoencoders are trained with a contractive loss, a mean-squared error with an additive regularization term. The penalty term is the Frobenius norm of the Jacobian of the encoder output with respect to its input. This encourages a robust representation that is less sensitive to small variations in the data.
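For a single sigmoid encoder layer, this penalty has a closed form. Below is a minimal sketch of such a loss in TensorFlow; the function name, the weighting factor lam, and the assumption of a sigmoid activation are illustrative, not the library's API.

import tensorflow as tf

def contractive_loss(x, x_hat, h, W, lam=1e-4):
    # Reconstruction error plus contractive penalty for a sigmoid
    # encoder h = sigmoid(x @ W + b), with W of shape (input_dim, hidden_dim)
    mse = tf.reduce_mean(tf.square(x - x_hat))
    dh = h * (1.0 - h)                          # sigmoid derivative, shape (batch, hidden)
    w_sq = tf.reduce_sum(tf.square(W), axis=0)  # per-unit sum of squared weights, shape (hidden,)
    frob = tf.reduce_sum(tf.square(dh) * w_sq, axis=1)  # squared Frobenius norm of the Jacobian per sample
    return mse + lam * tf.reduce_mean(frob)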
To train a stacked contractive autoencoder, only a small percentage of your whole dataset is needed. The 3-dimensional data (X_height, Y_width, Z_depth) has to be reshaped to a 2-dimensional array (X_height * Y_width, Z_depth) of type np.ndarray, np.float32. For example, if you are working with spectral data with 427 wavenumbers (WN), it has to be reshaped to (X * Y, 427). The data can be L2-normalized within the training if wanted.
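For example, a hyperspectral cube can be flattened with NumPy; the cube below is a hypothetical stand-in for your own data:

import numpy as np

cube = np.random.rand(64, 64, 427).astype(np.float32)  # hypothetical (X, Y, Z) data cube
data_2d = cube.reshape(-1, cube.shape[-1])             # shape (64 * 64, 427), dtype float32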
import numpy as np

# Synthetic training, validation, and test sets in the expected
# (n_spectra, n_wavenumbers) shape and np.float32 dtype
data = np.random.uniform(low=0.001, high=0.0099, size=(1000, 427)).astype(np.float32)
val = np.random.uniform(low=0.001, high=0.0099, size=(200, 427)).astype(np.float32)
test = np.random.uniform(low=0.001, high=0.0099, size=(200, 427)).astype(np.float32)
A minimal guide to training a stacked contractive autoencoder is given below. The weights of each encoder will be saved in the ./logs/ folder. The number of autoencoders is defined by the number of hidden layers: e.g. hidden_layer = [100, 50, 25] yields 3 stacked autoencoders (SAEs). The first SAE has a single hidden layer of size 100. After training this SAE for num_epochs, the dimensionality of the training data is reduced to 100 dimensions and serves as the input training data for the second SAE with a hidden layer of size 50, and so on. A conceptual sketch of this greedy layer-wise procedure is shown below; the actual training is done with train_SCAE afterwards.
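The sketch illustrates the greedy layer-wise idea with plain Keras autoencoders; for brevity it uses a plain MSE loss instead of the contractive loss, and the layer sizes and hyperparameters are illustrative:

import tensorflow as tf

x = data  # (n_spectra, 427) training data from above
for size in [100, 50, 25]:
    inp = tf.keras.Input(shape=(x.shape[1],))
    h = tf.keras.layers.Dense(size, activation="sigmoid")(inp)
    out = tf.keras.layers.Dense(x.shape[1])(h)
    ae = tf.keras.Model(inp, out)                 # one-hidden-layer autoencoder
    ae.compile(optimizer="adam", loss="mse")      # train_SCAE uses the contractive loss instead
    ae.fit(x, x, epochs=5, batch_size=50, verbose=0)
    x = tf.keras.Model(inp, h).predict(x)         # reduced data feeds the next autoencoder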
from openvibspec.scae import train_SCAE
import os

fpath = os.getcwd()  # Define a saving path

# enc_model is the fully connected encoder with the trained weights;
# each autoencoder is trained for num_epochs epochs
enc_model = train_SCAE(data,
                       val,
                       fpath,
                       hidden_layer=[100, 50, 25],  # List of hidden layers: [100, 50, 25] yields 3 SCAEs
                       num_epochs=5,                # Number of epochs
                       batch_size=50,               # Batch size
                       learning_rate=0.003,         # 0.003 recommended for (1.5 Mil, 427) training data
                       early_stop_epochs=200,       # Number of epochs for early stopping
                       l2_normalize_data=False)     # Data can be L2-normalized if wanted

enc_model.summary()
To validate an independent dataset or perform dimensionality reduction on the whole dataset, the data has to be reshaped as described in "Data preparation". If l2_normalize_data was set to True during training, the test data has to be manually L2-normalized before prediction.
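In that case, the normalization can be applied row-wise with NumPy; this assumes the training-time normalization is the standard per-spectrum L2 norm:

test = test / np.linalg.norm(test, axis=1, keepdims=True)  # per-spectrum L2 normalization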
prediction_test = enc_model.predict(test)
The final encoder model will be saved in the ./logs/ folder and can be loaded with:
import tensorflow as tf
encoder = tf.keras.models.load_model("./logs/model_encoder")
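The loaded encoder maps spectra directly to the lowest-dimensional representation; for hidden_layer = [100, 50, 25] the output has 25 dimensions:

reduced = encoder.predict(data)  # shape (n_spectra, 25)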
Correspondence:
Prof. Dr. Axel Mosig, Bioinformatics Group, Ruhr Universität Bochum, Germany