Dimensionality Reduction with a Stacked Contractive Autoencoder

Dajana Müller edited this page Feb 23, 2022 · 3 revisions

Stacked Contractive Autoencoder

Reducing the dimensionality of high-dimensional data, such as infrared microscopic images, is a useful way to overcome resource limitations when training convolutional neural networks. Here, the stacked autoencoder can have N layers, and each layer is trained separately, one at a time, as an autoencoder with a single hidden layer. The weights of the trained encoder parts are then transferred to a fully connected encoder, which performs the dimensionality reduction of the whole dataset. The autoencoders are trained with a contractive loss: a mean-squared error plus an additive regularization term. The penalty term is the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. It lets the model learn a robust representation that is less sensitive to small variations in the data.
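The contractive loss described above can be illustrated with a small NumPy sketch. It is not the openvibspec implementation: the sigmoid encoder, the linear decoder, the weighting factor `lam`, and the use of the squared Frobenius norm (common in contractive autoencoder formulations) are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_loss(x, W, b, W_dec, b_dec, lam=1e-4):
    """MSE reconstruction error plus lam * squared Frobenius norm of the encoder Jacobian.

    Shapes (all hypothetical): x (batch, n_in), W (n_hidden, n_in), W_dec (n_in, n_hidden).
    """
    h = sigmoid(x @ W.T + b)          # encoder activations, (batch, n_hidden)
    x_hat = h @ W_dec.T + b_dec       # assumed linear decoder, (batch, n_in)
    mse = np.mean((x - x_hat) ** 2)
    # For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W_ji, so
    # ||J||_F^2 = sum_j (h_j * (1 - h_j))^2 * sum_i W_ji^2
    dh_sq = (h * (1.0 - h)) ** 2      # (batch, n_hidden)
    w_sq = np.sum(W ** 2, axis=1)     # (n_hidden,)
    jac = np.mean(dh_sq @ w_sq)       # batch mean of the squared Frobenius norm
    return mse + lam * jac, mse, jac
```

Because the penalty grows with the sensitivity of the hidden activations to the input, minimizing it pushes the encoder towards representations that change little under small input perturbations.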

Data preparation

To train a stacked contractive autoencoder, only a small percentage of your whole dataset is needed. The 3-dimensional data (X height, Y width, Z depth) has to be reshaped to a 2-dimensional array of shape (X height * Y width, Z depth), given as an np.ndarray of dtype np.float32. For example, if you are working with spectral data with 427 wavenumbers (WN), your array has to be reshaped to (X * Y, 427). The data can be L2-normalized during training if wanted.
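A minimal sketch of this reshaping step, assuming a hypothetical image cube of 32 x 48 pixels with 427 wavenumbers per pixel. The cube size, the variable names, and the 10 % subset fraction are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical image cube: 32 x 48 pixels, 427 wavenumbers per pixel
cube = np.random.uniform(low=0.001, high=0.0099, size=(32, 48, 427))

height, width, depth = cube.shape
# Flatten the two spatial axes; each row is then one pixel spectrum
spectra = cube.reshape(height * width, depth).astype(np.float32)

# Draw a random subset of pixels for training (10 % is an arbitrary choice)
rng = np.random.default_rng(seed=0)
idx = rng.choice(spectra.shape[0], size=spectra.shape[0] // 10, replace=False)
train_subset = spectra[idx]
```

With the default row-major order, row `y * width + x` of `spectra` holds the spectrum of pixel `(y, x)`, so a reduced result can later be reshaped back to `(height, width, n_components)`.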

Example data

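import numpy as np
# Random stand-in data with 427 features, cast to np.float32 as required above
data = np.random.uniform(low=0.001, high=0.0099, size=(1000,427)).astype(np.float32)
val = np.random.uniform(low=0.001, high=0.0099, size=(200,427)).astype(np.float32)
test = np.random.uniform(low=0.001, high=0.0099, size=(200,427)).astype(np.float32)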


A minimal guide to training a stacked contractive autoencoder is given below. The weights of each encoder are saved in the ./logs/ folder. The number of autoencoders is defined by the number of hidden layers: e.g. hidden_layer = [100, 50, 25] yields 3 stacked autoencoders (SAE). The first SAE has only one hidden layer of size 100. After training this SAE for num_epochs, the training data is reduced to 100 dimensions and serves as the input training data for the second SAE with a hidden layer of size 50, and so on.

from openvibspec.scae import train_SCAE
import os

fpath = os.getcwd()                                             # Define a saving path
# enc_model is the fully connected encoder with the trained weights
enc_model = train_SCAE(data,                                    # Training data
                       val,                                     # Validation data
                       hidden_layer=[100, 50, 25],              # List of hidden layers: e.g. [100, 50, 25] yields 3 SCAE
                       num_epochs=5,                            # Number of epochs each AE is trained for
                       batch_size=50,                           # Batch size
                       learning_rate=0.003,                     # Learning rate; 0.003 recommended for (1.5 Mil, 427) training data
                       early_stop_epochs=200,                   # Number of epochs for early stopping
                       l2_normalize_data=False)                 # Data can be L2-normalized if wanted
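The layer-wise scheme can be made concrete by listing the input and hidden size of each autoencoder in the stack. The helper `layer_plan` below is hypothetical and not part of openvibspec; it only traces the dimensions and does no training.

```python
def layer_plan(input_dim, hidden_layer):
    """Return (input_dim, hidden_dim) for each autoencoder in the stack."""
    plan = []
    for hidden_dim in hidden_layer:
        plan.append((input_dim, hidden_dim))
        # The encoded output of this autoencoder is the input of the next one
        input_dim = hidden_dim
    return plan

for i, (n_in, n_hidden) in enumerate(layer_plan(427, [100, 50, 25]), start=1):
    print(f"autoencoder {i}: {n_in} -> {n_hidden} -> {n_in}")
# autoencoder 1: 427 -> 100 -> 427
# autoencoder 2: 100 -> 50 -> 100
# autoencoder 3: 50 -> 25 -> 50
```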



To validate an independent dataset or perform dimensionality reduction on the whole dataset, the data has to be reshaped as described in "Data preparation". If l2_normalize_data was set to True during training, the test data has to be L2-normalized manually before prediction.

prediction_test = enc_model.predict(test)
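If the model was trained with l2_normalize_data = True, the manual normalization could look like the sketch below. It assumes that each spectrum (row) is scaled to unit L2 norm; whether openvibspec normalizes per row in exactly this way is an assumption that should be checked against the library code. The helper name `l2_normalize_rows` is hypothetical.

```python
import numpy as np

def l2_normalize_rows(x, eps=1e-12):
    """Scale every row of x to unit L2 norm; all-zero rows stay zero."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

test = np.random.uniform(low=0.001, high=0.0099, size=(200, 427)).astype(np.float32)
test_normalized = l2_normalize_rows(test)
# The normalized array would then be passed to the trained encoder:
# prediction_test = enc_model.predict(test_normalized)
```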

The final encoder model is saved in the ./logs/ folder and can be loaded with:

import tensorflow as tf
encoder = tf.keras.models.load_model("./logs/model_encoder")