Given a new sample, we denote it by $\mathbf{x} = (1, x_1, \ldots, x_d)^T$, where the first element is the bias term and the others are the feature values.
-
Binary problem
Consider a binary classification task with a positive class and a negative class. Denote the nodes in the hidden layer by $h_1, \ldots, h_m$ and the incoming weights to $h_j$ by $\mathbf{w}_j$. Then

$$a_j = \mathbf{w}_j^T \mathbf{x}$$

and

$$h_j = g(a_j),$$

where $g$ is an activation function of your choice. Using similar notations, we write $\mathbf{h} = (1, h_1, \ldots, h_m)^T$ and denote the incoming weights to the output node by $\mathbf{v}$, so we have

$$z = \mathbf{v}^T \mathbf{h},$$

and the probability that the new sample is positive is

$$P(y = +1 \mid \mathbf{x}) = \frac{1}{1 + e^{-z}}.$$
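To make this concrete, here is a minimal NumPy sketch of the binary forward pass; the relu choice for $g$ and all weight values are illustrative assumptions, not a trained model.

```python
import numpy as np

def predict_proba_positive(x, W, v, g=lambda a: np.maximum(0.0, a)):
    """Forward pass for the binary case.

    x : (d+1,) input with a leading 1 for the bias term.
    W : (m, d+1) matrix; row j holds the incoming weights w_j of hidden unit h_j.
    v : (m+1,) incoming weights of the output node (first entry is its bias weight).
    g : activation function (relu here, as an example).
    """
    h = np.concatenate(([1.0], g(W @ x)))  # h = (1, h_1, ..., h_m), with h_j = g(w_j^T x)
    z = v @ h                              # z = v^T h
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid gives P(positive | x)

# Toy example: d = 2 features, m = 3 hidden units, arbitrary weights.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -1.2])             # (1, x_1, x_2)
print(predict_proba_positive(x, rng.normal(size=(3, 3)), rng.normal(size=4)))
```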
-
Multiclass problem
Consider a multiclass classification task with $K$ classes $C_1, \ldots, C_K$. Using the same notation as above, we have

$$h_j = g(\mathbf{w}_j^T \mathbf{x}), \quad \mathbf{h} = (1, h_1, \ldots, h_m)^T.$$

Then, define

$$z_k = \mathbf{v}_k^T \mathbf{h}, \quad k = 1, \ldots, K,$$

where $\mathbf{v}_k$ denotes the incoming weights to the output node of class $C_k$. We then get

$$P(y = C_k \mid \mathbf{x}) = \frac{e^{z_k}}{\sum_{l=1}^{K} e^{z_l}}.$$
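Analogously, here is a short sketch of the multiclass forward pass; the max-subtraction inside the softmax is a standard numerical-stability trick and does not change the probabilities.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max(z) to avoid overflow
    return e / e.sum()

def predict_proba_multiclass(x, W, V, g=np.tanh):
    """x : (d+1,) input with a leading 1 for the bias term.
    W : (m, d+1) hidden-layer weights, one row per hidden unit.
    V : (K, m+1) output weights; row k holds v_k for class C_k.
    """
    h = np.concatenate(([1.0], g(W @ x)))  # h = (1, h_1, ..., h_m)
    return softmax(V @ h)                  # P(y = C_k | x) for k = 1..K
```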
-
Activation functions
The activation function $g$ can be any of the choices exposed by sklearn: relu ($g(a) = \max(0, a)$), tanh ($g(a) = \tanh(a)$), logistic ($g(a) = 1/(1 + e^{-a})$), or identity ($g(a) = a$).
In both cases, the weights are learned by minimizing a loss of the form

$$L(\boldsymbol{\theta}) = \mathrm{CrossEntropy}(\boldsymbol{\theta}) + \alpha \|\boldsymbol{\theta}\|_2^2,$$

where $\boldsymbol{\theta}$ is a vector containing all weights and $\alpha$ is a constant that determines the strength of regularization.
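As a sanity check on how the penalty enters the objective, here is a minimal NumPy sketch of the penalized binary loss; the exact scaling sklearn applies to the penalty term differs slightly, so treat this as illustrative.

```python
import numpy as np

def penalized_log_loss(p, y, theta, alpha):
    """Binary cross-entropy plus an L2 penalty (illustrative form).

    p     : (n,) predicted positive-class probabilities.
    y     : (n,) true labels in {0, 1}.
    theta : 1-D array collecting all weights of the network.
    alpha : regularization strength.
    """
    eps = 1e-12  # guards against log(0)
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return ce + alpha * np.dot(theta, theta)  # alpha * ||theta||_2^2
```

The software exposes the following parameters (a usage sketch in sklearn follows the list):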
- num_hidden_units: the number of units in the hidden layer
- activation: the activation function for the hidden layer
- solver: the learning algorithm used to optimize the loss function
- penalty: regularization strength, i.e. the constant $\alpha$ above (larger values lead to stronger regularization)
- batch_size: the number of samples in each batch used in stochastic optimization
- learning_rate: learning rate schedule for weight updates
- constant: uses constant rate given by learning_rate_init.
- invscaling: the learning rate gradually decreases from the initial rate given by learning_rate_init.
- adaptive: the learning rate is kept constant as long as the loss keeps decreasing, and is divided by 5 each time two consecutive iterations fail to decrease the loss. The initial rate is given by learning_rate_init.
- learning_rate_init: the initial learning rate
- early_stopping: whether to terminate training early when the validation score fails to improve
Stopping criteria:
- tol: minimum reduction in loss required for optimization to continue.
- max_iter: maximum number of iterations allowed for the learning algorithm to converge.
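As referenced above, here is a minimal sketch wiring these options into sklearn's MLPClassifier; the mapping from the software's parameter names to sklearn's (num_hidden_units → hidden_layer_sizes, penalty → alpha) is an assumption, and all values are illustrative.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(100,),   # num_hidden_units
    activation="relu",           # activation
    solver="sgd",                # solver (the schedule below applies to sgd)
    alpha=1e-4,                  # penalty: regularization strength
    batch_size=32,               # batch_size
    learning_rate="adaptive",    # learning_rate schedule
    learning_rate_init=0.01,     # learning_rate_init
    early_stopping=True,         # early_stopping
    tol=1e-4,                    # tol
    max_iter=200,                # max_iter
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))      # accuracy on held-out data
print(clf.predict_proba(X_test[:3]))  # class probabilities, as derived above
```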
Check out the documentation listed below to view the attributes that are available in sklearn but not exposed to the user in the software.
- sklearn tutorial on neural networks.
- sklearn MLPClassifier documentation.
- Stanford CS231n lecture notes on neural networks.