An artificial neural network consists of one input layer, some number of hidden layers, and one output layer. When the network has a large number of hidden layers, it is often called a deep neural network.

The transfer function is more commonly known as the activation function. Its main purpose is to introduce non-linearity into the neural network, that is, to apply a non-linear transformation to each layer's output so that the network can learn the intricate patterns in the data.

The sigmoid activation function squashes its input to the range 0 to 1 and is centered at 0.5, whereas the tanh activation function squashes its input to the range -1 to 1 and is centered at 0.
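As a minimal sketch in Python with NumPy (the `sigmoid` helper defined here is just illustrative; `np.tanh` is NumPy's built-in):

```python
import numpy as np

def sigmoid(x):
    # squashes any real value into the range (0, 1); sigmoid(0) = 0.5
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # values between 0 and 1, centered at 0.5
print(np.tanh(x))   # values between -1 and 1, centered at 0
```

Note that tanh outputs are zero-centered while sigmoid outputs are not, which is one reason tanh is often preferred for hidden layers.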

Suppose x is given as input to the ReLU function. If x is less than 0, ReLU returns 0; if x is greater than or equal to 0, ReLU returns x itself. Because ReLU returns 0, and has a zero gradient, for every negative input, a neuron whose inputs stay negative receives no gradient updates and can get permanently stuck outputting 0. This is referred to as the dying ReLU problem.
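The piecewise behavior described above can be sketched in Python with NumPy (the `relu` helper is illustrative, not a library function):

```python
import numpy as np

def relu(x):
    # returns x when x >= 0, and 0 when x < 0
    return np.maximum(0.0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))  # every negative input is clipped to 0
```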

To combat the dying ReLU problem, we use Leaky ReLU. With the leaky ReLU activation function, we introduce a small slope for negative values. That is, instead of always returning 0 when x is negative, leaky ReLU returns x multiplied by a small constant called alpha, usually set to 0.01. Thus, leaky ReLU returns x when x is greater than or equal to 0, and alpha times x when x is less than 0. Because the output for negative inputs is small but non-zero, the gradient is also non-zero, so the neuron can keep learning.
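A sketch of leaky ReLU in Python with NumPy, assuming the usual default of alpha = 0.01 (the `leaky_relu` helper is illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # returns x when x >= 0, and alpha * x when x < 0
    return np.where(x >= 0, x, alpha * x)

x = np.array([-3.0, 0.0, 2.0])
print(leaky_relu(x))  # negative inputs are scaled by alpha instead of zeroed
```

Unlike plain ReLU, the negative branch here has slope alpha rather than 0, which is exactly what keeps the gradient alive for negative inputs.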