Gradient Descent Method Interview Questions - Deep Learning - Avatto

What are some of the adaptive methods of gradient descent?

Adaptive methods of gradient descent include the following:

1) Adagrad
2) Adadelta
3) RMSProp
4) Adam
5) Adamax
6) AMSGrad
7) Nadam

How can we set the learning rate adaptively?

We can set the learning rate adaptively using Adagrad. Adagrad scales the learning rate of each parameter by the inverse square root of the sum of its past squared gradients: a parameter whose past gradients were small keeps a high learning rate, and a parameter whose past gradients were large gets a low learning rate. The learning rate therefore changes adaptively based on the past gradients.
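The per-parameter scaling described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `adagrad_step`, the toy objective `f(x, y) = 10*x**2 + 0.1*y**2`, and the values `lr=0.1` and `eps=1e-8` are all assumptions chosen for the example.

```python
import math

def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad update over a list of parameters (illustrative sketch)."""
    new_theta, new_accum = [], []
    for t, g, a in zip(theta, grad, accum):
        a = a + g * g                       # running sum of squared gradients
        step = lr / (math.sqrt(a) + eps)    # effective per-parameter learning rate
        new_theta.append(t - step * g)
        new_accum.append(a)
    return new_theta, new_accum

def grad_f(theta):
    """Gradient of the assumed toy objective f(x, y) = 10*x**2 + 0.1*y**2."""
    x, y = theta
    return [20.0 * x, 0.2 * y]

theta, accum = [1.0, 1.0], [0.0, 0.0]
for _ in range(100):
    theta, accum = adagrad_step(theta, grad_f(theta), accum)

# x saw large gradients, so its effective learning rate ends up lower than y's.
effective_lr = [0.1 / (math.sqrt(a) + 1e-8) for a in accum]
print(effective_lr)
```

Because the sum of squared gradients only grows, the effective learning rate of every parameter can only shrink over time.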

Can we get rid of the learning rate?

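Yes, we can get rid of the manually set learning rate using Adadelta. Adadelta replaces it with the ratio of the running average (RMS) of past parameter updates to the running average (RMS) of past gradients.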
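As a rough sketch of how Adadelta works without a learning rate argument, the update below takes no `lr` at all. The function name `adadelta_step`, the toy objective `f(x) = x**2`, and the values `rho=0.95` and `eps=1e-6` are assumptions for illustration only.

```python
import math

def adadelta_step(theta, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """One Adadelta update for a single scalar parameter (illustrative sketch)."""
    # Running average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad * grad
    # Step size is the ratio RMS(past updates) / RMS(gradients); no learning rate.
    delta = -(math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps)) * grad
    # Running average of squared updates.
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
    return theta + delta, avg_sq_grad, avg_sq_delta

# Assumed toy objective f(x) = x**2, whose gradient is 2*x.
x, avg_sq_grad, avg_sq_delta = 1.0, 0.0, 0.0
for _ in range(200):
    x, avg_sq_grad, avg_sq_delta = adadelta_step(x, 2.0 * x, avg_sq_grad, avg_sq_delta)
print(x)
```

The only hyperparameters left are the decay rate `rho` and the small constant `eps`, which also determines how large the very first steps are.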

How does Adam optimizer differ from RMSProp?

In Adam, we compute the running average of squared gradients, just as we do in RMSProp. In addition, we compute the running average of the gradients themselves. That is, Adam uses estimates of both the first and second moments of the gradients.
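To make the difference concrete, here is a side-by-side sketch of one RMSProp step and one Adam step for a single scalar parameter. The function names, the toy objective `f(x) = x**2`, and the hyperparameter values are assumptions for illustration. The Adam step also applies the usual bias correction to both running averages.

```python
import math

def rmsprop_step(theta, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """RMSProp: running average of squared gradients only (illustrative sketch)."""
    v = beta * v + (1 - beta) * grad * grad
    theta = theta - lr * grad / (math.sqrt(v) + eps)
    return theta, v

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: running averages of gradients (m) and squared gradients (v)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Assumed toy objective f(x) = x**2, whose gradient is 2*x.
x_rms, v_rms = rmsprop_step(1.0, 2.0, 0.0)       # a single RMSProp step
x_adam_first, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1)  # a single Adam step

x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
print(x_rms, x_adam_first, x)
```

In RMSProp the raw gradient is divided by the root of the running average; in Adam the smoothed gradient `m_hat` takes its place, which acts like momentum.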

Why do we need AMSGrad?

Because Adam uses an exponential moving average of squared gradients, it can fail to converge and may settle on a sub-optimal solution instead of the optimal one. This happens because the exponential moving average quickly forgets large gradients that occur infrequently, so we lose the information they carry. To combat this issue, we use AMSGrad, which keeps the maximum of the past second-moment estimates and uses it to scale the update.
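The following sketch illustrates the idea with a made-up gradient sequence: one large, rare gradient followed by many small ones. The function name `amsgrad_step`, the gradient sequence, and the choice `beta2=0.9` (picked so the moving average forgets quickly) are assumptions for the example. Bias correction is omitted here for brevity; some implementations include it.

```python
import math

def amsgrad_step(theta, grad, m, v, v_max, lr=0.001, beta1=0.9, beta2=0.9, eps=1e-8):
    """One AMSGrad update for a single scalar parameter (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    v_max = max(v_max, v)   # never let the second-moment estimate decrease
    theta = theta - lr * m / (math.sqrt(v_max) + eps)
    return theta, m, v, v_max

# Hypothetical gradient stream: one rare large gradient, then many small ones.
grads = [10.0] + [0.1] * 50

theta, m, v, v_max = 0.0, 0.0, 0.0, 0.0
for g in grads:
    theta, m, v, v_max = amsgrad_step(theta, g, m, v, v_max)

# v has decayed and forgotten the large gradient; v_max still remembers it.
print(v, v_max)
```

Plain Adam would divide by the root of the decayed `v`, so its step size would grow again after the large gradient fades from the average. AMSGrad divides by the root of `v_max`, so the effective step size never increases.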
