Gradient Descent Method Interview Questions - Deep Learning - Avatto

What are numerical and analytical gradients?

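Analytical gradients are the gradients we compute exactly from a derived formula, which in a neural network is what backpropagation does. Numerical gradients are approximations of the same gradients, typically obtained with finite differences: we perturb a parameter by a small amount and measure how much the loss changes.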
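As a minimal sketch of the two kinds of gradient, consider a toy loss f(w) = sum(w^2), whose analytical gradient is 2w. The function names (f, analytical_grad, numerical_grad) and the step size eps are illustrative assumptions, not part of the original answer, and the numerical version uses the common central-difference formula.

```python
import numpy as np

def f(w):
    """Toy loss used only for illustration: sum of squares."""
    return np.sum(w ** 2)

def analytical_grad(w):
    """Gradient derived by hand: d/dw sum(w^2) = 2w."""
    return 2 * w

def numerical_grad(func, w, eps=1e-5):
    """Central-difference approximation of the gradient of func at a 1-D array w."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[i] = eps  # perturb one coordinate at a time
        grad[i] = (func(w + step) - func(w - step)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0, 3.0])
print("analytical:", analytical_grad(w))
print("numerical: ", numerical_grad(f, w))
```

The two printed vectors should agree closely but not necessarily bit for bit, because the numerical version is subject to rounding and truncation error.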

Explain gradient checking.

In gradient checking, we first compute both the analytical gradients and their numerical approximations, and then compare the two. If they disagree noticeably, there is most likely a bug in our implementation, usually in the code that computes the analytical gradients.

We don’t have to check whether the analytical and numerical gradients are exactly the same, since the numerical gradient is only an approximation. Instead, we compute the difference between them, commonly as a relative error, and if that difference is very small, say on the order of 1e-7, the implementation is most likely correct; otherwise it is probably buggy.
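The check described above can be sketched as follows for a small linear model with a mean squared error loss. This is only an illustration: the loss, the deliberately broken grad_buggy function, the relative-error formula, and the thresholds are all assumptions chosen for the example.

```python
import numpy as np

def loss(w, X, y):
    """Half mean squared error of a linear model."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

def grad_correct(w, X, y):
    """Analytical gradient of the loss above."""
    return X.T @ (X @ w - y) / len(y)

def grad_buggy(w, X, y):
    """Deliberately wrong gradient: forgets to divide by the number of samples."""
    return X.T @ (X @ w - y)

def numerical_grad(func, w, eps=1e-5):
    """Central-difference approximation of the gradient of func at w."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[i] = eps
        grad[i] = (func(w + step) - func(w - step)) / (2 * eps)
    return grad

def relative_error(a, n):
    """Norm of the difference, scaled by the sizes of both gradients.
    Assumes the two gradients are not both zero."""
    return np.linalg.norm(a - n) / (np.linalg.norm(a) + np.linalg.norm(n))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

num = numerical_grad(lambda v: loss(v, X, y), w)
err_ok = relative_error(grad_correct(w, X, y), num)
err_bad = relative_error(grad_buggy(w, X, y), num)

print("correct gradient passes:", err_ok < 1e-7)
print("buggy gradient passes:  ", err_bad < 1e-7)
```

Using a relative rather than an absolute difference keeps the threshold meaningful when the gradients themselves are very large or very small.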

What is the difference between convex and non-convex functions?

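A function is called convex when the line segment joining any two points on its graph never falls below the graph. As a consequence, every local minimum of a convex function is also a global minimum, so it has only one minimum value. A function that does not satisfy this property is called non-convex, and it can have more than one local minimum.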
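To make the distinction concrete, the sketch below tests the convexity inequality f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) on a grid of sample points. The helper name violates_convexity and the two example functions are assumptions chosen for illustration. A sampled test like this can only show that a function is not convex; it cannot prove convexity.

```python
import numpy as np

def violates_convexity(f, points, tol=1e-12):
    """Return True if some sampled pair (a, b) and weight t breaks
    f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b)."""
    ts = np.linspace(0.0, 1.0, 11)
    for a in points:
        for b in points:
            for t in ts:
                chord = t * f(a) + (1 - t) * f(b)
                if f(t * a + (1 - t) * b) > chord + tol:
                    return True
    return False

def convex_fn(x):
    """x^2: a single minimum at x = 0."""
    return x ** 2

def nonconvex_fn(x):
    """x^4 - 3x^2: two separate minima with a bump between them."""
    return x ** 4 - 3 * x ** 2

points = np.linspace(-2.0, 2.0, 21)
print("x^2 violates the inequality:        ", violates_convexity(convex_fn, points))
print("x^4 - 3x^2 violates the inequality: ", violates_convexity(nonconvex_fn, points))
```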

Why do we need stochastic gradient descent?

With gradient descent, we update the parameters of the model only after iterating through all the data points in our training set. Let’s say we have 10 million data points. Even to perform a single parameter update, we have to iterate through all 10 million data points before we can update the parameters of the network. This is very time-consuming and makes training slow. To overcome this drawback of gradient descent, we can use stochastic gradient descent.
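The cost described above can be seen in a minimal sketch of full-batch gradient descent on a small synthetic linear regression problem. The data, learning rate, and number of epochs are assumptions picked for the example; the point is that each parameter update needs a pass over the entire training set.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # 200 synthetic training points
true_w = np.array([2.0, -3.0])     # weights used to generate the targets
y = X @ true_w                     # noise-free targets, for easy checking

w = np.zeros(2)
lr = 0.1
num_updates = 0
for epoch in range(200):
    # The gradient is averaged over ALL training points before a single update.
    grad = X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    num_updates += 1

print("learned weights:", w)
print("updates performed:", num_updates, "(one per full pass over the data)")
```

With 10 million points instead of 200, each of those updates would require touching all 10 million rows.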

How does stochastic gradient descent work?

With stochastic gradient descent, we don’t wait until we have iterated through all the data points in our training set before updating the parameters of the network. Instead, we update the parameters after every single data point in the training set.
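A minimal sketch of this per-point update, again on a small synthetic linear regression problem, might look like the following. The data, learning rate, number of epochs, and the choice to shuffle the points each epoch are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # 200 synthetic training points
true_w = np.array([2.0, -3.0])     # weights used to generate the targets
y = X @ true_w                     # noise-free targets, for easy checking

w = np.zeros(2)
lr = 0.05
num_updates = 0
for epoch in range(20):
    # Visit the points in a fresh random order each epoch.
    for i in rng.permutation(len(y)):
        error = X[i] @ w - y[i]    # prediction error on ONE data point
        grad = error * X[i]        # gradient of 0.5 * error**2 w.r.t. w
        w -= lr * grad             # update immediately, without waiting for the rest
        num_updates += 1

print("learned weights:", w)
print("updates performed:", num_updates)
```

Here the model receives 200 parameter updates per pass over the data, whereas updating only after a full pass would give it one.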
