Gradient descent is one of the most popular and widely used optimization algorithms for training neural networks. It is a first-order optimization method because it relies only on the first-order derivative of the loss function.
To train the network with gradient descent, we first compute the derivative of the loss function with respect to each weight of the network, and then update the weights using the following rule: w = w − α · dL/dw, where w is a weight, L is the loss function, and α is the learning rate.
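The update rule can be sketched on a toy one-dimensional loss. Here the loss L(w) = (w − 3)², its gradient 2(w − 3), and all variable names are illustrative choices, not from the text:

```python
def grad(w):
    # First-order derivative of the toy loss L(w) = (w - 3)**2
    return 2 * (w - 3)

w = 0.0        # initial weight (arbitrary starting point)
alpha = 0.1    # learning rate

for _ in range(100):
    # Gradient descent update rule: w <- w - alpha * dL/dw
    w = w - alpha * grad(w)

print(w)  # ends up very close to the minimum at w = 3
```

The same single line of arithmetic is what a deep learning framework applies to every weight in the network, with the gradients supplied by backpropagation.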
When the learning rate is small, we take very small steps, which slows down convergence; when the learning rate is large, we take very large steps and may overshoot the minimum entirely.
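This trade-off can be seen on the same toy loss L(w) = (w − 3)², a hypothetical example chosen for illustration. A tiny learning rate leaves the weight far from the minimum after a fixed budget of steps, while an overly large one makes the iterates oscillate and blow up:

```python
def run(alpha, steps=50):
    # Run gradient descent on L(w) = (w - 3)**2 from w = 0
    w = 0.0
    for _ in range(steps):
        w = w - alpha * 2 * (w - 3)  # update with gradient 2 * (w - 3)
    return w

print(run(0.01))  # too small: converges slowly, still well short of 3
print(run(0.3))   # moderate: lands very close to the minimum at 3
print(run(1.05))  # too large: each step overshoots, and w diverges
```

For this quadratic loss, any α above 1.0 makes each step overshoot by more than the current error, so the iterates grow without bound rather than converge.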