The convolutional neural network uses the following three important layers:

Convolutional layer

Pooling layer

Fully connected layer

Convolutional layer

Pooling layer

Fully connected layer

We take the input matrix and one more matrix called the filter matrix. We slide the filter matrix over the input matrix by n number of pixels, perform element-wise multiplication, sum up the results, and produce a single number and this operation is known as convolution.

The matrix obtained as a result of convolution operation is often called activation maps or feature maps.

In the convolution operation, we take the filter matrix and slide it over the input matrix by n number of pixels, perform element-wise multiplication, sum the result, and produce a single number. The number of pixels we choose to slide the filter matrix over the input matrix is often called the stride.

If we set stride to a high value, then it takes us less time to compute but we might miss out on some important feature from the image. If we set stride to a low value, then we can learn the more detailed representation fo the image but it will take us a lot of time to compute.