CNN Interview Questions | Convolutional Neural Network Interview Questions - Avatto

Explain the architecture of the VGG net.

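The VGG net is built from stacks of convolutional layers. Each stack is followed by a pooling layer, and the convolutional part is followed by fully connected layers and a softmax output. VGG net uses only 3 x 3 convolutions (stride 1, padding 1) and 2 x 2 max pooling (stride 2). Variants are referred to as VGG-n, where n is the number of weight layers (convolutional and fully connected), excluding the pooling and softmax layers. For example, VGG-16 has 13 convolutional layers and 3 fully connected layers.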
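To make the VGG-n naming concrete, here is a minimal sketch that walks the standard VGG-16 layer configuration. It counts the weight layers, tracks the feature-map size through the pooling layers, and totals the parameters, assuming a 224 x 224 x 3 input and 1000 output classes. The config format (numbers for conv output channels, "M" for max pooling) and the `summarize_vgg` helper are illustrative, not taken from any library.

```python
# Hypothetical config format: each number is the output channel count of a
# 3 x 3 conv layer (stride 1, padding 1); "M" is 2 x 2 max pooling, stride 2.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # fully connected layers; the last feeds softmax


def summarize_vgg(conv_cfg, fc_cfg, in_channels=3, input_size=224):
    """Return (weight_layers, final_spatial_size, final_channels, total_params)."""
    channels, size = in_channels, input_size
    weight_layers, params = 0, 0
    for layer in conv_cfg:
        if layer == "M":
            size //= 2  # pooling halves height and width, has no weights
        else:
            params += 3 * 3 * channels * layer + layer  # weights + biases
            channels = layer  # padding 1 keeps the spatial size unchanged
            weight_layers += 1
    features = size * size * channels  # flattened input to the first FC layer
    for units in fc_cfg:
        params += features * units + units
        features = units
        weight_layers += 1
    return weight_layers, size, channels, params


layers, size, channels, params = summarize_vgg(VGG16_CONV, VGG16_FC)
print(f"VGG-{layers}: final feature map {size}x{size}x{channels}, {params:,} parameters")
# prints "VGG-16: final feature map 7x7x512, 138,357,544 parameters"
```

Only convolutional and fully connected layers increment the count, which is why the pooling and softmax layers do not contribute to the "16".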

Why do we use multiple filters of varying sizes in the inception network?

Suppose we are performing an object detection task. The object can appear anywhere in the image: in the center, or in a small corner. Its size also varies from image to image: in some images the object occupies a large region, while in others it occupies only a small one. Because the object varies so much in size and location, a single filter of fixed size struggles to identify it: a large filter suits objects spread over a large area, while a small filter suits objects confined to a small area. So the inception network applies multiple filters of varying sizes in parallel to the same input and combines their outputs.
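Filters of different sizes can only be combined if their outputs line up spatially. The following is a minimal shape-level sketch, assuming "same" padding of (K - 1) / 2 for an odd kernel size K. It applies 1 x 1, 3 x 3, and 5 x 5 filters side by side to one input and concatenates the results along depth. The 28 x 28 input and the filter counts (64, 128, 32) are hypothetical values chosen for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # Standard formula: floor((W - K + 2P) / S) + 1
    return (size - kernel + 2 * padding) // stride + 1


def parallel_branches(height, width, branches):
    """Apply filters of several sizes side by side on the same input.

    branches: list of (kernel_size, num_filters) pairs (hypothetical values).
    Returns the per-branch output shapes and the concatenated output shape.
    """
    shapes = []
    for kernel, filters in branches:
        pad = (kernel - 1) // 2  # "same" padding for odd kernel sizes
        shapes.append((conv_output_size(height, kernel, padding=pad),
                       conv_output_size(width, kernel, padding=pad),
                       filters))
    # Concatenating along depth only works if every branch has the same H x W.
    if len({(h, w) for h, w, _ in shapes}) != 1:
        raise ValueError("branch outputs must share height and width")
    h, w, _ = shapes[0]
    return shapes, (h, w, sum(f for _, _, f in shapes))


# Small (1 x 1), medium (3 x 3), and large (5 x 5) filters see the same input.
shapes, merged = parallel_branches(28, 28, [(1, 64), (3, 128), (5, 32)])
print(shapes)  # [(28, 28, 64), (28, 28, 128), (28, 28, 32)]
print(merged)  # (28, 28, 224)
```

Each filter size looks at a differently sized neighborhood of every pixel, and the concatenated output carries all three views forward, so later layers can use whichever scale matches the object.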

How are the inception blocks placed?

The inception network (GoogLeNet) contains nine inception blocks, stacked one above the other. The input image first passes through a few ordinary convolutional and pooling layers. Within each inception block, the block's input is convolved in parallel with filters of three different sizes (1 x 1, 3 x 3, and 5 x 5), alongside a 3 x 3 max pooling branch, and the outputs of all branches are concatenated along the depth dimension. This concatenated output is then fed to the next inception block.
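The stacking can be traced with a short sketch that feeds each block's concatenated output depth into the next block. The per-branch output widths below (1 x 1, 3 x 3, 5 x 5, and the 1 x 1 projection after the pooling branch) follow the architecture table of the GoogLeNet (Inception v1) paper. The sketch simplifies the max pooling layers between block groups to a plain halving of the spatial size. The `trace_inception_stack` helper is hypothetical.

```python
# Output channels per branch: (1x1, 3x3, 5x5, pool projection).
# The 1x1 "reduce" layers inside the 3x3 and 5x5 branches only change the
# intermediate depth, so they do not affect a block's output depth.
INCEPTION_BLOCKS = [
    ("3a", (64, 128, 32, 32)),
    ("3b", (128, 192, 96, 64)),
    ("4a", (192, 208, 48, 64)),
    ("4b", (160, 224, 64, 64)),
    ("4c", (128, 256, 64, 64)),
    ("4d", (112, 288, 64, 64)),
    ("4e", (256, 320, 128, 128)),
    ("5a", (256, 320, 128, 128)),
    ("5b", (384, 384, 128, 128)),
]
POOL_AFTER = {"3b", "4e"}  # stride-2 max pooling between block groups


def trace_inception_stack(size=28, channels=192):
    """Return (name, spatial_size, in_channels, out_channels) for each block."""
    trace = []
    for name, branch_widths in INCEPTION_BLOCKS:
        in_channels = channels
        channels = sum(branch_widths)  # branches are concatenated along depth
        trace.append((name, size, in_channels, channels))
        if name in POOL_AFTER:
            size //= 2  # simplified: pooling halves height and width
    return trace


for name, size, c_in, c_out in trace_inception_stack():
    print(f"inception {name}: {size}x{size}x{c_in} -> {size}x{size}x{c_out}")
```

The last block produces a 7 x 7 x 1024 output, which GoogLeNet then average-pools before the classifier.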

Why is 1 x 1 convolution useful?

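A 1 x 1 convolution uses filters of size 1 x 1, each spanning the full depth of its input. Since the number of output channels equals the number of filters, using fewer filters than input channels reduces the number of depth channels while leaving height and width unchanged. It is widely used for this reduction, for example to cut the cost of the 3 x 3 and 5 x 5 convolutions in an inception block.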
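A 1 x 1 convolution multiplies every pixel's channel vector by the same weight matrix. The minimal pure-Python sketch below shows that, and then counts multiplications to show why reducing depth first pays off. The `conv1x1` and `conv_multiplications` helpers are hypothetical. The layer sizes in the cost comparison (a 28 x 28 x 192 input, a reduction to 16 channels, 32 output channels for the 5 x 5 layer) are illustrative assumptions.

```python
def conv1x1(x, weights, biases=None):
    """Apply a 1 x 1 convolution.

    x: input of shape H x W x C_in (nested lists)
    weights: C_out x C_in, one row per 1 x 1 filter
    Returns H x W x C_out: height and width are unchanged, only depth changes.
    """
    c_out = len(weights)
    biases = biases if biases is not None else [0.0] * c_out
    return [
        [
            [sum(w * v for w, v in zip(weights[k], pixel)) + biases[k]
             for k in range(c_out)]
            for pixel in row
        ]
        for row in x
    ]


def conv_multiplications(h, w, kernel, c_in, c_out):
    # Stride-1 "same" convolution: one K x K x C_in dot product
    # per output position and per output channel.
    return h * w * c_out * kernel * kernel * c_in


# A 2 x 2 input with 4 channels, reduced to 2 channels by two 1 x 1 filters.
x = [[[1, 2, 3, 4], [0, 1, 0, 1]],
     [[2, 2, 2, 2], [1, 0, 0, 0]]]
weights = [[1, 1, 1, 1],    # filter 0: sums all channels
           [1, 0, -1, 0]]   # filter 1: channel 0 minus channel 2
y = conv1x1(x, weights)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2

direct = conv_multiplications(28, 28, 5, 192, 32)
reduced = (conv_multiplications(28, 28, 1, 192, 16)
           + conv_multiplications(28, 28, 5, 16, 32))
print(f"5x5 directly: {direct:,} multiplications")
print(f"1x1 down to 16 channels, then 5x5: {reduced:,} multiplications")
```

With these assumed sizes the 1 x 1 reduction cuts the work from about 120 million to about 12.4 million multiplications, roughly a tenfold saving.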

What is factorized convolution?

Factorized convolution breaks down a convolutional layer with a large filter into a stack of convolutional layers with smaller filters.
For example, a convolutional layer with a 5 x 5 filter can be broken down into two convolutional layers with 3 x 3 filters. The stack covers the same 5 x 5 region of the input but uses fewer parameters: 18 weights (two 3 x 3 filters) instead of 25 per pair of input and output channels.
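The two claims above (same receptive field, fewer parameters) can be checked with a small sketch. It assumes stride-1 convolutions and a channel count that stays fixed through the stack. The helper names and the value of 64 channels are illustrative assumptions, and biases are ignored.

```python
def receptive_field(kernel_sizes):
    # For stacked stride-1 convolutions, each K x K layer grows the
    # receptive field by K - 1 pixels.
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


def conv_params(kernel_sizes, channels):
    # Weights (biases ignored) for a stack of K x K convolutions that keep
    # the number of channels fixed at `channels`.
    return sum(k * k * channels * channels for k in kernel_sizes)


C = 64  # illustrative channel count (assumption)
print(receptive_field([5]), receptive_field([3, 3]))   # 5 5
print(conv_params([5], C), conv_params([3, 3], C))     # 102400 73728
```

The factorized stack needs 28% fewer weights for the same 5 x 5 view. With a nonlinearity such as ReLU between the two 3 x 3 layers, it is also more expressive than the single 5 x 5 layer.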
