We feed the input image to the encoder, which consists of a convolutional layer. The convolutional layer performs the convolution operation and extracts important features from the image.
Next, we apply the max pooling operation to retain only the important features of the image and obtain a latent representation of the image, called a bottleneck.
We then feed the bottleneck to the decoder as input. The decoder consists of deconvolutional layers; it performs the deconvolution operation and tries to reconstruct the image from the bottleneck.
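To make the three steps above concrete, here is a minimal sketch of a single forward pass in plain Python. It is only an illustration under several assumptions: the 6x6 image, the averaging encoder kernel, the constant decoder kernel, and the function names `conv2d`, `max_pool`, and `conv2d_transpose` are all made up for this example, and no training takes place, so the reconstruction is not expected to resemble the input.

```python
# Forward pass of a tiny convolutional autoencoder (untrained, illustrative only).

def conv2d(image, kernel):
    """Valid 2D convolution (computed as cross-correlation, with no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def conv2d_transpose(latent, kernel, stride=1):
    """Transposed convolution: each input value adds a scaled copy of the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(latent) - 1) * stride + kh
    out_w = (len(latent[0]) - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(latent):
        for j, value in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += value * kernel[a][b]
    return out

# Assumed toy data: a 6x6 "image" and fixed (untrained) kernels.
image = [[float((i + j) % 4) for j in range(6)] for i in range(6)]
encoder_kernel = [[1 / 9] * 3 for _ in range(3)]
decoder_kernel = [[0.25] * 4 for _ in range(4)]

features = conv2d(image, encoder_kernel)      # encoder convolution: 6x6 -> 4x4
bottleneck = max_pool(features)               # max pooling:         4x4 -> 2x2
reconstruction = conv2d_transpose(bottleneck, decoder_kernel, stride=2)  # 2x2 -> 6x6

print(len(features), len(bottleneck), len(reconstruction))
```

In a real model the kernels would be learned by minimizing the difference between `reconstruction` and `image`; the sketch only shows how the shapes shrink to the bottleneck and grow back.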
When the code, or latent representation, has a higher dimension than the input, the autoencoder is called an overcomplete autoencoder. Conversely, when the code, or latent representation, has a lower dimension than the input, the autoencoder is called an undercomplete autoencoder.
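The distinction comes down to comparing two numbers, which the following small sketch spells out. The function name `autoencoder_kind` and the example dimensions are hypothetical, and the "neither" result for equal dimensions is an assumption of this sketch, since the definition above does not cover that case.

```python
def autoencoder_kind(input_dim: int, latent_dim: int) -> str:
    """Classify an autoencoder by comparing its code size with its input size."""
    if input_dim <= 0 or latent_dim <= 0:
        raise ValueError("dimensions must be positive")
    if latent_dim > input_dim:
        return "overcomplete"
    if latent_dim < input_dim:
        return "undercomplete"
    # Equal dimensions are not covered by the definition above (assumed label).
    return "neither"

print(autoencoder_kind(784, 32))    # code smaller than the input
print(autoencoder_kind(784, 1024))  # code larger than the input
```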
The encoder takes the given input and outputs a low-dimensional latent representation of it. The decoder takes this low-dimensional latent representation as input and tries to reconstruct the original input.
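The encoder and decoder roles described above can be sketched with two untrained linear layers. Everything specific here is an assumption for illustration: the dimensions 8 and 3, the random weights, the absence of biases and activation functions, and the use of mean squared error to measure how far the reconstruction is from the input.

```python
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

INPUT_DIM = 8   # assumed size of the input vector
LATENT_DIM = 3  # assumed size of the latent representation (smaller than the input)

def random_matrix(rows, cols):
    """Matrix of small random weights (stand-in for learned parameters)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def linear(weights, vector):
    """Matrix-vector product: one fully connected layer without a bias."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

encoder_weights = random_matrix(LATENT_DIM, INPUT_DIM)
decoder_weights = random_matrix(INPUT_DIM, LATENT_DIM)

def encode(x):
    """Encoder: input -> low-dimensional latent representation."""
    return linear(encoder_weights, x)

def decode(z):
    """Decoder: latent representation -> attempted reconstruction of the input."""
    return linear(decoder_weights, z)

def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [random.random() for _ in range(INPUT_DIM)]
z = encode(x)
x_hat = decode(z)

print(len(x), len(z), len(x_hat))
print(reconstruction_error(x, x_hat))
```

Because the weights are random, the error is large; training would adjust both weight matrices to reduce it.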
Autoencoders map high-dimensional data to a lower-dimensional representation. This representation is called the latent representation, or bottleneck. The bottleneck comprises only the meaningful and important features that represent the input.