
Adjusting network size
When we speak of a network's size, we simply mean the number of trainable parameters within the network. These parameters are determined by the number of layers in the network, as well as the number of neurons in each layer. Essentially, a network's size is a measure of its complexity. We mentioned how having too large a network can be counterproductive and lead to overfitting. An intuitive way to think about this is that we should favor simpler representations over complex ones, as long as they achieve the same ends, a sort of lex parsimoniae, if you will. The engineers who design such learning systems are indeed deep thinkers. The intuition here is that you could probably have various representations of your data, depending on your network's depth and the number of neurons per layer, but we favor simpler configurations and only scale a network up progressively when required, to prevent it from using any extra learning capacity to memorize randomness. However, giving our model too few parameters may well cause it to underfit, leaving it oblivious to the underlying trends we are trying to capture in our data. Through experimentation, you can find a network size that fits just right for your use case. By forcing our network to represent our data efficiently, we allow it to generalize better beyond our training data.

Below, we show a few experiments performed while varying the size of the network. This lets us compare how the loss on the validation set evolves per epoch. As we will see, larger models diverge from their minimum validation loss more quickly, and they start to overfit on our training data almost instantly:
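To make this kind of comparison concrete, here is a minimal sketch of how such an experiment might be set up, assuming a Keras-style workflow. The synthetic data, the choice of 4 versus 512 hidden units, and the training settings are illustrative assumptions, not the exact configuration behind the experiments shown here:

```python
# A minimal sketch: train a smaller and a larger network under identical
# conditions and compare their validation loss per epoch.
import numpy as np
from tensorflow.keras import layers, models

# Stand-in data: 1,000 samples with 20 features each (hypothetical).
x_train = np.random.random((1000, 20))
y_train = np.random.randint(0, 2, size=(1000, 1))

def build_model(hidden_units):
    # The network's size is controlled here by the width of its hidden layers.
    model = models.Sequential([
        layers.Input(shape=(20,)),
        layers.Dense(hidden_units, activation='relu'),
        layers.Dense(hidden_units, activation='relu'),
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Train both models, holding out 20% of the data for validation.
histories = {}
for units in (4, 512):
    model = build_model(units)
    history = model.fit(x_train, y_train,
                        epochs=20, batch_size=64,
                        validation_split=0.2, verbose=0)
    histories[units] = history.history['val_loss']

# Compare validation loss per epoch: the larger model tends to hit its
# minimum validation loss sooner and then climbs away from it as it overfits.
for units, val_loss in histories.items():
    print(f"{units:>3} hidden units -> val_loss per epoch: "
          f"{[round(v, 3) for v in val_loss]}")
```

Plotting the two validation-loss curves against the epoch number makes the effect easy to see: the wider network's curve typically bottoms out earlier and rises afterwards, while the smaller network's curve stays flatter for longer.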
