
Representational learning
As we saw earlier with our perceptron experiments on the TensorFlow Playground, sets of artificial neurons are capable of learning fairly simple patterns. This is nothing remotely close to the sort of complex representations that we humans build and reason with. However, even in their nascent simplicity, these networks can adapt to the data we provide them with, at times even outperforming other statistical predictive models. So, what is going on here that is so different from previous approaches to teaching machines to do things for us?
It can be very useful to teach a computer what skin cancer looks like by showing it the many medically relevant features we may have on hand. Indeed, this has been our approach with machines thus far: we would hand-engineer features so that our machines could easily digest them and generate relevant predictions. But why stop there? Why not just show the computer what skin cancer actually looks like? Why not show it millions of images and let it figure out what is relevant? Indeed, that is exactly what we try to do when we speak of deep learning. As opposed to traditional Machine Learning (ML) algorithms, where we feed the machine an explicitly engineered representation of the data, we take a different approach with neural networks: what we actually wish to achieve is for the network to learn these representations on its own.
As shown in the following diagram, a network achieves this by learning simple representations and using them to compose progressively more complex representations in successive layers, until the final layer learns to represent the output classes accurately:

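To make this layering concrete, here is a minimal NumPy sketch of a forward pass through a stack of layers. The weights, sizes, and the "edges, then shapes" interpretation are purely illustrative assumptions (in a trained network, these representations emerge from learning, and the intermediate features are rarely this tidy):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy input: one "image" flattened to 64 raw pixel values.
x = rng.standard_normal(64)

# Layer 1: would learn simple features (think: edges) from raw pixels.
W1 = rng.standard_normal((32, 64)) * 0.1
h1 = np.maximum(0, W1 @ x)  # ReLU activation

# Layer 2: combines simple features into more complex ones (think: shapes).
W2 = rng.standard_normal((16, 32)) * 0.1
h2 = np.maximum(0, W2 @ h1)

# Final layer: maps the most abstract representation to class scores.
W3 = rng.standard_normal((3, 16)) * 0.1
scores = W3 @ h2

print(h1.shape, h2.shape, scores.shape)  # (32,) (16,) (3,)
```

Each layer's output is just a re-representation of the previous layer's output, which is what lets depth build complexity out of simplicity.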
This approach, as it turns out, can be quite useful for teaching your computer to detect complex movement patterns and facial expressions, just as we humans do. Say you want it to accept packages on your behalf while you are away, or to detect potential robbers trying to break into your house. Similarly, what if we wanted our computer to schedule our appointments, find potentially lucrative stocks on the market, and keep us up to date on the topics we find interesting? Doing so involves processing complex image, video, audio, text, and time-series data, all of which come in high-dimensional representations and cannot be modeled by just a few neurons. So, how do we scale up the kind of neural learning system we saw in the last chapter? How do we make neural networks learn the complex, hierarchical patterns in eyes, faces, and other real-world objects? Well, the obvious answer is that we make them bigger. But, as we will see, this brings complexities of its own. Long story short, the more learnable parameters you put in a network, the higher the chance that it will memorize random patterns in the training data, and hence fail to generalize. Ideally, you want a configuration of neurons that perfectly fits the learning task at hand, but this is almost impossible to determine a priori without performing experiments.
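To see how quickly "making them bigger" adds learnable parameters, here is a small counting sketch for fully connected layers (each layer has one weight per input-output pair, plus one bias per output). The layer sizes are hypothetical, chosen only to illustrate the growth:

```python
def dense_params(layer_sizes):
    """Count learnable parameters (weights + biases) in a fully connected stack."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A modest network versus a wider, deeper one on the same 784-pixel input:
small = dense_params([784, 16, 10])        # 785*16 + 17*10 = 12,730
big = dense_params([784, 512, 512, 10])    # 669,706

print(small, big)
```

Widening the hidden layers from 16 to 512 units and adding one layer multiplies the parameter count by more than fifty, which is exactly why larger networks can memorize noise if the data does not constrain them enough.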