
Keras activations
At the moment, our network is composed of a flattened input layer followed by a sequence of dense layers, which are fully connected layers of neurons. The first two dense layers employ a Rectified Linear Unit (ReLU) activation function, which plots out a bit differently than the sigmoid we saw in Chapter 2, A Deeper Dive into Neural Networks. In the following diagram, you can see how some of the different activation functions provided by Keras plot out. Remember, picking between them requires an intuitive understanding of the possible decision boundaries that may help with, or hinder, the partitioning of your feature space. Using the appropriate activation function in conjunction with well-initialized biases can be of paramount importance in some scenarios, but trivial in others. It is always advisable to experiment, leaving no stone unturned:

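If you would like to reproduce such curves yourself, a minimal sketch along the following lines will do; it assumes a TensorFlow-backed setup (tf.keras) and matplotlib, and the particular activations chosen here are illustrative rather than an exhaustive list of what Keras provides:

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Evaluate a handful of Keras's built-in activations over a range of inputs
# and plot them side by side for comparison.
x = np.linspace(-5.0, 5.0, 200).astype("float32")
activations = {
    "relu": tf.keras.activations.relu,
    "sigmoid": tf.keras.activations.sigmoid,
    "tanh": tf.keras.activations.tanh,
    "softplus": tf.keras.activations.softplus,
}

for name, fn in activations.items():
    plt.plot(x, fn(x).numpy(), label=name)

plt.title("A few Keras activation functions")
plt.xlabel("input")
plt.ylabel("activation output")
plt.legend()
plt.show()
```

Plotting them this way makes the qualitative differences obvious: the sigmoid saturates on both sides, while ReLU is zero for negative inputs and grows without bound for positive ones.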
The fourth (and last) layer in our model is a 10-way Softmax layer. In our case, this means it will return an array of ten probability scores that sum to 1. Each score is the probability that the current digit image belongs to one of our output classes. Hence, for any given input, a layer with the Softmax activation computes and returns that input's probability of belonging to each of our output classes.
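To make this concrete, here is a minimal sketch of such a model; the hidden-layer widths (128 and 64 units) and the 28 x 28 input shape are illustrative assumptions rather than values taken from this chapter, but the final 10-way Softmax layer behaves exactly as described, returning ten probabilities that sum to 1:

```python
import numpy as np
import tensorflow as tf

# Hypothetical model matching the structure described above:
# a flattened 28x28 input, two ReLU dense layers, and a 10-way Softmax output.
# The layer widths (128 and 64) are illustrative choices.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Even before training, the Softmax output is a valid probability distribution:
# ten scores per input, each between 0 and 1, summing to 1.
dummy_digit = np.random.rand(1, 28, 28).astype("float32")
probabilities = model.predict(dummy_digit)
print(probabilities.shape)        # (1, 10)
print(probabilities.sum(axis=1))  # approximately 1.0
```

Of course, the probabilities produced by an untrained network are essentially arbitrary; they only become meaningful class predictions once the weights have been fit to the training data.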