State-action value function (Q function)
A state-action value function, also called the Q function, specifies how good it is for an agent to perform a particular action in a state while following a policy π. It is denoted by Q(s, a): the value of taking action a in state s and then following policy π.
We can define the Q function as follows:
![](https://epubservercos.yuewen.com/F4348E/19470379901496006/epubprivate/OEBPS/Images/36a6e230-3432-4477-810c-5dca66d8669d.png?sign=1739291415-5jKaBGEc4Z7vQPAd3dAsRcuwXPU9hUqu-0-aa79718d839590d425c6e64b6858f3aa)
This specifies the expected return when the agent starts in state s, takes action a, and then follows policy π. Substituting the value of R_t from (2) into the Q function gives:
![](https://epubservercos.yuewen.com/F4348E/19470379901496006/epubprivate/OEBPS/Images/0424410a-24d5-4fc3-859d-f201929b41c3.png?sign=1739291415-I1H0PDJhqwlAhf7lM5cmaCDIfSCy28Hb-0-cb6c09c32a04401aadd6a0a99463aa9d)
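For reference, the two definitions can be written out as follows. This is a sketch under two assumptions: that (2) defines the return R_t as the discounted sum of future rewards with discount factor γ, and that the conditioning notation s_t = s, a_t = a follows the usual convention, which the text does not state explicitly.

$$
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s,\ a_t = a \right]
$$

$$
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right]
$$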
The difference between the value function and the Q function is that the value function specifies the goodness of a state, while a Q function specifies the goodness of an action in a state.
Like the state value function, the Q function can be represented as a table, called a Q table. Suppose we have two states and two actions; the Q table then looks like the following:
![](https://epubservercos.yuewen.com/F4348E/19470379901496006/epubprivate/OEBPS/Images/3.jpg?sign=1739291415-D2lO9u5BHUjssyHwaM8eUDcRZcRk5Vuo-0-9d97c46fe8ab2f2a12445de1daa2ad7e)
Thus, the Q table shows the value of every possible state-action pair. Looking at this table, we can conclude that performing action 1 in state 1 and action 2 in state 2 is the better option, because those pairs have the highest values in their respective states.
Whenever we say value function V(s) or Q function Q(s, a), we are actually referring to the value table and the Q table, as shown previously.
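Reading the best action off a Q table can be sketched in a few lines of Python. The numbers below are hypothetical and are chosen only so that they agree with the conclusion above; the names `q_table` and `best_action` are illustrative and do not come from any library.

```python
# Hypothetical Q table for two states and two actions.
# Keys are (state, action) pairs; the values are made-up numbers for illustration.
q_table = {
    ("state 1", "action 1"): 0.9,
    ("state 1", "action 2"): 0.2,
    ("state 2", "action 1"): 0.1,
    ("state 2", "action 2"): 0.8,
}


def best_action(q_table, state):
    """Return the action with the highest Q value in the given state."""
    # Collect the Q values of every action available in this state.
    candidates = {a: q for (s, a), q in q_table.items() if s == state}
    # Pick the action whose Q value is largest.
    return max(candidates, key=candidates.get)


for state in ("state 1", "state 2"):
    print(state, "->", best_action(q_table, state))
```

With these assumed values, the lookup selects action 1 in state 1 and action 2 in state 2, which is the same reading we made of the table by eye.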