# Markov decision process

At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$. The process responds at the next time step by randomly moving into a new state $s'$, and giving the decision maker a corresponding reward $R_a(s, s')$.
The probability that the process moves into its new state $s'$ is influenced by the chosen action. Specifically, it is given by the state transition function $P_a(s, s')$. Thus, the next state $s'$ depends on the current state $s$ and the decision maker's action $a$. But given $s$ and $a$, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP satisfy the Markov property.
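The dynamics above can be sketched as a small simulation. The following is a minimal illustration, not a standard API: the two-state MDP, its transition probabilities, and its rewards are all hypothetical, chosen only to show how $P_a(s, s')$ and $R_a(s, s')$ drive one step of the process.

```python
import random

# Hypothetical two-state MDP: states "s0"/"s1", actions "stay"/"go".
# transitions[s][a] lists (s_prime, probability) pairs, i.e. P_a(s, s').
# rewards[(s, a, s_prime)] is R_a(s, s'); unlisted triples yield 0.
transitions = {
    "s0": {"stay": [("s0", 0.9), ("s1", 0.1)],
           "go":   [("s0", 0.2), ("s1", 0.8)]},
    "s1": {"stay": [("s1", 1.0)],
           "go":   [("s0", 0.7), ("s1", 0.3)]},
}
rewards = {("s0", "go", "s1"): 5.0, ("s1", "go", "s0"): 1.0}

def step(s, a, rng=random):
    """Sample s' ~ P_a(s, .) and return (s', R_a(s, s')).

    The distribution over s' depends only on the current state s and the
    chosen action a (the Markov property), not on any earlier history.
    """
    states, probs = zip(*transitions[s][a])
    s_prime = rng.choices(states, weights=probs)[0]
    return s_prime, rewards.get((s, a, s_prime), 0.0)

# One step of the process: the decision maker picks "go" in state "s0".
s_prime, r = step("s0", "go")
```

Note that `step` takes only the current state and action as arguments; this is exactly the conditional independence described above.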