Q-learning
Model-free reinforcement learning algorithm / From Wikipedia, the free encyclopedia
Q-learning is a model-free reinforcement learning algorithm that learns the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.
For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.[1] Given infinite exploration time and a partly random policy, Q-learning can identify an optimal action-selection policy for any given FMDP.[1] "Q" refers to the function that the algorithm computes: the expected reward for an action taken in a given state.[2]
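As a rough illustration of how such a Q function can be learned, the following minimal sketch applies the standard tabular update Q(s, a) ← Q(s, a) + α (r + γ max Q(s′, ·) − Q(s, a)) with an epsilon-greedy (partly random) policy. The five-state chain environment, the `step` and `q_learning` helpers, and the values chosen for `alpha`, `gamma`, and `epsilon` are all assumptions made for this example and are not taken from the text above.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment (an assumption for this sketch).

    Reaching the last state yields reward 1.0 and ends the episode;
    every other transition yields reward 0.0.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # The learning target uses only the sampled transition,
            # so no model of the environment is needed.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy chain the learned greedy policy should move right in every non-terminal state, since that is the shortest path to the only reward. The update never consults transition probabilities, which is what "model-free" means in practice.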