Multi-armed bandit problems (often shortened to ‘bandit problems’) are a well-studied area of reinforcement learning.
In their first instruction-following paper, Dipendra Misra et al. use a “contextual bandit” formulation to train their reinforcement learning agent.
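To make the contextual bandit setting concrete, here is a minimal sketch. It assumes a small, discrete set of contexts and uses a tabular epsilon-greedy learner. This is a generic textbook learner, not the training method from the paper. Each round, the agent observes a context, picks one arm, and receives a reward for that arm only. It keeps a separate running-mean estimate per (context, arm) pair. A plain multi-armed bandit is the special case where every round has the same context. The class name, `toy_reward`, and its payout probabilities are hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy learner: one running-mean value estimate per (context, arm)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # value[context][arm] is the mean reward observed so far for that arm in that context.
        self.value = defaultdict(lambda: [0.0] * n_arms)
        self.count = defaultdict(lambda: [0] * n_arms)

    def select(self, context):
        # Explore uniformly with probability epsilon; otherwise exploit the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        values = self.value[context]
        return max(range(self.n_arms), key=values.__getitem__)

    def update(self, context, arm, reward):
        # Only the chosen arm's reward is observed (bandit feedback).
        # Incremental mean: Q <- Q + (r - Q) / n
        self.count[context][arm] += 1
        n = self.count[context][arm]
        self.value[context][arm] += (reward - self.value[context][arm]) / n


def toy_reward(context, arm, rng):
    # Hypothetical environment: arm 0 is best in context "A", arm 1 is best in context "B".
    best = 0 if context == "A" else 1
    p = 0.8 if arm == best else 0.2
    return 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    env_rng = random.Random(0)
    agent = EpsilonGreedyContextualBandit(n_arms=2, epsilon=0.1, seed=1)
    for _ in range(5000):
        ctx = env_rng.choice(["A", "B"])
        arm = agent.select(ctx)
        agent.update(ctx, arm, toy_reward(ctx, arm, env_rng))
    for ctx in ["A", "B"]:
        print(ctx, [round(v, 2) for v in agent.value[ctx]])
```

In an instruction-following agent, the context would be something richer, such as the instruction together with the current observation. The lookup table would then be replaced by a learned policy. This sketch does not attempt either.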
- briefly remind readers of the definition of a multi-armed bandit
- introduce the concept of a contextual bandit
- talk about the limitations (e.g. the reward shaping term has to be derived from a “potential function” to guarantee that shaping does not change the optimal policy)
- briefly discuss some newer approaches to immediate rewards, such as the “curiosity” paper
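The “potential function” property can be illustrated with a short sketch of potential-based shaping, the form studied by Ng, Harada and Russell (1999). The shaping bonus on each transition is F(s, s') = γΦ(s') − Φ(s). Summed with discounting over an episode, these bonuses telescope to γ^T Φ(s_T) − Φ(s_0). The shaped return therefore differs from the original only by a term that depends on the start and end states. The function names, the toy states, and the potential Φ(s) = s − 3 (a hypothetical "negative distance to goal 3") are assumptions made for illustration.

```python
from typing import Callable, Hashable, List, Sequence

State = Hashable


def shaping_term(phi: Callable[[State], float], s: State, s_next: State, gamma: float) -> float:
    # Potential-based shaping bonus: F(s, s') = gamma * phi(s') - phi(s).
    return gamma * phi(s_next) - phi(s)


def shaped_rewards(states: Sequence[State], rewards: Sequence[float],
                   phi: Callable[[State], float], gamma: float) -> List[float]:
    # states has one more entry than rewards: rewards[t] is earned on the s_t -> s_{t+1} step.
    return [r + shaping_term(phi, states[t], states[t + 1], gamma) for t, r in enumerate(rewards)]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    gamma = 0.9
    phi = lambda s: float(s - 3)  # hypothetical potential; phi(goal=3) == 0
    # Two trajectories from state 0 to the goal; reward 1 only on reaching the goal.
    for states, rewards in [([0, 1, 2, 3], [0, 0, 1]), ([0, 1, 0, 1, 2, 3], [0, 0, 0, 0, 1])]:
        original = discounted_return(rewards, gamma)
        shaped = discounted_return(shaped_rewards(states, rewards, phi, gamma), gamma)
        print(states, round(original, 4), round(shaped, 4), round(shaped - original, 4))
```

Because the terminal potential is 0 here, every trajectory from state 0 gets the same constant offset of −Φ(0) = 3. The dense per-step bonus therefore cannot reorder trajectories or change which policy is optimal. An arbitrary, non-potential bonus offers no such guarantee.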