Contextual Bandits - what are they? (draft)

DRAFT

Multi-armed bandit problems (often simply called ‘bandit problems’) are a well-studied class of problems in reinforcement learning.

Dipendra et al. introduce the concept of a “contextual bandit” in their approach to training a reinforcement learning agent in their first instruction-following paper.
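
As a rough illustration of the general idea (not the method from the paper), here is a minimal sketch of a contextual bandit loop. In a plain multi-armed bandit the agent picks an arm and gets a reward; in the contextual version it first observes a context, and the best arm can differ from one context to the next. In both cases the reward is immediate and is only seen for the arm that was chosen. Everything specific here is an assumption made up for the example: the two contexts "A" and "B", the reward probabilities, and the choice of epsilon-greedy exploration with a per-context table of running means.

```python
import random


def run_contextual_bandit(n_rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit on a hypothetical toy problem."""
    rng = random.Random(seed)
    contexts = ["A", "B"]
    n_arms = 2
    # Hypothetical reward probabilities: the best arm depends on the context.
    reward_prob = {"A": [0.9, 0.1], "B": [0.2, 0.8]}

    # Running mean reward and pull count for every (context, arm) pair.
    value = {c: [0.0] * n_arms for c in contexts}
    count = {c: [0] * n_arms for c in contexts}

    for _ in range(n_rounds):
        context = rng.choice(contexts)  # the agent observes a context first
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate for this context
            arm = max(range(n_arms), key=lambda a: value[context][a])

        # Immediate reward, observed only for the arm that was pulled.
        reward = 1.0 if rng.random() < reward_prob[context][arm] else 0.0

        # Incremental update of the running mean for this (context, arm).
        count[context][arm] += 1
        value[context][arm] += (reward - value[context][arm]) / count[context][arm]

    # Greedy arm per context after training.
    return {c: max(range(n_arms), key=lambda a: value[c][a]) for c in contexts}


greedy_arm = run_contextual_bandit()
print(greedy_arm)
```

Dropping the context (a single shared row of estimates) turns this back into an ordinary multi-armed bandit, which in this toy setup could not learn that the two contexts prefer different arms.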

TODO:

- briefly summarize/remind readers of the definition of a multi-armed bandit
- introduce the concept of a contextual bandit
  - talk about the limitations (i.e. reward shaping function has to have a “potential function” property to prove convergence in training)
- briefly discuss some new approaches to immediate rewards - “curiosity” paper