Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma

Exploration is an integral part of learning dynamics which allows algorithms to search a space of solutions. When many algorithms simultaneously explore, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of $Q$-learners in a famous congestion dilemma, the Braess paradox.

I find ranges of the exploration rate for which $\epsilon$-greedy $Q$-learners show chaotic and oscillatory dynamics that do not converge and that yield better-than-Nash-equilibrium results. I decouple the dynamics endogenous to $Q$-learning from the exogenous exploration rate $\epsilon$, and find that $Q$-learners implicitly coordinate at low exploration rates $\epsilon \in (0, 0.1)$, but that this coordination is disrupted at larger exploration rates $\epsilon > 0.1$. The best implicit coordination leads to a 20% reduction in average travel times, which approaches the social optimum. I discuss how these results may inform multi-agent algorithm design, fit within a cognitive science perspective of cognitive noise during learning, and provide a mechanistic hypothesis for the lack of empirical evidence of the Braess Paradox in traffic systems.
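To make the setting concrete, the following is a minimal sketch of $\epsilon$-greedy $Q$-learners choosing routes in a Braess-style network. It is not the paper's implementation. The network is an assumption: the textbook version with two load-dependent links (cost equal to the fraction of agents using them), two fixed links of cost 1, and a zero-cost shortcut. Under that assumption, the Nash equilibrium (everyone takes the shortcut route) gives an average travel time of 2, and the social optimum (an even split over the two outer routes) gives 1.5. The learners are stateless with no discounting, and the names `travel_times`, `average_time`, `simulate`, and the values of `alpha`, `steps`, and `seed` are all hypothetical choices for illustration.

```python
import random

ROUTES = ("up", "down", "cross")  # "cross" uses the zero-cost shortcut link


def travel_times(counts, n_agents):
    """Per-route travel times in an ASSUMED textbook Braess network.

    Load-dependent links cost the fraction of agents using them,
    fixed links cost 1, and the shortcut costs 0.
    """
    up, down, cross = (counts[r] / n_agents for r in ROUTES)
    first = up + cross   # load on the congestible link leaving the origin
    last = down + cross  # load on the congestible link entering the destination
    return {"up": first + 1.0, "down": 1.0 + last, "cross": first + last}


def average_time(counts, n_agents):
    """Population-average travel time for a given route split."""
    times = travel_times(counts, n_agents)
    return sum(counts[r] * times[r] for r in ROUTES) / n_agents


def simulate(n_agents=100, epsilon=0.05, alpha=0.1, steps=2000, seed=0):
    """Run stateless epsilon-greedy Q-learners (assumption: no discounting).

    Returns the average travel time over the last 200 steps.
    """
    rng = random.Random(seed)
    # One Q-value per route per agent; reward is the negative travel time.
    q = [{r: -rng.random() for r in ROUTES} for _ in range(n_agents)]
    history = []
    for _ in range(steps):
        choices = []
        for qi in q:
            if rng.random() < epsilon:
                choices.append(rng.choice(ROUTES))       # explore
            else:
                choices.append(max(ROUTES, key=qi.get))  # exploit
        counts = {r: choices.count(r) for r in ROUTES}
        times = travel_times(counts, n_agents)
        for qi, r in zip(q, choices):
            # Move the chosen route's Q-value toward the observed reward.
            qi[r] += alpha * (-times[r] - qi[r])
        history.append(average_time(counts, n_agents))
    tail = history[-200:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print("Nash (all cross):", average_time({"up": 0, "down": 0, "cross": 100}, 100))
    print("Social optimum:  ", average_time({"up": 50, "down": 50, "cross": 0}, 100))
    for eps in (0.01, 0.05, 0.2):
        print(f"epsilon={eps}: average travel time {simulate(epsilon=eps):.3f}")
```

In this assumed network the average travel time always lies between 1.5 and 2, so sweeping `epsilon` shows where a run falls between the social optimum and the Nash equilibrium. Whether this toy version reproduces the reported 20% reduction depends on details of the paper's setup that the abstract does not give.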
