Abstract: Markov decision processes (MDPs) are widely used for modeling sequential decision-making problems under uncertainty. We propose an online algorithm for solving a class of average-reward MDPs ...