This letter investigates a two-player coordination game in which the players exhibit heterogeneous levels of bounded rationality. We analyze log-linear learning dynamics in which the probability distribution used to select which agent revises its strategy is fixed but not necessarily uniform. The stationary distribution of the resulting Markov chain on the strategy-profile space is derived in closed form as a function of the rationality levels and the agent selection probabilities. We then show that adjusting the selection probabilities can bias the stationary distribution toward the potential-maximizing state. However, this bias comes at the cost of a reduced convergence rate, whereas uniform selection probabilities uniquely maximize the convergence speed irrespective of the players’ rationality levels. A Pareto-optimal selection-probability rule is proposed that trades off distributional bias against convergence rate. Moreover, it is shown that in coordination games, higher levels of rationality sometimes accelerate convergence, whereas in other cases they may paradoxically slow the convergence of log-linear learning dynamics.
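To make the setting concrete, the sketch below numerically builds the four-state Markov chain of log-linear learning for a 2x2 game and computes its stationary distribution and a convergence-rate proxy. It does not reproduce the letter's closed-form result. Several choices here are assumptions for illustration only. The game is treated as an identical-interest potential game, so each player's utility is taken to equal the potential `phi`. `betas` holds the two rationality levels. `p` is the probability that player 1 is selected to revise. The convergence rate is measured by the spectral gap, defined as 1 minus the second-largest eigenvalue modulus, which may differ from the measure used in the letter. The function names `transition_matrix`, `stationary`, and `spectral_gap` are hypothetical.

```python
import numpy as np


def transition_matrix(phi, betas, p):
    """Log-linear learning chain on a 2x2 game (assumes utilities equal the potential phi).

    phi[a1][a2] is the potential of profile (a1, a2); betas = (beta1, beta2) are the
    rationality levels; p is the (hypothetical) probability that player 1 revises.
    """
    phi = np.asarray(phi, dtype=float)
    states = [(a1, a2) for a1 in range(2) for a2 in range(2)]
    index = {s: k for k, s in enumerate(states)}
    sel = (p, 1.0 - p)  # agent selection probabilities
    P = np.zeros((4, 4))
    for a1, a2 in states:
        k = index[(a1, a2)]
        for i in range(2):
            # Profiles reachable when player i revises and the other player's action is fixed.
            cands = [(b, a2) if i == 0 else (a1, b) for b in range(2)]
            # Log-linear (Boltzmann) choice: weight proportional to exp(beta_i * payoff).
            w = np.array([np.exp(betas[i] * phi[c]) for c in cands])
            w /= w.sum()
            for c, wc in zip(cands, w):
                P[k, index[c]] += sel[i] * wc
    return P, states


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 (assumes the chain is irreducible)."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0  # replace one balance equation with the normalization constraint
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def spectral_gap(P):
    """1 minus the second-largest eigenvalue modulus (a convergence-rate proxy)."""
    mods = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - mods[1]


if __name__ == "__main__":
    phi = [[1.0, 0.0], [0.0, 2.0]]  # (1, 1) is the potential-maximizing profile
    for p in (0.3, 0.5, 0.7):
        P, states = transition_matrix(phi, (2.0, 0.5), p)
        print(p, dict(zip(states, stationary(P).round(4))), round(spectral_gap(P), 4))
```

With equal rationality levels, this chain is reversible and its stationary distribution is the Gibbs distribution proportional to exp(beta * phi) for any interior `p`. Heterogeneous `betas` break that independence, which is what allows the selection probabilities to shift the distribution. At `p = 1`, only player 1 ever revises, so the chain becomes reducible and the spectral gap collapses to zero.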