Reinforcement learning (RL) is a branch of machine learning for solving sequential decision-making problems. Using RL algorithms, an agent can learn to complete a complex task by discovering, through trial-and-error experience, which actions lead to favourable outcomes. A standard approach to applying RL techniques to real-world problems is to first test and study their performance on similar but easy-to-simulate problems. One such problem that has received research attention is the popular seven-player strategy game “Diplomacy”.
RL techniques are already well studied and well understood when applied to two-player, zero-sum adversarial games that involve no cooperation. Well-known examples of deep RL algorithms include deep Q-networks (DQNs) and deep deterministic policy gradient (DDPG) methods. In Diplomacy, however, cooperation (and eventual betrayal) is usually necessary for victory. In the “Press” version of the game, players may additionally communicate with one another in natural language. To date, most research efforts have focused on the simpler “No-Press Diplomacy” problem. The aim of this project is to use RL to develop a computer-controlled agent that can challenge human-level performance in a software version of “Press Diplomacy”.