In this project we applied several reinforcement learning
algorithms and policies, including imitation learning, DQN,
DQfD, and A3C, to the Pommerman FFA competition challenge.
Using DQN and an architecture inspired by the AlphaGo and Atari
papers, we were able to perform as well as SimpleAgent, a
baseline heuristic agent. Most of our agents developed defensive
behaviors; we then tried to train them further with reward
shaping to observe the emergence of other