Hanabi is a very fun game to play, and it is currently our main project. The game presents many complexities, such as randomness, partial observability, and a large observation space. Its biggest challenge, however, is the existence of a non-verbal channel for communicating information. This information, which some call intention, is conveyed through the choice of actions and, in particular, through the hint system.

If you have not seen the game before, now is a good time to watch one of these reviews:

English: Hanabi Review - with Tom Vasel

French: LudoChrono - Hanabi

Portuguese: TRAPAÇAS no De Quem é a Vez? - MAGIC MAZE & HANABI - Jhonny Drumond, Thati Lopes e Victor Lamoglia

German: Hanabi (Spiel des Jahres 2013) - Spiel - Kartenspiel - Board Game - Review #10

Because of this special communication system, it is possible to generate an astronomical number of winning strategies, many of which are mutually incompatible. When players explicitly agree on a strategy in advance, they can complete the game successfully; if each player instead picks a strategy at random, the result is a disaster. Unlike in zero-sum competitive games such as Go, an agent's choices depend on how it interprets its partners' intentions, so there is no single strategy that works with every partner.
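As a toy illustration of this incompatibility, consider two agents that receive the same hint but attach opposite meanings to it. Both conventions below are hypothetical, invented for this sketch; they are not rules of Hanabi or strategies from our agent pool.

```python
# Toy illustration of incompatible Hanabi-style hint conventions.
# The conventions are hypothetical: they only show how the same hint
# can be decoded into opposite actions by partners with different strategies.

def convention_a(hint: str) -> str:
    """Convention A: a color hint means 'play the hinted card'."""
    return "play" if hint == "color" else "discard"

def convention_b(hint: str) -> str:
    """Convention B: a color hint means 'this card is unplayable, discard it'."""
    return "discard" if hint == "color" else "play"

hint = "color"
print(convention_a(hint))  # an A-partner plays the card
print(convention_b(hint))  # a B-partner discards it instead
```

Each convention works perfectly when both partners share it; paired across conventions, the same hint triggers opposite actions and the team loses information, or worse, a life.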

Because of this communication peculiarity, Hanabi was proposed by Google as a good testbed for creating agents that can successfully engage in cooperative tasks in an ad-hoc setting, i.e., paired with agents they have not seen before. To do this, we believe an agent should be able to build models of other agents that are sufficient to finish the game. These models must have two main properties. First, they must be rich enough to support a universal model-based strategy that successfully completes the game. Second, they must be compact and lightweight enough to be formed from only a small number of games. That is why Hanabi seems like the perfect ground on which to form and test meta-learning algorithms applied to decision making.

Our roadmap has three stops. First, we create a pool of diverse agents, each carrying a winning strategy. These agents are fixed and non-adaptive, and they serve as data points in the space of possible strategies. To create these pools, we try three different evolutionary methods: the first evolves rule-based agents, the second uses reward shaping, and the third uses neuroevolution. Once the pool is complete, the second step is to form models of these agents. Through reinforcement learning, an agent will then be trained to play alongside them, given these models.
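To make the pool-building step concrete, here is a minimal sketch of an evolutionary loop over strategies. Everything in it is a placeholder assumption: `evaluate` stands in for the average Hanabi score of a team following a strategy, `TARGET` defines a toy fitness landscape, and the Gaussian mutation and (mu + lambda) selection are generic choices, not our actual implementation.

```python
import random

TARGET = [0.2, 0.5, 0.9]  # hypothetical optimum of this toy fitness landscape

def evaluate(strategy):
    """Placeholder fitness: stands in for the mean Hanabi score of a team
    of agents all following `strategy` (hypothetical stand-in)."""
    return -sum((w - t) ** 2 for w, t in zip(strategy, TARGET))

def mutate(strategy, sigma=0.1):
    """Gaussian perturbation of every parameter of a strategy."""
    return [w + random.gauss(0, sigma) for w in strategy]

def evolve_pool(pool_size=5, generations=50):
    """Evolve a pool of fixed strategies; each survivor is one
    'data point' in the space of winning strategies."""
    pool = [[random.random() for _ in range(len(TARGET))]
            for _ in range(pool_size)]
    for _ in range(generations):
        # (mu + lambda)-style step: mutate everyone, keep the best half
        offspring = [mutate(s) for s in pool]
        pool = sorted(pool + offspring, key=evaluate, reverse=True)[:pool_size]
    return pool

pool = evolve_pool()
```

In the real project the strategies would be rule sets, shaped-reward policies, or evolved networks rather than three-number vectors, and fitness would come from self-play games, but the loop structure is the same.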

One can find updates on our project at https://www.researchgate.net/project/Hanabi-2 and the latest version of our code and agents on our GitHub page: https://github.com/orgs/Hanabi-GameProject/dashboard.