Dou Dizhu (Fight the Landlord)

Dou Dizhu, also called "Fight the Landlord," is a simple and entertaining traditional Chinese card game played in a stochastic, partially observable environment. A standard game consists of dealing, bidding, playing, and scoring, with three players using a 54-card deck (including two jokers). Games are short, usually lasting around five minutes.

If you want to learn how to play the game, check this video out:

For more detailed rules, please check: Full rules of Fight the Landlord.

Players consider the game very interesting and challenging because it combines chance and cooperation. The classic DouDiZhu game involves three players: two of them, called "the peasants," must cooperate against the third, called "the landlord." The peasants win if either of them empties their hand first; otherwise, the landlord wins. The peasants share their profits and losses, while the landlord plays alone, making DouDiZhu a zero-sum game. As in most card games, the starting hand strongly affects the outcome. The rules of DouDiZhu are not complicated; winning, however, requires strategy and skill in two essential aspects: competing and cooperating.

Since there are numerous card combinations at each time step, the number of possible actions is huge. This large discrete action space, together with the complicated relationships between actions, makes the game a challenging reinforcement learning environment. Moreover, unlike perfect-information games such as Chess and Go, DouDiZhu is an imperfect-information game, which usually makes solutions more complicated, and it is a unique game that emphasizes both competition and cooperation.
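To give a feel for why the action space is so large, here is a small sketch that counts only the plain move categories under a common rule set (no kicker-attached moves such as "trio with solo," which multiply these counts considerably). The rank and chain limits used here are assumptions based on standard rules, not taken from the RLCard code.

```python
# Hedged sketch: counting plain Dou Dizhu move categories to show how
# quickly the action space grows. Kicker-attached moves are excluded.

RANKS = 13        # 3, 4, ..., K, A, 2 (ranks that can form pairs/trios/bombs)
SOLO_RANKS = 15   # the 13 ranks plus the two jokers
CHAIN_RANKS = 12  # chains may only use ranks 3 through A

def chain_count(min_len, max_len):
    """Number of consecutive-rank chains of each allowed length."""
    return sum(CHAIN_RANKS - length + 1
               for length in range(min_len, max_len + 1))

counts = {
    "solo": SOLO_RANKS,
    "pair": RANKS,
    "trio": RANKS,
    "bomb": RANKS,
    "rocket": 1,
    "solo chain (len 5-12)": chain_count(5, 12),
    "pair chain (len 3-10)": chain_count(3, 10),
    "trio chain (len 2-6)": chain_count(2, 6),
}

for name, n in counts.items():
    print(f"{name:24s} {n}")
print("total (no kickers):", sum(counts.values()))
```

Even without kickers this already yields 188 distinct moves; attaching solo or pair kickers to trios, trio chains, and four-of-a-kinds pushes the full move count into the tens of thousands.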

Our work is based on the environment from RLCard: A Toolkit for Reinforcement Learning in Card Games. You can find the original code of the RLCard framework here: We have slightly modified the RLCard environment's state encoding and state space.

At the beginning of our work, we want to design several DouDiZhu agents by applying different deep reinforcement learning approaches, such as the popular Rainbow DQN architecture with different distributional components, Neural Fictitious Self-Play, and others. Our first milestone is to beat the rule-based agents of the RLCard paper as the landlord. We will then evaluate the impact of changes to the state encoding of the original RLCard environment as well as different training settings. Once we accomplish that, we will pursue the most promising approaches and see how far we can push them.