lematic due to the mismatch between our own strategy and the model of it entering the end game. We
chose to do this because the endgame solving
approach can be less robust if the input strategies
have weight on only a small number of hands (as an
extreme example, if all the weight were on one hand,
then the endgame solver would assume that the other agent knew our exact hand, and the solution
would require us to play extremely conservatively).
The approach is much more robust if we include a
small probability on many different hands before
applying the postprocessing. We believed that the
gain in robustness outweighed the limitation of the
mismatch (in addition to the reasons given above, we
already expect there to be a mismatch between the
input trunk strategy for the opponent, which is based
on our offline equilibrium computation, and his own
actual strategy, and thus we would not be removing
this mismatch completely even if we eliminated it for
our own strategy).
The endgame solving algorithm consists of several
steps (Ganzfried and Sandholm 2015). First, the joint
hand-strength input distributions are computed by
applying Bayes’ rule to the precomputed trunk strategies, utilizing a recently developed technique that
requires only a linear number of lookups in the large
strategy table (while the naïve approach requires a
quadratic number of lookups and is impractical).
Then the equity is computed for each hand, given
these distributions. The equity of a hand against a
distribution for the opponent is the probability of
winning plus one half the probability of tying. Then
hands are bucketed separately for each player based
on the computed equities for the given situation by
applying an information abstraction algorithm.
Finally, an exact Nash equilibrium is computed in the
game corresponding to this information abstraction
and an action abstraction that had been precomputed for the specific pot and stack size of the current
hand. All of this computation was done in real time
during gameplay. To compute equilibria within the
end games, we used Gurobi’s parallel linear program
solver to solve the sequence-form optimization formulation (Koller, Megiddo, and von Stengel 1994).
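The first two steps above can be illustrated with a minimal Python sketch. It shows a plain Bayes' rule update over hands and the equity definition (probability of winning plus one half the probability of tying). The names `posterior_over_hands`, `equity`, and `compare_strength` are hypothetical, the hands are toy integers rather than real cards, and the sketch does not reproduce the linear-lookup technique or account for card removal between the two players' hands.

```python
from typing import Callable, Dict, Hashable

Hand = Hashable


def posterior_over_hands(prior: Dict[Hand, float],
                         likelihood: Dict[Hand, float]) -> Dict[Hand, float]:
    """Bayes' rule: P(hand | actions) is proportional to P(hand) * P(actions | hand).

    `likelihood[h]` is assumed to be the probability that the trunk strategy
    takes the observed action sequence when holding hand `h`.
    """
    unnormalized = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("observed actions have zero probability under the strategy")
    return {h: w / total for h, w in unnormalized.items()}


def equity(hand: Hand,
           opponent_dist: Dict[Hand, float],
           compare: Callable[[Hand, Hand], int]) -> float:
    """Probability of winning plus one half the probability of tying."""
    total = sum(opponent_dist.values())
    if total <= 0.0:
        raise ValueError("opponent distribution has no weight")
    value = 0.0
    for opp_hand, weight in opponent_dist.items():
        outcome = compare(hand, opp_hand)  # > 0 win, 0 tie, < 0 loss
        if outcome > 0:
            value += weight
        elif outcome == 0:
            value += 0.5 * weight
    return value / total


def compare_strength(a: int, b: int) -> int:
    """Toy showdown (an assumption): hands are integers, larger is stronger."""
    return (a > b) - (a < b)
```

For example, against opponent weights `{1: 0.25, 2: 0.5, 3: 0.25}`, hand 2 beats hand 1 (weight 0.25) and ties hand 2 (weight 0.5), so its equity is 0.25 + 0.5 × 0.5 = 0.5. In a real implementation `compare` would evaluate the best five-card hands on the actual board.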
Rules of No-Limit Texas Hold ’em
Two-player no-limit Texas hold ’em works as follows.
Initially two players each have a stack of chips (worth
$20,000 in the computer poker competition). One
player, called the small blind, initially puts $50 worth
of chips in the middle, while the other player, called
the big blind, puts $100 worth of chips in the middle.
The chips in the middle are known as the pot, and
will go to the winner of the hand.
Next, there is an initial round of betting. The player to act can choose from three available options:
Fold: Give up on the hand, surrendering the pot to the opponent.
Call: Put in the minimum number of chips needed to
match the number of chips put into the pot by the
opponent. For example, if the opponent has put in
$1000 and we have put in $400, a call would require
putting in $600 more. A call of zero chips is also
known as a check.
Bet: Put in additional chips beyond what is needed to
call. A bet can be of any size from 1 chip up to the
number of chips a player has left in his stack, provided it exceeds some minimum value and is a multiple
of the smallest chip denomination (by contrast, in the
limit variant, all bets must be of a fixed size, which equals
the big blind for the first two rounds and twice the big
blind for the final two rounds). The minimum allowable bet size is the big blind for the first bet of a round
and the size of the previous bet in the current round
for subsequent bets. A bet of all of one’s remaining
chips is called an all-in bet. If the opponent has just
bet, then our additional bet is also called a raise. In
some variants, the number of raises in a given round
is limited (for limit it is limited to three and for no-limit it is unlimited), and players are forced to either
fold or call at that point.
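The call and minimum-bet rules above can be written as a short Python sketch. The function names and parameters (`call_amount`, `min_bet`, `is_legal_bet`, `chip`) are hypothetical and chosen only for illustration. The sketch treats a bet size as the chips added beyond the call, and it does not model special cases the text leaves unstated, such as an all-in for less than the minimum.

```python
from typing import Optional


def call_amount(opponent_in: int, our_in: int) -> int:
    """Chips needed to match the opponent's contribution (0 means a check)."""
    return max(opponent_in - our_in, 0)


def min_bet(big_blind: int, previous_bet: Optional[int] = None) -> int:
    """Minimum bet: the big blind for the first bet of a round, otherwise
    the size of the previous bet in the current round."""
    return big_blind if previous_bet is None else previous_bet


def is_legal_bet(size: int,
                 stack_after_call: int,
                 big_blind: int,
                 previous_bet: Optional[int] = None,
                 chip: int = 1) -> bool:
    """Check a no-limit bet of `size` chips beyond the call.

    `stack_after_call` is what the player would have left after calling;
    `chip` is the smallest chip denomination.
    """
    if size <= 0 or size > stack_after_call:
        return False
    if size % chip != 0:
        return False
    return size >= min_bet(big_blind, previous_bet)
```

With the numbers from the text, `call_amount(1000, 400)` returns 600, matching the example of putting in $600 more.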
The initial round of betting ends if a player has
folded, if there has been a bet and a call, or if both
players have checked. If the round ends without a
player folding, then three public cards are revealed
face-up on the table (called the flop) and a second
round of betting takes place. Then one more public
card is dealt (the turn) and a third round of betting takes place,
followed by a fifth public card (the river) and a final
round of betting. If a player ever folds, the other player wins all the chips in the pot. If the final betting
round is completed without a player folding, then
both players reveal their private cards, and the player with the best five-card hand (out of his two private
cards and the five public cards) wins the pot (it is
divided equally for a tie).
Several hands stood out during the course of the
competition that highlighted weaknesses of the agent.
In one hand, Claudico had A4s (ace and four of the
same suit) and folded preflop after it had put in over
half of its stack (the human opponent had 99). This
is regarded as a bad play, since it would only need to
win around 25 percent of the time against the opponent’s distribution for a call to be profitable at this
point (Claudico wins about 33 percent of the time
against the hand the human had). The problem was
that the translation mapping mapped the opponent’s
raise down to a smaller size, which caused the agent
to look up a strategy that had been computed for a pot much smaller than the
actual one (Claudico thought it had invested
around 7,000 when it had actually invested close to
10,000; recall that the starting stacks are 20,000).
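The 25 percent figure is consistent with a standard pot-odds calculation, sketched below in Python. The sketch rests on an assumption the text does not state: that the opponent had put in the full 20,000 stack, so Claudico faced a call of roughly 10,000 into a pot of roughly 30,000. The function names `required_equity` and `ev_of_call` are hypothetical.

```python
def required_equity(pot_before_call: float, call_size: float) -> float:
    """Break-even equity for a call: the caller risks `call_size` to win
    a final pot of `pot_before_call + call_size`."""
    return call_size / (pot_before_call + call_size)


def ev_of_call(equity: float, pot_before_call: float, call_size: float) -> float:
    """Expected chips gained by calling rather than folding, where `equity`
    is the probability of winning plus one half the probability of tying."""
    return equity * (pot_before_call + call_size) - call_size


# Assumed situation: Claudico has invested 10,000 and the opponent 20,000.
pot = 10_000 + 20_000
to_call = 20_000 - 10_000
break_even = required_equity(pot, to_call)  # 0.25
```

Under these assumed numbers, an equity of about 33 percent gives `ev_of_call(0.33, pot, to_call)` of roughly 3,200 chips, which is why the fold is regarded as a mistake.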