MCTS Module Documentation
A tree node. It contains N(s,a), W(s,a), and P(s,a); Q(s,a) is always calculated as W(s,a)/N(s,a). It also stores its parent (an MCTreeNode) and its children (a dictionary mapping action to MCTreeNode).
Input: parent, prior_prob
- parent: the parent of this node
- prior_prob: P(s,a) for this node, from the NN evaluation.
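A minimal sketch of such a node, assuming the counters are stored as plain attributes (the actual attribute names in the module may differ):

```python
# Hypothetical sketch of the tree node described above; attribute names are
# assumptions and may not match the module's actual MCTreeNode.
class MCTreeNode:
    def __init__(self, parent, prior_prob):
        self.parent = parent     # parent MCTreeNode (None for the root)
        self.children = {}       # dictionary mapping action -> MCTreeNode
        self.N = 0               # visit count N(s,a)
        self.W = 0.0             # total action value W(s,a)
        self.P = prior_prob      # prior probability P(s,a) from the NN evaluation
```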
Expands the current node if it is a leaf node, updating its value (with value) and creating its children (with policy).
Input: policy, value
- policy: a list of (action, prob) tuples returned by the network
- value: the value of this node returned by the network
Output: None
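A sketch of what the expansion could look like on the hypothetical node above, interpreting "updating its value" as recording the NN value (and a visit) on the leaf; the module's exact bookkeeping may differ:

```python
# Illustration only: expand a leaf of the hypothetical MCTreeNode sketch.
def expand(node, policy, value):
    """policy: list of (action, prob) tuples; value: NN value of this node."""
    node.N += 1                          # assumption: the leaf's own visit is counted here
    node.W += value                      # record the evaluated value on this node
    for action, prob in policy:
        if action not in node.children:  # only create children that do not exist yet
            node.children[action] = MCTreeNode(parent=node, prior_prob=prob)
```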
Selects the best child of this node by argmax_a(Q(s,a) + U(s,a)).
Input: None
Output: (action, node)
- action: selected action
- node: the child corresponding to this action
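The document does not spell out U(s,a); the sketch below assumes the AlphaGo Zero PUCT form U(s,a) = c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)), which may not be exactly what the module uses:

```python
import math

C_PUCT = 5.0  # hypothetical exploration constant; not documented in this module

def select(node):
    """Return (action, child) maximizing Q(s,a) + U(s,a) over the children."""
    total_visits = sum(child.N for child in node.children.values())

    def score(item):
        _, child = item
        q = child.W / child.N if child.N > 0 else 0.0          # Q(s,a) = W/N
        u = C_PUCT * child.P * math.sqrt(total_visits) / (1 + child.N)
        return q + u

    return max(node.children.items(), key=score)
```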
Updates W(s,a) for the current node.
Calculates Q(s,a) for the current node.
Input: None
Output: a real number
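A sketch of both value routines on the hypothetical node, assuming the visit count N(s,a) is incremented together with W(s,a) during the update (the module may bump N elsewhere):

```python
# Illustration only: the W(s,a) update and the derived Q(s,a).
def update(node, value):
    """Accumulate a simulation result into W(s,a); N is bumped here as an assumption."""
    node.N += 1
    node.W += value

def q_value(node):
    """Q(s,a) is always W(s,a) / N(s,a); an unvisited node defaults to 0 here."""
    return node.W / node.N if node.N > 0 else 0.0
```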
The MCTS search. Creating one instance per game is sufficient: part of the tree is reused after each move, so creating a new instance per move would discard the whole tree.
Example usage:

```python
# Assuming state is an instance of a game state
mcts = MCTSearch(math_helper.random_state_transform, parallel_evaluator)
move = mcts.calc_move(state)
state.do_move(move)
mcts.update_with_move(move)
```
Input: (transformer, evaluator, max_playout)
- transformer: the function that transforms the state before NN evaluation. I have implemented this function in math_helper.random_state_transform.
- evaluator: the function that evaluates a given state into a policy and a value. It should be implemented so that it queues incoming states (up to a batch of 8), sends the batch to Network.main.response, and returns the results to each leaf in the shape [(action, prob), ...], value (a batching sketch is given below).
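A minimal, synchronous sketch of such a batching evaluator. It assumes Network.main.response accepts a list of (transformed) states and returns a matching list of ([(action, prob), ...], value) results; the real interface, and the real parallel_evaluator, may differ and are presumably asynchronous:

```python
# Sketch only: queue states until the batch is full, evaluate them in one call,
# then hand each result back to the leaf that requested it via a callback.
class BatchingEvaluator:
    def __init__(self, network_response, batch_size=8):
        self.network_response = network_response  # e.g. Network.main.response (assumed signature)
        self.batch_size = batch_size
        self.pending = []                          # list of (state, callback) pairs

    def evaluate(self, state, callback):
        """Queue one state; flush as soon as batch_size states are waiting."""
        self.pending.append((state, callback))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        states = [s for s, _ in self.pending]
        results = self.network_response(states)     # one batched NN call
        for (_, callback), (policy, value) in zip(self.pending, results):
            callback(policy, value)                  # deliver [(action, prob), ...], value to each leaf
        self.pending = []
```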
Takes a state and calculates the move to play.
Input: (state, dirichlet, prop_exp)
- state: an instance of a game state.
- dirichlet: enable the Dirichlet noise described in the "Self-play" section (see the sketch below).
- prop_exp: select the final move proportionally to its exponentiated visit count.
Output: move
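As referenced in the dirichlet bullet above, a sketch of how such noise is typically mixed into the root priors before the playouts. The mixing weight (0.25) and concentration (0.03) are AlphaGo Zero's published values, used here as assumptions rather than values read from this repository:

```python
import numpy as np

# Illustration only: perturb the root's prior probabilities for self-play exploration.
def add_dirichlet_noise(root, epsilon=0.25, alpha=0.03):
    children = list(root.children.values())
    noise = np.random.dirichlet([alpha] * len(children))
    for child, eta in zip(children, noise):
        child.P = (1 - epsilon) * child.P + epsilon * eta
```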
Calculates the best move and returns the search probabilities.
Input: (state, dirichlet, prop_exp)
- state: an instance of a game state.
- dirichlet: enable the Dirichlet noise described in the "Self-play" section.
- prop_exp: select the final move proportionally to its exponentiated visit count (see the sketch below).
Output: (move, probs)
- move: an action
- probs: a list of (action, prob) tuples, the search probabilities
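As referenced in the prop_exp bullet, a sketch of deriving the search probabilities from the root's visit counts and sampling the move proportionally to the exponentiated counts. The N(s,a)^(1/τ) transform is an assumption borrowed from AlphaGo Zero; the module's exact formula may differ:

```python
import numpy as np

# Illustration only: turn visit counts into search probabilities and pick the move.
def move_and_probs(root, prop_exp=True, tau=1.0):
    actions = list(root.children.keys())
    counts = np.array([root.children[a].N for a in actions], dtype=float)
    pi = counts ** (1.0 / tau)          # exponentiated visit counts
    pi /= pi.sum()
    if prop_exp:
        move = actions[np.random.choice(len(actions), p=pi)]   # sample proportionally
    else:
        move = actions[int(np.argmax(counts))]                 # greedy on visit counts
    return move, list(zip(actions, pi))
```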
Moves one step forward in the tree, setting the node of the move just made as the new root and discarding all of its siblings (calc_move only selects the move; you have to make the move on the game state yourself). This way that part of the tree is reused.
Input: lastmove
- lastmove: the move just played
Output: None
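A sketch of the re-rooting step on the hypothetical node structure, assuming that a move with no corresponding child simply starts a fresh tree:

```python
# Illustration only: keep the subtree under the move just played, discard the rest.
def update_with_move(root, lastmove):
    if lastmove in root.children:
        new_root = root.children[lastmove]
    else:
        new_root = MCTreeNode(parent=None, prior_prob=1.0)  # unseen move: fresh tree
    new_root.parent = None   # detach; the old root and all sibling subtrees are discarded
    return new_root          # the module presumably stores the new root internally and returns None
```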
Runs a single simulation from a tree node; once it reaches a leaf, it returns after the evaluation and expansion.
This function is implemented recursively and can start at any node in the tree.
Input: (state, node)
- state: a game state
- node: corresponding node
Output: the value from the leaf of this simulation.
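A recursive sketch of such a playout built on the hypothetical helpers above. For simplicity it assumes a synchronous evaluator(state) -> ([(action, prob), ...], value) and omits the transformer step and any sign handling for alternating players; the real module differs in these respects:

```python
# Illustration only: descend by select(), expand and evaluate at the leaf,
# then back the leaf value up on the way out of the recursion.
def single_playout(state, node, evaluator):
    if not node.children:                  # leaf: evaluate, expand, and stop
        policy, value = evaluator(state)   # assumed synchronous evaluation
        expand(node, policy, value)        # 'expand' from the sketch above
        return value
    action, child = select(node)           # argmax of Q(s,a) + U(s,a)
    state.do_move(action)                  # advance the game state along the chosen action
    value = single_playout(state, child, evaluator)
    update(node, value)                    # back up the value into W(s,a), N(s,a)
    return value
```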