MCTS Module Documentation

Class: MCTreeNode

A tree node. Each node stores N(s,a), W(s,a), and P(s,a); Q(s,a) is always computed as W(s,a)/N(s,a). It also keeps a reference to its parent (an MCTreeNode) and its children (a dictionary of action: MCTreeNode). A field-level sketch is given after the constructor description below.

MCTreeNode.__init__

Input: parent, prior_prob

  • parent: the parent of this node
  • prior_prob: P(s,a) for this node, from the NN evaluation.
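
A field-level sketch of what such a node holds; the private attribute names here, such as _n_visits, are illustrative assumptions rather than the module's actual names:

```python
class MCTreeNode:
    """Illustrative node: holds N(s,a), W(s,a), P(s,a); Q(s,a) is derived as W/N."""

    def __init__(self, parent, prior_prob):
        self._parent = parent            # parent MCTreeNode (None for the root)
        self._children = {}              # dictionary of action: MCTreeNode
        self._n_visits = 0               # N(s,a)
        self._total_value = 0.0          # W(s,a)
        self._prior_prob = prior_prob    # P(s,a) from the NN evaluation
```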

MCTreeNode.expand

Expands the current node if it is a leaf node: updates its value (with value) and creates its children (with policy), roughly as sketched below.

Input: policy, value

  • policy: a list of (action, prob) tuples returned by the network
  • value: the value of this node returned by the network

Output: None
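
Continuing the field names assumed above, expansion might look roughly like this; whether the value is folded into W(s,a) here or during the backup is an implementation detail this page does not pin down:

```python
def expand(node, policy, value):
    """Sketch of MCTreeNode.expand on a leaf: record the value, create the children."""
    if node._children:                   # not a leaf; nothing to expand
        return
    node._total_value += value           # update the node's value with the NN estimate
    for action, prob in policy:          # one child per (action, prob) tuple
        node._children[action] = MCTreeNode(parent=node, prior_prob=prob)
```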

MCTreeNode.select

Selects the best child of this node by argmax_a(Q(s,a) + U(s,a)); see the sketch below.

Input: None

Output: (action, node)

  • action: selected action
  • node: the child corresponding to this action
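
A sketch of the selection rule using the usual PUCT form of U(s,a); the exploration constant c_puct is an assumed parameter, not something this page documents:

```python
import math

def select(node, c_puct=5.0):
    """Sketch of MCTreeNode.select: return the (action, child) maximising Q(s,a) + U(s,a)."""
    def score(item):
        action, child = item
        q = child._total_value / child._n_visits if child._n_visits else 0.0   # Q(s,a)
        u = c_puct * child._prior_prob * math.sqrt(node._n_visits) / (1 + child._n_visits)
        return q + u
    return max(node._children.items(), key=score)
```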

MCTreeNode.update

Updates W(s,a) for the current node.

MCTreeNode.get_mean_action_value

Calculates Q(s,a) = W(s,a)/N(s,a) for the current node.

Input: None

Output: a real number
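
The remaining two node methods are small; sketched under the same assumed fields (whether update also increments N(s,a) here is an assumption):

```python
def update(node, leaf_value):
    """Sketch of MCTreeNode.update: accumulate W(s,a), assuming N(s,a) is counted here too."""
    node._n_visits += 1
    node._total_value += leaf_value

def get_mean_action_value(node):
    """Sketch of MCTreeNode.get_mean_action_value: Q(s,a) = W(s,a) / N(s,a)."""
    return node._total_value / node._n_visits if node._n_visits else 0.0
```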

Class: MCTSearch

The MCTS itself. One instance per game is sufficient: part of the tree is reused after each move, so creating a new instance would discard the whole tree.

Example usage:

```python
# Assuming state is an instance of a game state
mcts = MCTSearch(math_helper.random_state_transform, parallel_evaluator)
move = mcts.calc_move(state)
state.do_move(move)
mcts.update_with_move(move)
```

MCTSearch.__init__

Input: (transformer, evaluator, max_playout)

  • transformer: the function that transforms the state before NN evaluation. I have already implemented this in math_helper.random_state_transform.
  • evaluator: the function that evaluates a given state into a policy and a value. It should queue incoming states (up to a batch of 8), send the batch to Network.main.response, and return the result for each leaf in the shape ([(action, prob), ...], value); a minimal sketch of this contract follows below.
  • max_playout: the maximum number of playouts (simulations) run for each move.
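
A minimal sketch of the evaluator contract only, with the batching to Network.main.response left out; the uniform-policy placeholder and the state.get_legal_moves() call are assumptions for illustration:

```python
def toy_evaluator(state):
    """Placeholder evaluator returning the expected shape: ([(action, prob), ...], value)."""
    actions = state.get_legal_moves()        # hypothetical game-state accessor
    prob = 1.0 / len(actions)
    policy = [(a, prob) for a in actions]    # uniform priors instead of real NN output
    value = 0.0                              # neutral value estimate
    return policy, value

mcts = MCTSearch(math_helper.random_state_transform, toy_evaluator)
```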

MCTSearch.calc_move

Takes a state and calculates the move to play.

Input: (state, dirichlet, prop_exp)

  • state: An instance of game state.
  • dirichlet: whether to add the Dirichlet noise described in the "Self-play" section to the root priors
  • prop_exp: whether to select the final move with probability proportional to an exponentiated visit count, instead of always taking the most-visited move

Output: move
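
For example, assuming both flags are plain booleans, self-play would turn them on and evaluation games would turn them off:

```python
# Self-play: add Dirichlet noise at the root and sample the move from visit counts.
move = mcts.calc_move(state, dirichlet=True, prop_exp=True)

# Evaluation / competitive play: deterministic choice of the most-visited move.
move = mcts.calc_move(state, dirichlet=False, prop_exp=False)
```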

MCTSearch.calc_move_with_probs

Calculates the best move and also returns the search probabilities.

Input: (state, dirichlet, prop_exp)

  • state: An instance of game state.
  • dirichlet: whether to add the Dirichlet noise described in the "Self-play" section to the root priors
  • prop_exp: whether to select the final move with probability proportional to an exponentiated visit count, instead of always taking the most-visited move

Output: (move, probs)

  • move: an action
  • probs: a list of (action, prob) tuples giving the search probabilities
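
A typical self-play loop would record these probabilities as training targets; a rough sketch, where state.is_terminal() and state.copy() are assumed game-state methods outside this module:

```python
training_examples = []
while not state.is_terminal():                        # assumed terminal check
    move, probs = mcts.calc_move_with_probs(state, dirichlet=True, prop_exp=True)
    training_examples.append((state.copy(), probs))   # pair the position with its search probs
    state.do_move(move)
    mcts.update_with_move(move)                       # keep the relevant subtree for the next search
```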

MCTSearch.update_with_move

Moves one step forward in the tree: the node of the move just played becomes the new root, and all of its siblings are discarded (calc_move only selects the move; you have to apply it to the game state yourself). This way the subtree under the played move is reused.

Input: lastmove

  • lastmove: the move just played

Output: None
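
Conceptually, the re-rooting amounts to something like the following (the _root attribute name is an assumption; the node fields follow the earlier sketch):

```python
def update_with_move(search, lastmove):
    """Sketch of MCTSearch.update_with_move: keep only the subtree under lastmove."""
    child = search._root._children[lastmove]   # the node of the move just played
    child._parent = None                       # detach it so the rest of the old tree can be freed
    search._root = child                       # the reused subtree becomes the new root
```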

MCTSearch._playout

Runs a single simulation from a tree node; once it reaches a leaf, it evaluates and expands the leaf and then returns.

The function is implemented recursively and can start at any node in the tree.

Input: (state, node)

  • state: a game state
  • node: corresponding node

Output: the value from the leaf of this simulation.
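
A sketch of the recursion, with the transformer/evaluator plumbing collapsed into a direct call and the sign handling for alternating players omitted; the _transformer and _evaluator attribute names are assumptions:

```python
def _playout(search, state, node):
    """Sketch of MCTSearch._playout: descend to a leaf, evaluate and expand it, back up the value."""
    if not node._children:                             # reached a leaf
        policy, value = search._evaluator(search._transformer(state))
        node.expand(policy, value)
        return value                                   # value of the leaf for this simulation
    action, child = node.select()                      # argmax of Q(s,a) + U(s,a)
    state.do_move(action)
    value = _playout(search, state, child)             # recurse down the chosen branch
    child.update(value)                                # back up W(s,a) / N(s,a) on the way out
    return value
```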