MCTS Module Documentation
A tree node. It contains N(s,a), W(s,a), and P(s,a); Q(s,a) is always calculated as W(s,a)/N(s,a). It also stores its parent (an MCTreeNode) and its children (a dictionary mapping action to MCTreeNode).
Input: parent, prior_prob
- parent: the parent of this node
- prior_prob: P(s,a) for this node, from the NN evaluation.
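A minimal sketch of such a node, assuming the counters are stored as plain attributes (the actual attribute names in the module may differ):

```python
# Hypothetical sketch of the tree node described above; attribute names are
# assumptions and may not match the module's actual MCTreeNode.
class MCTreeNode:
    def __init__(self, parent, prior_prob):
        self.parent = parent     # parent MCTreeNode (None for the root)
        self.children = {}       # dictionary mapping action -> MCTreeNode
        self.N = 0               # visit count N(s,a)
        self.W = 0.0             # total action value W(s,a)
        self.P = prior_prob      # prior probability P(s,a) from the NN evaluation
```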
Expands the current node if it is a leaf node, updating its value (with value) and creating its children (with policy).
Input: policy, value
- policy: a list of (action, prob) tuples returned by the network
- value: the value of this node returned by the network
Output: None
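A sketch of what the expansion could look like on the hypothetical node above, interpreting "updating its value" as recording the NN value (and a visit) on the leaf; the module's exact bookkeeping may differ:

```python
# Illustration only: expand a leaf of the hypothetical MCTreeNode sketch.
def expand(node, policy, value):
    """policy: list of (action, prob) tuples; value: NN value of this node."""
    node.N += 1                          # assumption: the leaf's own visit is counted here
    node.W += value                      # record the evaluated value on this node
    for action, prob in policy:
        if action not in node.children:  # only create children that do not exist yet
            node.children[action] = MCTreeNode(parent=node, prior_prob=prob)
```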
Selects the best child of this node by argmax_a(Q(s,a) + U(s,a)).
Input: None
Output: (action, node)
- action: selected action
- node: the child corresponding to this action
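The document does not spell out U(s,a); the sketch below assumes the AlphaGo Zero PUCT form U(s,a) = c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)), which may not be exactly what the module uses:

```python
import math

C_PUCT = 5.0  # hypothetical exploration constant; not documented in this module

def select(node):
    """Return (action, child) maximizing Q(s,a) + U(s,a) over the children."""
    total_visits = sum(child.N for child in node.children.values())

    def score(item):
        _, child = item
        q = child.W / child.N if child.N > 0 else 0.0          # Q(s,a) = W/N
        u = C_PUCT * child.P * math.sqrt(total_visits) / (1 + child.N)
        return q + u

    return max(node.children.items(), key=score)
```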
Updates W(s,a) for the current node.
Calculates Q(s,a) for the current node.
Input: None
Output: a real number
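A sketch of both value routines on the hypothetical node, assuming the visit count N(s,a) is incremented together with W(s,a) during the update (the module may bump N elsewhere):

```python
# Illustration only: the W(s,a) update and the derived Q(s,a).
def update(node, value):
    """Accumulate a simulation result into W(s,a); N is bumped here as an assumption."""
    node.N += 1
    node.W += value

def q_value(node):
    """Q(s,a) is always W(s,a) / N(s,a); an unvisited node defaults to 0 here."""
    return node.W / node.N if node.N > 0 else 0.0
```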
The MCTS search. Creating one instance per game is sufficient: part of the tree is reused after each move, so creating a new instance per move would discard the whole tree.
Example usage:

```python
# Assuming state is an instance of a game state
mcts = MCTSearch(math_helper.random_state_transform, parallel_evaluator)
move = mcts.calc_move(state)
state.do_move(move)
mcts.update_with_move(move)
```
Input: (transformer, evaluator, max_playout)
- transformer: the function that transforms the state before NN evaluation. I have implemented this function in math_helper.random_state_transform.
- evaluator: the function that evaluates a given state into a policy and a value. It should be implemented so that it queues incoming states (up to a batch of 8), sends the batch to Network.main.response, and returns the results to each leaf in the shape [(action, prob), ...], value (a batching sketch is given below).
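A minimal, synchronous sketch of such a batching evaluator. It assumes Network.main.response accepts a list of (transformed) states and returns a matching list of ([(action, prob), ...], value) results; the real interface, and the real parallel_evaluator, may differ and are presumably asynchronous:

```python
# Sketch only: queue states until the batch is full, evaluate them in one call,
# then hand each result back to the leaf that requested it via a callback.
class BatchingEvaluator:
    def __init__(self, network_response, batch_size=8):
        self.network_response = network_response  # e.g. Network.main.response (assumed signature)
        self.batch_size = batch_size
        self.pending = []                          # list of (state, callback) pairs

    def evaluate(self, state, callback):
        """Queue one state; flush as soon as batch_size states are waiting."""
        self.pending.append((state, callback))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        states = [s for s, _ in self.pending]
        results = self.network_response(states)     # one batched NN call
        for (_, callback), (policy, value) in zip(self.pending, results):
            callback(policy, value)                  # deliver [(action, prob), ...], value to each leaf
        self.pending = []
```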
Takes a state and calculates the move to play.
Input: (state, dirichlet, prop_exp)
- state: an instance of a game state.
- dirichlet: enable the Dirichlet noise described in the "Self-play" section (see the sketch below).
- prop_exp: select the final move proportionally to its exponentiated visit count.
Output: move
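As referenced in the dirichlet bullet above, a sketch of how such noise is typically mixed into the root priors before the playouts. The mixing weight (0.25) and concentration (0.03) are AlphaGo Zero's published values, used here as assumptions rather than values read from this repository:

```python
import numpy as np

# Illustration only: perturb the root's prior probabilities for self-play exploration.
def add_dirichlet_noise(root, epsilon=0.25, alpha=0.03):
    children = list(root.children.values())
    noise = np.random.dirichlet([alpha] * len(children))
    for child, eta in zip(children, noise):
        child.P = (1 - epsilon) * child.P + epsilon * eta
```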
Calculates the best move and returns the search probabilities.
Input: (state, dirichlet, prop_exp)
- state: an instance of a game state.
- dirichlet: enable the Dirichlet noise described in the "Self-play" section.
- prop_exp: select the final move proportionally to its exponentiated visit count (see the sketch below).
Output: (move, probs)
- move: an action
- probs: a list of (action, prob) tuples, the search probabilities
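As referenced in the prop_exp bullet, a sketch of deriving the search probabilities from the root's visit counts and sampling the move proportionally to the exponentiated counts. The N(s,a)^(1/τ) transform is an assumption borrowed from AlphaGo Zero; the module's exact formula may differ:

```python
import numpy as np

# Illustration only: turn visit counts into search probabilities and pick the move.
def move_and_probs(root, prop_exp=True, tau=1.0):
    actions = list(root.children.keys())
    counts = np.array([root.children[a].N for a in actions], dtype=float)
    pi = counts ** (1.0 / tau)          # exponentiated visit counts
    pi /= pi.sum()
    if prop_exp:
        move = actions[np.random.choice(len(actions), p=pi)]   # sample proportionally
    else:
        move = actions[int(np.argmax(counts))]                 # greedy on visit counts
    return move, list(zip(actions, pi))
```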
Moves one step forward in the tree, setting the node of the move just made as the new root and discarding all of its siblings (calc_move only selects the move; you have to make the move on the game state yourself). This way that part of the tree is reused.
Input: lastmove
- lastmove: the move just played
Output: None
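A sketch of the re-rooting step on the hypothetical node structure, assuming that a move with no corresponding child simply starts a fresh tree:

```python
# Illustration only: keep the subtree under the move just played, discard the rest.
def update_with_move(root, lastmove):
    if lastmove in root.children:
        new_root = root.children[lastmove]
    else:
        new_root = MCTreeNode(parent=None, prior_prob=1.0)  # unseen move: fresh tree
    new_root.parent = None   # detach; the old root and all sibling subtrees are discarded
    return new_root          # the module presumably stores the new root internally and returns None
```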
Runs a single simulation from a tree node; once it reaches a leaf, it returns after the evaluation and expansion.
This function is implemented recursively and can start at any node in the tree.
Input: (state, node)
- state: a game state
- node: corresponding node
Output: the value from the leaf of this simulation.
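A recursive sketch of such a playout built on the hypothetical helpers above. For simplicity it assumes a synchronous evaluator(state) -> ([(action, prob), ...], value) and omits the transformer step and any sign handling for alternating players; the real module differs in these respects:

```python
# Illustration only: descend by select(), expand and evaluate at the leaf,
# then back the leaf value up on the way out of the recursion.
def single_playout(state, node, evaluator):
    if not node.children:                  # leaf: evaluate, expand, and stop
        policy, value = evaluator(state)   # assumed synchronous evaluation
        expand(node, policy, value)        # 'expand' from the sketch above
        return value
    action, child = select(node)           # argmax of Q(s,a) + U(s,a)
    state.do_move(action)                  # advance the game state along the chosen action
    value = single_playout(state, child, evaluator)
    update(node, value)                    # back up the value into W(s,a), N(s,a)
    return value
```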