Official implementation of the paper *Tree of Thoughts: Deliberate Problem Solving with Large Language Models*, adapted to run open-source models via LocalAI / pplx-api.
- Install the `tot` package:

  ```bash
  git clone https://github.com/satani99/tree-of-thought-llm.git
  cd tree-of-thought-llm
  pip install -r requirements.txt
  pip install -e .  # install `tot` package
  ```
- Set up the OpenAI API key and API base, and store them in the environment variables `OPENAI_API_KEY` and `OPENAI_API_BASE` respectively (see here).
  - Option 1: Using pplx-api

    ```bash
    OPENAI_API_KEY="your_pplx-api_key"
    OPENAI_API_BASE="https://api.perplexity.ai"
    ```

  - Option 2: Using LocalAI

    ```bash
    OPENAI_API_KEY="your_openai_api_key"
    OPENAI_API_BASE="http://localhost:8080/v1"
    ```
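As a rough sketch, any OpenAI-compatible Python client picks these variables up from the process environment; the values below are placeholders, not real credentials:

```python
import os

# Placeholder values; export real ones in your shell instead of hardcoding.
os.environ.setdefault("OPENAI_API_KEY", "your_pplx-api_key")
os.environ.setdefault("OPENAI_API_BASE", "https://api.perplexity.ai")

# This mirrors how an OpenAI-style client is typically configured.
api_key = os.environ["OPENAI_API_KEY"]
api_base = os.environ["OPENAI_API_BASE"]
print(api_base)
```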
- If you're using pplx-api, skip this step. To set up LocalAI, clone this repo (configured for the zephyr-7b model) and download the model into the `models` directory. If you want to use a different model, refer to this guide. Run the commands below in the main directory:

  ```bash
  git clone https://github.com/satani99/LocalAI.git
  cd LocalAI
  wget https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q5_K_M.gguf -O models/zephyr
  sudo docker compose up -d --pull always
  ```
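Once the container is up, the requests the run scripts send follow the standard OpenAI chat-completions shape that LocalAI mirrors. A minimal sketch of such a request body (the `build_request` helper is an illustrative assumption, not part of the repo; the model name must match the file placed in LocalAI's `models` directory):

```python
import json

def build_request(prompt, model="zephyr", n=1, temperature=0.7):
    # OpenAI-style chat-completion body; POST it to
    # $OPENAI_API_BASE/chat/completions once LocalAI is running.
    return {
        "model": model,  # must match the filename in LocalAI's models/ dir
        "messages": [{"role": "user", "content": prompt}],
        "n": n,
        "temperature": temperature,
    }

print(json.dumps(build_request("Use 4 4 6 8 to obtain 24.")))
```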
- Run experiments via `sh scripts/game24/{standard_sampling,cot_sampling,bfs}.sh`. By default this runs the llama-2-70b-chat model on pplx-api. To run a different model, add the `--backend` flag in any of the above sh files, e.g. `--backend zephyr`. Choices for the backend: `llama2-7b`, `llama-2-13b-chat`, `zephyr`, `llama-2-70b-chat`, `mistral-7b-instruct`.
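The backend selection can be pictured as a simple argparse choice; this is a sketch of how such a flag is typically parsed, not the repo's exact code:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--backend",
    default="llama-2-70b-chat",  # the default model run on pplx-api
    choices=["llama2-7b", "llama-2-13b-chat", "zephyr",
             "llama-2-70b-chat", "mistral-7b-instruct"],
)
args = parser.parse_args(["--backend", "zephyr"])
print(args.backend)  # → zephyr
```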
CoT and standard sampling were run for 1000 steps, and BFS (ToT) was run for 100 steps. CoT achieved a 1% success rate (10 out of 1000 steps), while standard IO prompting did better at 4.9%. BFS (ToT), however, did not produce a single correct answer; the reasons are discussed below.
- Currently pplx-api doesn't support multiple outputs for a single input, so we are restricted to `n_generate_sample=1` and can't get 100 or 10 outputs per input. As a result, we can only properly evaluate the CoT and standard prompting methods; with `n_generate_sample=1` the search isn't really Tree of Thoughts prompting, which is why its score is 0.
- In LocalAI, we can request multiple outputs for one input, but all the outputs are identical, so it is effectively the same as getting only one output.
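Under these constraints, multiple samples can only be emulated by looping over single-output calls; a sketch of that workaround (the `sample_once` callable is a hypothetical stand-in for one chat-completion request):

```python
import random

def sample_n(sample_once, prompt, n):
    # Emulate n_generate_sample > 1 with n independent single-sample calls.
    # Note: against LocalAI this can still yield identical outputs unless
    # temperature/seed varies between calls.
    return [sample_once(prompt) for _ in range(n)]

outs = sample_n(lambda p: f"{p}:{random.random()}", "24-game", 3)
print(len(outs))  # → 3
```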
Original repos: