Official implementation of the paper *Tree of Thoughts: Deliberate Problem Solving with Large Language Models*, adapted to run open-source models via LocalAI / pplx-api.
- Install the `tot` package:

  ```bash
  git clone https://github.com/satani99/tree-of-thought-llm.git
  cd tree-of-thought-llm
  pip install -r requirements.txt
  pip install -e .  # install `tot` package
  ```
- Set up the OpenAI API key and API base, and store them in the environment variables `OPENAI_API_KEY` and `OPENAI_API_BASE` respectively (see here).
  - Option 1: Using pplx-api

    ```bash
    OPENAI_API_KEY="your_pplx-api_key"
    OPENAI_API_BASE="https://api.perplexity.ai"
    ```

  - Option 2: Using LocalAI

    ```bash
    OPENAI_API_KEY="your_openai_api_key"
    OPENAI_API_BASE="http://localhost:8080/v1"
    ```
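As a rough sketch, any OpenAI-compatible Python client picks these variables up from the process environment; the values below are placeholders, not real credentials:

```python
import os

# Placeholder values; export real ones in your shell instead of hardcoding.
os.environ.setdefault("OPENAI_API_KEY", "your_pplx-api_key")
os.environ.setdefault("OPENAI_API_BASE", "https://api.perplexity.ai")

# This mirrors how an OpenAI-style client is typically configured.
api_key = os.environ["OPENAI_API_KEY"]
api_base = os.environ["OPENAI_API_BASE"]
print(api_base)
```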
- If you're using pplx-api, skip this step. To set up LocalAI, clone this repo (configured for the zephyr-7b model) and download the model into the `models` directory. If you want to use a different model, refer to this guide. Run the commands below in the main directory:

  ```bash
  git clone https://github.com/satani99/LocalAI.git
  cd LocalAI
  wget https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q5_K_M.gguf -O models/zephyr
  sudo docker compose up -d --pull always
  ```
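Once the container is up, the requests the run scripts send follow the standard OpenAI chat-completions shape that LocalAI mirrors. A minimal sketch of such a request body (the `build_request` helper is an illustrative assumption, not part of the repo; the model name must match the file placed in LocalAI's `models` directory):

```python
import json

def build_request(prompt, model="zephyr", n=1, temperature=0.7):
    # OpenAI-style chat-completion body; POST it to
    # $OPENAI_API_BASE/chat/completions once LocalAI is running.
    return {
        "model": model,  # must match the filename in LocalAI's models/ dir
        "messages": [{"role": "user", "content": prompt}],
        "n": n,
        "temperature": temperature,
    }

print(json.dumps(build_request("Use 4 4 6 8 to obtain 24.")))
```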
- Run experiments via `sh scripts/game24/{standard_sampling,cot_sampling,bfs}.sh`. By default this runs the llama-2-70b-chat model on pplx-api. To run a different model, add the `--backend` flag in any of the above sh files, e.g. `--backend zephyr`. Choices for the backend: `llama2-7b`, `llama-2-13b-chat`, `zephyr`, `llama-2-70b-chat`, `mistral-7b-instruct`.
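The backend selection can be pictured as a simple argparse choice; this is a sketch of how such a flag is typically parsed, not the repo's exact code:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--backend",
    default="llama-2-70b-chat",  # the default model run on pplx-api
    choices=["llama2-7b", "llama-2-13b-chat", "zephyr",
             "llama-2-70b-chat", "mistral-7b-instruct"],
)
args = parser.parse_args(["--backend", "zephyr"])
print(args.backend)  # → zephyr
```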
CoT and standard sampling were run for 1000 steps, and BFS (ToT) was run for 100 steps. CoT achieved a 1% success rate (10 out of 1000 steps), while standard IO prompting did better at 4.9%. BFS (ToT), however, did not produce a single correct answer; the reasons are discussed below.
- Currently pplx-api doesn't support multiple outputs for a single input, so we are restricted to `n_generate_sample=1` and can't get 100 or 10 outputs per input. As a result, we can only properly evaluate the CoT and standard prompting methods; with `n_generate_sample=1` the search isn't really Tree of Thoughts prompting, which is why its score is 0.
- In LocalAI, we can request multiple outputs for one input, but all the outputs are identical, so it is effectively the same as getting only one output.
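Under these constraints, multiple samples can only be emulated by looping over single-output calls; a sketch of that workaround (the `sample_once` callable is a hypothetical stand-in for one chat-completion request):

```python
import random

def sample_n(sample_once, prompt, n):
    # Emulate n_generate_sample > 1 with n independent single-sample calls.
    # Note: against LocalAI this can still yield identical outputs unless
    # temperature/seed varies between calls.
    return [sample_once(prompt) for _ in range(n)]

outs = sample_n(lambda p: f"{p}:{random.random()}", "24-game", 3)
print(len(outs))  # → 3
```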
Original repos: