Added markdown files for docs.
tristanvdb committed Mar 5, 2024
1 parent 5c0a4ab commit 01686c9
Showing 8 changed files with 300 additions and 370 deletions.
187 changes: 13 additions & 174 deletions README.md
@@ -1,192 +1,31 @@
⚙ Automaton & Cognition
=============================

[![PIP](https://github.com/LLNL/AutoCog/workflows/pip/badge.svg?branch=master)](https://github.com/LLNL/AutoCog/actions)
[![Frontend](https://github.com/LLNL/AutoCog/workflows/frontend/badge.svg?branch=master)](https://github.com/LLNL/AutoCog/actions)
[![CLI](https://github.com/LLNL/AutoCog/actions/workflows/cli.yml/badge.svg?branch=master)](https://github.com/LLNL/AutoCog/actions)

Automaton & Cognition explores mechanisms to build automata that drive cognitive processes.
To this end, we defined a programming model, Structured Thoughts, with a language that compiles to a set of automata.

## Structured Thoughts

In the Structured Thoughts programming model, prompts are akin to the building blocks of traditional computer programs.
Prompts are compiled to automata that ensure the resulting completion can be parsed to extract structured data.
Branching between prompts is controlled by the language model.
The dataflow is statically defined and executed when instantiating the automaton of each prompt.
Calls (to other prompts or Python tools) are executed during the dataflow phase.

Below, we show a single-prompt program which implements Chain-of-Thought (CoT) to answer a multiple-choice question.
In this example, the language model is presented with the `topic`, the `question`, and four `choices`.
It can then think using one to ten `thought` fields (up to 20 tokens each).
Eventually, the model must indicate the index of the correct choice.

```
format thought {
    is text<20>;
    annotate f"a short text representing a single thought, it does not have to be a proper sentence.";
}

prompt main {
    is {
        topic      is text<20>;
        question   is text<50>;
        choices[4] is text<40>;
        work[1:10] is thought;
        answer     is select(.choices);
    }
    channel {
        to .topic    from ?topic;
        to .question from ?question;
        to .choices  from ?choices;
    }
    return {
        from .answer;
    }
    annotate {
        _ as "You are answering a multiple choice questionnaire.";
        .topic    as "the general category from which the question was taken";
        .question as "the question that you have to answer";
        .choices  as "the four possible choices to answer the question, only one is correct";
        .work     as "show your work step-by-step";
        .answer   as "you pick the index of the choice that best answers the question";
    }
}
```
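
When driven from the command line (see the usage examples below), a program like this one receives its inputs as a command dictionary. The sketch below, in Python, shows a plausible payload; the tag `mcq.cot` is a hypothetical identifier chosen when loading the `.sta` file, and the remaining fields feed the `?topic`, `?question`, and `?choices` channels of the prompt.

```python
# Hypothetical command payload for the CoT program above.
# "__tag" selects the loaded cog and "__entry" the prompt (defaults to "main");
# every other field is forwarded as a keyword argument.
command = {
    "__tag": "mcq.cot",   # hypothetical tag given to the .sta file at load time
    "__entry": "main",
    "topic": "arithmetic",
    "question": "What is 3*4+9?",
    "choices": ["16", "21", "39", "42"],
}
```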

We are developing the [MCQ](./library/mcq) library of programs to illustrate thought patterns that are achievable using Structured Thoughts.

## Getting started

### Install

As simple as `pip install -U git+https://github.com/LLNL/AutoCog`.

However, you will probably want to clone the repository to get the library of programs:
```
git clone https://github.com/LLNL/AutoCog
pip install -U ./AutoCog
```
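
Either way, `python3 -m autocog --version` is a quick way to check that the package is installed.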

### LLM Setup

#### LLama.cpp and GGUF models

We download models from [TheBloke](https://huggingface.co/TheBloke) on Hugging Face.
For example, you can download Llama 2 with 7B parameters, tuned for chat, with:
```
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```
This is a 4-bit quantized version, the `Q4_K_M` in the name. It is the main model we use for testing.

To run GGUF models, we use a [modified version](https://github.com/tristanvdb/llama-cpp-python/tree/choice-dev) of the `llama-cpp-python` package.
It provides Python bindings and builds `llama.cpp`.
Our changes let us implement `greedy` completion (returning the log-probabilities of all tokens).
```
pip install git+https://github.com/tristanvdb/llama-cpp-python@choice-dev
```
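
As a sketch of what the bindings provide, the standard `llama-cpp-python` API loads the GGUF model downloaded above and can run a greedy completion; the `choice-dev` branch extends this with log-probabilities over the full vocabulary. The file name below assumes the Llama 2 chat model from the previous step.

```python
from llama_cpp import Llama

# Load the quantized model; n_ctx matches the CLI default (--gguf-ctx 4096).
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

# Greedy completion: temperature 0.0 picks the most likely token at each step.
out = llm("Q: What is 3*4+9? A:", max_tokens=16, temperature=0.0)
print(out["choices"][0]["text"])
```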

> TODO v0.5: connect to the low-level API in `llama-cpp-python` so that we can use the default release

#### HuggingFace Transformers

> TODO v0.6: connection for the HuggingFace Transformers package (we used to have it but it is untested)

### Inside a Notebook

Most of the development is done inside Python notebooks (JupyterLab).
Eventually, several notebooks demonstrating various parts of AutoCog will be provided in the [share](./share) folder.
To get an idea of our progress, take a look at the [WIP Notebook](./share/wip.ipynb).

### Command line

We are building a command-line tool to use AutoCog.

`python3 -m autocog --help`

```
usage: __main__.py [-h] [--version] [--orch ORCH] [--gguf GGUF] [--gguf-ctx GGUF_CTX] [--syntax SYNTAX] [--cogs COGS] [--command COMMAND] [--output OUTPUT] [--prefix PREFIX] [--serve] [--host HOST] [--port PORT] [--debug]
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--orch ORCH Type of orchestrator: `serial` or `async`. (default: serial)
--gguf GGUF Load a model from a GGUF file using llama.cpp (and llama-cpp-python) (default: None)
--gguf-ctx GGUF_CTX Context size for GGUF models (default: 4096)
--syntax SYNTAX One of `Llama-2-Chat`, `ChatML`, `Guanaco` or a dictionary of the kwargs to initialize a Syntax object (inlined JSON or path to a file). (default: None)
--cogs COGS Files to load as cog in the architecture, prefix with its identifier else the filename is used. For example, `some/cognitive/mcq.sta` and `my.tool:some/python/tool.py` will load a Structured Thought
Automaton as `mcq` and a Python file as `my.tool`. (default: None)
--command COMMAND Command to be executed by the architecture as a dictionary. `__tag` identify the cog while `__entry` identify the entry point in this cog (defaults to `main`). All other field will be forwarded as
keyworded args. Example: `{ "__tag" : "writer", "__entry" : "main", **kwarg }` (inlined JSON or path to a file). Can also provide one or more list of dictionary. (default: None)
--output OUTPUT Directory where results are stored. (default: /home/tristan/projects/LLM/AutoCog)
--prefix PREFIX String to identify this instance of AutoCog (default: autocog)
--serve Whether to launch the flask server. (default: False)
--host HOST Host for flask server. (default: localhost)
--port PORT Port for flask server. (default: 5000)
--debug Whether to run the flask server in debug mode. (default: False)
```

Some examples:
```
python3 -m autocog --gguf /data/models/tinyllama-2-1b-miniguanaco.Q4_K_M.gguf --syntax Guanaco \
--cogs mmlu.repeat_cot:library/mmlu-exams/repeat-cot.sta \
--command '{ "__tag" : "mmlu.repeat_cot", "topic" : "arithmetic", "question" : "What is 3*4+9?", "choices" : [ "16", "21", "39", "42" ] }'
```
```
python3 -m autocog --gguf /data/models/llama-2-7b-chat.Q4_K_M.gguf --syntax Llama-2-Chat \
--syntax '{ "prompt_with_format" : false, "prompt_with_index" : false, "prompt_indent" : "" }' \
--cogs mmlu.repeat_cot:library/mmlu-exams/repeat-cot.sta \
--cogs mmlu.select_cot:library/mmlu-exams/select-cot.sta \
--command '{ "__tag" : "mmlu.repeat_cot", "topic" : "arithmetic", "question" : "What is 3*4+9?", "choices" : [ "16", "21", "39", "42" ] }' \
--command '{ "__tag" : "mmlu.select_cot", "topic" : "arithmetic", "question" : "What is 3*4+9?", "choices" : [ "16", "21", "39", "42" ] }'
```
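
Per the `--command` help above, commands can also be given as a path to a file holding one or more lists of dictionaries. A sketch, in Python, of producing such a batch file (the name `commands.json` is arbitrary):

```python
import json

# Two commands against the cogs loaded in the second example above; each
# dictionary follows the `--command` format ("__tag" selects the cog, the
# remaining fields are forwarded as keyword arguments).
commands = [
    {"__tag": "mmlu.repeat_cot", "topic": "arithmetic",
     "question": "What is 3*4+9?", "choices": ["16", "21", "39", "42"]},
    {"__tag": "mmlu.select_cot", "topic": "arithmetic",
     "question": "What is 3*4+9?", "choices": ["16", "21", "39", "42"]},
]

with open("commands.json", "w") as f:
    json.dump(commands, f, indent=2)
```

The file can then be passed as `--command commands.json`.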

Currently, the AutoCog application only saves the output of the commands in a JSON file.

> TODO v0.5: saving the "pages"

### Web Application

The goal is to provide a development environment, particularly the ability to inspect and edit/replay `frames`.
A frame is created for each execution of an `Automaton` (nested when an `Automaton` calls another `Automaton`).
Upon completion, the execution trace of the `Automaton` is saved in the corresponding frame.

Eventually, we want to use these traces for two purposes:
- replay: edit part of the trace, then restart the program from that point
- finetuning: select "successful" frames to finetune models

Run the command below at the root of the repository to launch a server. It uses [quart](http://pgjones.gitlab.io/quart).
```
python3 -m autocog --serve --host 0.0.0.0 --port 5000 --cogs mmlu.repeat_cot:library/mmlu-exams/repeat-cot.sta
```

### Testing

Currently, only pushes to selected branches trigger GitHub Actions.
The results for `master` are shown at the top of this README.

We run three tests:
- `pip install`
- Structured Thoughts frontend (parsing nonsensical but syntactically valid samples of the language)
- AutoCog CLI to load the MMLU-Exams and run a very simple query

Currently, tests involving a model use the Random Language Model ([see rambling here](./tests/cli-mmlu.sh)).
We are looking for an alternative to making the GitHub Action download Llama 2 (7B, Chat, Q4_K_M), which we use for testing.

| | PIP | Frontend | CLI |
|---|---|---|---|
| `master` | [![PIP](https://github.com/LLNL/AutoCog/workflows/pip/badge.svg?branch=master)](https://github.com/LLNL/AutoCog/actions) | [![Frontend](https://github.com/LLNL/AutoCog/workflows/frontend/badge.svg?branch=master)](https://github.com/LLNL/AutoCog/actions) | [![CLI](https://github.com/LLNL/AutoCog/actions/workflows/cli.yml/badge.svg?branch=master)](https://github.com/LLNL/AutoCog/actions) |
| `devel` | [![PIP](https://github.com/LLNL/AutoCog/workflows/pip/badge.svg?branch=devel)](https://github.com/LLNL/AutoCog/actions) | [![Frontend](https://github.com/LLNL/AutoCog/workflows/frontend/badge.svg?branch=devel)](https://github.com/LLNL/AutoCog/actions) | [![CLI](https://github.com/LLNL/AutoCog/actions/workflows/cli.yml/badge.svg?branch=devel)](https://github.com/LLNL/AutoCog/actions) |

Automaton & Cognition explores mechanisms to build automata that control applications driven by auto-regressive language models.
To this end, we defined a programming model, Structured Thoughts, with a language that compiles to a set of automata.

We broke down the documentation into a few files:
- [setup](./docs/setup.md)
- [usage](./docs/usage.md)
- [language](./docs/language.md)
- [tutorial](./docs/tutorial.md)
- [roadmap](./docs/roadmap.md)

The libraries have [their own documentation](./library/README.md).

## Contributing

Contributions are welcome!

So far there is only one rule: **linear git history** (no merge commits).
Only the `master` branch has stable commits; other branches might be rebased without notice.

The version number should increase for each push to `master` and have a matching tag.

## License

48 changes: 48 additions & 0 deletions docs/language.md
@@ -0,0 +1,48 @@
Structured Thoughts
===================

In the Structured Thoughts programming model, prompts are akin to the building blocks of traditional computer programs.
Prompts are compiled to automata that ensure the resulting completion can be parsed to extract structured data.
Branching between prompts is controlled by the language model.
The dataflow is statically defined and executed when instantiating the automaton of each prompt.
Calls (to other prompts or Python tools) are executed during the dataflow phase.

Below, we show a single-prompt program which implements Chain-of-Thought (CoT) to answer a multiple-choice question.
In this example, the language model is presented with the `topic`, the `question`, and four `choices`.
It can then think using one to ten `thought` fields (up to 20 tokens each).
Eventually, the model must indicate the index of the correct choice.

```
format thought {
    is text<20>;
    annotate f"a short text representing a single thought, it does not have to be a proper sentence.";
}

prompt main {
    is {
        topic      is text<20>;
        question   is text<50>;
        choices[4] is text<40>;
        work[1:10] is thought;
        answer     is select(.choices);
    }
    channel {
        to .topic    from ?topic;
        to .question from ?question;
        to .choices  from ?choices;
    }
    return {
        from .answer;
    }
    annotate {
        _ as "You are answering a multiple choice questionnaire.";
        .topic    as "the general category from which the question was taken";
        .question as "the question that you have to answer";
        .choices  as "the four possible choices to answer the question, only one is correct";
        .work     as "show your work step-by-step";
        .answer   as "you pick the index of the choice that best answers the question";
    }
}
```
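
Once the completion is parsed, each field of the prompt's `is` block yields a value. A hypothetical sketch, in Python, of the extracted data for the program above:

```python
# Hypothetical shape of the data parsed from one completion of `main`.
# `work` holds between 1 and 10 thoughts of at most 20 tokens each, and
# `select(.choices)` constrains `answer` to one of the given choices.
parsed = {
    "topic": "arithmetic",
    "question": "What is 3*4+9?",
    "choices": ["16", "21", "39", "42"],
    "work": ["3*4 is 12", "12 plus 9 is 21"],
    "answer": "21",
}
```

Only `answer` is returned to the caller, per the `return` section.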

We are developing the [MCQ](./library/mcq) library of programs to illustrate thought patterns that are achievable using Structured Thoughts.
30 changes: 30 additions & 0 deletions docs/roadmap.md
@@ -0,0 +1,30 @@
Roadmap
=======

This is a roadmap of the basic features needed to make AutoCog (and STA) usable.
It only represents a few weeks' worth of work, but I rarely have brain-cycles to spend on this project.

Given that I am currently working alone on this project, I am not tracking work using issues and milestones.

In the roadmap below, each minor version consolidates the increments of the previous ones.
Simply put, all the `v0.4.X` versions are steps toward `v0.5`.
These bugfix-level milestones are subject to reordering (changes of priority) and shifting (introduction of new milestones or actual bugfixes).

| Version | Features | Notes | Tracking |
| ------- | -------- | ----- | -------- |
| v0.4 | Structured Thoughts | release 1st version of ST | |
| v0.4.1 | Tests & Fixes | Testing more LLMs and fixing tokenization issues | |
| v0.4.2 | Roadmap & Doc | Needed some organization... | |
| v0.4.3 | Low-Level llama-cpp-python | | |
| v0.4.4 | FTA: Simplify, Choice Limit, and Norms | | |
| v0.4.5 | Beam Search | Implementation within FTA | |
| v0.5 | Language Docs | Description of the language and tutorial | |
| v0.5.1 | Tests & Fixes | Expecting that it will be needed... | |
| v0.5.2 | Unified FTA | FTA in one "loop" using llama-cpp-python low-level API | |
| v0.5.3 | Elementary | Library of elementary "worksheets" (arithmetic: add/mul/div; literacy: spelling, grammar, comprehension) | |
| v0.5.4 | MMLU-Exams | Library of MCQ Solver using different Thought Patterns | |
| v0.5.5 | FTA to BNF | Translate FTA to llama.cpp BNF | |
| v0.6 | Benchmarking | Evaluate speed and accuracy on Elementary and MMLU-Exams | |
| v0.6.1 | Tooling Benchmark | | |
| v0.7 | Finetuning | Finetune selected foundation LLMs, targeting improved performance on MMLU-Exams | |
| v0.7.1 | Finetune Tooling | | |
36 changes: 36 additions & 0 deletions docs/setup.md
@@ -0,0 +1,36 @@
Setup
=====

## Install AutoCog

As simple as `pip install -U git+https://github.com/LLNL/AutoCog`.

However, you will probably want to clone the repository to get the library of programs:
```
git clone https://github.com/LLNL/AutoCog
pip install -U ./AutoCog
```

## LLM Setup

### LLama.cpp and GGUF models

We download models from [TheBloke](https://huggingface.co/TheBloke) on Hugging Face.
For example, you can download Llama 2 with 7B parameters, tuned for chat, with:
```
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```
This is a 4-bit quantized version, the `Q4_K_M` in the name. It is the main model we use for testing.

To run GGUF models, we use a [modified version](https://github.com/tristanvdb/llama-cpp-python/tree/choice-dev) of the `llama-cpp-python` package.
It provides Python bindings and builds `llama.cpp`.
Our changes let us implement `greedy` completion (returning the log-probabilities of all tokens).
```
pip install git+https://github.com/tristanvdb/llama-cpp-python@choice-dev
```
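
For a quick check that the bindings work, the standard completion API also reports per-token log-probabilities in an OpenAI-style `logprobs` field; a minimal sketch, assuming the model file from the previous step:

```python
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

# Ask for the top-5 log-probabilities of each generated token.
out = llm("Q: What is 3*4+9? A:", max_tokens=8, temperature=0.0, logprobs=5)
choice = out["choices"][0]
print(choice["text"])
print(choice["logprobs"]["token_logprobs"])  # one log-probability per token
```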

> TODO v0.5: connect to the low-level API in `llama-cpp-python` so that we can use the default release

### HuggingFace Transformers

> TODO v0.6: connection for the HuggingFace Transformers package (we used to have it but it is untested)

4 changes: 4 additions & 0 deletions docs/tutorial.md
@@ -0,0 +1,4 @@
Tutorial
========

