LLaMa Web is a web interface to chat and experiment with LLaMa-based models.
- Node.js 18
- Yarn
- MongoDB (for saving chats)
- llama.cpp
- Keycloak server (optional, for authentication)
```sh
cd client && yarn install --frozen-lockfile && yarn build
cd ..
cd api && yarn install --frozen-lockfile && yarn build
```
Copy the `example.env` file in both folders to `.env` and edit it:
```sh
cp client/example.env client/.env && nano client/.env
cp api/example.env api/.env && nano api/.env
```
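For reference, an `api/.env` could look like the following. The variable names are the ones mentioned in this README; all values are placeholders, and the authoritative list is in `example.env`:

```
# Placeholder values — check example.env for the full variable list
DB=mongodb://localhost:27017/llama-web
LLAMA_PATH=/path/to/llama.cpp
LLAMA_EMBEDDING_PATH=/path/to/embeddings.gguf
# Set to true only if you skip Keycloak (must match the client .env)
SKIP_AUTH=true
# Set to false to disable alternative compute backends
ALLOW_ALTERNATIVE_COMPUTE_BACKEND=true
```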
In both folders, run the following command:
```sh
yarn start
```
- Docker
- Docker Compose
- Download the `docker-compose.yml` file
Edit the `docker-compose.yml` file and adjust the environment variables. However, you can't change the `DB`, `LLAMA_PATH` and `LLAMA_EMBEDDING_PATH` variables.

If you don't want to use Keycloak, you can enable the `SKIP_AUTH` variable by setting it to `true` in BOTH the client and the api.
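As an illustration, the relevant part of `docker-compose.yml` might look like this (the service layout shown here is an assumption; only the variable names above come from this README):

```yaml
services:
  api:
    environment:
      # DB, LLAMA_PATH and LLAMA_EMBEDDING_PATH must keep their default values
      SKIP_AUTH: "true"   # only if you don't use Keycloak
  client:
    environment:
      SKIP_AUTH: "true"   # must match the api
```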
```sh
docker-compose up -d
```
> **Note**
> The following steps assume you want to use TheBloke/Llama-2-7B-Chat-GGUF; this project is tested with TheBloke's GGUF models.
> **Note**
> The steps are the same whether or not you use Docker.
- Go to the playground
- Go to the `Models` tab
- Click on `Install a new model`
- Enter the name of the model (e.g. `llama-2-7b-chat`)
- Enter the download link (e.g. `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf`; see the optional check below)
- Enter the model chat template (can be found here)
- Click on `Install the new model`
- Wait until the model is installed; you can refresh the page to see when it is done
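Optionally, before installing, you can verify that the download link resolves, shown here with the example URL above:

```sh
# -L follows redirects, -I only fetches the headers
curl -LI https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```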
> **Warning**
> Built-in model management is not supported for models served through an alternative compute backend: you have to edit the alternative backend directly to support the model you want to use, and no support will be provided for this. It is still possible to use built-in model management and an alternative compute backend at the same time.
> **Note**
> You can disable the alternative compute backend by setting `ALLOW_ALTERNATIVE_COMPUTE_BACKEND` to `false` in the api `.env` file.
- A server that can run an app similar to `examples/alt-backend/mixtral8x7B.py` (see the sketch below)
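To give a feel for what such an app looks like, here is a minimal sketch of an HTTP service in the spirit of `examples/alt-backend/mixtral8x7B.py`. The route name, payload fields and stub logic are assumptions for illustration; the actual contract is defined by the example file in the repository.

```python
# Hypothetical sketch — the real contract is defined by
# examples/alt-backend/mixtral8x7B.py; route and fields are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

@app.post("/completion")
def completion(req: CompletionRequest):
    # Replace this stub with a real model call (llama.cpp bindings, vLLM, ...)
    return {"text": f"echo: {req.prompt}"}
```

You would then serve it with e.g. `uvicorn main:app --host 0.0.0.0 --port 8000` (module name matching your file) and expose it at a URL the api can reach.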
- Go to the playground
- Go to the `Models` tab
- Click on `Install a new model`
- Enter the name of the model (e.g. `llama-2-7b-chat`)
- Press on `Use alternative compute backend`
- Enter the compute backend URL (e.g. `https://my-alternative-compute-backend.domain.com`; you can test it first, see below)
- Press on `Add the alternative backend model`
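Before adding the model, you may want to check that your backend responds. Assuming the hypothetical `/completion` route from the sketch above and the example URL:

```sh
curl -X POST https://my-alternative-compute-backend.domain.com/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello"}'
```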