Releases · LlamaEdge/rag-api-server
LlamaEdge-RAG 0.6.1
Major changes:
- Support the `embedding` prompt template for the embedding model
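For illustration, a minimal launch sketch via Python's `subprocess`. The invocation shape follows the LlamaEdge docs, but every file name, model name, and flag value below is a placeholder, and the comma-separated flag layout is an assumption; check the server's `--help` output for your version.

```python
# Hypothetical launch sketch: start rag-api-server under WasmEdge with the
# `embedding` prompt template selected for the embedding model.
import subprocess

launch_args = [
    "wasmedge", "--dir", ".:.",
    "--nn-preload", "default:GGML:AUTO:chat-model.gguf",         # chat model (placeholder)
    "--nn-preload", "embedding:GGML:AUTO:embedding-model.gguf",  # embedding model (placeholder)
    "rag-api-server.wasm",
    "--model-name", "chat-model,embedding-model",
    # one template per model: the chat template first, then `embedding`
    "--prompt-template", "llama-3-chat,embedding",
]
subprocess.run(launch_args)
```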
LlamaEdge-RAG 0.6.0
Major changes:
- Improve the `--batch-size` CLI option: set the batch size for the chat and embedding models, respectively
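Extending the launch sketch above, the per-model batch sizes would be passed as one comma-separated pair, mirroring `--model-name`; the values here are arbitrary placeholders.

```python
# First value: batch size for the chat model; second: for the embedding model.
launch_args += ["--batch-size", "512,512"]
```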
LlamaEdge-RAG 0.5.3
Major changes:
- Add a `user` header in chat completion responses (example below)
- Support the `PLUGIN_DEBUG` wasm environment variable for debugging the low-level ggml plugin
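A sketch of reading the new header from a chat completion response (Python stdlib only; host, port, and model name are placeholders). `PLUGIN_DEBUG`, by contrast, is set on the wasm side at launch, e.g. via WasmEdge's `--env` flag.

```python
import json
import urllib.request

# Placeholder host/port and model name; the payload follows the OpenAI-style
# chat completion schema the server exposes.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "chat-model",
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.headers.get("user"))  # the new `user` response header
```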
LlamaEdge-RAG 0.5.2
Major changes:
- Llama-api-server:
  - Improve error responses
  - Add a `content-type: application/json` header in responses
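As a quick check of both changes, a deliberately malformed request should now come back with a JSON error body and the new header; host and port are placeholders.

```python
import urllib.error
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=b"not json",  # invalid payload on purpose
    headers={"Content-Type": "application/json"},
)
try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as err:
    print(err.code)                         # HTTP status of the error
    print(err.headers.get("Content-Type"))  # now application/json
    print(err.read().decode())              # improved error body
```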
LlamaEdge-RAG 0.5.1
Major change:
- Update the `/v1/embeddings` endpoint to be compatible with the OpenAI `/v1/embeddings` API
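Since the endpoint now follows the OpenAI schema, a standard embeddings request works as-is; host, port, and model name are placeholders.

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",
    data=json.dumps({
        "model": "embedding-model",
        "input": ["What is LlamaEdge?"],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
# OpenAI-compatible shape: `data` holds one embedding object per input string
print(len(body["data"][0]["embedding"]))
```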
LlamaEdge-RAG 0.5.0
Major change:
- Update deps: `llama-core v0.9.0`, `endpoints v0.8.0`, and `chat-prompts v0.7.1`
LlamaEdge-RAG 0.4.0
Major changes:
- New `/v1/retrieve` endpoint (sketch below)
- New `--rag-policy` CLI option
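A hypothetical call to the new endpoint. The release notes don't spell out the request schema, so the chat-style payload below is an assumption to be checked against the server's README; host and port are placeholders.

```python
import json
import urllib.request

# Assumed request shape (not confirmed by the release notes): a chat-style
# messages payload whose last user message is the retrieval query.
req = urllib.request.Request(
    "http://localhost:8080/v1/retrieve",
    data=json.dumps({
        "messages": [{"role": "user", "content": "What is LlamaEdge?"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # retrieved context returned by the endpoint
```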
LlamaEdge-RAG 0.3.8
Major change:
- Update the `chat-prompts` dep to `0.6.2`
LlamaEdge-RAG 0.3.7
Major change:
- Post-process the generation of the `phi-3-chat` model in non-stream mode.
LlamaEdge-RAG 0.3.6
Major changes:
- Post-process the generation of the `llama-2-chat` and `llama-3-chat` models in non-stream mode.