Motivation
N-gram-based speculative decoding is very effective in retrieval-augmented generation (RAG). Generating draft tokens costs relatively little compared with EAGLE, so the approach has great potential for accelerating token generation in RAG. Ant Group has proposed a trie-based retrieval and verification mechanism, and I would like to bring it to SGLang.
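To make the idea concrete, here is a minimal sketch of trie-based draft retrieval. It is only an illustration under my own assumptions, not the Lookahead implementation and not an SGLang API: the names `NGramTrie`, `insert`, `propose`, `max_depth`, and `max_draft` are hypothetical, and the verification step (the target model checking the draft in one forward pass) is not shown.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # token id -> TrieNode
        self.count = 0      # how many times this path has been inserted


class NGramTrie:
    """Hypothetical token trie; not the Lookahead or SGLang implementation."""

    def __init__(self, max_depth=8):
        self.root = TrieNode()
        self.max_depth = max_depth

    def insert(self, tokens):
        # Index a window of up to max_depth tokens starting at every position.
        for start in range(len(tokens)):
            node = self.root
            for tok in tokens[start:start + self.max_depth]:
                node = node.children.setdefault(tok, TrieNode())
                node.count += 1

    def propose(self, context, max_draft=4):
        # Try the longest suffix of the context first, then shorter ones.
        for k in range(min(len(context), self.max_depth - 1), 0, -1):
            node = self.root
            for tok in context[-k:]:
                node = node.children.get(tok)
                if node is None:
                    break
            else:
                # Suffix matched: greedily follow the most frequent branch.
                draft = []
                while node.children and len(draft) < max_draft:
                    tok, node = max(node.children.items(),
                                    key=lambda kv: kv[1].count)
                    draft.append(tok)
                if draft:
                    return draft
        return []


# Usage: index retrieved-document tokens, then draft from the recent output.
trie = NGramTrie(max_depth=6)
trie.insert([1, 2, 3, 4, 5, 6])
print(trie.propose([9, 2, 3], max_draft=3))  # [4, 5, 6]
```

In RAG the output often copies spans from the retrieved documents, so a lookup like this can produce drafts without running a separate draft model. One trade-off in this sketch is that a longer matched suffix leaves fewer tokens under `max_depth` for the draft itself.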
Related resources
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy