[willing to PR] Add Lookahead speculative decoding #2772

jjjjohnson · 2025-01-07T08:38:49Z

Checklist

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
2. Please use English, otherwise it will be closed.

Motivation

N-gram-based speculative decoding is very effective for retrieval-augmented generation (RAG). Generating draft tokens costs relatively little compared with EAGLE, so the approach has great potential for accelerating token generation in RAG. Ant Group has proposed a trie-based retrieval and verification mechanism. I would like to bring it to SGLang.
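To make the idea concrete, here is a rough sketch of trie-based draft retrieval plus greedy verification. It is only an illustration of the general technique, not Ant Group's Lookahead implementation and not an SGLang API. All names (`NGramTrie`, `build_trie`, `propose_draft`, `verify`) and the parameters `max_len`, `prefix_len`, and `max_draft` are hypothetical, and `target_tokens` stands in for whatever the target model would pick at each draft position.

```python
class NGramTrie:
    """Toy token trie (hypothetical): each path is an n-gram seen in the text."""

    def __init__(self):
        self.children = {}  # token id -> NGramTrie
        self.count = 0      # how many times this node was reached on insert

    def insert(self, tokens):
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, NGramTrie())
            node.count += 1


def build_trie(tokens, max_len=5):
    """Index every n-gram of up to max_len tokens (e.g. from the RAG prompt)."""
    root = NGramTrie()
    for i in range(len(tokens)):
        root.insert(tokens[i:i + max_len])
    return root


def propose_draft(root, context, prefix_len=2, max_draft=3):
    """Match the last prefix_len context tokens, then follow the most
    frequent continuation for up to max_draft draft tokens."""
    node = root
    for tok in context[-prefix_len:]:
        node = node.children.get(tok)
        if node is None:
            return []  # prefix never seen, so there is nothing to draft
    draft = []
    while node.children and len(draft) < max_draft:
        tok, node = max(node.children.items(), key=lambda kv: kv[1].count)
        draft.append(tok)
    return draft


def verify(draft, target_tokens):
    """Keep the longest draft prefix that agrees with the target model's
    own choices (assumed here to be greedy picks at each position)."""
    accepted = []
    for d, t in zip(draft, target_tokens):
        if d != t:
            break
        accepted.append(d)
    return accepted


prompt = [1, 2, 3, 4, 5, 1, 2, 3, 4, 6]
trie = build_trie(prompt)
draft = propose_draft(trie, [7, 1, 2], prefix_len=2, max_draft=2)
accepted = verify(draft, [3, 9])  # only the first draft token matches
```

Drafting here is a dictionary walk with no extra model forward pass, which is why it is cheap relative to a draft-model approach. Because only tokens matching the target model's choices are kept, the output is unchanged under greedy decoding in this sketch.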

Related resources

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy

jjjjohnson · 2025-01-08T09:42:19Z

@zhyncs #2790

zhyncs assigned jjjjohnson Jan 7, 2025

zhyncs added the good first issue and enhancement labels Jan 7, 2025
