Feature Request: Larger Context Length #144
Comments
Yes, the 4k limit is small. I was also trying to use longer contexts and inference failed, even though the initial memory allocation succeeded.
Hi, the max context supported is 4096. What is your application scenario, and how large a context length do you need?
@waydong - Thank you for confirming that is still the case! I want to incorporate RAG and web search tool calling, which requires a larger context window for things like summarizing legal documents and sorting through a large number of search results. Some models can go up to 132K, but anything over 8K would be greatly appreciated!

I currently have a basic chat implementation of RKLLM, using Gradio and based on the example you provide: https://github.com/c0zaut/RKLLM-Gradio

Another useful case would be inputting larger web search results:
https://github.com/InternLM/MindSearch
https://github.com/infinigence/InfiniWebSearch

Megrez 3B model here: https://huggingface.co/Infinigence/Megrez-3B-Instruct
^ converts to RKLLM and works like a charm, as long as you go through all of the config files and set the eos token to 120005 instead of 120025, or make sure you use

Thank you!
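For reference, a minimal sketch of the kind of config patch the comment above describes, assuming the Megrez checkout follows the usual Hugging Face layout (config.json / generation_config.json with an eos_token_id key); the local directory path and file names are assumptions and should be checked against the actual repo before converting:

```python
# Hypothetical sketch: patch the EOS token id in the Hugging Face config files
# before running the RKLLM conversion. File names and the eos_token_id key
# follow standard HF conventions; verify them in the Megrez-3B-Instruct checkout.
import json
from pathlib import Path

MODEL_DIR = Path("Megrez-3B-Instruct")  # local clone of the HF repo (assumed path)
NEW_EOS_ID = 120005                     # value suggested in the comment above

for name in ("config.json", "generation_config.json"):
    path = MODEL_DIR / name
    if not path.exists():
        continue
    cfg = json.loads(path.read_text())
    if "eos_token_id" in cfg:
        cfg["eos_token_id"] = NEW_EOS_ID
        path.write_text(json.dumps(cfg, indent=2))
        print(f"patched eos_token_id in {name}")
```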
As of now we are talking about 128k max - this is the max for the known supported models (Phi-3, Qwen2.5, etc.).
@waydong - it would also be nice to have a long context for chat history.
Per issue: #93
I would like to open a feature request to support a context length greater than 4k. While I can initialize with a larger context, submit a prompt, and get output, it still throws a matmul error, and the output is inaccurate at best:
I do know that with llama.cpp you can configure RoPE scaling for larger context windows, and there are references to its debug output inside librkllmrt.so. Is there any parameter we can set to expand the context window? If not, would it be possible to add support for that?
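To make the request concrete, here is a small sketch of the RoPE scaling behaviour referred to above, using the llama-cpp-python bindings. This is not an RKLLM API; it only illustrates the kind of knob being requested, and the model path is a placeholder:

```python
# Sketch of llama.cpp's RoPE scaling via the llama-cpp-python bindings.
# NOT an RKLLM parameter; shown only to illustrate the requested feature.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # placeholder path
    n_ctx=16384,               # request a 16k context window
    rope_freq_scale=0.25,      # linear RoPE scaling: 4x a native 4k context (4096 / 16384)
)

out = llm("Summarize the following document: ...", max_tokens=256)
print(out["choices"][0]["text"])
```

An equivalent `max_context_len`-style option exposed at rkllm_init time (or in the conversion toolkit) is essentially what this issue is asking for.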