
Default values for inference from generation_config.json are not being applied #2672

Open
eduardzl opened this issue Jan 22, 2025 · 2 comments · May be fixed by #2685
Labels
enhancement New feature or request

Comments

eduardzl commented Jan 22, 2025

Description

We are using the DJL container 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 with vLLM as the inference engine to serve Llama 3.1 - Llama 3.3 models. The model files include a "generation_config.json" file, which can specify default values for the sampling parameters temperature, top_p, and top_k.
The default inference values specified in generation_config.json are not being applied to inference requests. Can support for this be implemented?

We would like to populate the generation_config.json file with the values that perform best for the model, and have requests fall back to those values when they do not set the corresponding parameters (see the sketch below). It seems that DJL currently ignores this file and instead uses hard-coded defaults such as:

self.max_new_seqlen = kwargs.get('max_new_tokens', 30)
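For illustration, here is a minimal sketch of the behavior we have in mind, in plain Python; the helper names (load_generation_defaults, resolve_sampling_params) and the model path are hypothetical and not part of DJL:

```python
import json
import os

# Hypothetical helper (sketch only, not DJL code): read default sampling
# parameters from the model's generation_config.json, if present.
def load_generation_defaults(model_dir):
    path = os.path.join(model_dir, "generation_config.json")
    if not os.path.isfile(path):
        return {}
    with open(path) as f:
        config = json.load(f)
    # Keep only the sampling-related keys mentioned above.
    return {k: config[k] for k in ("temperature", "top_p", "top_k") if k in config}

# Hypothetical helper: values set explicitly in the request override the
# defaults taken from generation_config.json.
def resolve_sampling_params(request_params, defaults):
    merged = dict(defaults)
    merged.update(request_params)
    return merged

# Example: if generation_config.json contains {"temperature": 0.6, "top_p": 0.9},
# a request that only sets max_new_tokens would inherit those two defaults.
defaults = load_generation_defaults("/opt/ml/model")
params = resolve_sampling_params({"max_new_tokens": 256}, defaults)
```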

Thank you.

eduardzl added the enhancement label on Jan 22, 2025
siddvenk (Contributor) commented

Thanks for reporting this issue. It looks like we will need to pass the generation_config.json file to the engine args in vLLM (https://docs.vllm.ai/en/latest/serving/engine_args.html). I will take a look at this and get back to you with a fix; I expect it to be available in the 0.32.0 release, scheduled for the first week of February.
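For reference, a minimal sketch of what this could look like on the vLLM side, assuming a vLLM version (0.6.6 or newer) that exposes a generation_config engine argument as described in the engine-args docs linked above; this is not the DJL LMI integration itself, and the model path is only an example:

```python
from vllm import LLM

# Sketch only: assumes a vLLM version (>= 0.6.6) whose engine args include
# `generation_config`. Pointing it at the model folder asks vLLM to read
# generation_config.json and use its values (temperature, top_p, top_k, ...)
# as the default sampling parameters.
llm = LLM(
    model="/opt/ml/model",              # local path containing the weights
    generation_config="/opt/ml/model",  # folder that holds generation_config.json
)

# With no explicit SamplingParams, generation should fall back to the defaults
# loaded from generation_config.json rather than vLLM's built-in defaults.
outputs = llm.generate(["Hello, my name is"])
```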

siddvenk (Contributor) commented

vLLM added support for this functionality in 0.6.6 (vllm-project/vllm@5aef498).

Our most recent container release leverages vLLM 0.6.3.post1, which is why this behavior is not observed.

I have raised #2685 to address this for the next container release. It's possible that we also update vLLM by then, but in case we do not, that change should resolve the issue.
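For anyone who wants to verify the behavior once the fix lands, here is a hypothetical client-side check; the endpoint URL and the "inputs"/"parameters" payload shape are assumptions about a typical DJL Serving deployment, so adjust them to your setup:

```python
import json
import urllib.request

# Hypothetical request against a DJL LMI endpoint. Once generation_config.json
# defaults are honored, a request that omits temperature/top_p/top_k should
# inherit the values from that file, while anything set here explicitly
# should still take precedence.
payload = {
    "inputs": "What is Deep Java Library?",
    "parameters": {"max_new_tokens": 128},  # no sampling params on purpose
}
req = urllib.request.Request(
    "http://localhost:8080/invocations",          # assumed local endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```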
