
DJL running without speculative decoding #2678

Open
eduardzl opened this issue Jan 24, 2025 · 0 comments
eduardzl commented Jan 24, 2025

Hello.
I am using the 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 container to run inference for Llama-3.3-70B-Instruct. The container is launched with Docker.
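For reference, the launch command looks roughly like this (the mount path, port, and shm size are illustrative, not our exact values; the /opt/ml/model mount point follows the LMI container convention):

docker run -it --gpus all --shm-size 12g \
  -v /path/to/model-repo:/opt/ml/model:ro \
  -p 8080:8080 \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124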
I have created a repository directory with two models: the 70B target model and an 8B draft model (model IDs: mymodel and mymodeldraft).
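The directory is laid out along these lines (a sketch; weight and tokenizer files omitted):

model-repo/
    mymodel/           <- Llama-3.3-70B-Instruct weights + serving.properties
    mymodeldraft/      <- 8B draft model weights + serving.properties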
Here are the serving.properties files for both models:

For the 70B model (mymodel):

engine=Python
option.mpi_mode=True
option.tensor_parallel_degree=8
option.trust_remote_code=true
option.rolling_batch=lmi-dist
option.max_input_len=32768
option.max_output_len=32768
option.max_model_len=32768
option.gpu_memory_utilization=0.5
option.max_rolling_batch_size=32
option.enable_prefix_caching=true
option.enable_streaming=false
option.speculative_draft_model=mymodeldraft
option.draft_model_tp_size=8
option.speculative_length=5

For the 8B model (mymodeldraft):

engine=Python
option.mpi_mode=True
option.tensor_parallel_degree=8
option.trust_remote_code=true
option.rolling_batch=lmi-dist
option.max_input_len=32768
option.max_output_len=32768
option.max_model_len=32768
option.gpu_memory_utilization=0.4
option.max_rolling_batch_size=32
option.enable_prefix_caching=true
option.enable_streaming=false

When the container launches, DJL starts and both models load successfully. However, the log shows this message:

INFO PyProcess W-749-mymodel-stdout: [1,0]<stdout>:WARNING 01-24 08:07:00 arg_utils.py:66] Speculative decoding feature is only available on SageMaker. Running without speculative decoding...
When running inference, speculative decoding does not appear to be active; the draft model is never called.
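An inference request along these lines reproduces the behavior (the /invocations path follows the standard DJL Serving API; the prompt and parameters are illustrative):

curl -X POST http://localhost:8080/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 64}}'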

We are running the DJL container on SageMaker endpoints. Can you please explain how we can make this feature work?
Thank you.
