Mismatches between config.json exported by export_paged_llm_v1.py and expected by shortfin #405

renxida · 2024-11-01T02:34:53Z

The following edits to the exported config.json were required to make llama3 8b fp16 work on shortfin:

config["attn_head_count"] = 8  # KV head count: 8 instead of the exported 32
# Move the KV cache parameters under config["paged_kv_cache"]
config["paged_kv_cache"] = {}
config["paged_kv_cache"]["block_seq_stride"] = config.pop("block_seq_stride")
config["paged_kv_cache"]["device_block_count"] = 256

There are 2 main problems:

1. attn_head_count should be set to attention_head_count_kv from export_paged_llm_v1, not attention_head_count. This should be fixed in sharktank, at least by exporting both attention head counts.
2. The KV cache parameters should be nested under config["paged_kv_cache"].
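
For anyone hitting the same mismatch, the two fixes can be wrapped in a small helper. This is only a sketch of the manual workaround described above, not code from sharktank or shortfin: the function name `patch_exported_config` is hypothetical, and the `block_seq_stride` value of 16 in the example is made up for illustration.

```python
import copy


def patch_exported_config(config, kv_head_count, device_block_count=256):
    """Return a copy of an exported config with the two edits from this issue applied.

    Hypothetical helper: it mirrors the manual workaround only and does not
    reflect any official sharktank or shortfin API.
    """
    patched = copy.deepcopy(config)

    # Problem 1: use the KV head count rather than the query head count.
    patched["attn_head_count"] = kv_head_count

    # Problem 2: nest the KV cache parameters under "paged_kv_cache".
    kv_cache = patched.setdefault("paged_kv_cache", {})
    if "block_seq_stride" in patched:
        kv_cache["block_seq_stride"] = patched.pop("block_seq_stride")
    kv_cache.setdefault("device_block_count", device_block_count)

    return patched


# Illustrative input; the stride value is an assumption, not taken from a real export.
exported = {"attn_head_count": 32, "block_seq_stride": 16}
patched = patch_exported_config(exported, kv_head_count=8)
```

The helper copies the input so the exported dict stays untouched, which makes it easy to compare the before and after layouts.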

Really need integration tests between sharktank and shortfin.
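
As a starting point, such a test could check the exported layout for the problems listed in this issue. The sketch below is an assumption about what that check might look like: `find_config_mismatches` is a hypothetical name, and it covers only the keys mentioned here, not the full schema that shortfin expects.

```python
def find_config_mismatches(config):
    """List layout problems in an exported config, limited to the keys named in this issue."""
    problems = []

    kv_cache = config.get("paged_kv_cache")
    if not isinstance(kv_cache, dict):
        problems.append('missing "paged_kv_cache" section')
    else:
        for key in ("block_seq_stride", "device_block_count"):
            if key not in kv_cache:
                problems.append(f'missing paged_kv_cache["{key}"]')

    # A top-level stride means the exporter still uses the flat layout.
    if "block_seq_stride" in config:
        problems.append('"block_seq_stride" should be nested under "paged_kv_cache"')

    return problems
```

An integration test could run the exporter, load the resulting config.json, and assert that this function returns an empty list before handing the file to the server.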


renxida · 2024-11-01T02:35:23Z

This was triaged in #401

renxida · 2025-02-05T20:18:12Z

Fixed by #487

renxida mentioned this issue Nov 1, 2024

llama3 8b f16 kvcache dimension mismatch on shortfin #401

Closed

renxida mentioned this issue Nov 1, 2024

Fix cache config inconsistency between shortfin and sharktank #406

Closed

renxida closed this as completed Feb 5, 2025

renxida self-assigned this Feb 5, 2025

renxida added the enhancement label Feb 5, 2025
