
Fix cache config inconsistency between shortfin and sharktank #406

Closed · wants to merge 2 commits from the cache-config-consistency branch

Conversation

renxida (Contributor) commented on Nov 1, 2024

Fixes #405 and #401

Until now we have been patching the config to reorganize it after exporting from sharktank, but for Llama 3 8B (which uses grouped-query attention) the exported config doesn't specify the grouping, so this has to be fixed in the export itself.
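
To make the mismatch concrete, here is a minimal sketch of the shape the exported config needs so shortfin can size the paged KV cache. The key layout follows the diff further down; the numeric values are the published Llama 3 8B hyperparameters and are purely illustrative, not taken from this PR:

```python
# Illustrative shape only: key names follow the diff reviewed below; the
# numbers are the published Llama 3 8B values, used here as placeholders.
config = {
    "block_seq_stride": 16,
    "paged_kv_cache": {
        "block_seq_stride": 16,
        "attn_head_count_kv": 8,  # the grouping info the old export omitted
    },
}
```

With grouped-query attention the KV-head count differs from the query-head count, so without attn_head_count_kv in the config shortfin cannot size the cache correctly.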

Might be a good idea to rename all attn to attention for consistency. But I want this PR to just fix the issue at hand.

After this fix, the only edit still required after export is setting config.paged_kv_cache.device_block_count (see the sketch below).
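
For illustration, that remaining post-export step could be scripted like this; the file name and block count are placeholders, and only the key path comes from the sentence above:

```python
# Minimal sketch of the one remaining manual edit after export.
# "config.json" and 256 are illustrative placeholders.
import json

with open("config.json") as f:
    config = json.load(f)

config["paged_kv_cache"]["device_block_count"] = 256  # tune for your device

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```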

renxida requested review from rsuderman and stbaione on November 1, 2024 02:56
renxida force-pushed the cache-config-consistency branch from 754ae09 to 93c0d19 on November 1, 2024 03:34
renxida force-pushed the cache-config-consistency branch from 93c0d19 to cd309ac on November 1, 2024 03:47
renxida (Contributor, Author) commented on Nov 1, 2024

[Screenshot: successful run of the test script linked below.]

Proof it works.

Tested with this script: https://gist.github.com/renxida/2023ec7f9931c697879762c9330de001

renxida marked this pull request as ready for review on November 1, 2024 14:35
"block_seq_stride": llama_config.block_seq_stride,
"paged_kv_cache": {
"block_seq_stride": llama_config.block_seq_stride,
"attn_head_count_kv": hp.attention_head_count_kv,
Collaborator:

Just FYI, in sharktank's llm_configs.py, hp.attention_head_count_kv==hp.attention_head_count.
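
To spell out why that equality would matter here: with grouped-query attention the two counts intentionally differ, and the paged KV cache must be sized from the KV count. A minimal sketch using the published Llama 3 8B numbers (illustrative, not taken from this PR):

```python
# Published Llama 3 8B attention hyperparameters (illustrative values).
attention_head_count = 32     # query heads
attention_head_count_kv = 8   # KV heads, each shared by a group of query heads

# The paged KV cache stores only KV heads, so sizing it from
# attention_head_count would over-allocate by the group size.
group_size = attention_head_count // attention_head_count_kv
assert group_size == 4
```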

Collaborator:

If shortfin already requires the vmfb and irpa, is there a reason we rely on the generate_params_json params instead of the irpa params? There are ways to load and access irpa params directly in sharktank. Do we want/have something similar in shortfin?
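
For anyone following along, a hypothetical sketch of the direct-loading approach the comment alludes to. Dataset.load, the .properties mapping, and the key name are assumptions about sharktank's API and the irpa contents, not something this thread confirms:

```python
# Hypothetical sketch: read hyperparameters straight from the .irpa file
# instead of a separately exported config.json. Dataset.load, .properties,
# and the key name below are assumed, not confirmed by this PR.
from sharktank.types import Dataset

dataset = Dataset.load("/path/to/model.irpa")  # placeholder path
hparams = dataset.properties                   # assumed hyperparameter dict
print(hparams.get("attention_head_count_kv"))
```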

renxida (Contributor, Author) replied:

Oh, thank you! I should probably take a deeper look, then, to figure this part out.

renxida marked this pull request as a draft on November 1, 2024 17:07
renxida (Contributor, Author) commented on Dec 12, 2024

Replaced by #487

renxida closed this on Dec 12, 2024
Development

Successfully merging this pull request may close these issues:

Mismatches between config.json exported by export_paged_llm_v1.py and expected by shortfin