paged-attn:cuda fallback bf16 for compute_cap < 8.0 #1040

Draft: wants to merge 1 commit into master

@haricot (Contributor) commented Jan 8, 2025

Tested together with PR EricLBuehler/candle#57; it works on a GPU with compute capability 6.1:

```shell
nvidia-smi   --query-gpu="compute_cap"  --format=csv
compute_cap
6.1
```

```shell
cargo run -F "cuda cudnn" -r -- --throughput -i plain -m meta-llama/Llama-3.2-1B-Instruct  --dtype bf16
cargo run -F "cuda cudnn" -r -- --throughput -i plain -m Qwen/Qwen2.5-Coder-3B-Instruct --dtype bf16
```
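The title's rule (fall back for bf16 when `compute_cap < 8.0`) can be illustrated by reading the `compute_cap` value that `nvidia-smi` prints and comparing it against 8.0. The sketch below is hypothetical. It does not reproduce this PR's code, and the names `parse_compute_cap`, `Bf16Path`, and `bf16_path` are made up for the example. It only shows the threshold comparison.

```rust
/// Hypothetical helper (not from this PR): parse a compute capability string
/// such as the "6.1" printed by `nvidia-smi --query-gpu=compute_cap --format=csv`.
fn parse_compute_cap(s: &str) -> Option<(u32, u32)> {
    let (major, minor) = s.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Hypothetical classification of which bf16 path would apply.
#[derive(Debug, PartialEq)]
enum Bf16Path {
    /// Compute capability 8.0 or newer.
    Native,
    /// Compute capability below 8.0 (e.g. 6.1), where a fallback is needed.
    Fallback,
}

/// Tuples compare lexicographically, so (6, 1) < (8, 0) and (8, 6) >= (8, 0).
fn bf16_path(major: u32, minor: u32) -> Bf16Path {
    if (major, minor) >= (8, 0) {
        Bf16Path::Native
    } else {
        Bf16Path::Fallback
    }
}

fn main() {
    // The CSV output has a header line ("compute_cap") followed by the value.
    let csv = "compute_cap\n6.1\n";
    let value = csv.lines().nth(1).expect("missing value line");
    let (major, minor) = parse_compute_cap(value).expect("malformed compute_cap");
    println!("{:?}", bf16_path(major, minor)); // prints "Fallback"
}
```

Comparing `(major, minor)` tuples avoids parsing the capability as a float, which would be fragile for values such as "8.10".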


github-actions bot commented Jan 8, 2025

Code Metrics Report
```
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 64         2729         2359           71          299
 Shell                   1           57           22           18           17
 Plain Text              3         3723            0         2413         1310
 TOML                   18          609          542            2           65
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               44         3439            0         2611          828
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                13          440          373            0           67
 |- TOML                 2           75           63            0           12
 (Total)                           4190          657         2611          922
-------------------------------------------------------------------------------
 Rust                  298        91486        82139         1880         7467
 |- Markdown           144         1600           25         1454          121
 (Total)                          93086        82164         3334         7588
===============================================================================
 Total                 449       102245        85235         7007        10003
===============================================================================
```

@haricot marked this pull request as draft on January 18, 2025, 09:30.