add flux example #1126
Open
XuZhang99 wants to merge 12 commits into main from flux_example
Changes from 9 commits (12 commits total):

c85f228  add flux example
084f13e  change sd3 example
e52a7da  add run benchmark script
3e9336b  change func name
11a183b  add quantization for oneflow
6da355b  remove useless code
bb3ba7a  support 4090
569a6c5  add comments
898ea57  add quantization support for flux on 4090
d2932fc  fix name issue
b13521c  add flux readme
9319cca  add todo
New file: run_benchmark.sh
@@ -0,0 +1,166 @@
#!/bin/bash
set -e

# indicate which model to run
# e.g. ./run_benchmark.sh sd15,sd21,sdxl or ./run_benchmark.sh all
run_model=$1

# set environment variables
export NEXFORT_GRAPH_CACHE=1
export NEXFORT_FX_FORCE_TRITON_SDPA=1

# model paths
model_dir="/data1/hf_model"
sd15_path="${model_dir}/stable-diffusion-v1-5"
sd21_path="${model_dir}/stable-diffusion-2-1"
sdxl_path="${model_dir}/stable-diffusion-xl-base-1.0"
sd3_path="/data1/home/zhangxu/stable-diffusion-3-medium-diffusers"
flux_dev_path="${model_dir}/FLUX.1-dev/snapshots/0ef5fff789c832c5c7f4e127f94c8b54bbcced44"
flux_schell_path="${model_dir}/FLUX.1-schnell"

# get current date
current_time=$(date +"%Y-%m-%d")
echo "Current time: ${current_time}"

# get NVIDIA GPU name
gpu_name=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader,nounits | head -n 1 | sed 's/NVIDIA //; s/ /_/g')

# table header
BENCHMARK_RESULT_TEXT="| Data update date (yyyy-mm-dd) | GPU | Model | HxW | Compiler | Quantization | Iteration speed (it/s) | E2E Time (s) | Max used CUDA memory (GiB) | Warmup time (s) |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n"

prompt="beautiful scenery nature glass bottle landscape, purple galaxy bottle"
quantize_config='{"quant_type": "fp8_e4m3_e4m3_dynamic_per_tensor"}'

# oneflow has no compiler_config
#sd15_nexfort_compiler_config=""
#sd21_nexfort_compiler_config=""
#sdxl_nexfort_compiler_config=""

sd3_nexfort_compiler_config='{"mode": "max-optimize:max-autotune:low-precision:cache-all", "memory_format": "channels_last"}'
flux_nexfort_compiler_config='{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last"}'

# benchmark a model at one resolution
benchmark_model_with_one_resolution() {
    # model_name is the name of the model
    model_name=$1
    # model_path is the path of the model
    model_path=$2
    # steps is the number of inference steps
    steps=$3
    # compiler is the compiler used, e.g. none, oneflow, nexfort, transform
    compiler=$4
    # compiler_config is the compiler config used
    compiler_config=$5
    # height and width are the resolution of the image
    height=$6
    width=$7
    # quantize is whether to quantize
    quantize=$8

    echo "Running ${model_path} ${height}x${width}..."

    # if model_name contains sd3, use the sd3 script
    if [[ "${model_name}" =~ sd3 ]]; then
        script_path="onediff_diffusers_extensions/examples/sd3/text_to_image_sd3.py"
    # if model_name contains flux, use the flux script
    elif [[ "${model_name}" =~ flux ]]; then
        script_path="onediff_diffusers_extensions/examples/flux/text_to_image_flux.py"
    else
        # otherwise, use the generic sd script
        script_path="benchmarks/text_to_image.py"
    fi

    # if quantize is True, add --quantize and --quantize-config
    if [[ ${quantize} == True ]]; then
        script_output=$(python3 ${script_path} \
            --model ${model_path} --variant fp16 --steps ${steps} \
            --height ${height} --width ${width} --seed 1 \
            --compiler ${compiler} --compiler-config "${compiler_config}" \
            --quantize --quantize-config "${quantize_config}" \
            --prompt "${prompt}" --print-output | tee /dev/tty)
    else
        script_output=$(python3 ${script_path} \
            --model ${model_path} --variant fp16 --steps ${steps} \
            --height ${height} --width ${width} --seed 1 \
            --compiler ${compiler} --compiler-config "${compiler_config}" \
            --prompt "${prompt}" --print-output | tee /dev/tty)
    fi

    # parse inference time, iterations per second, max used CUDA memory, and warmup time
    inference_time=$(echo "${script_output}" | grep -oP '(?<=Inference time: )\d+\.\d+')
    iterations_per_second=$(echo "${script_output}" | grep -oP '(?<=Iterations per second: )\d+\.\d+')
    max_used_cuda_memory=$(echo "${script_output}" | grep -oP '(?<=Max used CUDA memory : )\d+\.\d+')
    warmup_time=$(echo "${script_output}" | grep -oP '(?<=Warmup time: )\d+\.\d+')

    # append the benchmark result to BENCHMARK_RESULT_TEXT
    BENCHMARK_RESULT_TEXT="${BENCHMARK_RESULT_TEXT}| ${current_time} | ${gpu_name} | ${model_name} | ${height}x${width} | ${compiler} | ${quantize} | ${iterations_per_second} | ${inference_time} | ${max_used_cuda_memory} | ${warmup_time} |\n"
}

# conda init
source ~/miniconda3/etc/profile.d/conda.sh

#########################################
# if run_model contains sd15 or all, run sd15
if [[ "${run_model}" =~ sd15|all ]]; then
    conda activate oneflow
    benchmark_model_with_one_resolution sd15 ${sd15_path} 30 none none 512 512 False
    benchmark_model_with_one_resolution sd15 ${sd15_path} 30 oneflow none 512 512 False
    benchmark_model_with_one_resolution sd15 ${sd15_path} 30 oneflow none 512 512 True
fi

# if run_model contains sd21 or all, run sd21
if [[ "${run_model}" =~ sd21|all ]]; then
    # activate oneflow environment
    conda activate oneflow
    benchmark_model_with_one_resolution sd21 ${sd21_path} 20 none none 768 768 False
    benchmark_model_with_one_resolution sd21 ${sd21_path} 20 oneflow none 768 768 False
    benchmark_model_with_one_resolution sd21 ${sd21_path} 20 oneflow none 768 768 True
fi

# if run_model contains sdxl or all, run sdxl
if [[ "${run_model}" =~ sdxl|all ]]; then
    # activate oneflow environment
    conda activate oneflow
    benchmark_model_with_one_resolution sdxl ${sdxl_path} 30 none none 1024 1024 False
    benchmark_model_with_one_resolution sdxl ${sdxl_path} 30 oneflow none 1024 1024 False
    benchmark_model_with_one_resolution sdxl ${sdxl_path} 30 oneflow none 1024 1024 True
fi
#########################################

#########################################
# if run_model contains sd3 or all, run sd3
if [[ "${run_model}" =~ sd3|all ]]; then
    # activate nexfort environment
    conda activate nexfort
    benchmark_model_with_one_resolution sd3 ${sd3_path} 28 none none 1024 1024 False
    benchmark_model_with_one_resolution sd3 ${sd3_path} 28 nexfort "${sd3_nexfort_compiler_config}" 1024 1024 False
    benchmark_model_with_one_resolution sd3 ${sd3_path} 28 nexfort "${sd3_nexfort_compiler_config}" 1024 1024 True
fi

# if run_model contains flux or all, run flux
if [[ "${run_model}" =~ flux|all ]]; then
    # activate nexfort environment
    conda activate nexfort
    benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 none none 1024 1024 False
    benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 nexfort "${flux_nexfort_compiler_config}" 1024 1024 False
    benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 nexfort "${flux_nexfort_compiler_config}" 1024 1024 True
    benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 transform none 1024 1024 False

    benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 none none 1024 1024 False
    benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 nexfort "${flux_nexfort_compiler_config}" 1024 1024 False
    benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 nexfort "${flux_nexfort_compiler_config}" 1024 1024 True
    benchmark_model_with_one_resolution flux_schell ${flux_schell_path} 4 transform none 1024 1024 False
fi

#########################################

echo -e "\nBenchmark Results:"
# print the benchmark results and append them to a markdown file
echo -e "${BENCHMARK_RESULT_TEXT}" | tee -a benchmark_result_"${gpu_name}".md
echo -e "\nBenchmark Done!"
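For reference, a typical invocation might look like the lines below, assuming the oneflow and nexfort conda environments used above already exist; the flux argument and the result-file name pattern come from the script itself.

# run only the flux benchmarks, then inspect the accumulated results table
./run_benchmark.sh flux
cat benchmark_result_*.md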
Review comment:
Avoid hardcoding paths and add validation.
The script uses hardcoded paths, which makes it less portable and could fail silently if the models aren't present.
Consider:
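One possible shape of such a guard, as an illustrative sketch only (the MODEL_DIR override and the check_model_path helper below are hypothetical, not taken from this PR):

# allow the model root to be overridden instead of hardcoding /data1/hf_model
model_dir="${MODEL_DIR:-/data1/hf_model}"

# hypothetical helper: warn and return non-zero if a model path is missing,
# so callers can skip that benchmark instead of aborting mid-run under set -e
check_model_path() {
    local path=$1
    if [[ ! -d "${path}" ]]; then
        echo "Warning: model path ${path} not found, skipping." >&2
        return 1
    fi
}

# example: only benchmark FLUX.1-dev if its path exists
if check_model_path "${flux_dev_path}"; then
    benchmark_model_with_one_resolution flux_dev ${flux_dev_path} 20 none none 1024 1024 False
fi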