llm demo crash for backend opencl #3175

junka · 2025-01-24T07:23:18Z

Hi all,

Here is what I did:
1. Cloned the latest code and cross-compiled MNN to run on ARM Linux.
The build options were:
-DMNN_OPENCL=ON -DMNN_ARM82=ON -DMNN_LOW_MEMORY=ON -DMNN_BUILD_DEMO=ON -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_TOOLS=ON -DMNN_EVALUATION=ON -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON -DMNN_BUILD_LLM=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON

The target board is an ARM device with a Mali GPU.
Running llm_demo llm.mnn works, using the CPU backend by default.

I then changed config.json as follows:

{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "opencl",
    "thread_num": 16,
    "precision": "low",
    "memory": "low"
}

When I then run llm_demo config.json, the program crashes:

model path is config.json
CPU Group: [ 4  5  0  1  2  3 ], 1500000 - 2000000
The device supports: i8sdot:1, fp16:1, i8mm: 0, sve2: 0
Can't open file:./mnn_cachefile.bin
Load Cache file error.
load tokenizer
tokenizer_type = 3
load tokenizer Done
LLVM ERROR: Cannot select: intrinsic %llvm.bifrost.2586
Aborted (core dumped)

The gdb backtrace is as follows:

(gdb) bt
#0  __pthread_kill_implementation (threadid=281474842458496, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at ./nptl/pthread_kill.c:44
#1  0x0000fffff68e0a64 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000fffff689a76c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000fffff68874bc in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000fffff5b8db40 in ?? () from /usr/lib/libmali.so
#5  0x0000fffff5072034 in ?? () from /usr/lib/libmali.so
#6  0x0000fffff50730e4 in ?? () from /usr/lib/libmali.so
#7  0x0000fffff4a39d44 in ?? () from /usr/lib/libmali.so
#8  0x0000fffff50710c0 in ?? () from /usr/lib/libmali.so
#9  0x0000fffff5078a7c in ?? () from /usr/lib/libmali.so
#10 0x0000fffff507b414 in ?? () from /usr/lib/libmali.so
#11 0x0000fffff507d424 in ?? () from /usr/lib/libmali.so
#12 0x0000fffff52e7c04 in ?? () from /usr/lib/libmali.so
#13 0x0000fffff4e86c90 in ?? () from /usr/lib/libmali.so
#14 0x0000fffff4e8b740 in ?? () from /usr/lib/libmali.so
#15 0x0000fffff4e8edb8 in ?? () from /usr/lib/libmali.so
#16 0x0000fffff3368c1c in ?? () from /usr/lib/libmali.so
#17 0x0000fffff3369834 in ?? () from /usr/lib/libmali.so
#18 0x0000fffff3369ccc in ?? () from /usr/lib/libmali.so
#19 0x0000fffff33202a8 in ?? () from /usr/lib/libmali.so
#20 0x0000fffff32f29dc in ?? () from /usr/lib/libmali.so
#21 0x0000fffff3288e3c in ?? () from /usr/lib/libmali.so
#22 0x0000fffff32a1934 in ?? () from /usr/lib/libmali.so
#23 0x0000fffff32a19d8 in ?? () from /usr/lib/libmali.so
#24 0x0000fffff32a1ce8 in ?? () from /usr/lib/libmali.so
#25 0x0000fffff327e0f0 in clBuildProgram () from /usr/lib/libmali.so
#26 0x0000fffff6826730 in clBuildProgram () from /usr/lib/libOpenCL.so
#27 0x0000fffff7aaea50 in clBuildProgram (program=0x2584a60, num_devices=1, device_list=0x2584a30,
    options=0x257edf0 "-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
    pfn_notify=0x0, user_data=0x0)
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLWrapper.cpp:443
#28 0x0000fffff7a7d6c8 in cl::Program::build (this=0xffffffffcda8,
    devices=std::vector of length 1, capacity 1 = {...},
    options=0x257edf0 "-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
    notifyFptr=0x0, data=0x0) at /home/disk1/junjie/armgpu/MNN/3rd_party/OpenCLHeaders/CL/cl2.hpp:6376
#29 0x0000fffff7a826c0 in MNN::OpenCLRuntime::buildProgram (this=0x4c2780,
    buildOptionsStr="-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
    program=0xffffffffcda8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLRuntime.cpp:547
#30 0x0000fffff7a836d0 in MNN::OpenCLRuntime::buildKernelWithCache (this=0x4c2780, programName="buffer_convert_quant",
    kernelName="conv2d_1x1_weight_quant_image", buildOptions=std::set with 1 element = {...}, input=0x0, output=0x0,
    useCache=true) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLRuntime.cpp:714
#31 0x0000fffff7aca834 in MNN::OpenCL::ConvBufLowMemoryExecution::convertToQuantWeight1x1Buffer (this=0x2436e60,
    input=..., packCin=4, packCout=8)
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:153
#32 0x0000fffff7acb278 in MNN::OpenCL::ConvBufLowMemoryExecution::set1x1WeightLowMemory (this=0x2436e60, packCout=8,
    packCin=4, filterDataPtr=0x248aa80,
    quanCommon=std::shared_ptr<MNN::ConvolutionCommon::Int8Common> (use count 1, weak count 0) = {...})
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:235
#33 0x0000fffff7ad1490 in MNN::OpenCL::ConvBufLowMemoryExecution::ConvBufLowMemoryExecution (this=0x2436e60,
    inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...},
    op=0x2477ae8, backend=0x2466900)
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:767
#34 0x0000fffff7ac523c in MNN::OpenCL::ConvolutionBufCreator::onCreate (this=0x4422a0,
    inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...},
    op=0x2477ae8, backend=0x2466900) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufExecution.cpp:799
#35 0x0000fffff7a56554 in MNN::OpenCL::OpenCLBackend::onCreate (this=0x2466900, inputs=std::vector of length 1, capacity 1 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, op=0x2477ae8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/OpenCLBackend.cpp:563
#36 0x0000fffff7146a54 in MNN::OpCommonUtils::createExecutionWithExternal (backend=0x2466900, inputs=std::vector of length 1, capacity 1 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, op=0x22bfe70, externalFile=0xffffffffd758,
    tmpstore=std::shared_ptr<MNN::BufferStorage> (empty) = {...}) at /home/disk1/junjie/armgpu/MNN/source/core/OpCommonUtils.cpp:722
#37 0x0000fffff77f0fa0 in MNN::Express::preRearrangeWeights (scheduleInfo=..., firstbackend=0x2466900, backupBackend=0x185abb0, base=0x0)
    at /home/disk1/junjie/armgpu/MNN/express/module/StaticModule.cpp:121
#38 0x0000fffff77f2660 in MNN::Express::StaticModule::StaticModule (this=0x24667c0, inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, buffer=..., scheduleInfo=...,
    sharedConst=std::shared_ptr<MNN::Schedule::ScheduleInfo> (use count 4, weak count 0) = {...}, mode=...,
    rtm=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=...)
    at /home/disk1/junjie/armgpu/MNN/express/module/StaticModule.cpp:317
#39 0x0000fffff77e275c in MNN::Express::_createSubModule (bufferStorage=std::shared_ptr<MNN::BufferStorage> (use count 4, weak count 0) = {...}, info=...,
    subs=std::map with 0 elements, sharedConst=std::shared_ptr<MNN::Schedule::ScheduleInfo> (use count 4, weak count 0) = {...}, config=...,
    runtimeConfig=...) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:672
#40 0x0000fffff77e3708 in MNN::Express::PipelineModule::load (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, bufferStorage=std::shared_ptr<MNN::BufferStorage> (use count 4, weak count 0) = {...},
    rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40,
    subGraphMap=std::map with 0 elements) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:796
#41 0x0000fffff77e2c2c in MNN::Express::PipelineModule::load (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, buffer=0xffffee6e0040 " ", length=733288,
    rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40)
    at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:711
#42 0x0000fffff77d13b8 in MNN::Express::loadInternal (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, buffer=0xffffee6e0040 " ", length=733288,
    _rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40)
    at /home/disk1/junjie/armgpu/MNN/express/module/Module.cpp:407
#43 0x0000fffff77d0e7c in MNN::Express::Module::load (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, fileName=0xffffffffec30 "./llm.mnn",
    _rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40)
    at /home/disk1/junjie/armgpu/MNN/express/module/Module.cpp:351
#44 0x0000fffff7e70d80 in MNN::Transformer::Llm::load (this=0x44f590) at /home/disk1/junjie/armgpu/MNN/transformers/llm/engine/src/llm.cpp:319
#45 0x00000000004099d8 in main (argc=2, argv=0xfffffffff228) at /home/disk1/junjie/armgpu/MNN/transformers/llm/engine/llm_demo.cpp:194

It looks like the OpenCL kernel code fails to compile.
Is it feasible to run with the GPU backend?
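For a driver bug report it can help to state exactly which kernel was being built when libmali aborted. The sketch below pulls that out of the backtrace text with regular expressions. The excerpt is copied from frames #29 and #30 above, including the truncation gdb applied to the option string; the helper name summarize_kernel_build is hypothetical and not part of MNN.

```python
import re

# Excerpt of frames #29 and #30 from the backtrace in this issue.
# gdb truncated the option string, so the last define is incomplete.
BACKTRACE = '''\
#29 0x0000fffff7a826c0 in MNN::OpenCLRuntime::buildProgram (this=0x4c2780,
    buildOptionsStr="-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
#30 0x0000fffff7a836d0 in MNN::OpenCLRuntime::buildKernelWithCache (this=0x4c2780, programName="buffer_convert_quant",
    kernelName="conv2d_1x1_weight_quant_image", buildOptions=std::set with 1 element = {...}, input=0x0, output=0x0,
'''


def summarize_kernel_build(backtrace: str) -> dict:
    """Return the program name, kernel name and FLOAT define found in a gdb backtrace."""
    program = re.search(r'programName="([^"]+)"', backtrace)
    kernel = re.search(r'kernelName="([^"]+)"', backtrace)
    # Collect -DNAME=value pairs; only FLOAT is reported because later
    # defines may be cut off by gdb's string truncation.
    defines = dict(re.findall(r"-D(\w+)=(\w+)", backtrace))
    return {
        "program": program.group(1) if program else None,
        "kernel": kernel.group(1) if kernel else None,
        "float_type": defines.get("FLOAT"),
    }


if __name__ == "__main__":
    print(summarize_kernel_build(BACKTRACE))
```

Here the result names the buffer_convert_quant program, the conv2d_1x1_weight_quant_image kernel, and a FLOAT type of half, which is the combination the driver failed on.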


jxt1234 · 2025-01-25T13:36:27Z

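This looks like a GPU driver problem on this device. Which Mali model is it? You could report the bug to the device vendor.
You could also try setting precision to high in config.json; this driver may have a problem compiling fp16 kernels.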
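A minimal sketch of that precision change, assuming the config.json posted earlier in this issue. The output file name config_high.json and the helper write_high_precision_config are hypothetical; editing the file by hand works equally well.

```python
import json
from pathlib import Path


def write_high_precision_config(src: str, dst: str) -> dict:
    """Copy an llm_demo config file, switching "precision" to "high"."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    # Only the precision key changes; backend_type and the other keys are kept as-is.
    config["precision"] = "high"
    Path(dst).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Assumes config.json is in the current directory, next to llm.mnn.
    updated = write_high_precision_config("config.json", "config_high.json")
    print(updated["precision"])
```

Writing the copy into the same directory as the original keeps the relative llm.mnn and llm.mnn.weight entries pointing at the same files. The demo can then be run as llm_demo config_high.json to see whether the build still aborts.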

jxt1234 added the OpenCL label Jan 25, 2025
