llm demo crash for backend opencl #3175

Open
junka opened this issue Jan 24, 2025 · 1 comment

junka commented Jan 24, 2025

Hi all,

Here is what I did:
1. Cloned the latest code and cross-compiled MNN to run on ARM Linux.
The build options are as follows:
-DMNN_OPENCL=ON -DMNN_ARM82=ON -DMNN_LOW_MEMORY=ON -DMNN_BUILD_DEMO=ON -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_TOOLS=ON -DMNN_EVALUATION=ON -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON -DMNN_BUILD_LLM=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON

The target board is an ARM device with a Mali GPU.
Running llm_demo llm.mnn works, using the CPU backend by default.

I then changed config.json as follows:

{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "opencl",
    "thread_num": 16,
    "precision": "low",
    "memory": "low"
}

Running llm_demo config.json with this configuration crashes the program:

model path is config.json
CPU Group: [ 4  5  0  1  2  3 ], 1500000 - 2000000
The device supports: i8sdot:1, fp16:1, i8mm: 0, sve2: 0
Can't open file:./mnn_cachefile.bin
Load Cache file error.
load tokenizer
tokenizer_type = 3
load tokenizer Done
LLVM ERROR: Cannot select: intrinsic %llvm.bifrost.2586
Aborted (core dumped)

The gdb backtrace is as follows:

(gdb) bt
#0  __pthread_kill_implementation (threadid=281474842458496, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at ./nptl/pthread_kill.c:44
#1  0x0000fffff68e0a64 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000fffff689a76c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000fffff68874bc in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000fffff5b8db40 in ?? () from /usr/lib/libmali.so
#5  0x0000fffff5072034 in ?? () from /usr/lib/libmali.so
#6  0x0000fffff50730e4 in ?? () from /usr/lib/libmali.so
#7  0x0000fffff4a39d44 in ?? () from /usr/lib/libmali.so
#8  0x0000fffff50710c0 in ?? () from /usr/lib/libmali.so
#9  0x0000fffff5078a7c in ?? () from /usr/lib/libmali.so
#10 0x0000fffff507b414 in ?? () from /usr/lib/libmali.so
#11 0x0000fffff507d424 in ?? () from /usr/lib/libmali.so
#12 0x0000fffff52e7c04 in ?? () from /usr/lib/libmali.so
#13 0x0000fffff4e86c90 in ?? () from /usr/lib/libmali.so
#14 0x0000fffff4e8b740 in ?? () from /usr/lib/libmali.so
#15 0x0000fffff4e8edb8 in ?? () from /usr/lib/libmali.so
#16 0x0000fffff3368c1c in ?? () from /usr/lib/libmali.so
#17 0x0000fffff3369834 in ?? () from /usr/lib/libmali.so
#18 0x0000fffff3369ccc in ?? () from /usr/lib/libmali.so
#19 0x0000fffff33202a8 in ?? () from /usr/lib/libmali.so
#20 0x0000fffff32f29dc in ?? () from /usr/lib/libmali.so
#21 0x0000fffff3288e3c in ?? () from /usr/lib/libmali.so
#22 0x0000fffff32a1934 in ?? () from /usr/lib/libmali.so
#23 0x0000fffff32a19d8 in ?? () from /usr/lib/libmali.so
#24 0x0000fffff32a1ce8 in ?? () from /usr/lib/libmali.so
#25 0x0000fffff327e0f0 in clBuildProgram () from /usr/lib/libmali.so
#26 0x0000fffff6826730 in clBuildProgram () from /usr/lib/libOpenCL.so
#27 0x0000fffff7aaea50 in clBuildProgram (program=0x2584a60, num_devices=1, device_list=0x2584a30,
    options=0x257edf0 "-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
    pfn_notify=0x0, user_data=0x0)
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLWrapper.cpp:443
#28 0x0000fffff7a7d6c8 in cl::Program::build (this=0xffffffffcda8,
    devices=std::vector of length 1, capacity 1 = {...},
    options=0x257edf0 "-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
    notifyFptr=0x0, data=0x0) at /home/disk1/junjie/armgpu/MNN/3rd_party/OpenCLHeaders/CL/cl2.hpp:6376
#29 0x0000fffff7a826c0 in MNN::OpenCLRuntime::buildProgram (this=0x4c2780,
    buildOptionsStr="-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half  -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"...,
    program=0xffffffffcda8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLRuntime.cpp:547
#30 0x0000fffff7a836d0 in MNN::OpenCLRuntime::buildKernelWithCache (this=0x4c2780, programName="buffer_convert_quant",
    kernelName="conv2d_1x1_weight_quant_image", buildOptions=std::set with 1 element = {...}, input=0x0, output=0x0,
    useCache=true) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLRuntime.cpp:714
#31 0x0000fffff7aca834 in MNN::OpenCL::ConvBufLowMemoryExecution::convertToQuantWeight1x1Buffer (this=0x2436e60,
    input=..., packCin=4, packCout=8)
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:153
#32 0x0000fffff7acb278 in MNN::OpenCL::ConvBufLowMemoryExecution::set1x1WeightLowMemory (this=0x2436e60, packCout=8,
    packCin=4, filterDataPtr=0x248aa80,
    quanCommon=std::shared_ptr<MNN::ConvolutionCommon::Int8Common> (use count 1, weak count 0) = {...})
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:235
#33 0x0000fffff7ad1490 in MNN::OpenCL::ConvBufLowMemoryExecution::ConvBufLowMemoryExecution (this=0x2436e60,
    inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...},
    op=0x2477ae8, backend=0x2466900)
    at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:767
#34 0x0000fffff7ac523c in MNN::OpenCL::ConvolutionBufCreator::onCreate (this=0x4422a0,
    inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...},
    op=0x2477ae8, backend=0x2466900) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufExecution.cpp:799
#35 0x0000fffff7a56554 in MNN::OpenCL::OpenCLBackend::onCreate (this=0x2466900, inputs=std::vector of length 1, capacity 1 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, op=0x2477ae8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/OpenCLBackend.cpp:563
#36 0x0000fffff7146a54 in MNN::OpCommonUtils::createExecutionWithExternal (backend=0x2466900, inputs=std::vector of length 1, capacity 1 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, op=0x22bfe70, externalFile=0xffffffffd758,
    tmpstore=std::shared_ptr<MNN::BufferStorage> (empty) = {...}) at /home/disk1/junjie/armgpu/MNN/source/core/OpCommonUtils.cpp:722
#37 0x0000fffff77f0fa0 in MNN::Express::preRearrangeWeights (scheduleInfo=..., firstbackend=0x2466900, backupBackend=0x185abb0, base=0x0)
    at /home/disk1/junjie/armgpu/MNN/express/module/StaticModule.cpp:121
#38 0x0000fffff77f2660 in MNN::Express::StaticModule::StaticModule (this=0x24667c0, inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, buffer=..., scheduleInfo=...,
    sharedConst=std::shared_ptr<MNN::Schedule::ScheduleInfo> (use count 4, weak count 0) = {...}, mode=...,
    rtm=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=...)
    at /home/disk1/junjie/armgpu/MNN/express/module/StaticModule.cpp:317
#39 0x0000fffff77e275c in MNN::Express::_createSubModule (bufferStorage=std::shared_ptr<MNN::BufferStorage> (use count 4, weak count 0) = {...}, info=...,
    subs=std::map with 0 elements, sharedConst=std::shared_ptr<MNN::Schedule::ScheduleInfo> (use count 4, weak count 0) = {...}, config=...,
    runtimeConfig=...) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:672
#40 0x0000fffff77e3708 in MNN::Express::PipelineModule::load (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, bufferStorage=std::shared_ptr<MNN::BufferStorage> (use count 4, weak count 0) = {...},
    rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40,
    subGraphMap=std::map with 0 elements) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:796
#41 0x0000fffff77e2c2c in MNN::Express::PipelineModule::load (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, buffer=0xffffee6e0040 " ", length=733288,
    rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40)
    at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:711
#42 0x0000fffff77d13b8 in MNN::Express::loadInternal (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, buffer=0xffffee6e0040 " ", length=733288,
    _rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40)
    at /home/disk1/junjie/armgpu/MNN/express/module/Module.cpp:407
#43 0x0000fffff77d0e7c in MNN::Express::Module::load (inputs=std::vector of length 3, capacity 3 = {...},
    outputs=std::vector of length 1, capacity 1 = {...}, fileName=0xffffffffec30 "./llm.mnn",
    _rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40)
    at /home/disk1/junjie/armgpu/MNN/express/module/Module.cpp:351
#44 0x0000fffff7e70d80 in MNN::Transformer::Llm::load (this=0x44f590) at /home/disk1/junjie/armgpu/MNN/transformers/llm/engine/src/llm.cpp:319
#45 0x00000000004099d8 in main (argc=2, argv=0xfffffffff228) at /home/disk1/junjie/armgpu/MNN/transformers/llm/engine/llm_demo.cpp:194

It looks like the OpenCL kernel code fails to compile.
Is running with the GPU backend feasible?

@jxt1234 jxt1234 added the OpenCL label Jan 25, 2025
jxt1234 (Collaborator) commented Jan 25, 2025

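This looks like a GPU driver problem on this device. Which Mali model is it? You can report the bug to the device vendor.
You can also try setting precision to high in config.json; this driver may have a problem compiling fp16 kernels.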
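
As a sketch of the suggestion above to raise precision, the config.json from the report with only precision changed from low to high would look like this. All other values are copied unchanged from the report. Whether this avoids the crash on this driver is untested; the assumption is that high precision makes the OpenCL backend build its kernels without the half type definitions (-DFLOAT=half and so on) visible in the backtrace.

```json
{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "opencl",
    "thread_num": 16,
    "precision": "high",
    "memory": "low"
}
```

If the crash disappears with this setting, that would support the fp16 kernel compilation theory and is worth including in a report to the device vendor.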
