Hi all,
Here is what I did:

1. Cloned the latest code and cross-compiled MNN for ARM Linux with the following options:

```
-DMNN_OPENCL=ON -DMNN_ARM82=ON -DMNN_LOW_MEMORY=ON -DMNN_BUILD_DEMO=ON -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_TOOLS=ON -DMNN_EVALUATION=ON -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON -DMNN_BUILD_LLM=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON
```

The target board is an ARM device with a Mali GPU. Running `llm_demo llm.mnn` works with the default CPU backend.
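For reference, the build step can be sketched as below. The actual `cmake` invocation is shown only as a comment because the toolchain file path and build directory are my assumptions, not taken from the report:

```shell
# Collect the reported CMake options into one variable for reuse.
MNN_FLAGS="-DMNN_OPENCL=ON -DMNN_ARM82=ON -DMNN_LOW_MEMORY=ON \
-DMNN_BUILD_DEMO=ON -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_TOOLS=ON \
-DMNN_EVALUATION=ON -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_BUILD_LLM=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON"

# Hypothetical cross-compile invocation (toolchain file path is an assumption):
# cmake -S MNN -B build-arm -DCMAKE_TOOLCHAIN_FILE=/path/to/aarch64.toolchain.cmake $MNN_FLAGS
# cmake --build build-arm -j"$(nproc)"

echo "$MNN_FLAGS"
```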
After changing `config.json` as follows:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "thread_num": 16,
  "precision": "low",
  "memory": "low"
}
```

and running `llm_demo config.json`, the program crashes:

```
model path is config.json
CPU Group: [ 4 5 0 1 2 3 ], 1500000 - 2000000
The device supports: i8sdot:1, fp16:1, i8mm: 0, sve2: 0
Can't open file:./mnn_cachefile.bin
Load Cache file error.
load tokenizer
tokenizer_type = 3
load tokenizer Done
LLVM ERROR: Cannot select: intrinsic %llvm.bifrost.2586
Aborted (core dumped)
```
The gdb backtrace is as follows:
```
(gdb) bt
#0  __pthread_kill_implementation (threadid=281474842458496, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x0000fffff68e0a64 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000fffff689a76c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000fffff68874bc in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000fffff5b8db40 in ?? () from /usr/lib/libmali.so
#5  0x0000fffff5072034 in ?? () from /usr/lib/libmali.so
#6  0x0000fffff50730e4 in ?? () from /usr/lib/libmali.so
#7  0x0000fffff4a39d44 in ?? () from /usr/lib/libmali.so
#8  0x0000fffff50710c0 in ?? () from /usr/lib/libmali.so
#9  0x0000fffff5078a7c in ?? () from /usr/lib/libmali.so
#10 0x0000fffff507b414 in ?? () from /usr/lib/libmali.so
#11 0x0000fffff507d424 in ?? () from /usr/lib/libmali.so
#12 0x0000fffff52e7c04 in ?? () from /usr/lib/libmali.so
#13 0x0000fffff4e86c90 in ?? () from /usr/lib/libmali.so
#14 0x0000fffff4e8b740 in ?? () from /usr/lib/libmali.so
#15 0x0000fffff4e8edb8 in ?? () from /usr/lib/libmali.so
#16 0x0000fffff3368c1c in ?? () from /usr/lib/libmali.so
#17 0x0000fffff3369834 in ?? () from /usr/lib/libmali.so
#18 0x0000fffff3369ccc in ?? () from /usr/lib/libmali.so
#19 0x0000fffff33202a8 in ?? () from /usr/lib/libmali.so
#20 0x0000fffff32f29dc in ?? () from /usr/lib/libmali.so
#21 0x0000fffff3288e3c in ?? () from /usr/lib/libmali.so
#22 0x0000fffff32a1934 in ?? () from /usr/lib/libmali.so
#23 0x0000fffff32a19d8 in ?? () from /usr/lib/libmali.so
#24 0x0000fffff32a1ce8 in ?? () from /usr/lib/libmali.so
#25 0x0000fffff327e0f0 in clBuildProgram () from /usr/lib/libmali.so
#26 0x0000fffff6826730 in clBuildProgram () from /usr/lib/libOpenCL.so
#27 0x0000fffff7aaea50 in clBuildProgram (program=0x2584a60, num_devices=1, device_list=0x2584a30, options=0x257edf0 "-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"..., pfn_notify=0x0, user_data=0x0) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLWrapper.cpp:443
#28 0x0000fffff7a7d6c8 in cl::Program::build (this=0xffffffffcda8, devices=std::vector of length 1, capacity 1 = {...}, options=0x257edf0 "-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"..., notifyFptr=0x0, data=0x0) at /home/disk1/junjie/armgpu/MNN/3rd_party/OpenCLHeaders/CL/cl2.hpp:6376
#29 0x0000fffff7a826c0 in MNN::OpenCLRuntime::buildProgram (this=0x4c2780, buildOptionsStr="-DFLOAT=half -DFLOAT2=half2 -DFLOAT3=half3 -DFLOAT4=half4 -DFLOAT8=half8 -DFLOAT16=half16 -DCOMPUTE_FLOAT=half -DCOMPUTE_FLOAT2=half2 -DCOMPUTE_FLOAT3=half3 -DCOMPUTE_FLOAT4=half4 -DCOMPUTE_FLOAT8=ha"..., program=0xffffffffcda8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLRuntime.cpp:547
#30 0x0000fffff7a836d0 in MNN::OpenCLRuntime::buildKernelWithCache (this=0x4c2780, programName="buffer_convert_quant", kernelName="conv2d_1x1_weight_quant_image", buildOptions=std::set with 1 element = {...}, input=0x0, output=0x0, useCache=true) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/runtime/OpenCLRuntime.cpp:714
#31 0x0000fffff7aca834 in MNN::OpenCL::ConvBufLowMemoryExecution::convertToQuantWeight1x1Buffer (this=0x2436e60, input=..., packCin=4, packCout=8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:153
#32 0x0000fffff7acb278 in MNN::OpenCL::ConvBufLowMemoryExecution::set1x1WeightLowMemory (this=0x2436e60, packCout=8, packCin=4, filterDataPtr=0x248aa80, quanCommon=std::shared_ptr<MNN::ConvolutionCommon::Int8Common> (use count 1, weak count 0) = {...}) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:235
#33 0x0000fffff7ad1490 in MNN::OpenCL::ConvBufLowMemoryExecution::ConvBufLowMemoryExecution (this=0x2436e60, inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, op=0x2477ae8, backend=0x2466900) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufLowMemoryExecution.cpp:767
#34 0x0000fffff7ac523c in MNN::OpenCL::ConvolutionBufCreator::onCreate (this=0x4422a0, inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, op=0x2477ae8, backend=0x2466900) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/execution/buffer/ConvBufExecution.cpp:799
#35 0x0000fffff7a56554 in MNN::OpenCL::OpenCLBackend::onCreate (this=0x2466900, inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, op=0x2477ae8) at /home/disk1/junjie/armgpu/MNN/source/backend/opencl/core/OpenCLBackend.cpp:563
#36 0x0000fffff7146a54 in MNN::OpCommonUtils::createExecutionWithExternal (backend=0x2466900, inputs=std::vector of length 1, capacity 1 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, op=0x22bfe70, externalFile=0xffffffffd758, tmpstore=std::shared_ptr<MNN::BufferStorage> (empty) = {...}) at /home/disk1/junjie/armgpu/MNN/source/core/OpCommonUtils.cpp:722
#37 0x0000fffff77f0fa0 in MNN::Express::preRearrangeWeights (scheduleInfo=..., firstbackend=0x2466900, backupBackend=0x185abb0, base=0x0) at /home/disk1/junjie/armgpu/MNN/express/module/StaticModule.cpp:121
#38 0x0000fffff77f2660 in MNN::Express::StaticModule::StaticModule (this=0x24667c0, inputs=std::vector of length 3, capacity 3 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, buffer=..., scheduleInfo=..., sharedConst=std::shared_ptr<MNN::Schedule::ScheduleInfo> (use count 4, weak count 0) = {...}, mode=..., rtm=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=...) at /home/disk1/junjie/armgpu/MNN/express/module/StaticModule.cpp:317
#39 0x0000fffff77e275c in MNN::Express::_createSubModule (bufferStorage=std::shared_ptr<MNN::BufferStorage> (use count 4, weak count 0) = {...}, info=..., subs=std::map with 0 elements, sharedConst=std::shared_ptr<MNN::Schedule::ScheduleInfo> (use count 4, weak count 0) = {...}, config=..., runtimeConfig=...) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:672
#40 0x0000fffff77e3708 in MNN::Express::PipelineModule::load (inputs=std::vector of length 3, capacity 3 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, bufferStorage=std::shared_ptr<MNN::BufferStorage> (use count 4, weak count 0) = {...}, rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40, subGraphMap=std::map with 0 elements) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:796
#41 0x0000fffff77e2c2c in MNN::Express::PipelineModule::load (inputs=std::vector of length 3, capacity 3 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, buffer=0xffffee6e0040 " ", length=733288, rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40) at /home/disk1/junjie/armgpu/MNN/express/module/PipelineModule.cpp:711
#42 0x0000fffff77d13b8 in MNN::Express::loadInternal (inputs=std::vector of length 3, capacity 3 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, buffer=0xffffee6e0040 " ", length=733288, _rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40) at /home/disk1/junjie/armgpu/MNN/express/module/Module.cpp:407
#43 0x0000fffff77d0e7c in MNN::Express::Module::load (inputs=std::vector of length 3, capacity 3 = {...}, outputs=std::vector of length 1, capacity 1 = {...}, fileName=0xffffffffec30 "./llm.mnn", _rtMgr=std::shared_ptr<MNN::Express::Executor::RuntimeManager> (use count 11, weak count 0) = {...}, config=0xffffffffec40) at /home/disk1/junjie/armgpu/MNN/express/module/Module.cpp:351
#44 0x0000fffff7e70d80 in MNN::Transformer::Llm::load (this=0x44f590) at /home/disk1/junjie/armgpu/MNN/transformers/llm/engine/src/llm.cpp:319
#45 0x00000000004099d8 in main (argc=2, argv=0xfffffffff228) at /home/disk1/junjie/armgpu/MNN/transformers/llm/engine/llm_demo.cpp:194
```
It looks like the OpenCL kernel source fails to compile. Is it feasible to run this model with the GPU backend?
This looks like a GPU driver bug on the device. Which Mali model is it? You could report the bug to the device vendor. You could also try setting `precision` to `high` in config.json; this driver may have a problem compiling fp16 kernels.
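As a sketch of the suggested workaround, here is the reported config with only `precision` changed:

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "thread_num": 16,
  "precision": "high",
  "memory": "low"
}
```

With `"precision": "high"`, the OpenCL backend should build fp32 kernels instead of the fp16 variants (`-DFLOAT=half ...`) visible in the failing `clBuildProgram` call, which would sidestep a driver bug in fp16 kernel compilation.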