[Bug] LongBench errors out during evaluation #1824

Open
fxnie opened this issue Jan 15, 2025 · 3 comments

fxnie commented Jan 15, 2025

Prerequisites

Issue type

I am evaluating with an officially supported task / model / dataset.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
'GPU 0,1,2,3': 'NVIDIA A800-SXM4-40GB',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 12.4, V12.4.131',
'OpenCV': '4.9.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX2\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:16:10) '
'[GCC 13.3.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': '0.2.2',
'numpy_random_seed': 2147483648,
'opencompass': '0.3.9+7f2aeef',
'sys.platform': 'linux',
'transformers': '4.48.0'}

Reproduces the problem - code/configuration sample

Reproduces the problem - command or script

from mmengine.config import read_base
from opencompass.models import OpenAISDK

with read_base():
    # from opencompass.configs.datasets.humaneval.humaneval_gen_8e312c import humaneval_datasets  # noqa: F401, F403
    # from opencompass.configs.datasets.ARC_c.ARC_c_gen import ARC_c_datasets  # noqa: F401, F403
    from opencompass.configs.datasets.longbench.longbench import longbench_datasets
    # from opencompass.configs.datasets.leval.leval import leval_datasets
    # from opencompass.configs.datasets.needlebench.needlebench_4k.needlebench_4k import needlebench_datasets
    # from .summarizers.needlebench import needlebench_4k_summarizer as summarizer

datasets = longbench_datasets

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)

models = [
    dict(
        abbr='mamba',
        type=OpenAISDK,
        key='EMPTY',  # API key
        openai_api_base='http://0.0.0.0:6606/v1',  # service address
        path='mamba',  # model name used in requests
        tokenizer_path='/workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct',  # tokenizer name or path used for requests; None falls back to the default gpt-4 tokenizer
        rpm_verbose=True,  # whether to print the request rate
        meta_template=api_meta_template,  # request template
        query_per_second=1,  # request rate
        max_out_len=512,  # maximum output length
        max_seq_len=32768,  # maximum input length
        temperature=0.01,  # generation temperature
        batch_size=8,  # batch size
    )
]
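
Note that max_seq_len above is a client-side setting in OpenCompass; the serving backend enforces its own context limit independently. As a sanity check, one can count tokens before sending. A minimal sketch (the fits helper is hypothetical; the tokenizer path is the one from the config above):

from transformers import AutoTokenizer

# Assumption: the server tokenizes with the same tokenizer as this local checkpoint.
tok = AutoTokenizer.from_pretrained(
    '/workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct')

def fits(prompt: str, max_seq_len: int = 32768, max_out_len: int = 512) -> bool:
    """True if the prompt plus the completion budget fits the context window."""
    return len(tok.encode(prompt)) + max_out_len <= max_seq_len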

Reproduces the problem - error message

(swift3) root@pod-1325886428700045312:/workspace/mnt/cm-nfx/opencompass# python run.py configs/eval_chat_demo_nfx.py --debug
/workspace/mnt/cm-nfx/opencompass/opencompass/__init__.py:19: UserWarning: Starting from v0.4.0, all AMOTIC configuration files currently located in ./configs/datasets, ./configs/models, and ./configs/summarizers will be migrated to the opencompass/configs/ package. Please update your configuration file paths accordingly.
_warn_about_config_migration()
2025-01-15 16:19:14.242326: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-15 16:19:14.396754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-01-15 16:19:14.396800: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2025-01-15 16:19:15.071027: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-01-15 16:19:15.071119: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-01-15 16:19:15.071127: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
01/15 16:19:16 - OpenCompass - INFO - Current exp folder: outputs/default/20250115_161916
01/15 16:19:17 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
01/15 16:19:17 - OpenCompass - INFO - Partitioned into 1 tasks.
01/15 16:19:19 - OpenCompass - INFO - Task [mamba/LongBench_2wikimqa,mamba/LongBench_hotpotqa,mamba/LongBench_musique,mamba/LongBench_multifieldqa_en,mamba/LongBench_multifieldqa_zh,mamba/LongBench_narrativeqa,mamba/LongBench_qasper,mamba/LongBench_triviaqa,mamba/LongBench_gov_report,mamba/LongBench_qmsum,mamba/LongBench_vcsum,mamba/LongBench_dureader,mamba/LongBench_lcc,mamba/LongBench_repobench-p,mamba/LongBench_passage_retrieval_en,mamba/LongBench_passage_retrieval_zh,mamba/LongBench_passage_count,mamba/LongBench_trec,mamba/LongBench_lsht,mamba/LongBench_multi_news,mamba/LongBench_samsum]
01/15 16:19:19 - OpenCompass - WARNING - Max Completion tokens for mamba is 16384
01/15 16:19:19 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/Longbench
01/15 16:19:20 - OpenCompass - INFO - Start inferencing [mamba/LongBench_2wikimqa]
01/15 16:19:20 - OpenCompass - WARNING - 'Could not automatically map /workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.', tiktoken encoding cannot load /workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct
01/15 16:19:20 - OpenCompass - INFO - Successfully load HF Tokenizer from /workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct
[2025-01-15 16:19:25,485] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-01-15 16:19:25,485] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/25 [00:00<?, ?it/s]01/15 16:19:25 - OpenCompass - INFO - Current RPM 1.
01/15 16:19:25 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:25 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7278 tokens (7246 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:26 - OpenCompass - INFO - Current RPM 2.
01/15 16:19:26 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:26 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7428 tokens (7396 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:27 - OpenCompass - INFO - Current RPM 3.
01/15 16:19:27 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:27 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 6751 tokens (6719 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:28 - OpenCompass - INFO - Current RPM 4.
01/15 16:19:28 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:28 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 11352 tokens (11320 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:29 - OpenCompass - INFO - Current RPM 5.
01/15 16:19:29 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:29 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7429 tokens (7397 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:30 - OpenCompass - INFO - Current RPM 6.
01/15 16:19:30 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:30 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7825 tokens (7793 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:31 - OpenCompass - INFO - Current RPM 7.
01/15 16:19:31 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:31 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7746 tokens (7714 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:32 - OpenCompass - INFO - Current RPM 8.
01/15 16:19:32 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:32 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 11416 tokens (11384 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:33 - OpenCompass - INFO - Current RPM 9.
01/15 16:19:33 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:33 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7278 tokens (7246 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:34 - OpenCompass - INFO - Current RPM 10.
01/15 16:19:34 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:34 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7428 tokens (7396 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
Inferencing: 0%| | 0/8 [00:09<?, ?it/s]
01/15 16:19:35 - OpenCompass - INFO - Current RPM 11.
01/15 16:19:35 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:35 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 6751 tokens (6719 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:36 - OpenCompass - INFO - Current RPM 12.
01/15 16:19:36 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:36 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 11352 tokens (11320 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:37 - OpenCompass - INFO - Current RPM 13.
01/15 16:19:37 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:37 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7429 tokens (7397 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:38 - OpenCompass - INFO - Current RPM 14.
01/15 16:19:38 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:38 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7825 tokens (7793 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:39 - OpenCompass - INFO - Current RPM 15.
01/15 16:19:39 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:39 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 7746 tokens (7714 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:19:40 - OpenCompass - INFO - Current RPM 16.
01/15 16:19:40 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:19:40 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 2048 tokens. However, you requested 11416 tokens (11384 in the messages, 32 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
0%| | 0/25 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/mnt/cm-nfx/opencompass/run.py", line 4, in <module>
    main()
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/cli/main.py", line 308, in main
    runner(tasks)
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/runners/base.py", line 38, in __call__
    status = self.launch(tasks)
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/runners/local.py", line 128, in launch
    task.run(cur_model=getattr(self, 'cur_model',
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/tasks/openicl_infer.py", line 89, in run
    self._inference()
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/tasks/openicl_infer.py", line 134, in _inference
    inferencer.inference(retriever,
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 153, in inference
    results = self.model.generate_from_template(
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/models/base.py", line 201, in generate_from_template
    return self.generate(inputs, max_out_len=max_out_len, **kwargs)
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py", line 176, in generate
    results = list(
  File "/opt/conda/envs/swift3/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/swift3/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/opt/conda/envs/swift3/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/opt/conda/envs/swift3/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/opt/conda/envs/swift3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/conda/envs/swift3/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py", line 655, in _generate
    raise RuntimeError('Calling OpenAI API failed after retrying for '
RuntimeError: Calling OpenAI API failed after retrying for 2 times. Check the logs for details.

Other information

How should this be configured? I ran into the same problem with the needle-in-a-haystack (NeedleBench) evaluation.

fxnie commented Jan 15, 2025

python -m vllm.entrypoints.openai.api_server \
    --model /workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct \
    --served-model-name mamba \
    --host 0.0.0.0 \
    --port 6606 \
    --tensor-parallel-size 2 \
    --max-model-len 32768

I have already set max-model-len this high, yet the error still occurs (see the sketch after the log below for a quick way to confirm the limit the server enforces).

Inferencing: 100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.07s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [03:30<00:00, 8.41s/it]
01/15 16:51:16 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/Longbench
01/15 16:51:18 - OpenCompass - INFO - Start inferencing [mamba/LongBench_narrativeqa]
[2025-01-15 16:51:40,650] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-01-15 16:51:40,651] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/25 [00:00<?, ?it/s]01/15 16:51:40 - OpenCompass - INFO - Current RPM 33.
01/15 16:51:40 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:51:40 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33105 tokens (32977 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:51:41 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:42 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:43 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:44 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:44 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:51:44 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 38780 tokens (38652 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:51:45 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:46 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:46 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:51:46 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33858 tokens (33730 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:51:47 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:47 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:51:47 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 52873 tokens (52745 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:51:48 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:48 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:51:48 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 33105 tokens (32977 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
Inferencing: 0%| | 0/8 [00:07<?, ?it/s]
01/15 16:51:49 - OpenCompass - INFO - Current RPM 34.
01/15 16:51:49 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 648 - error occurs at http://0.0.0.0:6606/v1
01/15 16:51:49 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 38780 tokens (38652 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
01/15 16:51:50 - OpenCompass - INFO - Current RPM 35.
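
As a quick check of which limit the server actually enforces, one could query the /v1/models endpoint. A sketch, assuming a vLLM build that reports max_model_len in its model-card payload (recent versions do):

import requests

# Endpoint taken from the config above; max_model_len is a vLLM extension field.
resp = requests.get('http://0.0.0.0:6606/v1/models', timeout=10)
for card in resp.json().get('data', []):
    print(card.get('id'), '->', card.get('max_model_len', 'n/a'))

If this prints 32768 for mamba, the remaining 400 errors come from prompts that genuinely exceed the window rather than from the server flag.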

MaiziXiao (Collaborator) commented

01/15 16:51:47 - OpenCompass - ERROR - /workspace/mnt/cm-nfx/opencompass/opencompass/models/openai_api.py - _generate - 650 - Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 32768 tokens. However, you requested 52873 tokens (52745 in the messages, 128 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}

Some of the LongBench subsets can exceed 32k tokens.
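
A common workaround in that case is to truncate over-long prompts from the middle before sending them, which is what the LongBench reference implementation does (the question usually sits at the end of the prompt, so keeping the head and tail preserves both instructions and question). A minimal sketch, not OpenCompass's built-in behavior; the tokenizer path is taken from the issue's config:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    '/workspace/mnt/cm-nfx/model/Falcon3-Mamba-7B-Instruct')

def middle_truncate(prompt: str, max_prompt_tokens: int) -> str:
    """Drop the middle of an over-long prompt, keeping its head and tail."""
    ids = tok.encode(prompt)
    if len(ids) <= max_prompt_tokens:
        return prompt
    half = max_prompt_tokens // 2
    return tok.decode(ids[:half]) + tok.decode(ids[-half:])

# usage (budget = 32768-token window minus the 128-token completion in the logs):
# prompt = middle_truncate(prompt, 32768 - 128)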


fxnie commented Jan 16, 2025

@MaiziXiao could you take a look at this issue? #1825
