support qwen2_5 long (#2982)
Jintao-Huang authored Jan 24, 2025
1 parent b4b2f4b commit 10b98cb
Showing 5 changed files with 67 additions and 6 deletions.
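
This commit registers the Qwen2.5 1M-token long-context instruct models (7B and 14B) and the Qwen2.5-Math-7B-PRM800K process reward model, lists them in both model tables, relaxes `RequestConfig.stop` to `Optional[List[str]]`, and adds a PRM alignment test. Below is a minimal usage sketch for one of the new long-context checkpoints; `InferRequest`, `engine.infer`, `max_tokens`, `temperature`, and the response shape are assumptions about the ms-swift inference API and are not part of this diff — only `PtEngine`, `RequestConfig`, and the model IDs appear here.

from swift.llm import PtEngine, RequestConfig, InferRequest  # InferRequest is assumed, not shown in this diff

engine = PtEngine('Qwen/Qwen2.5-7B-Instruct-1M')  # checkpoint registered by this commit
request = InferRequest(messages=[{'role': 'user', 'content': 'Who are you?'}])
# RequestConfig fields other than `stop` (max_tokens, temperature) are assumed here.
resp = engine.infer([request], RequestConfig(max_tokens=128, temperature=0))
print(resp[0].choices[0].message.content)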
3 changes: 3 additions & 0 deletions docs/source/Instruction/支持的模型和数据集.md
@@ -100,6 +100,8 @@
|[Qwen/Qwen2-Math-7B](https://modelscope.cn/models/Qwen/Qwen2-Math-7B)|qwen2|qwen|transformers>=4.37|math|[Qwen/Qwen2-Math-7B](https://huggingface.co/Qwen/Qwen2-Math-7B)|
|[Qwen/Qwen2-Math-72B](https://modelscope.cn/models/Qwen/Qwen2-Math-72B)|qwen2|qwen|transformers>=4.37|math|[Qwen/Qwen2-Math-72B](https://huggingface.co/Qwen/Qwen2-Math-72B)|
|[PowerInfer/SmallThinker-3B-Preview](https://modelscope.cn/models/PowerInfer/SmallThinker-3B-Preview)|qwen2|qwen|transformers>=4.37|-|[PowerInfer/SmallThinker-3B-Preview](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview)|
|[Qwen/Qwen2.5-7B-Instruct-1M](https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-1M)|qwen2|qwen|transformers>=4.37|-|[Qwen/Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M)|
|[Qwen/Qwen2.5-14B-Instruct-1M](https://modelscope.cn/models/Qwen/Qwen2.5-14B-Instruct-1M)|qwen2|qwen|transformers>=4.37|-|[Qwen/Qwen2.5-14B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-1M)|
|[Qwen/Qwen2.5-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B-Instruct)|qwen2_5|qwen2_5|transformers>=4.37|-|[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)|
|[Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct)|qwen2_5|qwen2_5|transformers>=4.37|-|[Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)|
|[Qwen/Qwen2.5-3B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-3B-Instruct)|qwen2_5|qwen2_5|transformers>=4.37|-|[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)|
@@ -480,6 +482,7 @@
|[Shanghai_AI_Laboratory/internlm2-20b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b-reward)|internlm2_reward|internlm2_reward|transformers>=4.38|-|[internlm/internlm2-20b-reward](https://huggingface.co/internlm/internlm2-20b-reward)|
|[Qwen/Qwen2-Math-RM-72B](https://modelscope.cn/models/Qwen/Qwen2-Math-RM-72B)|qwen2_reward|qwen|transformers>=4.37|-|[Qwen/Qwen2-Math-RM-72B](https://huggingface.co/Qwen/Qwen2-Math-RM-72B)|
|[Qwen/Qwen2.5-Math-PRM-7B](https://modelscope.cn/models/Qwen/Qwen2.5-Math-PRM-7B)|qwen2_5_prm|qwen2_5_math_prm|transformers>=4.37|-|[Qwen/Qwen2.5-Math-PRM-7B](https://huggingface.co/Qwen/Qwen2.5-Math-PRM-7B)|
|[Qwen/Qwen2.5-Math-7B-PRM800K](https://modelscope.cn/models/Qwen/Qwen2.5-Math-7B-PRM800K)|qwen2_5_prm|qwen2_5_math_prm|transformers>=4.37|-|[Qwen/Qwen2.5-Math-7B-PRM800K](https://huggingface.co/Qwen/Qwen2.5-Math-7B-PRM800K)|
|[Qwen/Qwen2.5-Math-PRM-72B](https://modelscope.cn/models/Qwen/Qwen2.5-Math-PRM-72B)|qwen2_5_prm|qwen2_5_math_prm|transformers>=4.37|-|[Qwen/Qwen2.5-Math-PRM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-PRM-72B)|
|[Qwen/Qwen2.5-Math-RM-72B](https://modelscope.cn/models/Qwen/Qwen2.5-Math-RM-72B)|qwen2_5_math_reward|qwen2_5_math|transformers>=4.37|-|[Qwen/Qwen2.5-Math-RM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-RM-72B)|
|[AI-ModelScope/Skywork-Reward-Llama-3.1-8B](https://modelscope.cn/models/AI-ModelScope/Skywork-Reward-Llama-3.1-8B)|llama3_2_reward|llama3_2|transformers>=4.43|-|[Skywork/Skywork-Reward-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B)|
3 changes: 3 additions & 0 deletions docs/source_en/Instruction/Supported-models-and-datasets.md
@@ -100,6 +100,8 @@ The table below introduces the models integrated with ms-swift:
|[Qwen/Qwen2-Math-7B](https://modelscope.cn/models/Qwen/Qwen2-Math-7B)|qwen2|qwen|transformers>=4.37|math|[Qwen/Qwen2-Math-7B](https://huggingface.co/Qwen/Qwen2-Math-7B)|
|[Qwen/Qwen2-Math-72B](https://modelscope.cn/models/Qwen/Qwen2-Math-72B)|qwen2|qwen|transformers>=4.37|math|[Qwen/Qwen2-Math-72B](https://huggingface.co/Qwen/Qwen2-Math-72B)|
|[PowerInfer/SmallThinker-3B-Preview](https://modelscope.cn/models/PowerInfer/SmallThinker-3B-Preview)|qwen2|qwen|transformers>=4.37|-|[PowerInfer/SmallThinker-3B-Preview](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview)|
|[Qwen/Qwen2.5-7B-Instruct-1M](https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-1M)|qwen2|qwen|transformers>=4.37|-|[Qwen/Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M)|
|[Qwen/Qwen2.5-14B-Instruct-1M](https://modelscope.cn/models/Qwen/Qwen2.5-14B-Instruct-1M)|qwen2|qwen|transformers>=4.37|-|[Qwen/Qwen2.5-14B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-1M)|
|[Qwen/Qwen2.5-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B-Instruct)|qwen2_5|qwen2_5|transformers>=4.37|-|[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)|
|[Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct)|qwen2_5|qwen2_5|transformers>=4.37|-|[Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)|
|[Qwen/Qwen2.5-3B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-3B-Instruct)|qwen2_5|qwen2_5|transformers>=4.37|-|[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)|
@@ -480,6 +482,7 @@ The table below introduces the models integrated with ms-swift:
|[Shanghai_AI_Laboratory/internlm2-20b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b-reward)|internlm2_reward|internlm2_reward|transformers>=4.38|-|[internlm/internlm2-20b-reward](https://huggingface.co/internlm/internlm2-20b-reward)|
|[Qwen/Qwen2-Math-RM-72B](https://modelscope.cn/models/Qwen/Qwen2-Math-RM-72B)|qwen2_reward|qwen|transformers>=4.37|-|[Qwen/Qwen2-Math-RM-72B](https://huggingface.co/Qwen/Qwen2-Math-RM-72B)|
|[Qwen/Qwen2.5-Math-PRM-7B](https://modelscope.cn/models/Qwen/Qwen2.5-Math-PRM-7B)|qwen2_5_prm|qwen2_5_math_prm|transformers>=4.37|-|[Qwen/Qwen2.5-Math-PRM-7B](https://huggingface.co/Qwen/Qwen2.5-Math-PRM-7B)|
|[Qwen/Qwen2.5-Math-7B-PRM800K](https://modelscope.cn/models/Qwen/Qwen2.5-Math-7B-PRM800K)|qwen2_5_prm|qwen2_5_math_prm|transformers>=4.37|-|[Qwen/Qwen2.5-Math-7B-PRM800K](https://huggingface.co/Qwen/Qwen2.5-Math-7B-PRM800K)|
|[Qwen/Qwen2.5-Math-PRM-72B](https://modelscope.cn/models/Qwen/Qwen2.5-Math-PRM-72B)|qwen2_5_prm|qwen2_5_math_prm|transformers>=4.37|-|[Qwen/Qwen2.5-Math-PRM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-PRM-72B)|
|[Qwen/Qwen2.5-Math-RM-72B](https://modelscope.cn/models/Qwen/Qwen2.5-Math-RM-72B)|qwen2_5_math_reward|qwen2_5_math|transformers>=4.37|-|[Qwen/Qwen2.5-Math-RM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-RM-72B)|
|[AI-ModelScope/Skywork-Reward-Llama-3.1-8B](https://modelscope.cn/models/AI-ModelScope/Skywork-Reward-Llama-3.1-8B)|llama3_2_reward|llama3_2|transformers>=4.43|-|[Skywork/Skywork-Reward-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B)|
2 changes: 1 addition & 1 deletion swift/llm/infer/protocol.py
@@ -49,7 +49,7 @@ class RequestConfig:
top_p: Optional[float] = None
repetition_penalty: Optional[float] = None
num_beams: int = 1
- stop: List[str] = field(default_factory=list)
+ stop: Optional[List[str]] = field(default_factory=list)

seed: Optional[int] = None
stream: bool = False
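
The single change in `protocol.py` widens the `stop` annotation to `Optional[List[str]]`, so an explicit `stop=None` (as OpenAI-compatible clients may send) now type-checks, while the default remains an empty list. A minimal sketch of what the relaxed field accepts, assuming `RequestConfig` behaves as a plain dataclass and using only fields visible in this hunk:

from swift.llm import RequestConfig  # imported the same way in the updated test file

cfg = RequestConfig()                    # default: stop == [] via field(default_factory=list)
assert cfg.stop == []
cfg_none = RequestConfig(stop=None)      # now valid under Optional[List[str]]
assert cfg_none.stop is None
cfg_words = RequestConfig(stop=['Observation:'])  # explicit stop strings still work
assert cfg_words.stop == ['Observation:']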
7 changes: 6 additions & 1 deletion swift/llm/model/model/qwen.py
@@ -336,7 +336,11 @@ def _get_cast_dtype(self) -> torch.dtype:
Model('Qwen/Qwen2-Math-72B', 'Qwen/Qwen2-Math-72B'),
],
tags=['math']),
- ModelGroup([Model('PowerInfer/SmallThinker-3B-Preview', 'PowerInfer/SmallThinker-3B-Preview')])
+ ModelGroup([Model('PowerInfer/SmallThinker-3B-Preview', 'PowerInfer/SmallThinker-3B-Preview')]),
+ ModelGroup([
+     Model('Qwen/Qwen2.5-7B-Instruct-1M', 'Qwen/Qwen2.5-7B-Instruct-1M'),
+     Model('Qwen/Qwen2.5-14B-Instruct-1M', 'Qwen/Qwen2.5-14B-Instruct-1M'),
+ ])
],
TemplateType.qwen,
get_model_tokenizer_with_flash_attn,
@@ -693,6 +697,7 @@ def update(self, key_states: torch.Tensor, value_states: torch.Tensor, layer_idx
[
ModelGroup([
Model('Qwen/Qwen2.5-Math-PRM-7B', 'Qwen/Qwen2.5-Math-PRM-7B'),
Model('Qwen/Qwen2.5-Math-7B-PRM800K', 'Qwen/Qwen2.5-Math-7B-PRM800K'),
Model('Qwen/Qwen2.5-Math-PRM-72B', 'Qwen/Qwen2.5-Math-PRM-72B'),
]),
],
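
In `qwen.py`, the two 1M-context checkpoints join the existing `qwen2` model group, so they reuse `TemplateType.qwen` and `get_model_tokenizer_with_flash_attn`, and `Qwen2.5-Math-7B-PRM800K` is registered alongside the other `qwen2_5_prm` models. A hedged sketch of resolving one of the new entries; `get_model_tokenizer` is imported in the updated test file, but the `load_model` keyword is an assumption:

from swift.llm import get_model_tokenizer

# Resolve the newly registered 1M checkpoint; load_model=False (assumed kwarg)
# would skip loading weights and only build the config/tokenizer.
model, tokenizer = get_model_tokenizer('Qwen/Qwen2.5-7B-Instruct-1M', load_model=False)
print(tokenizer.model_max_length)  # expected to reflect the extended context length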
58 changes: 54 additions & 4 deletions tests/test_align/test_template/test_llm.py
@@ -1,5 +1,6 @@
import os

import json
import torch

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'
@@ -29,10 +30,11 @@ def _infer_model(pt_engine, system=None, messages=None):


def test_qwen2_5():
- pt_engine = PtEngine('Qwen/Qwen2.5-3B')
- _infer_model(pt_engine)
+ pt_engine = PtEngine('Qwen/Qwen2.5-7B-Instruct-1M')
+ response = _infer_model(pt_engine)
pt_engine.default_template.template_backend = 'jinja'
- _infer_model(pt_engine)
+ response2 = _infer_model(pt_engine)
+ assert response == response2


def test_phi4():
Expand Down Expand Up @@ -265,6 +267,53 @@ def test_deepseek_r1_distill():
assert res == res2, f'res: {res}, res2: {res2}'


def test_qwen2_5_prm():
pt_engine = PtEngine('Qwen/Qwen2.5-Math-7B-PRM800K')
data = {
'system':
'Please reason step by step, and put your final answer within \\boxed{}.',
'query': ('Sue lives in a fun neighborhood. One weekend, the neighbors decided to play a prank on Sue. '
"On Friday morning, the neighbors placed 18 pink plastic flamingos out on Sue's front yard. "
'On Saturday morning, the neighbors took back one third of the flamingos, painted them white, and '
"put these newly painted white flamingos back out on Sue's front yard. Then, on Sunday morning, "
'they added another 18 pink plastic flamingos to the collection. At noon on Sunday, how many more '
'pink plastic flamingos were out than white plastic flamingos?'),
'response':
[('To find out how many more pink plastic flamingos were out than white plastic flamingos at noon on Sunday, '
'we can break down the problem into steps. First, on Friday, the neighbors start with 18 pink '
'plastic flamingos.'),
('On Saturday, they take back one third of the flamingos. Since there were 18 flamingos, (1/3 \\times 18 = 6) '
'flamingos are taken back. So, they have (18 - 6 = 12) flamingos left in their possession. Then, they paint '
"these 6 flamingos white and put them back out on Sue's front yard. Now, Sue has the original 12 pink "
'flamingos plus the 6 new white ones. Thus, by the end of Saturday, Sue has (12 + 6 = 18) pink flamingos '
'and 6 white flamingos.'),
("On Sunday, the neighbors add another 18 pink plastic flamingos to Sue's front yard. By the end of Sunday "
'morning, Sue has (18 + 18 = 36) pink flamingos and still 6 white flamingos.'),
('To find the difference, subtract the number of white flamingos from the number of pink '
'flamingos: (36 - 6 = 30). Therefore, at noon on Sunday, there were 30 more pink plastic flamingos out '
'than white plastic flamingos. The answer is (\\boxed{30}).')]
}

messages = [
{
'role': 'system',
'content': data['system']
},
{
'role': 'user',
'content': data['query']
},
{
'role': 'assistant',
'content': '<extra_0>'.join(data['response']) + '<extra_0>'
},
]
res = _infer_model(pt_engine, messages=messages)
pt_engine.default_template.template_backend = 'jinja'
res2 = _infer_model(pt_engine, messages=messages)
assert res == res2 == json.dumps([0.9921875, 0.2490234375, 0.70703125, 0.9375]), f'res: {res}, res2: {res2}'


if __name__ == '__main__':
from swift.llm import PtEngine, RequestConfig, get_template, get_model_tokenizer
from swift.utils import get_logger, seed_everything
@@ -292,4 +341,5 @@ def test_deepseek_r1_distill():
# test_skywork_reward()
# test_phi4()
# test_internlm3()
- test_deepseek_r1_distill()
+ # test_deepseek_r1_distill()
+ test_qwen2_5_prm()

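The new `test_qwen2_5_prm` joins the four solution steps with the `<extra_0>` step separator and expects the engine to return one reward per step, serialized as the JSON string in the assert. A small sketch of consuming that output; the score values come from the assert above, the rest is illustrative:

import json

res = json.dumps([0.9921875, 0.2490234375, 0.70703125, 0.9375])  # expected PRM output from the assert
scores = json.loads(res)                                          # one reward per '<extra_0>'-terminated step
worst = min(range(len(scores)), key=scores.__getitem__)
print(f'lowest-scored step: {worst + 1} ({scores[worst]:.4f})')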