support Chinese-LLaMA-Alpaca-2 series models (#763)
hjh0119 authored Apr 22, 2024
1 parent 1de3579 commit c1d4dc2
Showing 5 changed files with 137 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
- 2024.04.22: Support for inference, fine-tuning, and deployment of **chinese-llama-alpaca-2** series models. This includes: chinese-llama-2-1.3b, chinese-llama-2-7b, chinese-llama-2-13b, chinese-alpaca-2-1.3b, chinese-alpaca-2-7b, and chinese-alpaca-2-13b, along with their corresponding 16k and 64k long-text versions.
- 2024.04.22: Support for inference and fine-tuning of Llama3 GPTQ-Int4, GPTQ-Int8, and AWQ series models. Support for inference and fine-tuning of chatglm3-6b-128k, Openbuddy-Llama3.
- 2024.04.20: Support for inference, fine-tuning, and deployment of **Atom** series models. This includes: Atom-7B and Atom-7B-Chat. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/atom_7b_chat/lora/sft.sh) to train.
- 2024.04.19: Support for single-card, DDP, ZeRO2, and ZeRO3 training and inference with NPU, please refer to [NPU Inference and Fine-tuning Best Practices](docs/source_en/LLM/NPU-best-practice.md).
@@ -469,6 +470,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| c4ai-command-r | [c4ai](https://cohere.com/command) | Multilingual | 35B-104B | chat model |
| WizardLM2 | [WizardLM2 series models](https://github.com/nlpxucan/WizardLM) | English | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |
| Atom | [Atom](https://github.com/LlamaFamily/Llama-Chinese) | Chinese | 7B| base model<br>chat model|
| Chinese-LLaMA-Alpaca-2 | [Chinese-LLaMA-Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | Chinese | 1.3B-13B| base model<br>chat model<br>long text model|

#### MLLMs

2 changes: 2 additions & 0 deletions README_CN.md
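@@ -40,6 +40,7 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multimodal large models) in training, inference,
In addition, we are expanding to other modalities; we currently support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
- 2024.04.22: Support for inference, fine-tuning, and deployment of **chinese-llama-alpaca-2** series models. This includes: chinese-llama-2-1.3b, chinese-llama-2-7b, chinese-llama-2-13b, chinese-alpaca-2-1.3b, chinese-alpaca-2-7b, and chinese-alpaca-2-13b, as well as the corresponding 16k and 64k long-text models.
- 2024.04.22: Support for inference and fine-tuning of Llama3 GPTQ-Int4, GPTQ-Int8, and AWQ series models. Support for inference and fine-tuning of chatglm3-6b-128k and Openbuddy-llama3.
- 2024.04.20: Support for inference, fine-tuning, and deployment of **Atom** series models. This includes: Atom-7B and Atom-7B-Chat. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/atom_7b_chat/lora/sft.sh) to start training!
- 2024.04.19: Support for single-card, DDP, ZeRO2, and ZeRO3 training and inference on NPU; see [NPU Inference and Fine-tuning Best Practices](docs/source/LLM/NPU推理与微调最佳实践.md).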
@@ -466,6 +467,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| c4ai-command-r | [c4ai](https://cohere.com/command) | Multilingual | 35B-104B | chat model |
| WizardLM2 | [WizardLM2 series models](https://github.com/nlpxucan/WizardLM) | Multilingual | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |
| Atom | [Atom](https://github.com/LlamaFamily/Llama-Chinese) | Chinese | 7B| base model<br>chat model|
| Chinese-LLaMA-Alpaca-2 | [Chinese-LLaMA-Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | Chinese | 1.3B-13B| base model<br>chat model<br>long-text model|


#### Multimodal Large Models
12 changes: 12 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
@@ -99,6 +99,18 @@
|llama3-70b-instruct-int4|[huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|-|
|llama3-70b-instruct-int8|[huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|-|
|llama3-70b-instruct-awq|[huangjintao/Meta-Llama-3-70B-Instruct-AWQ](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|autoawq|-|-|
|chinese-llama-2-1_3b|[AI-ModelScope/chinese-llama-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-1.3b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-1.3b](https://huggingface.co/hfl/chinese-llama-2-1.3b)|
|chinese-llama-2-7b|[AI-ModelScope/chinese-llama-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b](https://huggingface.co/hfl/chinese-llama-2-7b)|
|chinese-llama-2-7b-16k|[AI-ModelScope/chinese-llama-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-16k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b-16k](https://huggingface.co/hfl/chinese-llama-2-7b-16k)|
|chinese-llama-2-7b-64k|[AI-ModelScope/chinese-llama-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-64k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b-64k](https://huggingface.co/hfl/chinese-llama-2-7b-64k)|
|chinese-llama-2-13b|[AI-ModelScope/chinese-llama-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-13b](https://huggingface.co/hfl/chinese-llama-2-13b)|
|chinese-llama-2-13b-16k|[AI-ModelScope/chinese-llama-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b-16k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-13b-16k](https://huggingface.co/hfl/chinese-llama-2-13b-16k)|
|chinese-alpaca-2-1_3b|[AI-ModelScope/chinese-alpaca-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-1.3b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-1.3b](https://huggingface.co/hfl/chinese-alpaca-2-1.3b)|
|chinese-alpaca-2-7b|[AI-ModelScope/chinese-alpaca-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b](https://huggingface.co/hfl/chinese-alpaca-2-7b)|
|chinese-alpaca-2-7b-16k|[AI-ModelScope/chinese-alpaca-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-16k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b-16k](https://huggingface.co/hfl/chinese-alpaca-2-7b-16k)|
|chinese-alpaca-2-7b-64k|[AI-ModelScope/chinese-alpaca-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-64k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b-64k](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k)|
|chinese-alpaca-2-13b|[AI-ModelScope/chinese-alpaca-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-13b](https://huggingface.co/hfl/chinese-alpaca-2-13b)|
|chinese-alpaca-2-13b-16k|[AI-ModelScope/chinese-alpaca-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b-16k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-13b-16k](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k)|
|atom-7b|[FlagAlpha/Atom-7B](https://modelscope.cn/models/FlagAlpha/Atom-7B/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B](https://huggingface.co/FlagAlpha/Atom-7B)|
|atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
|llava1d6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|multi-modal, vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
12 changes: 12 additions & 0 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -99,6 +99,18 @@ The table below introduces all models supported by SWIFT:
|llama3-70b-instruct-int4|[huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|-|
|llama3-70b-instruct-int8|[huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|-|
|llama3-70b-instruct-awq|[huangjintao/Meta-Llama-3-70B-Instruct-AWQ](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|autoawq|-|-|
|chinese-llama-2-1_3b|[AI-ModelScope/chinese-llama-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-1.3b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-1.3b](https://huggingface.co/hfl/chinese-llama-2-1.3b)|
|chinese-llama-2-7b|[AI-ModelScope/chinese-llama-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b](https://huggingface.co/hfl/chinese-llama-2-7b)|
|chinese-llama-2-7b-16k|[AI-ModelScope/chinese-llama-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-16k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b-16k](https://huggingface.co/hfl/chinese-llama-2-7b-16k)|
|chinese-llama-2-7b-64k|[AI-ModelScope/chinese-llama-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-64k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b-64k](https://huggingface.co/hfl/chinese-llama-2-7b-64k)|
|chinese-llama-2-13b|[AI-ModelScope/chinese-llama-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-13b](https://huggingface.co/hfl/chinese-llama-2-13b)|
|chinese-llama-2-13b-16k|[AI-ModelScope/chinese-llama-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b-16k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-13b-16k](https://huggingface.co/hfl/chinese-llama-2-13b-16k)|
|chinese-alpaca-2-1_3b|[AI-ModelScope/chinese-alpaca-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-1.3b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-1.3b](https://huggingface.co/hfl/chinese-alpaca-2-1.3b)|
|chinese-alpaca-2-7b|[AI-ModelScope/chinese-alpaca-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b](https://huggingface.co/hfl/chinese-alpaca-2-7b)|
|chinese-alpaca-2-7b-16k|[AI-ModelScope/chinese-alpaca-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-16k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b-16k](https://huggingface.co/hfl/chinese-alpaca-2-7b-16k)|
|chinese-alpaca-2-7b-64k|[AI-ModelScope/chinese-alpaca-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-64k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b-64k](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k)|
|chinese-alpaca-2-13b|[AI-ModelScope/chinese-alpaca-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-13b](https://huggingface.co/hfl/chinese-alpaca-2-13b)|
|chinese-alpaca-2-13b-16k|[AI-ModelScope/chinese-alpaca-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b-16k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-13b-16k](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k)|
|atom-7b|[FlagAlpha/Atom-7B](https://modelscope.cn/models/FlagAlpha/Atom-7B/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B](https://huggingface.co/FlagAlpha/Atom-7B)|
|atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
|llava1d6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|multi-modal, vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
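The twelve rows added to the table follow a regular pattern, which the following minimal sketch makes explicit. It is an illustration only, not code from the commit: `VARIANTS`, `TEMPLATES`, and `table_rows` are hypothetical names, and the values are transcribed from the rows above. Note that the 1.3b models have no long-text variants and the 13b models have no 64k variant.

```python
# Size/context variants that appear in the table rows above.
VARIANTS = ['1.3b', '7b', '7b-16k', '7b-64k', '13b', '13b-16k']

# Base (llama) models use the generation template;
# chat (alpaca) models use the llama chat template.
TEMPLATES = {'llama': 'default-generation', 'alpaca': 'llama'}


def table_rows():
    """Rebuild the model type, hub IDs, and template for each added row."""
    rows = []
    for family, template in TEMPLATES.items():
        for variant in VARIANTS:
            name = f'chinese-{family}-2-{variant}'
            rows.append({
                # The model type replaces the dot in '1.3b' with an underscore.
                'model_type': name.replace('.', '_'),
                'modelscope_id': f'AI-ModelScope/{name}',
                'hf_id': f'hfl/{name}',
                'template': template,
            })
    return rows


rows = table_rows()
for row in rows:
    print(row['model_type'], row['modelscope_id'], row['hf_id'], row['template'])
```

Every ModelScope ID in these rows mirrors the Hugging Face ID with the `AI-ModelScope/` prefix in place of `hfl/`, so one name is enough to derive both.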
110 changes: 109 additions & 1 deletion swift/llm/utils/model.py
@@ -136,7 +136,19 @@ class ModelType:
    llama3_70b_instruct_int4 = 'llama3-70b-instruct-int4'
    llama3_70b_instruct_int8 = 'llama3-70b-instruct-int8'
    llama3_70b_instruct_awq = 'llama3-70b-instruct-awq'

    # chinese-llama-alpaca-2
    chinese_llama_2_1_3b = 'chinese-llama-2-1_3b'
    chinese_llama_2_7b = 'chinese-llama-2-7b'
    chinese_llama_2_7b_16k = 'chinese-llama-2-7b-16k'
    chinese_llama_2_7b_64k = 'chinese-llama-2-7b-64k'
    chinese_llama_2_13b = 'chinese-llama-2-13b'
    chinese_llama_2_13b_16k = 'chinese-llama-2-13b-16k'
    chinese_alpaca_2_1_3b = 'chinese-alpaca-2-1_3b'
    chinese_alpaca_2_7b = 'chinese-alpaca-2-7b'
    chinese_alpaca_2_7b_16k = 'chinese-alpaca-2-7b-16k'
    chinese_alpaca_2_7b_64k = 'chinese-alpaca-2-7b-64k'
    chinese_alpaca_2_13b = 'chinese-alpaca-2-13b'
    chinese_alpaca_2_13b_16k = 'chinese-alpaca-2-13b-16k'
    # atom
    atom_7b = 'atom-7b'
    atom_7b_chat = 'atom-7b-chat'
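The constants above appear to follow a naming convention: the string value is the lower-cased repository name with `.` replaced by `_`, and the attribute name is that string with `-` replaced by `_`. The sketch below illustrates this under that assumption. `model_type_from_repo` and `attribute_name` are hypothetical helpers that do not exist in SWIFT, and other entries in the file (for example `llava1d6-mistral-7b-instruct`) do not follow the same rule.

```python
def model_type_from_repo(repo_id: str) -> str:
    """Derive a model type string from a hub repository ID (hypothetical helper)."""
    name = repo_id.split('/')[-1]          # drop the 'AI-ModelScope/' or 'hfl/' prefix
    return name.lower().replace('.', '_')  # '1.3b' becomes '1_3b'


def attribute_name(model_type: str) -> str:
    """Turn a model type string into a valid Python attribute name."""
    return model_type.replace('-', '_')


model_type = model_type_from_repo('AI-ModelScope/chinese-llama-2-1.3b')
attr = attribute_name(model_type)
print(model_type, attr, attr.isidentifier())
```

Replacing the dot keeps the size suffix readable while leaving the string usable anywhere a dot would be awkward, such as in file names or command-line values.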
@@ -641,6 +653,102 @@ def _new_forward(self, x):
    support_vllm=False,
    support_flash_attn=True,
    hf_model_id='CohereForAI/c4ai-command-r-plus')
@register_model(
    ModelType.chinese_llama_2_1_3b,
    'AI-ModelScope/chinese-llama-2-1.3b',
    LoRATM.llama2,
    TemplateType.default_generation,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-llama-2-1.3b')
@register_model(
    ModelType.chinese_llama_2_7b,
    'AI-ModelScope/chinese-llama-2-7b',
    LoRATM.llama2,
    TemplateType.default_generation,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-llama-2-7b')
@register_model(
    ModelType.chinese_llama_2_7b_16k,
    'AI-ModelScope/chinese-llama-2-7b-16k',
    LoRATM.llama2,
    TemplateType.default_generation,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-llama-2-7b-16k')
@register_model(
    ModelType.chinese_llama_2_7b_64k,
    'AI-ModelScope/chinese-llama-2-7b-64k',
    LoRATM.llama2,
    TemplateType.default_generation,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-llama-2-7b-64k')
@register_model(
    ModelType.chinese_llama_2_13b,
    'AI-ModelScope/chinese-llama-2-13b',
    LoRATM.llama2,
    TemplateType.default_generation,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-llama-2-13b')
@register_model(
    ModelType.chinese_llama_2_13b_16k,
    'AI-ModelScope/chinese-llama-2-13b-16k',
    LoRATM.llama2,
    TemplateType.default_generation,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-llama-2-13b-16k')
@register_model(
    ModelType.chinese_alpaca_2_1_3b,
    'AI-ModelScope/chinese-alpaca-2-1.3b',
    LoRATM.llama2,
    TemplateType.llama,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-alpaca-2-1.3b')
@register_model(
    ModelType.chinese_alpaca_2_7b,
    'AI-ModelScope/chinese-alpaca-2-7b',
    LoRATM.llama2,
    TemplateType.llama,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-alpaca-2-7b')
@register_model(
    ModelType.chinese_alpaca_2_7b_16k,
    'AI-ModelScope/chinese-alpaca-2-7b-16k',
    LoRATM.llama2,
    TemplateType.llama,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-alpaca-2-7b-16k')
@register_model(
    ModelType.chinese_alpaca_2_7b_64k,
    'AI-ModelScope/chinese-alpaca-2-7b-64k',
    LoRATM.llama2,
    TemplateType.llama,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-alpaca-2-7b-64k')
@register_model(
    ModelType.chinese_alpaca_2_13b,
    'AI-ModelScope/chinese-alpaca-2-13b',
    LoRATM.llama2,
    TemplateType.llama,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-alpaca-2-13b')
@register_model(
    ModelType.chinese_alpaca_2_13b_16k,
    'AI-ModelScope/chinese-alpaca-2-13b-16k',
    LoRATM.llama2,
    TemplateType.llama,
    support_vllm=True,
    support_flash_attn=True,
    hf_model_id='hfl/chinese-alpaca-2-13b-16k')
def get_model_tokenizer_from_repo(model_dir: str,
                                  torch_dtype: Optional[Dtype],
                                  model_kwargs: Dict[str, Any],
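The hunk above stacks one `@register_model(...)` decorator per model on a single loader function. The following self-contained sketch shows how such a stacked-decorator registry can work in general. It is a simplified stand-in, not SWIFT's implementation: this `register_model`, the `MODEL_MAPPING` dictionary, and the shortened loader signature are assumptions made for illustration, and the real function in `swift/llm/utils/model.py` accepts more parameters.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical registry: model type string -> metadata plus loader function.
MODEL_MAPPING: Dict[str, Dict[str, Any]] = {}


def register_model(model_type: str,
                   model_id: str,
                   lora_target_modules: List[str],
                   template: str,
                   *,
                   support_vllm: bool = False,
                   support_flash_attn: bool = False,
                   hf_model_id: Optional[str] = None) -> Callable:
    """Return a decorator that records the loader under `model_type`."""

    def decorator(get_function: Callable) -> Callable:
        if model_type in MODEL_MAPPING:
            raise ValueError(f'model_type {model_type!r} is already registered')
        MODEL_MAPPING[model_type] = {
            'model_id': model_id,
            'lora_target_modules': lora_target_modules,
            'template': template,
            'support_vllm': support_vllm,
            'support_flash_attn': support_flash_attn,
            'hf_model_id': hf_model_id,
            'get_function': get_function,
        }
        # Return the function unchanged so further decorators can be stacked.
        return get_function

    return decorator


LLAMA2_LORA_TARGETS = ['q_proj', 'k_proj', 'v_proj']


@register_model('chinese-llama-2-7b', 'AI-ModelScope/chinese-llama-2-7b',
                LLAMA2_LORA_TARGETS, 'default-generation',
                support_vllm=True, support_flash_attn=True,
                hf_model_id='hfl/chinese-llama-2-7b')
@register_model('chinese-alpaca-2-7b', 'AI-ModelScope/chinese-alpaca-2-7b',
                LLAMA2_LORA_TARGETS, 'llama',
                support_vllm=True, support_flash_attn=True,
                hf_model_id='hfl/chinese-alpaca-2-7b')
def get_model_tokenizer_from_repo(model_dir: str) -> str:
    # Placeholder body: a real loader would build the model and tokenizer here.
    return f'would load model and tokenizer from {model_dir}'


for name, info in MODEL_MAPPING.items():
    print(name, info['template'], info['get_function'].__name__)
```

Because each decorator returns the function unchanged, all registered model types share one loader and differ only in their metadata, which is why adding a model family that reuses an existing architecture needs no new loading code. Decorators are applied bottom-up, so in this sketch the entry nearest the `def` is registered first.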