Commit: update readme (#390)

lostkevin authored Nov 27, 2024
1 parent 11cf249 commit 20c4a95
Showing 3 changed files with 10 additions and 6 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -3,6 +3,7 @@

| | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
|:------------|:--------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------:|
+| Qwen2-VL | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_vl/README.md#Megatron-Core模型训练流程) | N/A | N/A
| LLaVA | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llava_mcore/README.md#Megatron-Core模型训练流程) | N/A | N/A
| Qwen2.5 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
| LLama3.1 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3_1/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
@@ -20,6 +21,7 @@ English | [简体中文](./README_zh-CN.md)
Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit that lets developers easily train and run inference on LLMs & VLMs with the Megatron framework. As LLMs continue to develop, model structures and scales are evolving rapidly. Although these models can be conveniently built with the Transformers or DeepSpeed training frameworks, training efficiency is comparatively low, and the problem becomes even more severe once the model scale exceeds 10 billion parameters. The primary objective of Pai-Megatron-Patch is to make effective use of GPU compute for LLMs: it enables convenient training of commonly used LLMs with all the acceleration techniques provided by Megatron-LM.

What's New:
+- **Support training Qwen2-VL models by using Megatron-Core.** [🔥🔥 2024.11.27]
- **Support training LLaVA models by using Megatron-Core.** [🔥🔥 2024.11.20]
- **Add llm auto configurator and apply per seq sft loss for qwen2/2.5 models.** [🔥🔥 2024.10.30]
- **Upgrade deepseek-v2-moe models to support MLA via transformer engine and pipeline ckpts conversion.** [🔥🔥 2024.09.26]
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -2,6 +2,7 @@

| | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
|:------------|:--------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------:|
+| Qwen2-VL | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_vl/README.md#Megatron-Core模型训练流程) | N/A | N/A
| LLaVA | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llava_mcore/README.md#Megatron-Core模型训练流程) | N/A | N/A
| Qwen2.5 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
| LLama3.1 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3_1/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
@@ -40,6 +41,7 @@ Pai-Megatron-Patch是各类开源大模型和Megatron训练加速引擎之间的
- [Alibaba Cloud PAI wins a double championship in FewCLUE few-shot learning with large models](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)

What's New:
+- **Support training Qwen2-VL models by using Megatron-Core.** [🔥🔥 2024.11.27]
- **Support training LLaVA models by using Megatron-Core.** [🔥🔥 2024.11.20]
- **Add llm auto configurator and apply per seq sft loss for qwen2/2.5 models.** [🔥🔥 2024.10.30]
- **Upgrade deepseek-v2-moe models to support MLA via transformer engine and pipeline checkpoint conversion.** [🔥🔥 2024.09.26]
12 changes: 6 additions & 6 deletions examples/qwen2_vl/README.md
@@ -77,9 +77,9 @@ cd /workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
bash hf2mcore_qwen2_vl_convertor.sh \
7B \
/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct \
-/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp1pp1 \
-1 \
-1 \
+/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp2pp2 \
+2 \
+2 \
false \
bf16
```
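The resharding above changes only three of the converter's positional arguments: the output checkpoint directory and the two parallelism sizes, bumped from TP=1/PP=1 to TP=2/PP=2. As a rough sketch of what that implies for hardware (variable names here are illustrative, not taken from the converter script), the parallel sizes multiply into the minimum GPU count:

```shell
# Illustrative sketch: tensor-parallel (TP) and pipeline-parallel (PP)
# sizes multiply, together with the data-parallel size (DP) chosen at
# launch time, into the total number of GPUs a run occupies.
TP=2   # tensor-model-parallel size passed to the converter
PP=2   # pipeline-model-parallel size passed to the converter
DP=1   # data-parallel size, picked by the training launcher
WORLD_SIZE=$((TP * PP * DP))
echo "a TP=${TP}, PP=${PP}, DP=${DP} run needs ${WORLD_SIZE} GPUs"
```

With DP raised to 2, the same arithmetic gives 8 GPUs, which is why the checkpoint sharding and the launcher's parallelism arguments have to be chosen together.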
@@ -129,8 +129,8 @@ dsw \
1024 \
1024 \
bf16 \
-1 \
-1 \
+2 \
+2 \
1 \
true \
true \
@@ -139,7 +139,7 @@ false \
100000 \
/mnt/llava-datasets/LLaVA-Pretrain/wds \
/mnt/llava-datasets/LLaVA-Pretrain/wds \
-/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp1pp1 \
+/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp2pp2 \
20000 \
200 \
/workspace/output_mcore_qwen2vl_pretrain
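The launch-argument changes in this hunk mirror the conversion step: TP and PP go from 1 to 2, and the load path points at the matching `tp2pp2` checkpoint. A minimal, hypothetical sanity check sketch (the `tpXppY` suffix is just this repo's directory-naming convention, not something the launcher itself enforces):

```shell
# Sketch: confirm the checkpoint directory's tpXppY suffix matches the
# TP/PP sizes passed to the launcher; a mismatch would fail at load time.
TP=2
PP=2
CKPT=/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp${TP}pp${PP}
case "$CKPT" in
  *tp${TP}pp${PP}) echo "ok: ${CKPT##*-} matches TP=${TP} PP=${PP}" ;;
  *) echo "error: checkpoint sharding does not match TP=${TP} PP=${PP}" >&2; exit 1 ;;
esac
```

Encoding the sharding in the directory name, as this repo does, makes this kind of mismatch cheap to catch before a multi-GPU job is scheduled.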
