Commit: update readme (#390)

lostkevin authored Nov 27, 2024
1 parent 11cf249 commit 20c4a95
Showing 3 changed files with 10 additions and 6 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -3,6 +3,7 @@

| | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
|:------------|:--------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------:|
+| Qwen2-VL | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_vl/README.md#Megatron-Core模型训练流程) | N/A | N/A
| LLaVA | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llava_mcore/README.md#Megatron-Core模型训练流程) | N/A | N/A
| Qwen2.5 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
| LLama3.1 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3_1/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
@@ -20,6 +21,7 @@ English | [简体中文](./README_zh-CN.md)
Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit that lets developers easily train and run inference on LLMs & VLMs with the Megatron framework. As LLMs continue to develop, model structures and scales are evolving rapidly. Although these models can be conveniently built with the Transformers or DeepSpeed training frameworks, training efficiency is comparatively low, and the problem becomes even more severe once the model scale exceeds 10 billion parameters. The primary objective of Pai-Megatron-Patch is to make effective use of GPU compute for LLMs: it enables convenient training of commonly used LLMs with all the acceleration techniques provided by Megatron-LM.

What's New:
+- **Support training Qwen2-VL models by using Megatron-Core.** [🔥🔥 2024.11.27]
- **Support training LLaVA models by using Megatron-Core.** [🔥🔥 2024.11.20]
- **Add llm auto configurator and apply per seq sft loss for qwen2/2.5 models.** [🔥🔥 2024.10.30]
- **Upgrade deepseek-v2-moe models to support MLA via transformer engine and pipeline ckpts conversion.** [🔥🔥 2024.09.26]
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -2,6 +2,7 @@

| | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
|:------------|:--------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------:|
+| Qwen2-VL | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_vl/README.md#Megatron-Core模型训练流程) | N/A | N/A
| LLaVA | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llava_mcore/README.md#Megatron-Core模型训练流程) | N/A | N/A
| Qwen2.5 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
| LLama3.1 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3_1/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A
@@ -40,6 +41,7 @@ Pai-Megatron-Patch是各类开源大模型和Megatron训练加速引擎之间的
- [Alibaba Cloud PAI wins a double championship in FewCLUE few-shot learning with large models](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)

What's New:
+- **Support training Qwen2-VL models by using Megatron-Core.** [🔥🔥 2024.11.27]
- **Support training LLaVA models by using Megatron-Core.** [🔥🔥 2024.11.20]
- **Add llm auto configurator and apply per seq sft loss for qwen2/2.5 models.** [🔥🔥 2024.10.30]
- **Upgrade deepseek-v2-moe models to support MLA via transformer engine and pipeline checkpoint conversion.** [🔥🔥 2024.09.26]
12 changes: 6 additions & 6 deletions examples/qwen2_vl/README.md
@@ -77,9 +77,9 @@ cd /workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
bash hf2mcore_qwen2_vl_convertor.sh \
7B \
/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct \
-/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp1pp1 \
-1 \
-1 \
+/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp2pp2 \
+2 \
+2 \
false \
bf16
```
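The resharding above changes only three of the converter's positional arguments: the output checkpoint directory and the two parallelism sizes, bumped from TP=1/PP=1 to TP=2/PP=2. As a rough sketch of what that implies for hardware (variable names here are illustrative, not taken from the converter script), the parallel sizes multiply into the minimum GPU count:

```shell
# Illustrative sketch: tensor-parallel (TP) and pipeline-parallel (PP)
# sizes multiply, together with the data-parallel size (DP) chosen at
# launch time, into the total number of GPUs a run occupies.
TP=2   # tensor-model-parallel size passed to the converter
PP=2   # pipeline-model-parallel size passed to the converter
DP=1   # data-parallel size, picked by the training launcher
WORLD_SIZE=$((TP * PP * DP))
echo "a TP=${TP}, PP=${PP}, DP=${DP} run needs ${WORLD_SIZE} GPUs"
```

With DP raised to 2, the same arithmetic gives 8 GPUs, which is why the checkpoint sharding and the launcher's parallelism arguments have to be chosen together.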
@@ -129,8 +129,8 @@ dsw \
1024 \
1024 \
bf16 \
-1 \
-1 \
+2 \
+2 \
1 \
true \
true \
@@ -139,7 +139,7 @@ false \
100000 \
/mnt/llava-datasets/LLaVA-Pretrain/wds \
/mnt/llava-datasets/LLaVA-Pretrain/wds \
-/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp1pp1 \
+/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp2pp2 \
20000 \
200 \
/workspace/output_mcore_qwen2vl_pretrain
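The launch-argument changes in this hunk mirror the conversion step: TP and PP go from 1 to 2, and the load path points at the matching `tp2pp2` checkpoint. A minimal, hypothetical sanity check sketch (the `tpXppY` suffix is just this repo's directory-naming convention, not something the launcher itself enforces):

```shell
# Sketch: confirm the checkpoint directory's tpXppY suffix matches the
# TP/PP sizes passed to the launcher; a mismatch would fail at load time.
TP=2
PP=2
CKPT=/mnt/qwen2-vl-ckpts/Qwen2-VL-7B-Instruct-tp${TP}pp${PP}
case "$CKPT" in
  *tp${TP}pp${PP}) echo "ok: ${CKPT##*-} matches TP=${TP} PP=${PP}" ;;
  *) echo "error: checkpoint sharding does not match TP=${TP} PP=${PP}" >&2; exit 1 ;;
esac
```

Encoding the sharding in the directory name, as this repo does, makes this kind of mismatch cheap to catch before a multi-GPU job is scheduled.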
