[SUBMISSION] December 2024 Completion of Module 2 #153

Open · wants to merge 5 commits into base: december-2024
Conversation

@ShankarChavan commented Dec 31, 2024

December 2024 Student Submission

Module Completed

  • Module 1: Instruction Tuning
  • Module 2: Preference Alignment
  • Module 3: Parameter-efficient Fine-tuning
  • Module 4: Evaluation
  • Module 5: Vision-language Models
  • Module 6: Synthetic Datasets
  • Module 7: Inference
  • Module 8: Deployment

Changes Made

Describe what you've done in this PR:

  1. What concepts did you learn?
    I learned about the DPO and ORPO methods, which I was not previously aware of, and how they can be applied to align a base LLM instead of using RLHF. I also learned that the dataset format these methods require is critical.

    In the process I came across the Argilla and distilabel tools, which can be leveraged to prepare the preference (DPO) datasets needed for these methods.

    I also found that the DPO alignment method on its own was not up to the mark, while ORPO is better optimized than DPO; however, ORPO requires a non-SFT (base) LLM for alignment.

  2. What changes or additions did you make?
    I tried the truthy-dpo-v0.1 dataset with the DPO method and also added an inference call for the newly trained smol-135M LLM; a rough sketch of this setup is shown after this list.

    For ORPO I used the same ultrafeedback_binarized dataset as in the module notebook.

  3. Any challenges you faced?
    Yes, aligning with ORPO on the ultrafeedback_binarized dataset took about 3 hours, but I'm happy it completed. (The ORPO setup is also sketched after this list.)
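
For context, the DPO setup looks roughly like the sketch below. The hub ids (`HuggingFaceTB/SmolLM2-135M-Instruct` for the smol-135M model, `jondurbin/truthy-dpo-v0.1` for the dataset) and the hyperparameters are assumed placeholders for illustration, not the exact values from the notebook.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed hub id for the smol-135M model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# truthy-dpo already ships explicit prompt | chosen | rejected columns;
# keep only the three the trainer needs (drops extra metadata like system/source)
dataset = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")
dataset = dataset.select_columns(["prompt", "chosen", "rejected"])

args = DPOConfig(
    output_dir="smollm-135m-dpo",
    beta=0.1,                       # illustrative preference-loss strength
    per_device_train_batch_size=2,  # illustrative, not the exact run settings
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
)
trainer.train()

# quick inference check on the aligned model (prompt is just an example)
inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```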
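
The ORPO run follows the same pattern but starts from the base (non-SFT) model and the ultrafeedback_binarized preferences. Again, the hub ids, split name, and hyperparameters are assumptions for illustration; the conversational chosen/rejected pairs are flattened to plain strings so the sketch stays independent of the TRL version.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "HuggingFaceTB/SmolLM2-135M"  # base (non-SFT) model; hub id assumed
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def flatten(example):
    # chosen/rejected here are message lists; keep only plain strings
    return {
        "prompt": example["chosen"][0]["content"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dataset = dataset.map(flatten, remove_columns=dataset.column_names)

args = ORPOConfig(
    output_dir="smollm-135m-orpo",
    per_device_train_batch_size=2,  # illustrative; the real run took ~3 hours
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
)
trainer.train()
```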

Notebooks Added/Modified

List any notebooks you've added or modified:

  • Added new example in 2_preference_alignment/student_examples/ShankarChavan/dpo_finetuning_example.ipynb
  • Modified existing notebook with additional examples
  • Added documentation or comments

Checklist

  • I have read the module materials
  • My code runs without errors
  • I have pushed models and datasets to the huggingface hub
  • My PR is based on the december-2024 branch

Questions or Discussion Points

Add any questions you have or points you'd like to discuss:

  1. I found it difficult to decide which DPO dataset format to use, e.g., prompt|chosen|rejected (truthy_dpo) versus chosen|rejected (ultrafeedback); the sketch below shows the two layouts I mean.
    How do we decide between them? Could you point me to a link or blog post that explains the choice based on the base LLM we are using for alignment?
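
To make the question concrete, here are the two row layouts being compared (the values are made-up examples, not rows from either dataset):

```python
# truthy_dpo style: explicit prompt column, plain-string completions
truthy_row = {
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "The capital of France is Lyon.",
}

# ultrafeedback_binarized style: chosen/rejected are full conversations
# that embed the prompt as the first user turn
ultrafeedback_row = {
    "chosen": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    "rejected": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Lyon."},
    ],
}
```

The first layout keeps the prompt explicit, while the second embeds it as the first user turn of each conversation.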

Additional Notes

Any other information that might be helpful for reviewers:
