alexhristov14/leetcoding-llm

This is a LeetCode solver project covering everything from gathering the data to fine-tuning GPT-2 on it.

Gathering some data

I used two Kaggle datasets containing question titles along with their text and a solution link, and combined them to get non-overlapping problems and a larger dataset. https://www.kaggle.com/datasets/manthansolanki/leetcode-questions https://www.kaggle.com/datasets/manthansolanki/leetcode-questions
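A minimal sketch of the combine-and-deduplicate step, assuming pandas; the file names and column names here are assumptions, not the actual Kaggle schemas:

```python
import pandas as pd

# Load both Kaggle exports (file names are assumed, not the real ones)
df_a = pd.read_csv("leetcode_questions_a.csv")
df_b = pd.read_csv("leetcode_questions_b.csv")

# Stack both exports, then drop overlapping problems by normalized title
questions = pd.concat([df_a, df_b], ignore_index=True)
questions["title"] = questions["title"].str.strip().str.lower()
questions = questions.drop_duplicates(subset="title")
questions.to_csv("questions.csv", index=False)

print(f"{len(questions)} unique problems")
```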

As for the solutions, I took advantage of a GitHub repo containing most, if not all, solutions written in Python 3. This makes the job easier since I don't have to waste time and computational power scraping them myself. https://github.com/cnkyrpsgl/leetcode/blob/master/README.md

After getting my inputs and outputs, I cleaned up the data, kept the columns I needed, and mapped each problem to its code solution for easy integration. The code is in analyzing_data.ipynb. Just like that, we have around 1k problems, roughly 600 more than my previous attempt, which used a repo with around 400 code solutions.
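A hedged sketch of the mapping step: the clone path, file layout, and column names below are assumptions for illustration, and the real logic lives in analyzing_data.ipynb:

```python
from pathlib import Path
import pandas as pd

# Walk a local clone of the solutions repo (directory layout is an assumption)
solutions = {}
for path in Path("leetcode/python3").glob("*.py"):
    slug = path.stem.replace("_", " ").lower()  # e.g. "two_sum.py" -> "two sum"
    solutions[slug] = path.read_text()

# Join the combined question set against the solutions by title
questions = pd.read_csv("questions.csv")  # output of the sketch above
questions["solution"] = questions["title"].map(solutions)
dataset = questions.dropna(subset=["solution"])[["title", "text", "solution"]]
dataset.to_csv("leetcode_dataset.csv", index=False)

print(f"{len(dataset)} problems with matched solutions")
```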

Fine-tuning

A pretty straightforward ~100-line script was written in fine-tuner.py, where we use Hugging Face's transformers library for the pre-trained model and PyTorch's helper functions for building the dataloader, computing the loss, and optimizing the model. Over a couple of training runs, the loss decreased by around 0.2 per epoch. After 10 epochs the model was still awful, but I'll try again soon with GPUs for faster and longer training. If that doesn't do it, I might have to tweak the hyperparameters to improve the training. A rough sketch of the training loop is shown below.
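This is only a sketch of what such a loop typically looks like, assuming GPT-2 from transformers and a plain PyTorch training loop; the text formatting, hyperparameters, and dataset handling are assumptions and may differ from the actual fine-tuner.py:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

# Each training example joins a problem description and its solution into one string
texts = ["<problem text>\n# Solution:\n<python code>"]  # placeholder example
encodings = tokenizer(texts, truncation=True, max_length=512,
                      padding="max_length", return_tensors="pt")

dataset = TensorDataset(encodings["input_ids"], encodings["attention_mask"])
loader = DataLoader(dataset, batch_size=4, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(10):
    total_loss = 0.0
    for input_ids, attention_mask in loader:
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        # For causal LM fine-tuning, the labels are the input ids themselves
        # (a real script would usually also mask padding positions with -100)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                        labels=input_ids)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += outputs.loss.item()
    print(f"epoch {epoch}: avg loss {total_loss / len(loader):.3f}")
```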

Hopefully it doesn't suck later on.
