Skip to content

pdkang/llamasail

Repository files navigation

llamasail

Finetune Llama 3 with Appian SAIL code (~12000 items) and run locally with a single Nvidia GPU with 24G VRAM.

Fine-tuning Technology:

QLoRA (4-bit quantization with LoRA), final GGUF model is around 16GB in size
Base model: Meta-Llama-3-8B-Instruct
LoRA config: rank=4, alpha=16, targeting attention layers
Training: 2 epochs, batch size=1, gradient accumulation=32

Merging & Conversion Process:

Merge: PEFT to combine LoRA weights with base model
Convert: llama.cpp tools to create GGUF format
Output: f16 precision GGUF file for local inference

Notable Features:

Memory-efficient with CPU merging and garbage collection
Custom dataset handler for structured code data
Final output compatible with LLMStudio and llama.cpp

About

Finetune Llama 3 with Appian SAIL code

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages