Fine-tune Llama 3 on Appian SAIL code (~12,000 items) and run it locally on a single NVIDIA GPU with 24 GB of VRAM.
Fine-tuning Technology:
QLoRA (4-bit quantization with LoRA); the final GGUF model is roughly 16 GB
Base model: Meta-Llama-3-8B-Instruct
LoRA config: rank=4, alpha=16, targeting attention layers
Training: 2 epochs, batch size=1, gradient accumulation=32 (a configuration sketch follows this list)
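
A minimal sketch of this setup, assuming the Hugging Face transformers/peft/bitsandbytes stack. The learning rate, dropout, and output path are illustrative assumptions; the quantization, LoRA, and training values mirror the list above.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"

# QLoRA: load the 8B base in 4-bit NF4 so it fits alongside activations
# and optimizer state on a 24 GB card.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections, rank=4, alpha=16.
lora = LoraConfig(
    r=4, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,          # assumption: dropout not stated above
    bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Batch size 1 with 32 accumulation steps gives an effective batch of 32.
args = TrainingArguments(
    output_dir="sail-lora",     # assumption: output path for illustration
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=2e-4,         # assumption: a common QLoRA default
    bf16=True,
    logging_steps=10,
)
```

From here, a standard Trainer would take these arguments together with the model and the dataset handler described under Notable Features.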
Merging & Conversion Process:
Merge: PEFT merges the LoRA adapter weights into the base model
Convert: llama.cpp's conversion tooling produces the GGUF format
Output: an f16-precision GGUF file for local inference (a sketch of both steps follows this list)
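
A sketch of the merge-and-convert step under the same assumptions. The adapter and output paths and the location of the llama.cpp checkout are placeholders, and the converter script's name has varied across llama.cpp versions.

```python
import gc
import subprocess

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"

# Merge on the CPU in fp16 so the 24 GB GPU is not required for this step.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="cpu")
merged = PeftModel.from_pretrained(base, "sail-lora").merge_and_unload()
merged.save_pretrained("sail-merged")
AutoTokenizer.from_pretrained(BASE).save_pretrained("sail-merged")

# Free the ~16 GB of CPU tensors before launching the converter.
del base, merged
gc.collect()

# llama.cpp ships a Python converter (convert_hf_to_gguf.py in recent
# checkouts; older trees named it convert-hf-to-gguf.py).
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "sail-merged",
     "--outtype", "f16", "--outfile", "sail-f16.gguf"],
    check=True)
```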
Notable Features:
Memory-efficient: merging runs on the CPU with explicit garbage collection, keeping the GPU free during the merge
Custom dataset handler for structured code data (a minimal sketch follows this list)
Final output compatible with LM Studio and llama.cpp (an inference example follows below)
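
For the dataset handler, a minimal sketch assuming a JSONL file with prompt/completion fields; the repo's actual field names and file format may differ.

```python
import json

from torch.utils.data import Dataset


class SailCodeDataset(Dataset):
    """Wraps structured SAIL examples as chat-formatted training samples."""

    def __init__(self, path, tokenizer, max_len=2048):
        with open(path, encoding="utf-8") as f:
            self.items = [json.loads(line) for line in f]
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        item = self.items[i]
        # Render one prompt/completion pair with the model's chat template.
        text = self.tokenizer.apply_chat_template(
            [{"role": "user", "content": item["prompt"]},
             {"role": "assistant", "content": item["completion"]}],
            tokenize=False)
        ids = self.tokenizer(
            text, truncation=True, max_length=self.max_len,
            return_tensors="pt").input_ids[0]
        return {"input_ids": ids, "labels": ids.clone()}
```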
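
And a quick smoke test of the exported GGUF via llama-cpp-python (LM Studio can load the same file directly); the model path and prompt are placeholders.

```python
from llama_cpp import Llama

# Load the exported GGUF; n_gpu_layers=-1 offloads every layer to the GPU
# (the f16 8B file fits in 24 GB of VRAM).
llm = Llama(model_path="sail-f16.gguf", n_gpu_layers=-1, n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Write a SAIL expression that sums a list of integers."}],
    max_tokens=256)
print(out["choices"][0]["message"]["content"])
```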