feat: LoRA training pipeline + Colab notebook for free GPU fine-tuning #75

Open
oyi77 wants to merge 4 commits into kyegomez:main from oyi77:feature/lora-training

oyi77 commented May 20, 2026

Summary

Adds LoRA (Low-Rank Adaptation) support for parameter-efficient fine-tuning of OpenMythos models. Includes a complete training pipeline and a Colab notebook for training on a free GPU.

Changes

open_mythos/lora.py

  • LoRAConfig: Configuration (rank, alpha, dropout, target_modules)
  • LoRALinear: Linear layer with a low-rank adapter (A and B matrices)
  • apply_lora(): Model-level LoRA application
  • save_lora_adapter() / load_lora_adapter(): Lightweight adapter persistence (~1-10MB)
  • merge_lora_weights(): Merge LoRA into base model for inference

training/lora_finetune.py

  • Complete CLI training script
  • Built-in finance demo dataset
  • Mixed precision (FP16), gradient clipping, cosine LR scheduler
  • Custom dataset support (JSONL/JSON/TXT)

notebooks/OpenMythos_LoRA_FineTune.ipynb

  • Step-by-step Colab notebook (free T4 GPU)
  • QLoRA mode for 8GB VRAM
  • Finance/trading demo data

Usage

from open_mythos import OpenMythos, mythos_1b
from open_mythos.lora import LoRAConfig, apply_lora, save_lora_adapter

model = OpenMythos(mythos_1b())
model = apply_lora(model, LoRAConfig(rank=16, alpha=32))
# Train...
save_lora_adapter(model, 'my_adapter.pt')

CLI

# Standard LoRA (16GB VRAM)
python training/lora_finetune.py --variant 1b --dataset finance

# QLoRA (8GB VRAM, fits Colab free T4)
python training/lora_finetune.py --variant 1b --dataset finance --qlora

Key Features

  • Only ~0.5% of parameters are trained (LoRA)
  • Adapter file: ~1-10MB (shareable)
  • QLoRA (INT4 quantization + LoRA) fits in 8GB VRAM
  • Free GPU compatible (Colab T4, Kaggle)
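
The trained-parameter fraction and adapter size follow from simple arithmetic: a rank-r adapter on a d_in x d_out linear layer adds r * (d_in + d_out) weights. The sketch below works through that calculation. The hidden size, layer count, number of adapted projections, and base parameter count are hypothetical placeholders, not the actual mythos_1b configuration.

```python
def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """A is (rank x d_in) and B is (d_out x rank): rank * (d_in + d_out) weights."""
    return rank * (d_in + d_out)


# Hypothetical dimensions, chosen only to make the arithmetic concrete.
HIDDEN = 2048                 # assumed model width
NUM_LAYERS = 16               # assumed number of transformer blocks
TARGETS_PER_LAYER = 2         # assumed: two adapted projections per block
RANK = 16                     # matches the rank used in the usage example
BASE_PARAMS = 1_000_000_000   # nominal size of a "1b" model

trainable = lora_params_per_layer(HIDDEN, HIDDEN, RANK) * TARGETS_PER_LAYER * NUM_LAYERS
fraction = trainable / BASE_PARAMS
size_mb = trainable * 2 / 1e6  # 2 bytes per weight when stored as FP16

print(f"trainable LoRA parameters: {trainable:,}")
print(f"fraction of base model:    {fraction:.2%}")
print(f"adapter size at FP16:      {size_mb:.1f} MB")
```

With these placeholder numbers the adapter has about 2.1M weights (0.21% of the base model) and takes about 4.2 MB at FP16. Adapting more projections or raising the rank scales both figures linearly.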

oyi77 and others added 4 commits May 20, 2026 10:23
…ardware

- open_mythos/quantization.py: INT4/INT8 weight quantization with group-wise scaling
  - QuantizedLinear: Memory-efficient quantized linear layer (4x compression)
  - quantize_model(): Model-level quantization (MoE experts only by default)
  - Supports INT4 packing (two 4-bit values per byte)

- open_mythos/expert_offloader.py: GPU/CPU/NVMe expert management
  - ExpertOffloader: LRU-based expert caching across memory hierarchy
  - Automatic expert loading on-demand during inference
  - Statistics tracking (hit rates, evictions)

- examples/quantized_inference.py: Demo script for consumer hardware
- tests/test_quantization.py: Unit tests for both modules
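
As an illustration of INT4 packing and group-wise scaling, the following dependency-free sketch quantizes one group of weights with a shared scale and stores two 4-bit codes per byte. It is not the implementation in open_mythos/quantization.py: the function names, the symmetric scheme, and the low-nibble-first layout are assumptions for this example.

```python
def quantize_group(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric INT4 quantization of one group sharing a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the signed range [-8, 7], then shift to unsigned codes 0..15.
    codes = [max(-8, min(7, round(w / scale))) + 8 for w in weights]
    return codes, scale


def dequantize_group(codes: list[int], scale: float) -> list[float]:
    """Invert the shift and rescale back to floating point."""
    return [(c - 8) * scale for c in codes]


def pack_int4(codes: list[int]) -> bytes:
    """Pack 4-bit codes two per byte, low nibble first (layout is an assumption)."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("INT4 codes must be in the range 0..15")
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_int4(packed: bytes, count: int) -> list[int]:
    """Recover the first `count` 4-bit codes from packed bytes."""
    out: list[int] = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]
```

Smaller groups track local weight ranges more closely at the cost of storing more scales. A real layer would apply the same idea to tensors rather than Python lists.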

Enables:
- mythos_1b on 8GB VRAM (RTX 3060)
- mythos_3b on 12GB VRAM with expert offloading
- mythos_500b/1t with aggressive offloading (GPU + CPU + NVMe)
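
The LRU-based expert caching described above can be sketched with an `OrderedDict`. This is a simplified single-tier illustration, not the ExpertOffloader from this commit: the class name `LRUExpertCache`, the `load_fn` callback (standing in for a CPU or NVMe fetch), and the counter names are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable


class LRUExpertCache:
    """Keep at most `capacity` experts resident; evict the least recently used."""

    def __init__(self, capacity: int, load_fn: Callable[[int], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_fn = load_fn  # hypothetical loader for a slower memory tier
        self._cache: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, expert_id: int) -> Any:
        if expert_id in self._cache:
            self.hits += 1
            self._cache.move_to_end(expert_id)  # mark as most recently used
            return self._cache[expert_id]
        self.misses += 1
        expert = self.load_fn(expert_id)        # load on demand
        self._cache[expert_id] = expert
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)     # drop the least recently used
            self.evictions += 1
        return expert

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A multi-tier offloader would chain caches like this one, so that an eviction from the GPU tier demotes the expert to CPU memory instead of discarding it.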

Co-authored-by: BerkahKarya <coder@berkahkarya.com>
quantization.py:
- Replace assert with proper ValueError/TypeError exceptions
- Add logging for quantization progress tracking
- Add __repr__ to QuantizedLinear for debugging
- Extract _dequantize_weight() method (cleaner forward pass)
- Remove unused math import
- Fix duplicate docstring in quantize_moe_experts
- Add input validation to quantize_model()

expert_offloader.py:
- Fix bug: expert.state_dict → expert.state_dict() (missing parentheses)
- Add bounds checking for expert_id access
- Add proper KeyError/IndexError/AttributeError for invalid access
- Add __repr__ to ExpertOffloader for debugging
- Add input validation for layer_name existence

All changes maintain backward compatibility.
…uning

open_mythos/lora.py (10,286 lines):
- LoRAConfig: Configuration dataclass (rank, alpha, dropout, target_modules)
- LoRALinear: Linear layer with low-rank adapter (A + B matrices)
  - Kaiming init for A, zeros for B (starts at zero adaptation)
  - Scaling factor: alpha/rank
  - Weight merging for inference
- apply_lora(): Model-level LoRA application
- save_lora_adapter() / load_lora_adapter(): Lightweight adapter persistence
- merge_lora_weights(): Merge LoRA into base model for inference
- get_lora_params() / print_lora_summary(): Parameter statistics
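
To make the LoRALinear description above concrete, here is a minimal PyTorch sketch of a linear layer with a low-rank adapter. It is an illustration under stated assumptions, not the class from open_mythos/lora.py: the name `LoRALinearSketch`, the `a=sqrt(5)` argument to the Kaiming initializer, and the `merge()` method signature are choices made for this example.

```python
import math

import torch
import torch.nn as nn


class LoRALinearSketch(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update scaled by alpha / rank."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0,
                 dropout: float = 0.0) -> None:
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # only the adapter is trained
        self.scaling = alpha / rank
        self.lora_A = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        # Kaiming init for A and zeros for B, so the initial update is zero.
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = self.dropout(x) @ self.lora_A.T @ self.lora_B.T
        return self.base(x) + self.scaling * update

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold the adapter into the base weight and return the plain layer.

        After merging, use the returned layer directly; calling this module
        again would apply the update twice.
        """
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
        return self.base
```

Because B starts at zero, the wrapped layer reproduces the base layer's output exactly before training. Merging removes the extra matrix multiplications at inference time.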

training/lora_finetune.py (14,470 lines):
- Complete training script for LoRA fine-tuning
- Built-in finance demo dataset
- Support for custom JSONL/JSON/TXT datasets
- Mixed precision training (FP16)
- Gradient clipping, cosine LR scheduler
- Checkpoint saving and evaluation
- CLI arguments for all hyperparameters
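
The cosine LR schedule and gradient clipping mentioned above can be sketched in plain Python. The function names and the optional linear warmup are assumptions added for illustration. In a PyTorch training loop the same roles are typically filled by `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.nn.utils.clip_grad_norm_`.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Learning rate at `step`: optional linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Warmup is an assumption; the source only mentions a cosine schedule.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads: list[float], max_norm: float) -> list[float]:
    """Rescale gradients so their L2 norm does not exceed `max_norm`."""
    total = math.sqrt(sum(g * g for g in grads))
    if total == 0.0 or total <= max_norm:
        return list(grads)
    return [g * max_norm / total for g in grads]
```

Clipping bounds the size of any single update, which matters most under FP16, where occasional large gradients are more likely to destabilize training.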

notebooks/OpenMythos_LoRA_FineTune.ipynb:
- Step-by-step Colab notebook
- Free T4 GPU compatible
- QLoRA mode (8GB VRAM)
- Finance/trading demo data
- Save and share adapters

Enables:
- Fine-tune mythos_1b on Colab free T4 (~30-60 min)
- Only ~0.5% parameters trained (LoRA)
- Adapter file: ~1-10MB (shareable)
- QLoRA: INT4 quantization + LoRA = 8GB VRAM