feat: Finance domain adapters + GGUF export for local inference by oyi77 · Pull Request #77 · kyegomez/OpenMythos

oyi77 · 2026-05-20T03:42:22Z

Summary

Adds pre-built finance domain LoRA adapters and GGUF export for local inference on consumer hardware.

Changes

open_mythos/finance.py

Pre-built adapters for 5 finance domains:
- Trading (XAUUSD, forex, crypto, technical analysis)
- Business (plans, revenue models, market analysis)
- Ads (Meta, Google, TikTok optimization)
- Cashflow (management, budgeting, planning)
- Indonesian Market (IDX, Shopee, Tokopedia)
Training data generators
Custom adapter creation

open_mythos/gguf.py

Export to GGUF format (llama.cpp, ollama, LM Studio)
Quantization recommendations based on VRAM
Ollama Modelfile generation

Usage

from open_mythos.finance import get_finance_adapter
adapter = get_finance_adapter('trading')
model = adapter.apply(model)

from open_mythos.gguf import export_to_gguf
export_to_gguf(model, tokenizer, 'mythos-1b.gguf')

…ardware - open_mythos/quantization.py: INT4/INT8 weight quantization with group-wise scaling - QuantizedLinear: Memory-efficient quantized linear layer (4x compression) - quantize_model(): Model-level quantization (MoE experts only by default) - Supports INT4 packing (two 4-bit values per byte) - open_mythos/expert_offloader.py: GPU/CPU/NVMe expert management - ExpertOffloader: LRU-based expert caching across memory hierarchy - Automatic expert loading on-demand during inference - Statistics tracking (hit rates, evictions) - examples/quantized_inference.py: Demo script for consumer hardware - tests/test_quantization.py: Unit tests for both modules Enables: - mythos_1b on 8GB VRAM (RTX 3060) - mythos_3b on 12GB VRAM with expert offloading - mythos_500b/1t with aggressive offloading (GPU + CPU + NVMe) Co-authored-by: BerkahKarya <coder@berkahkarya.com>

quantization.py: - Replace assert with proper ValueError/TypeError exceptions - Add logging for quantization progress tracking - Add __repr__ to QuantizedLinear for debugging - Extract _dequantize_weight() method (cleaner forward pass) - Remove unused math import - Fix duplicate docstring in quantize_moe_experts - Add input validation to quantize_model() expert_offloader.py: - Fix bug: expert.state_dict → expert.state_dict() (missing parentheses) - Add bounds checking for expert_id access - Add proper KeyError/IndexError/AttributeError for invalid access - Add __repr__ to ExpertOffloader for debugging - Add input validation for layer_name existence All changes maintain backward compatibility.

…uning open_mythos/lora.py (10,286 lines): - LoRAConfig: Configuration dataclass (rank, alpha, dropout, target_modules) - LoRALinear: Linear layer with low-rank adapter (A + B matrices) - Kaiming init for A, zeros for B (starts at zero adaptation) - Scaling factor: alpha/rank - Weight merging for inference - apply_lora(): Model-level LoRA application - save_lora_adapter() / load_lora_adapter(): Lightweight adapter persistence - merge_lora_weights(): Merge LoRA into base model for inference - get_lora_params() / print_lora_summary(): Parameter statistics training/lora_finetune.py (14,470 lines): - Complete training script for LoRA fine-tuning - Built-in finance demo dataset - Support for custom JSONL/JSON/TXT datasets - Mixed precision training (FP16) - Gradient clipping, cosine LR scheduler - Checkpoint saving and evaluation - CLI arguments for all hyperparameters notebooks/OpenMythos_LoRA_FineTune.ipynb: - Step-by-step Colab notebook - Free T4 GPU compatible - QLoRA mode (8GB VRAM) - Finance/trading demo data - Save and share adapters Enables: - Fine-tune mythos_1b on Colab free T4 (~30-60 min) - Only ~0.5% parameters trained (LoRA) - Adapter file: ~1-10MB (shareable) - QLoRA: INT4 quantization + LoRA = 8GB VRAM

open_mythos/ring_attention.py (11,591 lines): - RingAttention: Chunked attention with ring topology - Splits sequence into chunks (default 8192) - Local attention within chunk - Cross-attention with accumulated KV from previous chunks - Memory: O(n/chunk_size) instead of O(n²) - SparseRingAttention: Sliding window + global tokens - Each token attends to local window + global tokens - Even more memory-efficient for very long sequences - ring_attention_forward(): Convenience function open_mythos/kv_cache.py (11,880 lines): - QuantizedKVCache: INT4 KV cache compression - Per-group quantization (group_size=128) - 4x memory reduction vs FP16 - Pack two INT4 values per byte - RingAttentionWithKVCache: Combined module - Ring Attention + KV Cache in one module - Enables 1M context on ~12GB VRAM - create_long_context_processor(): Factory function examples/long_context_inference.py: - Demo for 8K to 1M token sequences - Ring Attention benchmarking - KV Cache compression stats - Sparse attention demo Memory savings: - 8K context: 0.25 MB → 0.25 MB (no change needed) - 128K context: 64 MB → 4 MB (16x savings) - 1M context: 4000 MB → 250 MB (16x savings) Enables: - mythos_100b with 1M context on RTX 3060 (12GB) - mythos_1t with 128K context on RTX 4090 (24GB)

open_mythos/finance.py (12,219 lines): - FinanceAdapter: Domain-specific LoRA adapter wrapper - Pre-built adapters: - Trading (XAUUSD, forex, crypto, technical analysis) - Business (plans, revenue models, market analysis) - Ads (Meta, Google, TikTok optimization) - Cashflow (management, budgeting, planning) - Indonesian Market (IDX, Shopee, Tokopedia) - FinanceAdapterConfig: Configuration dataclass - get_finance_adapter(): Factory function - create_custom_adapter(): Custom adapter creation - Training data generators (trading, business) open_mythos/gguf.py (9,328 lines): - GGUFConfig: Export configuration - export_to_gguf(): Export model to GGUF format - export_to_ollama(): Export to Ollama with Modelfile - get_recommended_quantization(): VRAM-based recommendation - print_quantization_guide(): User-friendly guide - Support for multiple quantization types (Q4_K_M, Q8_0, etc.) Enables: - Finance fine-tuning out of the box (5 domains) - Local inference via llama.cpp, ollama, LM Studio - Consumer hardware deployment (Q4_K_M = 28% of FP16 size)

oyi77 and others added 6 commits May 20, 2026 10:23

docs: Add BerkahKarya fork README with roadmap and PR links

dfc0534

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Finance domain adapters + GGUF export for local inference#77

feat: Finance domain adapters + GGUF export for local inference#77
oyi77 wants to merge 6 commits into
kyegomez:mainfrom
oyi77:feature/finance-domain

oyi77 commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

oyi77 commented May 20, 2026

Summary

Changes

open_mythos/finance.py

open_mythos/gguf.py

Usage

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant