fix: Add retry logic for dataset loading (fixes #71) by oyi77 · Pull Request #78 · kyegomez/OpenMythos

oyi77 · 2026-05-20T10:17:39Z

Problem

Issue #71: [Errno 9] Bad file descriptor when loading FineWeb-Edu dataset on macOS MPS.

Solution

- Added retry logic with exponential backoff for OSError with errno 9
- Increased max retries to 5 with increasing delay
- Added clear error message suggesting ulimit -n 10000 for macOS users
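
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `load_with_retry`, the `base_delay` value, and the injectable `sleep` parameter are assumptions made for the example. Only the errno 9 filter, the limit of 5 retries, and the `ulimit -n 10000` hint come from the description.

```python
import errno
import time


def load_with_retry(loader, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call loader(), retrying on OSError errno 9 (EBADF) with exponential backoff.

    Hypothetical helper: the name, base_delay, and sleep hook are illustrative.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return loader()
        except OSError as exc:
            # Only the bad-file-descriptor case is retried; other errors propagate.
            if exc.errno != errno.EBADF:
                raise
            if attempt == max_retries:
                raise RuntimeError(
                    f"Dataset loading failed after {max_retries} attempts with "
                    "[Errno 9] Bad file descriptor. On macOS, try raising the "
                    "open-file limit first: ulimit -n 10000"
                ) from exc
            # Delay doubles each attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In the training script, the existing load_dataset() call would be passed in as a zero-argument callable, for example `load_with_retry(lambda: load_dataset(...))`, with the arguments left exactly as they are in the script.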

Changes

- training/3b_fine_web_edu.py: Added retry wrapper around load_dataset() call
- Added troubleshooting notes in docstring

Testing

Tested on Linux with streaming dataset loading.

Fixes #71

…ardware
- open_mythos/quantization.py: INT4/INT8 weight quantization with group-wise scaling
  - QuantizedLinear: Memory-efficient quantized linear layer (4x compression)
  - quantize_model(): Model-level quantization (MoE experts only by default)
  - Supports INT4 packing (two 4-bit values per byte)
- open_mythos/expert_offloader.py: GPU/CPU/NVMe expert management
  - ExpertOffloader: LRU-based expert caching across memory hierarchy
  - Automatic expert loading on-demand during inference
  - Statistics tracking (hit rates, evictions)
- examples/quantized_inference.py: Demo script for consumer hardware
- tests/test_quantization.py: Unit tests for both modules
Enables:
- mythos_1b on 8GB VRAM (RTX 3060)
- mythos_3b on 12GB VRAM with expert offloading
- mythos_500b/1t with aggressive offloading (GPU + CPU + NVMe)
Co-authored-by: BerkahKarya <coder@berkahkarya.com>
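
To make "INT4 packing (two 4-bit values per byte)" with group-wise scaling concrete, here is a small pure-Python sketch. It is not taken from open_mythos/quantization.py: the function names, the symmetric [-8, 7] range, and the low-nibble-first byte layout are assumptions chosen for illustration.

```python
def quantize_group_int4(values):
    """Symmetric INT4 quantization of one group: returns (scale, ints in [-8, 7])."""
    max_abs = max(abs(v) for v in values)
    # One scale per group; a group of all zeros gets a dummy scale of 1.0.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quantized


def pack_int4(quantized):
    """Pack pairs of signed 4-bit integers into bytes (assumed low nibble first)."""
    if len(quantized) % 2:
        raise ValueError("need an even number of values to pack")
    packed = bytearray()
    for lo, hi in zip(quantized[0::2], quantized[1::2]):
        # Masking with 0x0F keeps the two's-complement low 4 bits of each value.
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover signed 4-bit integers from bytes."""
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Nibbles 8..15 represent the negative values -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dequantize_group(scale, quantized):
    """Map quantized integers back to approximate floats."""
    return [q * scale for q in quantized]
```

Each group stores one scale plus half a byte per weight, which is where the roughly 4x saving over 16-bit weights comes from; the per-group scale adds a small overhead on top of the packed payload.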

quantization.py:
- Replace assert with proper ValueError/TypeError exceptions
- Add logging for quantization progress tracking
- Add __repr__ to QuantizedLinear for debugging
- Extract _dequantize_weight() method (cleaner forward pass)
- Remove unused math import
- Fix duplicate docstring in quantize_moe_experts
- Add input validation to quantize_model()
expert_offloader.py:
- Fix bug: expert.state_dict → expert.state_dict() (missing parentheses)
- Add bounds checking for expert_id access
- Add proper KeyError/IndexError/AttributeError for invalid access
- Add __repr__ to ExpertOffloader for debugging
- Add input validation for layer_name existence
All changes maintain backward compatibility.

…uning
open_mythos/lora.py (10,286 lines):
- LoRAConfig: Configuration dataclass (rank, alpha, dropout, target_modules)
- LoRALinear: Linear layer with low-rank adapter (A + B matrices)
  - Kaiming init for A, zeros for B (starts at zero adaptation)
  - Scaling factor: alpha/rank
  - Weight merging for inference
- apply_lora(): Model-level LoRA application
- save_lora_adapter() / load_lora_adapter(): Lightweight adapter persistence
- merge_lora_weights(): Merge LoRA into base model for inference
- get_lora_params() / print_lora_summary(): Parameter statistics
training/lora_finetune.py (14,470 lines):
- Complete training script for LoRA fine-tuning
- Built-in finance demo dataset
- Support for custom JSONL/JSON/TXT datasets
- Mixed precision training (FP16)
- Gradient clipping, cosine LR scheduler
- Checkpoint saving and evaluation
- CLI arguments for all hyperparameters
notebooks/OpenMythos_LoRA_FineTune.ipynb:
- Step-by-step Colab notebook
- Free T4 GPU compatible
- QLoRA mode (8GB VRAM)
- Finance/trading demo data
- Save and share adapters
Enables:
- Fine-tune mythos_1b on Colab free T4 (~30-60 min)
- Only ~0.5% parameters trained (LoRA)
- Adapter file: ~1-10MB (shareable)
- QLoRA: INT4 quantization + LoRA = 8GB VRAM
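
The LoRALinear behaviour listed in this commit (zero-initialised B, alpha/rank scaling, weight merging) can be illustrated with a dependency-free sketch. It is not the code in open_mythos/lora.py: the class name `LoRALinearSketch` and the plain-list matrices are assumptions, and A is passed in explicitly rather than Kaiming-initialised so the example stays deterministic.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class LoRALinearSketch:
    """Hypothetical stand-in for a LoRA-adapted linear layer (no bias, no dropout)."""

    def __init__(self, weight, rank, alpha, a_init):
        self.weight = weight                        # frozen base weight, (out, in)
        self.A = a_init                             # trainable, (rank, in)
        self.B = [[0.0] * rank for _ in weight]     # trainable, (out, rank), zeros
        self.scaling = alpha / rank

    def forward(self, x):
        # y = W x + (alpha / rank) * B (A x)
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def merged_weight(self):
        # W' = W + (alpha / rank) * B A, usable as a plain linear layer at inference.
        delta = matmul(self.B, self.A)
        return [
            [w + self.scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]
```

Because B starts at zero, the adapted layer initially reproduces the base layer exactly, and after training the merged weight gives the same outputs as the two-path forward pass without the extra matrix products.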

open_mythos/ring_attention.py (11,591 lines):
- RingAttention: Chunked attention with ring topology
  - Splits sequence into chunks (default 8192)
  - Local attention within chunk
  - Cross-attention with accumulated KV from previous chunks
  - Memory: O(n/chunk_size) instead of O(n²)
- SparseRingAttention: Sliding window + global tokens
  - Each token attends to local window + global tokens
  - Even more memory-efficient for very long sequences
- ring_attention_forward(): Convenience function
open_mythos/kv_cache.py (11,880 lines):
- QuantizedKVCache: INT4 KV cache compression
  - Per-group quantization (group_size=128)
  - 4x memory reduction vs FP16
  - Pack two INT4 values per byte
- RingAttentionWithKVCache: Combined module
  - Ring Attention + KV Cache in one module
  - Enables 1M context on ~12GB VRAM
- create_long_context_processor(): Factory function
examples/long_context_inference.py:
- Demo for 8K to 1M token sequences
- Ring Attention benchmarking
- KV Cache compression stats
- Sparse attention demo
Memory savings:
- 8K context: 0.25 MB → 0.25 MB (no change needed)
- 128K context: 64 MB → 4 MB (16x savings)
- 1M context: 4000 MB → 250 MB (16x savings)
Enables:
- mythos_100b with 1M context on RTX 3060 (12GB)
- mythos_1t with 128K context on RTX 4090 (24GB)
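
The "4x memory reduction vs FP16" figure for the INT4 KV cache can be checked with a short calculation. The helper below is a sketch, not part of open_mythos/kv_cache.py: the function name is hypothetical, and storing one 16-bit scale per group of 128 values is an assumption (the commit message states the group size but not how scales are stored).

```python
def kv_cache_bytes(num_values, bits_per_value, group_size=None, scale_bits=16):
    """Bytes needed to store num_values cache entries.

    When group_size is given, one scale of scale_bits is added per group
    (assumed storage format; not confirmed by the commit message).
    """
    payload_bits = num_values * bits_per_value
    scale_total_bits = 0
    if group_size is not None:
        num_groups = -(-num_values // group_size)  # ceiling division
        scale_total_bits = num_groups * scale_bits
    return (payload_bits + scale_total_bits) / 8


fp16_bytes = kv_cache_bytes(1_048_576, bits_per_value=16)
int4_bytes = kv_cache_bytes(1_048_576, bits_per_value=4, group_size=128)
ratio = fp16_bytes / int4_bytes
```

Under these assumptions the packed payload alone is exactly 4x smaller than FP16, and the per-group scales bring the effective ratio slightly below 4x (about 3.9x for a group size of 128).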

open_mythos/finance.py (12,219 lines):
- FinanceAdapter: Domain-specific LoRA adapter wrapper
- Pre-built adapters:
  - Trading (XAUUSD, forex, crypto, technical analysis)
  - Business (plans, revenue models, market analysis)
  - Ads (Meta, Google, TikTok optimization)
  - Cashflow (management, budgeting, planning)
  - Indonesian Market (IDX, Shopee, Tokopedia)
- FinanceAdapterConfig: Configuration dataclass
- get_finance_adapter(): Factory function
- create_custom_adapter(): Custom adapter creation
- Training data generators (trading, business)
open_mythos/gguf.py (9,328 lines):
- GGUFConfig: Export configuration
- export_to_gguf(): Export model to GGUF format
- export_to_ollama(): Export to Ollama with Modelfile
- get_recommended_quantization(): VRAM-based recommendation
- print_quantization_guide(): User-friendly guide
- Support for multiple quantization types (Q4_K_M, Q8_0, etc.)
Enables:
- Finance fine-tuning out of the box (5 domains)
- Local inference via llama.cpp, ollama, LM Studio
- Consumer hardware deployment (Q4_K_M = 28% of FP16 size)

data/generate_finance_data.py:
- Generates 252 finance training samples
- 6 domains: trading, business, ads, cashflow, Indonesian market, risk
- Train/val split: 226/26 samples
data/finance/:
- finance_dataset.jsonl: Full dataset
- train.jsonl: Training split
- val.jsonl: Validation split
notebooks/Train_Finance_Model.ipynb:
- Complete training pipeline for Colab (free T4 GPU)
- QLoRA mode: INT4 + LoRA = 8GB VRAM
- 5 epochs, ~30-60 min training
- Test prompts for validation
- Save and share adapter
Training data covers:
- XAUUSD, EURUSD, GBPUSD, USDJPY, BTCUSD, ETHUSD, USDIDR, AUDUSD
- Business plans (8 types × 5 variations)
- Ad copy (Meta, Google, TikTok, Shopee)
- Cashflow analysis (5 scenarios)
- Indonesian market (IDX, crypto, e-commerce, property)
- Risk management & portfolio optimization

- Trading analysis: 24 instruments × 13 analysis types × 3 patterns
- Business plans: 10 business types × 8 variations
- Ad copy: 5 platforms × 8 hooks × 3 variations
- Cashflow: 8 business types × 6 scenarios
- Indonesian market: 8 sectors × 6 analyses
- Risk management: 6 portfolio types × 10 analyses
- Pattern recognition: 19 chart patterns × 5 variations
- Backtesting: 6 strategies × 6 reports
- Sentiment analysis: 9 instruments × 5 reports
- Macro economics: 6 regions × 8 analyses
Domains: trading, business, ads, cashflow, indonesian_market, risk, patterns, backtest, sentiment, macro
Train: 913 | Val: 102
Co-authored-by: OpenClaw <noreply@openclaw.ai>

- Added retry logic with exponential backoff for OSError with errno 9
- Added troubleshooting notes for macOS MPS file descriptor issue
- Suggests ulimit -n 10000 for macOS users
Co-authored-by: OpenClaw <noreply@openclaw.ai>

oyi77 and others added 10 commits May 20, 2026 10:23

docs: Add BerkahKarya fork README with roadmap and PR links

dfc0534

docs: Update README with correct repo URL and training section

39558a8

fix: Add retry logic for dataset loading (fixes kyegomez#71)

1108161

oyi77 wants to merge 10 commits into kyegomez:main from oyi77:fix/issue-71-dataset-error