
phi9t/mardia


Mardia

Mardi (French for Tuesday) + IA (Intelligence Artificielle) = Mardia

A curated collection of open-source AI infrastructure and model implementations. Currently focused on DeepSeek's releases, but may expand to include other notable open-source AI projects.

Disclaimer: All work in this repository belongs to its respective authors and organizations (primarily DeepSeek). This collection is provided solely as a convenience for study and reference. Please refer to the original repositories for official documentation, updates, and licensing terms.

Release Timeline

| Date | Release | Key Innovation |
|---|---|---|
| 2024.01 | DeepSeek-MoE | Fine-grained expert segmentation, shared expert isolation |
| 2024.01 | DeepSeek-Coder | Code LLM with 86 languages, FIM training |
| 2024.02 | DeepSeek-Math | GRPO algorithm, math pre-training corpus, tool-integrated reasoning |
| 2024.05 | DeepSeek-V2 | Multi-head Latent Attention (MLA), 93% KV cache reduction |
| 2024.06 | DeepSeek-Coder-V2 | MoE code model, 338 languages, 128K context |
| 2024.12 | DeepSeek-V3 | 671B MoE, FP8 training, auxiliary-loss-free balancing |
| 2024.12 | DeepSeek-VL2 | MoE vision-language model |
| 2025.01 | DeepSeek-R1 | Reasoning via pure RL, o1-level performance |
| 2025.02 | Open Source Week | FlashMLA, DeepEP, DeepGEMM, DualPipe, 3FS |
| 2025.09 | DeepSeek-V3.2 | DeepSeek Sparse Attention (DSA) |

Reading Order

Architecture track (understand the model evolution):

  1. DeepSeek-MoE - Foundation: fine-grained experts, shared experts
  2. DeepSeek-Math - GRPO: efficient RL without critic model
  3. DeepSeek-V2 - MLA: latent attention that shrinks the KV cache, shifting decoding toward compute-bound
  4. DeepSeek-V3 - Full stack: FP8 training, MTP, load balancing
  5. DeepSeek-R1 - RL-based reasoning emergence
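
For readers starting on the DeepSeek-Math step, the "RL without critic model" idea can be sketched as follows: GRPO samples several responses per prompt and uses the group's own reward statistics as the baseline, instead of a learned value function. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not code from the DeepSeek repositories.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per sampled response to the SAME prompt.
    The group mean acts as the baseline, so no critic network is needed.
    `eps` (an assumption of this sketch) avoids division by zero when every
    response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two judged correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage of equal magnitude; a group with identical rewards yields all-zero advantages and therefore no gradient signal.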

Infrastructure track (understand the systems):

  1. 3FS - Storage layer: CRAQ consistency, 6.6 TiB/s throughput
  2. DeepGEMM - Compute: FP8 GEMM, 1550 TFLOPS
  3. FlashMLA - Attention: MLA kernels, 660 TFLOPS
  4. DeepEP - Communication: expert parallelism, 77μs latency
  5. DualPipe - Training: bidirectional pipeline parallelism (PP), 78% smaller pipeline bubble

Infrastructure

| Project | Description |
|---|---|
| 3fs | Fire-Flyer File System - High-performance distributed file system for AI workloads |
| deep_ep | Communication library for Mixture-of-Experts (MoE) and expert parallelism |
| deep_gemm | Efficient GEMM kernels (FP8/BF16) with JIT compilation |
| dualpipe | Bidirectional pipeline parallelism with full computation-communication overlap |
| flash_mla | Optimized Multi-head Latent Attention kernels for Hopper GPUs |
| smallpond | Lightweight data processing framework built on DuckDB and 3FS |
| engram | Conditional memory via scalable N-gram lookup for LLMs |

Models

| Project | Description |
|---|---|
| qwen3_tts | Educational implementation of Qwen3-TTS architecture, training recipe, and validation |
| qwen3_vl | Educational implementation of Qwen3-VL architecture, multimodal training format, and validation |
| deepseek_v3 | DeepSeek-V3 model implementation |
| deepseek_v3_2_exp | DeepSeek-V3.2 experimental release |
| deepseek_r1 | DeepSeek-R1 reasoning model |
| deepseek_v2 | DeepSeek-V2 model |
| deepseek_vl2 | DeepSeek-VL2 vision-language model |
| deepseek_coder | DeepSeek-Coder for code generation |
| deepseek_coder_v2 | DeepSeek-Coder-V2 |
| deepseek_math | DeepSeek-Math for mathematical reasoning (GRPO) |
| deepseek_math_v2 | DeepSeek-Math-V2 with self-verifiable proofs |
| deepseek_moe | DeepSeek-MoE base implementation |
| deepseek_ocr_v2 | DeepSeek-OCR-V2 |

Resources

License

See individual project directories for specific licenses.

About

AI Reading Group
