Mardi (French for Tuesday) + IA (Intelligence Artificielle) = Mardia
A curated collection of open-source AI infrastructure and model implementations. Currently focused on DeepSeek's releases, but may expand to include other notable open-source AI projects.
Disclaimer: All work in this repository belongs to its respective authors and organizations (primarily DeepSeek). This collection is provided solely as a convenience for study and reference. Please refer to the original repositories for official documentation, updates, and licensing terms.
| Date | Release | Key Innovation |
|---|---|---|
| 2024.01 | DeepSeek-MoE | Fine-grained expert segmentation, shared expert isolation |
| 2024.01 | DeepSeek-Coder | Code LLM with 86 languages, FIM training |
| 2024.02 | DeepSeek-Math | GRPO algorithm, math pre-training corpus, tool-integrated reasoning |
| 2024.05 | DeepSeek-V2 | Multi-head Latent Attention (MLA), 93% KV cache reduction |
| 2024.06 | DeepSeek-Coder-V2 | MoE code model, 338 languages, 128K context |
| 2024.12 | DeepSeek-V3 | 671B MoE, FP8 training, auxiliary-loss-free balancing |
| 2024.12 | DeepSeek-VL2 | MoE vision-language model |
| 2025.01 | DeepSeek-R1 | Reasoning via large-scale RL (pure RL in R1-Zero), performance comparable to OpenAI o1 |
| 2025.02 | Open Source Week | FlashMLA, DeepEP, DeepGEMM, DualPipe, 3FS |
| 2025.09 | DeepSeek-V3.2-Exp | DeepSeek Sparse Attention (DSA) |
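
The "fine-grained expert segmentation, shared expert isolation" entry for DeepSeek-MoE can be illustrated with a toy layer. This is a minimal sketch, not the DeepSeek implementation: experts are scalar functions, and `moe_forward`, `make_expert`, and the example weights are hypothetical names and values chosen for illustration. It assumes softmax gating with top-k selection and no renormalization of the selected gates. Shared experts run for every token, while only the `top_k` highest-scoring routed experts contribute.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, shared_experts, routed_experts, router_logits, top_k):
    """Toy MoE layer for a scalar token x (hypothetical helper, illustration only)."""
    gates = softmax(router_logits)
    # Keep only the top_k routed experts with the highest gate values.
    chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    out = x  # residual connection
    out += sum(expert(x) for expert in shared_experts)  # shared experts: always active
    out += sum(gates[i] * routed_experts[i](x) for i in chosen)  # routed experts: sparse
    return out, sorted(chosen)

def make_expert(weight):
    """Stand-in for an expert FFN: scales its input by a fixed weight."""
    return lambda x: weight * x

shared = [make_expert(0.5)]
routed = [make_expert(w) for w in (1.0, 2.0, 3.0, 4.0)]
output, active = moe_forward(1.0, shared, routed, [0.0, 2.0, 1.0, 3.0], top_k=2)
print(active)  # prints [1, 3]
```

Splitting capacity into many small routed experts gives the router more combinations to choose from, while the shared experts absorb knowledge that every token needs.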
Architecture track (understand the model evolution):
- DeepSeek-MoE - Foundation: fine-grained experts, shared experts
- DeepSeek-Math - GRPO: efficient RL without critic model
- DeepSeek-V2 - MLA: compressed latent KV cache that shifts decoding toward being compute-bound
- DeepSeek-V3 - Full stack: FP8 training, multi-token prediction (MTP), auxiliary-loss-free load balancing
- DeepSeek-R1 - RL-based reasoning emergence
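
As a rough illustration of how GRPO avoids a critic model, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for this example, and the sketch covers only the advantage step, not the full policy update.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for responses sampled from one prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every response in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; other variants are possible
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt: two correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative, so the group itself serves as the baseline that a learned value function would otherwise provide.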
Infrastructure track (understand the systems):
- 3FS - Storage layer: CRAQ consistency, 6.6 TiB/s throughput
- DeepGEMM - Compute: FP8 GEMM, 1550 TFLOPS
- FlashMLA - Attention: MLA kernels, 660 TFLOPS
- DeepEP - Communication: expert parallelism, 77 μs latency
- DualPipe - Training: bidirectional pipeline parallelism, 78% less bubble

Infrastructure projects:

| Project | Description |
|---|---|
| 3fs | Fire-Flyer File System - High-performance distributed file system for AI workloads |
| deep_ep | Communication library for Mixture-of-Experts (MoE) and expert parallelism |
| deep_gemm | Efficient GEMM kernels (FP8/BF16) with JIT compilation |
| dualpipe | Bidirectional pipeline parallelism with full computation-communication overlap |
| flash_mla | Optimized Multi-head Latent Attention kernels for Hopper GPUs |
| smallpond | Lightweight data processing framework built on DuckDB and 3FS |
| engram | Conditional memory via scalable N-gram lookup for LLMs |

Model implementations:

| Project | Description |
|---|---|
| qwen3_tts | Educational implementation of Qwen3-TTS architecture, training recipe, and validation |
| qwen3_vl | Educational implementation of Qwen3-VL architecture, multimodal training format, and validation |
| deepseek_v3 | DeepSeek-V3 model implementation |
| deepseek_v3_2_exp | DeepSeek-V3.2 experimental release |
| deepseek_r1 | DeepSeek-R1 reasoning model |
| deepseek_v2 | DeepSeek-V2 model |
| deepseek_vl2 | DeepSeek-VL2 vision-language model |
| deepseek_coder | DeepSeek-Coder for code generation |
| deepseek_coder_v2 | DeepSeek-Coder-V2 |
| deepseek_math | DeepSeek-Math for mathematical reasoning (GRPO) |
| deepseek_math_v2 | DeepSeek-Math-V2 with self-verifiable proofs |
| deepseek_moe | DeepSeek-MoE base implementation |
| deepseek_ocr_v2 | DeepSeek-OCR-V2 |
- Open Infra Index - Overview of DeepSeek's open-source releases
See individual project directories for specific licenses.