Merged
22 commits
9bc651a
chore: migrate mlx-swift from hardcoded source partition to git submo…
solderzzc Apr 6, 2026
0fdfea7
docs: add workflow skill for MLX upstream synchronization and SSD-str…
solderzzc Apr 6, 2026
1395456
docs: expand upstream sync flow to include verification, integration,…
solderzzc Apr 6, 2026
e7f4224
chore: properly register SharpAI/mlx-swift as an isolated git submodule
solderzzc Apr 6, 2026
98e43d8
fix(deps): use remote URL dependencies for mlx-swift and mlx-swift-lm
Apr 6, 2026
5718e74
chore(submodules): bump mlx-swift to fix C++20 build and signatures
solderzzc Apr 6, 2026
0c84bc5
Remove obsolete TokenizerLoader and Downloader bridge structs
Apr 6, 2026
c17312b
chore: update mlx-swift submodule reference and package lock
solderzzc Apr 6, 2026
47ccaa3
Merge remote-tracking branch 'origin/feature/use-upstream-mlx-swift' …
solderzzc Apr 6, 2026
e01279a
chore: bump mlx-swift local package reference to head
solderzzc Apr 6, 2026
357db2e
docs(workflows): add SharpAI custom patches inventory to upstream syn…
solderzzc Apr 6, 2026
b20f16d
fix(ci): Resolve mlx-swift-lm v3 API updates and update dependencies
solderzzc Apr 7, 2026
f1bc236
ci: remove branch push triggers to fix PR pipeline duplication
solderzzc Apr 7, 2026
edba29e
feat: implement streaming json output sanitization for prefixes
solderzzc Apr 7, 2026
8c6dcd2
chore: align inference engine loaders and update dependency hashes
solderzzc Apr 7, 2026
50ab9b6
chore: remove LocalPackages/mlx-swift proxy repo now that SharpAI rem…
solderzzc Apr 7, 2026
fd107c3
chore: wipe out lingering tracked files from former local package inj…
solderzzc Apr 7, 2026
0753297
fix: update build.sh MLX_SRC to point to remote tracker checkouts and…
solderzzc Apr 7, 2026
9c87932
feat: dynamically auto-clone homesec-benchmark logic using sparse che…
solderzzc Apr 7, 2026
d5897eb
feat: securely sparsify benchmark cloning locally to generic homesec-…
solderzzc Apr 7, 2026
852fa86
fix: bust github actions SPM cache to clear stale upstream C++ files
solderzzc Apr 7, 2026
0f59dad
fix: explicit Tokenizers dependency for MLXInferenceCore to prevent m…
solderzzc Apr 7, 2026
134 changes: 134 additions & 0 deletions .agents/workflows/mlx-upstream-sync.md
@@ -0,0 +1,134 @@
---
description: How to synchronize Apple MLX ecosystem updates into SharpAI forks and triage SSD-streaming bugs
---

# Upstream MLX Synchronization & SSD Streaming Maintenance

This workflow documents the architecture for maintaining Apple MLX forks within the SharpAI repository ecosystem, executing upstream synchronization, and resolving bugs within the `ssd_streamer` custom extensions.

## 1. Ecosystem Architecture

The `mlx-server` repository now cleanly references the upstream Swift layer `SharpAI/mlx-swift` via Swift Package Manager (`SPM`).

```
mlx-server (SharpAI/SwiftLM)
└── SPM Dependency: SharpAI/mlx-swift (The Swift wrapper)
    ├── .gitmodules
    │   ├── submodules/mlx -> https://github.com/SharpAI/mlx (Branch: main)
    │   └── submodules/mlx-c -> https://github.com/SharpAI/mlx-c (Branch: main)
```

**Never bundle C++ source files directly into `mlx-swift`.** All Apple core Engine updates and C-wrapper modifications MUST be executed in the `SharpAI/mlx` and `SharpAI/mlx-c` forks respectively.
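
The layout above can be spot-checked by reading the URLs recorded in `.gitmodules`. The sketch below is a hypothetical helper (the name `check_submodule_org` is not part of any SharpAI tooling), and it assumes the URLs use the `https://github.com/<org>/<repo>` form shown in the diagram.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: reports whether every
# submodule URL in a .gitmodules file belongs to the expected GitHub org.
check_submodule_org() {
  local gitmodules_file="$1" expected_org="$2"
  local rc=0 url urls

  [ -r "$gitmodules_file" ] || { echo "cannot read $gitmodules_file" >&2; return 2; }

  # Pull the value out of every "url = ..." line.
  urls=$(sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*//p' "$gitmodules_file")

  for url in $urls; do
    case "$url" in
      "https://github.com/${expected_org}/"*) echo "ok       $url" ;;
      *) echo "MISMATCH $url"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Run from the root of a `SharpAI/mlx-swift` checkout, `check_submodule_org .gitmodules SharpAI` would return non-zero if any submodule still points somewhere else.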

## 2. Upstream Feature Verification & Integration Flow

When Apple releases new features to `ml-explore/mlx` or `ml-explore/mlx-c`, follow this systematic process to verify, integrate, and validate the changes before bringing them into the SharpAI ecosystem.

### 2.1 Double-Checking Upstream Features

Before syncing, verify whether Apple's upstream actually covers all of your custom requirements. The answer determines whether your custom patches can be safely dropped:

1. **Review Upstream Changelog/Releases:** Actively monitor the [Apple MLX Releases page](https://github.com/ml-explore/mlx/releases) or the `main` commit history for mentions of "quantization", "streaming", "memory-mapped operations", or "out-of-core inference".
2. **Examine Target C++ Kernels:**
   - Look primarily in `mlx/backend/metal/` and `mlx/core/`.
   - Has upstream Apple added a native equivalent to `moe_stream_op.cpp`?
   - Do the Metal shaders in `mlx/backend/metal/kernels/` introduce block execution / memory-mapped loading primitives similar to our `ssd_streamer.mm` and `fence.air` logic?
3. **Check Exported C-APIs:** Look at `mlx/c/ops.h` and `mlx/c/fast.h` in `ml-explore/mlx-c`. If Apple has added official C-bindings for out-of-core tensor operations, you can safely begin stripping out the custom SharpAI C++ bridging code.
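
As a first pass over the checks above, a local clone of the upstream repository can be searched for the identifiers named in this section. This is only a sketch: `scan_checkout` is a hypothetical name, and a hit means "look closer", not that upstream has an equivalent feature.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each identifier, report whether any text file
# under the given checkout mentions it.
scan_checkout() {
  local checkout="$1" term
  shift
  for term in "$@"; do
    # -r recurse, -I skip binary files, -q quiet, -F treat the term literally
    if grep -rIqF -e "$term" "$checkout" 2>/dev/null; then
      echo "found   $term"
    else
      echo "missing $term"
    fi
  done
}
```

For example, `scan_checkout path/to/ml-explore-mlx moe_stream_op ssd_streamer` prints one `found` or `missing` line per identifier; the checkout path is a placeholder.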

### 2.2 Integration Flow

If Apple's features are highly beneficial (e.g., core Metal optimizations) but do not explicitly replace our SSD streaming, we need to pull their features *while maintaining* the SharpAI SSD kernels.

1. **Pull Upstream to SharpAI forks**:
   ```bash
   git clone https://github.com/SharpAI/mlx && cd mlx
   git remote add upstream https://github.com/ml-explore/mlx
   git fetch upstream

   # Replay our custom SSD commits on top of Apple's latest main
   git rebase upstream/main
   # Resolve any conflicts, particularly around `fast.cpp` or the Make/CMake build files
   # --force-with-lease refuses to overwrite remote commits you have not fetched
   git push --force-with-lease origin main
   ```
2. Execute the identical rebasing process for `SharpAI/mlx-c`, monitoring `mlx/c/ops.cpp`.
3. In `SharpAI/mlx-swift`, update the submodule pointers to reference your freshly rebased commits:
   ```bash
   cd LocalPackages/mlx-swift
   git submodule update --remote --recursive
   git commit -am "chore: sync latest Apple MLX components and re-graft SSD patches"
   git push origin main
   ```
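
After a rebase it is worth confirming that the custom commits survived. A minimal sketch, assuming it runs inside the fork checkout with the `upstream` remote already fetched; the function name `list_fork_only_commits` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical post-rebase check, run inside the fork checkout.
# Confirms the branch sits on top of the upstream ref, then lists the
# fork-only commits that were replayed.
list_fork_only_commits() {
  local upstream_ref="${1:-upstream/main}"

  # --is-ancestor exits 0 only when upstream_ref is reachable from HEAD.
  if ! git merge-base --is-ancestor "$upstream_ref" HEAD; then
    echo "HEAD is not based on $upstream_ref" >&2
    return 1
  fi

  # Commits reachable from HEAD but not from upstream_ref.
  git log --oneline "$upstream_ref..HEAD"
}
```

If the output no longer lists the SSD-streaming commits recorded in the patches inventory, the rebase dropped or squashed them and should be redone before pushing.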

### 2.3 Validation Flow

Do not deploy binary updates to the inference engine without running the full validation matrix below.

1. **Clean Re-Build:** Always execute a destructive cache wipe before a Metal compilation test.
   ```bash
   # From the mlx-server repository root
   rm -rf .build
   ./build.sh
   ```
2. **Swift API Layer Verification:** Run the test suites within your wrapper to confirm that the Swift `->` C `->` C++ bindings still agree with one another.
   ```bash
   cd LocalPackages/mlx-swift
   swift test
   ```
3. **Extreme Context Benchmarking (The Harness):**
   - Run the dedicated `/run-benchmark` workflow from the root `mlx-server` directory (utilizing `run_benchmark.sh` or `profile_runner.py`).
   - Specifically target models invoking >32k token contexts. High prompt generation latency, GPU thrashing, or hard Out-of-Memory (OOM) faults directly indicate that the Metal barrier (`fence.air`) or `ssd_streamer.mm` broke silently during the git rebase.
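
The three steps above could be chained into a fail-fast script. The following is a sketch only: `run_step` and `validate_all` are hypothetical names, and invoking `./run_benchmark.sh` with no arguments is an assumption, since this document does not show how the harness is called.

```bash
#!/usr/bin/env bash
# Hypothetical fail-fast runner for the validation steps.
run_step() {
  local name="$1"
  shift
  echo "==> $name"
  if "$@"; then
    echo "PASS $name"
  else
    echo "FAIL $name" >&2
    return 1
  fi
}

# Run from the mlx-server root; && stops the chain at the first failure.
validate_all() {
  run_step "clean rebuild" sh -c 'rm -rf .build && ./build.sh' &&
    run_step "swift tests" sh -c 'cd LocalPackages/mlx-swift && swift test' &&
    run_step "benchmark" ./run_benchmark.sh  # assumed invocation
}
```

Stopping at the first failure keeps a broken build from being mistaken for a benchmark regression.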

## 3. Triaging SSD-Stream Bugs

The SSD streaming kernels introduce custom memory synchronization routines (`ssd_streamer.h`, `ssd_streamer.mm`) that interact with Apple's core MLX framework (`mlx/core/moe_stream_op.cpp`).

**Triage Protocol:**
- **Crash in Metal Execution (`fence.air`, `moe_stream.metal`)**: Identify if Apple's upstream Metal API (`mlx/backend/metal/device.h`) changed rendering assumptions. Navigate to `SharpAI/mlx` and patch `mlx/backend/metal/ssd_streamer.mm`.
- **C-API Mapping Errors (`fast.cpp`, `ops.cpp`)**: Swift throws errors linking to underlying kernels. Navigate to `SharpAI/mlx-c` and ensure `mlx/c/ops.cpp` cleanly wraps the updated arguments from `SharpAI/mlx`'s `moe_stream_op.h`.
- **Memory Leaks/High Swap Usage**: Typically arises if the `fence.air` streaming barrier lacks synchronization with the newly upstreamed Apple thread-pool executors.
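
The first two rules of the protocol can be expressed as a rough log classifier. This is a hypothetical sketch that only matches the file names mentioned above; memory leaks and swap pressure leave no such marker and still need manual profiling.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: suggest which fork to inspect first, based on
# file names appearing in a crash or build log.
suggest_repo() {
  local log_file="$1"
  if grep -qE 'fence\.air|moe_stream\.metal' "$log_file"; then
    echo "SharpAI/mlx"      # Metal execution path
  elif grep -qE 'fast\.cpp|ops\.cpp' "$log_file"; then
    echo "SharpAI/mlx-c"    # C-API mapping layer
  else
    echo "unclassified"
  fi
}
```

Usage would be `suggest_repo build.log`, where `build.log` is whatever log file captured the failure.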

## 4. Retiring the Fork (When to Drop)

> [!WARNING]
> The ultimate goal is to delete the `SharpAI/mlx` and `SharpAI/mlx-c` forks and point `SharpAI/mlx-swift` directly to `ml-explore/mlx` natively.

**Indications for Dropping the Fork:**
1. Apple officially merges the Turbo Quant framework upstream, whether as `ml-explore/mlx/fast/turbo_quant.h` or through an equivalent PR.
2. Apple natively supports out-of-core SSD context offloading (e.g., streaming inference blocks directly from Non-Volatile Memory to GPU) in `ml-explore/mlx/backend/metal/`.
3. Apple's native `moe_stream_op` implementations match or exceed the latency speedups provided by your custom `ssd_streamer.mm`.

If any of these conditions are met, rewrite the submodule URLs in `SharpAI/mlx-swift/.gitmodules` back to `https://github.com/ml-explore/mlx` and `https://github.com/ml-explore/mlx-c`, then delete your GitHub forks.
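
The URL rewrite can be previewed before touching the file. A minimal sketch, assuming the fork URLs are written without a trailing `.git` as in the architecture diagram; `repoint_to_upstream` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical dry run: print a .gitmodules file with the mlx and mlx-c fork
# URLs retargeted to ml-explore. The file itself is not modified.
repoint_to_upstream() {
  local gitmodules_file="$1"
  # The trailing $ keeps "mlx" from matching "mlx-c" or "mlx-swift".
  sed -e 's#\(url[[:space:]]*=[[:space:]]*https://github.com/\)SharpAI/mlx$#\1ml-explore/mlx#' \
      -e 's#\(url[[:space:]]*=[[:space:]]*https://github.com/\)SharpAI/mlx-c$#\1ml-explore/mlx-c#' \
      "$gitmodules_file"
}
```

Once the rewritten file is committed, `git submodule sync --recursive` copies the new URLs into the local Git configuration of existing clones.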

## 5. SharpAI Custom Patches Inventory (vs. Upstream ml-explore)

As of **April 2026**, the following specific features exist ONLY in our custom forks. Knowing precisely *what* we added is the key to knowing exactly *when* we can revert to Apple's native upstream (`ml-explore`).

### 🛠️ In `SharpAI/mlx` (C++ Engine)
*Compared to `ml-explore/mlx:main`*
1. `feat: custom ssd-streaming kernels and custom MLX I/O fast loaders`
   - Added `moe_stream_op` primitives enabling SSD flash streaming (out-of-core execution).
2. `fix(metal): align moe_stream_op add_temporary signature with latest apple upstream`
   - Custom extensions needed maintaining against newer MLX memory-pool updates.
3. `fix(metal): add default initialization loop for bound encoder contexts in async`
   - Patched `device.cpp` so thread pool reassignments by Swift's async engine don't result in fatal runtime aborts due to missing context dictionaries.

### 🛠️ In `SharpAI/mlx-c` (C-API Bridge)
*Compared to `ml-explore/mlx-c:main`*
1. `chore: rebase SharpAI custom ops onto latest Apple MLX-C upstream to fix fft/dequantize signatures`
2. `fix(ops): align c wrappers with mlx 0.30.0+ upstream signatures for dequantize, qqmm, and fft`
3. `fix(fft): restore Shape type for fft methods n parameter` & `fix(fft): remove invalid norm from fftshift calls`
   - Resolves signature drift and struct mismatches linking the new C++ API modifications down to Swift C headers.

### 🛠️ In `SharpAI/mlx-swift` (Swift Wrappers)
*Compared to `ml-explore/mlx-swift:main`*
1. `Restoration of missing MLX custom extensions including C-API and Swift bridge` & `Update custom C++ kernel patches for SSD Streaming`
   - Recreated Swift integrations bridging into out-of-core functionality.
2. `chore: isolate SharpAI custom MLX/MLX-C engines into dedicated GitHub forks`
   - Submodule remotes internally pinned from `ml-explore` tracking links to `SharpAI` ecosystem forks.
3. `fix(build): bump cxxLanguageStandard to .gnucxx20 for Apple MLX upstream compatibility`
   - Custom `Package.swift` override explicitly permitting C++20 standard since upstream didn't upgrade constraints simultaneously.
4. `fix(mlx): build steel_conv_3d C++ string for Cmlx target`
   - Added missing header dependencies specifically isolated by recent upstream migrations.
5. `fix(jit): update generated mlx c++ metal headers and fix fast.h signature to match fast.cpp`
   - Recompiled Metal header string buffers internally inside `mlx-generated` ensuring `affine_qmm_t_splitk` and other functions are dynamically injected at runtime.
6 changes: 3 additions & 3 deletions .github/workflows/build.yml
@@ -2,7 +2,7 @@ name: Build

 on:
   push:
-    branches: [main, develop, feature/*]
+    branches: [main]
   pull_request:
     branches: [main]

@@ -23,9 +23,9 @@ jobs:
           path: .build
           # Key includes product name so any rename (e.g. mlx-server→SwiftLM)
           # automatically busts the cache and prevents stale PCH errors.
-          key: ${{ runner.os }}-spm-SwiftLM-${{ hashFiles('Package.resolved') }}
+          key: ${{ runner.os }}-spm-SwiftLM-v2-${{ hashFiles('Package.resolved') }}
           restore-keys: |
-            ${{ runner.os }}-spm-SwiftLM-
+            ${{ runner.os }}-spm-SwiftLM-v2-

       - name: Resolve dependencies
         run: swift package resolve
6 changes: 3 additions & 3 deletions .github/workflows/e2e-test.yml
@@ -2,7 +2,7 @@ name: E2E Tests

 on:
   push:
-    branches: [main, feature/*]
+    branches: [main]
   pull_request:
     branches: [main]

@@ -24,9 +24,9 @@ jobs:
         uses: actions/cache@v4
         with:
           path: .build
-          key: ${{ runner.os }}-spm-SwiftLM-${{ hashFiles('Package.resolved') }}
+          key: ${{ runner.os }}-spm-SwiftLM-v2-${{ hashFiles('Package.resolved') }}
           restore-keys: |
-            ${{ runner.os }}-spm-SwiftLM-
+            ${{ runner.os }}-spm-SwiftLM-v2-

       - name: Clear stale module cache
         # Prevents: "PCH was compiled with module cache path '…mlx-server…'
1 change: 1 addition & 0 deletions .gitignore
@@ -21,3 +21,4 @@ DerivedData/
 curl_out.txt
 sample.txt
 tmp/
+/homesec-benchmark/
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
 [submodule "mlx-swift-lm"]
 	path = mlx-swift-lm
 	url = https://github.com/SharpAI/mlx-swift-lm.git
+[submodule "LocalPackages/mlx-swift"]
+	path = LocalPackages/mlx-swift
+	url = https://github.com/SharpAI/mlx-swift
29 changes: 0 additions & 29 deletions LocalPackages/mlx-swift/.github/ISSUE_TEMPLATE/bug_report.md

This file was deleted.

12 changes: 0 additions & 12 deletions LocalPackages/mlx-swift/.github/pull_request_template.md

This file was deleted.

22 changes: 0 additions & 22 deletions LocalPackages/mlx-swift/.github/scripts/build-linux-cuda-cmake.sh

This file was deleted.

8 changes: 0 additions & 8 deletions LocalPackages/mlx-swift/.github/scripts/run-xcode-tests.sh

This file was deleted.
