
Add MachO support for kext and bundle file types #674

Open
adamdoupe wants to merge 5 commits into master from macho-kext-bundle-support


@adamdoupe
Contributor

adamdoupe commented Apr 12, 2026

Summary

  • Adds support for loading MH_KEXT_BUNDLE (filetype 11) and MH_BUNDLE (filetype 8) Mach-O binaries, enabling CLE to load macOS kernel extensions from KDKs.
  • Fixes the dyld chained fixups chain walker to use the correct next bitfield and stride per pointer format. The old code always used generic64.rebase.next (12 bits at bit 51), which reads garbage for Arm64e formats such as DYLD_CHAINED_PTR_ARM64E_KERNEL, where next is only 11 bits at bit 51 and bit 62 is the bind flag.
  • Handles the file-offset-to-vaddr shift in segments where vmaddr != fileoff (common in kexts' __DATA_CONST), which previously tripped an assertion that only __ETC segments could be laid out this way.
  • Adds missing _fields_ to dyld_chained_ptr_arm64e_bind24 struct and adds defensive bounds checks for chain walks.
  • Fixes PIC detection by replacing the buggy filetype & MH_DYLIB (a bitwise AND of ints) with a proper filetype in (...) check.
  • Universal2: when no arch= is specified, load only the first slice (with a warning) for the main binary, and pick the slice matching the main binary's arch when loaded as a dependency. Avoids address collisions and downstream breakage from multiple is_main_bin objects.
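To make the next-bitfield fix concrete, here is a minimal Python sketch (not CLE's code) that decodes the chain delta per pointer format. The bit positions, widths, strides, and enum values follow the C bitfield structs and constants in Apple's mach-o/fixup-chains.h; the names ChainedPtrFormat, NextField, and next_delta are hypothetical, and only a subset of formats is covered.

```python
from enum import IntEnum
from typing import NamedTuple


class ChainedPtrFormat(IntEnum):
    """Subset of dyld's DYLD_CHAINED_PTR_* pointer formats."""

    ARM64E = 1
    PTR_64 = 2
    PTR_64_OFFSET = 6
    ARM64E_KERNEL = 7
    ARM64E_USERLAND = 9
    ARM64E_USERLAND24 = 12


class NextField(NamedTuple):
    shift: int   # lowest bit of the `next` field
    width: int   # number of bits in `next`
    stride: int  # bytes per unit of `next`


# Generic 64-bit formats: next is 12 bits at bit 51, bind flag at bit 63.
# Arm64e formats: next is 11 bits at bit 51, bind flag at bit 62, auth flag at bit 63.
NEXT_FIELD = {
    ChainedPtrFormat.PTR_64: NextField(51, 12, 4),
    ChainedPtrFormat.PTR_64_OFFSET: NextField(51, 12, 4),
    ChainedPtrFormat.ARM64E: NextField(51, 11, 8),
    ChainedPtrFormat.ARM64E_USERLAND: NextField(51, 11, 8),
    ChainedPtrFormat.ARM64E_USERLAND24: NextField(51, 11, 8),
    ChainedPtrFormat.ARM64E_KERNEL: NextField(51, 11, 4),
}


def next_delta(raw: int, fmt: ChainedPtrFormat) -> int:
    """Byte distance from this fixup to the next one in the chain (0 ends the chain)."""
    field = NEXT_FIELD[fmt]
    units = (raw >> field.shift) & ((1 << field.width) - 1)
    return units * field.stride


# An Arm64e bind entry: bind flag (bit 62) set, next = 3.
raw = (1 << 62) | (3 << 51)
print(next_delta(raw, ChainedPtrFormat.ARM64E_KERNEL))  # 12
print(next_delta(raw, ChainedPtrFormat.PTR_64))         # 8204: bind bit read as part of next
```

Decoding the same word with the generic 12-bit field folds the Arm64e bind flag into next, which is the kind of garbage chain walk the fix avoids.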

Test binary lives in angr/binaries#166. The macOS and Windows workflows in this PR check out a same-named branch from angr/binaries when one exists (falling back to master); the Linux CI uses the PR-body reference above.

Test plan

  • All 27 MachO/Universal2 unit tests pass (10 new kext tests, helper test, 8 macho tests, 9 universal2 tests)
  • Successfully loads all 907 kext binaries in KDK_26.4.1_25E253.kdk/System/Library/Extensions/
  • Successfully loads 699/699 Mach-O binaries in macOS /bin + /usr/bin
  • Successfully loads 88/90 Mach-O binaries from an extracted iPhone filesystem (the 2 that fail are dyld itself, which has filetype 7 / MH_DYLINKER, and a dangling symlink to a shared-cache-only dylib)
  • Pre-commit hooks pass

🤖 Generated with Claude Code

Also fixes several bugs in the dyld chained fixups walker that were
exposed by kext binaries:

- Use correct `next` field and stride per pointer format. Both formats
  place `next` at bit 51, but Arm64e packs it as 11 bits (bit 62 is the
  bind flag) while Generic64 uses 12 bits. The old code always used
  generic64.rebase.next, producing garbage chain walks for
  ARM64E_KERNEL (stride 4) kexts.

- Handle file-offset-to-vaddr shift in segments where vmaddr != fileoff
  (common in kexts' __DATA_CONST, previously only allowed for __ETC).

- Add bounds checks: stop chain walks that exceed page boundaries or
  produce out-of-range bind ordinals instead of crashing.

- Add missing _fields_ to dyld_chained_ptr_arm64e_bind24 struct.

- Fix PIC detection to use `filetype in (...)` instead of bitwise AND.
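The file-offset-to-vaddr shift described above comes down to taking the offset-to-address delta from whichever segment contains the file offset, rather than assuming vmaddr == fileoff. A minimal Python sketch follows; Segment and file_offset_to_vaddr are hypothetical names, and the segment layout in the example is illustrative rather than taken from a real kext.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for the relevant fields of a Mach-O segment_command_64."""

    name: str
    vmaddr: int
    vmsize: int
    fileoff: int
    filesize: int


def file_offset_to_vaddr(segments: Sequence[Segment], offset: int) -> Optional[int]:
    """Translate a file offset to a virtual address.

    The shift (vmaddr - fileoff) comes from the segment whose file range contains
    the offset, so segments mapped with vmaddr != fileoff (such as a kext's
    __DATA_CONST) are handled the same way as identity-mapped ones.
    Returns None when no segment covers the offset.
    """
    for seg in segments:
        if seg.fileoff <= offset < seg.fileoff + seg.filesize:
            return seg.vmaddr + (offset - seg.fileoff)
    return None


segments = [
    Segment("__TEXT", vmaddr=0x0, vmsize=0x4000, fileoff=0x0, filesize=0x4000),
    # Illustrative layout: file data at 0x4000 is mapped at vmaddr 0x8000.
    Segment("__DATA_CONST", vmaddr=0x8000, vmsize=0x4000, fileoff=0x4000, filesize=0x4000),
]
print(hex(file_offset_to_vaddr(segments, 0x4010)))  # 0x8010
```

Returning None instead of asserting mirrors the spirit of the bounds checks above: an unmapped offset is reported to the caller rather than crashing the load.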

Tested against all 907 kext binaries in KDK_26.4.1_25E253.kdk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@angr-bot
Member

Corpus decompilation diffs can be found at angr/dec-snapshots@master...angr/cle_674

adamdoupe and others added 2 commits April 11, 2026 20:44
When a universal binary contains multiple architecture slices and no
arch= is passed, loading all slices causes address collisions (e.g.
multiple MH_EXECUTE slices all mapping to 0x400000). Pick the first
slice and log a warning telling the user how to select a specific one.
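The default-slice behavior in this commit can be sketched as follows. This is a minimal Python illustration, not CLE's loader: it parses a 32-bit fat header (fat_header plus fat_arch entries, big-endian, as in Apple's mach-o/fat.h) and then picks the requested arch or the first slice with a warning. FatSlice, parse_fat_slices, pick_main_slice, and raising ValueError for an unknown arch are all assumptions of the sketch.

```python
import logging
import struct
from dataclasses import dataclass
from typing import List, Optional

log = logging.getLogger(__name__)

FAT_MAGIC = 0xCAFEBABE  # fat header fields are stored big-endian
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}  # CPU_TYPE_X86_64, CPU_TYPE_ARM64


@dataclass(frozen=True)
class FatSlice:
    arch: str
    offset: int  # file offset of the thin Mach-O inside the fat file
    size: int


def parse_fat_slices(data: bytes) -> List[FatSlice]:
    """Parse a 32-bit fat header: fat_header followed by 20-byte fat_arch entries."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat binary")
    slices = []
    for i in range(nfat_arch):
        cputype, _subtype, offset, size, _align = struct.unpack_from(">iiIII", data, 8 + 20 * i)
        slices.append(FatSlice(CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


def pick_main_slice(slices: List[FatSlice], arch: Optional[str] = None) -> FatSlice:
    """Choose one slice for the main binary: the requested arch, else the first slice."""
    if not slices:
        raise ValueError("fat binary contains no slices")
    if arch is not None:
        for s in slices:
            if s.arch == arch:
                return s
        # Assumption of this sketch: an unknown arch is an error.
        raise ValueError(f"no {arch!r} slice; available: {[s.arch for s in slices]}")
    if len(slices) > 1:
        log.warning(
            "Fat binary has %d slices (%s); loading only the first (%s). Pass arch= to pick another.",
            len(slices), ", ".join(s.arch for s in slices), slices[0].arch,
        )
    return slices[0]


header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">iiIII", 0x01000007, 3, 0x4000, 0x1000, 14)
header += struct.pack(">iiIII", 0x0100000C, 0, 0x8000, 0x1000, 14)
print(pick_main_slice(parse_fat_slices(header)).arch)  # x86_64 (and logs a warning)
```

Loading a single slice means only one object claims the main-binary base address, which is what avoids the collisions described in the commit message.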

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test_macho_kext.py with 10 tests covering MH_KEXT_BUNDLE loading
using the IPwnKit kext binary: filetype detection, PIC, base address,
segments, sections, symbols (including IOKit class names), relocations,
and code readability.

Update test_universal2.py to match the new default behavior of loading
only the first architecture slice when no arch= is specified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rhelmot
Member

rhelmot commented Apr 12, 2026

Wrt the "only load the first slice" commit, it looks like you've found an issue that I found before and have been procrastinating on fixing. Can you make sure your fix aligns with the fix I describe here?

Per @rhelmot's review feedback: when a fat binary is loaded as a
dependency rather than as the main object, select the slice that
matches the main binary's arch instead of the first slice. This
keeps dependency loading consistent with the main binary's arch.

Also extracts the slice filter into a static helper and adds a
unit test for it.
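A pure helper like the one this commit extracts is easy to unit-test in isolation. The sketch below is a hypothetical Python version, not the actual _filter_slices_by_arch: it keeps the slices matching the main binary's arch and falls back to the first slice only when there is no main object yet. Returning an empty list when nothing matches is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FatSlice:
    arch: str
    offset: int
    size: int


def filter_slices_by_arch(slices: List[FatSlice], main_arch: Optional[str]) -> List[FatSlice]:
    """Select the slice(s) to load when a fat binary is pulled in as a dependency.

    main_arch is None when no main object has been loaded yet; in that case fall
    back to the first slice. Otherwise keep only slices matching the main
    binary's arch. An empty result means no compatible slice exists, and the
    caller decides how to report that (an assumption of this sketch).
    """
    if not slices:
        return []
    if main_arch is None:
        return slices[:1]
    return [s for s in slices if s.arch == main_arch]


slices = [FatSlice("x86_64", 0x4000, 0x1000), FatSlice("arm64", 0x8000, 0x1000)]
print([s.arch for s in filter_slices_by_arch(slices, "arm64")])  # ['arm64']
print([s.arch for s in filter_slices_by_arch(slices, None)])     # ['x86_64']
```

Because the function takes plain data and returns plain data, a test can exercise both branches without loading any real Mach-O file.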

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@adamdoupe
Contributor Author

Thanks @rhelmot — pushed f49c805 aligning with your design:

  • Main binary: pick a single arch (first slice by default, overridable via arch=), with a warning when there's ambiguity.
  • Loaded as a dependency: pick the slice matching the main binary's arch (via the new _filter_slices_by_arch helper), falling back to the first slice only if there's no main object yet.

Added a unit test for the filter helper. There's a separate pre-existing issue where loading a fat binary containing MH_EXECUTE slices as a dependency hits an is_main_bin assertion in the MachO backend — that's orthogonal to this change and would affect normal fat-dylib dependency loading the same way regardless. Happy to address it in a follow-up if you'd like.

@rhelmot
Member

rhelmot commented Apr 12, 2026

@fmagin I don't seem to be able to request a review from you, but can you take a look at this?

@fmagin
Contributor

fmagin commented Apr 12, 2026

Doesn't look wrong in any way that I noticed by reading it. I can run this on a larger dataset at work next week to catch regressions on the data that I care about.

@adamdoupe
Contributor Author

I tested on all kexts in KDK 26.4 and a bunch of user-space binaries; they all at least load. I also added a test-case kext that I wrote to angr/binaries.

@fmagin
Contributor

fmagin commented Apr 12, 2026

FWIW I'm fine with merging this now; it does look good to me.
I'm currently updating the angr version we use at work anyway, and as part of that I'm running acceptance tests on a dataset of a few thousand apps. That would catch things like symbol addresses changing, symbols disappearing or appearing, etc.

- macho.py: drop duplicate DyldChainedPtrFormats import alias and use the
  full enum name in _CHAIN_STRIDE; add docstring to _ChainStride to fix
  pylint missing-class-docstring.
- test_macho_kext.py: assert isinstance(MachO)/isinstance(MachOSegment)
  so the type checker can resolve segname/sections attributes.
- macos.yml + windows.yml: check out a same-named branch from angr/binaries
  when one exists, falling back to master. This lets cross-repo PRs be picked
  up on macOS and Windows runners (the Linux CI already does this via PR-body
  references).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rhelmot
Member

rhelmot commented Apr 13, 2026

It looks like your edits to the workflows just didn't work at all, so I've merged the binaries branch. Please remove those lines and we'll see what CI says.


4 participants