Add MachO support for kext and bundle file types#674
Also fixes several bugs in the dyld chained fixups walker that were exposed by kext binaries:

- Use the correct `next` field and stride per pointer format. Arm64e packs `next` as 11 bits at bit 51; Generic64 packs it as 12 bits at bit 52. The old code always used `generic64.rebase.next`, producing garbage chain walks for ARM64E_KERNEL (stride 4) kexts.
- Handle the file-offset-to-vaddr shift in segments where `vmaddr != fileoff` (common in kexts' `__DATA_CONST`, previously only allowed for `__ETC`).
- Add bounds checks: stop chain walks that exceed page boundaries or produce out-of-range bind ordinals instead of crashing.
- Add the missing `_fields_` to the `dyld_chained_ptr_arm64e_bind24` struct.
- Fix PIC detection to use `filetype in (...)` instead of bitwise AND.

Tested against all 907 kext binaries in KDK_26.4.1_25E253.kdk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
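For readers unfamiliar with chained fixups, here is a minimal sketch of the per-format `next` extraction and the bounds-checked walk this commit describes. It is not the code in `macho.py`: `extract_bits` and `walk_chain` are hypothetical names, the page is a plain `bytes` object, and the only layout facts used are the ones quoted above for Arm64e (`next` is 11 bits at bit 51, stride 4 for ARM64E_KERNEL).

```python
import struct

# Values quoted in the commit message for the Arm64e kernel format.
ARM64E_NEXT_SHIFT = 51
ARM64E_NEXT_BITS = 11
ARM64E_KERNEL_STRIDE = 4


def extract_bits(value, shift, width):
    """Return the `width`-bit field of `value` that starts at bit `shift`."""
    return (value >> shift) & ((1 << width) - 1)


def walk_chain(page, start, stride=ARM64E_KERNEL_STRIDE):
    """Yield (offset, raw_pointer) for each entry of one page's fixup chain.

    Hypothetical helper: stops when `next` is 0 or when the next entry
    would fall outside the page, instead of reading out of bounds.
    """
    offset = start
    while 0 <= offset and offset + 8 <= len(page):
        (raw,) = struct.unpack_from("<Q", page, offset)
        yield offset, raw
        nxt = extract_bits(raw, ARM64E_NEXT_SHIFT, ARM64E_NEXT_BITS)
        if nxt == 0:
            break  # end of chain
        offset += nxt * stride  # `next` counts strides, not bytes
```

Reading the field with another format's shift or width would pull in neighbouring bits, which is the kind of garbage walk the first bullet refers to.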
|
Corpus decompilation diffs can be found at angr/dec-snapshots@master...angr/cle_674 |
When a universal binary contains multiple architecture slices and no arch= is passed, loading all slices causes address collisions (e.g. multiple MH_EXECUTE slices all mapping to 0x400000). Pick the first slice and log a warning telling the user how to select a specific one. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test_macho_kext.py with 10 tests covering MH_KEXT_BUNDLE loading using the IPwnKit kext binary: filetype detection, PIC, base address, segments, sections, symbols (including IOKit class names), relocations, and code readability. Update test_universal2.py to match the new default behavior of loading only the first architecture slice when no arch= is specified. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
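As background for the PIC test, here is a small sketch of why the earlier bitwise check was unreliable. The filetype constants are the standard values from `mach-o/loader.h`; the `PIC_FILETYPES` tuple and both function names are assumptions for illustration, since the exact tuple used in the PR is not shown here.

```python
# Mach-O filetype values are an enumeration, not bit flags.
MH_EXECUTE = 0x2
MH_DYLIB = 0x6
MH_DYLINKER = 0x7
MH_BUNDLE = 0x8
MH_KEXT_BUNDLE = 0xB

# Assumed membership set; the tuple in the actual fix may differ.
PIC_FILETYPES = (MH_DYLIB, MH_BUNDLE, MH_KEXT_BUNDLE)


def is_pic_bitwise(filetype):
    """Old-style check: treats an enum value as if it were a flag mask."""
    return bool(filetype & MH_DYLIB)


def is_pic_membership(filetype):
    """Membership check in the style of `filetype in (...)`."""
    return filetype in PIC_FILETYPES
```

With these values, `MH_EXECUTE & MH_DYLIB` is nonzero while `MH_BUNDLE & MH_DYLIB` is zero, so the bitwise form misclassifies both; the membership form depends only on which filetypes are listed.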
|
Wrt the "only load the first slice" commit, it looks like you've found an issue that I found before and have been procrastinating on fixing. Can you make sure your fix aligns with the fix I describe here? |
Per @rhelmot's review feedback: when a fat binary is loaded as a dependency rather than as the main object, select the slice that matches the main binary's arch instead of the first slice. This keeps dependency loading consistent with the main binary's arch. Also extracts the slice filter into a static helper and adds a unit test for it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
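The selection rule described in this commit and the earlier "first slice" commit can be sketched as below. This is an illustration only: `Slice`, `select_slices`, and the `main_arch` parameter are hypothetical names, not the static helper added in the PR, and architectures are compared as plain strings.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass
class Slice:
    """Hypothetical stand-in for one architecture slice of a fat binary."""
    arch: str
    offset: int


def select_slices(slices, arch=None, main_arch=None):
    """Pick which slices to load.

    - explicit `arch`: keep only slices of that architecture
    - loaded as a dependency (`main_arch` known): match the main binary
    - otherwise: keep the first slice and warn about the ambiguity
    """
    wanted = arch if arch is not None else main_arch
    if wanted is not None:
        return [s for s in slices if s.arch == wanted]
    if len(slices) > 1:
        log.warning(
            "Multiple slices (%s); loading %s. Pass arch= to pick another.",
            ", ".join(s.arch for s in slices),
            slices[0].arch,
        )
    return slices[:1]
```

Returning a list in every case keeps the caller's loop unchanged whether zero, one, or (with a different policy) several slices are selected.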
|
Thanks @rhelmot — pushed
Added a unit test for the filter helper. There's a separate pre-existing issue where loading a fat binary containing |
|
@fmagin I don't seem to be able to request a review from you, but can you take a look at this? |
|
Doesn't look wrong in any way that I noticed by reading it. I can run this on a larger dataset at work next week to catch regressions on the data that I care about |
|
I tested on all kexts in KDK 26.4 and a bunch of user space binaries; they all at least load. I also added a test-case kext that I wrote to binaries. |
|
FWIW I'm fine with merging this now, it does look good to me. |
- macho.py: drop the duplicate DyldChainedPtrFormats import alias and use the full enum name in _CHAIN_STRIDE; add a docstring to _ChainStride to fix pylint missing-class-docstring.
- test_macho_kext.py: assert isinstance(MachO)/isinstance(MachOSegment) so the type checker can resolve the segname/sections attributes.
- macos.yml + windows.yml: check out a same-named branch from angr/binaries when one exists, falling back to master. Lets cross-repo PRs get picked up on macOS and Windows runners (linux CI already does this via PR-body references).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
It looks like your edits to the workflows just didn't work at all, so I've merged the binaries branch. Please remove those lines and we'll see what CI says. |
Summary
- Adds support for `MH_KEXT_BUNDLE` (filetype 11) and `MH_BUNDLE` (filetype 8) Mach-O binaries, enabling CLE to load macOS kernel extensions from KDKs.
- Fixes the dyld chained fixups walker to use the correct `next` bitfield and stride per pointer format. The old code always used `generic64.rebase.next` (12 bits at bit 52), which reads garbage for Arm64e formats like `DYLD_CHAINED_PTR_ARM64E_KERNEL`, where `next` is 11 bits at bit 51.
- Handles segments where `vmaddr != fileoff` (common in kexts' `__DATA_CONST`), which previously crashed with an assertion that only `__ETC` segments could have this.
- Adds the missing `_fields_` to the `dyld_chained_ptr_arm64e_bind24` struct and adds defensive bounds checks for chain walks.
- Fixes PIC detection, changing `filetype & MH_DYLIB` (bitwise AND of ints) to a proper `filetype in (...)`.
- For universal binaries where no `arch=` is specified, load only the first slice (with a warning) for the main binary, and pick the slice matching the main binary's arch when loaded as a dependency. Avoids address collisions and downstream breakage from multiple `is_main_bin` objects.

Test binary lives in angr/binaries#166. The macOS and Windows workflows in this PR pick up a same-named branch from `angr/binaries` when present (falling back to `master`); the linux CI uses the body reference above.

Test plan
- `KDK_26.4.1_25E253.kdk/System/Library/Extensions/`
- `/bin` + `/usr/bin` [...] `dyld` itself which is filetype 7, and a dangling symlink to a shared-cache-only dylib)

🤖 Generated with Claude Code