Add OMF read support #807

Draft
encounter wants to merge 13 commits into gimli-rs:main from encounter:omf

encounter commented Sep 25, 2025

Contributor

Adds read support for OMF (Relocatable Object Module Format), the object format used by DOS-era compilers (Borland C++, Open Watcom, MS C, etc.). Both 16-bit and 32-bit variants are supported.

OMF doesn't have a notion of sections; data is contributed to segments (SEGDEF) and COMDATs. This implementation maps both to sections in the unified read API:

  • Each SEGDEF becomes a section, with LEDATA/LIDATA records contributing data chunks.
  • Each COMDAT synthesizes a section and a defined symbol, with ObjectComdat tying them together. Continuation records and iterated data are supported.
  • Borland's virtual segment extension (COMDEF with a segment index data type, referenced by other records via segment indices with bit 14 set) is handled the same way as COMDATs.
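
As a rough illustration of the virtual segment marker, the sketch below decodes an OMF index field (one byte, or two bytes with the high bit of the first byte set) and then checks bit 14 of the result. The names `SegmentRef`, `read_index`, and `classify_segment_index` are hypothetical and are not taken from the PR; what the remaining bits of a virtual segment index refer to is not stated in the description, so the sketch only returns them.

```rust
/// Bit 14 of a decoded segment index. Per the description above, Borland
/// sets it when a record refers to a virtual segment.
const VIRTUAL_SEGMENT_BIT: u16 = 0x4000;

/// Hypothetical classification of a decoded segment index.
#[derive(Debug, PartialEq, Eq)]
enum SegmentRef {
    /// Ordinary index into the SEGDEF list.
    Segment(u16),
    /// Virtual segment reference, with the marker bit cleared.
    Virtual(u16),
}

/// Decode an OMF index field: a single byte if its high bit is clear,
/// otherwise two bytes holding a 15-bit value (high bits first).
/// Returns the value and the remaining bytes, or `None` if truncated.
fn read_index(data: &[u8]) -> Option<(u16, &[u8])> {
    let (&first, rest) = data.split_first()?;
    if first & 0x80 == 0 {
        return Some((u16::from(first), rest));
    }
    let (&second, rest) = rest.split_first()?;
    Some(((u16::from(first & 0x7f) << 8) | u16::from(second), rest))
}

/// Split a decoded index into an ordinary or virtual segment reference.
fn classify_segment_index(index: u16) -> SegmentRef {
    if index & VIRTUAL_SEGMENT_BIT != 0 {
        SegmentRef::Virtual(index & !VIRTUAL_SEGMENT_BIT)
    } else {
        SegmentRef::Segment(index)
    }
}
```

Because the two-byte form carries 15 bits, bit 14 can only be set when the index uses the two-byte encoding.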

Since segment data is split across records (and LIDATA requires expansion), data() returns a contiguous &'data [u8] only when possible, and uncompressed_data() returns the assembled/expanded data otherwise (as discussed below).
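
The borrowed-versus-assembled distinction can be sketched with `std::borrow::Cow`. This is a simplified model, not the PR's code: the `Chunk` type and `segment_data` function are hypothetical, gaps between chunks are assumed to be zero-filled, and only a single chunk covering the whole segment is treated as contiguous, since consecutive data records are separated by record headers and checksums in the file.

```rust
use std::borrow::Cow;

/// One data contribution to a segment (hypothetical type for this sketch).
struct Chunk<'data> {
    /// Offset of the chunk within the segment.
    offset: usize,
    /// Bytes borrowed directly from the file.
    bytes: &'data [u8],
}

/// Return the segment contents, borrowing from the file when a single
/// chunk covers the whole segment and assembling a new buffer otherwise.
/// Returns `None` if a chunk does not fit inside the segment.
fn segment_data<'data>(length: usize, chunks: &[Chunk<'data>]) -> Option<Cow<'data, [u8]>> {
    if let [only] = chunks {
        if only.offset == 0 && only.bytes.len() == length {
            return Some(Cow::Borrowed(only.bytes));
        }
    }
    // Assumption: bytes not covered by any chunk read as zero.
    let mut out = vec![0u8; length];
    for chunk in chunks {
        let end = chunk.offset.checked_add(chunk.bytes.len())?;
        out.get_mut(chunk.offset..end)?.copy_from_slice(chunk.bytes);
    }
    Some(Cow::Owned(out))
}
```

In this model, a `data()`-style accessor would correspond to the `Cow::Borrowed` case only, while an `uncompressed_data()`-style accessor can return either variant.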

Fixups are exposed as relocations, with the full location/mode/frame/target information preserved in RelocationFlags::Omf. The generic RelocationKind mapping is best-effort:

  • self-relative fixups map to Relative (with the addend adjusted to be relative to the end of the location)
  • segment-relative offset fixups map to Absolute when the frame is the FLAT group, or SectionOffset when the frame is the target's segment; otherwise Unknown
  • base fixups map to SectionIndex, and far pointer fixups to Absolute
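
The mapping in the list above can be written out as a small decision function. All types here are hypothetical stand-ins rather than the `object` crate's or the PR's definitions, and two points are assumptions: that the self-relative check takes precedence over the location type, and that the addend adjustment amounts to subtracting the size of the location.

```rust
/// Simplified fixup location types (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Location {
    Offset,
    Base,
    FarPointer,
}

/// Fixup mode (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    SelfRelative,
    SegmentRelative,
}

/// Simplified view of the fixup frame (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Frame {
    FlatGroup,
    TargetSegment,
    Other,
}

/// Stand-in for the generic relocation kinds named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Absolute,
    Relative,
    SectionOffset,
    SectionIndex,
    Unknown,
}

/// Best-effort mapping following the three bullets in the description.
fn relocation_kind(mode: Mode, location: Location, frame: Frame) -> Kind {
    match (mode, location) {
        // Assumption: self-relative wins regardless of location type.
        (Mode::SelfRelative, _) => Kind::Relative,
        (Mode::SegmentRelative, Location::Base) => Kind::SectionIndex,
        (Mode::SegmentRelative, Location::FarPointer) => Kind::Absolute,
        (Mode::SegmentRelative, Location::Offset) => match frame {
            Frame::FlatGroup => Kind::Absolute,
            Frame::TargetSegment => Kind::SectionOffset,
            Frame::Other => Kind::Unknown,
        },
    }
}

/// Assumption: making the addend relative to the end of the location
/// means subtracting the location size from the stored displacement.
fn self_relative_addend(stored: i64, location_size: u8) -> i64 {
    stored - i64::from(location_size)
}
```

Since the generic kind loses information in the `Unknown` case, callers that need exact semantics would still consult the preserved OMF-specific flags.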

Testing:

  • objdump snapshot tests in crates/examples/testfiles/omf cover all test files, including objects built with Borland C++ 4.5 and Open Watcom (contributed by a user; added in the object-testfiles PR).
  • Handwritten tests verify details the snapshots don't cover: LIDATA expansion contents, COMDAT section data, and symbol properties.
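
For readers unfamiliar with iterated data, here is a minimal sketch of the kind of expansion such a test would check. It assumes the 16-bit LIDATA layout (a 2-byte repeat count, a 2-byte block count, then either a length-prefixed byte string when the block count is zero or that many nested blocks). The function names, the `ExpandError` type, and both limits are hypothetical, and a repeat count of zero is assumed to produce no data.

```rust
/// Hypothetical limits for this sketch.
const MAX_DEPTH: usize = 8;
const MAX_EXPANDED: usize = 1 << 20;

#[derive(Debug, PartialEq, Eq)]
enum ExpandError {
    Truncated,
    TooDeep,
    TooLarge,
}

/// Split `n` bytes off the front of `data`, advancing it.
fn take<'a>(data: &mut &'a [u8], n: usize) -> Result<&'a [u8], ExpandError> {
    let slice: &'a [u8] = *data;
    if slice.len() < n {
        return Err(ExpandError::Truncated);
    }
    let (head, tail) = slice.split_at(n);
    *data = tail;
    Ok(head)
}

/// Read a little-endian u16 stored as a plain integer.
fn read_u16(data: &mut &[u8]) -> Result<u16, ExpandError> {
    let b = take(data, 2)?;
    Ok(u16::from_le_bytes([b[0], b[1]]))
}

/// Expand one iterated data block, appending the result to `out`.
fn expand_block(data: &mut &[u8], depth: usize, out: &mut Vec<u8>) -> Result<(), ExpandError> {
    if depth > MAX_DEPTH {
        return Err(ExpandError::TooDeep);
    }
    let repeat = usize::from(read_u16(data)?);
    let block_count = read_u16(data)?;
    let start = out.len();
    if block_count == 0 {
        // Leaf block: one length byte followed by the content bytes.
        let len = usize::from(take(data, 1)?[0]);
        out.extend_from_slice(take(data, len)?);
    } else {
        for _ in 0..block_count {
            expand_block(data, depth + 1, out)?;
        }
    }
    // `out[start..]` now holds one copy of the content; add the rest.
    let content_len = out.len() - start;
    let total = content_len.checked_mul(repeat).ok_or(ExpandError::TooLarge)?;
    match start.checked_add(total) {
        Some(end) if end <= MAX_EXPANDED => {}
        _ => return Err(ExpandError::TooLarge),
    }
    if repeat == 0 {
        out.truncate(start);
    }
    for _ in 1..repeat {
        out.extend_from_within(start..start + content_len);
    }
    Ok(())
}

/// Expand every consecutive block in a record's iterated data field.
fn expand_lidata(mut data: &[u8]) -> Result<Vec<u8>, ExpandError> {
    let mut out = Vec::new();
    while !data.is_empty() {
        expand_block(&mut data, 0, &mut out)?;
    }
    Ok(out)
}
```

A handwritten test along these lines would feed a small block such as `[3, 0, 0, 0, 2, 0xAB, 0xCD]` to the expander and compare the result against the content bytes repeated three times.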

To-do:

  • Open a separate PR to object-testfiles. (here, updated with Borland/Watcom test files)
  • Improve fixup (relocation) support.
  • Test improvements: objdump snapshot tests and more detailed handwritten tests.

Not handled (can be follow-ups):

  • Backpatch records (BAKPAT/NBKPAT) — none of the test compilers emit them.
  • Line number records (LINNUM/LINSYM) are skipped.

Resources:

Resolves #736

philipc commented Sep 26, 2025

Contributor

> Instead, I use the uncompressed_data API to expose the expanded/contiguous data.

That's logically what I would expect that API to return, so I think this is ok.

I don't want to change the data and compressed_data APIs to allow disjoint data just for OMF. We could add methods to OmfSection if that information is required (or maybe even add a new ObjectSection method). Do you have a need in objdiff for more than what uncompressed_data gives you?

philipc commented Sep 26, 2025

Contributor

I don't have any major feedback; this looks sane to me. Thanks for working on it.

For testing, the preferred approach is to run objdump on the test files and record the expected output in crates/examples/testfiles. Handwritten tests are fine for things that approach doesn't cover. Adding readobj support in the future might help too, but it's not needed immediately.

philipc commented Sep 26, 2025

Contributor

For example, you can add a test using:

touch crates/examples/testfiles/omf/comprehensive_test.obj.objdump
cargo xtask test-update

then manually verify comprehensive_test.obj.objdump before adding it to git.

encounter added 13 commits June 11, 2026 10:16
- Fix SEGDEF length field size to depend on record type, not the P (use32)
  bit, and honor the B (big) bit. This fixes parsing of Borland objects.
- Parse COMDAT records per the TIS spec (separate align byte, conditional
  public base fields), synthesize sections and symbols for them, and
  support continuation and iterated data.
- Allow FIXUPP records to follow COMDAT records.
- Support Borland virtual segments (COMDEF with a segment index data type,
  referenced via segment indices with bit 14 set).
- Read the FIXUP M bit from the correct byte (LOCAT instead of fix data).
- Fix iterated data (LIDATA) repeat/block counts to be plain integers
  rather than COMDEF-style encoded values, support multiple consecutive
  blocks, and propagate expansion errors.
- Fix relocation section targets to use 1-based section indices.
- Rework relocation kind mapping: self-relative fixups map to Relative
  with an end-of-location addend; segment-relative offsets map to Absolute
  (FLAT frame) or SectionOffset (target-section frame).
- Return empty imports/exports like other relocatable formats.
- Add Borland and Watcom test files (from object-testfiles)
- Add objdump snapshot outputs for all OMF test files
- Verify LIDATA expansion content and COMDAT sections/symbols in tests
- Format with rustfmt; fix clippy lints
- Add module documentation
- Reject invalid record types instead of silently skipping
- Recognize the DWARF segment class as debug sections
- Add read_core,omf to the feature test matrix
- Reject zero-length records (slice panic)
- Check for overflow when computing FIXUP offsets
- Limit iterated data block nesting depth (stack overflow)
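
To illustrate the first item in the commit list, the sketch below picks the SEGDEF length field width from the record type and applies the B (big) bit. It assumes record types 0x98 and 0x99 for the 16-bit and 32-bit SEGDEF forms and bit 1 of the attributes byte for B; the function name and signature are hypothetical and are not the PR's code.

```rust
/// Record types for the 16-bit and 32-bit SEGDEF forms.
const SEGDEF16: u8 = 0x98;
const SEGDEF32: u8 = 0x99;
/// B (big) bit in the segment attributes (ACBP) byte.
const ACBP_BIG: u8 = 0x02;

/// Compute a segment's length from a SEGDEF record.
/// `field` must start at the segment length field.
/// Returns `None` for an unexpected record type or a truncated field.
fn segdef_length(record_type: u8, acbp: u8, field: &[u8]) -> Option<u64> {
    // The field width depends only on the record type, not on the P bit.
    let (raw, big) = match record_type {
        SEGDEF16 => {
            let b = field.get(..2)?;
            (u64::from(u16::from_le_bytes([b[0], b[1]])), 1u64 << 16)
        }
        SEGDEF32 => {
            let b = field.get(..4)?;
            (
                u64::from(u32::from_le_bytes([b[0], b[1], b[2], b[3]])),
                1u64 << 32,
            )
        }
        _ => return None,
    };
    // With B set, the length is the maximum for the record type
    // (64 KiB or 4 GiB), which does not fit in the field itself.
    if acbp & ACBP_BIG != 0 {
        Some(big)
    } else {
        Some(raw)
    }
}
```

The return type is `u64` so that the 4 GiB case remains representable.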
Development

Successfully merging this pull request may close these issues.

Support OMF format object files.

2 participants