Cross-Version Source Diff Guidebook
Best practices for producing the cleanest possible source-level diffs between KN5000 firmware versions. Derived from the v9-to-v10 diff minimization journey (5,388 lines reduced to 44) and applied to all version pairs.
Principles
1. The diff should tell the engineering story
A good cross-version diff shows ONLY what the firmware engineers actually changed. Every diff line should answer “what did Matsushita change and why?” If a diff line exists because of how WE chose to represent the data rather than because of a genuine firmware difference, it’s noise.
2. Symbolic references absorb address shifts
When code grows or shrinks between versions, every absolute address after the change point shifts. If the source uses symbolic label references (.long FunctionName instead of .long 0x00ef1234), the linker resolves per-version and the diff shows nothing. Every pointer that CAN be symbolic MUST be symbolic.
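To make the effect concrete, here is a minimal Python sketch that compares the diff size of a symbolic table against the same table with addresses written out. The label names, the addresses, and the `resolve` helper are hypothetical stand-ins for what the linker does per version; they are not taken from the firmware.

```python
import difflib

# Hypothetical symbolic source shared by both versions (labels, not addresses).
SHARED_SOURCE = [
    "DispatchTable:",
    "    .long HandlerA",
    "    .long HandlerB",
]

# Hypothetical per-version symbol maps, standing in for linker output.
SYMBOLS_V9 = {"HandlerA": 0x00EF1234, "HandlerB": 0x00EF1280}
SYMBOLS_V10 = {"HandlerA": 0x00EF1240, "HandlerB": 0x00EF128C}


def resolve(source, symbols):
    """Replace each symbolic .long operand with its resolved address."""
    out = []
    for line in source:
        parts = line.split()
        if len(parts) == 2 and parts[0] == ".long" and parts[1] in symbols:
            out.append(f"    .long 0x{symbols[parts[1]]:08x}")
        else:
            out.append(line)
    return out


def changed_lines(a, b):
    """Count added and removed lines in a unified diff, ignoring file headers."""
    diff = difflib.unified_diff(a, b, lineterm="")
    return sum(
        1
        for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    )


# Symbolic source is the same text in both versions, so the diff is empty.
symbolic_noise = changed_lines(SHARED_SOURCE, SHARED_SOURCE)

# Hardcoded addresses differ in every pointer line once the code shifts.
hardcoded_noise = changed_lines(
    resolve(SHARED_SOURCE, SYMBOLS_V9), resolve(SHARED_SOURCE, SYMBOLS_V10)
)
```

In this toy case the symbolic form contributes no diff lines, while the hardcoded form contributes one removed and one added line per pointer.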
3. Identical source for identical behavior
If two versions execute the same code or contain the same data structure (just at different addresses), the source text should be IDENTICAL. The versions should share the same .s files wherever the functionality is unchanged, with version differences isolated to the minimum number of files.
4. Data blobs are a last resort
Replacing source-level representations (.byte, .long, .ascii, instructions) with opaque .incbin binary blobs defeats the purpose of disassembly. A blob hides the structure and makes the diff useless. Binary blobs are acceptable ONLY when:
- The data has no known structure (raw bitmap pixels, opaque firmware tables)
- The data is genuinely different between versions AND the structure is unknown
- Temporarily, as a stepping stone toward full source representation
5. Code must never be a blob
Executable code must ALWAYS be disassembled instructions, never .incbin of raw bytes. If code shifted between versions, the correct fix is symbolic references (labels, .set, .equ), not binary blobs. The entire point of the disassembly project is to have readable source.
Rules
Rule 1: Use symbolic .long for ALL pointer tables
Bad (version-specific, creates diff noise):
.long 0x00ef1234 ; hardcoded address
Good (version-independent, linker resolves):
.long FunctionName ; symbolic, same in all versions
If the pointer target doesn’t have a label, ADD one. Every function, data table, and entry point should have a meaningful label.
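One way to find remaining violations is a small lint pass over the source. The sketch below is an assumption about how such a check could look, not a tool from the project: the function name is hypothetical, and the rule "six to eight hex digits means a pointer" is a heuristic chosen only to skip small scalar constants.

```python
import re

# Matches ".long 0x00ef1234"-style operands. Requiring six to eight hex
# digits is an assumed heuristic for "looks like an address".
HARDCODED_LONG = re.compile(r"^\s*\.long\s+(0x[0-9a-fA-F]{6,8})\b")


def find_hardcoded_longs(lines):
    """Return (line_number, value) pairs for .long directives with numeric operands."""
    hits = []
    for number, line in enumerate(lines, start=1):
        code = line.split(";", 1)[0]  # drop any trailing ';' comment
        match = HARDCODED_LONG.match(code)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Each reported line is a candidate for replacement with a label; a human still has to confirm that the value really is a pointer.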
Rule 2: Use symbolic call/jp/calr/jrl for ALL branches
Never leave numeric displacements in branch instructions when the target has a label.
Bad: calr 64899
Good: calr HdaeRom_DataDispatch_Block
Rule 3: Use addr24 with labels, not _addr24_* constants
The addr24 macro uses .reloc R_TLCS900_24 to emit linker-resolved 24-bit addresses. Use it directly with labels.
Bad: .set _addr24_Free, 0xff0af2 then addr24 _addr24_Free
Good: addr24 Free
Rule 4: Parenthesized direct addressing for consistency
All direct-address operands use parenthesized syntax: ldb_da a, (SomeLabel). This is enforced by the LLVM directaddr operand class and ensures formatting consistency across versions.
Rule 5: Use .set FW_VERSION_BYTE for version-specific constants
Version-specific scalar values should be defined as named constants in one place, not scattered as magic numbers.
Rule 6: Shared source files must be IDENTICAL
Any .s file that appears in multiple version trees must be byte-identical across versions. If a file differs, then either:
- The difference is genuine (document it), or
- The representation should be unified (fix it)
Files like positional_labels.s, macros.s, sfr_tmp94c241.s MUST be identical.
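Byte-identity is easy to check mechanically. The following sketch hashes each shared file in every version tree and reports the names whose contents diverge; the directory layout and function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_divergent_shared_files(version_dirs, shared_names):
    """Return shared file names whose bytes differ between version trees.

    A file that is missing from one tree counts as divergent.
    """
    divergent = []
    for name in shared_names:
        digests = set()
        for directory in version_dirs:
            candidate = Path(directory) / name
            digests.add(file_digest(candidate) if candidate.is_file() else None)
        if len(digests) > 1:
            divergent.append(name)
    return divergent
```

An empty result means every listed file is byte-identical in all trees; any name returned needs either documentation or unification.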
Rule 7: Data tables with embedded pointers need .long label not .byte raw bytes
When a data table contains 32-bit pointer fields mixed with non-pointer data, the pointer fields should be .long SymbolName and the non-pointer fields should be .byte. Never encode a pointer as raw .byte 0xNN, 0xNN, 0xNN, 0xNN.
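When converting an existing table, raw byte quadruples can be matched against known label addresses. The sketch below assumes the pointer bytes are stored least significant byte first and that a map from address to label is available (for example, built from a linker map); both the byte order and the helper names are assumptions to verify against the actual toolchain.

```python
def le_bytes_to_address(byte_values):
    """Combine four byte values, least significant first, into a 32-bit integer."""
    if len(byte_values) != 4:
        raise ValueError("expected exactly four bytes")
    return int.from_bytes(bytes(byte_values), "little")


def suggest_symbolic_long(byte_values, symbols_by_address):
    """Return a '.long Label' line if the bytes decode to a known label address.

    Returns None when no label matches, so non-pointer data stays as .byte.
    """
    address = le_bytes_to_address(byte_values)
    label = symbols_by_address.get(address)
    return f".long {label}" if label else None
```

For instance, the bytes 0x34, 0x12, 0xef, 0x00 decode to 0x00ef1234 under this assumption, so a label at that address would yield a symbolic .long line.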
Rule 8: No .incbin for data that has known structure
If the binary data is a table of pointers, a struct with named fields, or a dispatch table, it should be expressed as source-level directives (.long, .byte, .ascii), not .incbin of an extracted blob.
Rule 9: Minimize the number of version-specific files
Most .s files should be SHARED (identical) across versions. Version differences should be concentrated in:
- The entry point file (kn5000_vN_program.s)
- A small number of data files with genuine content differences
- .set/.equ constants for version-specific scalar values
Rule 10: Never use code blobs between versions
If code shifted by N bytes, the correct representation is:
- Same .include structure in both versions
- Symbolic labels that the linker resolves per-version
- Different .fill padding to absorb size differences
NOT: .incbin "v7_postshift_blob.bin" replacing 43 .include directives.
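The per-version padding is simple arithmetic: whatever the code does not occupy in a fixed-size region is filled. The sketch below computes that amount and formats a directive in the GNU as style `.fill repeat, size, value`; the region size, code sizes, and the 0xff fill value are hypothetical, and the exact directive form accepted by the project's assembler should be confirmed.

```python
def fill_padding(region_size, code_size):
    """Bytes of padding needed so code plus padding fills the region exactly."""
    if code_size > region_size:
        raise ValueError("code does not fit in the region")
    return region_size - code_size


def fill_directive(region_size, code_size, fill_value=0xFF):
    """Format a one-byte-per-repeat .fill line for the computed padding."""
    count = fill_padding(region_size, code_size)
    return f".fill {count}, 1, 0x{fill_value:02x}"
```

With a hypothetical 0x100-byte region, a version whose code occupies 0xf4 bytes needs 12 bytes of padding and one whose code occupies 0xf8 bytes needs 8, so only the .fill line differs between versions.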
Scoring a Cross-Version Diff
| Criterion | Score | Description |
|---|---|---|
| No binary blobs for code | 0-10 | All code is disassembled instructions? |
| No binary blobs for structured data | 0-10 | All data tables use .long/.byte/.ascii? |
| Symbolic pointers | 0-10 | All .long pointer values use labels? |
| Symbolic branches | 0-10 | All call/jp/calr/jrl use labels? |
| Shared source files identical | 0-10 | Same .s files are byte-identical? |
| Diff tells the story | 0-10 | Every diff line reflects a genuine change? |
| Minimal diff size | 0-10 | Diff is as small as possible? |
| Version constants centralized | 0-10 | Version-specific values in one place? |
| No post-build hacks | 0-10 | No binary patching, overlay, or fixup scripts? |
| Consistent formatting | 0-10 | Tab indentation, hex case, parenthesization? |
Total: /100
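The total can be tallied with a few lines of Python. This is only a sketch of the arithmetic in the table above: the criterion names are copied from the table, while the function name and the validation behavior are assumptions.

```python
# Criterion names as listed in the scoring table.
CRITERIA = [
    "No binary blobs for code",
    "No binary blobs for structured data",
    "Symbolic pointers",
    "Symbolic branches",
    "Shared source files identical",
    "Diff tells the story",
    "Minimal diff size",
    "Version constants centralized",
    "No post-build hacks",
    "Consistent formatting",
]


def total_score(scores):
    """Sum the ten 0-10 criterion scores into a total out of 100."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"score out of range for {name!r}")
    return sum(scores[name] for name in CRITERIA)
```

Requiring every criterion to be present keeps a partially scored diff from looking better than it is.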
See Also
- Firmware Changes: v9 vs v10 – Example of a fully minimized diff (44 lines)
- LLVM Semantic Instructions – Parenthesized operand syntax