LLVM TLCS-900 Backend: Semantic Instruction Migration
The LLVM TLCS-900 backend originally used 118 custom “wrapper” mnemonics encoding raw addressing mode bytes. This page tracks the ongoing work to replace them with proper semantic instructions.
Beads issues: kn5000-7ubb (Phase 2), kn5000-1hqd (Phase 3), kn5000-xcuk (Phase 4), kn5000-0vbs (Phase 5)
Why This Matters
Wrapper mnemonics like `st_dri3b L, 0xfd, 0xb8, 0x01` are unreadable. The same instruction in standard TLCS-900 syntax is `lda xsp, (xsp+440)`, which makes it immediately clear that the code is deallocating 440 bytes of stack frame. Semantic mnemonics make the disassembly comprehensible and cross-version diffs meaningful.
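The 440 in the semantic form can be recovered from the wrapper's raw bytes. The sketch below assumes the two trailing bytes (`0xb8`, `0x01`) are a signed 16-bit displacement stored low byte first; the helper name `disp16_le` is illustrative and not part of the backend.

```python
def disp16_le(lo: int, hi: int) -> int:
    """Read two raw bytes as a signed 16-bit little-endian displacement."""
    return int.from_bytes(bytes([lo, hi]), byteorder="little", signed=True)


# Trailing operand bytes of the wrapper example: 0xb8, 0x01 -> 0x01b8
print(disp16_le(0xB8, 0x01))  # 440
```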
Progress Summary
| Phase | Description | Instances | Status |
|---|---|---|---|
| Phase 1 | Mnemonic renames (81 mnemonics) | 53,603 | Complete |
| Phase 1b | Parenthesized direct addresses | 61,436 | Complete |
| Phase 2 | 24-bit addressing semantics | ~1,700 | In progress |
| Phase 3 | Extended register pair modes | ~3,300 | Planned |
| Phase 4 | SRI/DRI indirect modes | ~3,500 | Planned |
| Phase 5 | Miscellaneous | ~700 | Planned |
Completed Work
Phase 1: Mnemonic Renames (Complete — March 2026)
81 wrapper mnemonics were renamed to semantic forms across 53,603 instruction instances:
| Old | New | Category |
|---|---|---|
| `push_sr` | `push sr` | Status register push/pop |
| `pop_sr` | `pop sr` | Status register push/pop |
| `ld8_24` | `ldb_da` | Direct address loads |
| `st8_24` | `stb_da` | Direct address stores |
| `sti8_24` | `stib_da` | Direct address immediate stores |
| `ldto_berp` | `stb_erp` | Extended register pair |
| `ldfr_berp` | `ldb_erp` | Extended register pair |
| `st_dri3b` | `stb_dri` | Displacement register indirect |
| … | … | (81 total) |
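A rename pass of this kind can be sketched in a few lines of Python. This is an illustration rather than the script actually used: the `RENAMES` dictionary holds only the rows listed in the table, and the function name and matching rule (mnemonic is the first token on a line) are assumptions. It works on `bytes` so that line endings and any non-UTF-8 bytes in the `.s` files pass through unchanged.

```python
import re

# Subset of the rename table; the full mapping has 81 entries.
RENAMES = {
    b"push_sr": b"push sr",
    b"pop_sr": b"pop sr",
    b"ld8_24": b"ldb_da",
    b"st8_24": b"stb_da",
    b"sti8_24": b"stib_da",
    b"ldto_berp": b"stb_erp",
    b"ldfr_berp": b"ldb_erp",
    b"st_dri3b": b"stb_dri",
}

# Match only the first identifier on each line, after optional indentation,
# so operands and comments that mention an old name are left alone.
_MNEMONIC = re.compile(rb"^([ \t]*)([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def rename_mnemonics(source: bytes) -> bytes:
    """Replace old wrapper mnemonics with their semantic names."""
    def repl(match):
        indent, word = match.group(1), match.group(2)
        return indent + RENAMES.get(word, word)

    return _MNEMONIC.sub(repl, source)
```

For example, `rename_mnemonics(b"\tld8_24 a, 0xe12345\n")` returns `b"\tldb_da a, 0xe12345\n"`, while lines whose first token is not in the table are returned untouched.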
Phase 1b: Parenthesized Direct Address Syntax (Complete — March 2026)
Added a `directaddr` operand class to the LLVM backend. All 61,436 direct address operands across 155 `.s` files now use parenthesized syntax:
| Before | After |
|---|---|
| `ldb_da a, 0xe12345` | `ldb_da a, (0xe12345)` |
| `stw_da 0xe12345, wa` | `stw_da (0xe12345), wa` |
| `cpw_da 0x3ef50, 0` | `cpw_da (0x3ef50), 0` |
| `incdi8_24 1, 0xcee5` | `incdi8_24 1, (0xcee5)` |
| `bitda_24 3, 0xe12345` | `bitda_24 3, (0xe12345)` |
Changes:
- `TLCS900InstrInfo.td`: 167 instruction definitions updated to use the `directaddr` operand class
- `TLCS900AsmParser.cpp`: added `parseDirectAddrOperand()` for `(expr)` syntax
- `TLCS900InstPrinter.cpp`: added `printDirectAddr()` to wrap output in parentheses
- Both old (bare) and new (parenthesized) syntax are accepted for backward compatibility
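Because the address operand sits in a different position for different mnemonics, a source rewrite has to know which operand to wrap. The Python sketch below shows one way to do that for the five examples in the table. The `ADDR_OPERAND` table and the `parenthesize` function are hypothetical, and the sketch assumes one instruction per line with no trailing comment or newline.

```python
import re

# Hypothetical table: zero-based index of the direct-address operand,
# inferred from the before/after examples in the table.
ADDR_OPERAND = {
    "ldb_da": 1,
    "stw_da": 0,
    "cpw_da": 0,
    "incdi8_24": 1,
    "bitda_24": 1,
}

_LINE = re.compile(r"^(\s*)(\w+)(\s+)(.*)$")


def parenthesize(line: str) -> str:
    """Wrap the direct-address operand of a known mnemonic in parentheses."""
    m = _LINE.match(line)
    if not m or m.group(2) not in ADDR_OPERAND:
        return line
    indent, mnemonic, gap, rest = m.groups()
    operands = [op.strip() for op in rest.split(",")]
    i = ADDR_OPERAND[mnemonic]
    # Skip operands that are already parenthesized so the pass is idempotent.
    if i < len(operands) and not operands[i].startswith("("):
        operands[i] = f"({operands[i]})"
    return f"{indent}{mnemonic}{gap}" + ", ".join(operands)
```

Running the function twice on the same line gives the same result, which mirrors the backend accepting both the bare and the parenthesized form.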
Remaining Phases
Phase 2: 24-bit Addressing Mode Semantics (~1,700 instances)
| Current | Semantic | Count | Status |
|---|---|---|---|
| `ld16_24 reg, addr` | `ld reg, (addr24)` | 645 | Pending |
| `ld32_24 reg, addr` | `ld reg, (addr24)` | 161 | Pending |
| `st16_24 addr, reg` | `ld (addr24), reg` | 252 | Pending |
| `st32_24 addr, reg` | `ld (addr24), reg` | 139 | Pending |
| `sti16_24 addr, imm` | `ld (addr24), imm16` | 209 | Pending |
| `cpi8_24 addr, imm` | `cp (addr24), imm8` | 63 | Pending |
| `cpdi16_24 addr, imm` | `cp (addr24), imm16` | 120 | Pending |
Phase 3: Extended Register Pair Modes (~3,300 instances)
| Current | Semantic | Count | Status |
|---|---|---|---|
| `ldto_berp` | `ld (erp+off), val` | 1,251 | Pending |
| `ldfr_berp` | `ld val, (erp+off)` | 597 | Pending |
| `ldto_werp` | `ld (erp+off), val` | 459 | Pending |
| `ldfr_werp` | `ld val, (erp+off)` | 221 | Pending |
| `ldi_berp` | `ld (erp+off), imm` | 316 | Pending |
| `ldi_werp` | `ld (erp+off), imm` | 281 | Pending |
| `push_werp` / `pop_werp` | `push (erp)` / `pop (erp)` | 325 | Pending |
| `cpi_berp` / `cpi_werp` | `cp (erp+off), imm` | 260 | Pending |
| `inc1_berp` / `inc1_werp` | `inc 1, (erp+off)` | 281 | Pending |
| `cp_werp` / `cp_srib_im` | `cp (erp), val` | 184 | Pending |
Phase 4: SRI/DRI Indirect Modes (~3,500 instances)
| Current | Semantic | Count | Status |
|---|---|---|---|
| `st_dri3b/w/l` | `ld (reg+d16), val` | 2,105 | Pending |
| `ld_srib3` / `ld_sriw3` | `ld val, (reg+d16)` | 1,073 | Pending |
| `lda_dri3` | `lda reg, (reg+d16)` | 396 | Pending |
| `lda_dpi` | `lda reg, (reg+d16)` | 164 | Pending |
| `ld_spib` | `ld val, (xsp+d8)` | 129 | Pending |
| `jp_dri` | `jp (reg+d16)` | 240 | Pending |
| `stib_dri` / `stib_dpi` | `ld (reg+d16), imm` | 326 | Pending |
| `st_dpiw` / `stiw_dri` | `ld (reg+d16), imm16` | 120 | Pending |
| `bit_dri` | `bit n, (reg+d16)` | 68 | Pending |
Phase 5: Miscellaneous (~700 instances)
| Current | Semantic | Count | Status |
|---|---|---|---|
| `ld_srib` / `ld_sriw` | `ld val, (reg)` | 341 | Pending |
| `mrid2` | Various | 48 | Pending |
| `ldada` / `ldda8` / `stda8` | `ld` with direct addressing | ~200 | Pending |
| `addm32_24` / `addmi16` / etc. | `add (addr), imm` | ~100 | Pending |
Architecture
The LLVM TLCS-900 backend lives at `/mnt/shared/llvm-project/llvm/lib/Target/TLCS900/`.
Key files:
- `TLCS900InstrFormats.td`: 79 instruction format class definitions
- `TLCS900InstrInfo.td`: ~5000 lines of instruction definitions
- `TLCS900BaseInfo.h`: TSFlags bit-field definitions
- `MCTargetDesc/TLCS900MCCodeEmitter.cpp`: 1500+ lines, manual encoding
- `Disassembler/TLCS900Disassembler.cpp`: 1000+ lines, manual decoding
Encoding strategy: The backend uses manual encoding via a large `switch(Format)` in `MCCodeEmitter::encodeInstruction()`, not auto-generated TableGen encoding. Each of the 79 format classes has a dedicated switch case that emits bytes using TSFlags metadata.
TSFlags layout (32 bits):
[6:0] InstFormat — selects encoding strategy (79 values)
[14:7] Opcode — primary prefix/opcode byte
[16:15] OpSize — 0=8-bit, 1=16-bit, 2=32-bit
[17] AddrWidth — 0=16-bit addr, 1=24-bit addr
[20:18] RegIdx — block transfer register index
[28:21] SubOpcode — secondary operation byte
[31:29] NumPreOps — pre-SubOpcode operand count
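To make the bit layout concrete, here is a small Python mirror of it. The real definitions live in C++ in `TLCS900BaseInfo.h`; the `FIELDS` table, `pack_tsflags`, and `unpack_tsflags` below are illustrative names, with field positions and widths copied from the layout above.

```python
# (name, low bit, width) for each TSFlags field, copied from the layout.
FIELDS = [
    ("InstFormat", 0, 7),
    ("Opcode", 7, 8),
    ("OpSize", 15, 2),
    ("AddrWidth", 17, 1),
    ("RegIdx", 18, 3),
    ("SubOpcode", 21, 8),
    ("NumPreOps", 29, 3),
]


def pack_tsflags(**values: int) -> int:
    """Pack named field values into a 32-bit TSFlags word; missing fields are 0."""
    flags = 0
    for name, shift, width in FIELDS:
        value = values.get(name, 0)
        if value >> width:  # also rejects negative values
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        flags |= value << shift
    return flags


def unpack_tsflags(flags: int) -> dict:
    """Split a TSFlags word back into its named fields."""
    return {
        name: (flags >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }
```

The seven widths add up to exactly 32 bits, so setting every field to its maximum yields `0xFFFFFFFF`, and `unpack_tsflags(pack_tsflags(...))` returns the values that went in.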
Process for Each Phase
- Define new instruction in `.td` with semantic mnemonic and proper operand types
- Add encoding case in `MCCodeEmitter.cpp` (or reuse existing format)
- Add decoding case in `TLCS900Disassembler.cpp` to emit semantic mnemonic
- Build LLVM: `ninja -C /mnt/shared/llvm-project/build llc llvm-mc`
- Update all `.s` files in both v9 and v10 (Python script with binary I/O)
- Rebuild ROMs: verify 100% byte-match
- Run LLVM tests: `build/bin/llvm-lit llvm/test/CodeGen/TLCS900/`
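The byte-match step can be sketched as a plain comparison of the rebuilt ROM against the original image. This is a minimal illustration and not the project's actual verification tooling; the function names and the file paths passed to `verify_rom` are placeholders.

```python
from pathlib import Path


def first_mismatch(built: bytes, reference: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(built, reference)):
        if a != b:
            return offset
    if len(built) != len(reference):
        # One image is a prefix of the other; they diverge where the shorter ends.
        return min(len(built), len(reference))
    return None


def verify_rom(built_path: str, reference_path: str) -> bool:
    """Compare two ROM files and report where they first differ."""
    offset = first_mismatch(
        Path(built_path).read_bytes(), Path(reference_path).read_bytes()
    )
    if offset is None:
        return True
    print(f"mismatch at offset 0x{offset:x}")
    return False
```

Reporting the first differing offset, rather than only pass or fail, makes it easier to trace a mismatch back to the instruction whose encoding changed.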
See Also
- TLCS-900 Instruction Encoding — Hardware instruction format reference
- ROM Reconstruction — Disassembly progress
- Source Code Map — Guide to every source file