TLCS-900/H Instruction Encoding Reference
This page documents the instruction encoding format of the Toshiba TLCS-900/H2 CPU (TMP94C241F) as used in the Technics KN5000. This reference was built through systematic reverse engineering of the KN5000 firmware ROMs and verification against both the MAME disassembler (unidasm) and our custom LLVM TLCS-900 backend.
Overview
TLCS-900/H instructions are variable-length (1–7 bytes) using a prefix-based encoding system. The first byte determines the instruction category, operand size, and addressing mode. Subsequent bytes encode the operation (sub-opcode), register operands, displacements, and immediates.
Register Encoding
All register classes share a consistent 3-bit encoding (0–7):
| Enc | 8-bit | 16-bit | 32-bit (GPR) | Address Reg | Q Reg (PrevBank) |
|---|---|---|---|---|---|
| 0 | W | WA | XWA | XWA | QWA |
| 1 | A | BC | XBC | XBC | QBC |
| 2 | B | DE | XDE | XDE | QDE |
| 3 | C | HL | XHL | XHL | QHL |
| 4 | D | IX | XIX | XIX | QIX |
| 5 | E | IY | XIY | XIY | QIY |
| 6 | H | IZ | XIZ | XIZ | QIZ |
| 7 | L | SP | XSP | XSP | QSP |
Note: Encoding 7 (L as an 8-bit register, SP as a 16-bit register) is not a member of the GR8 or GR16 register classes. Instructions that specify a GR8/GR16 operand therefore cannot use L or SP as that operand.
Instruction Format Categories
1. Compact 32-bit Immediate Loads (0x40–0x47)
5-byte instructions that load a 32-bit immediate into a 32-bit general-purpose register (GPR).
Byte: [0x40+R] [imm_lo] [imm_b1] [imm_b2] [imm_hi]
- R = register encoding (0–7)
- Immediate is 32 bits, little-endian
Example: ld xbc, 0x01E0007F → 41 7F 00 E0 01
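The byte layout above can be sketched as a small Python helper. This is an illustrative sketch rather than part of the KN5000 tooling; the function name `encode_ld_imm32` is hypothetical.

```python
import struct

def encode_ld_imm32(reg_enc: int, imm: int) -> bytes:
    """Compact form: [0x40+R] followed by a little-endian 32-bit immediate."""
    if not 0 <= reg_enc <= 7:
        raise ValueError("register encoding must be 0-7")
    if not 0 <= imm <= 0xFFFFFFFF:
        raise ValueError("immediate must fit in 32 bits")
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return bytes([0x40 + reg_enc]) + struct.pack("<I", imm)

# ld xbc, 0x01E0007F (XBC has register encoding 1)
print(encode_ld_imm32(1, 0x01E0007F).hex(" ").upper())
```

The printed bytes can be compared against the `ld xbc` example given above.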
2. Register Source Prefix (0xC8–0xEF)
2-byte minimum instructions where the first byte encodes a source register and operand size, and the second byte is a sub-opcode that determines the operation and destination.
Byte: [prefix] [sub_opc]
| Prefix Range | Operand Size | Source Register |
|---|---|---|
| 0xC8–0xCF | 8-bit | R = prefix − 0xC8 |
| 0xD8–0xDF | 16-bit | R = prefix − 0xD8 |
| 0xE8–0xEF | 32-bit | R = prefix − 0xE8 |
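A minimal sketch of how a disassembler might classify a register source prefix using the ranges in this table. The names `REG_PREFIX_BASE` and `decode_reg_prefix` are hypothetical, chosen for illustration.

```python
# Base prefix byte for each operand size, taken from the table above.
REG_PREFIX_BASE = {8: 0xC8, 16: 0xD8, 32: 0xE8}

def decode_reg_prefix(prefix: int):
    """Return (operand_size_bits, source_register_encoding),
    or None if the byte is not a register source prefix."""
    for size, base in REG_PREFIX_BASE.items():
        if base <= prefix <= base + 7:
            return size, prefix - base
    return None

print(decode_reg_prefix(0xD9))  # 16-bit prefix, register encoding 1 (BC)
```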
Register-to-Register Sub-Opcode Table
The sub-opcode byte encodes both the operation and the destination register:
| Sub-Opc Range | Operation | Format | Direction |
|---|---|---|---|
| 0x04 | PUSH r | Unary (16-bit only) | — |
| 0x05 | POP r | Unary (16-bit only) | — |
| 0x06 | CPL r | Unary | — |
| 0x07 | NEG r | Unary | — |
| 0x12 | EXTZ r | Unary (16-bit only) | — |
| 0x13 | EXTS r | Unary (16-bit only) | — |
| 0x20+d | LD d, r | LD to register | d ← r |
| 0x28+d | LD r, d | LD from register (reverse) | r ← d |
| 0x40+d | MUL d, r | Multiply (16→32, uses GPR names) | — |
| 0x48+d | MULS d, r | Multiply signed | — |
| 0x50+d | DIV d, r | Divide (32/16, uses GPR names) | — |
| 0x58+d | DIVS d, r | Divide signed | — |
| 0x60+n | INC n, r | Increment by n (1–7) | — |
| 0x68+n | DEC n, r | Decrement by n (1–7) | — |
| 0x70+cc | SCC cc, r | Set if condition code | — |
| 0x80+d | ADD d, r | Add | d ← d + r |
| 0x88+d | LD d, r | Load (alternate encoding) | d ← r |
| 0x90+d | ADC d, r | Add with carry | d ← d + r + C |
| 0x98+d | LD r, d | Load reverse | r ← d |
| 0xA0+d | SUB d, r | Subtract | d ← d − r |
| 0xA8+n | LDS r, n | Load small immediate (0–7) | r ← n |
| 0xB0+d | SBC d, r | Subtract with borrow | d ← d − r − C |
| 0xC0+d | AND d, r | Bitwise AND | d ← d & r |
| 0xD0+d | XOR d, r | Bitwise XOR | d ← d ^ r |
| 0xD8+n | CPS r, n | Compare small immediate (0–7) | r − n |
| 0xE0+d | OR d, r | Bitwise OR | d ← d \| r |
| 0xF0+d | CP d, r | Compare | d − r |
Where d = destination register encoding (3 bits), r = source register from prefix, n = small immediate (3 bits), cc = condition code (4 bits).
LD encoding note: Two encodings exist for register-to-register LD: sub-opcode 0x88+d and 0x20+d (for LD d, r), and 0x98+d and 0x28+d (for LD r, d). Both forms are semantically identical but produce different byte sequences. The LLVM assembler always uses the 0x88/0x20 forms.
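To make the two-byte layout concrete, the following sketch assembles a register-to-register instruction from the prefix and sub-opcode tables above. It is an illustration under stated assumptions: the helper name `encode_reg_reg` is hypothetical, and LD is mapped to the 0x88+d encoding purely as a choice for this example.

```python
# Prefix bases by operand size, and sub-opcode bases for the "op d, r" forms.
REG_PREFIX_BASE = {8: 0xC8, 16: 0xD8, 32: 0xE8}
SUBOP_BASE = {
    "LD": 0x88,  # assumption for this sketch: use the 0x88+d form of LD d, r
    "ADD": 0x80, "ADC": 0x90, "SUB": 0xA0, "SBC": 0xB0,
    "AND": 0xC0, "XOR": 0xD0, "OR": 0xE0, "CP": 0xF0,
}

def encode_reg_reg(op: str, size: int, dst_enc: int, src_enc: int) -> bytes:
    """First byte carries size and source register; second carries op and destination."""
    if not (0 <= dst_enc <= 7 and 0 <= src_enc <= 7):
        raise ValueError("register encodings must be 0-7")
    return bytes([REG_PREFIX_BASE[size] + src_enc, SUBOP_BASE[op] + dst_enc])

# add wa, bc (16-bit): source BC = 1, destination WA = 0
print(encode_reg_reg("ADD", 16, dst_enc=0, src_enc=1).hex(" ").upper())
```

Note that the source register lives in the prefix byte and the destination in the sub-opcode, which is the reverse of the operand order in the assembly text.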
Register Prefix + Immediate
When the sub-opcode indicates an immediate operand, additional bytes follow:
Byte: [prefix] [sub_opc] [imm_bytes...]
| Sub-Opc | Operation | Imm Size (8-bit/16-bit/32-bit prefix) |
|---|---|---|
| 0x03 | LD r, #imm | 1 / 2 / 4 bytes |
| 0xC8 | ADD r, #imm | 1 / 2 / 4 bytes |
| 0xC9 | ADC r, #imm | 1 / 2 / 4 bytes |
| 0xCA | SUB r, #imm | 1 / 2 / 4 bytes |
| 0xCB | SBC r, #imm | 1 / 2 / 4 bytes |
| 0xCC | AND r, #imm | 1 / 2 / 4 bytes |
| 0xCD | XOR r, #imm | 1 / 2 / 4 bytes |
| 0xCE | OR r, #imm | 1 / 2 / 4 bytes |
| 0xCF | CP r, #imm | 1 / 2 / 4 bytes |
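The immediate forms in this table can be sketched as follows. The helper name `encode_reg_imm` is hypothetical, and the sketch assumes multi-byte immediates are stored little-endian, as they are in the compact 32-bit load.

```python
REG_PREFIX_BASE = {8: 0xC8, 16: 0xD8, 32: 0xE8}
IMM_SUBOP = {
    "LD": 0x03, "ADD": 0xC8, "ADC": 0xC9, "SUB": 0xCA, "SBC": 0xCB,
    "AND": 0xCC, "XOR": 0xCD, "OR": 0xCE, "CP": 0xCF,
}

def encode_reg_imm(op: str, size: int, reg_enc: int, imm: int) -> bytes:
    """[prefix] [sub_opc] [imm], where the immediate width follows the prefix size."""
    if not 0 <= reg_enc <= 7:
        raise ValueError("register encoding must be 0-7")
    imm &= (1 << size) - 1  # wrap negative values to two's complement
    # Assumption: little-endian immediate bytes.
    return (bytes([REG_PREFIX_BASE[size] + reg_enc, IMM_SUBOP[op]])
            + imm.to_bytes(size // 8, "little"))

# add wa, 0x1234 (16-bit prefix, WA = 0)
print(encode_reg_imm("ADD", 16, 0, 0x1234).hex(" ").upper())
```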
Shift/rotate sub-opcodes (1-byte immediate count):
| Sub-Opc | Operation |
|---|---|
| 0xE8 | RLC count, r |
| 0xE9 | RRC count, r |
| 0xEA | RL count, r |
| 0xEB | RR count, r |
| 0xEC | SLA count, r |
| 0xED | SRA count, r |
| 0xEE | SLL count, r |
| 0xEF | SRL count, r |
BIT operations (16-bit prefix only, 1-byte bit number):
| Sub-Opc | Operation |
|---|---|
| 0x30 | RES bit, r |
| 0x31 | SET bit, r |
| 0x33 | BIT bit, r |
3. Compact Source Addressing Modes (0x80–0xAF)
These prefixes specify a memory source operand with the operand size and addressing mode encoded in the prefix byte:
| Prefix Range | Size | Addressing Mode | Additional Bytes |
|---|---|---|---|
| 0x80+R | 8-bit | (R) register indirect | sub_opc |
| 0x88+R | 8-bit | (R+d8) reg + displacement | d8, sub_opc |
| 0x90+R | 16-bit | (R) register indirect | sub_opc |
| 0x98+R | 16-bit | (R+d8) reg + displacement | d8, sub_opc |
| 0xA0+R | 32-bit | (R) register indirect | sub_opc |
| 0xA8+R | 32-bit | (R+d8) reg + displacement | d8, sub_opc |
| 0xB0+R | 32-bit | (R+d16) reg + 16-bit displacement | d16_lo, d16_hi, sub_opc |
Where R = address register encoding (0–7, mapped to XWA–XSP).
The sub-opcode table is the same as for register source prefix instructions (0x20+d = LD, 0x80+d = ADD, etc.), except the source is a memory location instead of a register.
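As a rough illustration, the prefix rows for 0x80–0xAF can be decoded with a few bit masks. The function name `decode_compact_src` is hypothetical, the 0xB0+R row is left out for brevity, and d8 is treated as signed in line with the displacement note under Known Encoding Issues.

```python
def decode_compact_src(code: bytes) -> dict:
    """Decode a compact memory-source instruction whose prefix is in 0x80-0xAF."""
    prefix = code[0]
    if not 0x80 <= prefix <= 0xAF:
        raise ValueError("not a compact source prefix handled by this sketch")
    size = {0x80: 8, 0x90: 16, 0xA0: 32}[prefix & 0xF0]
    reg = prefix & 0x07            # address register encoding (XWA..XSP)
    if prefix & 0x08:              # bit 3 set: the (R+d8) form
        disp = int.from_bytes(code[1:2], "little", signed=True)
        sub_opc = code[2]
    else:                          # bit 3 clear: the plain (R) form
        disp = 0
        sub_opc = code[1]
    return {"size": size, "reg": reg, "disp": disp, "sub_opc": sub_opc}

# 9F C8 20: 16-bit access via XSP (7) with d8 = 0xC8, sub-opcode 0x20+0
print(decode_compact_src(bytes([0x9F, 0xC8, 0x20])))
```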
4. Compact Destination Addressing Mode (0xB8–0xBF)
Encodes stores to memory and LDA (load effective address):
Byte: [0xB8+R] [d8] [sub_opc] [optional_imm...]
| Sub-Opc Range | Operation |
|---|---|
| 0x30+d | LDA d, (R+d8) — load effective address |
| 0x50+s | LD (R+d8), reg16 — store 16-bit register |
| 0x60+s | LD (R+d8), reg32 — store 32-bit register |
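A sketch of the three documented sub-opcodes for this prefix. The helper names `encode_lda` and `encode_store` are hypothetical; the displacement is written as a single two's-complement byte.

```python
def _dest_prefix(base_enc: int, d8: int) -> bytes:
    """[0xB8+R] [d8], with d8 stored as one two's-complement byte."""
    if not 0 <= base_enc <= 7:
        raise ValueError("register encoding must be 0-7")
    if not -128 <= d8 <= 127:
        raise ValueError("d8 must be in -128..127")
    return bytes([0xB8 + base_enc, d8 & 0xFF])

def encode_lda(dst_enc: int, base_enc: int, d8: int) -> bytes:
    """LDA d, (R+d8): sub-opcode 0x30+d."""
    return _dest_prefix(base_enc, d8) + bytes([0x30 + dst_enc])

def encode_store(size: int, base_enc: int, d8: int, src_enc: int) -> bytes:
    """LD (R+d8), reg: sub-opcode 0x50+s for 16-bit, 0x60+s for 32-bit."""
    sub_base = {16: 0x50, 32: 0x60}[size]
    return _dest_prefix(base_enc, d8) + bytes([sub_base + src_enc])

# Store 32-bit register encoding 1 at (XSP-4)
print(encode_store(32, base_enc=7, d8=-4, src_enc=1).hex(" ").upper())
```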
5. Extended Addressing Modes (0xC0–0xF7)
The first byte encodes both operand size and addressing mode:
| Prefix | Size | Mode | Addressing | Bytes After Prefix |
|---|---|---|---|---|
| 0xC0–0xC7 | 8-bit | 0–7 | See below | Varies |
| 0xD0–0xD7 | 16-bit | 0–7 | See below | Varies |
| 0xE0–0xE7 | 32-bit | 0–7 | See below | Varies |
| 0xF0–0xF7 | Store | 0–7 | See below | Varies |
Mode encoding (low 3 bits of prefix):
| Mode | Addressing | Data After Prefix |
|---|---|---|
| 0 | (R) register indirect | reg_byte, sub_opc |
| 1 | (R+d8) reg indirect + 8-bit disp | reg_byte, d8, sub_opc |
| 2 | (addr24) direct 24-bit address | addr_lo, addr_mid, addr_hi, sub_opc |
| 3 | (R+d16) reg indirect + 16-bit disp | reg_byte, d16_lo, d16_hi, sub_opc |
| 4 | (−R) predecrement | reg_byte, sub_opc |
| 5 | (R+) postincrement | reg_byte, sub_opc |
| 7 | Previous bank (D7 only) | mode_byte, sub_opc [, imm…] |
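The two tables above suggest a simple split of the prefix byte: the high nibble selects the size class and the low three bits select the mode. A hedged sketch, with hypothetical names `SIZE_BY_HIGH_NIBBLE`, `MODE_NAMES`, and `decode_ext_prefix`:

```python
SIZE_BY_HIGH_NIBBLE = {0xC: "8-bit", 0xD: "16-bit", 0xE: "32-bit", 0xF: "store"}
MODE_NAMES = {
    0: "(R)", 1: "(R+d8)", 2: "(addr24)", 3: "(R+d16)",
    4: "(-R)", 5: "(R+)", 7: "previous bank",
}

def decode_ext_prefix(prefix: int):
    """Return (size_class, mode, mode_name), or None for bytes outside the
    0xC0-0xC7 / 0xD0-0xD7 / 0xE0-0xE7 / 0xF0-0xF7 ranges."""
    high, low = prefix >> 4, prefix & 0x0F
    if high not in SIZE_BY_HIGH_NIBBLE or low > 7:
        return None  # low nibble 8-15 belongs to other prefix groups
    # Mode 6 has no row in the table above, so it gets a placeholder name.
    return SIZE_BY_HIGH_NIBBLE[high], low, MODE_NAMES.get(low, "not listed above")

print(decode_ext_prefix(0xE2))  # 32-bit access to a direct 24-bit address
```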
Register byte encoding (for modes 0, 1, 3, 4, 5):
reg_byte = 0xE0 + (register_enc × 4) + inner_mode
The inner_mode field provides additional addressing information.
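The formula packs the register encoding into bits 2–4 and inner_mode into bits 0–1, on top of a fixed 0xE0 base. A small sketch of packing and unpacking, with hypothetical helper names `make_reg_byte` and `split_reg_byte`:

```python
def make_reg_byte(register_enc: int, inner_mode: int) -> int:
    """reg_byte = 0xE0 + (register_enc * 4) + inner_mode"""
    if not 0 <= register_enc <= 7:
        raise ValueError("register encoding must be 0-7")
    if not 0 <= inner_mode <= 3:
        raise ValueError("inner_mode must fit in two bits")
    return 0xE0 + register_enc * 4 + inner_mode

def split_reg_byte(reg_byte: int):
    """Inverse of make_reg_byte: return (register_enc, inner_mode)."""
    if not 0xE0 <= reg_byte <= 0xFF:
        raise ValueError("reg_byte must be in 0xE0-0xFF")
    value = reg_byte - 0xE0
    return value >> 2, value & 0x03

print(hex(make_reg_byte(3, 0)))  # register encoding 3 (XHL), inner_mode 0
```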
6. Previous Register Bank (0xD7)
Accesses the previous register bank using Q registers (QWA–QSP):
Byte: 0xD7 [mode_byte] [sub_opc] [optional_imm...]
Mode byte encoding: 0xE0 + (reg_enc × 4) + 2
| Mode Byte | Q Register |
|---|---|
| 0xE2 | QWA |
| 0xE6 | QBC |
| 0xEA | QDE |
| 0xEE | QHL |
| 0xF2 | QIX |
| 0xF6 | QIY |
| 0xFA | QIZ |
| 0xFE | QSP |
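For a disassembler, the table above can be inverted to map a mode byte back to its Q register. This is a sketch; `Q_REGISTERS` and `q_register_from_mode_byte` are hypothetical names.

```python
# Q register names indexed by register encoding (0-7).
Q_REGISTERS = ["QWA", "QBC", "QDE", "QHL", "QIX", "QIY", "QIZ", "QSP"]

def q_register_from_mode_byte(mode_byte: int):
    """Return the Q register for a 0xD7 mode byte, or None if the byte
    does not match 0xE0 + (reg_enc * 4) + 2."""
    offset = mode_byte - 0xE0
    if not 0 <= offset <= 0x1F or offset & 0x03 != 2:
        return None
    return Q_REGISTERS[offset >> 2]

print(q_register_from_mode_byte(0xEE))  # QHL
```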
The sub-opcode table is the same as for the register source prefix, with additional formats for BIT/SET/RES, LD/CP with a word immediate, and LDW/CPW.
LLVM Backend Support Status
As of February 2026, the following summarizes what the custom LLVM TLCS-900 backend supports for assembly (llvm-mc):
Fully Supported
| Category | Notes |
|---|---|
| Register-to-register ALU | ADD, SUB, ADC, SBC, AND, XOR, OR, CP |
| Register-to-register LD | Both 0x88 and 0x20 forms |
| Register prefix + immediate ALU | ADD/SUB/CP/AND/OR/XOR/ADC/SBC with 8/16/32-bit imm |
| Register prefix + immediate LD | 8-bit and 16-bit (32-bit uses compact form) |
| BIT/SET/RES with 16-bit register | Via register prefix |
| PUSH/POP (16-bit register) | Via register prefix |
| NEG/CPL (16-bit register) | Via register prefix |
| EXTS/EXTZ (16-bit register) | Via register prefix |
| INC/DEC with count (1–7) | Via register prefix |
| MUL/MULS/DIV/DIVS (reg-reg) | Uses GPR (32-bit) register names |
| SCC condition code set | 8-bit and 16-bit |
| Compact 32-bit immediate load | 0x40–0x47 |
| Compact (R) memory indirect | All sizes (8/16/32-bit) |
| Compact (R+d8) memory | All sizes, d8 must be 0–127 |
| LDA (load effective address) | d8 must be 0–127 |
| Memory store (R+d8) | reg16 and reg32, d8 must be 0–127 |
| Extended E2 direct memory load | 32-bit operand, 24-bit address |
| Extended F2 direct memory store | reg16 and reg32 stores |
| Previous bank (D7) operations | Full Q register support |
| LDS/LDS32/LDS8 small immediate | Register prefix form |
| CPS small immediate compare | All sizes |
Previously Unsupported (now all implemented)
As of March 2026, all instruction encodings needed for the KN5000 ROM disassembly have been implemented in the LLVM backend. The following were added during the .byte code elimination effort:
| Category | Resolution |
|---|---|
| (R+d16) 16-bit displacement | SRI prefix encoding (C3/D3/E3/F3) implemented |
| 16-bit direct memory | F0 8-bit direct and E2/F2 extended direct implemented |
| CALR (relative call) | Fixed for label-based targets |
| Shift/rotate operations | Full support for all variants |
| LD (addr), #imm16 via F2 | Sub-opcode fixed to 0x02 |
| Auto-increment addressing | Implemented |
| .word/.hword directives | Added for data emission |
Known Encoding Issues
- Displacement is signed: The 8-bit displacement in (R+d8) addressing modes is signed (range −128 to +127). This is confirmed by MAME’s TLCS-900 emulator (the (int8_t)m_op cast in 900tbl.hxx). The LLVM backend correctly handles both positive and negative displacements. Example: ld wa, (xsp-56) produces byte 0xC8 for the displacement (−56 in two’s complement).
- d8=0 optimization: When the displacement is 0, LLVM optimizes (R+0) to the shorter (R) form, producing a different byte sequence from the firmware, which uses the explicit (R+0) form.
- LD immediate to memory sub-opcode: Previously LLVM used sub-opcode 0x00 for LD (addr), #imm16, but the hardware encoding uses 0x02. This has been fixed in the LLVM backend.
- 32-bit LD immediate always compact: LD XWA, #imm32 always uses the compact 5-byte form (0x40+R) rather than the 6-byte prefix form (E8+R, 0x03, imm32). The assembler therefore cannot reproduce the prefix form.
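The signed-displacement point in the list above comes down to ordinary two's-complement conversion of a single byte. A minimal sketch, with hypothetical helper names `encode_d8` and `decode_d8`:

```python
def encode_d8(disp: int) -> int:
    """Convert a signed displacement (-128..127) to its byte value."""
    if not -128 <= disp <= 127:
        raise ValueError("d8 must be in -128..127")
    return disp & 0xFF

def decode_d8(byte: int) -> int:
    """Interpret a byte (0..255) as a signed 8-bit displacement."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    return byte - 0x100 if byte & 0x80 else byte

# The (xsp-56) displacement from the example above
print(hex(encode_d8(-56)), decode_d8(0xC8))
```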
Condition Codes
Used with SCC, JP, CALL, and other conditional instructions:
| Code | Value | Condition |
|---|---|---|
| F | 0 | False (never) |
| LT | 1 | Less than (signed) |
| LE | 2 | Less than or equal (signed) |
| ULE | 3 | Unsigned less than or equal |
| OV | 4 | Overflow |
| MI | 5 | Minus (negative) |
| Z | 6 | Zero |
| C | 7 | Carry |
| T | 8 | True (always) |
| GE | 9 | Greater than or equal (signed) |
| GT | 10 | Greater than (signed) |
| UGT | 11 | Unsigned greater than |
| NOV | 12 | No overflow |
| PL | 13 | Plus (positive) |
| NZ | 14 | Not zero |
| NC | 15 | No carry |
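One pattern visible in the table is that each condition and its logical opposite differ only in bit 3 of the value (for example Z = 6 and NZ = 14). The sketch below uses that to invert a condition; `CONDITION_CODES` and `invert_condition` are hypothetical names.

```python
CONDITION_CODES = {
    "F": 0, "LT": 1, "LE": 2, "ULE": 3, "OV": 4, "MI": 5, "Z": 6, "C": 7,
    "T": 8, "GE": 9, "GT": 10, "UGT": 11, "NOV": 12, "PL": 13, "NZ": 14, "NC": 15,
}

def invert_condition(name: str) -> str:
    """Return the opposite condition by toggling bit 3 of its 4-bit value."""
    flipped = CONDITION_CODES[name] ^ 0x8
    return next(n for n, v in CONDITION_CODES.items() if v == flipped)

for name in ("Z", "LT", "ULE", "T"):
    print(name, "->", invert_condition(name))
```

A disassembler or branch-relaxation pass could use the same toggle to rewrite a conditional jump as its inverse without a lookup table.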