SubCPU Payload Loading Investigation
This page documents the investigation into SubCPU firmware payload loading in MAME emulation. The investigation uncovered multiple bugs in MAME’s TMP94C241 emulation that prevented DMA transfers and inter-CPU communication. All have been fixed, and the SubCPU payload now loads, executes, and communicates bidirectionally with the Main CPU.
Status: RESOLVED. As of 2026-02-17, the SubCPU payload transfer and inter-CPU communication are fully working. The “Sound Name Error” messages are gone. The display shows voice names (Piano, Bigband Brass, Modern E.P.1), rhythm patterns, and mixer levels, and menu navigation works correctly. Button input is also functional. This required 12 fixes to MAME’s TMP94C241 emulation spanning DMA, interrupts, port I/O, serial communication, and interrupt timing. See Boot Sequence for the overall boot flow and Inter-CPU Protocol for latch communication details.
Background
After replacing the decompressed SubCPU payload ROMs with the compressed originals (LZSS SLIDE4K format at Custom Data Flash offset 0xE0000 = address 0x3E0000), the MAME driver originally showed “Sound Name Error” messages. This indicated the SubCPU never received its firmware payload, so sound commands failed.
The firmware’s SubCPU_Send_Payload function (address 0xEF068A) is responsible for:
- Sending 5 x 64KB of config data from Table Data ROM (0x830000-0x870000) to SubCPU RAM
- Decompressing the compressed payload from 0x3E0000 to Main CPU RAM at 0x050000
- Sending the decompressed data (preset data + entry point code) to SubCPU via E1 bulk transfers
- The last 256-byte E1 transfer writes to SubCPU address 0x000400-0x04FF, which includes SUBCPU_STATUS_FLAGS at 0x04FE – when bit 6 of that byte is set by DMA, the boot ROM’s polling loop triggers payload execution
Boot Flow: Payload Transfer
The complete calling sequence during boot:
User_didnt_request_flash_mem_update (0xEF05E8)
│
├── SET 0, (PA) → Release Sub CPU from reset
├── SubCPU_Init_DMA_Channels → Initialize DMA for inter-CPU comm
├── EI 000h → Enable interrupts
├── SubCPU_Send_Payload → Transfer 192KB firmware payload
├── SubCPU_Payload_Verify → Verify checksums
├── ScreenGroup_Dispatch(0) → Display initial boot screen
└── ... (continues boot)
SubCPU_Send_Payload Detail (0xEF068A)
Source: maincpu/kn5000_v10_program.asm:134123
Phase 1: Transfer 5 x 64KB from Table Data ROM
0x830000 → SubCPU 0x050000 (64KB)
0x840000 → SubCPU 0x060000 (64KB)
0x850000 → SubCPU 0x070000 (64KB)
0x860000 → SubCPU 0x080000 (64KB)
0x870000 → SubCPU 0x090000 (64KB)
Phase 2: LZSS Decompression
Source: 0x3E0000 (Custom Data Flash)
Dest: 0x050000 (Main CPU RAM, temporary)
Calls SLIDE_Parse_Header (LZSS decompressor)
If decompression fails (HL=0xFFFF): falls back to Table Data ROM base
Phase 3: Transfer decompressed data to SubCPU
XIZ+0x100, 64KB → SubCPU 0x00F000
XIZ+0x10100, 64KB → SubCPU 0x01F000
XIZ+0x20100, 0xFF00 → SubCPU 0x02F000
XIZ, 0x100 (256B) → SubCPU 0x000400 (entry point!)
Confirmed Transfer Pattern (from MAME log analysis)
The following 9 E1 blocks were confirmed in the MAME log with 524,358 total HDMA transfers:
| Block | Source Address | SubCPU Dest | Size | Purpose |
|---|---|---|---|---|
| 1 | 0x830000 | 0x050000 | 64KB | Config data |
| 2 | 0x840000 | 0x060000 | 64KB | Config data |
| 3 | 0x850000 | 0x070000 | 64KB | Config data |
| 4 | 0x860000 | 0x080000 | 64KB | Config data |
| 5 | 0x870000 | 0x090000 | 64KB | Config data |
| 6 | 0x050100 | 0x00F000 | 64KB | Decompressed payload part 1 |
| 7 | 0x060100 | 0x01F000 | 64KB | Decompressed payload part 2 |
| 8 | 0x070100 | 0x02F000 | 65280B | Decompressed payload part 3 |
| 9 | 0x050000 | 0x000400 | 256B | Entry point code + status flags |
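As a quick consistency check on the table above, the block sizes can be totalled in a few lines of C++. This is only a sketch: the struct and function names are illustrative and do not come from the firmware or from MAME, while the addresses and sizes are copied from the table. The 6-byte header per block is the figure given in the E1 protocol description.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative record for one E1 block; values copied from the table above.
struct E1Block {
    std::uint32_t source;  // Main CPU source address
    std::uint32_t dest;    // SubCPU destination address
    std::uint32_t size;    // data bytes in this block
};

const std::array<E1Block, 9> kBlocks = {{
    {0x830000, 0x050000, 0x10000},
    {0x840000, 0x060000, 0x10000},
    {0x850000, 0x070000, 0x10000},
    {0x860000, 0x080000, 0x10000},
    {0x870000, 0x090000, 0x10000},
    {0x050100, 0x00F000, 0x10000},
    {0x060100, 0x01F000, 0x10000},
    {0x070100, 0x02F000, 0x0FF00},
    {0x050000, 0x000400, 0x00100},
}};

constexpr std::uint32_t kHeaderBytes = 6;  // one E1 header per block

// Sum of data bytes for blocks in the half-open index range [first, last).
std::uint32_t payload_bytes(std::size_t first, std::size_t last) {
    std::uint32_t total = 0;
    for (std::size_t i = first; i < last && i < kBlocks.size(); ++i)
        total += kBlocks[i].size;
    return total;
}

// Data bytes of all blocks plus one 6-byte header each.
std::uint32_t bytes_with_headers() {
    return payload_bytes(0, kBlocks.size())
         + kHeaderBytes * static_cast<std::uint32_t>(kBlocks.size());
}
```

Blocks 6-9 (indices 5 to 8) add up to 196,608 bytes, which is the 192KB firmware payload quoted in the boot flow. All nine blocks carry 524,288 data bytes, or 524,342 bytes once the nine headers are included. That is close to the 524,358 HDMA transfers counted in the log; this page does not break down the small remainder.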
E1 Bulk Transfer Protocol
Source: maincpu/kn5000_v10_program.asm:139166
Each E1 transfer follows this protocol:
- Wait for SSTAT1 high (Sub CPU ready)
- Clear MSTAT0, set state to 2 (two-phase)
- Write 0xE1 to latch at 0x140000 → triggers INT0 on Sub CPU
- Wait for SSTAT1 low (Sub CPU acknowledged E1)
- Set MSTAT0, send 6-byte header (dest addr + byte count) via Audio_DMA_Transfer
- Wait for state transition, send actual data payload via Audio_DMA_Transfer
The Main CPU sends data byte-by-byte in a tight software loop (Audio_DMA_Transfer at 0xEF3415). Each write to the latch triggers INT0 on the Sub CPU.
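To make the header step concrete, here is a minimal sketch of how such a 6-byte header could be packed. The split into a 4-byte destination followed by a 2-byte count is taken from the Fix 10 description on this page. The little-endian byte order and the helper name `encode_e1_header` are assumptions made for illustration. How the firmware represents a full 64KB count in 16 bits is not stated here, so the sketch only uses counts that fit.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: packs an E1 header as 4 destination bytes followed by
// 2 count bytes. Little-endian order is an ASSUMPTION, not confirmed by the
// firmware listing.
std::array<std::uint8_t, 6> encode_e1_header(std::uint32_t dest, std::uint16_t count) {
    std::array<std::uint8_t, 6> header{};
    header[0] = static_cast<std::uint8_t>(dest & 0xFF);
    header[1] = static_cast<std::uint8_t>((dest >> 8) & 0xFF);
    header[2] = static_cast<std::uint8_t>((dest >> 16) & 0xFF);
    header[3] = static_cast<std::uint8_t>((dest >> 24) & 0xFF);
    header[4] = static_cast<std::uint8_t>(count & 0xFF);
    header[5] = static_cast<std::uint8_t>((count >> 8) & 0xFF);
    return header;
}
```

Under these assumptions, the final block (destination 0x000400, 256 bytes) would be announced as `00 04 00 00 00 01`. Each of those six bytes is a separate latch write and therefore a separate INT0 on the Sub CPU.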
Sub CPU Reception (Boot ROM INT0 Handler)
Source: subcpu/boot/kn5000_subcpu_boot.asm:1159
The Sub CPU’s INT0 handler (InterCPU_RX_Handler at 0xFF881F) processes incoming bytes:
- First byte (command): DMAV0 not armed, so CPU INT0 handler fires
  - Reads command byte from latch (0x120000)
  - For E1: Sets up DMA channel 0 destination and count (6 bytes for header)
  - Arms DMAV0 = 0x0A (DMA triggers on INT0 for subsequent bytes)
- Subsequent bytes: DMAV0 armed, so HDMA processes INT0
  - Each latch write triggers INT0 → HDMA transfers one byte to destination
  - DMA count decrements automatically
- DMA completion: Fires DMA completion interrupt
  - DMA_Complete_Handler advances state machine (state 2→1→0)
  - CMD_Dispatch_Handler sets up phase 2 DMA (actual data transfer)
Payload Execution Trigger
Critical finding: The Main CPU does NOT send a dedicated E3 command byte to trigger payload execution. Instead, the last 256-byte E1 transfer (block 9) writes to SubCPU addresses 0x000400-0x04FF. This range includes SUBCPU_STATUS_FLAGS at address 0x04FE.
When the HDMA transfer writes a byte with bit 6 set to address 0x04FE, the Sub CPU’s main loop polling detects this and jumps to the payload entry point:
; subcpu/boot/kn5000_subcpu_boot.asm:405
MAIN_LOOP:
res 6, (SUBCPU_STATUS_FLAGS) ; Clear ready flag
.wait_loop:
bit 6, (SUBCPU_STATUS_FLAGS) ; Check if payload ready
jr Z, .check_status
ei 6 ; Enable interrupt level 6
CALL_ABS24 PAYLOAD_ENTRY ; Call payload at 0x0400
Confirmed in MAME log: After HDMA writes to dst=0x0004FE, the SubCPU begins executing payload code at PC=0x01F929 (within the expected payload address range 0x00F000-0x03EE75).
The E3 Data Command (Separate from Payload Loading)
The 0xE3 byte in the boot ROM’s INT0 handler is a regular data command byte for the inter-CPU protocol, not related to payload loading:
- 0xE3 = 0b11100011 = channel 7, count 4 bytes
- This is used later during normal operation when the Main CPU sends audio data packets on channel 7
- The boot ROM’s E3 handler sets bit 6 of SUBCPU_STATUS_FLAGS, which also triggers payload execution
- However, this E3 path is NOT used during normal boot – the payload loading trigger comes from the DMA write to 0x04FE
Fixes Applied
Fix 1: LDC Control Register Mapping (Critical)
Root cause of HDMA failure: The TMP94C241 uses different control register (CR) numbers in the LDC instruction than the TMP96C141/TMP95C063. MAME’s shared TLCS900 instruction decoder only had the old numbering, causing LDC DMAD0, XWA and LDC DMAC0, WA to silently write to a dummy register. The DMA destination address and transfer count were never set!
TMP94C241 vs TMP96C141 CR numbers:
| Register | TMP96C141 CR | TMP94C241 CR | Size |
|---|---|---|---|
| DMAS0-3 | 0x00-0x0C | 0x00-0x0C | 32-bit (same) |
| DMAD0-3 | 0x10-0x1C | 0x20-0x2C | 32-bit (different!) |
| DMAC0-3 | 0x20-0x2C | 0x40-0x4C | 16-bit (different!) |
| DMAM0-3 | 0x22-0x2E | 0x42-0x4E | 8-bit (different!) |
Fix: Added TMP94C241 CR numbers as additional cases in the shared LDC instruction decoder (900tbl.hxx). Since the numbers don’t overlap within each register size class, this doesn’t affect other TLCS900 variants. Also updated the disassembler (dasm900.cpp) to recognize the new CR numbers.
Files modified:
- mame_driver/src/devices/cpu/tlcs900/900tbl.hxx – Added CR cases for p_CR8, p_CR16, p_CR32 (both p1 and p2 operands)
- mame_driver/src/devices/cpu/tlcs900/dasm900.cpp – Added CR labels for O_CR8, O_CR16, O_CR32
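The claim that both numberings can share one decoder is easy to check with a small model of the CR table. The sketch below is not the code in 900tbl.hxx; the type and function names are illustrative. It assumes a stride of 4 between channels, which is inferred from the 0x00-0x0C style ranges in the table.

```cpp
#include <cstdint>
#include <string>

enum class Variant { TMP96C141, TMP94C241 };

struct CrGroup {
    const char* name;    // register family, e.g. "DMAD"
    std::uint8_t base;   // CR number of channel 0
    int size_bits;       // operand size class used by LDC
};

// CR bases copied from the table above.
static const CrGroup kOldGroups[] = {
    {"DMAS", 0x00, 32}, {"DMAD", 0x10, 32}, {"DMAC", 0x20, 16}, {"DMAM", 0x22, 8}};
static const CrGroup kNewGroups[] = {
    {"DMAS", 0x00, 32}, {"DMAD", 0x20, 32}, {"DMAC", 0x40, 16}, {"DMAM", 0x42, 8}};

// Returns e.g. "DMAD0", or an empty string when the CR number is unknown for
// that variant and size class (the "dummy register" case).
std::string decode_dma_cr(Variant v, int size_bits, std::uint8_t cr) {
    const CrGroup* groups = (v == Variant::TMP96C141) ? kOldGroups : kNewGroups;
    for (int i = 0; i < 4; ++i) {
        const CrGroup& g = groups[i];
        if (g.size_bits != size_bits || cr < g.base) continue;
        const int offset = cr - g.base;
        if (offset <= 0x0C && offset % 4 == 0)  // assumed stride of 4 per channel
            return std::string(g.name) + std::to_string(offset / 4);
    }
    return std::string();
}

// Shared-decoder approach: try the old numbering first, then the new one.
std::string decode_dma_cr_shared(int size_bits, std::uint8_t cr) {
    std::string name = decode_dma_cr(Variant::TMP96C141, size_bits, cr);
    return name.empty() ? decode_dma_cr(Variant::TMP94C241, size_bits, cr) : name;
}
```

With only the old table, a 32-bit access to CR 0x20 (the TMP94C241 encoding of `LDC DMAD0, XWA`) decodes to nothing, which is the silent failure described above. In the shared decoder the same access resolves to DMAD0, while a 16-bit access to CR 0x20 still resolves to the TMP96C141 DMAC0.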
Fix 2: DMAM Register Encoding
Second root cause: The DMAM (DMA Mode) register encoding in tlcs900_process_hdma() was wrong. The implementation used independent source/destination direction bits, but the actual TMP94C241 (like TMP95C061) uses a combined mode encoding:
| DMAM bits 4-0 | Source | Destination | Size |
|---|---|---|---|
| 0x00 | Fixed | Increment | Byte |
| 0x01 | Fixed | Increment | Word |
| 0x02 | Fixed | Increment | Long |
| 0x04 | Fixed | Decrement | Byte |
| 0x08 | Increment | Fixed | Byte |
| 0x10 | Fixed | Fixed | Byte |
Impact: The Sub CPU boot ROM sets DMAM0 = 0 during INIT_DMA_SERIAL, which means “byte transfer, source fixed (latch), destination incrementing (buffer)”. With the old decoding, DMAM=0 was interpreted as “both fixed”, causing all received bytes to overwrite the same address.
Fix: Replaced the generic bit-field decoding with a switch-based implementation matching the proven TMP95C061 code in MAME.
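A switch-based decode of the table above might look like the following sketch. It is not the MAME implementation: the type names are illustrative, and only the six encodings listed in the table are handled. Any other value is reported as invalid here, even though real hardware may define more modes.

```cpp
#include <cstdint>

enum class Step { Fixed, Increment, Decrement };

struct DmaMode {
    bool valid;       // false for encodings not listed in the table above
    Step source;      // how the source address moves after each unit
    Step dest;        // how the destination address moves after each unit
    int unit_bytes;   // 1 = byte, 2 = word, 4 = long
};

// Decodes DMAM bits 4-0 as a combined mode value rather than independent
// source/destination direction bits.
DmaMode decode_dmam(std::uint8_t dmam) {
    switch (dmam & 0x1F) {
        case 0x00: return {true, Step::Fixed, Step::Increment, 1};
        case 0x01: return {true, Step::Fixed, Step::Increment, 2};
        case 0x02: return {true, Step::Fixed, Step::Increment, 4};
        case 0x04: return {true, Step::Fixed, Step::Decrement, 1};
        case 0x08: return {true, Step::Increment, Step::Fixed, 1};
        case 0x10: return {true, Step::Fixed, Step::Fixed, 1};
        default:   return {false, Step::Fixed, Step::Fixed, 0};
    }
}
```

With this decoding, DMAM0 = 0 yields a fixed source (the latch) and an incrementing destination, which is what the boot ROM relies on when it receives the header and payload bytes.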
Fix 3: DMAR Register Implementation (0x109)
The DMAR register at SFR address 0x109 was missing from the MAME tmp94c241 device. This register provides software-triggered DMA: writing bit N triggers an immediate DMA transfer on channel N.
Fix: Added dmar_w() handler at address 0x109 in the SFR address map. When a bit is set, it calls tlcs900_process_hdma() for the corresponding channel, performing an immediate DMA transfer.
This register is used by the Main CPU’s INT0 handler (LD (DMAR), 001h at 0xEF352D) to trigger DMA channel 0 when receiving data from the Sub CPU.
Fix 4: HDMA Priority Over IRQs (Critical)
Third root cause of HDMA failure: In MAME’s execute_run() loop, check_irqs() ran BEFORE check_hdma(). When INT0 fired, check_irqs() would dispatch to the INT0 interrupt handler and clear the interrupt flag. By the time check_hdma() ran, the flag was gone – HDMA never fired even though DMAV0 was correctly set to 0x0A.
On real TMP94C241 hardware, when a DMA channel’s DMAV matches an active interrupt source, the DMA transfer takes priority – the interrupt is consumed by HDMA instead of being dispatched to the CPU’s interrupt handler.
Fix: Added HDMA priority check in check_irqs(). Before processing each interrupt, it checks if any active HDMA channel targets that interrupt source (via DMAV matching). If so, the interrupt is skipped in check_irqs(), allowing check_hdma() to process it as a DMA trigger instead.
// In check_irqs(), before processing an interrupt:
bool hdma_targeted = false;
for (int ch = 0; ch < 4; ch++)
{
if (m_dma_vector[ch] == tmp94c241_irq_vector_map[i].dma_start_vector)
{
hdma_targeted = true;
break;
}
}
if (hdma_targeted)
continue; // Let check_hdma() handle this instead
Fix 5: CPU Scheduling (perfect_quantum)
CPU scheduling issue: MAME runs CPUs in time slices. The Main CPU would write many bytes to the latch in a single time slice before the Sub CPU had a chance to process any of them, causing “latch written before being read” overflows.
Fix: Added machine().scheduler().perfect_quantum(attotime::from_usec(100)) to subcpu_latch_w(). This temporarily forces minimum quantum for tight CPU interleaving, ensuring the Sub CPU’s HDMA can process each byte before the next one arrives.
After this fix, “latch written before being read” warnings dropped from 52,815 to just 36.
Diagnostic Logging
Added comprehensive logging to trace the inter-CPU communication:
- Latch writes: Command bytes (E1/E2/E3) logged with PC address; data bytes logged at verbose level
- Handshake signals: MSTAT/SSTAT changes logged with old/new values
- Sub CPU reset: Port A bit 0 changes logged
- DMA vectors: DMAV register writes logged with channel and value
- DMA transfers: Each HDMA transfer logged with source, destination, count, and mode
- DMA completion: Logged when HDMA transfer count reaches zero
Logging is controlled by VERBOSE flags in kn5000.cpp and uses MAME’s logmacro.h system.
Investigation Log
Test Run 1: After DMAR fix + diagnostic logging
- E1 handshake works: SubCPU receives E1, arms DMA0V=0x0A, acknowledges
- Zero HDMA transfers: No “HDMA ch0 complete” messages in 81,969-line log
- “latch written before being read” warnings: Main CPU bytes overwrite each other
- Root cause identified: LDC instructions for DMA registers silently fail due to wrong CR mapping
Test Run 2: After LDC CR mapping fix + DMAM encoding fix
- HDMA still showed zero transfers
- Root cause identified: check_irqs() consuming INT0 flag before check_hdma() could use it
Test Run 3: After HDMA priority fix + perfect_quantum
- 524,358 HDMA transfers (up from zero!)
- Only 36 “latch written before being read” warnings (down from 52,815)
- 17 HDMA completions showing correct transfer pattern (9 blocks x ~2 completions each for header + data)
- All 9 E1 blocks transferred successfully
- SubCPU payload executes! PC values 0x01F929, 0x034C6F observed after payload load
- SubCPU initializes serial ports, configures interrupts from within payload code
- “Checking device” LED stops blinking (SubCPU left boot ROM polling loop)
Test Run 4: After DMAR burst DMA fix + INT0 level-detect re-assertion (545K line log)
- 0 INT0 dispatches – the level-detect re-assertion was not yet active in this test
- 524,374 HDMA transfers (payload loading phase works)
- SubCPU goes silent after tone generator init – no INT0 interrupts fire post-boot
- 36 latch overflow warnings
- “Sound Name Error” persists
- Theory confirmed: Without level-detect re-assertion, INT0 fires once (consumed by HDMA during boot) but never again after the payload sets DMA0V=0 and expects ISR-based INT0 handling
Test Run 5: After INT0 level-detect re-assertion fix (3M line log)
- 394,118 INT0 dispatches (up from 0!) – level-detect re-assertion IS working
- 524,374 INT0 ASSERT events
- INT0 storm detected: 91,391 INT0 dispatches from PC=0x01FFF0 (the NOP between EI 0 and EI 6)
- Only 5 latch overflow warnings (down from 36)
- Last HDMA ch0 complete at dst=0x0010F1 (payload’s own HDMA setup)
- After that, INT0 storm begins and SubCPU is trapped in infinite ISR dispatch loop
- The “Checking device” LED blinking pattern differs slightly from the previous test
- “Sound Name Error” still present
- Root cause identified: Missing interrupt shadow after RETI allows INT0 to re-dispatch before the return-address instruction executes, preventing the EI 0; NOP; EI 6 masking pattern from working. See Fix 8.
Test Run 6: After EI/RETI interrupt shadow fix (14.8M line log)
- EI/RETI shadow fix WORKED: 0 INT0 dispatches from PC=0x01FFF0 (old storm at boot ROM window eliminated)
- NEW INT0 storm emerged: 12,203,435 INT0 dispatches (82% of all log lines), mostly from DSP_SEND_DATA loop (PCs 0x036828-0x03685E)
- 524,353 INT0 ASSERT / 524,354 CLEAR events → ~23 dispatches per ASSERT (level-detect amplification)
- 22,719 INT0 dispatches from PC=0x01FFF1 (EI 6 address) – these are the correctly-working EI windows
- INTTC0 = 18 (payload DMA completions), INTTC2 = 0 (SubCPU never sends data back)
- 14 latch overflow warnings (in two clusters: post-boot + E2 command attempt)
- Only 38 INTA + 82 INTRX1 + 157 INTTX1 on MainCPU (very little MainCPU activity)
- Payload loading succeeded (all 9 E1 blocks, 524K HDMA transfers)
- SubCPU payload booted (SSTAT 0→3 at PC=0x01F986)
- SubCPU stuck in DSP_Send_Data timeout loop – DSP not emulated, DSP_Read_Status returns 0
Root cause chain identified:
- DSP_Read_Status reads Port PH bit 0, which returns 0 (DSP “not ready”) because port_r() ignores the direction register
- PH.0 is configured as OUTPUT (PHCR=0x07), so reading should return the output latch (1, set by SET 0, (PH))
- The SubCPU’s DSP init has ~40 DSP_Send_Data calls, each with an 8000-iteration timeout loop
- During timeout loops, IFF=0 (EI 0 in the function), so INT0 fires on every instruction
- The INT0 ISR checks MSTAT0=1 (MainCPU in data phase), exits without reading the latch
- Level-detect re-assertion causes INT0 to fire 23x per assertion, creating massive overhead
- The MainCPU’s E2 command handshake check passes spuriously (SSTAT1 already 0 from a previous transfer, not from SubCPU acknowledgment)
- MainCPU sets MSTAT0=1 and starts sending data before SubCPU reads the command byte
- All data bytes overflow the latch (written before being read)
Test Run 7: After port_r direction-aware fix (2.7M line partial log — interrupted early)
- Fix 9 applied: port_r() now returns (latch & dir) | (external & ~dir) – output bits from latch, input bits from external callback
- INT0 dispatches: 113,017 (down from 12.2M – 99% reduction)
- Latch overflow: 0 (down from 14)
- INTTC0 dispatches: 0 (DMA ch0 completion never fired — transfer in progress or not yet started)
- INTTC2 dispatches: 0 (SubCPU never sent data back — still initializing)
- MSTAT values changed: 2↔3 (previously 0↔1 — CORRECT, see analysis below)
- Top INT0 PCs: 0x034C6F (7460x = DSP_System_Init memory clearing loop), 0xFFFEC0-D1 (~1023x each = boot ROM debug utilities)
- Log was interrupted before GUI loaded because file was growing too fast
Analysis of MSTAT 2↔3 behavior: The port_r fix affects BOTH Port D (PDCR=0x63: bits 0-1=output) AND Port Z (PZCR=0x03: bits 0-1=output). Previously, SET/RES operations on these ports lost other output bits during read-modify-write because port_r returned only the external callback value. Now output bits are correctly preserved from the latch:
- Old: SET 0, (PZ) + SET 1, (PZ) → MSTAT = 2 (bit 0 lost because SET 1 read PZ with bit 0 = 0)
- New: SET 0, (PZ) + SET 1, (PZ) → MSTAT = 3 (bit 0 preserved from latch)
- This is correct real hardware behavior – PZCR=0x03 is set during boot init (LD (PZCR), 003h in shared/boot_hw_init.asm)
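The Old/New comparison above can be reproduced with a tiny model of a port whose SET instruction is a read-modify-write. This is a sketch only. The `Port` struct and function names are illustrative, and the external callback is assumed to return 0 for the output bits, as described for the unconnected pins.

```cpp
#include <cstdint>

// Toy model of one I/O port. Not the MAME class; names are illustrative.
struct Port {
    std::uint8_t latch = 0;        // output latch
    std::uint8_t dir = 0;          // control register: 1 = output bit
    std::uint8_t external = 0;     // value the external callback returns (assumed 0)
    bool direction_aware = true;   // false = old port_r() behavior

    std::uint8_t read() const {
        if (!direction_aware)
            return external;       // old: always the callback value
        return static_cast<std::uint8_t>((latch & dir) | (external & ~dir));
    }

    // SET n, (port): read the port, OR in the bit, write the result back.
    void set_bit(int bit) {
        latch = static_cast<std::uint8_t>(read() | (1u << bit));
    }
};

// Runs SET 0 then SET 1 on a port with PZCR = 0x03 and returns the low two
// output bits, which carry MSTAT on Port Z.
std::uint8_t mstat_after_set0_set1(bool direction_aware) {
    Port pz;
    pz.dir = 0x03;
    pz.direction_aware = direction_aware;
    pz.set_bit(0);
    pz.set_bit(1);
    return static_cast<std::uint8_t>(pz.latch & 0x03);
}
```

In this model the old read path ends with MSTAT = 2, because the second SET reads back 0 for bit 0 and writes that back. The direction-aware path ends with MSTAT = 3.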
Key finding: SubCPU still initializing. The 7460 INT0 hits at 0x034C6F exactly match the DSP_System_Init_Clear2_Loop iteration count (clearing a 7,462-byte DSP state buffer). The boot ROM debug routine hits (~1023x at SUB_FEC1) are the payload calling boot ROM utility functions for diagnostic output. Both are normal initialization activity — the SubCPU hasn’t reached the main event loop yet.
Test Run 8: After logging reduction (Pending)
- Reduced INT0 dispatch logging to every 1000th occurrence
- Reduced HDMA per-transfer logging to every 256th transfer
- Expected: log small enough for full boot completion, allowing us to determine if Sound Name Error persists
Fixes Applied (continued)
Fix 6: DMAR Software-Triggered Burst DMA
Analysis of the SubCPU payload’s main loop revealed that InterCPU_DMA_Send_Chunk (at 0x020CF3) uses the DMAR register to trigger DMA channel 2 for sending data back to the Main CPU. After triggering, it waits in DMA_Chunk_Wait for DMA_XFER_STATE to be cleared by the MICRODMA_CH2_HANDLER interrupt (INTTC2 at vector 0x9C).
Bug: The previous dmar_w() implementation called process_hdma(), which requires a matching interrupt flag to be pending. Software-triggered DMA via DMAR should bypass the interrupt check and perform a burst transfer (all DMAC units at once), then fire INTTC on completion.
Fix: Replaced dmar_w() to call a new tlcs900_process_software_dma() function that transfers the entire block without checking interrupt flags and fires the INTTC completion interrupt when done.
Without this fix, every attempt by the SubCPU to send data back to the Main CPU would silently fail, and the DMA_Chunk_Wait loop would hang forever.
Fix 7: INT0 Level-Detect Re-assertion
Root cause: On real TMP94C241 hardware, when IIMC bit 1 = 0 (level-detect mode for INT0), the interrupt flag in INTE0AD is continuously driven by the external input level. MAME’s check_irqs() clears the flag when dispatching the interrupt, but on real hardware the flag immediately re-asserts if the input pin is still active.
During boot, this doesn’t matter because HDMA consumes INT0 (the flag is cleared but HDMA handles the data transfer). After boot, when the SubCPU payload takes over and configures DMA0V=0 (disabling HDMA for INT0), the ISR handles INT0 directly. Without level-detect re-assertion, INT0 fires once but never again – the SubCPU stops receiving commands from the Main CPU.
Fix: Added re-assertion logic in check_irqs(): after dispatching an INT0 interrupt and clearing its flag, if INT0 is in level-detect mode and the input is still ASSERT_LINE, immediately re-set the INTE0AD flag and schedule another check_irqs. An HDMA guard prevents re-assertion when any DMA channel is configured for INT0’s start vector (0x0A), since HDMA manages its own flag lifecycle.
// In check_irqs(), after clearing INT0's flag:
if (tmp94c241_irq_vector_map[irq].reg == INTE0AD &&
tmp94c241_irq_vector_map[irq].iff == 0x08 &&
!(m_iimc & 0x02) &&
m_level[TLCS900_INT0] == ASSERT_LINE)
{
// Skip re-assertion when HDMA is configured for INT0
bool hdma_steals_int0 = false;
for (int ch = 0; ch < 4; ch++)
if (m_dma_vector[ch] == 0x0a)
hdma_steals_int0 = true;
if (!hdma_steals_int0)
{
m_int_reg[INTE0AD] |= 0x08;
m_check_irqs = 1;
}
}
Note: This re-assertion relies on m_level[INT0] being accurate. See Fix 12 for the stale m_level problem this creates and the solution.
Fix 8: EI/RETI Interrupt Shadow
Root cause of INT0 storm: Fix 7 exposed a second bug. The SubCPU payload uses a deliberate EI 0; NOP; EI 6 pattern at address 0x01FFEE-0x01FFF1 to create a one-instruction interrupt window:
EI 0 ; 01FFEE - Enable all interrupts (IFF=0)
NOP ; 01FFF0 - One-instruction window for pending interrupts
EI 6 ; 01FFF1 - Re-mask low-priority interrupts (IFF=6)
On real TLCS-900 hardware, both EI and RETI defer interrupt acceptance until after the next instruction executes (a “1-instruction interrupt shadow”). This means:
- EI 0 enables interrupts, but the CPU executes NOP before accepting any
- INT0 dispatches from 0x01FFF1 (the address of EI 6)
- ISR handles INT0, returns via RETI
- RETI has its own shadow – EI 6 executes before another INT0 can fire
- IFF is now 6, masking INT0 (priority 1) – storm prevented
In MAME, check_irqs() runs at the START of the execution loop, BEFORE the instruction executes. After RETI restores IFF=0, check_irqs fires before the return-address instruction (NOP) gets a chance to execute. The ISR checks MSTAT0=1, exits without reading the latch, RETI returns to 01FFF0, and INT0 re-dispatches immediately – an infinite storm of 91,391+ INT0 dispatches in the log.
Fix: Added m_irq_inhibit flag to the TLCS900 base class. Both op_EI() and op_RETI() set this flag. In execute_run(), when m_irq_inhibit is set, interrupt checking is deferred by one instruction:
// In execute_run():
if ( m_check_irqs )
{
if ( m_irq_inhibit )
{
// Interrupt shadow: defer until after next instruction
m_irq_inhibit = false;
}
else
{
tlcs900_check_irqs();
m_check_irqs = 0;
}
}
Files modified:
- mame_driver/src/devices/cpu/tlcs900/tlcs900.h – Added bool m_irq_inhibit member
- mame_driver/src/devices/cpu/tlcs900/tlcs900.cpp – Shadow logic in execute_run(), init in device_start()/device_reset()
- mame_driver/src/devices/cpu/tlcs900/900tbl.hxx – Set m_irq_inhibit = true in op_EI() and op_RETI()
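The effect of the shadow can be reproduced in a toy model of the execution loop. The sketch below is not MAME code and simplifies heavily. It models only the three-instruction EI 0; NOP; EI 6 window, assumes INT0 (priority 1) stays pending because the ISR exits without reading the latch, and assumes the CPU starts with IFF=7. All names are illustrative.

```cpp
struct WindowResult {
    int dispatches;        // INT0 dispatches before the window completed or the cap hit
    bool reached_end;      // true if EI 6 executed and the window was left
    int last_dispatch_pc;  // instruction index INT0 last dispatched from (-1 if none)
};

// Instruction indices: 0 = EI 0, 1 = NOP, 2 = EI 6.
WindowResult run_ei_window(bool shadow, int max_dispatches) {
    const int kInt0Priority = 1;
    int pc = 0;
    int iff = 7;            // assumed starting mask: everything maskable is blocked
    bool inhibit = false;   // plays the role of m_irq_inhibit
    WindowResult r{0, false, -1};

    while (pc < 3 && r.dispatches < max_dispatches) {
        // Interrupt check at the START of the loop, before the instruction.
        if (inhibit) {
            inhibit = false;                // shadow: skip exactly one check
        } else if (kInt0Priority >= iff) {
            ++r.dispatches;                 // ISR runs, leaves INT0 pending
            r.last_dispatch_pc = pc;
            if (shadow) inhibit = true;     // RETI restores IFF/PC and sets the shadow
            continue;
        }
        // Execute one instruction.
        if (pc == 0) { iff = 0; if (shadow) inhibit = true; }  // EI 0
        if (pc == 2) { iff = 6; if (shadow) inhibit = true; }  // EI 6
        ++pc;
    }
    r.reached_end = (pc == 3);
    return r;
}
```

Without the shadow the model never gets past the NOP and dispatches until the cap is reached, always from index 1 (the NOP, 0x01FFF0 in the payload). With the shadow it dispatches once from index 2 (the EI 6 address) and then leaves the window with INT0 masked, matching the sequence described above.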
Fix 9: Port Read Direction Awareness
Root cause of DSP timeout loops: The port_r() function in tmp94c241.cpp always returns the external callback value, ignoring the port direction register (PXCR). On real TMP94C241 hardware, reading a port returns:
- Output latch value for bits configured as output (PXCR bit = 1)
- External pin level for bits configured as input (PXCR bit = 0)
The SubCPU firmware configures Port PH bits 0-2 as output (LD (PHCR), 007h). The DSP_Read_Status function at 0x0383F7 does SET 0, (PH) then reads back PH.0 to check DSP ready status. Since PH.0 is output, the read should return 1 (what was just written). But MAME’s port_r() called the external callback (unconnected for Port PH), returning 0.
This caused all DSP_Send_Data and DSP_Send_Command calls to enter their timeout loops (0x1F40 = 8000 iterations each), severely delaying SubCPU initialization. During these long timeouts, INT0 level-detect re-assertion created a massive interrupt storm (12M+ dispatches).
Fix: Made port_r() direction-aware:
uint8_t dir = m_port_control[P];
uint8_t external = m_port_read[P](0);
return (m_port_latch[P] & dir) | (external & ~dir);
This fix improves ALL port reads:
- Port PH (SubCPU): DSP status reads back output latch → DSP init completes instantly
- Port D (SubCPU): SSTAT output bits read back from latch; MSTAT input bits from callback
- Port Z (MainCPU): MSTAT output bits read back from latch; SSTAT input bits from callback
Files modified:
- mame_driver/src/devices/cpu/tlcs900/tmp94c241.cpp – port_r() direction-aware implementation
SubCPU Initialization Analysis
Detailed analysis of the SubCPU payload init sequence revealed that initialization does NOT hang – all init-phase loops are bounded by timeouts or complete harmlessly with unmapped hardware returning 0:
| Loop | Location | Reads From | Bounded? | Behavior with Unmapped HW |
|---|---|---|---|---|
| DSP_Send_Command wait | 0x036331 | Port PH bit 0 | Yes (8,000 iter) | PH output latch reads back 1 – exits immediately |
| DSP_Send_Data wait | 0x0367EE | Port PH bit 0 | Yes (8,000 iter) | Same as above |
| ToneGen_Poll_Delay | 0x03D227 | Nothing (pure delay) | Yes (10,000 iter) | Runs 160,000 iterations total, completes |
| ToneGen_Poll_Read | 0x03D239 | 0x110002 / 0x110000 | Single read | Returns 0, processes 16 fake note-off events |
After init, the SubCPU enters the main event loop (Audio_Main_Loop), which runs continuously. The main loop does NOT hang because:
- ToneGen_Process_Notes reads 0 from unmapped 0x110002 (no notes available)
- Ring buffers are empty (no serial data, no queued commands)
- No DMA transfers are triggered on the first iterations
Fix 10: DMAR Single-Unit Transfer (Critical — Final Fix)
Root cause of corrupted DMA destinations and the “Sound Name Error”: The main CPU’s INT0 handler writes LD (DMAR), 001h once per INT0 to transfer a single byte from the inter-CPU latch. Our software DMA implementation burst-transferred ALL remaining bytes at once, reading the same latch value repeatedly.
The E1 protocol header is 6 bytes (4-byte destination + 2-byte count). With burst DMA, the first INT0 would read the same latch byte 6 times, filling the header with garbage (e.g., 0x64646464). The INTTC0 handler then used this garbage as the DMA destination for the bulk transfer, writing all subsequent data to unmapped memory.
Fix: Changed tlcs900_process_software_dma() to transfer ONE unit per DMAR write (matching HDMA behavior), with INTTC fired only when the count reaches zero. Each INT0 delivers one byte, DMAR transfers one byte, and the count decrements by one.
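The difference between the two behaviors is easy to see in a small model of the latch. In the sketch below, each element of `sent` stands for one latch write by the sender, followed by one INT0 and one `LD (DMAR), 001h` on the receiver. The function name and the byte values are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Models the receiver's DMA channel filling a buffer of `count` bytes from a
// one-byte latch. With burst = true, a single DMAR write drains the whole
// remaining count while the latch still holds the same byte.
std::vector<std::uint8_t> receive_via_dmar(const std::vector<std::uint8_t>& sent,
                                           std::size_t count, bool burst) {
    std::vector<std::uint8_t> buffer;
    std::size_t remaining = count;
    for (std::uint8_t latch : sent) {        // one INT0 + one DMAR write per byte sent
        if (remaining == 0) break;           // count reached zero: INTTC would fire here
        const std::size_t units = burst ? remaining : 1;
        for (std::size_t i = 0; i < units; ++i)
            buffer.push_back(latch);         // source is fixed: always the latch
        remaining -= units;
    }
    return buffer;
}
```

For a 6-byte header whose first byte is 0x64, the burst variant fills all six positions with 0x64, which is the kind of garbage header described above. The single-unit variant reproduces the bytes as sent.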
Fix 11: SubCPU Peripheral Stubs
The SubCPU payload’s main loop polls several unmapped hardware devices, generating millions of MAME “unmapped memory” warnings that flooded the log (500MB+) and severely degraded emulation speed.
Fix: Added noprw stubs in the SubCPU memory map:
| Address | Device | Chip |
|---|---|---|
| 0x100000-0x100003 | DAC interface | Audio DAC |
| 0x110000-0x110003 | Tone generator | IC303 (TC183C230002) |
| 0x130000-0x130003 | DSP1 registers | IC311 |
| 0x1E0000-0x1EFFFF | Waveform/sample RAM | |
Fix 12: INT0 Stale m_level Fix (clear_int0_level)
Root cause of boot timeout regression: Fix 7’s re-assertion logic checks m_level[INT0] == ASSERT_LINE to decide whether to re-assert the interrupt flag. However, set_input_line(CLEAR_LINE) in MAME goes through synchronize() (see diexec.cpp:663-691), which defers the actual execute_set_input() call until after the current timeslice ends. When the ISR reads the latch, generic_latch::read() calls set_input_line(INT0, CLEAR), but m_level stays ASSERT_LINE until the deferred callback runs.
The re-assertion code in check_irqs() runs within the same timeslice and sees stale m_level, causing ~20 spurious INT0 firings per latch read. Each spurious ISR reads garbage from the latch, corrupting the command protocol. During boot, this caused the SubCPU to misinterpret the E2 command header and hang waiting for bytes that never arrive.
Failed first approach: Removing the re-assertion code entirely (reverting Fix 7) fixed boot but broke button input — both CPUs need re-assertion for processing responses during normal operation.
Fix: Added clear_int0_level() public method to tmp94c241_device that synchronously clears m_level[TLCS900_INT0] and the INTE0AD interrupt flag. The driver’s latch-read wrappers call this immediately after generic_latch::read():
uint8_t kn5000_state::subcpu_latch_r()
{
uint8_t val = m_subcpu_latch->read();
m_subcpu->clear_int0_level(); // Bypass deferred synchronize()
return val;
}
This is safe because: (1) same-CPU context — modifying the CPU’s own state during its own memory read; (2) idempotent — the deferred callback becomes a no-op since m_level is already CLEAR; (3) re-assertion preserved — check_irqs() now checks the correct m_level; (4) new ASSERT works normally through the deferred path.
Files modified:
- src/devices/cpu/tlcs900/tmp94c241.h – clear_int0_level() declaration
- src/devices/cpu/tlcs900/tmp94c241.cpp – clear_int0_level() implementation
- src/mame/matsushita/kn5000.cpp – Call from subcpu_latch_r() and maincpu_latch_r()
“Sound Name Error” Root Cause (Resolved)
The “Sound Name Error” was triggered by the Main CPU’s MainGetSoundName() function (at 0xF98D3E). It sends a sound name request to the SubCPU via the inter-CPU latch, then waits for a 32-byte response with a timeout of ~60,000 iterations. When the SubCPU never responds, the timeout fires and displays the error.
All contributing factors have been fixed:
- LDC CR mapping (Fix 1): DMA config registers weren’t being set due to wrong control register numbers
- DMAM encoding (Fix 2): DMA mode bits were decoded incorrectly
- DMAR register (Fix 3): Software DMA trigger wasn’t mapped at all
- HDMA priority (Fix 4): DMA wasn’t firing — IRQ handler consumed the interrupt first
- CPU scheduling (Fix 5): SubCPU wasn’t getting enough cycles
- DMAR burst DMA (Fix 6): SubCPU response DMA silently failed without burst support
- INT0 level-detect (Fix 7): After boot, INT0 fired once and never again
- EI/RETI interrupt shadow (Fix 8): INT0 storm trapped SubCPU in infinite ISR loop
- Port read direction (Fix 9): Output bits read as 0, breaking DSP status and MSTAT/SSTAT handshake
- DMAR single-unit (Fix 10): Main CPU received garbage headers due to burst-reading same latch byte
- Peripheral stubs (Fix 11): Unmapped hardware warnings flooded the log
- INT0 stale m_level (Fix 12): Fix 7’s re-assertion checked stale m_level due to deferred synchronize(), causing ~20 spurious ISR firings per latch read
As of 2026-02-17, the “Sound Name Error” is GONE and boot completes without timeout. The display shows voice names, rhythm patterns, and mixer levels, and menu navigation works correctly. Button input is also functional (both CPUs’ INT0 re-assertion paths work correctly).
Debugging Inner Thoughts
This section captures the detailed reasoning process behind each investigation step, preserving the “how we got there” alongside the results.
Reasoning: INT0 Level-Detect Re-assertion (Fix 7)
Starting observation: After applying DMAR burst DMA fix, log showed 0 INT0 dispatches post-boot. The payload loaded fine (524K HDMA transfers) but then the SubCPU went completely silent.
Key insight chain:
- During boot, INT0 is consumed by HDMA (DMA0V=0x0A). After boot, the payload calls InterCPU_Latch_Setup, which sets DMA0V=0x0A for receive and DMA2V for transmit – but later the main loop’s INT0 handler at 0x01F929 processes INT0 via ISR, NOT HDMA. When does DMA0V become 0?
- Searched the payload code: InterCPU_Latch_Setup (line 10964) sets LDC DMA0V, 000Ah – armed for HDMA. But the INT0 handler at line 11247 checks MSTAT0 and, if conditions are right, reads the latch byte directly. This means DMA0V stays armed for HDMA throughout, and the HDMA path handles most bytes, but the ISR handles command bytes.
- Wait – if HDMA handles INT0, check_hdma processes it and clears the flag. But check_irqs re-assertion only fires when check_irqs dispatches INT0 to the ISR. In HDMA mode, check_hdma clears the flag and… does the level-detect re-assertion fire?
- Root cause found: check_hdma calls process_hdma, which clears the INT0 flag at line 1106. But there’s no re-assertion logic in check_hdma – only in check_irqs. Since the latch is still pending (ASSERT_LINE), the flag should re-assert. But execute_set_input’s update_int_reg lambda only updates on level CHANGE (if (level != m_level[input])). Since the latch read triggers a synchronous CLEAR followed by a deferred ASSERT via synchronize(), the level does toggle – but the ASSERT arrives as a callback AFTER the current instruction.
- The real issue is simpler: when check_irqs dispatches INT0, it clears the flag. On real hardware, level-detect means the flag stays set as long as the input is asserted. We need to re-assert immediately after clearing.
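The re-assertion rule from the last step can be illustrated with a toy model. This is a minimal sketch, not MAME's code: `LevelIrq`, `set_line`, and `dispatch` are hypothetical names, and the model assumes the pending flag simply follows the input level.

```cpp
#include <cassert>

// Toy model of one level-sensitive interrupt input.
// All names here are hypothetical, not MAME members.
struct LevelIrq {
    bool line_asserted = false;  // current state of the external input
    bool flag = false;           // pending-request flag in the controller

    // The external device drives the line. In this simplified model the
    // request follows the level in both directions.
    void set_line(bool asserted) {
        line_asserted = asserted;
        flag = asserted;
    }

    // The CPU accepts the interrupt. Returns true if an ISR was dispatched.
    bool dispatch() {
        if (!flag) return false;
        flag = false;                    // acknowledging clears the request
        if (line_asserted) flag = true;  // level-detect: still asserted, still pending
        return true;
    }
};
```

With the line held asserted, every `dispatch()` leaves the flag set, so an ISR that returns without servicing the source is entered again.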
Reasoning: EI/RETI Interrupt Shadow (Fix 8)
Starting observation: After Fix 7, the log exploded to 3M lines with 394K INT0 dispatches. 91K of them came from PC=0x01FFF0 – an INT0 storm.
Key insight chain:
- What’s at 0x01FFF0? It’s a NOP instruction, sandwiched between EI 0 (01FFEE) and EI 6 (01FFF1). This is a deliberate pattern: enable all interrupts for exactly one instruction, then re-mask.
- Why does INT0 keep dispatching from 0x01FFF0 instead of letting NOP execute and reaching EI 6?
- Checked execute_run() flow: check_irqs runs at the START of the loop, BEFORE instruction execution. After RETI restores SR (with IFF=0), the next loop iteration runs check_irqs BEFORE executing the instruction at the return address.
- So the flow is: RETI → returns to 0x01FFF0 → loop starts → check_irqs fires (IFF=0, INT0 pending) → dispatches INT0 from 0x01FFF0 → NOP never executes!
- The ISR at this point checks MSTAT0 (PD bit 2). If MSTAT0=1, it exits without reading the latch (because the Main CPU is in the middle of a transfer phase). The latch stays pending, INT0 re-asserts (Fix 7), RETI returns to 0x01FFF0, and the cycle repeats.
- On real TLCS-900 hardware: After RETI (and after EI), the CPU executes at least one instruction before accepting another interrupt. This is the “interrupt shadow” – identical to the Z80’s behavior after EI. The NOP absorbs one ISR call, then EI 6 at 0x01FFF1 raises IFF to 6, masking INT0 (priority 1).
- Checked op_RETI in 900tbl.hxx: it sets m_prefetch_clear = true and m_check_irqs = 1 but has NO interrupt shadow mechanism.
- Fix: Add an m_irq_inhibit flag, set by EI and RETI, that causes execute_run to skip one check_irqs call.
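The effect of the shadow on the EI 0 / NOP / EI 6 window can be reproduced in a toy simulation. This is a sketch under simplifying assumptions, not the MAME implementation: `run_window` and its locals are hypothetical, the ISR is modelled as returning immediately without servicing the latch, and an interrupt is accepted when its priority is at least IFF.

```cpp
#include <cassert>

enum class Op { EI0, NOP, EI6 };

// Runs the three-instruction window with INT0 permanently pending.
// Returns the number of INT0 dispatches, or -1 if the dispatch count
// reaches max_dispatches (an interrupt storm).
int run_window(bool use_shadow, int max_dispatches = 1000) {
    const Op program[] = {Op::EI0, Op::NOP, Op::EI6};
    const int irq_priority = 1;  // INT0 priority, as stated in the text
    int iff = 6;                 // interrupts start masked
    int pc = 0;
    int dispatches = 0;
    bool inhibit = false;        // stand-in for m_irq_inhibit

    while (pc < 3) {
        // The interrupt check runs before the instruction at pc executes.
        if (inhibit) {
            inhibit = false;     // the shadow covers exactly one instruction
        } else if (irq_priority >= iff) {
            if (++dispatches >= max_dispatches) return -1;
            // ISR exits without reading the latch; RETI restores IFF and
            // returns to the same pc, optionally setting the shadow.
            if (use_shadow) inhibit = true;
            continue;
        }
        switch (program[pc]) {
            case Op::EI0: iff = 0; if (use_shadow) inhibit = true; break;
            case Op::EI6: iff = 6; if (use_shadow) inhibit = true; break;
            case Op::NOP: break;
        }
        ++pc;
    }
    return dispatches;
}
```

In this model `run_window(true)` completes after a single dispatch, while `run_window(false)` never gets past the NOP and hits the dispatch limit.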
Reasoning: Port Read Direction & DSP Timeout (Fix 9)
Starting observation: Test Run 6 showed the EI/RETI shadow fix worked (0 dispatches from PC=01FFF0), but a NEW INT0 storm emerged from DSP_SEND_DATA loop PCs. 12.2M INT0 dispatches with only 524K ASSERTs = 23x amplification.
Key insight chain:
- The storm PCs (0x036828-0x03685E) are in DSP_Send_Data_WaitLoop and DSP_Send_Data_Poll – a bounded timeout loop (8000 iterations) that polls DSP readiness via DSP_Read_Status.
- DSP_Read_Status at 0x0383F7 does: SET 0, (PH) → LDCF 0, (PH) → return carry. It sets PH.0 high, then reads it back.
- The SubCPU firmware sets PHCR = 0x07 (line 9318) – Port PH bits 0-2 are OUTPUT.
- On real hardware, reading an output bit returns the output latch. Since SET 0, (PH) just wrote 1, the read should return 1 (DSP “ready”).
- But MAME’s port_r() calls m_port_read[P](0) – the external callback. Port PH has no callback → returns 0.
- So DSP_Read_Status always returns 0 (not ready), causing 8000-iteration timeouts.
- DSP_Send_Data has EI 0/EI 6 windowing – IFF=0 during the timeout loop body.
- With level-detect re-assertion, every pending latch byte causes INT0 to fire ~23 times per assertion (ISR checks MSTAT0=1, exits without reading, INT0 re-fires).
- The E2 handshake protocol race: MainCPU checks SSTAT1=0 (already low from a previous handshake), then proceeds to set MSTAT0=1 before SubCPU reads the command.
The fix is fundamental: make port_r() respect the direction register, combining output latch for output bits and external callback for input bits. This is correct for ALL ports, not just PH.
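The combining rule amounts to a single mask expression. The function below is a minimal sketch, not the actual port_r() body: `port_read` and its parameters are hypothetical, and a direction bit of 1 is taken to mean output, as with PHCR = 0x07.

```cpp
#include <cassert>
#include <cstdint>

// Output bits (dir = 1) read back from the output latch;
// input bits (dir = 0) come from the external read callback.
constexpr std::uint8_t port_read(std::uint8_t latch,
                                 std::uint8_t dir,
                                 std::uint8_t external) {
    return static_cast<std::uint8_t>((latch & dir) | (external & ~dir));
}
```

For Port PH with PHCR = 0x07 and no external callback (reads as 0), a latch value of 0x01 now reads back with bit 0 set, which is what DSP_Read_Status needs to see.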
Reasoning: MSTAT 2↔3 and PZCR Configuration (Test Run 7 Analysis)
Starting observation: Test Run 7 showed MSTAT values 2↔3 instead of the previous 0↔1. Is the handshake broken?
Key insight chain:
- Checked COM_SELECT ioport default: 0xE0 = bits 0-1 are ZERO. com_select doesn’t bleed into MSTAT.
- Searched MainCPU firmware for MSTAT usage: SET 0, (PZ) / RES 0, (PZ) for MSTAT0 AND SET 1, (PZ) / RES 1, (PZ) for MSTAT1. Both MSTAT bits are actively used in the protocol.
- Critical discovery: LD (PZCR), 003h exists in shared/boot_hw_init.asm – the MainCPU configures PZ bits 0-1 as OUTPUT during boot. Similarly, LD (PDCR), 063h configures PD bits 0-1 as OUTPUT on SubCPU.
- With port_r fix + PZCR=0x03: SET 0, (PZ) reads PZ → bits 0-1 from latch (preserving MSTAT), bits 2-7 from callback. This is CORRECT – on real hardware output bits read back from the latch.
- Without port_r fix: SET 1, (PZ) reads PZ → bit 0 from callback (=0), losing the previously-written MSTAT0. This is WRONG.
- Conclusion: MSTAT 2↔3 means MSTAT1 is set (signaling “transfer complete”) and MSTAT0 is toggling. This is correct protocol behavior that was masked before because SET/RES clobbered the other bit.
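The clobbering can be reproduced with a toy port whose SET/RES are read-modify-write operations. This is an illustrative sketch only: `ToyPort` and its members are hypothetical, the external callback value 0xE0 is borrowed from the COM_SELECT default quoted above, and the direction register is fixed at 0x03 as with PZCR.

```cpp
#include <cassert>
#include <cstdint>

// Toy port with read-modify-write bit operations (hypothetical names).
struct ToyPort {
    std::uint8_t latch = 0x00;
    std::uint8_t dir = 0x03;        // bits 0-1 are outputs (PZCR = 0x03)
    std::uint8_t external = 0xE0;   // callback value, bits 0-1 zero
    bool honor_direction;

    explicit ToyPort(bool honor) : honor_direction(honor) {}

    std::uint8_t read() const {
        if (!honor_direction) return external;  // old behavior: callback only
        return static_cast<std::uint8_t>((latch & dir) | (external & ~dir));
    }
    // SET n,(port) and RES n,(port): read the port, change one bit, write back.
    void set_bit(int n) { latch = static_cast<std::uint8_t>(read() | (1u << n)); }
    void res_bit(int n) { latch = static_cast<std::uint8_t>(read() & ~(1u << n)); }

    std::uint8_t mstat() const { return latch & 0x03; }  // MSTAT1:MSTAT0
};
```

Setting MSTAT1 and then toggling MSTAT0 yields 3 and 2 when the direction register is honored, but 1 and 0 when reads come only from the callback, matching the values seen before and after the fix.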
Verified DMA ch2 infrastructure: The SubCPU’s InterCPU_DMA_Send_Chunk configures DMA ch2 with Timer 2 triggering (DMA2V=0x16, T23MOD=0x1D with T1 clock, TREG2=0x14). Timer 2 fires INTT2 every 2560 cycles (128µs), each triggering one HDMA byte transfer from RAM to maincpu latch. INTTC2 fires on completion, dispatching to MICRODMA_CH2_HANDLER at 0x020F01. All infrastructure is correctly implemented — INTTC2=0 simply means the SubCPU hasn’t reached InterCPU_DMA_Send yet during the interrupted test.
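The timer figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted in the paragraph; the cycles-per-count and cycles-per-microsecond values are inferred from them rather than taken from a datasheet, and the constant names are hypothetical.

```cpp
#include <cassert>

constexpr int kTreg2 = 0x14;         // TREG2 compare value from the text (20 counts)
constexpr int kCyclesPerIrq = 2560;  // INTT2 period in cycles, from the text
constexpr int kPeriodUs = 128;       // the same period in microseconds, from the text

// Inferred values (assumptions derived from the figures above):
constexpr int kCyclesPerCount = kCyclesPerIrq / kTreg2;  // cycles per timer count
constexpr int kCyclesPerUs = kCyclesPerIrq / kPeriodUs;  // cycles per microsecond

// Both divisions are exact, so the three quoted figures are mutually consistent.
static_assert(kCyclesPerCount * kTreg2 == kCyclesPerIrq, "TREG2 divides the period");
static_assert(kCyclesPerUs * kPeriodUs == kCyclesPerIrq, "period matches cycle rate");
```

At one HDMA byte per INTT2, this pacing corresponds to one byte every 128µs on the SubCPU-to-MainCPU path.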
DMA Macro Name Confusion (Side Investigation)
During the INT0 investigation, we discovered that several DMA macros in tmp94c241.inc had misleading names:
- LDC_DMAM0_WA (bytes D8,2E,40) actually writes to CR 0x40 = DMAC0 (count register, NOT mode)
- LDC_DMAC0_A (bytes C9,2E,42) actually writes to CR 0x42 = DMAM0 (mode register, NOT count)
The names had DMAC and DMAM swapped. Fixed in commit b4ed825 – all macros now correctly named, duplicates removed, and all call sites updated across all assembly files. Build still produces 100% byte-matching ROMs.
Key Files
| File | Purpose |
|---|---|
| mame_driver/src/mame/matsushita/kn5000.cpp | Main MAME driver (latches, ports, memory map) |
| mame_driver/src/devices/cpu/tlcs900/tmp94c241.cpp | CPU emulation (DMA, SFR, interrupts, INT0 re-assertion) |
| mame_driver/src/devices/cpu/tlcs900/tmp94c241.h | CPU header (DMA state) |
| mame_driver/src/devices/cpu/tlcs900/tlcs900.cpp | TLCS900 base class (execute_run, interrupt shadow) |
| mame_driver/src/devices/cpu/tlcs900/tlcs900.h | TLCS900 base header (m_irq_inhibit) |
| mame_driver/src/devices/cpu/tlcs900/900tbl.hxx | Shared instruction decoder (LDC CR, EI/RETI shadow) |
| mame_driver/src/devices/cpu/tlcs900/dasm900.cpp | Disassembler (CR label fix) |
| maincpu/kn5000_v10_program.asm:134123 | SubCPU_Send_Payload |
| maincpu/kn5000_v10_program.asm:139166 | InterCPU_E1_Bulk_Transfer |
| maincpu/kn5000_v10_program.asm:139115 | Audio_DMA_Transfer |
| maincpu/kn5000_v10_program.asm:140484 | LZSS decompressor (SLIDE_Parse_Header) |
| subcpu/kn5000_subprogram_v142.asm:10964 | InterCPU_Latch_Setup (payload DMA config) |
| subcpu/kn5000_subprogram_v142.asm:11247 | Payload INT0 handler |
| subcpu/boot/kn5000_subcpu_boot.asm:1159 | InterCPU_RX_Handler (boot ROM INT0 handler) |
| subcpu/boot/kn5000_subcpu_boot.asm:718 | INIT_DMA_SERIAL (DMA setup) |
| subcpu/boot/kn5000_subcpu_boot.asm:405 | Main loop (payload ready polling) |