3rd handler
Next up is the 3rd handler, MacII_3rd (0x3F94 in our case). Strictly speaking it’s called with JSR 8(A6), but that’s just an 8-byte offset from the ‘base address’ of the handlers. Clever stuff, huh (Google for ‘pointer table’)?
This subroutine contains serious magic and was a really hard nut to crack, especially because it tricked me into believing that I’d found the ‘crash site’… which, to spoil the tension, it isn’t.
It just kept on killing Macsbug, because it’s so low-level.
What this routine does is replace the Vector Base Register (VBR), which ‘lives’ at address 0x00000000. Evil stuff.
- After disabling interrupts and switching to 32-bit mode, a field of 6 long words (data107) is populated with data generated in other routines. For now I can only guess what these entries are (values from my SE/30 are given in the comments). We’ll discuss all that further down.
- 0x3FC6 to 0x3FD8 calculate the size of the chunk of code from data106 (0x4008) to the beginning of MacII_4th (i.e. the end of MacII_3rd), which is 180 bytes (0xB4).
- Using this length, the routine first saves the current VBR onto the stack using the system call _BlockMove.
- Then the original VBR (+ some more) is replaced by the new version beginning at data106 (killing Macsbug, more on that later).
- BSR 53_cmd_1x is called. This brings the Carrera040 to life, most likely using the just-copied VBR (discussed in much more detail further down).
- Now the contents of the stack (= the copy of the original VBR) are copied back into place, this time using a classic DBRA loop (0x3FF4). My guess: no Toolbox call is possible at that moment.
- Adjust the stack, restore the previous addressing mode, restore the registers and return from subroutine. Done.
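To make the order of operations explicit, here is a small Python model of the save/overwrite/restore sequence. It is only a sketch under obvious assumptions: ‘memory’ is a plain bytearray, the addresses 0x4008 (data106) and 0x40BC (MacII_4th) are taken from the listing, and the hypothetical `run` callback merely stands in for BSR 53_cmd_1x. Nothing here touches real hardware.

```python
# Hypothetical model of MacII_3rd's save / overwrite / restore sequence.
# Addresses come from the disassembly; "memory" is just a bytearray.

DATA106 = 0x4008    # start of the interim vector table + 040 code
MACII_4TH = 0x40BC  # first byte after MacII_3rd


def patch_low_memory(memory: bytearray, patch: bytes, run) -> bytes:
    """Save low memory, overwrite it with `patch`, call `run`, restore it.

    Returns what was visible at address 0 while `run` executed.
    """
    size = len(patch)                   # D2 in the listing
    saved = bytes(memory[0:size])       # 1st _BlockMove: low memory -> stack
    memory[0:size] = patch              # 2nd _BlockMove: data106 -> 0x0
    seen = bytes(memory[0:size])
    run(memory)                         # stands in for BSR 53_cmd_1x
    for i in range(size):               # DBRA loop: copy the saved bytes back
        memory[i] = saved[i]
    return seen


size = MACII_4TH - DATA106
print(size, hex(size))                  # 180 0xb4

mem = bytearray(range(256))             # fake "low memory"
before = bytes(mem)
seen = patch_low_memory(mem, b"\xAA" * size, lambda m: None)
assert seen == b"\xAA" * size           # interim table was in place during run()
assert bytes(mem) == before             # original contents restored afterwards
```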
Here’s the code doing all this:
3F94: MacII_3rd: MOVE SR,-(A7)      ; 3rd call from MacII handler
3F96: ORI #$700,SR                  ; Set bits 8-10 of SR (interrupt mask = 7, i.e. disable interrupts)
3F9A: MOVEM.L D0-D2/A0-A2,-(A7)
3F9E: MOVEQ #1,D0
3FA0: _SwapMMUMode                  ; switch to 32-bit mode, previous mode returned in D0
3FA2: PUSH.B D0
3FA4: SUBA.L A2,A2                  ; faster movea.l #0,a2
3FA6: LEA data107,A0                ; Filling the data into the 6x32 field
3FAA: MOVE.L 96(A5),D0
3FAE: MOVE.L D0,(A0)+               ; SE30: 9FE00 (User/Supervisor Rootpointer)
3FB0: LEA data69,A1
3FB4: MOVE.L A1,(A0)+               ; SE30: 9D6E2 (address of data69, later the JMP (A1) target)
3FB6: MOVE.L $64(A5),(A0)+          ; 807FC040
3FBA: MOVE.L $6C(A5),(A0)+          ; 807FC040
3FBE: MOVE.L $68(A5),(A0)+          ; 00000000
3FC2: MOVE.L $70(A5),(A0)+          ; 00000000
3FC6: LEA MacII_4th,A0
3FCA: MOVE.L A0,D2
3FCC: LEA data106,A0
3FD0: SUB.L A0,D2                   ; 'distance' from data106 to MacII_4th
3FD2: SUBA.L D2,A7
3FD4: MOVEA.L A2,A0
3FD6: MOVEA.L A7,A1                 ; save the current VBR to the stack
3FD8: MOVE.L D2,D0                  ; A0 = SE30: 00000000 (src) - IIci: $FBB08000
                                    ; A1 = SE30: 027ff34c (dest) - IIci: $3BF9FC6
                                    ; D0 = B4 (count) - SAME on the IIci!
3FDA: _BlockMove                    ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size)
; write my own VBR...
; This copies 180 bytes into 0x00000000, replacing the original VBR.
; ... and kills Macsbug if not circumvented properly.
3FDC: LEA data106,A0
3FE0: MOVEA.L A2,A1
3FE2: MOVE.L D2,D0                  ; A0 = 9F900 (src) - IIci 10C4EA (data88)
                                    ; A1 = 00000 (dest) - IIci FBB08000
                                    ; D0 = B4 (count) - IIci same
3FE4: _BlockMove                    ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size)
3FE6: BSR 53_cmd_1x                 ; Bring the C040 to life
3FEA: MOVEA.L A7,A0                 ; SP to A0
3FEC: MOVEA.L A2,A1                 ; SE30: 00000000
3FEE: MOVE.L D2,D0                  ; the code length (B4 again)
3FF0: BRA.S lae_163
3FF2: lae_162 MOVE.B (A0)+,(A1)+    ; Write the VBR back from the stack
3FF4: lae_163 DBRA D0,lae_162
3FF8: ADDA.L D2,A7                  ; adjust the stack
3FFA: POP.B D0
3FFC: _SwapMMUMode                  ; back to the previous addressing mode
3FFE: MOVEM.L (A7)+,D0-D2/A0-A2
4002: MOVE (A7)+,SR
4004: MOVEQ #0,D0
4006: RTS
; Start of the VBR replacement and 040 code, copied to 0x0 by the _BlockMove at 0x3FE4
; /if/ these are vectors 0-11, then their meaning would be:
4008: data106: DC.L #$00001000      ; Reset initial Stack Pointer
400C: DC.L #$00000050               ; Reset initial Program Counter
; - ALL of these Vectors point to addr 4050 (offset 0x48) -
4010: DC.L #$00000048               ; Bus Error
4014: DC.L #$00000048               ; Address Error
4018: DC.L #$00000048               ; Illegal Instruction
401C: DC.L #$00000048               ; Zero Divide
4020: DC.L #$00000048               ; CHK, CHK2 instruction
4024: DC.L #$00000048               ; cpTRAPcc, TRAPcc, TRAPV instruction
4028: DC.L #$00000048               ; Privilege Violation
402C: DC.L #$00000048               ; Trace
4030: DC.L #$00000048               ; LINE 1010 Emulation
4034: DC.L #$00000048               ; LINE 1111 Emulation
; THESE are definitely not vectors, they are dynamically written by the code above
; and used to set up the 040 MMU registers.
4038: data107: DC.L #$0009FE00
403C: DC.L #$0009D6E2
4040: DC.L #$807FC040
4044: DC.L #$807FC040
4048: DC.L #$00000000               ; SE30: 00000000
404C: DC.L #$00000000               ; SE30: 00000000
4050: CLR.L $53000000               ; Poke 0 to $53000000
4056: BRA lae_164                   ; This points to itself... I'm lost at the moment.
4058: LEA data107,A0                ; SE30: 9F900
405C: MOVE.L (A0)+,D1               ; SE30: 0009FE00 (User/Supervisor Rootpointer)
405E: MOVEA.L (A0)+,A1              ; 0009D6E2
4060: MOVE.L (A0)+,D4               ; 807FC040
4062: MOVE.L (A0)+,D5               ; 807FC040
4064: MOVE.L (A0)+,D6               ; 00000000
4066: MOVE.L (A0)+,D7               ; 00000000
4068: MOVE.L #$C000,D0
406E$ MOVEC D0,ITT0                 ; Set Instruction Transparent Translation
4072$ MOVEC D0,DTT0                 ; Set Data Transparent Translation
4076$ MOVEC D1,SRP                  ; Set Supervisor Rootpointer
407A$ MOVEC D1,URP                  ; Set User Rootpointer
407E: MOVE.L #$C000,D0
4084$ PFLUSHA                       ; Invalidates all entries in the address translation cache
4086$ MOVEC D0,TC
408A: LEA data108,A0
408E: ADDA.L #$53002000,A0          ; (=0x530A1900)
4094: JMP (A0)                      ; JuMP to data108 code (below) in C040 RAM range?
4096: data108: MOVEQ #0,D0
4098$ MOVEC D0,ITT0
409C$ MOVEC D0,DTT0
40A0$ MOVEC D4,ITT0
40A4$ MOVEC D5,DTT0
40A8$ MOVEC D6,ITT1
40AC$ MOVEC D7,DTT1
40B0$ CINVA BC
40B2: NOP
40B4: MOVEQ #0,D0
40B6$ MOVEC D0,CACR
40BA: JMP (A1)                      ; 0009D6E2
; END of the 040 code copied by the _BlockMove at 3FE4
40BC: MacII_4th: MOVEM.L D1-D7/A0-A4,-(A7) ; 4th subroutine called by MacII_handler
[...]
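The copy-back loop at 0x3FEE to 0x3FF4 uses the classic ‘BRA into the DBRA’ idiom. The Python emulation below is a sketch with the registers reduced to plain integers; it shows why the loop moves exactly D0 bytes. DBRA decrements only the low word of D0 and falls through once it wraps to -1, and branching straight to the DBRA first means a count of 0 copies nothing.

```python
def dbra_copy(src: bytes, dst: bytearray, count: int) -> int:
    """Emulate the copy-back loop at 0x3FEE-0x3FF4:

        MOVE.L  D2,D0
        BRA.S   lae_163
    lae_162:
        MOVE.B  (A0)+,(A1)+
    lae_163:
        DBRA    D0,lae_162

    Returns the number of bytes moved.
    """
    d0 = count & 0xFFFF       # DBRA only uses the low word of D0
    a0 = a1 = 0
    moved = 0
    # BRA.S lae_163: enter the loop at the DBRA, not at the MOVE.B
    while True:
        d0 = (d0 - 1) & 0xFFFF    # DBRA: decrement the low word...
        if d0 == 0xFFFF:          # ...and fall through once it wraps to -1
            break
        dst[a1] = src[a0]         # MOVE.B (A0)+,(A1)+
        a0 += 1
        a1 += 1
        moved += 1
    return moved


src = bytes(range(180))
dst = bytearray(180)
print(dbra_copy(src, dst, 0xB4))   # 180
```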
The Vector Base Register
I wasn’t precise when I initially said “replacing the VBR”. What actually happens is that this routine uses what I’d call an interim VBR for the moment it initializes the 68040 on the C040. You’ve probably seen the link explaining what the VBR is in the 1st post of this series, but let me go into a bit more detail.
Strictly speaking, the VBR is a CPU register holding the base address of the exception vector table: a list of addresses (aka vectors) the CPU refers to in case of an exception. This is true for every 68k system out there, e.g. Mac, SUN, NeXT, Amiga or Atari. From the 68010 on, the table can be moved elsewhere by loading a different value into the VBR, but the order of the vectors is always the same, and on the Mac the table starts at 0x00000000. There are 16 basic vectors; the first 12 of them show up in the listing below, each as a 4-byte long word.
If, for example, a divide-by-zero happens, the CPU calls the handler whose address is stored at 0x14 (vector 5 × 4 bytes). Pretty simple.
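As a quick sanity check of the ‘vector number × 4’ rule, here is a sketch that prints the offsets of the first 12 vectors named in the listing below. It assumes VBR = 0, as on the Mac.

```python
# Names of the first 12 exception vectors, as annotated in the listing.
VECTOR_NAMES = [
    "Reset initial Stack Pointer",
    "Reset initial Program Counter",
    "Bus Error",
    "Address Error",
    "Illegal Instruction",
    "Zero Divide",
    "CHK, CHK2 instruction",
    "cpTRAPcc, TRAPcc, TRAPV instruction",
    "Privilege Violation",
    "Trace",
    "LINE 1010 Emulation",
    "LINE 1111 Emulation",
]


def vector_address(vector: int, vbr: int = 0x00000000) -> int:
    """Each vector is a 4-byte long word, so vector n sits at VBR + 4 * n."""
    return vbr + 4 * vector


for n, name in enumerate(VECTOR_NAMES):
    print(f"{vector_address(n):#06x}  vector {n:2}  {name}")
```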
So let’s have a look at what MacII_3rd left in the VBR (and below that) when the ‘interim VBR’ is in place:
0000: data106: DC.L #$00001000      ; Reset initial Stack Pointer
0004: DC.L #$00000050               ; Reset initial Program Counter
; - ALL of these Vectors point to addr 0x48 -
0008: DC.L #$00000048               ; Bus Error
000C: DC.L #$00000048               ; Address Error
0010: DC.L #$00000048               ; Illegal Instruction
0014: DC.L #$00000048               ; Zero Divide
0018: DC.L #$00000048               ; CHK, CHK2 instruction
001C: DC.L #$00000048               ; cpTRAPcc, TRAPcc, TRAPV instruction
0020: DC.L #$00000048               ; Privilege Violation
0024: DC.L #$00000048               ; Trace
0028: DC.L #$00000048               ; LINE 1010 Emulation
002C: DC.L #$00000048               ; LINE 1111 Emulation
; - THESE are definitely not vectors, they are dynamically written by the
; code above and used to set up the 040 MMU registers.
0030: data107: DC.L #$00000000      ; SE30: 0009FE00 (12)
0034: DC.L #$00000000               ; SE30: 0009D6E2 (13)
0038: DC.L #$00000000               ; SE30: 807FC040 (14)
003C: DC.L #$00000000               ; SE30: 807FC040 (15)
0040: DC.L #$00000000               ; SE30: 00000000
0044: DC.L #$00000000               ; SE30: 00000000
0048: CLR.L $53000000               ; Poke 0 to $53000000, C040 off
004C: blocker3 BRA blocker3         ; Points to itself... probably a "blocker"
0050: LEA data107,A0                ; initial Program Counter (SE30: 9F900)
0054: MOVE.L (A0)+,D1               ; SE30: 0009FE00 (User/Supervisor Rootpointer)
0058: MOVEA.L (A0)+,A1              ; 0009D6E2
005C: MOVE.L (A0)+,D4               ; 807FC040
0060: MOVE.L (A0)+,D5               ; 807FC040
0064: MOVE.L (A0)+,D6               ; 00000000
0068: MOVE.L (A0)+,D7               ; 00000000
006C: MOVE.L #$C000,D0
0070: MOVEC D0,ITT0                 ; Set Instruction Transparent Translation
0074: MOVEC D0,DTT0                 ; Set Data Transparent Translation
0078: MOVEC D1,SRP                  ; Set Supervisor Rootpointer
007C: MOVEC D1,URP                  ; Set User Rootpointer
0080: MOVE.L #$C000,D0
0084: PFLUSHA                       ; Invalidates all entries in the address translation cache
0088: MOVEC D0,TC
008C: LEA data108,A0
0090: ADDA.L #$53002000,A0          ; (=0x530A1900)
0094: JMP (A0)                      ; JuMP to data108 code (below) in C040 RAM range?
009C: data108: MOVEQ #0,D0
00A0: MOVEC D0,ITT0                 ; 0
00A4: MOVEC D0,DTT0                 ; 0
00A8: MOVEC D4,ITT0                 ; 807FC040
00AC: MOVEC D5,DTT0                 ; 807FC040
00B0: MOVEC D6,ITT1                 ; 00000000
00B4: MOVEC D7,DTT1                 ; 00000000
00B8: CINVA BC
00BC: NOP
00C0: MOVEQ #0,D0
00C4: MOVEC D0,CACR
00C8: JMP (A1)                      ; 0009D6E2
Farewell, old friend
At this point, my SE/30 always froze, and I thought this must be where the incompatibilities between the IIci and the SE/30 were hiding.
Okayyyyy. After that’s been written, 53_cmd_1x is called, presumably telling the C040 to come to life.
And keen as it is, it’ll look up the “Reset initial Program Counter” (offset 0x4 in the VBR) and start executing code from 0x50. Any other exception that occurs will call the ‘handler’ at 0x48, which simply switches the C040 off and sits in an endless loop (0x4C), probably making the 68030 take over again.
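Here is the same lookup as a Python sketch. It packs the first 12 long words of the interim table (values from the listing above) in big-endian order, then reads them back the way a freshly reset 68k would. It models only the table lookup, not the C040 hardware.

```python
import struct

# First 12 long words of data106 as listed above (the 68k is big-endian).
INTERIM_TABLE = struct.pack(">12L", 0x00001000, 0x00000050, *([0x00000048] * 10))


def reset_fetch(table: bytes) -> tuple[int, int]:
    """On reset a 68k loads the SSP from offset 0 and the PC from offset 4."""
    ssp, pc = struct.unpack_from(">2L", table, 0)
    return ssp, pc


def handler_for(table: bytes, vector: int) -> int:
    """Any other exception jumps to the long word stored at 4 * vector."""
    (address,) = struct.unpack_from(">L", table, 4 * vector)
    return address


ssp, pc = reset_fetch(INTERIM_TABLE)
print(f"SSP={ssp:#x} PC={pc:#x}")                            # SSP=0x1000 PC=0x50
print(f"zero divide -> {handler_for(INTERIM_TABLE, 5):#x}")  # zero divide -> 0x48
```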
EmEmYou!
Given everything’s fine, the code at 0x50 will start reading the previously populated data from data107 into several registers.
Then some serious 68040 MMU table setup happens – so this is some kind of ‘040 initialization routine… and the ‘040 is actually running. Woohoo!
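The six post-increment loads starting at 0x50 simply fan data107 out into registers. The sketch below replays them with the SE/30 values from the listing comments. The ‘use’ column only records where each register ends up later in the code; it is a reading of the listing, not documented behaviour.

```python
import struct

# data107 as populated on the SE/30 (values from the listing comments).
DATA107 = struct.pack(">6L", 0x0009FE00, 0x0009D6E2,
                      0x807FC040, 0x807FC040, 0x00000000, 0x00000000)

# Destination of each MOVE.L / MOVEA.L (A0)+ and where that register is used.
TARGETS = [
    ("D1", "SRP and URP (root pointers)"),
    ("A1", "final JMP (A1) target"),
    ("D4", "ITT0"),
    ("D5", "DTT0"),
    ("D6", "ITT1"),
    ("D7", "DTT1"),
]


def load_data107(blob: bytes) -> dict[str, int]:
    """Emulate six consecutive (A0)+ loads from data107."""
    values = struct.unpack(">6L", blob)
    return {reg: value for (reg, _), value in zip(TARGETS, values)}


regs = load_data107(DATA107)
for reg, use in TARGETS:
    print(f"{reg} = {regs[reg]:#010x}  -> {use}")
```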
Time to explain some special registers:
As we all know, the 68040 has two built-in 4K caches (instruction and data) and an MMU. The latter can be programmed to control what gets translated and how it is cached. Part of this is defined in the four transparent translation registers ITT0, ITT1, DTT0 and DTT1, the Instruction and Data Transparent Translation registers, all sharing the same bit fields following this pattern:
BBBBBBBBMMMMMMMMESS000UU0CC00W00
- B – Logical Address Base – compared with address bits A31-A24. Addresses that match in this comparison are transparently translated, i.e. the logical address is used as the physical address, bypassing the page tables
- M – Logical Address Mask – setting a bit in this field causes the corresponding bit in the Base field to be ignored
- E – Enable bit – 1 = translation enabled; 0 = disabled
- S – Supervisor Mode – 00 = match only in user mode; 01 = match only in supervisor mode; 1x = ignore mode when matching
- U – User Page Attributes – ignored by the 040
- C – Cache mode – 00 = Cacheable, Write-through; 01 = Cacheable, Copyback; 10 = Noncacheable, Serialized; 11 = Noncacheable
- W – Write protect – 0 = write permitted; 1 = write disabled
Here’s an example:
807FC040 = 10000000011111111100000001000000
           BBBBBBBBMMMMMMMMESS000UU0CC00W00
which means:
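- logical addresses 0x80000000-0xFFFFFFFF (the upper 2GB) transparently translated, i.e. mapped 1:1 to the same physical addresses (base 0x80 with mask 0x7F means only A31 is compared)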
- translation enabled
- Supervisor Mode: ignore mode when matching
- Cache mode: Noncacheable, Serialized
- Write permitted
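The decoding above can be automated. The following Python sketch splits a TTR value into the fields of the bit pattern. `bounds()` returns the lowest and highest logical address that can match, and `size_mb()` counts the matching 16MB blocks. The class and field names are made up for this example; they are not Motorola mnemonics.

```python
from dataclasses import dataclass

S_FIELD = {0b00: "user mode only", 0b01: "supervisor mode only",
           0b10: "ignore mode", 0b11: "ignore mode"}
CACHE_MODE = {0b00: "Cacheable, Write-through", 0b01: "Cacheable, Copyback",
              0b10: "Noncacheable, Serialized", 0b11: "Noncacheable"}


@dataclass
class TTR:
    base: int            # bits 31-24, compared with A31-A24
    mask: int            # bits 23-16, a 1 means "ignore this base bit"
    enabled: bool        # bit 15 (E)
    s_field: int         # bits 14-13 (S)
    cache: int           # bits 6-5 (CM)
    write_protect: bool  # bit 2 (W)

    @classmethod
    def decode(cls, value: int) -> "TTR":
        return cls(base=(value >> 24) & 0xFF,
                   mask=(value >> 16) & 0xFF,
                   enabled=bool(value & (1 << 15)),
                   s_field=(value >> 13) & 0b11,
                   cache=(value >> 5) & 0b11,
                   write_protect=bool(value & (1 << 2)))

    def bounds(self) -> tuple[int, int]:
        """Lowest and highest logical address that can match."""
        low = (self.base & ~self.mask & 0xFF) << 24
        high = ((self.base | self.mask) << 24) | 0x00FFFFFF
        return low, high

    def size_mb(self) -> int:
        # every matching value of A31-A24 covers one 16MB block
        return (2 ** bin(self.mask).count("1")) * 16


for value in (0x807FC040, 0x0000C000):
    t = TTR.decode(value)
    low, high = t.bounds()
    print(f"{value:#010x}: {low:#010x}-{high:#010x} ({t.size_mb()}MB), "
          f"{S_FIELD[t.s_field]}, {CACHE_MODE[t.cache]}")
```

The second value, 0x0000C000, is the one the code loads first; it covers only the bottom 16MB.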
So let’s have a look at the code again:
At 0x70/0x74, ITT0 and DTT0 are set to 0x0000C000, i.e. translation enabled, matching in both user and supervisor mode, cacheable write-through, for the logical address space 0x00000000-0x00FFFFFF (base and mask are both 0, so only the bottom 16MB match).
Then the Supervisor and User Root Pointers are set to 0x9FE00 (on my SE/30), the address translation cache is flushed, and finally the Translation Control register is set to 0xC000, i.e. enabled with 8K pages (0x88)… Up to here this was pretty much ‘by the book’ for how to set up MMU tables.
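For the Translation Control value, only two bits matter here: E (bit 15) and P (bit 14, 0 = 4K pages, 1 = 8K pages). The sketch below decodes them and, for 8K pages, shows how the 68040 splits a logical address into root, pointer and page indices plus a 13-bit offset. It does not walk the actual tables at 0x9FE00, which aren’t shown in this post.

```python
def decode_tc(tc: int) -> dict:
    """68040 TC register: bit 15 = E (enable), bit 14 = P (page size)."""
    return {"enabled": bool(tc & 0x8000),
            "page_size": 8192 if tc & 0x4000 else 4096}


def split_8k(addr: int) -> tuple[int, int, int, int]:
    """Table indices for 8K pages: root index A31-A25 (7 bits), pointer
    index A24-A18 (7 bits), page index A17-A13 (5 bits), offset A12-A0."""
    return ((addr >> 25) & 0x7F,
            (addr >> 18) & 0x7F,
            (addr >> 13) & 0x1F,
            addr & 0x1FFF)


print(decode_tc(0xC000))        # {'enabled': True, 'page_size': 8192}
print(split_8k(0x53002000))     # (41, 64, 1, 0): page-aligned, offset 0
```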
Having its MMU all set, the 68040 now gets something to chew on:
The address of data108 is added to 0x53002000 and jumped to!
💡 Does 0x53002000 equal 0x00000000 for the C040?
Let’s assume the C040 executes the code at data108 for now. That is:
- Clear the ITT0/DTT0 registers
- Load ITT0 and DTT0 with 0x807FC040 (see decoding example above) and ITT1/DTT1 with 0x00000000, i.e. disabled (SE/30 values)
- Invalidate both caches (CINVA BC) and wait a NOP for that to take effect
- Then disable all caches (CACR = 0)
- And jump to where A1 points to. On my SE/30 that’s 0x9D6E2, previously loaded from data107 at 0x58
Writing all this from the top of my head, I’m not 100% sure where this address points to. I only know it’s the address of data69 (stored into data107 at 0x3FB4). My guess is that we end up somewhere back in MacII_3rd (0x3FEA), because this is where program execution resumes (need to check this with Macsbug and will update).
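To see which addresses bypass the page tables before and after data108 runs, here is a simplified Python model of the TTR match. It covers only the E bit, the base/mask comparison on A31-A24 and the user/supervisor part of the S field, and it assumes supervisor mode by default. Under this model, neither 0x53002000 nor the SE/30 value of A1 (0x9D6E2) is transparently translated once the final values are loaded, so both would have to go through the page tables. That’s an inference from the register values, not a verified result.

```python
def ttr_matches(ttr: int, addr: int, supervisor: bool = True) -> bool:
    """True if a 68040 TTR value transparently translates `addr`.

    Simplified: E bit, base/mask on A31-A24 and the S field only.
    """
    if not ttr & 0x8000:                       # E clear: register disabled
        return False
    base, mask = (ttr >> 24) & 0xFF, (ttr >> 16) & 0xFF
    if ((addr >> 24) ^ base) & ~mask & 0xFF:   # a compared bit differs
        return False
    s_field = (ttr >> 13) & 0b11
    if s_field == 0b00:                        # user mode only
        return not supervisor
    if s_field == 0b01:                        # supervisor mode only
        return supervisor
    return True                                # 1x: ignore mode


def transparently_translated(addr: int, ttrs: list[int]) -> bool:
    return any(ttr_matches(t, addr) for t in ttrs)


# Temporary setup before TC is enabled: ITT0 = DTT0 = 0x0000C000
TEMPORARY = [0x0000C000, 0x0000C000]
# Final setup in data108 (SE/30 values): ITT0/DTT0 = 0x807FC040, ITT1/DTT1 = 0
FINAL = [0x807FC040, 0x807FC040, 0x00000000, 0x00000000]

for addr in (0x0009D6E2, 0x53002000, 0xFE000000):
    print(f"{addr:#010x}: temporary={transparently_translated(addr, TEMPORARY)} "
          f"final={transparently_translated(addr, FINAL)}")
```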
For now, I’m tempted to call MacII_3rd something like ‘C040_MMU_setup’… but I’d love to have this confirmed 💡 by somebody who knows more than me 😉
Next up: working further through the main procedure again… so move over here.
Hi,
Not knowing much about Mac hardware, I’d say it looks like bit 0 of $53000000 controls which CPU is active. The other CPU will likely be stalled on a memory access. This explains why the switch sequence includes NOPs: the CPU may have prefetched some instructions, and the code must make sure no changes happen to the CPU state that should be visible to the other CPU. With this in mind, I speculate 53_cmd_1x is a 030to040 switch routine, which starts on the 030, switches to the 040 and stalls, and will continue running on the 030 when bit 0 of $53000000 is cleared again. So all the code in this routine runs on the 030. 53_cmd_0 would then be the 040to030 counterpart, so all of its code runs on the 040, and the 040 is stalled somewhere around $1e00 while the 030 is active.
The very first time the 040 is switched to, its internal state is the reset state, so the MMU etc. needs to be set up. That’s what the code you mentioned here does, and it ends by jumping into the 040to030 switch routine, as if a normal switch to the 030 had been performed. Only the state save does not need to be done, because the 040 did not have any state yet. I do wonder how the code distinguishes between a successful 040 init and a failed one, where it switches back to the 030 at $0048 and puts the 040 in a permanent infinite loop. If the 040 were ever switched to after that, the system would just hang.
Hey Peter,
Thanks for commenting!
Yes, it’s pretty clear that “poking” a 1 to 0x53000000 is the low-level command for the 040 to take over (as mentioned in the post: “After that’s been written, 53_cmd_1x is called, presumably telling the C040 to come to life”), while a “0” switches it off again.
Also the whole call-chain to switch between the 030 and 040 is well documented, see the full source in my github repository (search for ‘switchto030’ and ‘switchto040’ both called by ‘SetCPUmode’).
The “infinite loop” is IMHO a typical ‘sit and wait for interrupt’ thing.
Another question: doesn’t 0x807FC040 as TTR value mean all virtual addresses from 0x80000000 to 0xffffffff are translated 1:1 to a physical address?
Yeah, you’re right. Makes more sense, updated the post.
You’re likely right about MacII_3rd being C040_MMU_Setup. I think the reason for ADDA.L #$53002000,A0 is simply that this routine has to be independent of the final MMU setup at least to the extent possible. So the code cannot rely on how the first 16MB will be mapped in the final setup, using either the page table or a transparent translation. The only guarantee is probably that the address space required for the C040 itself is mapped. Hence the procedure is as follows:
1) setup temporary transparent translation for the first 16MB so we can continue to run when the MMU is on.
2) load page table pointers for user and supervisor mode
3) enable MMU
4) jump to known good mapping (ie in the C040 address space)
5) load final transparent translation setup
6) switch back to 030
About your question ‘Does 0x53002000 equal 0x00000000 for the C040?’: realizing that 0x2000 is the page size, I wouldn’t be surprised if the page table contains a mapping of 0x53002000 to 0x0. This can be verified by inspecting the page table, of course.