The T2C=

It had to be done… and now, 12 years later 😱, it is done:
Finally, the T2C64 has been poured into a proper PCB and some bells’n’whistles have been added – so it’s now on par with the T2A2 for the Apple II series. Say hello to the T2C=!

TL;DR

Here’s a quick intro for those too lazy to follow the link to the T2C64.

This card enables your Commodore 8-bit computer to communicate with a Transputer Module piggybacked onto the card.
A Transputer is a 32-bit RISC(ish) CPU from the ’80s that has the unique ability to connect to other Transputers via a very simple 2-wire protocol, making it possible to create large, powerful computing networks – at least by 1980s standards 😉

What can I do with it?
How to talk to the Transputer?
Can I do something useful with that?
How fast can I move data back and forth?
Ok, how much?

After many years, the Commodore 8-bit bug bit me again and it was high time to put some love into my C64 Transputer interface.
But while at it, I thought it would be handy to use this card not just with my C64 but also on all other Commodore machines featuring an expansion port.

This led to the (to my knowledge) first 8-bit Commodore ‘flipper card’, i.e. it has a port connector on each end. One for the C64 & C128 and one for the C264 family, namely the C16, C116 and Plus/4. Yes, it works with all of them. Pull it from your C64, flip it 180° and plug it into e.g. your Plus/4. Cool, huh?
So here’s a quick feature list:

Edge connectors to connect to the Commodore
- C64
- C128
- C16/C116
- Plus/4
Each edge connector offers 2 I/O address ranges to be set by a jumper (0xDE00/0xDF00 and 0xFD90/0xFDF0)
Offers two TRAM (TRAnsputer Module) slots to connect either 2 Size-1 or one Size-2 TRAM
External Transputer-Link connector to connect the T2C= to larger external Transputer networks (pinout is the same as on the T2A2) – not populated on the pictures here.
The data-bus is fully buffered to prevent interference with other cards when used in e.g. an expander
3.3V CPLD used to reduce power-consumption as much as possible.

Here’s the card in full glory… without a TRAM plugged in:

…and with one size-1 TRAM, which itself provides a 32-bit T800 Transputer and 128K RAM. The second slot is still free.

TRAMs came in a wild range of variations, differing in the CPUs used on them and/or the amount of RAM. But there have also been peripherals like SCSI controllers or graphics cards – check my little TRAM page if you’d like to get an idea.
Yes, TRAMs are quite vintage and thus hard (or expensive) to get… but don’t despair… I’ve designed my own and also have some old ones in stock – probably enough to serve the handful of interested nerds 😉

What can I do with these?

I knew that was coming 😜
Well it mostly depends on you. The T2C= is an accelerator running its own code in its own RAM and can exchange data with your Commodore 8-bit machine – everything is possible.

All code examples and sources are available in this archive.
Commodore files are in a .D64 disk image.

Personally, I always have the initial reflex to run a Mandelbrot fractal on everything that’s even slightly capable of doing so. Most of the time, that’s where my euphoria ends and my project-ADHD kicks in… but that shouldn’t stop you from having cool ideas.

Technically, this setup isn’t much different from slapping a Raspberry Pi onto your Commie and letting that do stuff… but there’s something I’d like to call the “5 connoisseur’s C’s”, which might not be everyone’s cup of tea but are very tempting to others:

Contemporary: Transputers are from the same era as your Commodore machine while being much more powerful – we’re talking about ~15MIPS here.
Completely different: Transputers are natively programmed in OCCAM, a very interesting language that is quite different from the ones you might be used to.
That said, no worries, there are C, Pascal and Fortran compilers, too. Here’s a page offering a little “SDK” I created – it’s a VirtualBox image coming with everything you need to start coding.
Connected: Transputers are made to be linked into a parallel network… making your well-programmed application run even faster, as benchmarks show.
Challenging: “Well-programmed” means wrapping your genius brain around multi-threaded, parallel paradigms or using the fast 2 or 4K of on-chip RAM in the most clever way.
Communicate: And finally, find a clever way to communicate with the host (i.e. your Commodore) and vice versa.

Ideas for using it could be raytracing, doing complex calculations, heavily compressing/manipulating data, using it as simple storage (“Stupid-REU”) or writing a Helios server for it and using your C= as a terminal [Helios is a UNIXish OS running on anything from one to a practically unlimited number of Transputers]

A final word of warning: While the T2C= itself uses very little power, a Transputer (and the RAM on a TRAM) does use quite some juice.
Depending on the TRAM, this can be anywhere from 500mA to 1A – which means you will want a beefier power supply.

Communication

Let’s start with the most simple and actually useful code:
Detect a connected T2C=/Transputer and check if it’s working correctly. This code was already shown in my T2C64 posts, but it has now been extended for the newly supported machines and runs in BASIC V2 as well as V3.5 or V7.

After you tell it the base address, it does the following:

Init/Reset the Transputer to a sane state
Read & display the statuses of the Link interface
Write some data into the Transputer’s RAM and read it back
Finally, send a small program to the Transputer which makes it possible to find out its model (16bit T2xx, 32bit T4xx/T8xx or just a C004 programmable link switch)
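
The write/read-back step relies on the Transputer's boot-link commands: a control byte of 0 means "poke" (followed by an address word and a data word), a control byte of 1 means "peek" (followed by an address word), and words travel least-significant byte first. As a rough sketch of the byte sequences TDETECT sends in its lines 470 to 570, here is the framing in C. The function names are made up for illustration, and the 4-byte words assume a 32-bit Transputer (a 16-bit T2xx uses 2-byte words).

```c
#include <stddef.h>
#include <stdint.h>

/* Control bytes, as sent by TDETECT in lines 470 (poke) and 560 (peek). */
enum { T_CMD_POKE = 0, T_CMD_PEEK = 1 };

/* Store a 32-bit word least-significant byte first. */
static void put_le32(uint8_t *dst, uint32_t word)
{
    dst[0] = (uint8_t)(word & 0xFF);
    dst[1] = (uint8_t)((word >> 8) & 0xFF);
    dst[2] = (uint8_t)((word >> 16) & 0xFF);
    dst[3] = (uint8_t)((word >> 24) & 0xFF);
}

/* Hypothetical helper: build a poke message (control byte, address, data).
   Returns the number of bytes to push through the data-out register. */
size_t build_poke(uint8_t out[9], uint32_t addr, uint32_t data)
{
    out[0] = T_CMD_POKE;
    put_le32(out + 1, addr);
    put_le32(out + 5, data);
    return 9;
}

/* Hypothetical helper: build a peek message (control byte, address).
   The Transputer answers with one word on the link. */
size_t build_peek(uint8_t out[5], uint32_t addr)
{
    out[0] = T_CMD_PEEK;
    put_le32(out + 1, addr);
    return 5;
}
```

Calling build_poke(buf, 0x80000000, 0x4E38220C) yields the bytes 0, 0,0,0,128, 12,34,56,78, which is what the BASIC listing pokes one by one.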

So here’s the new TDETECT code:

100 SY=peek(65534):print chr$(147);"This seems to be a";
110 if peek(1177)=63 then poke1177,62:sy=peek(65534):poke1177,63
120 if sy=72 then print " c64": goto 160
130 if sy=23 then print " c128": goto 160
140 if sy=179 then print " plus 4 or c16":goto 160
150 print"n unknown model":print
160 print "select T2C= base address"
170 print "1: c64/c128 $de00 (56832, default)"
180 print "2: c64/c128 $df00 (57088)"
190 print "3: c264 $fd90 (64912, default)"
200 print "4: c264 $fdf0 (65008)"
210 print "5: enter your own"
220 input m
230 if m=1 then ba=56832: goto 290
240 if m=2 then ba=57088: goto 290
250 if m=3 then ba=64912: goto 290
260 if m=4 then ba=65008: goto 290
270 if m>5 goto 160
280 input "base address:";ba
290 print"initializing transputer"
300 do=ba+1:rem data out
310 is=ba+2:rem in status
320 os=ba+3:rem out status
330 re=ba+8:rem reset/error
340 an=ba+12:rem analyze
350 rem ------------------
360 poke re,1
370 poke an,0
380 poke re,0
390 rem clear i/o enable
400 poke is,0
410 poke os,0
420 print"reading statuses"
430 print"i status: ";(peek(is)and1)
440 print"o status: ";(peek(os)and1)
450 print"error: ";(peek(re)and1)
460 print"sending poke command"
470 pokedo,0
480 print"o status: ";(peek(os)and1)
490 :
500 print"sending test-data to t. (12345678)"
510 poke do,0:poke do,0:poke do,0:poke do,128
520 poke do,12:poke do,34:poke do,56:poke do,78
530 print"i status: ";(peek(is)and1)
540 :
550 print"reading back from t."
560 poke do,1:rem peeking
570 poke do,0:poke do,0:poke do,0:poke do,128
580 print peek(ba);peek(ba);peek(ba);peek(ba)
590 :
600 dimr(9):rem line 690 allows up to 10 result bytes
610 print"sending program to transputer..."
620 forx=1to24
630 readt:poke do,t
640 wait os,1
650 nextx
660 print:print"reading result:"
670 c=0
680 n=ti+50
690 ifc=10 goto 760
700 if ti>n then ee=ee+1:if ee=10 goto 760
710 if(peek(is)and1)=0 goto 700
720 r(c)=peek(ba)
730 c=c+1
740 goto 680
750 rem ------------------------
760 if c=1 then print"c004 found"
770 if c=2 then print"16 bit transputer found"
780 if c=4 then print"32 bit transputer found"
790 data 23,177,209,36,242,33,252,36,242,33,248
800 data 240,96,92,42,42,42,74,255,33,47,255,2,0
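
For anyone porting this to another language, the register layout used by the listing boils down to a handful of offsets from the base address. The following C sketch just collects them in one place. The offsets and the four base addresses are taken from the listing above; the struct, macro and function names are invented for this example.

```c
#include <stdint.h>

/* Jumper-selectable base addresses (same values as the TDETECT menu). */
#define T2C_BASE_C64_DEFAULT   0xDE00u  /* 56832 */
#define T2C_BASE_C64_ALT       0xDF00u  /* 57088 */
#define T2C_BASE_C264_DEFAULT  0xFD90u  /* 64912 */
#define T2C_BASE_C264_ALT      0xFDF0u  /* 65008 */

/* Absolute addresses of the registers TDETECT touches. */
typedef struct {
    uint16_t data_in;      /* base + 0:  read a received byte (peek(ba)) */
    uint16_t data_out;     /* base + 1:  write a byte to the Transputer  */
    uint16_t in_status;    /* base + 2:  bit 0 is tested before reading  */
    uint16_t out_status;   /* base + 3:  bit 0 is waited on after a write */
    uint16_t reset_error;  /* base + 8:  write = reset, read = error flag */
    uint16_t analyse;      /* base + 12: analyse line                    */
} t2c_regs;

/* Hypothetical helper: derive all register addresses from a base. */
t2c_regs t2c_map(uint16_t base)
{
    t2c_regs r;
    r.data_in     = base;
    r.data_out    = (uint16_t)(base + 1);
    r.in_status   = (uint16_t)(base + 2);
    r.out_status  = (uint16_t)(base + 3);
    r.reset_error = (uint16_t)(base + 8);
    r.analyse     = (uint16_t)(base + 12);
    return r;
}
```

With the default C64 jumper setting, t2c_map(T2C_BASE_C64_DEFAULT) gives the same numbers BASIC computes for do, is, os, re and an.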

Do something useful?

So now that we have detected a connected Transputer on our Commodore, it should do something useful like… adding numbers.
While this is way beneath its dignity, it’s a good example of how to upload code to the Transputer and how to send and read data.

For this, I’d like to redirect you to the 2nd code example I’ve posted for the T2C64…

And finally in this former post I coded a Mandelbrot fractal (Video inside! 😉) for the C64 using cc65 and the TGI graphics library which calculates and displays the initial fractal within a minute or so.

Now having the whole family of C264 machines added, I thought it would be nice to have a demo for them, too.
So, just because it can do graphics out of the box, I wrote the Mandelbrot “frontend” in BASIC 3.5. It worked, but it was brutally slow… it takes about 10 minutes to get this screen 🐌

That is – of course – because BASIC is darn slow at doing the I/O and plotting. As in the examples above, getting a pixel from the Transputer means: read a byte, set the “pen” to the next coordinate, decide whether to plot or not, repeat. In code (provided in the D64 disk image) it looks like this:

390 for y=0 to 199
400 :for x=0 to 319
420 ::px=peek(ba)
430 ::if px=32 then draw 0,x,y:else draw 1,x,y
440 :next x
450 next y

Because it’s so slow, I didn’t even need to check the input-status of the link-interface, as the Transputer delivers the data much quicker than BASIC can say “next”…

This of course will be the ultimate show-stopper. What’s the sense of such a fast number cruncher, if you can’t get the data out of it fast enough?
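
To make the per-pixel work of that BASIC loop a bit more tangible, here is the same idea as a small C sketch: one byte per pixel comes in, the value 32 means "background", anything else sets the pixel. Everything here is an illustration under assumptions. The plain row-major 1-bit frame buffer is not the real C264 hires memory layout, and array_source merely stands in for the Transputer link so the code can run on any machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FB_WIDTH      320
#define FB_HEIGHT     200
#define FB_STRIDE     (FB_WIDTH / 8)  /* bytes per row at 1 bit per pixel */
#define PX_BACKGROUND 32              /* value the BASIC loop maps to "draw 0" */

/* Returns the next byte (0..255) or -1 when nothing more is available. */
typedef int (*byte_source)(void *ctx);

/* Stand-in for the link: serves bytes from a plain array. */
typedef struct { const uint8_t *data; size_t len; size_t pos; } array_source;

int array_next(void *ctx)
{
    array_source *s = (array_source *)ctx;
    return (s->pos < s->len) ? s->data[s->pos++] : -1;
}

/* Clear fb, then consume one byte per pixel, row by row, and set the bit
   for every non-background value. Returns the number of pixels consumed. */
size_t unpack_pixels(uint8_t *fb, byte_source next, void *ctx)
{
    size_t count = 0;
    int x, y;

    memset(fb, 0, (size_t)FB_STRIDE * FB_HEIGHT);
    for (y = 0; y < FB_HEIGHT; y++) {
        for (x = 0; x < FB_WIDTH; x++) {
            int px = next(ctx);
            if (px < 0)
                return count;  /* source ran dry */
            if (px != PX_BACKGROUND)
                fb[(size_t)y * FB_STRIDE + (size_t)(x / 8)] |= (uint8_t)(0x80 >> (x % 8));
            count++;
        }
    }
    return count;
}
```

Sending one byte per pixel is simple but wasteful; packing eight pixels into a byte on the Transputer side would cut the transfer to an eighth, at the price of a slightly smarter frontend.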

Speed?

Mhh, so how long does it take to (just) read data from the T2C=?
Let’s start with BASIC to get a baseline. This is “BAS-SPEEDTEST”, a very simple benchmark.
It loads a tiny program into the Transputer which makes it spit out the numbers 1 to 10 in an endless loop. Then we read 4KB of that and time how long it takes.

NB: As seen in the examples above, there’s automatic handshaking: the C012 link-interface chip on the T2C= sets a flag (In-Status) each time there’s a byte ready to be fetched. But BASIC is so slow that new data is always available by the next read.

100 ba=56832:rem Adjust your base accordingly
110 dd=ba+1:rem data out
120 is=ba+2:rem in status
130 os=ba+3:rem out status
140 re=ba+8:rem reset/error
150 an=ba+12:rem analyze
160 rem ------------------
170 poke an,0
180 poke re,0
190 poke re,1
200 for d=1 to 500:next d
210 poke re,0
220 print"sending program to transputer..."
230 forx=1to33
240 readt:pokedd,t
250 waitos,1
260 nextx
270 print"reading incoming data..."
280 zeit=ti
290 for l=1 to 4096
300 in=peek(ba)
310 next
320 print"time for 4k:";(ti-zeit)/60
370 rem --- for the transputer
380 data 32,181,36,242,33,248,36,242,33,252,37,247
390 data 34,249,70,33,251,36,242,74,251,96,7,1,2,3
400 data 4,5,6,7,8,9,10

That showed it clearly… BASIC is an IO-sloth.
Even without waiting for the input-status to signal ready, it took the C64 16.5 seconds to read 4KB – and nearly twice as long if we check the input-status.

Machine	without WAIT IS (s)	with WAIT IS (s)
C64	16.5	29.4
C128	24.28*	40.5
Plus/4	19.8	33

*) It’s strange that BASIC V7 is even slower – ~~investigation is ongoing~~
After talking to “the 128 Master” (Johan Grip), the mystery was solved.
BASIC 7 also does a “long” fetch through extra vectors and code in RAM. That adds quite a bit of overhead.
You could say that BASIC 7 has a “bad peek performance” 🙂
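
The "with WAIT IS" variant is nothing more than polling bit 0 of the in-status register before each read of the data register. Here is a hedged C sketch of that handshake with a simple timeout added. The register offsets come from the listings above; the function name and the poll limit are assumptions for this example. The register block is passed in as a pointer so the sketch can be tried against a plain array; on the real machine you would pass the card's base address cast to a pointer instead.

```c
#include <stdint.h>

/* Offsets from the T2C= base address, as used in the BASIC listings. */
enum { REG_DATA_IN = 0, REG_IN_STATUS = 2 };

/* Hypothetical helper: wait until in-status bit 0 signals a byte,
   then fetch it. Returns 0..255 on success, or -1 if no byte showed
   up within max_polls status reads. */
int link_read_byte(volatile const uint8_t *regs, unsigned long max_polls)
{
    while (max_polls--) {
        if (regs[REG_IN_STATUS] & 1)
            return regs[REG_DATA_IN];
    }
    return -1;
}
```

The timeout is there so a missing or crashed Transputer does not hang the host forever, which is what a bare WAIT would do.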

More speed, please!

Ok, let’s use something more mature… like cc65, which generates nice code for all our beloved Commodore machines.

#pragma static-locals(1);

#include <stdlib.h>
#include <time.h>
#include <conio.h>
#include <peekpoke.h>
#include "trproc.h" // that's in the provided archive

#define TOBEREAD 4096 // how many bytes should be read

// Transputer benchmark code (same 33 bytes as in BAS-SPEEDTEST)
static char tcode[33] = {
    0x20, 0xB5, 0x24, 0xF2, 0x21, 0xF8, 0x24, 0xF2, 0x21, 0xFC, 0x25, 0xF7,
    0x22, 0xF9, 0x46, 0x21, 0xFB, 0x24, 0xF2, 0x4A, 0xFB, 0x60, 0x07, 0x01,
    0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A
};

int main (void)
{
    clock_t t;
    unsigned long sec, bps;
    unsigned sec10;

    int i;
    char onechar;

    clrscr ();

#if defined(__C64__) || defined(__C128__)
    cprintf ("Expecting T2C= at 0xde00\r\n");
#elif defined(__PLUS4__) || defined(__C16__)
    cprintf ("Expecting T2C= at 0xfd90\r\n");
#endif

    /* Init Transputer - fixed to plus/4 default for now */
    init_t();

    /* upload Transputer code */
    cprintf ("Sending code to Transputer\r\n");
    puttr(tcode, (sizeof(tcode)));
    cprintf ("start reading %d bytes...", TOBEREAD);
    t = clock ();

    /* read TOBEREAD bytes, one byte at a time */
    for(i=0; i < TOBEREAD; i++) {
        gettrchar(onechar);
    }
    cprintf ("done\r\n");
    t = clock () - t;

    /* Calculate stats: use CLOCKS_PER_SEC rather than a hard-coded 50,
       which is only right if the clock really ticks at 50 Hz */
    bps = (t != 0) ? ((unsigned long)TOBEREAD * CLOCKS_PER_SEC) / t : 0;
    sec = (t * 10) / CLOCKS_PER_SEC;   /* elapsed time in tenths of a second */
    sec10 = sec % 10;
    sec /= 10;

    /* Output stats */
    cprintf ("\r\nDuration: %lu.%us (%lu byte/s)\r\n", sec, sec10, bps);

    /* Done */
    return EXIT_SUCCESS;
}

But this time it took just 4.4/4.6/3.6 seconds on a 64/128/+4! 🏍💨 That’s about four times faster than BASIC.

Are we there yet?

That looks promising, and there’s still the C compiler that we can optimize… read: replace with assembly-code super-power.

For that I wrote a little macro-library for KickAssembler. Any other assembler will do, too, of course.
Besides the Transputer initialization and detection stuff, there are macros for reading and writing a single byte, up to a “page” (256 bytes), and up to the full 64K using two zero-page addresses… this is how we read the 4KB in the benchmark:

.label base = $de00 // define according to setting
.label inreg = base
.label outreg = inreg + 1
.label instat = inreg + 2
.label outstat = inreg + 3
.label reset = inreg + 8
.label analyse = inreg + $c
.label errflag = reset
 
		* = $C000 // or wherever you like
start:		
		inittr()  // macro init_transputer
 
		lda #$2E  // print a "." at 1/1 for debugging ;)
		sta $0400
 
		puttr_1page(bench, 34)  // upload the benchmark code
 
		lda #$00  // Set destination pointer to base ($8100)
		sta $FB   // in zero page
		lda #$81
		sta $FC
 
		gettr_big($FB, $1000) // get 4KB data and write it to $8100
 
		rts
 
		// code for benchmarking & testing (33 bytes)
		// The first byte is always 'sizeof(code)'
  bench:	.byte $21, $20, $B5, $24, $F2, $21, $F8, $24, $F2, $21, $FC, $25, $F7
		.byte $22, $F9, $46, $21, $FB, $24, $F2, $4A, $FB, $60, $07, $01
		.byte $02, $03, $04, $05, $06, $07, $08, $09, $0A

“Wrapped” as a SYS call in a BASIC program to time it – including the Transputer code upload and checking the input-status – this code takes 0.5 seconds to read 4KB, even while writing the received data to a defined memory area! 🚀
That’s 33 times faster than BASIC and still 8 times faster than cc65.
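
If you want to turn those timings into throughput figures, the arithmetic is simple enough. The sketch below does it in integer math (cc65 has no floating point), with durations given in tenths of a second. The C64 timings are the ones quoted in this post; the function names are made up for the example.

```c
/* Bytes per second for a transfer of `bytes` bytes that took
   `tenths` tenths of a second. Returns 0 for a zero duration. */
unsigned long bytes_per_second(unsigned long bytes, unsigned long tenths)
{
    return tenths ? (bytes * 10UL) / tenths : 0;
}

/* Speed-up factor times ten (so 330 means 33.0x), both durations
   in tenths of a second. Returns 0 for a zero fast duration. */
unsigned long speedup_x10(unsigned long slow_tenths, unsigned long fast_tenths)
{
    return fast_tenths ? (slow_tenths * 10UL) / fast_tenths : 0;
}
```

For the C64 numbers above (4096 bytes in 16.5 s, 4.4 s and 0.5 s), this gives roughly 248, 930 and 8192 bytes per second, and speed-ups of 33.0x over BASIC and 8.8x over cc65 for the assembly version.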

Getting one?

Still with me and really interested in getting one of these?
Please go through this checklist first:

You’re aware that there’s no software for it yet
You’re aware that you have to code for the Transputer and are up to learning new things
You also need to write code on the Commodore side – assembly needed for maximum speed
Besides the T2C= you will need a TRAM. So you need to own/purchase one, too.

If you can answer all of them with “Yes”, “fine with me” and/or “sure!” I can provide you with a T2C= for €40 plus shipping.

I also have TRAMs available in different configurations:

My own design, the AM-B404 (size-1, 2MB SRAM): 45€
Various manufacturers: size-1, 1MB DRAM: Ask me.
“bargain offer”: original INMOS IMS-B404 (size-2, 2MB): 25€
(given their size, they will clog your T2C= completely)

Available CPUs are:

T425-20: 7€
T800-20: 12€
T800-25: 20€

For example, if you’d like to have a T2C=, my AM-B404 TRAM and a 20MHz T800, that would be 40 + 45 + 12 = 97€ plus shipping

Shipping with tracking is
European Union 9€ (Just in case, for Germany it’ll be 6€ as “Paket”)
UK/Switzerland 13€
USA 15€

⇒ drop me a mail to
(Sorry, you have to type that into your mail-client – nobody likes spam, and neither do I)

GeekDot