Meet The Cube – this is the Transputer Power-House successor to the Tower of Power, which was a bit of a hacked frame-case and based on somewhat non-standard TRAM carriers with a max. capacity of just 24 size-1 TRAMs…
The Cube hardware
This time I went for something slightly bigger 😎 …A clear bow towards the Parsytec GigaCube within a GigaCluster.
The Cube uses genuine INMOS B012 double-height Euro-card carriers, each giving home to 16 size-1 TRAMs – Parsytec would call this a cluster and so will I.
Currently The Cube uses 4 clusters, making a perfect cube of 4x4x4 Transputers… 64 in total. Wooo-hooo, this seems to be the biggest Transputer network running on this planet (to my knowledge)
If not, there's still room left for more 😯
Just to give you a quick preview, this is what ispy responds when run against The Cube:
Using 150 ispy 3.23 | mtest 3.22
# Part rate Link# [ Link0 Link1 Link2 Link3 ] RAM,cycle
0 T800d-24 276k 0 [ HOST ... ... 1:1 ] 4K,1 1024K,3;
1 T800d-25 1.7M 1 [ ... 0:3 2:1 3:0 ] 4K,1 2048K,3;
2 T800d-24 1.8M 1 [ ... 1:2 4:1 5:0 ] 4K,1 2048K,3;
3 T800d-25 1.8M 0 [ 1:3 6:2 5:1 7:0 ] 4K,1 2048K,3;
4 T800d-24 1.8M 1 [ ... 2:2 6:1 8:0 ] 4K,1 2048K,3;
5 T800d-25 1.8M 0 [ 2:3 3:2 8:1 9:0 ] 4K,1 2048K,3;
6 T800d-24 1.8M 2 [ ... 4:2 3:1 10:0 ] 4K,1 2048K,3;
7 T800d-24 1.8M 0 [ 3:3 10:2 9:1 11:0 ] 4K,1 2048K,3;
8 T800d-25 1.8M 0 [ 4:3 5:2 10:1 12:0 ] 4K,1 2048K,3;
9 T800d-25 1.8M 0 [ 5:3 7:2 12:1 13:0 ] 4K,1 2048K,3;
10 T800d-24 1.8M 0 [ 6:3 8:2 7:1 14:0 ] 4K,1 2048K,3;
11 T800d-24 1.8M 0 [ 7:3 14:2 13:1 15:0 ] 4K,1 2048K,3;
12 T800d-25 1.8M 0 [ 8:3 9:2 14:1 16:0 ] 4K,1 2048K,3;
13 T800d-25 1.8M 0 [ 9:3 11:2 16:1 17:0 ] 4K,1 2048K,3;
14 T800d-24 1.8M 0 [ 10:3 12:2 11:1 18:0 ] 4K,1 2048K,3;
15 T800d-25 1.8M 0 [ 11:3 ... 17:1 19:0 ] 4K,1 2048K,3;
16 T800d-24 1.8M 0 [ 12:3 13:2 18:1 20:0 ] 4K,1 2048K,3;
17 T800d-25 1.8M 0 [ 13:3 15:2 20:1 21:0 ] 4K,1 2048K,3;
18 T800d-25 1.8M 0 [ 14:3 16:2 ... 22:0 ] 4K,1 2048K,3;
19 T800d-25 1.8M 0 [ 15:3 22:2 21:1 23:0 ] 4K,1 2048K,3;
20 T800d-25 1.8M 0 [ 16:3 17:2 22:1 24:0 ] 4K,1 2048K,3;
21 T800d-25 1.8M 0 [ 17:3 19:2 24:1 25:0 ] 4K,1 2048K,3;
22 T800d-25 1.8M 0 [ 18:3 20:2 19:1 26:0 ] 4K,1 2048K,3;
23 T800d-25 1.8M 0 [ 19:3 26:2 25:1 27:0 ] 4K,1 2048K,3;
24 T800d-24 1.8M 0 [ 20:3 21:2 26:1 28:0 ] 4K,1 2048K,3;
25 T800d-25 1.8M 0 [ 21:3 23:2 28:1 29:0 ] 4K,1 2048K,3;
26 T800d-25 1.7M 0 [ 22:3 24:2 23:1 30:0 ] 4K,1 2048K,3;
27 T800d-24 1.8M 0 [ 23:3 30:2 29:1 31:0 ] 4K,1 2048K,3;
28 T800d-25 1.8M 0 [ 24:3 25:2 30:1 32:0 ] 4K,1 2048K,3;
29 T800d-25 1.8M 0 [ 25:3 27:2 32:1 33:0 ] 4K,1 2048K,3;
30 T800d-25 1.8M 0 [ 26:3 28:2 27:1 34:0 ] 4K,1 2048K,3;
31 T805d-20 1.7M 0 [ 27:3 ... 33:1 35:0 ] 4K,1 1024K,3;
32 T800d-24 1.8M 0 [ 28:3 29:2 34:1 36:0 ] 4K,1 2048K,3;
33 T800d-20 1.8M 0 [ 29:3 31:2 36:1 37:0 ] 4K,1 1024K,3;
34 T800d-24 1.8M 0 [ 30:3 32:2 ... 38:0 ] 4K,1 2048K,3;
35 T800c-20 1.8M 0 [ 31:3 38:2 37:1 39:0 ] 4K,1 1024K,3;
36 T805d-20 1.7M 0 [ 32:3 33:2 38:1 40:0 ] 4K,1 1024K,3;
37 T800c-20 1.6M 0 [ 33:3 35:2 40:1 41:0 ] 4K,1 1024K,3;
38 T800d-20 1.6M 0 [ 34:3 36:2 35:1 42:0 ] 4K,1 1024K,3;
39 T800d-20 1.7M 0 [ 35:3 42:2 41:1 43:0 ] 4K,1 1024K,3;
40 T800d-20 1.8M 0 [ 36:3 37:2 42:1 44:0 ] 4K,1 1024K,3;
41 T800d-20 1.7M 0 [ 37:3 39:2 44:1 45:0 ] 4K,1 1024K,3;
42 T800d-20 1.8M 0 [ 38:3 40:2 39:1 46:0 ] 4K,1 1024K,3;
43 T800d-20 1.8M 0 [ 39:3 46:2 45:1 47:0 ] 4K,1 1024K,3;
44 T800d-20 1.8M 0 [ 40:3 41:2 46:1 48:0 ] 4K,1 1024K,3;
45 T800d-20 1.8M 0 [ 41:3 43:2 48:1 49:0 ] 4K,1 1024K,3;
46 T800d-20 1.7M 0 [ 42:3 44:2 43:1 50:0 ] 4K,1 1024K,3;
47 T800d-20 1.8M 0 [ 43:3 ... 49:1 51:0 ] 4K,1 1024K,3;
48 T800d-20 1.8M 0 [ 44:3 45:2 50:1 52:0 ] 4K,1 1024K,3;
49 T800d-20 1.6M 0 [ 45:3 47:2 52:1 53:0 ] 4K,1 1024K,3;
50 T800d-20 1.8M 0 [ 46:3 48:2 ... 54:0 ] 4K,1 1024K,3;
51 T800d-20 1.8M 0 [ 47:3 54:2 53:1 55:0 ] 4K,1 1024K,3;
52 T800d-20 1.8M 0 [ 48:3 49:2 54:1 56:0 ] 4K,1 1024K,3;
53 T800d-20 1.8M 0 [ 49:3 51:2 56:1 57:0 ] 4K,1 1024K,3;
54 T800d-20 1.6M 0 [ 50:3 52:2 51:1 58:0 ] 4K,1 1024K,3;
55 T800d-20 1.8M 0 [ 51:3 58:2 57:1 59:0 ] 4K,1 1024K,3;
56 T800d-20 1.7M 0 [ 52:3 53:2 58:1 60:0 ] 4K,1 1024K,3;
57 T800d-20 1.8M 0 [ 53:3 55:2 60:1 61:0 ] 4K,1 1024K,3;
58 T800d-20 1.8M 0 [ 54:3 56:2 55:1 62:0 ] 4K,1 1024K,3;
59 T800d-20 1.8M 0 [ 55:3 ... 61:1 ... ] 4K,1 1024K,3;
60 T800d-20 1.7M 0 [ 56:3 57:2 62:1 63:0 ] 4K,1 1024K,3;
61 T800d-20 1.6M 0 [ 57:3 59:2 63:1 ... ] 4K,1 1024K,3;
62 T800d-20 1.8M 0 [ 58:3 60:2 ... 64:0 ] 4K,1 1024K,3;
63 T800d-20 1.8M 0 [ 60:3 61:2 64:1 ... ] 4K,1 1024K,3;
64 T800d-20 1.7M 0 [ 62:3 63:2 ... ... ] 4K,1 1024K,3;
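If you want to double-check a listing like this by script instead of by eye, here is a minimal Python sketch. It assumes the row layout shown above (node number, part, rate, boot link, four link slots in brackets, RAM info) and that an entry such as 1:1 means "peer node 1, peer link 1". The names parse_ispy, asymmetric_links and SAMPLE are made up for this example and are not part of ispy.

```python
import re

# One ispy row: node id, part, rate, boot link, four link slots, RAM info.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d)\s+\[\s*(.*?)\s*\]\s*(.*)$")

# First three rows of the listing above, copied verbatim.
SAMPLE = """
0 T800d-24 276k 0 [ HOST ... ... 1:1 ] 4K,1 1024K,3;
1 T800d-25 1.7M 1 [ ... 0:3 2:1 3:0 ] 4K,1 2048K,3;
2 T800d-24 1.8M 1 [ ... 1:2 4:1 5:0 ] 4K,1 2048K,3;
"""

def parse_ispy(text):
    """Return {node: [slot0..slot3]}; a slot is None (unconnected),
    "HOST", or a (peer_node, peer_link) tuple."""
    nodes = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers and anything else that is not a node row
        slots = []
        for tok in m.group(5).split():
            if tok == "...":
                slots.append(None)
            elif tok == "HOST":
                slots.append("HOST")
            else:
                peer, link = tok.split(":")
                slots.append((int(peer), int(link)))
        nodes[int(m.group(1))] = slots
    return nodes

def asymmetric_links(nodes):
    """List (node, link) ends whose peer does not point back at them.
    Peers missing from the listing are skipped, not reported."""
    bad = []
    for node, slots in nodes.items():
        for link, slot in enumerate(slots):
            if isinstance(slot, tuple):
                peer, peer_link = slot
                if peer in nodes and nodes[peer][peer_link] != (node, link):
                    bad.append((node, link))
    return bad

if __name__ == "__main__":
    nodes = parse_ispy(SAMPLE)
    print(len(nodes), "nodes parsed, asymmetric ends:", asymmetric_links(nodes))
```

An empty result from asymmetric_links only says that both ends of every listed link agree with each other; it says nothing about whether the wiring forms the intended topology.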
Here are some more figures:
- 32 x T800@25MHz/2MB (my very own AM-B404 TRAMs)
- 32 x T800@20MHz/1MB (mainly TRAMs from MSC and ARADEX)
- -> 96MB of total RAM
- -> 70-130 MFLOPS (single precision)
- ~800MIPS combined integer power
- ~60Amps @5V needed (That’s 300W 😯 )
So we’re talking about 70-130 MFLOPS here – depending on which documentation you trust and what language (OCCAM vs. Fortran) and/or OS you’re using. That was quite a powerhouse back in 1990 (Cray X-MP class!)… and dwarfed by a simple Pentium III some years later 😉
Just to give you a comparison with recent hardware (Linpack MFlops):
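The totals in the list above are simple arithmetic, and the same numbers also give a rough per-Transputer view. A small Python sketch, using only the figures quoted in the list; the per-node values are plain averages over 64 TRAMs (an assumption of this sketch, since the two TRAM types certainly differ) and not measurements:

```python
# Figures copied from the list above.
fast_trams, fast_mb = 32, 2      # 32 x T800 @ 25 MHz with 2 MB each
slow_trams, slow_mb = 32, 1      # 32 x T800 @ 20 MHz with 1 MB each
amps, volts = 60, 5              # estimated supply current at 5 V
mflops_low, mflops_high = 70, 130
mips_total = 800

nodes = fast_trams + slow_trams
total_ram_mb = fast_trams * fast_mb + slow_trams * slow_mb
power_w = amps * volts

# Plain averages per TRAM, assuming all 64 contribute equally.
mflops_per_node = (mflops_low / nodes, mflops_high / nodes)
mips_per_node = mips_total / nodes
amps_per_node = amps / nodes

if __name__ == "__main__":
    print(f"{nodes} TRAMs, {total_ram_mb} MB RAM, {power_w} W")
    print(f"per TRAM: {mflops_per_node[0]:.2f}-{mflops_per_node[1]:.2f} MFLOPS, "
          f"{mips_per_node:.1f} MIPS, {amps_per_node:.2f} A")
```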
Raspberry Pi Model B+ (700 MHz) | ~40 DP Mflops |
Raspberry Pi 2 Model B (1000 MHz – one core) | ~134 DP Mflops |
Raspberry Pi 3 Model B (1200 MHz – one core) | ~176 DP Mflops |
Short break for contemplation about getting old…
Ok, let’s go on… you want to see it. Here it is – the front, one card/cluster pulled, 3 still in. On the left the mighty ol’ 60A power supply:
Well, this is the evaluation version in a standard case, i.e. this is meant for testing and improving. I’m planning a somewhat cooler and more stylish case for the final version (read: Blinkenlights etc.).
And here’s the IMHO more interesting view… the backside. It shows the typical INMOS cabling.
As usual, I color coded some of the cables.
The green arrow points to the uplink to the host system The Cube is connected to. Red are the daisy-chained Analyse/Reset/Error (ARE) signals. The yellow so-called jumper cables connect some of the IMSB004 links back into the board’s network. And in the upper row (blue), four ‘edge-links’ of each board are connected to its neighbor.
This setup connects four 4×4 matrices (using my C004 dummies as discussed here) into a big 4×16 matrix. Finally I will ‘wrap’ that matrix into a torus. Yeah, there might be more clever topologies, but for now I’m fine with this.
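To make the planned wrap-around a bit more concrete, here is a minimal Python sketch of who talks to whom in a 4×16 torus. The row-major node numbering is purely hypothetical and does not match the numbers ispy assigns above, and torus_neighbors is a name invented for this example.

```python
ROWS, COLS = 4, 16  # four 4x4 boards side by side, as described above

def torus_neighbors(node, rows=ROWS, cols=COLS):
    """Return the neighbours (up, down, left, right) of `node` in a
    rows x cols torus, using hypothetical row-major numbering."""
    r, c = divmod(node, cols)
    return [
        ((r - 1) % rows) * cols + c,   # up, wraps from top row to bottom row
        ((r + 1) % rows) * cols + c,   # down, wraps from bottom row to top row
        r * cols + (c - 1) % cols,     # left, wraps to the last column
        r * cols + (c + 1) % cols,     # right, wraps to the first column
    ]

if __name__ == "__main__":
    for n in (0, 15, 63):
        print(n, torus_neighbors(n))
```

In such a torus every node has exactly four neighbours, which is also the number of links per Transputer in the ispy listing (Link0 to Link3). The sketch ignores that one link somewhere has to go to the host.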
Building up power
For completeness, here’s a quick look at how things came together.
The 4 carriers/clusters with lots of size-1 TRAMs… upper right one is the C004-dummy test board (now also fully populated). Upper left is pure AM-B404 love <3
Fixing/replacing the broken power-supply (in the back), including the somewhat difficult search for a working cooling solution:
The Cube software
Well there isn’t any specific software needed to run The Cube, but it definitely cries out loud for some heavily multi-threaded stuff.
So the first thing definitely has to be a Mandelbrot zoom. As usual, I used my very own version with a high-precision timer, available in my Transputer Toolkit.
Here’s the quick run in real-time – you can still visually make out each Transputer delivering its result:
Other Transputer and x86 results of this benchmark can be seen in this post over here.
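For readers who have never seen why Mandelbrot is such a grateful job for a pile of processors: every pixel is independent, so the image can simply be cut into pieces and handed out. The following Python sketch shows that idea with a plain escape-time loop and a round-robin split of image rows. It is not the program from the Transputer Toolkit; the function names, the default coordinate window and the row-wise split are assumptions made for this illustration, and the "workers" are just simulated one after another.

```python
def mandel_iterations(c, max_iter=64):
    """Escape-time count for point c: iterations until |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter

def rows_for_worker(worker, workers, height):
    """Round-robin split: worker k gets rows k, k + workers, k + 2*workers, ..."""
    return range(worker, height, workers)

def render_rows(rows, width, height,
                x_min=-2.0, x_max=1.0, y_min=-1.0, y_max=1.0, max_iter=64):
    """Compute the given image rows; returns {row: [iteration counts]}."""
    result = {}
    for y in rows:
        ci = y_min + (y_max - y_min) * y / (height - 1)
        result[y] = [
            mandel_iterations(
                complex(x_min + (x_max - x_min) * x / (width - 1), ci), max_iter)
            for x in range(width)
        ]
    return result

if __name__ == "__main__":
    width, height, workers = 64, 32, 64
    image = {}
    for w in range(workers):          # each pass stands in for one Transputer
        image.update(render_rows(rows_for_worker(w, workers, height), width, height))
    for y in range(height):           # crude ASCII preview
        print("".join("#" if n == 64 else "." for n in image[y]))
```

Interleaving rows instead of giving each worker one contiguous band tends to spread the expensive areas of the image more evenly, but which split works best on real hardware depends on the link traffic it causes.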
We need (even) more power, Igor!
So this is running fine – using internal RAM only. On the other hand, it seems that the current power supply has some issues with, well, the electric current.
When booting Helios onto all 64/65 Transputers, which uses all of the external RAM, some of them very soon crash or go into a constant reboot loop.
Just by reducing the network definition to 48 Transputers (i.e. without physically pulling any), Helios boots and runs rock-solid.
Since measuring the voltage during a 64-T boot shows a solid 5.08V on all TRAM slots, the power supply most likely either can’t deliver the needed current (~60A) or produces noise etc. 😥
So this is the next construction site I have to tackle.
To be continued…
I’m really interested to have found this site!
I got here from a puzzle game by Zachtronics called “TIS-100”, an esoteric assembly language programming game which tasks the player with programming the fictional TIS-100 microcomputer, the specs for which can be viewed here.
I’d found the TIS-100’s esoteric and unusual architecture – a grid of multiple very small communicating parallel processors with an extremely minimal machine language – absolutely fascinating, and I only just realized that it’s barely fictionalized at all – it’s clearly directly inspired by Transputer systems (the node classes are even named T(number)), just simplified and made obtuse and esoteric enough that programming it to do anything useful became a tricky puzzle.
Considering how much fun I had programming the TIS-100, it’s very pleasantly surprising to know that a more practical relative actually exists, in the form of real hardware, and has a small community of enthusiasts!
Hi Axel – fantastic achievement. This is an amazing machine you have built up.
Thanks. It’s still not finished yet and supplying the system with sufficient power is still an issue. But I’m working on it.