Lies, Damned Lies and Benchmarks.

Using, learning, programming and modding the Gigatron and anything related.
Forum rules
Be nice. No drama.
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Lies, Damned Lies and Benchmarks.

Post by at67 »

*Update* To get on the list, you can use any language that supports the algorithm posted and any HW that uses the Gigatron as a foundation, but you must not modify the algorithm itself, (unless needed to for the syntax of your language), i.e. no memcpy's ;P Your language must support the basics of this algorithm, for loops, conditional statements/looping, arrays, etc.

One of the things I did very early on whilst coding gtBASIC was create a test suite of examples and unit tests, to exercise and verify the many commands and their subtle interactions.

One of these tests is a classic benchmark from the early 80's, Sieve of Eratosthenes published in Byte in 1981, (I still have my original copy lying around somewhere): https://en.wikipedia.org/wiki/Byte_Sieve

Original Byte article: https://archive.org/details/byte-magazi ... ew=theater

This is a rather simple test that calculates the first 1899 Prime numbers without using multiply/divide or floating point and was converted to many different languages of the time, (both interpreted and compiled), as well as many different architectures, (8bit, mini, mainframe, etc).

There is a list of results for various languages and architectures and I thought it was finally time that the Gigatron made it's debut, so without further ado, I present:
sieve_results.png
sieve_results.png (129.48 KiB) Viewed 1737 times
Here is the code for ROMvX0, to build for ROMv5a just change the line '_codeRomType_ ROMvX0' to '_codeRomType_ ROMv5a':

Code: Select all

_runtimePath_ "../runtime"
_runtimeStart_ &hFFFF
_arraysStart_ &hFFFF
_codeRomType_ ROMvX0

const SIZE = 8190

DIM FLAGS(SIZE)

cls : mode 3

init TIME

PRINT "10 iterations"

set TIMER, 0

FOR IT=0 to 9
    COUNT = 0
    FOR I=0 TO SIZE
        FLAGS(I) = 1
    NEXT I

    FOR I=0 TO SIZE
        IF FLAGS(I)
            PRIME = I + I + 3
            K = I + PRIME
            WHILE K <= SIZE 
                FLAGS(K) = 0
                K = K + PRIME
            WEND
            INC COUNT
        ENDIF
    NEXT I
NEXT IT
    
timer = get("TIMER")

PRINT "PRIMES: ";COUNT
PRINT "TIME: ";timer/60;" ";timer % 60;"/60"
Last edited by at67 on 03 May 2022, 10:53, edited 4 times in total.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: Lies, Damned Lies and Benchmarks.

Post by lb3361 »

This is a very cool idea!

Here are a couple more benchmark lines. These times are for the Gigatron 512k because I was too lazy to remove the board. Multiply by about 1.3 to give approximate execution times for a normal Gigatron. You can also run them yourself but you need at least 64K RAM and ROMv5a.

Code: Select all

                                              GT1 size       Execution time
					       (bytes)        (seconds)

Gigatron 512K, gtBASIC, ROMv5a, NOCPU          1072             30.5

Gigatron 512K, GLCC, ROMv5a, NOCPU (v1)        6849             23.7
Gigatron 512K, GLCC, ROMv5a, NOCPU (v2)        4769             17.0
Notes:
  • The gtBASIC program is at67 program above compiled with the gtbasic version that comes with the devrom.
  • The GLCC v1 numbers are obtained with the unmodified official C code published in Byte. I just added "include <stdio.h>" in the beginning. This is compiled with options -map=64k,hionly and -rom=v5a.
  • The GLCC v2 numbers are obtained with a slightly optimized version of the C code. The main differences are using "memset" to clear the sieve array and moving the sieve calculation into its own function (a leaf function, the kind GLCC likes.) This version also has a rudimentary timing that only works when each iteration lasts less than 4 seconds (which is the case here). It also avoids stdio but still uses cprintf.
I believe that the gtBASIC version with ROMv5a suffers because the program does not contain any of the gtBASIC & and && hints. Therefore I tried to add a couple such hints and that reduced the execution time to 24 seconds with virtually no effort. But this is still far from the ROMvX0 speedup reported by at67.

The bulk of the code in the GLCC versions comes from function printf and from stdio. This is a recurrent problem with GLCC. If I remove all printing, the code size of v2 is 671 bytes. But as soon as I add basic console support, the code size inflates to 2k bytes. The gtBASIC runtime is amazingly compact!

I was surprised by the timing difference between the two GLCC versions of the sieve benchmark. I guess this makes a point about the "Lies, Damned Lies" part of the benchmark. I checked the assembly code and I could not see a substantial difference other than using memset to clear the array. Memset uses the very fast SYS_SetMemory. Is it all to it?
Attachments
sieve2_glcc_v5a_64k.gt1
(4.66 KiB) Downloaded 114 times
sieve2.c
(969 Bytes) Downloaded 119 times
sieve1_glcc_v5a_64k.gt1
(6.69 KiB) Downloaded 119 times
sieve1.c
(673 Bytes) Downloaded 137 times
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: Lies, Damned Lies and Benchmarks.

Post by at67 »

Threads like these have almost started intercontinental wars over the last 40 years or so, (especially in the 80's and 90's), that's why they are so much fun!

Just to add more fuel to the fire, I ran the benchmark I posted, (without any optimisations whatsoever), on a real ROMv5a hardware and I get the following:

Gigatron 64K, gtBASIC v1.1.0R, ROMv5a, NOCPU 1057 37.3

Does the 512K Gigatron run 23% faster because it is effectively running in a 'no video' mode compared to my Gigatron's mode 3? I should have specified in the caveats that the benchmark was run in mode 3.

The other interesting observation is that my run of ROMv5a shows 40.3seconds, which was a snafu by me, as even though I built the code for ROMv5a, I ran it on ROMvX0. ROMvX0 runs non ROMvX0 code anywhere from 0% to 8% slower than a native ROM, (depending on the app); so that explains the 40.3 to 37.3 discrepancy

WRT the memcpy, profiling the first for loop is the only way to truly know, I ran one iteration of the same code with the timer started after the first loop and got 2.97seconds, compared to 3.73seconds including it. This means that the first loop takes up approximately 25% of the total runtime, which surprised me, I would have thought it was around 5% to 10% max.

I spent a lot of time hand tuning the gtBASIC runtime, probably close to 1000 hours; some of the modules have had many ten's of iterations getting them in shape. The big drawback is that they are extremely intertwined, there is almost no saving and restoring of registers or state when entering and exiting the runtime modules and each module knows where each other module's data is, so there is very little copying or duplicating of data as well. It's a bit of a house of cards that is not easily expandable, but it gets very good performance in terms of code density and execution speed.

P.S. I ran your sieve2.gt1 and got 21.48seconds, but when I run sieve1.gt1 it runs for a while and then flashes a pixel in the top left hand corner continuously, (jumped to the ROM version checking routine somehow?). It does this for both ROMvX0 and ROMv5a, is this the timer issue you mentioned? I ran them in mode 3.

P.P.S. I used a stop watch as your sieve1.gt1 prints "1899 primes" before it jumps off into lala land, on ROMv5a I get ~29seconds, (sieve2.gt1 gets ~21.5seconds). It seems the memcpy makes a huge difference! I would expect your sieve1.gt1 to push around 24seconds if it was emitting all the ROMvX0 instructions it could on ROMvX0 64K hardware in mode3.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: Lies, Damned Lies and Benchmarks.

Post by lb3361 »

The flashing pixel means that main() has returned. The horizontal position of the pixel is the return value. Both programs print the prime count when they're finished. Sieve2 also prints a time (but sieve1 does not because it is the unaltered official c code : no timer.)

Yes the Gigatron 512k runs vCPU during all four scanlines, minus a few cycles that are used to implement the high vertical resolution and forward the sound to the 8bit pwm. This is why multiplying the time by 4/3 is just a bit below target.
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: Lies, Damned Lies and Benchmarks.

Post by at67 »

Good point, my VBlank timer is stealing vCPU cycles, I should probably disable it and time by stopwatch a few times and take an average.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: Lies, Damned Lies and Benchmarks.

Post by lb3361 »

Because I want GLCC to follow ANSI C89, I have to deal with all the formatting complications of function printf. All these rarely used features cost over 2KB in code size. And stdio adds another 2KB code. I already had a function cprintf() which bypasses stdio but still implements all the standard formatting whistles and bells. I just added mincprintf() which only understands %s and %d. One just has to compile with option -Dprintf=mincprintf.

With this option, the new GT1 sizes are:

Code: Select all

         ROMv5a   ROMvX0
sieve0    2816     2384
sieve1    2807     2376
There is still a some fat in the runtime but this was by far the main offender. One could gain 200 more bytes by reducing the number of control characters recognized by the console. The rest is mostly useful because it makes the runtime quite modular. Good for now.

Source code:
https://github.com/lb3361/gigatron-lcc/ ... tuff/sieve
https://github.com/lb3361/gigatron-lcc/ ... ncprintf.c

ROMv5a binaries attached (64K required)
sieve1.gt1
(2.74 KiB) Downloaded 121 times
sieve0.gt1
(2.75 KiB) Downloaded 124 times
As an aside, I would like to thank at67 for always creating the momentum to improve the Gigatron toolbox. Both compilers produce rather good code on this example. One could maybe shave a couple instructions by manual optimization, but that's about it.
Last edited by lb3361 on 24 Mar 2022, 13:16, edited 1 time in total.
gfernval
Posts: 40
Joined: 24 Jan 2022, 00:39

Re: Lies, Damned Lies and Benchmarks.

Post by gfernval »

The file https://github.com/lb3361/gigatron-lcc/ ... e/sieve1.c has a syntax error in line:

printf("\n%d %d/60 seconds", ticks/60, ticks % 60, count);

The third parameter (count) should not be there
gfernval
Posts: 40
Joined: 24 Jan 2022, 00:39

Re: Lies, Damned Lies and Benchmarks.

Post by gfernval »

In files https://github.com/lb3361/gigatron-lcc/ ... e/sieve0.c. sieve1.c and sieve2.c, where is the mincprintf() function?
at67
Site Admin
Posts: 647
Joined: 14 May 2018, 08:29

Re: Lies, Damned Lies and Benchmarks.

Post by at67 »

gfernval wrote: 24 Mar 2022, 10:16 The file https://github.com/lb3361/gigatron-lcc/ ... e/sieve1.c has a syntax error in line:

printf("\n%d %d/60 seconds", ticks/60, ticks % 60, count);

The third parameter (count) should not be there
Technically that's not a syntax error, (I would call it a logical error), this may sound like nitpicking but it highlights very clearly some of the peculiarities and drawbacks of C/C++ varargs and variadic functions.

e.g. Some C/C++ standards will define excess format specifiers to behave in an undefined manner, while excess arguments are to be processed as normal and the results ignored. So if the standard that lb3361 has adhered to is similar to this example, then this is not a syntax error and results in defined behaviour, i.e. more of a logical error resulting in wasted processing time and probably a slight increase in resultant code size.

It can be very difficult for a compiler to pick up logical errors such as these, e.g. imagine if the format string was contained in a variable.
lb3361
Posts: 360
Joined: 17 Feb 2021, 23:07

Re: Lies, Damned Lies and Benchmarks.

Post by lb3361 »

gfernval wrote: 24 Mar 2022, 10:47 In files https://github.com/lb3361/gigatron-lcc/ ... e/sieve0.c. sieve1.c and sieve2.c, where is the mincprintf() function?
The Makefile compiles with option "-Dprintf=mincprintf" which is equivalent to adding "#define printf mincprintf" at the top of the file. This redefinition affects both the prototype of printf in stdio.h and the uses of printf later in the file. This is a good way to change the printf function without changing the file.

At67 is right about the extra printf argument in sieve1. This is not a syntax error but a logic error with, fortunately, no consequence other than wasting a couple microseconds setting up a fourth argument. I just fixed it. This does not change the benchmark time, but it reduces the file size by a couple bytes, nothing very significant.

Code: Select all

         ROMv5a   ROMvX0
sieve0    2801     2375
sieve1    2793     2366
I should also point out how minor is the difference in timing between the gtBasic version and the GLCC version. It boils down to the way the while loop is implemented. When compiling "while condition { something }", glcc produces code that looks like

Code: Select all

    bra L2
L1: do something
L2: eval condition
    bne L1

whereas gtBasic does

Code: Select all

L1: eval condition
    beq L2
    do something
    bra L1
L2: ...
The C method removes a non-taken branch from the body of the loop. I deserve no credit for this. It was implemented by the machine independent core of the LCC compiler. I am pretty sure that at67 will have this implemented in a couple days, maybe a couple hours in fact ;-)
Post Reply