Benchmark BASIC is an interesting idea, but I love the optimisations. Magervalp's idea of using a REM statement to hold machine code is incredibly clever.
You could also store the machine code in a print statement between the quotes. It would print the "garbage" on the screen and you could actually execute it from two locations, the Basic ram (somewhere around $0800) or from the screen ram ($0400). ;)
RND(x) where x>0 calculates a random number based on the last random number generated, stores the result for the next RND call and returns the result. In whatever way it does that, some calculations are performed and something is stored. If x=0, nothing is calculated or stored but the return value is directly constructed from the CIA 1 Timer and the Realtime Clock value of the CIA 6526. This results in a very bad random value as all randomness is based upon two time values but therefore you have no calculation or storage overhead. And for the sake of completeness, if x0, but the calculation is not based on the last random value generated but on the value of x itself, so RND(-x) will always return exactly the same value for the same value of x.
11:38 That's pretty clever. I often time things and never thought to soft patch the kernel to get a more accurate result. 27:00 Last time I had something make it to hacker news, the number of people who thought they could unroll better than anyone in the history of the world was staggering. I'm going on record and saying yours is excellent for educational purposes. 28:56 Your point here is so important. Learning, fun, and community over who is the best
Regarding "copy/paste" on the C64, I recall more then one terminal program featuring this in the mid-80s, they just called it a 'buffer' and you could activate the buffer with a command, type whatever, close the buffer, then paste it all you liked. I have no idea if any of these were written in BASIC or not, I was a 14 year old gaming geek and didn't care much about the underlying tools that made my hot rod run. ;-) Very cool to see all this optimization of what at the first time I saw it I thought was a cool effect and something to type into the store's display computers as a kid, but nothing beyond that.
If he wants to factor that in, then it might be worth considering stopping drawing to the screen until complete. In Atari BASIC, the program will run about 30% less time. For this testing, I don't think that that kind of savings is worth it, though.
Wow, I really would have expected the integer variables to be faster. Thanks for the insight into the unoptimized integer implementation, that's an interesting find!
Integer variable values are actually _stored_ as integers (though with the high byte first for some reason); they're just converted to floats for all arithmetic operations. That doesn't matter much for individual variables, but is very helpful for arrays, since an array of N integers only takes up 2N bytes instead of the 5N required for floats.
@@JohnnyWednesday On Apple II, you instead use a separate "Integer BASIC" that's less capable but faster if you optimize your code for it. There may be a similar alternative here.
Hat's off to the MID$ with a de Bruijn sequence, that's really clever! My SYS to a REM statement was always cheating, and I typed it in using the monitor. It's possible to tweak it so it works when typed in though. Not sure how to share the listing, but a hex dump looks like this: >C:0801 26 08 00 00 9e 32 30 36 31 3a 8f 22 a0 df c8 8c >C:0811 15 08 20 be a0 a2 cd 8a a4 8b 30 02 49 03 20 d2 >C:0821 ff 4c 13 08 00 00 00
You could write it as 9t and so on. I typed in your solution in Basic, but it never worked. Even made sure to reverse the checkerboard pattern...no dice.
this works. i tested it using the VICE emulator and also using CBM prg Studio. and i entered it using just the keyboard--no need to use the monitor 0a$="{sh pound}@*"+cH(141)+"{ct o}T"+cH(141)+"{reverse on}T{cm z};T){ct a}"+cH(105)+cH(109)+" R{pi}P"+cH(244):sY40852 it's case-sensitive. "sh" is the shift key, "cm" is the Commodore key, "ct" is the control key. note there's a space before the "R", other than that, no spaces. the sequences in { } (eg, "{sh pound}") should be replaced by the actual keystroke, ie, press Shift+Pound. "{reverse on}" is Ctrl+R. note: it needs a C64, not a C128 (since the C128 SID is different). asm: .C:9f94 A9 40 LDA #$40 .C:9f96 2A ROL A .C:9f97 8D 0F D4 STA $D40F .C:9f9a 8D 12 D4 STA $D412 .C:9f9d AD 3B D4 LDA $D43B .C:9fa0 29 01 AND #$01 .C:9fa2 69 6D ADC #$6D .C:9fa4 20 D2 FF JSR $FFD2 .C:9fa7 D0 F4 BNE $9F9D
I appreciate the joke thumbnail (even if unintentional) and that you used two exclamation marks instead of one or three. That makes it authentic. Just missing a red arrow pointing at something random. Somehow I never tire of your 10 PRINT antics. Making an Apple version drawn with pixels on the C64 would be interesting.
Ha, I'm glad you noticed the 2 exclamation marks :) Okay, now I'm tempted to add the random red arrow and actually use that as the thumbnail. I wonder where I should point it. Maybe at the F5 key, that just seemed funny to me. You know, I actually did a pixel version of 10 PRINT for both VIC-20 and C64 that somehow turned into that Apple video last month with no mention of either of those. I'm not kidding that for every three or four things I play around with only one gets turned into a video. There's a huge trail of good ideas that I need to get back to.
@@8_Bit it's all the rage to include a photo of yourself looking aghast; apparently that works as clickbait. That would be funny but I know you prefer not to show yourself.
Great video again Robin! I haven't had the chance to read all the comments yet so maybe my contribution is redundant. I used C64 VICE in NTSC mode to compare the different versions. The original for/next version was measured as: J:246 S: 4.1 Here are my one liner versions for even less jiffies spent: (Best being 239 jiffies) What works for the program version: instead of using chr$(a+rnd(.)) I swapped the two operands of the addition. The idea behind this was that in some cases multiplications, divisions etc can take more or less iterations depending on the number of bits on the left and right side. Without knowing exactly how the C64 FP arithmetic internals work I still had this assumption. So the chr$(rnd(.)+a) version ended up with overall 1-2 jiffies less than chr$(a+rnd(.)). J:245 S:4.083 What only works in immediate mode: Two more areas where we can speed up this program: 1) Doing less FP arithmetic 2) Unrolling more iterations somehow For 1): I looked for a solution which can branch without doing arithmetic operations. NEXT does FP addition and comparsion against STEP and TO which takes time. From all the branching instructions I found only a few which are not based on line numbers (so works for the one liner versions): END - Not usable as branches into input mode STOP - Same as END RETURN - Requires a GOSUB and heavy trickery on the stack CONT - Hm.. we have a winner What CONT does is that it simply restores the basic program counter and line numbers to where a STOP or BREAK occured. So I just needed to force it to somehow break before the ? so that CONT jumps back. Obviously breaking is not an option. But the way BASIC works is that it stores the address of the current basic instruction in the (61,62) zero page locations before it executes it. CONT uses this saved address to resume the program. Sadly it is not usable in a stored program because BASIC automatically updates this for every instruction, so a simple CONT in the program would jump back to the same CONT. However in immediate mode you can not CONT, so basic does not modify (61,62) location. So all you need to do is store the address of the first statement of the loop in (61,62) then any CONT can jump back to that location. The immediate mode maze generator using CONT is: a=205.5:pO61,20:pO62,2:?cH(rN(.)+a)cH(rN(.)+a)cH(rN(.)+a)cH(rN(.)+a);:cO J:240 S:4 Sadly this is one less iteration unrolled than the original version, but still faster. For 2): To add more statements into the same line these must be crunched even more. The idea here was to enter the already tokenized basic code somehow because the functions and statements take 1 byte instead of 2 or 3. At the end of your video there was a version which contained a machine code after the REM. Using the similar tech but for basic we can write the following immediate mode maze generator: (If you see a that means you need to press Commodore key with the key after it) a=205.5:pO61,24:pO62,2:cO:":#6G(#f(.)#na)G(#f(.)#na)G(#f(.)#na)G(#f(.)#na)G(#f(.)#na);:#7 J:239 S:3.983 Explanation: 61,62 is set to the location of the first : after the ". The first CONT jumps to that location so the " is skipped this way. The one and only quote mark here is needed to store the special token codes. Basic directly stores the string without altering it. No need to close the string. Sadly there is no way to enter the token code of NEXT (130, 0x81) in the string so the for/next version can't be used this way. P.S.: I had to make a patch for your BBASIC to measure immediate basic lines too.
If apps nowadays would be written with the same care and optimization, they would run perfectly fine on 100 Mhz machines too. ;) Unfortunately the virtually unlimited CPU power, memory and storage, has created many lazy programmers.
I used to own the C128 back in mid 1988, but I stared out on an Atari 1300 home computer system back in the summer of 1986. I just loved that C128. I played around with its 6502 Assembler Language, after understanding peeking and poking, via invoking into machine code, but it still wasn't as fast as full 6502 machine code. Peeking and poking were from advanced basic near the back of the thick white book. They don't make computers like that anymore. Educational machines at one point in time. I was lucky enough to get into it back then. I was 21 years very young when I started.
I love the concept of these videos. It's very similar to speedrunning video games. Who can get the same thing done fastest? It's awesome to see the ingenuity it takes.
Another boost: 0 a=4:a$="...MNMNNNNMMMMNNMNMMNM":forb=16to.step.:printmid$(a$,a+b*rnd(.),a);next The M is the graphic backslash, N the graphic slash (shifted M, N on the keyboard). The sequence gives all 16 possible variations (2^4) of 2 symbols on 4 places by indexing from 4 to 19 (16 + 3 symbols). This index is randomized, selecting one of the 16 different 4 character sequences. The dots are a filler to be able to use variable A (primarily needed as string length) also as some offset, because MID$ needs something > 0. A is defined earlier for faster access, because it is referenced twice, A$ and B only once. This takes on my PAL C64 171 jiffies or 2,85 seconds. This seems the maximum possible if keyed in via screen editor (with abbreviations). If we dismiss the line limit of the screen editor this game could played further as far as we keep A$ limited to the maximum of 255 characters (the line length is unlimited, but a re-chaining of the program must not happen). If an appropriate A$ can be composed, the a possible for A and B would be 7 and 128, giving a length of 140 for A$. This task is left to the readers (or Robin), to push it to the extremes ... ;)
Just for comparison and analysis: Saw @csbruce's MID$ approach later skimming through the comments, but it is limited to a 3 character sequence. The above one accomplishes a 4 character sequence by using a overlapping structure of all needed 4 char sequences which have 1 index distance (saving a multiplication too).
@@8_Bit Did it manually. My first attempt for triple sequance came fast. I'm not expected to be successful for quads, but found also a solution quite fast. I thought it would take much more time to develop an algorithm for this task ... I didn't check, if there is already a known general solution. However,, a contest to search for a Basic solution would came into my mind ... 😉
Programming the C64, first in basic and then in Machine Code with this visceral direct connection to the hardware being able to modify registers and address spaces directly is something that is missing from today’s first computer experiences.
Very cool video. Love the benchmark basic. I also saw the thing you mention at 29:04. I was going to mention it before I saw you did. You can stack CHR$ in there as much as you like, not just one. Good stuff.
This basic had a PI constant, and it was the actual character? Wow. I'm beyond impressed. (the float vs. int thing is a bummer, but as someone who took their first walking steps with AmigaBASIC, this is something I can still appreciate, as I had seen my brother do these weird "BASIC" things when I could still barely write) The "crunch" by removing whitespace thing is absolutely barbaric, but it explains how the parser works in 8k.
@@thygrrr It's actually a bit weird: Commodore 8-bits have two character sets, one upper-case/graphics, the other upper-case/lower-case. Pi is only "π" in the upper-case/graphics set, but is displayed as a checkerboard character in the upper-case/lower-case set. To make this even a bit more interesting, this appears twice in the character set, at 0xDE (222) and at 0xFF (255), and Pi as a BASIC token is 0xFF. It get's even weirder: using the BASIC command 'VAL()' we can evaluate the numeric value contained in a string. E.g., 'PRINT VAL("1")' prints 1, as does 'PRINT VAL(CHR$(49))' - 49 being the PETSCII value of the character "1". So which is the real "π", 222 (0xDE) or 255 (0xFF)? Turns out, neither of them: both 'VAL(CHR$(222))' and 'VAL(CHR$(255))' return zero! 'π' is only recognized by the routine that tokenizes a line of BASIC text present in the BASIC input buffer - when you hit return to enter a line of BASIC -, but nowhere else. It's just kind of a magical patch. Or the other way round, as a BASIC token, it doesn't have any character value, just like the token for PRINT doesn't have one.
I used to write code in a Z80 BASIC for a machine known variously as the LASER 200 and Dick Smith VZ200. It had a total of 24k RAM including expansion module. I wrote a text adventure game in BASIC and crunched everything I could to fit it in. Free memory when I was done was 6 bytes. Those blank spaces were memory munchers I just could not afford, let alone REM statements or any other niceties. But it worked and it sold quite well when it was published.
Wow, how cool that we can compact the typing of one BASIC line into what's still "80" logical characters, or originally 2 screen lines, but then it uncompacts into what ends up being most of 3 screen lines! OK, your new challenge: Write a C64/128 mode (but still 40-column) noticeably working BASIC program with the longest unfurled logically "80"-character line! IOW, use the longest BASIC commands and operands, etc. with the shortest abbreviations that you can assemble to do something that's noticeably functional.
Suggestion for improvement: Create two internal variables, a line number L as a breakpoint, and a number N as repetitions. When the program hits line L for the Nth time, stop and print the jiffies. That'll allow you to benchmark non-halting programs that never scroll. (Like, updating a number at the top left corner of the screen.) You could combine this with your "array of time points" suggestion setting a third variable, NE (number of external repetitions).
I remember my computer studies teacher gave us a challenge to draw a circle full screen in the fastest time. It was a great lesson in optimisation and the tricks you need to do. Could still be a good interview question.
Can you do faster than Marvin Minsky? NEW X = OLD X - epsilon * OLD Y NEW Y = OLD Y + epsilon * NEW X where epsilon is a (very) small number and this is essentially a division. Which is, why we can use right-shifts, instead, here using 1 / 2^8 for epsilon: X := X − ( Y >> 8 ) Y := Y + ( X >> 8 ) Discovered by Marvin Minsky in the early 1960s on the PDP-1 (actually the first pre-production model). David Mapes made the same discovery, also on a PDP-1, the other pre-production model at Lawrence Livermore National Laboratory (LLNL). (It's actually not a perfect circle, but a fat ellipse, but with a small epsilon it's "good enough" for an approximation.)
@@FadkinsDiet It's actually a quick sine/cosine approximation for small angles, where we use 1 for cos(x) and epsilon for sin(x), gone wrong. For the traditional approach, we'd use both the old X and old Y to derive the new coordinates. But this approximation isn't good enough and the curve will spiral out. By accidentially using the newly derived X, instead, the function somewhat magically auto-corrects. (To see why that is, we may have a look at the determinant of the matrix representation of the function: this must be 1, in order for a conic to meet up with itself as it wraps around a cone. With the original formular, this is greater than 1, it's off by epsilon^2, and the ends do not meet, resulting in the spiral. The accidental version introduces just the right amount of error in order to make this 1. But it's a fat ellipse rotated by 45 degrees.) Rich Schroeppel commented in HAKMEM, the collection of MIT AI-Lab memos, on this: "PROBLEM: Although the reason for the circle algorithm's stability is unclear, what is the number of distinct sets of radii? (Note: algorithm is invertible, so all points have predecessors.)" (HAKMEM, Item 150) Gene Salamin observed in HAKMEM, Item 152: "With exact arithmetic, the circle algorithm is stable if |epsilon| < 2. In this case, all points lie on the ellipse X^2 - epsilon X Y + Y^2 = constant, where the constant is determined by the initial point. This ellipse has its major axis at 45 degrees (if epsilon > 0) or 135 degrees (if epsilon < 0) and has eccentricity sqrt(epsilon/(1 + epsilon/2))." The really interesting thing about this is that this is a discovery, which would have been hardly made without a computer with a display attached to it to plot the function. But, as soon as there was such a thing available, particularly the PDP-1 with its X/Y plotting display, it was made several times about at once, since it's based on an error easily to be made and the result is as easily observable.
I grew up with a C64, but i was just beginning to code. The lessons of space and logic optimizations paid off when i got a HP 48GX calculator for college and began coding lots of helper functions for math and science classes.
Awesome stuff, brings back memories of the '80s! Such a shame that young programmers these days have no insight of the inner workings of the machines the write programs for. Thanks for making these videos!
I encourage young programmers to look into any free Coursera computer science courses. There used to be a great Computer Science 101 course. There are probably UA-cam videos, too.
Back in the day, one of the TRS80 magazines ran a one line game challenge, where you had to write a game in 251 characters. All the concepts you talk about here were used to get the best performing game 😄
Ah, so that's the basis of the "backrooms" portal field generator. LOL :) Kidding! I remember this simple little code, back when I had the C-64. Was always cool to watch that random maze generation. Thanks for sharing.
Yeah, that ending SYS example _is_ a cool idea, even though it is technically a cheat according to the spirit of BASIC. So what does that system call to memory location 2016 (decimal) do: read whatever's in the REM statement and take it in as ML bytes? And what does SYSing that without the REM statement do?
2061 is where the source code of the REM statement is stored in memory, so SYS just jumps to it as machine language, yes. Without the REM statement it would just run whatever junk happens to be in memory. It'll probably crash.
Oh, interesting, @@codahighland, thanks! I could try the SYSing myself, but first I have to find or replace my A/V cable. I suppose I could just download the VICE and try it on there, but I'm never really sure how picture-perfect the VICE is.
@@HelloKittyFanMan it works fine in VICE, but it looks like you can't actually _type_ the line perfectly like it's displayed. There's a couple of characters that don't come out right when typed. See the thread by @judgegroovyman below.
Great video! It would have been nice to have a scoreboard in the upper-right corner, as my short-term memory isn't what it used to be (and it never was much, to begin with).
In some languages a print statement itself is very slow. In those cases it helps to add characters to a string var and print that. Dunnu if that would work here.
PRINT is fairly fast on the Commodore computers. String variables have a limit of 255 characters in this (and most 8-bits) BASIC and we need to print 1000 characters to fill the screen, so I'm fairly certain it'd be slower to loop through multiple times.
Here is a mid$ version that calculates the "lookup table" s$ with overlapping strings of length l (here 7 chars). 10 s=0:l=7:n=2^l:dima(n-1) 11 fori=1tol:s$=s$+"M":next 12 : 13 fori=1ton:a(s)=1:s=s*2+1:s=s-n*int(s/n):s$=s$+chr$(206-a(s)):s=s-a(s):next 14 : 15 fori=.ton-1:k=k+a(i):next 16 ifknthenprint"checksum error":stop 17 : 18 fori=nto.step.:printmid$(s$,rnd(.)*n+1,l);:next The precalculation takes a while but you can print s$ and write a new program that uses its value directly.
@csbruce UA-cam/google has been censoring more of your comments lately, maybe because there's more code than english? I don't know. When I find them in the "Held for review" I of course approve them, but if anything goes missing completely that's why. Sometimes UA-cam just deletes stuff and doesn't even give me a chance to do anything about it.
@@andymanaus1077 I think that a good rule is that infinite loops are not allowed, but 1 loop to print the code is allowed. There could be several categories. * single line 1 loop * unlimited lines 1 loop * single line unlimited loops * unlimited lines unlimited loops
Not sure if this counts since it doesn't use print or CHROUT, but it fills the screen in about 6 seconds: 10 a=77.5:fori=1024to2024:pokei,(a+rnd(0)):next
Hi. Your code needs 377Jiffies on a NTSC machine. You can speed it up with this changes to 317 Jiffies. That's almost 1 second faster :) 0 A=77.5:FORI=1024TO2023:POKEI,RND(.)+A:NEXT
Are you familiar with "Waterloo Structured Basic"? Back in the day it was available on the PET computer. My Mom and Sister learned about and worked with it at the University of Waterloo. I heard rumours that it was available on both VIC 20 and C-64. Have you heard of it on the VIC 20 or C 64?
Yes, I have heard of Waterloo Structured BASIC, and I've seen cartridge images of it online. I've got it on my long list of things to eventually look into. I'd love to get an original copy of it but I'll use a copy if I haven't found an original by then.
HI Robin. Thanks for these videos. You can further optimize by changing A+RND(.) to RND(.)+A. Using the unrolled version at 28 minutes, I was able to save an additional jiffy. It's not much, but it does make me wonder what the interpreter is doing to make that even the slightest bit faster.
The machine language in rem at 30:34 doesn't work as typed. Mine matches exactly in lowercase AND uppercase but doesn't do anything when you run it. Is there something else required that is over my head or does it just not work? Regardless of all that this is a great video and project. Thank you!
I'll try to get an actual program file from the author and see if I can figure out what's wrong. I'm only guessing but certain characters appear twice in the character set so it might be that it's using the alternate one.
If you're using an emulator, or a C64 with a freezer cartridge, then with the BASIC program entered, go into the machine language monitor and enter the command: m 0801 Then compare the hex dump (formatting might be a bit different) with the following: >C:0801 28 08 00 00 9e 32 30 36 31 3a 8f 22 a2 ff 8e 0e >C:0811 d4 8e 0f d4 a2 80 8e 12 d4 ad 1b d4 29 01 18 69 >C:0821 cd 20 d2 ff d0 f3 00 00 00 I suspect there's one or more bytes that don't match up.
@@8_Bit Thanks for the hex dump! I was able to shave off nine bytes by removing a few unnecessary instructions, as well as the REM statement since I don't think it's possible to type in the entire line using just the C64 screen editor... >C:0801 0b 08 00 00 9e 32 30 36 31 00 00 00 >C:080d a2 80 8e 0f d4 8e 12 d4 ad 1b d4 29 >C:0819 01 69 cd 20 d2 ff d0 f4 *=$801 ; basic header !pet $0b,$08,$00,$00,$9e,"2061",$00,$00,$00 start: ldx #$80 stx $d40f stx $d412 - lda $d41b and #$01 adc #$cd jsr $ffd2 bne -
@@8_Bit I had the same problem and eventually came to the same conclusion. After painstakingly figuring out how to type it in with a poorly keymapped VICE setup, things weren't working. Looking in the monitor it quickly became apparent why. The incorrect bytes is a C9 instead of 69 at 0820 and a B3 instead of F3 at 0826. Fixing it in the monitor, then list'ing and hitting return on the displayed line in the listing breaks it again, so wouldn't that mean that it's not actually type'able?
@@8_Bit thank you! lets see ... mine says C:0801 28 08 00 00 9e 32 30 36 31 3a 8f 22 a2 ff 8e 0e C:0811 d4 8e 0f d4 a2 80 8e 12 d4 ad 1b d4 29 01 18 c9 C:0821 cd 20 d2 ff d0 b3 00 00 00 clearly there were differences there (c9 instead of 69 (noice) and b3 instead of f3) so now I will do two things: 1. I will try to change it in the monitor and get it working (Update: changing those bytes in the monitor works great!) 2. I will try to see how I could have typed it in differently
Great video, Robin :) I totally agree with You, even if it's not the fastest solution, one-line is totally sexier :) Thank You for Benchmark BASIC, it will be a very handy tool to have. Cheers!
Real C64 nerds should go even further: Next step is to write an Assembly version of this (which was done in a previous video, if I recall correctly), then arguing how using SID's voice 3 as a random number generator is for "noobs", so not good enough for Assembly 10PRINT. And then arguing once again what is the best sophisticated RNG to use, and how can we squeeze ONE more jiffy by tweaking the Assembly code again and again for an eternity. 😁
I typed in the program with the REM line, but it doesn't seem to work. For example the SHIFT+I character is always interpreted as $C9 and hence CMP, but it should be interpreted as $69 which stands for the mnemonic ADC. Or is there a way to fix that?
Unrelated to this video in particular, I've been curious whether the colors your Commodore 64s and 128s produce in these videos are what your capture device spits out by default or if you've had to do some fine tuning to make it match what you've come to expect it to look like on a vintage TV or monitor. I ask because-while I haven't had the pleasure of seeing a Commodore 64 screen in person in quite some time-it feels more accurate to my own memories than anything VICE has been able to produce, both before and after the multiple tweaks that have been made recently to its color handling.
I don't do any video processing for the particular 64C, VIC-20, or C128 that I usually use. Part of why I choose those particular machines (I have multiples of each available) is because I like their video output. I'm not particularly happy with the output of any of my breadbin C64s so I will often tweak those in post, and when I use more obscure computers or consoles, especially with only RF output, I will usually tweak those too.
Would poke to frame buffer be less overhead over print. Also with poke you write 8-bit value which maybe could be bitwise done. I don't know if c64 basic supports this. Also maybe random value could read from ROM with AND 0X01
when i was young i wrote a programm for someone. I wonder how much faster it could have been with those tricks you showed here which i had no idea about. One trick i used was ... since it was a C128 (i used the C64 mode for my programm dont know why anymore) i Poked the speed up to the C128 Clock. While the calculation was done you could not "see" anything on screen but once it ends i Poked the speed back to the C64 clcock and the videooutput was ok again :)
Now I want to see more assembler source sticking 75 (?) bytes of machine code at a time in REM statements. 😀 (Ok I actually checked and you can get 75 bytes after 1REM” I guess for a line that can actually be typed in.)
And I think it'd be 68 bytes left in a one-liner that includes the SYS2061:REM" so it can be RUN. There's also the limitation that not all possible bytes are type-able or compatible with the REM listing. I'll have to study this sometime if I'm to make a video on it, because I think there's a lot of "gotchas" with this trick. But it's still very cool.
Department of cheating: You can type a lot of data on screen before hitting return, and the last two screen lines will be interpreted by BASIC. Then your "one liner" could read the data above. This cheat is broken by doing a clear or list, though. However, if the real line extracted data from the screenful and stored it as a line in BASIC's memory, this could make a super long line that *could* be listed and saved, but not entered. Hmm. Maybe I should stop meta-meta-programming...
I'm not an expert, but I think that POKE does time consuming stuff due to BASIC being an interpreted language whereas PRINT is a genuine short cut. I vaguely recall running my own tests. That's a great question, though. In Atari BASIC, you can treat characters as a colour, in the text based graphics modes, if I recall correctly, but nothing seems to beat PRINT for speed and general convenience. When I say, "colour", I mean that you can draw them across the screen.
Robin, you had demonstrated that integer math was slower on the Commodore BASIC, but has anyone ever optimized BASIC to make integer basic the faster solution?
There are some BASIC compilers that will speed up integer calculations quite a bit, but I'm not aware of any extensions that add full integer support to the interpreted language.
I believe you can use shortened 'shift' commands and, as long as you don't list the program first, it will run fractionally faster since the commands are shorter. If you list the program it's destructively expanded back into memory, so you lose the benefits. That may be a VIC 20 quirk, though, as it was a way to get a little more out of your 4K. In that case, if you listed, you'd get an out of memory error as it ran out of room expanding.
Nope. The shifted abbreviations are only seen by the tokenizing routine. Abbreviated and spelled-out keywords result in identical code being stored in the program. LISTing it or not has no effect.
One of my old videos is called "About Commodore 64 BASIC Abbreviations" and it covers all this; it applies exactly to the VIC-20 as well. The short version is @bxdanny is correct; once you type the line and hit return, all commands are tokenized to single bytes whether they were typed in full or abbreviated.
First I don’t know the c64 can you not store the A value (int 205) in memory along with the RND val (1) before you read the with poke in an unused part of ram then in your line peek it out as it will already have been phrased so 10 print chr$(peek(stored add)) + rnd(peek (stored other add)). Of course assuming c64 has poke and peek. The other way would be to look through ram for a value of 205 and 1 then use those addresses Bob
@eugenetswong I like his style of presenting all these topics that concisely. All the topics are selected very well and are always on the upper edge of C64 knowledge. 😀
Was there ever a BASIC compiler for the C64? That was one of the reasons that I loved QBasic 4.5 because it could compile a binary and was so much faster that way, but I'd imagine that the memory constraints of the C64 would make that a significant challenge. For instance, how much memory would the program listing take up versus the binary and could you actually store both without eating all your memory too quickly. Which I guess naturally leads to a second question. How many instructions could you usefully store in memory and still get work done?
I'm sure there were BASIC compilers for all the popular 8-bits. I'm not sure how they worked, but I'm sure they didn't have such a hard time as the FORTRAN compiler for an extremely early IBM mainframe with hardly any RAM and no ROM, disk or tape. The only IO was punched cards. The way it worked was to keep the program in memory and divide the compiler into programs tiny enough to fit in the remaining space -- 46 of them! The programs were loaded one at a time, and each would transform the program a little bit towards its final state. 8-bit compilers had it much easier, they could just require the user to have a disk drive and transform one file into another.
I liked QuickBasic 4.5 as well, even though when I finally got my hands on a copy I had already gotten Borland C/C++ and Turbo Pascal compilers. Still, I loved tinkering with different languages, it was mainly for *fun.* So... Have you checked out QB64?
@@robsku1 Yeah, it's a neat project, even if it's not yet complete. It has a nice interface, but it doesn't seem to handle mouse input very well if you resize it. And it doesn't seem to implement the quirky usage of `def` that Gorillas makes use of. I'll have to rewrite that function and see if it works.
Astounding, something like A$(0)=CHR$(205):A$(1)=CHR$(206):FORA=.5TO-STEP.:PRINTA$(A+RND(.));:NEXT is roughly 20 % slower ,,, Ok, it's fairly plausible that an array access seems to be more complex than the CHR$ function ... ;)
RUN is a little slower than GOTO without a line number. Presumably RUN takes a bit of extra time to re-initialize various pointers, like the variables.
Perhaps, but by that rule, half of the interesting things in DOS would count as code injection attacks. :) Many TSRs hooked into the keyboard interrupt so they could see what keys had been pressed before even the BIOS did.
Yes. RND behaves exactly the same with any parameter greater than zero, so (1) or (205.5) or (65535) are all equivalent. The difference is that a variable is parsed slightly faster (especially in a program with very few variables) so A+RND(A) executes a little faster than 205.5+RND(1).
Haven't watched the addendum yet, but there is another trick you can do while unrolling. You're repeatedly calling PRINT, which is unnecessary. PRINT CHR$(...);CHR$(...);CHR$(...) should be faster, and as a bonus you can squeeze in more unrolling into one line. Another thought is using strings. Construct a string using unrolling, and then PRINT it. I'm not sure this would save time. Stringa are kinda wonky in C64 BASIC. You could absolutely make a C program faster by doing something like that, because printf() is slow. Then you could also combine unrolling with FOR, because you can amortize the extra cost of doing FOR over several unrolled statements. This way you could combine the one liner AND the A=205.5 without the RUN 1 stuff. Edit: Lol, obviously other people had already suggested this. I should watch the entire video before commenting.
question would be more 'how fast the basic random function is' compared to just getting a free running counter from a 6526, noise from the sid or just the scanline AND $01 with peek. ya know. peek is probably faster than anything 'basic' :P lol. just like using 'stdout' 'print' :P (parsing the whole thing, going through basic, going through the kernal, going through the cbm editor rom) is probably hell a lot slower than just poking it into video ram. :P not entirely sure what the 'rules' would be before it would no longer be considered 'basic' or 'one line' :P does this have to run on any cbm/mickeysoft basic compatible machine? or just and only c64's, that sorta thing... :P cuz if it doesn't need to run on 'antyhing' there probably are a lot faster ways.(actually to this very day... the slowest part of any program is and always has been the output to a screen for some reason. it's even slower than most networks :P so 'most of the speedup' can probably be made there. (not limited to c64's. also applies to your pc on your desk :P also options such as simply changing the entire character set to just / or \ and not even having to do the and and add spring to mind. :P would things like 'simply entering the machinecode as petscii into the basic line and then jumping to it' still count as 'basic' :P ya know.. :P at which point does it stop being 'basic' ;P (even basic loads to fixed addresses, so this is no problem).
As far as I can tell, the machine code version in this video does not work if typed in from BASIC. Any opcode or data in the ranges of $60-$7F and $E0-$FE cannot be used because in PETSCII, they are copies of codes at $C0-$DF and $A0-$BE respectively, so if you type the correct character in it changes the opcode. In the case of this code, ADC and BNE cannot be used because ADC gets changed to CMP and the BNE relative jump back changes from F3 to B3. I made another version that has assembly that utilizes ASL, BCC, and INY to switch between 205 and 206. This can be typed in from BASIC. Here is the assembly of just the executable machine code. 080d ldx #$ff 080f stx $d40e 0812 stx $d40f 0815 ldx #$80 0817 stx $d412 081a ldy #$cd //load y with 205 081c lda $d41b //load a with random # from noise waveform 3 081f clc 0820 asl a //shift hi bit of random number to carry 0821 bcc $0824 //skip increment of y if carry is clear 0823 iny //if carry is set then increment y to 206 0824 tya //transfer y to a for printing to screen 0825 jsr $ffd2 //print to screen 0828 jmp $081a //loop back
You could save the random number command by omitting it, and wiggle the power cord just enough to induce random numbers in memory. lol (jk) I really like the machine code packed in the REM line, now that's really clever, course the actual routine is no longer running in Basic. heh
Your obsession with the 10? code is adorable.
And the amount of stuff we learn about the innards of C64 Basic is astounding.
I really like how these simple exercises lead to a much deeper understanding of the whole system.
I love how these videos take me back to my teens to early 20s when programming in BASIC was my favourite hobby.
Some guys were chasing after girls, while the smart ones were chasing after shortening and speeding up a one liner as much as possible. lol
Benchmark BASIC is an interesting idea, but I love the optimisations. Magervalp's idea of using a REM statement to hold machine code is incredibly clever.
I was about to say the same thing about storing the machine code in the REM statement. Very clever indeed!
You could also store the machine code in a print statement between the quotes. It would print the "garbage" on the screen and you could actually execute it from two locations, the Basic ram (somewhere around $0800) or from the screen ram ($0400). ;)
It’s not. It’s existed since the beginning of basic listings.
so clever. I'm working to get it running right now. Its really cool
@@judgegroovyman let us know how...
RND(x) where x>0 calculates a random number based on the last random number generated, stores the result for the next RND call and returns the result. In whatever way it does that, some calculations are performed and something is stored. If x=0, nothing is calculated or stored but the return value is directly constructed from the CIA 1 Timer and the Realtime Clock value of the CIA 6526. This results in a very bad random value as all randomness is based upon two time values but therefore you have no calculation or storage overhead. And for the sake of completeness, if x0, but the calculation is not based on the last random value generated but on the value of x itself, so RND(-x) will always return exactly the same value for the same value of x.
11:38 That's pretty clever. I often time things and never thought to soft patch the kernel to get a more accurate result.
27:00 Last time I had something make it to hacker news, the number of people who thought they could unroll better than anyone in the history of the world was staggering. I'm going on record and saying yours is excellent for educational purposes.
28:56 Your point here is so important. Learning, fun, and community over who is the best
Simply the best way to enjoy my morning coffee. Thanks for another great 10 PRINT video Robin.
Thanks Glen, nice to hear from you.
Regarding "copy/paste" on the C64, I recall more then one terminal program featuring this in the mid-80s, they just called it a 'buffer' and you could activate the buffer with a command, type whatever, close the buffer, then paste it all you liked. I have no idea if any of these were written in BASIC or not, I was a 14 year old gaming geek and didn't care much about the underlying tools that made my hot rod run. ;-)
Very cool to see all this optimization of what at the first time I saw it I thought was a cool effect and something to type into the store's display computers as a kid, but nothing beyond that.
Wow seeing the basic of the c64 drawing so fast is incredible, it s like seeing a total different computer, very nice!
I was giggling all the way through this, at the amazing, ingenious improvements that you made. Clever stuff, and to me, it's a programming comedy.
I love these kind of videos. It'd be perfect if you ran that machine code so we would see how fast it draws compared to some other method.
If he wants to factor that in, then it might be worth considering stopping drawing to the screen until complete. In Atari BASIC, the program will run about 30% less time. For this testing, I don't think that that kind of savings is worth it, though.
I was hoping he would show that.
Wow, I really would have expected the integer variables to be faster. Thanks for the insight into the unoptimized integer implementation, that's an interesting find!
Integer variable values are actually _stored_ as integers (though with the high byte first for some reason); they're just converted to floats for all arithmetic operations. That doesn't matter much for individual variables, but is very helpful for arrays, since an array of N integers only takes up 2N bytes instead of the 5N required for floats.
That's correct, thanks.
High byte first is basically irrelevant on an 8-bit machine because there aren't any meaningful 16-bit operations that would actually care.
@@codahighland Well, except the ones that read an address from memory. :)
So are there any extensions to BASIC that wrap up Integer instructions for pure integer based calculations?
@@JohnnyWednesday On Apple II, you instead use a separate "Integer BASIC" that's less capable but faster if you optimize your code for it. There may be a similar alternative here.
Hat's off to the MID$ with a de Bruijn sequence, that's really clever!
My SYS to a REM statement was always cheating, and I typed it in using the monitor. It's possible to tweak it so it works when typed in though. Not sure how to share the listing, but a hex dump looks like this:
>C:0801 26 08 00 00 9e 32 30 36 31 3a 8f 22 a0 df c8 8c
>C:0811 15 08 20 be a0 a2 cd 8a a4 8b 30 02 49 03 20 d2
>C:0821 ff 4c 13 08 00 00 00
You could write it as 9t and so on. I typed in your solution in Basic, but it never worked. Even made sure to reverse the checkerboard pattern...no dice.
this works. i tested it using the VICE emulator and also using CBM prg Studio. and i entered it using just the keyboard--no need to use the monitor
0a$="{sh pound}@*"+cH(141)+"{ct o}T"+cH(141)+"{reverse on}T{cm z};T){ct a}"+cH(105)+cH(109)+" R{pi}P"+cH(244):sY40852
it's case-sensitive. "sh" is the shift key, "cm" is the Commodore key, "ct" is the control key. note there's a space before the "R", other than that, no spaces.
the sequences in { } (eg, "{sh pound}") should be replaced by the actual keystroke, ie, press Shift+Pound. "{reverse on}" is Ctrl+R.
note: it needs a C64, not a C128 (since the C128 SID is different).
asm:
.C:9f94 A9 40 LDA #$40
.C:9f96 2A ROL A
.C:9f97 8D 0F D4 STA $D40F
.C:9f9a 8D 12 D4 STA $D412
.C:9f9d AD 3B D4 LDA $D43B
.C:9fa0 29 01 AND #$01
.C:9fa2 69 6D ADC #$6D
.C:9fa4 20 D2 FF JSR $FFD2
.C:9fa7 D0 F4 BNE $9F9D
I appreciate the joke thumbnail (even if unintentional) and that you used two exclamation marks instead of one or three. That makes it authentic. Just missing a red arrow pointing at something random.
Somehow I never tire of your 10 PRINT antics. Making an Apple version drawn with pixels on the C64 would be interesting.
Ha, I'm glad you noticed the 2 exclamation marks :) Okay, now I'm tempted to add the random red arrow and actually use that as the thumbnail. I wonder where I should point it. Maybe at the F5 key, that just seemed funny to me.
You know, I actually did a pixel version of 10 PRINT for both VIC-20 and C64 that somehow turned into that Apple video last month with no mention of either of those. I'm not kidding that for every three or four things I play around with only one gets turned into a video. There's a huge trail of good ideas that I need to get back to.
@@8_Bit it's all the rage to include a photo of yourself looking aghast; apparently that works as clickbait. That would be funny but I know you prefer not to show yourself.
@@8_Bitplease add the arrow!! You should point it at the run stop key 😂
Great video again Robin!
I haven't had the chance to read all the comments yet so maybe my contribution is redundant.
I used C64 VICE in NTSC mode to compare the different versions.
The original for/next version was measured as:
J:246 S: 4.1
Here are my one liner versions for even less jiffies spent:
(Best being 239 jiffies)
What works for the program version:
instead of using chr$(a+rnd(.)) I swapped the two operands of the addition. The idea behind this was
that in some cases multiplications, divisions etc can take more or less iterations depending on the number of bits on the left and right side.
Without knowing exactly how the C64 FP arithmetic internals work I still had this assumption.
So the chr$(rnd(.)+a) version ended up with overall 1-2 jiffies less than chr$(a+rnd(.)).
J:245 S:4.083
What only works in immediate mode:
Two more areas where we can speed up this program:
1) Doing less FP arithmetic
2) Unrolling more iterations somehow
For 1):
I looked for a solution which can branch without doing arithmetic operations.
NEXT does FP addition and comparsion against STEP and TO which takes time.
From all the branching instructions I found only a few which are not based on line numbers (so works for the one liner versions):
END - Not usable as branches into input mode
STOP - Same as END
RETURN - Requires a GOSUB and heavy trickery on the stack
CONT - Hm.. we have a winner
What CONT does is that it simply restores the basic program counter and line numbers to where a STOP or BREAK occured.
So I just needed to force it to somehow break before the ? so that CONT jumps back. Obviously breaking is not an option.
But the way BASIC works is that it stores the address of the current basic instruction in the (61,62) zero page locations before it executes it.
CONT uses this saved address to resume the program.
Sadly it is not usable in a stored program because BASIC automatically updates this for every instruction, so a simple CONT in the program would jump back to the same CONT.
However in immediate mode you can not CONT, so basic does not modify (61,62) location.
So all you need to do is store the address of the first statement of the loop in (61,62) then any CONT can jump back to that location.
The immediate mode maze generator using CONT is:
a=205.5:pO61,20:pO62,2:?cH(rN(.)+a)cH(rN(.)+a)cH(rN(.)+a)cH(rN(.)+a);:cO
J:240 S:4
Sadly this is one less iteration unrolled than the original version, but still faster.
For 2):
To add more statements into the same line these must be crunched even more.
The idea here was to enter the already tokenized basic code somehow because the functions and statements take 1 byte instead of 2 or 3.
At the end of your video there was a version which contained a machine code after the REM.
Using the similar tech but for basic we can write the following immediate mode maze generator:
(If you see a that means you need to press Commodore key with the key after it)
a=205.5:pO61,24:pO62,2:cO:":#6G(#f(.)#na)G(#f(.)#na)G(#f(.)#na)G(#f(.)#na)G(#f(.)#na);:#7
J:239 S:3.983
Explanation:
61,62 is set to the location of the first : after the ".
The first CONT jumps to that location so the " is skipped this way.
The one and only quote mark here is needed to store the special token codes. Basic directly stores the string without altering it.
No need to close the string.
Sadly there is no way to enter the token code of NEXT (130, 0x81) in the string so the for/next version can't be used this way.
P.S.: I had to make a patch for your BBASIC to measure immediate basic lines too.
The speed savings are wild! Especially that assembler one, geesh.
Yep! Literally 10 times faster than the fastest previous attempt. Long live the 65xx processor!
If apps nowadays would be written with the same care and optimization, they would run perfectly fine on 100 Mhz machines too. ;) Unfortunately the virtually unlimited CPU power, memory and storage, has created many lazy programmers.
From all the discussions in the comments, it looks like you've got a follow-up video on your hands 😉😄
I used to own the C128 back in mid 1988, but I stared out on
an Atari 1300 home computer system back in the summer of
1986. I just loved that C128. I played around with its 6502 Assembler
Language, after understanding peeking and poking, via invoking
into machine code, but it still wasn't as fast as full 6502 machine code.
Peeking and poking were from advanced basic near the back of the thick
white book. They don't make computers like that anymore. Educational
machines at one point in time. I was lucky enough to get into it back then.
I was 21 years very young when I started.
I love the concept of these videos. It's very similar to speedrunning video games. Who can get the same thing done fastest? It's awesome to see the ingenuity it takes.
Another boost:
0 a=4:a$="...MNMNNNNMMMMNNMNMMNM":forb=16to.step.:printmid$(a$,a+b*rnd(.),a);next
The M is the graphic backslash, N the graphic slash (shifted M, N on the keyboard). The sequence gives all 16 possible variations (2^4) of 2 symbols on 4 places by indexing from 4 to 19 (16 + 3 symbols). This index is randomized, selecting one of the 16 different 4 character sequences.
The dots are a filler to be able to use variable A (primarily needed as string length) also as some offset, because MID$ needs something > 0. A is defined earlier for faster access, because it is referenced twice, A$ and B only once.
This takes on my PAL C64 171 jiffies or 2,85 seconds.
This seems the maximum possible if keyed in via screen editor (with abbreviations).
If we dismiss the line limit of the screen editor this game could played further as far as we keep A$ limited to the maximum of 255 characters (the line length is unlimited, but a re-chaining of the program must not happen).
If an appropriate A$ can be composed, the a possible for A and B would be 7 and 128, giving a length of 140 for A$.
This task is left to the readers (or Robin), to push it to the extremes ... ;)
Just for comparison and analysis: Saw @csbruce's MID$ approach later skimming through the comments, but it is limited to a 3 character sequence. The above one accomplishes a 4 character sequence by using a overlapping structure of all needed 4 char sequences which have 1 index distance (saving a multiplication too).
Really nice, great idea to have all the combinations overlap like that. I guess there's an algorithm that will generate an optimal sequence like that?
Wow, each "bit" combination is present in the string exactly once. You could *almost* fit a second unrolling of the MID$().
@@8_Bit Did it manually. My first attempt for triple sequance came fast. I'm not expected to be successful for quads, but found also a solution quite fast.
I thought it would take much more time to develop an algorithm for this task ... I didn't check, if there is already a known general solution. However,, a contest to search for a Basic solution would came into my mind ... 😉
@@8_BitIt's called a de Bruijn sequence.
This is a gem, this taught me some new things about Basic, 35 years after I ditched my C-16. But never too late they say right?
Programming the C64, first in basic and then in Machine Code with this visceral direct connection to the hardware being able to modify registers and address spaces directly is something that is missing from today’s first computer experiences.
Very cool video. Love the benchmark basic. I also saw the thing you mention at 29:04. I was going to mention it before I saw you did. You can stack CHR$ in there as much as you like, not just one. Good stuff.
Dang that MV guy had the same idea that immediately came to my mind about how to speed it up 🤣
and yes I was totally thinking the same thing about embedding assembly. hats off
Wow, either I had totally forgotten about the "step" part of Commodore BASIC or never learned it!
This basic had a PI constant, and it was the actual character? Wow. I'm beyond impressed. (the float vs. int thing is a bummer, but as someone who took their first walking steps with AmigaBASIC, this is something I can still appreciate, as I had seen my brother do these weird "BASIC" things when I could still barely write)
The "crunch" by removing whitespace thing is absolutely barbaric, but it explains how the parser works in 8k.
It's not only the actual character, it's also a BASIC token of its own. 🙂
@@noland65Yeah thinking about it, it makes a lot of sense in a memory constrained environment.
@@thygrrr It's actually a bit weird: Commodore 8-bits have two character sets, one upper-case/graphics, the other upper-case/lower-case. Pi is only "π" in the upper-case/graphics set, but is displayed as a checkerboard character in the upper-case/lower-case set. To make this even a bit more interesting, this appears twice in the character set, at 0xDE (222) and at 0xFF (255), and Pi as a BASIC token is 0xFF.
It get's even weirder: using the BASIC command 'VAL()' we can evaluate the numeric value contained in a string. E.g., 'PRINT VAL("1")' prints 1, as does 'PRINT VAL(CHR$(49))' - 49 being the PETSCII value of the character "1".
So which is the real "π", 222 (0xDE) or 255 (0xFF)? Turns out, neither of them: both 'VAL(CHR$(222))' and 'VAL(CHR$(255))' return zero!
'π' is only recognized by the routine that tokenizes a line of BASIC text present in the BASIC input buffer - when you hit return to enter a line of BASIC -, but nowhere else. It's just kind of a magical patch.
Or the other way round, as a BASIC token, it doesn't have any character value, just like the token for PRINT doesn't have one.
I used to write code in a Z80 BASIC for a machine known variously as the LASER 200 and Dick Smith VZ200. It had a total of 24k RAM including expansion module. I wrote a text adventure game in BASIC and crunched everything I could to fit it in. Free memory when I was done was 6 bytes. Those blank spaces were memory munchers I just could not afford, let alone REM statements or any other niceties. But it worked and it sold quite well when it was published.
"It's able to do this with BASICally no overhead at all."
Haha, nice pun!
Some very cool optimizations there! Will have to give Benchmark Basic a try. Your lowercase crunched BASIC lines reminds me of obfuscated Perl lol
Thanks, Robin & Alex!
Wow, how cool that we can compact the typing of one BASIC line into what's still "80" logical characters, or originally 2 screen lines, but then it uncompacts into what ends up being most of 3 screen lines! OK, your new challenge: Write a C64/128 mode (but still 40-column) noticeably working BASIC program with the longest unfurled logically "80"-character line! IOW, use the longest BASIC commands and operands, etc. with the shortest abbreviations that you can assemble to do something that's noticeably functional.
I was today years old when I learnt you can add a line number after run!
Suggestion for improvement: Create two internal variables, a line number L as a breakpoint, and a number N as repetitions. When the program hits line L for the Nth time, stop and print the jiffies. That'll allow you to benchmark non-halting programs that never scroll. (Like, updating a number at the top left corner of the screen.) You could combine this with your "array of time points" suggestion setting a third variable, NE (number of external repetitions).
called a loop by any chance?
I remember my computer studies teacher gave us a challenge to draw a circle full screen in the fastest time. It was a great lesson in optimisation and the tricks you need to do. Could still be a good interview question.
Can you do faster than Marvin Minsky?
NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW X
where epsilon is a (very) small number and this is essentially a division.
Which is, why we can use right-shifts, instead, here using 1 / 2^8 for epsilon:
X := X − ( Y >> 8 )
Y := Y + ( X >> 8 )
Discovered by Marvin Minsky in the early 1960s on the PDP-1 (actually the first pre-production model). David Mapes made the same discovery, also on a PDP-1, the other pre-production model at Lawrence Livermore National Laboratory (LLNL).
(It's actually not a perfect circle, but a fat ellipse, but with a small epsilon it's "good enough" for an approximation.)
This is actually a numerical methods approximation to the differential equation for the circle.
@@FadkinsDiet It's actually a quick sine/cosine approximation for small angles, where we use 1 for cos(x) and epsilon for sin(x), gone wrong.
For the traditional approach, we'd use both the old X and old Y to derive the new coordinates. But this approximation isn't good enough and the curve will spiral out. By accidentially using the newly derived X, instead, the function somewhat magically auto-corrects.
(To see why that is, we may have a look at the determinant of the matrix representation of the function: this must be 1, in order for a conic to meet up with itself as it wraps around a cone. With the original formular, this is greater than 1, it's off by epsilon^2, and the ends do not meet, resulting in the spiral. The accidental version introduces just the right amount of error in order to make this 1. But it's a fat ellipse rotated by 45 degrees.)
Rich Schroeppel commented in HAKMEM, the collection of MIT AI-Lab memos, on this:
"PROBLEM: Although the reason for the circle algorithm's stability is unclear, what is the number of distinct sets of radii? (Note: algorithm is invertible, so all points have predecessors.)"
(HAKMEM, Item 150)
Gene Salamin observed in HAKMEM, Item 152:
"With exact arithmetic, the circle algorithm is stable if |epsilon| < 2. In this case, all points lie on the ellipse
X^2 - epsilon X Y + Y^2 = constant,
where the constant is determined by the initial point. This ellipse has its major axis at 45 degrees (if epsilon > 0) or 135 degrees (if epsilon < 0) and has eccentricity
sqrt(epsilon/(1 + epsilon/2))."
The really interesting thing about this is that this is a discovery, which would have been hardly made without a computer with a display attached to it to plot the function. But, as soon as there was such a thing available, particularly the PDP-1 with its X/Y plotting display, it was made several times about at once, since it's based on an error easily to be made and the result is as easily observable.
As soon as I saw "an addendum" I knew it was about print :) I actually fired up vice while you were talking and tried it myself :D
I grew up with a C64, but i was just beginning to code. The lessons of space and logic optimizations paid off when i got a HP 48GX calculator for college and began coding lots of helper functions for math and science classes.
Awesome stuff, brings back memories of the '80s! Such a shame that young programmers these days have no insight of the inner workings of the machines the write programs for. Thanks for making these videos!
I encourage young programmers to look into any free Coursera computer science courses. There used to be a great Computer Science 101 course. There are probably UA-cam videos, too.
Back in the day, one of the TRS80 magazines ran a one line game challenge, where you had to write a game in 251 characters. All the concepts you talk about here were used to get the best performing game 😄
2032: Humanity is ready for the PC and C64 finally runs Crysis
Ah, so that's the basis of the "backrooms" portal field generator. LOL :) Kidding! I remember this simple little code, back when I had the C-64. Was always cool to watch that random maze generation. Thanks for sharing.
Yeah, that ending SYS example _is_ a cool idea, even though it is technically a cheat according to the spirit of BASIC. So what does that system call to memory location 2016 (decimal) do: read whatever's in the REM statement and take it in as ML bytes? And what does SYSing that without the REM statement do?
2061 is where the source code of the REM statement is stored in memory, so SYS just jumps to it as machine language, yes.
Without the REM statement it would just run whatever junk happens to be in memory. It'll probably crash.
Oh, interesting, @@codahighland, thanks! I could try the SYSing myself, but first I have to find or replace my A/V cable. I suppose I could just download the VICE and try it on there, but I'm never really sure how picture-perfect the VICE is.
@@HelloKittyFanMan it works fine in VICE, but it looks like you can't actually _type_ the line perfectly like it's displayed. There's a couple of characters that don't come out right when typed. See the thread by @judgegroovyman below.
Oh, that's interesting, @@tirsek. Then it doesn't work in VICE after all, I guess.
Great video! It would have been nice to have a scoreboard in the upper-right corner, as my short-term memory isn't what it used to be (and it never was much, to begin with).
In some languages a print statement itself is very slow. In those cases it helps to add characters to a string var and print that. Dunnu if that would work here.
PRINT is fairly fast on the Commodore computers. String variables have a limit of 255 characters in this (and most 8-bits) BASIC and we need to print 1000 characters to fill the screen, so I'm fairly certain it'd be slower to loop through multiple times.
Oh! I either never learned or had forgotten that the "ready" prompt could be replaced!
Here is a mid$ version that calculates the "lookup table" s$ with overlapping strings of length l (here 7 chars).
10 s=0:l=7:n=2^l:dima(n-1)
11 fori=1tol:s$=s$+"M":next
12 :
13 fori=1ton:a(s)=1:s=s*2+1:s=s-n*int(s/n):s$=s$+chr$(206-a(s)):s=s-a(s):next
14 :
15 fori=.ton-1:k=k+a(i):next
16 ifknthenprint"checksum error":stop
17 :
18 fori=nto.step.:printmid$(s$,rnd(.)*n+1,l);:next
The precalculation takes a while but you can print s$ and write a new program that uses its value directly.
but... its 18 lines...
You've made it fast, now try and make the the slowest 10print
Add an empty FOR A=1 TO 2 STEP 0:NEXT loop. You can make a program infinitely slow without much effort.
It would be an interesting challenge to make the slowest one-line #10print possible that actually will (or at least, could) finish.
@@8_Bit: 10 PRINT CHR$(205.5+RND(1))MID$(CHR$(157)+CHR$(0),RND(1)+1+1E-9,1);:GOTO10
@csbruce UA-cam/google has been censoring more of your comments lately, maybe because there's more code than english? I don't know. When I find them in the "Held for review" I of course approve them, but if anything goes missing completely that's why. Sometimes UA-cam just deletes stuff and doesn't even give me a chance to do anything about it.
@@andymanaus1077 I think that a good rule is that infinite loops are not allowed, but 1 loop to print the code is allowed. There could be several categories.
* single line 1 loop
* unlimited lines 1 loop
* single line unlimited loops
* unlimited lines unlimited loops
Not sure if this counts since it doesn't use print or CHROUT, but it fills the screen in about 6 seconds:
10 a=77.5:fori=1024to2024:pokei,(a+rnd(0)):next
Hi. Your code needs 377Jiffies on a NTSC machine. You can speed it up with this changes to 317 Jiffies. That's almost 1 second faster :) 0 A=77.5:FORI=1024TO2023:POKEI,RND(.)+A:NEXT
I'd love to see a video digging into that REM statement with the machine language in it.
Quite interesting! However the main question is: is it jiffies or giffies?
Are you familiar with "Waterloo Structured Basic"? Back in the day it was available on the PET computer. My Mom and Sister learned about and worked with it at the University of Waterloo. I heard rumours that it was available on both VIC 20 and C-64.
Have you heard of it on the VIC 20 or C 64?
Yes, I have heard of Waterloo Structured BASIC, and I've seen cartridge images of it online. I've got it on my long list of things to eventually look into. I'd love to get an original copy of it but I'll use a copy if I haven't found an original by then.
HI Robin. Thanks for these videos.
You can further optimize by changing A+RND(.) to RND(.)+A. Using the unrolled version at 28 minutes, I was able to save an additional jiffy. It's not much, but it does make me wonder what the interpreter is doing to make that even the slightest bit faster.
Cool credits music! What is the song?
Good question. I'd like to know that too.
The machine language in rem at 30:34 doesn't work as typed. Mine matches exactly in lowercase AND uppercase but doesn't do anything when you run it. Is there something else required that is over my head or does it just not work?
Regardless of all that this is a great video and project. Thank you!
I'll try to get an actual program file from the author and see if I can figure out what's wrong. I'm only guessing but certain characters appear twice in the character set so it might be that it's using the alternate one.
If you're using an emulator, or a C64 with a freezer cartridge, then with the BASIC program entered, go into the machine language monitor and enter the command: m 0801
Then compare the hex dump (formatting might be a bit different) with the following:
>C:0801 28 08 00 00 9e 32 30 36 31 3a 8f 22 a2 ff 8e 0e
>C:0811 d4 8e 0f d4 a2 80 8e 12 d4 ad 1b d4 29 01 18 69
>C:0821 cd 20 d2 ff d0 f3 00 00 00
I suspect there's one or more bytes that don't match up.
@@8_Bit Thanks for the hex dump! I was able to shave off nine bytes by removing a few unnecessary instructions, as well as the REM statement since I don't think it's possible to type in the entire line using just the C64 screen editor...
>C:0801 0b 08 00 00 9e 32 30 36 31 00 00 00
>C:080d a2 80 8e 0f d4 8e 12 d4 ad 1b d4 29
>C:0819 01 69 cd 20 d2 ff d0 f4
*=$801
; basic header
!pet $0b,$08,$00,$00,$9e,"2061",$00,$00,$00
start:
ldx #$80
stx $d40f
stx $d412
-
lda $d41b
and #$01
adc #$cd
jsr $ffd2
bne -
@@8_Bit I had the same problem and eventually came to the same conclusion. After painstakingly figuring out how to type it in with a poorly keymapped VICE setup, things weren't working. Looking in the monitor it quickly became apparent why. The incorrect bytes is a C9 instead of 69 at 0820 and a B3 instead of F3 at 0826.
Fixing it in the monitor, then list'ing and hitting return on the displayed line in the listing breaks it again, so wouldn't that mean that it's not actually type'able?
@@8_Bit thank you! lets see ... mine says
C:0801 28 08 00 00 9e 32 30 36 31 3a 8f 22 a2 ff 8e 0e
C:0811 d4 8e 0f d4 a2 80 8e 12 d4 ad 1b d4 29 01 18 c9
C:0821 cd 20 d2 ff d0 b3 00 00 00
clearly there were differences there (c9 instead of 69 (noice) and b3 instead of f3) so now I will do two things:
1. I will try to change it in the monitor and get it working (Update: changing those bytes in the monitor works great!)
2. I will try to see how I could have typed it in differently
Awesome video, thank you 😊 What band is in the end credits? 🎉
I believe it's www.youtube.com/@BedfordLevelExperiment that Robin also performs in.
@@tirsek thank you so much for your kind response
Great video, Robin :) I totally agree with You, even if it's not the fastest solution, one-line is totally sexier :) Thank You for Benchmark BASIC, it will be a very handy tool to have. Cheers!
Real C64 nerds should go even further:
Next step is to write an Assembly version of this (which was done in a previous video, if I recall correctly), then arguing how using SID's voice 3 as a random number generator is for "noobs", so not good enough for Assembly 10PRINT. And then arguing once again what is the best sophisticated RNG to use, and how can we squeeze ONE more jiffy by tweaking the Assembly code again and again for an eternity. 😁
14:01 Are the 17 consecutive forward slashes 7 lines above the Benchmark results bound to happen whatever the seed for RND() or just with the pi seed?
The parameter to rnd() is only a seed (resetting the internal state) if it's negative.
Great job man
The outro song is really cool. Are you on Spotify by any chance, can we find it there?
"A whole nother video..."
* "A whole _other_ video" or _"another whole_ video." ("Nother" isn't a word.)
Did it give your brain a syntax error?
Yep, haha, @@TheUtuber999!
I wonder if a peek to the SID's voice 3 oscillator and an AND to grab the lowest bit might work in place of RND, and might be faster.
If you mean something like this, it's still pretty slow...
0 poke54287,128:poke54290,128:a=205
1 printchr$((peek(54299)and1)+a);:goto1
@@TheUtuber999Is it possible to store an array with an integer, but store in screen memory?
Sid noise waveform is still a lfsr, i think it is actually less random than basic's rnd
I typed in the program with the REM line, but it doesn't seem to work. For example the SHIFT+I character is always interpreted as $C9 and hence CMP, but it should be interpreted as $69 which stands for the mnemonic ADC. Or is there a way to fix that?
x 2 is the first thing after having decided to watch those overlength videos. 😊
Unrelated to this video in particular, I've been curious whether the colors your Commodore 64s and 128s produce in these videos are what your capture device spits out by default or if you've had to do some fine tuning to make it match what you've come to expect it to look like on a vintage TV or monitor. I ask because-while I haven't had the pleasure of seeing a Commodore 64 screen in person in quite some time-it feels more accurate to my own memories than anything VICE has been able to produce, both before and after the multiple tweaks that have been made recently to its color handling.
VICE by default uses a PAL palette, Robin has an NTSC machine, maybe that's that?
@@vytah Nope, VICE's NTSC palette looks like the screenshots starting at 30:15.
I don't do any video processing for the particular 64C, VIC-20, or C128 that I usually use. Part of why I choose those particular machines (I have multiples of each available) is because I like their video output. I'm not particularly happy with the output of any of my breadbin C64s so I will often tweak those in post, and when I use more obscure computers or consoles, especially with only RF output, I will usually tweak those too.
Great! Is it possible to give goto a rnd(2) adress?
How fast would a print be with the complete screen hardcoded in it (1000 c) ?
Would poke to frame buffer be less overhead over print. Also with poke you write 8-bit value which maybe could be bitwise done. I don't know if c64 basic supports this. Also maybe random value could read from ROM with AND 0X01
when i was young i wrote a programm for someone. I wonder how much faster it could have been with those tricks you showed here which i had no idea about. One trick i used was ... since it was a C128 (i used the C64 mode for my programm dont know why anymore) i Poked the speed up to the C128 Clock. While the calculation was done you could not "see" anything on screen but once it ends i Poked the speed back to the C64 clcock and the videooutput was ok again :)
that was a very interessting video, i like it
Hehe, "RUN 1."
Yeah, it makes sense, it's just a bit funny though.
Now I want to see more assembler source sticking 75 (?) bytes of machine code at a time in REM statements. 😀
(Ok I actually checked and you can get 75 bytes after 1REM” I guess for a line that can actually be typed in.)
And I think it'd be 68 bytes left in a one-liner that includes the SYS2061:REM" so it can be RUN. There's also the limitation that not all possible bytes are type-able or compatible with the REM listing. I'll have to study this sometime if I'm to make a video on it, because I think there's a lot of "gotchas" with this trick. But it's still very cool.
@@8_Bit Yeah sorry I wasn’t think of a one liner. I was thinking of a several liner. But yeah the first one has to have the sys for sure.
Department of cheating: You can type a lot of data on screen before hitting return, and the last two screen lines will be interpreted by BASIC. Then your "one liner" could read the data above. This cheat is broken by doing a clear or list, though.
However, if the real line extracted data from the screenful and stored it as a line in BASIC's memory, this could make a super long line that *could* be listed and saved, but not entered. Hmm. Maybe I should stop meta-meta-programming...
Great video! Would it be faster to poke to screen memory rather than using print?
1 PRINT"{clear}";:TI$="000000"
2 N=77.5:FORI=1024TO2023:POKEI,RND(.)+N:NEXT
3 PRINTTI
Run time: 314 jiffies = 5.23 seconds.
I'm not an expert, but I think that POKE does time consuming stuff due to BASIC being an interpreted language whereas PRINT is a genuine short cut. I vaguely recall running my own tests. That's a great question, though.
In Atari BASIC, you can treat characters as a colour, in the text based graphics modes, if I recall correctly, but nothing seems to beat PRINT for speed and general convenience. When I say, "colour", I mean that you can draw them across the screen.
That was fun!
Title suggestion: You Won't Believe How FAST This 10 PRINT is!!!
10 PRINTs you never knew existed
Optimize your 10 PRINT code with this one weird trick!!
10 PRINT secrets doctors don't want you to know!!
New 10 print discovered. You won't believe what comes after 2.
Awesome!
Could you convert 205.5 to hex? Might parse faster. Not as fast as the variable version of course.
Commodore BASIC doesn't support hexadecimal.
I'd love to know the name of the song that played at the end, their voice is interesting and I'm curious to see what else exists for them.
Nice one \o/
Robin, you had demonstrated that integer math was slower on the Commodore BASIC, but has anyone ever optimized BASIC to make integer basic the faster solution?
There are some BASIC compilers that will speed up integer calculations quite a bit, but I'm not aware of any extensions that add full integer support to the interpreted language.
Strangely exiting 👍
Is it crashing?
I LOVE C64 ❤❤❤
I believe you can use shortened 'shift' commands and, as long as you don't list the program first, it will run fractionally faster since the commands are shorter. If you list the program it's destructively expanded back into memory, so you lose the benefits.
That may be a VIC 20 quirk, though, as it was a way to get a little more out of your 4K. In that case, if you listed, you'd get an out of memory error as it ran out of room expanding.
Nope. The shifted abbreviations are only seen by the tokenizing routine. Abbreviated and spelled-out keywords result in identical code being stored in the program. LISTing it or not has no effect.
One of my old videos is called "About Commodore 64 BASIC Abbreviations" and it covers all this; it applies exactly to the VIC-20 as well. The short version is @bxdanny is correct; once you type the line and hit return, all commands are tokenized to single bytes whether they were typed in full or abbreviated.
First I don’t know the c64 can you not store the A value (int 205) in memory along with the RND val (1) before you read the with poke in an unused part of ram then in your line peek it out as it will already have been phrased so 10 print chr$(peek(stored add)) + rnd(peek (stored other add)). Of course assuming c64 has poke and peek.
The other way would be to look through ram for a value of 205 and 1 then use those addresses
Bob
Thanks!
Thank you, and nice optimizations!
Hi. Thanks for supporting Robin. I like learning his programming hacks.
@eugenetswong I like his style of presenting all these topics that concisely. All the topics are selected very well and are always on the upper edge of C64 knowledge. 😀
Was there ever a BASIC compiler for the C64? That was one of the reasons that I loved QBasic 4.5 because it could compile a binary and was so much faster that way, but I'd imagine that the memory constraints of the C64 would make that a significant challenge. For instance, how much memory would the program listing take up versus the binary and could you actually store both without eating all your memory too quickly. Which I guess naturally leads to a second question. How many instructions could you usefully store in memory and still get work done?
I'm sure there were BASIC compilers for all the popular 8-bits. I'm not sure how they worked, but I'm sure they didn't have such a hard time as the FORTRAN compiler for an extremely early IBM mainframe with hardly any RAM and no ROM, disk or tape. The only IO was punched cards. The way it worked was to keep the program in memory and divide the compiler into programs tiny enough to fit in the remaining space -- 46 of them! The programs were loaded one at a time, and each would transform the program a little bit towards its final state. 8-bit compilers had it much easier, they could just require the user to have a disk drive and transform one file into another.
I liked QuickBasic 4.5 as well, even though when I finally got my hands on a copy I had already gotten Borland C/C++ and Turbo Pascal compilers. Still, I loved tinkering with different languages, it was mainly for *fun.*
So... Have you checked out QB64?
@@robsku1 Yeah, it's a neat project, even if it's not yet complete. It has a nice interface, but it doesn't seem to handle mouse input very well if you resize it. And it doesn't seem to implement the quirky usage of `def` that Gorillas makes use of. I'll have to rewrite that function and see if it works.
WHAT IS THAT SONG AT THE END??? I must play it on the radio!
Astounding, something like A$(0)=CHR$(205):A$(1)=CHR$(206):FORA=.5TO-STEP.:PRINTA$(A+RND(.));:NEXT is roughly 20 % slower ,,, Ok, it's fairly plausible that an array access seems to be more complex than the CHR$ function ... ;)
OMG i love those LOW% speedruns! :D
Aproved? That’s not even a word!
I'm waiting for this program to be ported to a Cray-2 supercomputer.
How about trying run instead of goto?
RUN is a little slower than GOTO without a line number. Presumably RUN takes a bit of extra time to re-initialize various pointers, like the variables.
There is one more minor optimisation - you can remove the space between the line number 0 and the PRINT.
Edit: I now see you do that towards the end!
30:15 : I guess this is an early example of a code injection attack, albeit not in Java.
Perhaps, but by that rule, half of the interesting things in DOS would count as code injection attacks. :) Many TSRs hooked into the keyboard interrupt so they could see what keys had been pressed before even the BIOS did.
A+RND(A) looks weird. Isn't tbat BASICally the same as 205.5+RND(205.5)? What happened to the 1?
Yes. RND behaves exactly the same with any parameter greater than zero, so (1) or (205.5) or (65535) are all equivalent. The difference is that a variable is parsed slightly faster (especially in a program with very few variables) so A+RND(A) executes a little faster than 205.5+RND(1).
You Kobayashi Maru'ued 10 print!
Would a machine code version of the 1 line program make it any faster.
Haven't watched the addendum yet, but there is another trick you can do while unrolling. You're repeatedly calling PRINT, which is unnecessary. PRINT CHR$(...);CHR$(...);CHR$(...) should be faster, and as a bonus you can squeeze in more unrolling into one line.
Another thought is using strings. Construct a string using unrolling, and then PRINT it. I'm not sure this would save time. Stringa are kinda wonky in C64 BASIC. You could absolutely make a C program faster by doing something like that, because printf() is slow. Then you could also combine unrolling with FOR, because you can amortize the extra cost of doing FOR over several unrolled statements. This way you could combine the one liner AND the A=205.5 without the RUN 1 stuff.
Edit: Lol, obviously other people had already suggested this. I should watch the entire video before commenting.
question would be more 'how fast the basic random function is' compared to just getting a free running counter from a 6526, noise from the sid or just the scanline AND $01 with peek. ya know. peek is probably faster than anything 'basic' :P lol. just like using 'stdout' 'print' :P (parsing the whole thing, going through basic, going through the kernal, going through the cbm editor rom) is probably hell a lot slower than just poking it into video ram. :P not entirely sure what the 'rules' would be before it would no longer be considered 'basic' or 'one line' :P does this have to run on any cbm/mickeysoft basic compatible machine? or just and only c64's, that sorta thing... :P cuz if it doesn't need to run on 'antyhing' there probably are a lot faster ways.(actually to this very day... the slowest part of any program is and always has been the output to a screen for some reason. it's even slower than most networks :P so 'most of the speedup' can probably be made there. (not limited to c64's. also applies to your pc on your desk :P also options such as simply changing the entire character set to just / or \ and not even having to do the and and add spring to mind. :P would things like 'simply entering the machinecode as petscii into the basic line and then jumping to it' still count as 'basic' :P ya know.. :P at which point does it stop being 'basic' ;P (even basic loads to fixed addresses, so this is no problem).
As far as I can tell, the machine code version in this video does not work if typed in from BASIC. Any opcode or data in the ranges of $60-$7F and $E0-$FE cannot be used because in PETSCII, they are copies of codes at $C0-$DF and $A0-$BE respectively, so if you type the correct character in it changes the opcode. In the case of this code, ADC and BNE cannot be used because ADC gets changed to CMP and the BNE relative jump back changes from F3 to B3. I made another version that has assembly that utilizes ASL, BCC, and INY to switch between 205 and 206. This can be typed in from BASIC. Here is the assembly of just the executable machine code.
080d ldx #$ff
080f stx $d40e
0812 stx $d40f
0815 ldx #$80
0817 stx $d412
081a ldy #$cd //load y with 205
081c lda $d41b //load a with random # from noise waveform 3
081f clc
0820 asl a //shift hi bit of random number to carry
0821 bcc $0824 //skip increment of y if carry is clear
0823 iny //if carry is set then increment y to 206
0824 tya //transfer y to a for printing to screen
0825 jsr $ffd2 //print to screen
0828 jmp $081a //loop back
Awesome! well done.
Well, what's the basic?
I had a flashback to the Calgary Olympics while watching this video😂
Too bad I wasn't using my official Calgary Olympics Commodore 64!
You could save the random number command by omitting it, and wiggle the power cord just enough to induce random numbers in memory. lol (jk)
I really like the machine code packed in the REM line, now that's really clever, course the actual routine is no longer running in Basic. heh
Or just use an original C64 power supply. 😁
@@TheUtuber999- And there's that too. lol