2:23:12 - 100% agree with Tsoding that it's "gate keeping". And I'm also 100% sure whoever wrote that shite thinks: 1. the text is clear and understandable; 2. the diagram helps a lot; 3. as a last resort, the pseudo-code makes everything easier. I've had to deal with this kind of engineer, who can't comprehend that such documentation is only clear to someone who already knows the stuff. I hate this kind of documentation. Stop writing documentation for yourself and start writing for the readers!
How about making a fully functional OS that can perform all the tasks like process management, file management, etc. with C and Assembly? I'd really like to follow that journey. By the way, if you decide to make it, then please make sure that it will be bootable not only on Legacy BIOS but also on UEFI (Please Please make it)
2:18:46 I am pretty sure I figured out the shufps thingy while watching (you might have figured it out by the end of the stream idk). You have 4 float sections & each one has four possible sources, so that's 4^4 or 256 arrangements. The last value just picks which one… The graphic they have is horrendous rofl
Like each float section in the first half of the final value can be one of the four floats from the first operand & each float section in the second half of the final value can be one of the four floats from the second operand. Why is it made that way? Idk.
Last time I used assembly to program anything useful was many years ago. These new instructions are a maze, or at least that's how they look to me. Perhaps it's a combination of sub-optimal documentation and my lack of experience.
shufps: you could look at imm8 as an array of four 2-bit numbers which chooses what part of the source register to copy to the part of the destination register that corresponds with the array index. So for 0b01'00'01'00, it basically does: xmm1[0] = xmm0[0b00]; xmm1[1] = xmm0[0b01]; xmm1[2] = xmm0[0b00]; xmm1[3] = xmm0[0b01]; Edit: seems you figured it out, the picture is awful. Interesting how x86 is full of instructions, yet the instructions of how to use them are horrible.
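The "imm8 is an array of four 2-bit indices" reading from the comments above can be sketched in plain C. This is only a scalar model, not the real instruction: `shufps_model` is a hypothetical helper name, and it assumes the two-operand form `shufps dst, src, imm8`, where the low two result lanes are picked from `dst` and the high two from `src`.

```c
#include <stdint.h>

/* Scalar model of SHUFPS dst, src, imm8 (hypothetical helper, no intrinsics).
 * imm8 is read as four 2-bit indices, lowest bits first. */
void shufps_model(float dst[4], const float src[4], uint8_t imm8)
{
    /* Copy both inputs first: dst is also an input, and src may alias dst. */
    const float a[4] = { dst[0], dst[1], dst[2], dst[3] };
    const float b[4] = { src[0], src[1], src[2], src[3] };

    dst[0] = a[(imm8 >> 0) & 3]; /* lanes 0 and 1 select from the old dst */
    dst[1] = a[(imm8 >> 2) & 3];
    dst[2] = b[(imm8 >> 4) & 3]; /* lanes 2 and 3 select from src */
    dst[3] = b[(imm8 >> 6) & 3];
}
```

Passing the same array as both operands with an imm8 of 0 copies lane 0 into every lane, which is the broadcast case.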
parsing is such a weird academic discipline. So much effort goes into studying parser generators, but every useful language has a handcoded parser that's faster and gives error messages.
Having a background in how to import stuff from C will be a great start, because you will be calling into it at some point or other to get stuff done. Also, "The Art of Assembly" is a great read, though my 2003 edition is a bit out of date.
Now imagine developing Roller Coaster Tycoon in assembly... insane
not that this diminishes the achievement too much, but it wasn't just naked assembly, he had macros. once you get a foundation of good primitives you can build up essentially arbitrarily high level abstractions with a good macro assembler. And if you are working alone there's nobody to chastise you for your program being almost entirely macros.
@@homelessrobot I mean that's true with any language. If you wanted him to remove the macros then well he's just duplicating stuff, which is pointless.
@@homelessrobot I heard that most self-rolled abstraction over assembly looks like C. Now I imagine that the Rollercoaster Tycoon source code looks much like C
it's really not true for any language, certainly not in the way that most people use most of them. Most languages that have macro systems intentionally hobble them so that every other program isn't a new programming language. Either that or the best practices for their usage strongly frown on getting carried away.
No such restrictions or culture exist for macro assemblers, certainly not in 1994. Generally they are as potent and untethered as the implementor's imagination was capable of fathoming.
If you are still dubious, you should go look at the other fasm assembler, fasmg. It has an algebra solver built in.
@@homelessrobot I thought by macros you meant just easing code duplication. If they are anything beyond that (like doing math not written by you) then yeah I get your point. But does tycoon use that? If he built the macros himself then it shouldn't take away from the achievement.
It's so ironic that you went through the same stages of grief over the shufps instruction that I experienced when trying to figure out the pshufb instruction.
In the end I felt like the instruction was simple too and also mad at the documentation for making things more complex than necessary.
Your streams are like lofi songs. Just open one up and relax.❤
and... fall asleep
This is exactly what I do, with the benefit of learning things constantly.
You still absorb information while sleeping
Fascinating experiment. It shows so nicely how close C is to optimized assembly, as long as you don't introduce floats.
Btw if you are interested: In German we call ß "scharfes S", which refers to it being pronounced more sharp / harsh. Also, it is contained in one of your favorite words, "Scheiße", which is kinda spelled like scheisze, so it starts soft and gets sharp / harsh towards the end.
Lovely, your video themes always make me curious to learn more about themes I never considered before
Subject is a better word than themes. I know that in our language themes is used for it too but is not the same in english. I hope it helps.
Here's a fun hack for next time:
If you assemble using GCC rather than a dedicated assembler, you don't need to import external functions every time you want to use it. The difference is GCC requires the Start label to be Main instead.
Happy Tsoding!
There is a reason why they encode true as -1(integer) and not 1: You are supposed to use AND, OR, and ANDNOT with these masks and the floats instead of multiplication. So instead of p'=[C]*a+[^C]*b you use p'=( a AND C ) OR ( b ANDNOT C). No conversions between floats and integers and no multiplication by -1.0, etc. NOTE: ANDNOT destroys the mask!!
Nice feature for the velocities is: v' = ( C
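The select formula p' = (a AND C) OR (b ANDNOT C) from the comment above can be sketched in scalar C on the raw bits of a single float lane. The helper names (`f2u`, `u2f`, `cmp_lt_mask`, `select_by_mask`) are made up for illustration; real SSE code would apply ANDPS, ANDNPS and ORPS to whole XMM registers instead.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits as an integer and back (memcpy avoids aliasing issues). */
uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Models one lane of a packed compare: all ones if x < y, else all zeros. */
uint32_t cmp_lt_mask(float x, float y) { return x < y ? 0xFFFFFFFFu : 0u; }

/* (a AND C) OR (b ANDNOT C): picks a where the mask is set, b elsewhere.
 * No int/float conversion and no multiplication is involved. */
float select_by_mask(uint32_t c, float a, float b)
{
    return u2f((f2u(a) & c) | (f2u(b) & ~c));
}
```

Because the mask is either all ones or all zeros, the AND passes a value through unchanged or clears it completely, which is why the comparison result encodes true as -1 rather than 1.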
Only today I saw a video from Uncle Bob about people not writing assembly by hand anymore, and now I'm watching you write assembly by hand.
Now do the same in ARM instruction set!
reminds me of someone writing an article about doing this with masm (only) with directx11, it's great to know that there is someone on this planet sane enough to do this on linux as well
Most SSE instruction mnemonics contain letters that encode the data type they work with: A/U - aligned/unaligned memory access, Sx/Px - scalar/packed, S/D/Q/W/B - single/double/quad word/word/byte etc. Integer opcodes mostly start with a P prefix (paddb, pavgb etc.); float/double ones have no prefix and end with a ps/pd suffix (ss/sd for scalar).
bro with his stream is equivalent to what I studied in two years at my university💀💀💔🤣
Man, every time I see the title of your videos, my eyes go wide lol
completely insane
The guy ("gajo") really is good
@@gustavohqueiroz "gajo"?
please, speak Portuguese correctly.
Right, man. This guy is insane
Really good to watch these videos, even if they last 4h
Nice! Old school vibes. I was on 68K ASM back in the late 80s until C compilers began to improve.
It is kinda funny how low level people constantly invent some super complicated instruction sets to make everything as fast as possible so that high level people (where everything is so much easier to optimize) can not give a shit about performance at all and do something like react xD great session as always
What the fuck bro just created a window in 15 minutes from ASM. Too excited for this entire stream
Always great to see someone learn SIMD assembly but frankly I'm surprised your laptop doesn't have avx. Some older, cheaper laptops forgo avx2 because of their use as primarily a low-power web browsing machine, but you might have the regular avx"1" extensions, such as vbroadcastss in xword sizes (vpbroadcast itself is avx2). Anyway, you can check the feature extensions your processor has by using the cpuid utility, which you can find on the AUR, and invoking it with "cpuid -1 -l 1".
Fun fact, the instruction "cpuid" does basically the same thing and you can use it to programmatically detect the feature set of any modern x86 cpu at runtime, and then dynamically choose to run only the code compiled for features the cpu actually has.
Funner fact, you can read all of that information at once from /proc/cpuinfo.
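As a rough sketch of the /proc/cpuinfo route, the C below scans the "flags" line for a whole-word match. It assumes an x86 Linux system where that line exists; `flags_line_has` and `cpu_has_flag` are hypothetical helper names, and runtime dispatch in real code would more likely use the cpuid instruction directly.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole word in `line`, so "avx" does not match "avx2". */
bool flags_line_has(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    if (n == 0) return false;
    for (const char *p = strstr(line, flag); p != NULL; p = strstr(p + 1, flag)) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends) return true;
    }
    return false;
}

/* Looks for `flag` in the "flags" lines of /proc/cpuinfo (assumed: x86 Linux).
 * Returns false if the file cannot be opened. */
bool cpu_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL) return false;
    char line[8192];
    bool found = false;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) found = flags_line_has(line, flag);
    }
    fclose(f);
    return found;
}
```

A caller could check `cpu_has_flag("avx")` once at startup and pick a code path from the result.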
Pretty sure his old laptop is ivybridge i5 mobile of some kind. AVX2 wasn't a thing until Haswell, just after that *shrug*
I could understand the official description of the SHUFPS instruction on my second read, but can't for the love of god wrap my head around the explanation you gave in your own words. It always surprises me how everyone's brain is wired differently.
Amazing how stating "the imm8 argument is treated as an array of 2-bit unsigned integers" would have cleared everything up.
Holy cow, people say you can't learn anything in two weeks, but here I've learned so much in 3 hours at 1.5x speed, you can't imagine how good the timing is!
i think the reason for writing vector and raster version of the drawing functions is simply because it makes working with graphical data that is either inherently raster or inherently vector easier to do and less error prone, and not because of the potential usecase from assembly.
If I remember correctly, the 8 bit mask in shufps works like this:
00 00 00 00
The first 2 bits set the position in the source that will be copied to the 1st position (index 0) of the destination
Next 2 bits do the same for 2nd position
Next 2 bits for 3rd
Next 2 bits for 4th
So with a mask of 0, you're saying you want all 4 positions to get the 1st element, so you're just copying the 1st element to all positions of the xmm register
I freaking love how you demand that your viewers take responsibility for their suggestions. What a cool channel I just found. Subbed
Next video: Electron in Assembly
i watched this without breaks and my brain has been working at full capacity to keep up with everything thats happening.
best worst feeling.. im off to bed
wish i didnt miss the streams, but life got in the way
Awesome, educational & fun as usual. Thanks for the great content man.
If you want to zero a register, xor-ing it with itself is generally preferred over moving zero into it: xor reg, reg has a shorter encoding than mov reg, 0, and modern x86 CPUs recognize it as a zeroing idiom.
His laptop can do trillions of operations of that kind per second. Why even try to optimize asm?
@@iamdozerq Because you're an idiot and don't comprehend how many times CPU needs ZERO per nano second.
@@iamdozerq high level language programs can outperform poorly made assembly programs
@@iamdozerq easy to find by optimiser
@@iamdozerq "If you keep track of the pennies, the dollars will take care of themselves"
Someone else has probably mentioned it but the "d" at the end of register names stands for "double word" and the "w" for "word" where a word is 2 bytes/16 bits. The "e" at the beginning stands for "extended" and the "r" probably stands for "re-extended" or some other bs because programmers can't name things
Back in the day I found a book that described how to use the Win32 API from assembler. So I had to try it. Got some windows and graphics up. Amazingly I found it was easier to do and read afterwards than the C API.
If I recall correctly, you can pass the object file to a C compiler with a flag to not link C's `main` and to link the standard library. Then it outputs an executable that uses _start as its entry point but also adds the standard library and the other symbols that C infrastructure expects to be present in a binary
3:43:55 I already explained why it shouldn't happen (modify position and save it back), I don't know why I argue... Wait a minute, we have leftovers from the old code where it modifies the position and saves it back! Lol.
There also is the "align" macro for fasm to automatically fill the space with some bytes to align the following data.
Very informative - with a touch of genius. Well worth a subscription!
fascinating to see how the optimization works here! also lordy those variable length ops and overlapping register slices are cursed (aka, I've been ruined by risc-v)
This is great. You just showed me what I was thinking was very complicated. Kudos for video.
36:05-36:35 was the greatest 30 seconds i ever heard on this channel
I dont know how you do it but every time I get into something the next day you release a video about it
I'm more comfortable with nasm syntax but I think there are some interesting aspects in fasm, like assembler and linker information directly in the file.
ieee754 negative 0 is helpful because 1/0 results in ±infinity depending on the sign of zero which does have use cases
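A small C sketch of that behaviour, assuming the platform follows IEEE 754 for `double` (which mainstream x86 toolchains do by default). `reciprocal` is just an illustrative helper name.

```c
#include <math.h>

/* 1/x with IEEE 754 semantics: dividing by a signed zero yields an
 * infinity that carries the sign of the zero. */
double reciprocal(double x)
{
    return 1.0 / x;
}
```

Note that `0.0 == -0.0` compares equal, so the sign is only observable through operations like this division or through `signbit` from `<math.h>`.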
I am hooked, that's coding at its very best.
7:45 oh boy we're getting there, hell yeah
13:50 ....
x86_64 is a bizarre architecture to code in assembly. It has x86 operands - which are rich and were created with humans coding in assembly in mind - mixed with newer stuff which makes more sense to compiler writers... I did a lot of x86 assembly years ago but anything after Pentium felt messier than calling Windows API.
Instead of doing the weird float operations of multiplying by minus one and so on in the "Top-Left Borders" video segment, you could just use ANDPS with the mask you get from the CMPSS instruction :)
I appreciate your coding sessions. It's amazing how frequently you release; sometimes it gets hard to keep up, and I'm still watching last year's videos Lol
2:44:27 according to the GNU documentation, malloc actually aligns to 8 bytes on 32-bit and to 16 bytes on 64-bit architectures
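One way to check this on a given machine is to test a pointer returned by malloc against an alignment. The sketch below relies only on the C11 guarantee that malloc returns memory aligned for any fundamental type (`max_align_t`); the exact 8/16 byte figures are specific to the C library, and `malloc_is_aligned_to` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if a fresh malloc'd block starts at a multiple of `alignment`,
 * 0 otherwise (including when the allocation itself fails). */
int malloc_is_aligned_to(size_t alignment)
{
    void *p = malloc(64);
    if (p == NULL) return 0;
    int ok = ((uintptr_t)p % alignment) == 0;
    free(p);
    return ok;
}
```

For stricter requirements than `_Alignof(max_align_t)`, C11 also provides `aligned_alloc`.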
I work with windows instead of Linux.
I freakin' memorized the calling conventions for 32 bits and 64 bits.
In 32 bits, you just push the arguments in reverse order onto the stack, and the return value is in, iirc, eax for ints and st0 (yes, the FPU stack) for floating points. In 64 bits, you have fastcall for ints and vectorcall for floating points. In fastcall, the first four args go into rcx, rdx, r8 and r9, and the rest are pushed in reverse order onto the stack. In vectorcall, the first 4 arguments go into xmm0-3 and the rest are pushed in reverse order onto the stack; the return value is in rax for ints and xmm0 for floating points.
If an argument is too large (for example a 10 byte struct), you put the address (pointer) of the thing as the argument.
You can mix them, so if you have a function int func(int a, int b, float c, float d), in 64 bit asm, it would look like:
mov rcx, a
mov rdx, b
movss xmm2, c
movss xmm3, d
I'm aware movss doesn't allow the 2nd argument to be a number; it's just an example.
Also, movss and mulss are exclusively for floats. ss means scalar single precision. sd is for doubles, it stands for scalar double precision.
cvt instructions convert ints to floating points or vice versa, or single precision to double precision and vice versa (like the cvtss2sd instruction somewhere in the first hour, which was converting the float to a double for some reason)
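One common reason a compiler emits cvtss2sd, assuming the float was being passed to a variadic function such as printf (this is a guess about the stream's code, not something the comment confirms): C's default argument promotions turn a `float` into a `double` in the variable part of an argument list. `format_float` below is a hypothetical helper that shows the promotion point.

```c
#include <stdio.h>

/* Formats x with two decimals into buf. x is a float here, but snprintf is
 * variadic, so the argument is promoted to double at the call site; on
 * x86-64 that promotion is typically compiled to cvtss2sd. */
int format_float(char *buf, size_t n, float x)
{
    return snprintf(buf, n, "%.2f", x);
}
```

This is also why "%f" in printf works for both `float` and `double` arguments.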
I demanded Tsoding start streaming live on YT for at least 4 times per week 😅
A great idea at 3:04:01: instead of using XMM1 as a vector of floats (-1), use it as a bit mask. Since true is always all ones and false is always all zeros, you can treat the multiplication in the formula as a bitwise AND and don't have to change the formula.
Just what I was looking for. Thank you very much for your contribution.
I used to play around with some masm x64 assembly on windows a couple of years ago but SSE instructions were just too hard for me to understand. After watching this, I kinda wanna try doing that again
The address 38 in test.o can be explained as follows: check the machine code on the left, the operand to call is actually 0. Since the instruction call uses address relative to rip, a zero offset means that you are calling the function where rip currently points to, which is the next instruction at 38. The main function is put to address 0 coincidentally, so objdump displays the address relative to main, which is main+38. Now the remaining question is, why does that call instruction use a zero offset? The answer is that it is left blank for the linker to fill in.
the pain in his face when the red cube slid across the shitty box >>xddd im dying
Shader conditions aren't inherently slow. They CAN be slow because a shader executes one instruction in parallel for many different values, so if an "if" statement splits the code flow, the part that goes into the "if" body is split off and executes it (which slows things down) while the part that doesn't go into the "if" body stays still and waits until the other part returns, which slows things down further.
If the values don't diverge at the "if" statement, there's nothing to split off and nothing to wait for; execution just goes forward without any real slowdown. So in shaders it's not the "if" itself that is slow, but its consequences can be.
For example, comparing only constants or uniforms doesn't cause divergence, so it will be as fast as possible.
But if you compare an alpha value received from texture sampling, like (color.a > 0.1), it is likely that in one shader invocation some pixels have alpha 0 and others 1, so the flow diverges and the shader runs slower.
this man is insane
Numpy and Scipy have some Fortran at heart. In general, numeric algorithms written in Fortran stay in Fortran
I'd argue 'Kronecker delta' is as cryptic a name as 'Boolean logic'. It's just someone's name.
Honestly could improve this further. After doing the comparison, you did note that you get -1 if triggered. You could just literally take that result, convert to float and directly multiply that vector onto the velocity vector. You're already getting the -1s in all the correct slots of the vector, (if pos0.x goes out of bounds, you will get -1 in that slot and invert vel0.x)
3:10:32 apollo guidance computer: ‘looks good comrade’
hint: you can use gcc test.c -o test.S -S to generate assembly file
I can't keep up with your streams; right now I'm catching up on the musializer playlist. Thanks for your work
*Sorry for the english guys*
I'm currently working on a multi-threaded socket chat room in C++. All I can say is, dude, I've done a couple of chat rooms in other languages (js, ts, php, rust, nodejs...), but the complexity and difficulty of C++ in this kind of project is ABSURD. I'm stuck on how to handle multiple clients using a different thread per peer; there is something I'm missing about it. I wrote the code, but every time I delete everything and start from the beginning so that I can learn whatever I wrote down, I get stuck at the same point. Then I realized how complex C++ can be for more advanced projects when you use it by yourself without any high-level libraries to help you. It's just a totally different concept. Idk, maybe it's me, maybe I'm too dumb 🤣🤣
gonna need someone to give me the clip where jblow makes a specific float out of a hex. I need to see it!
Cool! First time I see a 4 hour-long stream.
I always thought of numpy+matplotlib as an open source version of matlab, and matlab is the new fortran in the sense that it's basically designed for maths/numerical computing/scientists, who like matrices/vectorisation... (applied math background, it comes down to modelling problems in terms of linear algebra). Fun fact, from what I remember, calling a minimisation function in scipy in python calls some C lib which still calls some old Fortran lib.
Matlab is very clunky to use. When I switched away from it entirely, for some reason it became easier to do a lot of things. Also matlab is very slow.
@@iamdozerq Haven't used it in a long time, I just didn't like the licence, and preferred python loops and objects at the time, now I've joined the rust cult
@@iamdozerq very slow is an exaggeration. if everything is correctly vectorized, you will get something at worst an order of magnitude slower than pure C and at best on par
can you graph the complex plane, and 3d for PDEs with python + maths libraries? I've done maths and programming separately but haven't combined them up to now
@@nimitzpro In short yes, but what do you mean graph the complex plane? A scalar function on the complex plane is just like one on 2D space, you use heatmaps, contours. Otherwise you can use multiple 2D plots (e.g. real and imaginary part).
3D figures in matplotlib can be a little clunky but otherwise fine. There's also libs like the js based plotly, but I always used matplotlib. I never used it (I used to embed figures in Qt), but if you want more interactive controls, I'd say have a look at dearpygui.
I know jackshit about Assembly programming, but I have found this very interesting. Makes me kinda want to learn Assembly a little bit, but it looks like such a daunting task
32:48 "As you can see we have a triangle" (shows red square)
I agree f-ing around is good for learning, it does however have the drawback that you don’t know about guarantees or exceptions, so read up
very instructive, thank you!
2:23:12 - 100% agree with Tsoding that's "gate keeping".
And I'm also 100% sure whoever wrote that shite thinks: 1. the text is clear and understandable; 2. the diagram helps a lot; 3. last case, the pseudo-code makes everything easier.
I had to deal with this kind of engineer who can't comprehend how such documentation is only clear to whoever already knows that stuff. I hate this kind of documentation. Stop writing documentation for yourself and start writing for the readers!
hey man, it's been few months since I last checked your channel, just wanted to kick in and say fck you, you are awesome
14min in & you got my brain pumped up. Time to hit gym.h
omg Rollercoaster Tycoon 4 let's go
How about making a fully functional OS that can perform all the tasks like process management, file management, etc. with C and Assembly? I'd really like to follow that journey. By the way, if you decide to make it then please make sure that it is bootable not only on Legacy BIOS but also on UEFI (Please Please make it)
Mr zozing streams are a cradle song for me
You should look into Rollercoaster Tycoon 1 & 2 for gamedev in ASM. (I've not seen the whole video yet since it was loaded up only an hour ago... ;w;)
I hope Tsoding tries "ASM++" (asm with functions, if/else, and while)
Did you mean C?
He uses C a lot, you should check out his other streams
@@Omar-fn2im I'm not stupid
@@MenkoDany I didn’t call you stupid
so... masm?
Easy for Tsoding, no-hit Dark Souls for us.
I did it in Assembler only in the 80s; since then, simply any other solution has worked fast enough.
"meine Freunde" ... i love it XD .. greetings from germany
2:18:46 I am pretty sure I figured out the shufps thingy while watching (you might have figured it out by the end of the stream idk). You have 4 float sections & you have four possible rearrangements so that’s 4^4 or 256. The last value just picks which one… The graphic they have is horrendous rofl
Like each float section in the first half of the final value can be one of four floats from the first passed floats & each float section in the second half of the final value can be one of four floats from the second passed floats. Why is it made that way? Idk.
Yeah you got it
Last time I used assembly to program anything useful was many years ago. These new instructions are a maze, or at least that's how they look to me. Perhaps it's a combination of sub-optimal documentation and my lack of experience.
yoooooo what a bad ass (sincere)
good job kiddoe i do have envy of your skills and applied interests
good job kiddoe
just a little heads-up, there's a discord bot that automatically notifies when you go live on twitch so you don't have to
shufps: you could look at imm8 as an array of four 2-bit numbers, each of which chooses which source element to copy to the destination element with the same array index. The low two results read from the old value of the destination register, the high two from the source register. So for shufps xmm1, xmm0, 0b01'00'01'00, it basically does:
xmm1[0] = xmm1[0b00];
xmm1[1] = xmm1[0b01];
xmm1[2] = xmm0[0b00];
xmm1[3] = xmm0[0b01];
Edit: seems you figured it out, the picture is awful. Interesting how x86 is full of instructions, yet the instructions of how to use them are horrible.
we want opengl in assembly next time.
This video feels pretty epic 😁
But for i32 gcc 13 still uses x87 math coprocessor commands, not SSE :(
parsing is such a weird academic discipline. So much effort goes into studying parser generators, but every useful language has a handcoded parser that's faster and gives error messages.
This is so cool man wtf
This is amazing bro
What kind of Scheiße, meine Freunde! Love your german. Und bleib wie du bist
Streaming SIMD Extension !:)
We love you from Turkey ❤❤❤
Next episode. Build system in assembly. xD
3:11:25 can't just simply mulps xmm0, xmm0? {0*0=0, 0*0=0, -1*(-1)=1, -1*(-1)=1}
INSANE !!!
what's up with the numbering in editor
Relative line numbers.
Does someone know a good place to start learning assembly, especially with the fasm syntax?
1:09:22 Stuff you need to know about Jonathan! Evil man! :D
1:14:34 True. True. _Insane in assembler._ :)
What roadmap do you recommend to learn assembly?
Uhm... Have you seen how I learn things?
Just think of something you want to do, think of all the things you need to know to do that, and then Google how to do those things.
Ok got it
having a background in how to import stuff from c will be a great start because you will be calling into them at some point or the other to get stuff done
also, "the art of assembly" is a great read, though my 2003 edition is a bit out of date
Zozin try to write code in machine code (not assembly)
Didn't I basically do that in the previous video about the JIT compiler tho? ua-cam.com/video/mbFY3Rwv7XM/v-deo.html