Answering Basic Assembly Language Questions - Assembly Language for Beginners

  • Published 21 Nov 2024

COMMENTS • 89

  • @LukeAvedon
    @LukeAvedon 2 years ago +54

    Your passion for assembly language is hugely inspiring!

    • @WhatsACreel
      @WhatsACreel 2 years ago +8

      Thank you my friend!! Cheers for watching :)

    • @NeilRoy
      @NeilRoy 2 years ago +2

      Agreed.

  • @blimolhm2790
    @blimolhm2790 2 years ago +21

    I seem to learn more from people with a temperament like yours: jovial and unassuming in instruction. Thanks for keeping the topic straightforward, short and sharp.

  • @kamertonaudiophileplayer847
    @kamertonaudiophileplayer847 2 years ago +2

    Finally, somebody does real programming.

  • @RickeyBowers
    @RickeyBowers 2 years ago +1

    Glad to see you laying some of the groundwork for assembly.

  • @dylanconway7190
    @dylanconway7190 2 years ago +3

    I could watch videos like these all day. Great way to learn something new and refresh my memory. Awesome video!!!

  • @olteanumihai1245
    @olteanumihai1245 2 years ago +1

    Highly underrated channel! Keep up the good work!

  • @BlueRider.
    @BlueRider. 2 years ago +8

    To sort 3 numbers, I'd rather suggest this method, which requires 6 operations (instead of the 8 in the video):
    min3 = min( min(a,b),c); //2 operations
    max3=max(max(a,b),c); //2 operations
    med3=max( min(a,b),min(c,max(a,b))); //4 operations, but can be done in 2 operations instead!
    As the temporary results of "min(a,b)" and "max(a,b)" can be kept from the first steps in registers, this method requires only 6 min/max operations!!!
    (BTW, no issue any more with floating point precision)
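A minimal C sketch of the six-operation idea above. The names `min2`, `max2` and `sort3_minmax` are made up for illustration, and each helper call stands in for one min/max instruction; what a compiler actually emits for them is not guaranteed.

```c
/* Hypothetical helpers: each call models one min/max operation. */
static float min2(float x, float y) { return x < y ? x : y; }
static float max2(float x, float y) { return x > y ? x : y; }

/* Sort three floats into out[0] <= out[1] <= out[2] with six min/max ops. */
void sort3_minmax(float a, float b, float c, float out[3])
{
    float lo_ab = min2(a, b);                 /* op 1 */
    float hi_ab = max2(a, b);                 /* op 2 */
    out[0] = min2(lo_ab, c);                  /* op 3: overall minimum */
    out[2] = max2(hi_ab, c);                  /* op 4: overall maximum */
    out[1] = max2(lo_ab, min2(c, hi_ab));     /* ops 5 and 6: median */
}
```

Because `lo_ab` and `hi_ab` are reused, the median costs only two extra operations, and no value is ever added or subtracted, so the outputs are exact copies of the inputs.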

    • @treelibrarian7618
      @treelibrarian7618 1 year ago +1

      Well, yes, but actually no. The XOR solution is faster, since the 2 serial xors that come after finding min3 and max3 can happen in less time than the 2 equivalent min/max ops, (latency of fp min/max = fp add/sub = 4, latency of xor = 1). The initial 2 xors can happen concurrently with the min/max ops on the third unused vector port. For the same reason using XOR also reduces load on the 2 vector min/max capable ports and enables faster looping of the whole sequence, although the difference in reality is minimal (8/3 vs. 6/2 cycles throughput - about 11% faster).

  • @aaron6807
    @aaron6807 2 years ago

    I love your videos, I can just watch them without having to think too much but still learn a lot

  • @andersjjensen
    @andersjjensen 2 years ago +4

    This was really interesting. I've only ever come across assembly in the Linux kernel's architecture-dependent code. It looks like you need to be a certain kind of masochist to enjoy the challenge of writing actual problem-solving code in assembly... I should give it a try...

  • @TerjeMathisen
    @TerjeMathisen 2 years ago +3

    The sort3() function via XOR is very neat, you could in fact do it in scalar using 64-bit integer regs!
    The scariest part when sorting fp numbers happens when you have infinities or NaNs:
    I.e. the approach of adding everything and then subtracting the min/max fails completely, both with a single Inf and with a single NaN, even though the ordering is at least defined when you have a mix of regular numbers and a single Inf.
    With NaN, all comparisons return false!
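A rough C sketch of the scalar variant mentioned above, assuming IEEE-754 doubles that are 64 bits wide. The function names are hypothetical. `median3_xor` works on the raw bit patterns in 64-bit integers, while `median3_sum` is the add-then-subtract approach for comparison. NaN inputs are not handled by either.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back (no conversion). */
static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Median via XOR: the bit patterns of min and max cancel out. */
double median3_xor(double a, double b, double c)
{
    double lo = a < b ? a : b;  lo = lo < c ? lo : c;
    double hi = a > b ? a : b;  hi = hi > c ? hi : c;
    return from_bits(bits_of(a) ^ bits_of(b) ^ bits_of(c) ^ bits_of(lo) ^ bits_of(hi));
}

/* Median via arithmetic: (a + b + c) - min - max. */
double median3_sum(double a, double b, double c)
{
    double lo = a < b ? a : b;  lo = lo < c ? lo : c;
    double hi = a > b ? a : b;  hi = hi > c ? hi : c;
    return (a + b + c) - lo - hi;
}
```

With a single infinity among the inputs, the XOR version still returns the middle value unchanged, whereas the arithmetic version ends up computing Inf - Inf, which is NaN.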

  • @kjrl818
    @kjrl818 1 year ago

    I've been learning about the open source RISC-V assembly. Liking it so far.
    Also, keep up the good work.

  • @quantumlightum
    @quantumlightum 9 months ago

    Just wanted to say a huge thanks for your videos about modern x64 assembly. I'm a teacher preparing a course on x64 assembly, and your videos helped me a lot. Many thanks. Subscribed to the channel. Concerning the side effect of 32-bit operations on the high 32 bits of 64-bit registers: after some searching, it seems that there are no physical RAX, RBX, etc. registers. Instead there is a bank of registers that are allocated depending on the instructions and then merged when the instructions complete... maybe for optimisation reasons it is faster to just put zeros in the high 32 bits... but indeed it's a strange effect.

  • @change_profile_n8755
    @change_profile_n8755 2 years ago +3

    So I found your content due to the fact that I'd like to start with an understanding of how computers work. I just started out learning Java as well as assembly. I don't do that for commercial reasons but out of fascination. Thank you for your work! Greetings from Switzerland 🍾
    BTW: I have no experience in programming/computer architecture, but built an 8-bit calculator in Minecraft (Redstone). This gave me a huge fascination with core concepts in CS and EE. Highly recommend this!

    • @Eidolon2003
      @Eidolon2003 2 years ago +1

      I highly recommend Ben Eater if you're interested in the nitty gritty details of the hardware. I built a redstone computer as well as a physical one based on what I learned from his videos!

    • @change_profile_n8755
      @change_profile_n8755 2 years ago

      @Eidolon2003 I actually was watching his "Hello, World from scratch" video about a week ago :D. Oh wow, a physical one as well? How was the process?

    • @Eidolon2003
      @Eidolon2003 2 years ago

      @change_profile_n8755 It was a long but very rewarding process. Being able to say that I fully understand how it functions is really cool, especially since I didn't really stick to Ben's design at all. His is really simple, but not very capable tbf. I think knowing how a simple computer like that works makes it easier to understand how a modern x86/64 system works too. Honestly modern computers never cease to blow me away. They're so complex it's insane lol

  • @thatcrockpot1530
    @thatcrockpot1530 2 years ago

    I'm always happy to see you posting

  • @NeilRoy
    @NeilRoy 2 years ago +2

    Fascinating stuff. Love your videos, I'm always impressed by your knowledge and find this all VERY interesting! Keep up the good work, thanks. 🙂

    • @WhatsACreel
      @WhatsACreel 2 years ago +1

      Cheers mate! Thanks for watching :)

  • @unperrier5998
    @unperrier5998 2 years ago +3

    There's another concern with the min/max/subtract technique for sorting 3 numbers that is worse than losing floating point precision: if all three floating point numbers are close enough to the absolute maximum representation, adding them will overflow.
    Not sure what an overflow looks like with floating points, but if it's like with integers you'll get something very wrong in the end.
    In any case, thanks for the video, that's interesting. I'd be for a follow-up with more usual patterns and tricks. And maybe another video about ARM and RISCV assembly at some point?

    • @aaron6807
      @aaron6807 2 years ago

      An overflow in floating point will probably either be an Inf or a NaN
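To make the overflow concern concrete, here is a small C sketch, assuming IEEE-754 single precision. The name `median3_sum_f` is invented for this example. The `volatile` is only there so the partial sum is rounded to `float` even on targets that evaluate expressions in wider precision.

```c
/* Median via (a + b + c) - min - max, in single precision. */
float median3_sum_f(float a, float b, float c)
{
    float lo = a < b ? a : b;  lo = lo < c ? lo : c;
    float hi = a > b ? a : b;  hi = hi > c ? hi : c;
    volatile float sum = a + b;   /* stored as float: may overflow to +Inf */
    sum = sum + c;
    return sum - lo - hi;         /* Inf minus finite values stays Inf */
}
```

With two inputs around 3.0e38 the sum exceeds the largest finite `float`, becomes +Inf, and the subtractions cannot bring it back, so the function returns Inf rather than the true median. Unlike integer overflow, the value does not wrap around.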

  • @programaths
    @programaths 2 years ago +2

    Reminder of XOR⊕ property:
    a⊕a=0
    a⊕b=b⊕a
    (a⊕b)⊕c=a⊕(b⊕c)=a⊕b⊕c
    a⊕0=a
    So, if we call the 3 registers a, b and c respectively, and min and max m and M respectively, we have the following expressions:
    a⊕b⊕c (xor the 3)
    (a⊕b⊕c)⊕m⊕M (and xor with min and max)
    Let's say m=a and M=c (could be any pair), then the expression becomes:
    (a⊕b⊕c)⊕a⊕c
    Per the above properties we can remove the parentheses:
    a⊕b⊕c⊕a⊕c
    We can move values and group them:
    (a⊕a)⊕b⊕(c⊕c)
    We can also reduce the parentheses:
    0⊕b⊕0
    Which evaluates to:
    b
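The cancellation argument above can be checked directly on unsigned integers. This is only an illustrative sketch; `median3_xor_u32` is a made-up name.

```c
#include <stdint.h>

/* XOR all three values with their min (m) and max (M); what is left is the median. */
uint32_t median3_xor_u32(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = a < b ? a : b;  if (c < m) m = c;   /* minimum */
    uint32_t M = a > b ? a : b;  if (c > M) M = c;   /* maximum */
    return a ^ b ^ c ^ m ^ M;
}
```

It also holds when values repeat: for (4, 4, 9) the expression is 4⊕4⊕9⊕4⊕9, which leaves 4.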

  • @first-thoughtgiver-of-will2456
    @first-thoughtgiver-of-will2456 2 years ago +3

    We need to get all the language experts in a room (you being one of them) and create another assembly abstraction like C but with modern memory protection and better/modern op representation built in to the syntax but still being a "mid/low level" structured typed functional programming language that closely represents the codegen.

    • @alexvitkov
      @alexvitkov 2 years ago

      ok

    • @lapatatadelplato6520
      @lapatatadelplato6520 2 years ago

      you're not gonna get a functional programming language if you're abstracting assembly. You'd be better off making a procedural language bc it fits the architecture more, but C already exists, so I don't see the point.

  • @ebbflow4591
    @ebbflow4591 2 years ago

    Excellent stuff, amazing channel!!!!!

  • @devmishra18
    @devmishra18 2 years ago

    I don't even wanna learn assembly, but I still watch your videos as they make me feel smart.

  • @programaths
    @programaths 2 years ago +3

    Oh, just discovered CMOV 😢 That would have been extra useful when I was doing CS. The worst part is that I read through helppc at the time to find useful mnemonics we didn't learn. I don't know how I missed that one! That shows how basics can benefit everyone ^^

    • @TerjeMathisen
      @TerjeMathisen 2 years ago +1

      CMOV (on x86 cpus) is very rarely a win! The branch predictors are so good that it is _almost_ always faster to simply load the one possible return value, then branch over a single instruction that loads the alternative:
      ;; EAX has a, EBX has b, return the smaller of a & b:
      cmp eax,ebx
      jl done
      mov eax,ebx
      done:
      When EAX is the prevalent smaller value, then the cpu will predict this correctly and run the entire block in a single cycle (or even less if there is some other work which can overlap).
      With EBX being the return value we also have to execute the MOV, but this can be done in the renamer and so doesn't actually take any cycles! 🙂
      The CMOV version will always take the same number of cycles, typically 2 or 3. (There are other architectures where CMOV is much faster, sometimes down to one or zero cycles.)
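For readers more at home in C, here is a hedged sketch of the two shapes being compared. The function names are invented, and a compiler is free to turn either one into a branch or a conditional move, so this shows the logic only, not guaranteed code generation.

```c
#include <stdint.h>

/* Branchy form: mirrors the cmp / jl / mov sequence above. */
int32_t min_branch(int32_t a, int32_t b)
{
    if (a < b)
        return a;      /* predicted path when a is usually the smaller one */
    return b;
}

/* Branch-free form: picks a or b with a bit mask, as a select would. */
int32_t min_select(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);     /* all ones if a < b, otherwise 0 */
    return (a & mask) | (b & ~mask);
}
```

Both return the same value for every input. Which one runs faster depends on how predictable the comparison is in the surrounding code, which is the point the comment makes.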

    • @Double-Negative
      @Double-Negative 2 years ago

      The renamer can take extra time depending on surrounding code, so it’s not always free

    • @TerjeMathisen
      @TerjeMathisen 2 years ago

      @Double-Negative Sure, I thought that was clear from the way I wrote it, but I see now that it wasn't. Anyway, absolute worst case a MOV REG,REG takes a single cycle unless the CPU is from before about 1992 (Intel 486). 🙂

  • @maxmuster7003
    @maxmuster7003 2 years ago +1

    Using Intel syntax:
    1. Fast addition with 16-bit instructions
    LEA bx, [bx+si] ; no memory access, no flags touched, result has to fit the target
    32 bit:
    LEA ecx, [ecx+eax]

  • @dookshi
    @dookshi 2 years ago

    Great content, always keeping me stoked for the next video. For clarity's sake, don't you think you should update the leftover comments that still state that xoring "sums" or "subtracts"? You even sinfully say it out loud. 🙂 It accumulates and extracts, which is good enough and just what we want, but far from adding or subtracting. Keep it up pleeeease! 👍

  • @andrepoelman416
    @andrepoelman416 2 years ago

    Nice video! I am probably too late to the party, but I think you didn't answer question 3. The question was how to count bits in a dword on an 8086 (16-bit processor). No fancy bitcount instructions there.
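One way to approach that question, sketched in C rather than 8086 assembly and not taken from the video: treat the dword as two 16-bit halves and count each half with a loop that clears the lowest set bit on every pass. The function names are hypothetical.

```c
#include <stdint.h>

/* Count set bits in a 16-bit word; the loop runs once per set bit. */
static unsigned popcount16(uint16_t w)
{
    unsigned n = 0;
    while (w) {
        w &= (uint16_t)(w - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count set bits in a 32-bit value using only 16-bit pieces. */
unsigned popcount32_via16(uint32_t x)
{
    return popcount16((uint16_t)(x & 0xFFFFu)) + popcount16((uint16_t)(x >> 16));
}
```

On a 16-bit CPU the two halves would simply be the two registers holding the dword, and the loop needs nothing beyond subtract, AND and a conditional jump.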

  • @zxuiji
    @zxuiji 2 years ago

    Just a note for those implementing FPN comparisons via binary: treat the sign, exponent & mantissa as separate comparisons:
    int cmpf( fpn a, fpn b )
    {
        int sigA, sigB, expA, expB;
        intmax_t manA, manB;
        /* Extract info */
        ...
        if ( sigA - sigB )
            return -(sigA - sigB);
        if ( expA - expB )
            return -(expA - expB);
        return cmp(manA,manB,bits);
    }
    fpn minf( fpn a, fpn b ) { return cmpf(a, b) < 0 ? a : b; }
    fpn maxf( fpn a, fpn b ) { return cmpf(a, b) > 0 ? a : b; }
    Doing it that way avoids the possibility of incorrect return values (provided I got the signs the right way round in cmpf)
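A compilable variation on the same idea, assuming 32-bit IEEE-754 `float` and ignoring NaNs. It is a sketch under those assumptions, not the commenter's code, and the names `order_key` and `cmpf_bits` are invented. Since the exponent field sits directly above the mantissa, comparing the 31 magnitude bits as one integer handles exponent and mantissa together. For two negative values the magnitude ordering has to be reversed, which the sign handling in `order_key` takes care of.

```c
#include <stdint.h>
#include <string.h>

/* Map a float to an integer whose ordering matches the float ordering. */
static int32_t order_key(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);                  /* raw IEEE-754 bits */
    uint32_t magnitude = u & 0x7FFFFFFFu;      /* exponent and mantissa */
    if (u & 0x80000000u)                       /* sign bit set: negative */
        return -(int32_t)magnitude;            /* bigger magnitude sorts lower */
    return (int32_t)magnitude;
}

/* Returns a negative, zero or positive value, like strcmp. */
int cmpf_bits(float a, float b)
{
    int32_t ka = order_key(a), kb = order_key(b);
    return (ka > kb) - (ka < kb);
}
```

Note that +0.0 and -0.0 both map to key 0, so they compare equal, as they do with ordinary float comparison.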

  • @mikesbasement6954
    @mikesbasement6954 2 years ago

    Another potential problem with the additive method is the potential for overflow. Sure, it's not likely with three values, but it is possible. What instruction format do you prefer (AT&T, Intel, NASM)? Over the years I've found I'm liking nasm more.

  • @zeyogoat
    @zeyogoat 2 years ago

    You've been teaching this chem teacher to code for years now. Cheers! One question I have: How would you sort *four* numbers in asm?

    • @WhatsACreel
      @WhatsACreel 2 years ago +2

      Ha! I reckon BubbleSort would do the trick :)

  • @allmycircuits8850
    @allmycircuits8850 2 years ago

    A drawback of these sorting methods is that they can't be applied when there are not only "keys" but also values that should be sorted along with those keys. Plain old bubble sort in that case, I presume...
    Nice video nevertheless!

  • @maxmuster7003
    @maxmuster7003 2 years ago

    x86 conditional jump instructions:
    for unsigned values
    JA jump above
    JB jump below
    ...
    for signed values
    JG jump greater
    JL jump less
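The signed/unsigned split matters because the same bit pattern can compare differently. A small C sketch with made-up function names; the cast to `int32_t` assumes the usual two's complement behaviour.

```c
#include <stdint.h>

/* Unsigned comparison: the kind of test JB/JA are meant for. */
int below_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Signed comparison of the same bits: the kind of test JL/JG are meant for. */
int less_signed(uint32_t a, uint32_t b)
{
    return (int32_t)a < (int32_t)b;
}
```

For a = 0xFFFFFFFF and b = 1, the unsigned view says a is far above b, while the signed view reads a as -1 and says it is less than b.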

  • @ged9925
    @ged9925 2 months ago

    Excellent!

  • @programaths
    @programaths 2 years ago

    Also, fp errors are "weird" as the gap between "consecutive" numbers just widens like crazy as you get far from 0 (expected, since the mantissa has a finite precision ^^).
    I think that fp introduces too much weirdness because of that and can be a big hurdle for beginners.
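The widening gap can be measured directly. This sketch assumes 32-bit IEEE-754 `float` and a positive, finite input below the largest finite value; `gap_above` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Distance from x to the next representable float above it. */
float gap_above(float x)
{
    uint32_t u;
    float next;
    memcpy(&u, &x, sizeof u);
    u += 1;                        /* next bit pattern is the next float up */
    memcpy(&next, &u, sizeof next);
    return next - x;
}
```

Under these assumptions the gap is 2^-23 (about 1.2e-7) at 1.0, exactly 2 at 16777216 (2^24), and 64 at one billion, so whole integers can no longer be told apart that far from zero.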

  • @pierreuntel1970
    @pierreuntel1970 2 years ago

    Oh yes, I remember when I was a young lad and started writing my first code in AutoIt and trying to figure out what ASM is, and I was like... "WTF are these? Are they just there for moving numbers around, adding and subtracting them? What for?" as I was trying to create a program with a nice UI and messageBox and stuff... I'm pretty sure there are many people out there having the same question when looking at ASM at first ;) One day it just clicks, and I still have no idea what I'm doing with ASM most of the time, but I can read and understand some parts of it.

  • @rsk5714
    @rsk5714 2 years ago

    Hey man, you've got nice skills, and you look & talk similar to Mr. Rocky Balboa! 👍🙂

  • @michaelwilkes0
    @michaelwilkes0 1 year ago

    If we can use xor to get perfect float math, why don't all CPUs just always do that? Why is floating point error still a problem we have to deal with? Even assuming that trick does not work with multiply and divide, making add and subtract perfect would be amazing.

  • @Eidolon2003
    @Eidolon2003 2 years ago

    Could you possibly do an explanation of how to call C library functions like puts() from assembly, or maybe just link to a guide with the correct answer? I've found a couple different guides online and I couldn't get it to work for one reason or another. I'm just not experienced enough to know why. I'm using VS2022 btw

    • @mp-lv8bw
      @mp-lv8bw 2 years ago

      he covered this 11 years ago
      ua-cam.com/video/txFXiFafTTc/v-deo.html

  • @gower1973
    @gower1973 2 years ago

    What’s your day job? Are you a systems engineer? Or do you contribute to open source projects, is that ebook just for patreons or can anyone read it?

  • @davidprock904
    @davidprock904 2 years ago +2

    So what about writing an application like FreeCAD or Autodesk Fusion 360 in assembly language?

    • @WhatsACreel
      @WhatsACreel 2 years ago +2

      I would personally write the forms, buttons and front end in C++ or C#, and just keep ASM for the number crunching. I'm not sure I have the engineering skill to organize a very large scale, 100% Assembly project like that!
      It would certainly be a challenge :)

    • @pyromen321
      @pyromen321 2 years ago

      RollerCoaster Tycoon comes to mind!

    • @adivp7
      @adivp7 2 years ago +2

      That would be a massive drain of time and effort. Not much to gain and much to lose. What's to be written directly in assembly has to be important enough to be justified being written in assembly.

  • @williamdrum9899
    @williamdrum9899 2 years ago

    I have a question about ARM Assembly. If you use malloc, will the kernel try to give you a pointer that is 8-bit rotatable (i.e. can be loaded into a register using a single instruction?)

  • @dennisrkb
    @dennisrkb 2 years ago +1

    Does a jump always have to immediately follow a cmp? Or could you execute some other instructions in-between?

    • @WhatsACreel
      @WhatsACreel 2 years ago +4

      Some instructions don't affect the flags, so you can execute some instructions between. Mostly MOV doesn't change the flags. Usually the CMP and Jcc are close by though.

    • @williamdrum9899
      @williamdrum9899 2 years ago +1

      Usually on x86 the answer is yes, but ultimately it depends on the instructions you're using. On ARM, RISC-V, and MIPS you can do whatever you want in between.

    • @pyromen321
      @pyromen321 2 years ago +1

      If you look at compiler output, jumps are often put far after cmps or other flag altering instructions! I’ve seen a loop where the comparison was a SUBS instruction at the very top of the loop like 40 instructions before the branch.
      I think compilers strive for this because it essentially guarantees that the comparison result will be completely done before the branch is hit, preventing branch prediction miss penalties.

    • @maxmuster7003
      @maxmuster7003 2 years ago +1

      Imagine: since the Intel Core 2 architecture we can execute 4 integer instructions in parallel, if there are no dependencies between them and if the code has a good mixture of complex and simple instructions in the pipelines. This is not a CISC CPU, it is a mixture of RISC and CISC. The CPU splits complex x86 instructions into micro-ops to execute on some of the RISC units.

    • @dennisrkb
      @dennisrkb 2 years ago

      @pyromen321 Could that actually become counter-productive at some point? I.e., could it happen that by the time you reach the jmp, the cmp has been evicted from the instruction cache?

  • @Pistolsatsean
    @Pistolsatsean 1 year ago

    I've got a question about assembly.
    If you have a determinate size loop, does it execute faster if written out line by line? (even if marginally so)

  • @zxuiji
    @zxuiji 2 years ago +1

    12:55, nah if I was to code that I would've skipped eax completely:
    mov ecx, 17
    mov edx, 32
    cmp ecx, edx
    cmovl edx, ecx
    ret

  • @infinitesimotel
    @infinitesimotel 2 years ago +1

    I love assembly, the problem is that it gets too addictive and you want to use it everywhere LoL..

  • @lohphat
    @lohphat 2 years ago

    How do compilers generate object code which can run on the variety of AMD64 family CPUs?
    There are so many variants which have extended complex action opcodes, how can the compiler know when to use those opcodes? I know there are compiler flags but in software distribution it’s impossible to know ahead of time which CPU instructions are supported. How is this handled at runtime?

    • @rsa5991
      @rsa5991 2 years ago

      There is a CPUID instruction, that returns information about supported instruction sets. You can patch your code at the start of the program. You can also compile several versions and make installer pick one, depending on CPUID.
      But the default behavior is to target old enough CPU and just crash, if even older one is present.
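As one possible shape for the runtime check described above, here is a sketch that relies on the GCC/Clang built-ins `__builtin_cpu_init` and `__builtin_cpu_supports`, which wrap CPUID on x86. Everything else is an assumption made for illustration: `sum_wide` is only a stand-in for a routine that would really use AVX2, and `pick_sum` is an invented name. On other compilers or architectures the sketch falls back to the baseline.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Baseline path: safe on any CPU. */
static long sum_baseline(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a path that would use newer instructions (here just unrolled). */
static long sum_wide(const int *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i]; s1 += v[i + 1]; s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < n; i++)
        s0 += v[i];                 /* leftover elements */
    return s0 + s1 + s2 + s3;
}

/* Choose an implementation once, at program start. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_wide;
#endif
    return sum_baseline;
}
```

A program would call `pick_sum()` once and keep the returned pointer, so the feature check is not repeated on every call.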

  • @yusufat1
    @yusufat1 2 years ago

    what happened to FADD (float add) instruction, why do we use SIMD all the time for one floating-point value?

    • @mp-lv8bw
      @mp-lv8bw 2 years ago

      faster and easier to use simd registers

  • @zxuiji
    @zxuiji 2 years ago

    19:17, I thought you would do x = min(a,b), y = max(b,c), z = (a+b+c)-(x+y)
    **Edit:** Gave it more thought and noticed a scenario where the wrong answer would be given, I'll leave finding that as a thought exercise for peops who care

  • @ChrisM541
    @ChrisM541 2 years ago

    Excellent video, cheers for the upload. I wish there were conditional moves back in the day with the 6502/10, then again, I like spaghetti.
    For the count-the-set-bits question, using that 8-bit 6510, I would look at bit shifting, e.g. ASL of the value being examined (split over 4 bytes), examining the carry flag and incrementing a counter if set. I wrote the following as one way to do the job.
    ldy #4 ;4 bytes
    lp0 lda Data-1,y
    ldx #8 ;8 bits
    lp1 asl
    bcs BitSet
    lp2 dex
    bne lp1
    dey
    bne lp0
    rts
    BitSet inc Count
    bne lp2 ;faster than jmp and ok to use as long as Count never wrapped to 0
    ;faster if below in zero page
    Count byte $0
    Data byte %11110000,%00000111,%10101010,%11100111
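For anyone who does not read 6502, this is roughly the same shift-and-test-carry loop in C. It is a loose translation for illustration only, with an invented function name, and makes no claim about cycle counts.

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits by shifting each byte left and adding the bit that falls out. */
unsigned count_bits_shift(const uint8_t *data, size_t len)
{
    unsigned count = 0;
    for (size_t i = len; i > 0; i--) {          /* walk the bytes from last to first */
        uint8_t byte = data[i - 1];
        for (int bit = 0; bit < 8; bit++) {     /* 8 shifts per byte */
            unsigned carry = byte >> 7;         /* the bit ASL would move into carry */
            byte = (uint8_t)(byte << 1);
            count += carry;                     /* bump the counter when carry is set */
        }
    }
    return count;
}
```

For the four data bytes in the listing above (%11110000, %00000111, %10101010, %11100111) the per-byte counts are 4, 3, 4 and 6, giving 17 in total.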

    • @rsa5991
      @rsa5991 2 years ago +1

      Instead of conditional jumping, you can use ADC #0 to add CF to A.

    • @ChrisM541
      @ChrisM541 2 years ago

      @rsa5991 That's a really good idea. Could you provide a working example for the 4 bytes? My brain was too sore to write an optimised version for the 4 bytes.

    • @rsa5991
      @rsa5991 2 years ago +2

      @ChrisM541 I don't have any 6502 tools, so I cannot confirm it working, but:
      ldy #4 ;bytes
      lda #0 ;count
      byteLoop ldx Data-1,Y
      stx 0 ;use zero page to store current byte
      asl 0 ;"prime" the loop
      bitLoop adc #0 ;spoils ZF, so should be before ASL
      asl 0 ;sets CF for ADC, ZF on last '1' shifted out
      bne bitLoop
      adc #0 ;count last '1'
      dey
      bne byteLoop
      rts ; reg A holds the result
      UPDATE: Checked it on an online emulator and it seems to be working. Runs in 312 cycles vs. your 506.

    • @ChrisM541
      @ChrisM541 2 years ago +1

      @rsa5991 Fantastic! Works perfectly! - I've CBM Prg Studio installed, working on an old platformer CDU magazine entry I submitted a 'wee while' ago - wish I had that utility then.
      Always nice to see different ways to solve problems, cheers.

  • @FalcoGer
    @FalcoGer 2 years ago

    Okay, adding two numbers together is easy. But that's not really helpful, is it? What we want isn't to read the result off in the debugger, or to change our code to change which numbers to add. In other words, I/O is missing. I know that to deal with I/O you use syscalls, system interrupts or, in embedded devices, access the mapped memory of attached devices and read/write values to specified addresses which map to those devices, possibly in response to an interrupt.

  • @bpark10001
    @bpark10001 1 year ago

    Your second scheme is what I refer to as "converting algorithmic operation to arithmetic operation", eliminating branches. The routine is "straight-line code". The complexity of code is proportional to the number of branches in it.
    Your scheme for calculating the number of 1's in a number only works if you have the special instruction. Without one, I have a scheme for determining if the number has one or fewer 1's in it. Copy the number to another register. Decrement one of the numbers. Then do a bitwise AND between the 2 numbers. If it is zero, the number had one or fewer 1's.
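The decrement-and-AND test just described fits in one line of C. The function name is invented for this sketch.

```c
#include <stdint.h>

/* Nonzero if n has at most one bit set (zero counts as "one or fewer"). */
int at_most_one_bit(uint32_t n)
{
    return (n & (n - 1u)) == 0;   /* n - 1 flips the lowest set bit and everything below it */
}
```

Subtracting 1 clears the lowest set bit and sets all the bits beneath it, so the AND is zero exactly when there was no other set bit above it.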
    ROUNDING CRAP: get RID of floating point math! Floating point math belongs only in hastily "slapped together" programs written to get a quick answer. Most programmers are too lazy to properly scale their numbers.
    For the sort: as you explain in your sort videos, there are 3! = 6 possible outcomes. Do 3 compares, say 1&2, 2&3, & 1&3. After each compare, shift the carry flag (set if 1st arg >= 2nd arg) into a precleared register with SHIFT LEFT (with carry). You have a 3-bit number (8 possible outcomes, of which 6 are "legal"). Use this to index into a look-up table of the swaps required. Put the index of the "get" of the swap into the table of 8 entries.
    Example: let's say : A>B, A
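One possible reading of the look-up-table sort, sketched in C. This is an interpretation of the comment, not the commenter's code: the table stores, for each 3-bit compare result, which input goes into each output slot, and the two combinations that cannot occur get a harmless filler entry. All names are made up.

```c
/* Sort three ints with three compares and one table look-up. */
void sort3_table(const int in[3], int out[3])
{
    /* order[idx][k] = index of the input that belongs in output slot k */
    static const unsigned char order[8][3] = {
        {0, 1, 2},   /* 000: a <  b <  c           */
        {0, 1, 2},   /* 001: cannot occur (filler) */
        {0, 2, 1},   /* 010: a <  c <= b           */
        {2, 0, 1},   /* 011: c <= a <  b           */
        {1, 0, 2},   /* 100: b <= a <  c           */
        {1, 2, 0},   /* 101: b <  c <= a           */
        {0, 1, 2},   /* 110: cannot occur (filler) */
        {2, 1, 0},   /* 111: c <= b <= a           */
    };
    unsigned idx = 0;
    idx = (idx << 1) | (unsigned)(in[0] >= in[1]);   /* compare 1 & 2 */
    idx = (idx << 1) | (unsigned)(in[1] >= in[2]);   /* compare 2 & 3 */
    idx = (idx << 1) | (unsigned)(in[0] >= in[2]);   /* compare 1 & 3 */
    for (int k = 0; k < 3; k++)
        out[k] = in[order[idx][k]];
}
```

There are no data-dependent branches here: the three compare results are shifted into an index and the table supplies the permutation.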

  • @SimGunther
    @SimGunther 2 years ago

    I think the important takeaway here is that if you haven't experienced the pain of making your assembler for a fictional CPU, you don't truly know the assembly meta.

  • @waynemv
    @waynemv 2 years ago

    How, in assembly language, does one write a function that takes two or more arguments and returns a result? And how does one afterwards call that function from other languages, such as Python, C++, C#, or F#?

    • @WhatsACreel
      @WhatsACreel 2 years ago +2

      Well, there’s videos on here for doing some of these things. Mostly very old videos. Calling from C++ is easier than C#, and I found that writing a wrapper in C++, and then calling that from C# was maybe the best way? C# just has a lot of extra type safety and memory management issues you have to work with.
      To call native code from C#, I have found it convenient to compile the native code to a DLL. Then you use something like ‘interopservices’ and ‘importdll’ from C# to import the functions you want to use. Something like that, you’d have to look up the details.
      As for calling native ASM from C++, there’s a lot of ways. If you’re in 32 bit, you can code inline ASM. If you’re in x64, then it’s a little trickier, but a lot of the videos on my channel here involve calling ASM from C++, so maybe if you have a look at one of the early ASM and C++ vids, you will see one way to do this. I’m pretty sure we did this in the very first video I uploaded.
      You might want to try assemble to a library file, either LIB or DLL, and link to that in your C++. I do not usually do this in these videos because they’re usually just little code snippets, but in a real project it helps to set things out like that. Then you’re looking for how to call a native DLL from C++, which is bound to get plenty of results on googs.
      As for Python and F#, I must say I have no idea sorry.
      Hope this helps, have a good one :)

    • @waynemv
      @waynemv 2 years ago

      Thank you.
      I've figured out how to do some of that. I have DLLs created from legacy code written in Fortran that I then call from C# using interop services. But in that situation, the Fortran compiler made the DLL for me, so I didn't learn much, if anything, about the internal layout of the DLL in the process.
      Do you have any link to clear instructions on how to code a DLL from scratch in pure assembly?

  • @decky1990
    @decky1990 2 years ago

    Do you have Irish in your family??

  • @gilmannayeem4340
    @gilmannayeem4340 2 years ago

    But that comment is 4 months old

  • @dennisrkb
    @dennisrkb 2 years ago +1

    Damn mate it's nice to have you back but you put on some weight. Please don't let it get any worse!
