Assembly Language Misconceptions

  • Published 4 May 2024
  • Support What's a Creel? on Patreon: / whatsacreel
    Office merch store: whats-a-creel-3.creator-sprin...
    FaceBook: / whatsacreel
    In this video we look at some misconceptions about Assembly language. Apologies for the sound in this one (and possibly the next too!), the main audio was not usable so I have used the camera mic instead.
    Instruction listings by Agner Fog: www.agner.org/optimize/instru...
    Software used to make this vid:
    Blender: www.blender.org/
    Audacity: www.audacityteam.org/
    OBS: obsproject.com/
    Davinci Resolve 16: www.blackmagicdesign.com/prod...
    OpenOffice: www.openoffice.org/
    Gimp: www.gimp.org/

COMMENTS • 582

  • @mheermance
    @mheermance 3 years ago +394

    I learned to program in the 80s when compilers stunk, and it was a piece of cake to beat them with hand coded assembly. As a result many projects were written in assembler to run on older and newer hardware. The advent of efficient compilers was a godsend, and for work I was glad to see it sidelined. But for fun I still code in assembly because building high level features like lambda functions or garbage collectors from the ground up teaches you a great deal.

    • @sallylauper8222
      @sallylauper8222 3 years ago +12

      Yeah, I thought it was really interesting that he said that today to write faster assembly you have to know all the tricks of the compilers.

    • @SunMasterXIV
      @SunMasterXIV 3 years ago +12

      I used Lattice C (and 68k assembly) on the Amiga in the 80s, and I thought it was pretty good. But the way modern compilers are able to optimize the code is sometimes amazing. It doesn't help hand-tailored assembly that so many x64 CPU variations are available, with instruction execution times that vary between them.

    • @AURORAFIELDS
      @AURORAFIELDS 3 years ago +3

      68000 is a good example of why C compilers are not good for everything. A lot of the efficient code relies on passing arguments via registers, while C relies on stack frames. Memory access on the 68000 is really slow, so automatically C will be slow too.

    • @mheermance
      @mheermance 3 years ago +7

      @AURORAFIELDS true, but many C compilers implement fast call linkage. They pass by registers and the called function saves on the stack if it calls another function.

    • @Ehal256
      @Ehal256 3 years ago +1

      @mheermance finding a compiler that does that for the 68k nowadays, however, is quite difficult. GCC doesn't, and while LLVM recently added support, I doubt it does either. Maybe something from the 80s, but I'd rather code things by hand when performance is really important.

  • @spacewolfjr
    @spacewolfjr 3 years ago +191

    I work in CyberSecurity and end up using assembly a lot when reverse engineering / disassembling malware, it's an essential skill for that kind of work

    • @shanehebert396
      @shanehebert396 3 years ago +28

      Well... you have to since I doubt the malware writers are going to give you the source and all you have is the executable ;)

    • @tappineapple3381
      @tappineapple3381 3 years ago +5

      Did you go to college? If so what did you major in? I am currently a junior in high school and I would like to further learn about reverse engineering and getting better with stuff like IDA and reclass. Any advice?

    • @y2ksw1
      @y2ksw1 3 years ago +1

      Agreed.

    • @y2ksw1
      @y2ksw1 3 years ago +11

      @tappineapple3381 I suggest disassembling viruses. Most of them are brilliant examples of engineering and most of them are made by true masters of the art.
      The next step I suggest is to make your own operating system. If you master this step, you will have no problem solving all the other problems you may come across.

    • @tappineapple3381
      @tappineapple3381 3 years ago +4

      @y2ksw1 Thank you! I have been following the tutorials on guided hacking and I have very much enjoyed reversing video games, and I feel like malware would be the next best step. Now, making an operating system scares me.

  • @randyscorner9434
    @randyscorner9434 3 years ago +80

    With current compiler technology there is one area where the move to assembly provides massive advantages. That is when you can vectorize the code to fully use the SSE and MMX extensions. For one routine, unrolling the loop 1 time fit the register set, allowed 8 wide vector calculations and increased the overall performance of a high end electronic piano by 12X. This was sufficient to move the program off a new Mac to a RPI3. The load went from 40% of the CPU on the Mac to 9% of the CPU on the RPI3 with just one thread. Getting to this point with a high level programming language requires a different compiler and coupling that to C or C++ is much harder than doing the 60 assembly instructions by hand.
    It's all about how badly one or two routines dominate the runtime. It's often the case that these "hotspots" can get extra love and show major performance improvement. Of course, the best optimization would be to stop using Python as production code.....

    • @thomasmaughan4798
      @thomasmaughan4798 2 years ago +26

      "Of course, the best optimization would be to stop using Python as production code"
      LOL 🙂

    • @FM-tq2gs
      @FM-tq2gs 1 year ago +2

      Newbie question: why can't compilers do that kind of optimization? Will they be able to one day?

    • @Mr8lacklp
      @Mr8lacklp 1 year ago +15

      @FM-tq2gs they will be able to do it sometimes in the future, but there are really two problems here:
      One is that the compiler can only do an optimization if it can prove that it won't change the behavior of the program for any value it might possibly see, and it simply doesn't have all the information, as all it sees is the source code. You might for example have a number that represents the day of the week, so *you* know it's never going to be greater than seven, but the compiler can't know that, so it can't apply any optimizations that assume that the number won't be greater than seven. So there are some optimizations you can do that are literally impossible for a compiler to do, no matter how advanced.
      The other problem is that both finding an optimization and proving that it doesn't change the behavior of the code are very difficult and not generally things computers can do at all. And this is where compilers are steadily getting better, but it's very possible that there are some optimizations that will just never be worth the longer compile times or the effort of implementing them.

    • @FM-tq2gs
      @FM-tq2gs 1 year ago +3

      @Mr8lacklp thank you for the explanation!

    • @robegatt
      @robegatt 9 months ago

      @Mr8lacklp yeah, that is why some programming languages are better than others... a Pascal compiler could easily do what you said in the first example.

  • @craigmhall
    @craigmhall 9 months ago +6

    I rarely write in assembly any more, but it's good to know for:
    -debugging release / optimized code
    -studying the generated assembly and finding ways to tweak the source code to generate better assembly
    -generally understanding how the machine works, what is expensive and what is not

    • @AliaAsten
      @AliaAsten 9 months ago +2

      This! I personally write asm only as a hobby for microcontrollers, where cycle-level timing is sometimes required (the rest of the time C suffices), but I read it a lot more as disassembled code for the reasons you mentioned.

  • @wingunder
    @wingunder 3 years ago +63

    "If you can help yourself, try not to write a virus." 😂😂😂
    You should put this quote on a t-shirt. Your sense of humor is simply wicked 👍

    • @OpenGL4ever
      @OpenGL4ever 9 months ago

      I love that line.
      And the background to that is, if you can do that, you don't need to write a virus. You will also find a well-paid job without having to drift into the criminal corner to make a lot of money.

  • @ChiliTomatoNoodle
    @ChiliTomatoNoodle 3 years ago +245

    Really good information quality and density here. This guy knows his stuff.

    • @WhatsACreel
      @WhatsACreel 3 years ago +24

      Means a lot brus! You are a legend, Chili :)

    • @classicnosh
      @classicnosh 3 years ago +10

      @WhatsACreel - He's not wrong. I learned Pascal, and C wasn't really taught in my school since Pascal was considered "academic". Assembly was also easier in those days since the microcomputers were much smaller and it was possible to really understand the memory map. Nowadays, the philosophy is very different. The rule of thumb is, don't try to outsmart the compiler. ;)

    • @tootaashraf1
      @tootaashraf1 2 years ago

      The c++ guy

    • @Andoxico
      @Andoxico 1 year ago

      ayy it's papa Chili

  • @lgrantcdg
    @lgrantcdg 3 years ago +35

    Excellent talk!
    In the 1970s at General Motors Research Labs, they ran an experiment with a PL/I-based computer graphics system. They recoded a few high-usage routines in assembly language. The system got faster. Then they recoded them in PL/I and the system got even faster. Then they recoded them in assembly language again, and it got faster still.
    It turned out that each time they recoded the routines, they improved the algorithm, and that made much more of a difference than which language they used.

  • @SimGunther
    @SimGunther 3 years ago +151

    Gotos are NOT considered harmful
    Wormholes on the other hand are considered VERY harmful

    • @k7iq
      @k7iq 3 years ago +31

      If one does not like "goto" then just rename it to jmp and then it's OK because it's what the compiler might output in assembly anyway ! 😁

    • @imperatoreTomas
      @imperatoreTomas 3 years ago +4

      Goto is my favorite function

    • @programaths
      @programaths 3 years ago +8

      In BASIC, well, it was very present. I learned that on my own and used to put GOTO everywhere as it was the way to skip code based on a value "ON x GOTO label1,label2,label3" (or line numbers!)
      Then I used GOTO also to recycle code (as in GOSUB).
      Very good for state machines too, even if I didn't know it had a name.
      Then I had to take visual basic courses at school and the teacher was pulling her hair reading my code...no FOR and IF, GOTO worked just fine. On top of that, I kept my habit of reusing code.
      I am not even sure I would be able to understand my own code as I totally forgot that habit. Still, have good memories of that because the teacher ended up saying she will not correct it anymore and just give points for it working as intended. ^^ At the same time, others had troubles to understand what a variable was and I had already implemented snake and Sokoban just for fun :-D
      (As devs, we find it to be very simple, but I taught a bit too and this is a huge hurdle!)

    • @LionKimbro
      @LionKimbro 3 years ago +16

      Wormhole = en.wikipedia.org/wiki/COMEFROM

    • @roygalaasen
      @roygalaasen 3 years ago +3

      @programaths when I started out with computer classes back in 1991, we had to draw flowcharts before we were allowed to write a single line of code. Only one entry point, one exit point and no lines were allowed to cross, essentially banning goto entirely.
      Now my favourite programming language, Swift, sometimes forces you to use a label to tell which loop you want to BREAK out of, which is essentially a goto in disguise.
      My brain cringes but I have to get used to it lol
      Edit: to clarify. Break in all programming languages breaks out of the nearest LOOP. If you are in a switch .. case you will still break out of the nearest loop. In Swift you will break out of the switch case, still stuck in the loop unless you label the loop you want to break out of.

  • @guillermoleon0216
    @guillermoleon0216 3 years ago +19

    First Assembly I ever learned was for the Z80 and I absolutely loved it! I don't use it at work but getting to know it taught me a lot about how computers work.

  • @kevinjensen3056
    @kevinjensen3056 3 years ago +12

    Been programming in assembly and C since '79. Assembly is still widely used in my field of embedded programming, but I haven't needed to resort to it for years. The code density that an expert on the CPU can achieve in assembly is incredible. Still, most of what you've said is correct for most complex CPUs, but some comments are a little inaccurate for embedded processors today. Most MCU core instructions are still atomic, but the problem of multithreaded read/write race conditions still applies when the data size is less than the bus width. This sort of issue appears in most interview tests for embedded programmers.
    You really should do a lecture on race conditions at the sub instruction level (as you just did), the instruction level, at the thread level, the o/s level and even beyond.
    Liked your lecture on radix sort. Never tried that one before. Keep up the good work.
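
One reading of the sub-word race mentioned in the comment above, sketched as a deterministic Python simulation rather than with real threads: two writers each update a different byte of one shared 16-bit word by read-modify-write. The 16-bit word, the helper names and the values 0x11 and 0x22 are assumptions made up for illustration and are not taken from any particular MCU.

```python
def write_low_byte(word, value):
    """Return word with its low byte replaced (a read-modify-write)."""
    return (word & 0xFF00) | (value & 0xFF)


def write_high_byte(word, value):
    """Return word with its high byte replaced (a read-modify-write)."""
    return (word & 0x00FF) | ((value & 0xFF) << 8)


def interleaved_update(shared):
    """Both writers read the whole word before either one writes it back."""
    seen_by_a = shared                         # writer A reads
    seen_by_b = shared                         # writer B reads the same stale word
    shared = write_low_byte(seen_by_a, 0x11)   # A stores its result
    shared = write_high_byte(seen_by_b, 0x22)  # B stores, wiping out A's byte
    return shared


def serialized_update(shared):
    """The same two updates, each completing before the next begins."""
    shared = write_low_byte(shared, 0x11)
    shared = write_high_byte(shared, 0x22)
    return shared


if __name__ == "__main__":
    print(hex(interleaved_update(0x0000)))
    print(hex(serialized_update(0x0000)))
```

Each store is indivisible in this model, yet the interleaved version still loses writer A's byte, because the unit being written back is wider than the data either writer meant to change.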

  • @ricos1497
    @ricos1497 3 years ago +13

    If I'm to take just one thing from this video it's that I shouldn't write viruses. One virus, absolutely fine - or recommended perhaps - viruses, not. Great advice, thanks.

  • @clickrick
    @clickrick 3 years ago +6

    I'm glad you got to the point that there are assembly languages for just about every processor and didn't allow people to assume that x86 is all there is.
    As someone who has written assembler on ICL 1900, IBM 360 & 370, DEC PDP 11, as well as microprocessors like the 6502 and Z80, I've become aware of just how different the fundamental architectures are, in particular addressing modes.

  • @DownhillAllTheWay
    @DownhillAllTheWay 3 years ago +10

    12:15 "Assembly language is the language of the hardware."
    Permit me to nit-pick. *_Machine language_* is the language of the hardware. Asm is a near-English representation of it.
    Many years ago, I had access to a Data General Nova computer (it was the back-up machine on a customer site). I knew how to swap modules, and I was OK at hardware maintenance (scopes, and that sort of stuff) but I didn't know anything about computers at the time. By reading the manual, I entered a 3 (in binary) into a memory address, and a 6 into another address using the front-panel switches, then I wrote an instruction in machine code to add them together - and it produced a 9 in the destination address - a thrill that I remember to this day.
    I learned the machine code pretty well on that machine, and wrote an assembler in binary code. I had been intending to write diagnostics on the machine, but I moved on before I did that, and never used my (rather strange) assembler. Well, I had never seen an assembler up to that point, so I didn't have much to go on.
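
A toy sketch of the machine-language versus assembly distinction drawn above, for a made-up accumulator machine. The opcodes, mnemonics and memory layout are hypothetical and have nothing to do with the Data General Nova: `assemble` turns mnemonics into numbers, and `run` only ever sees the numbers.

```python
# Hypothetical instruction set: every instruction is an (opcode, operand) pair.
OPCODES = {"HALT": 0x00, "LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}


def assemble(source):
    """Translate mnemonic lines such as 'ADD 1' into a flat list of numbers."""
    code = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        opcode = OPCODES[parts[0].upper()]
        operand = int(parts[1]) if len(parts) > 1 else 0
        code += [opcode, operand]
    return code


def run(code, memory):
    """Execute the numeric program; the 'hardware' never sees a mnemonic."""
    accumulator = 0
    pc = 0
    while True:
        opcode, operand = code[pc], code[pc + 1]
        pc += 2
        if opcode == 0x00:        # HALT
            return memory
        elif opcode == 0x01:      # LOAD: accumulator = memory[operand]
            accumulator = memory[operand]
        elif opcode == 0x02:      # ADD: accumulator += memory[operand]
            accumulator += memory[operand]
        elif opcode == 0x03:      # STORE: memory[operand] = accumulator
            memory[operand] = accumulator
        else:
            raise ValueError("unknown opcode %r" % opcode)


if __name__ == "__main__":
    program = assemble("LOAD 0\nADD 1\nSTORE 2\nHALT")
    print(program)                  # the "machine language"
    print(run(program, [3, 6, 0]))  # 3 and 6 in memory, sum stored at address 2
```

Entering the numbers in `program` by hand would correspond to toggling front-panel switches; the mnemonic text exists only for the person writing it.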

    • @ancapftw9113
      @ancapftw9113 2 years ago

      The best example I saw was a guy making a 6502 (I think) program by writing to a RAM chip and feeding it into the processor. He showed what the assembly would look like, but had to program it in hex code.

  • @ParagonX13
    @ParagonX13 2 years ago +8

    i'm a young person and i taught myself reverse engineering/assembly over the past several years (messing around with disassemblers and searching my questions on the internet) and actually enjoyed it way more than i thought i would... at first it was just a means to an end but i very quickly grew fascinated with it all. i have no idea what to do with this passion though other than hobby projects... :p

    • @OpenGL4ever
      @OpenGL4ever 9 months ago

      If you need a playground. Many open source audio and video codecs are already optimized for the x86 and ARM architectures, but this is not yet the case for the RISC-V architecture. So you could buy a single board computer (SBC) with a RISC-V CPU and then see what could be optimized there. You would need to learn RISC-V assembly though.

  • @draconite
    @draconite 2 years ago +11

    #1: This does depend on the architecture you're building for. Compiling for the 68000 with GCC, it's easy to beat the compiler if you know what you're doing

    • @OpenGL4ever
      @OpenGL4ever 9 months ago

      You've already made an assumption here, using a specific compiler. On the other hand, if you use a compiler that is optimized for the use of fast calls and 68k, then it can look different.

  • @mattias3668
    @mattias3668 3 years ago +49

    There are some cases where you want to use assembly for performance because the compiler will not choose the best instructions for you. For example, if you are doing addition on bigints, you will probably wish to use the add-with-carry instruction, which the compiler probably will not be able to figure out that it can use. And there are probably a large number of very specialised instructions like this; I imagine, for example, that the compiler won't use the SHA or AES instructions.
    Not only are there different assembly languages for different architectures, you also have different dialects for different assemblers.

    • @WhatsACreel
      @WhatsACreel 3 years ago +18

      I absolutely agree!

    • @shanehebert396
      @shanehebert396 3 years ago +10

      You would hope that if you are using a library that's implemented bigint or SHA/AES that the people who wrote the library used intrinsics to implement the library calls.

    • @mattias3668
      @mattias3668 3 years ago +9

      @shanehebert396 Actually, I wouldn't necessarily hope that. When I implemented addition for bigint, GCC didn't have a good intrinsic for doing add with carry (I don't know if it has now); the closest it had was addition with overflow detection, which it couldn't optimise, so inline assembly was necessary for good performance. So you want your bignum to use inline assembly in this case, and then just add a portable fallback for unknown architectures. In other situations, intrinsics may work just as well, but in these cases you still need a portable fallback, so the only reason to use intrinsics instead of inline assembly in these situations is that the intrinsics may be supported for multiple architectures, and hopefully most compilers will recognise them, but that's not necessarily the case, and it is more likely that they will recognise the inline assembly.
      Similarly, intrinsics for SHA/AES, if there even are any, are not portable.
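
The carry chain that an add-with-carry instruction provides can be sketched in Python, which has no such instruction and therefore has to make the carry explicit. The 64-bit limb size and the names `add_limbs`, `to_limbs` and `from_limbs` are assumptions for illustration; this is not the code from the bigint implementation discussed above.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value):
    """Split a non-negative integer into little-endian 64-bit limbs."""
    limbs = []
    while True:
        limbs.append(value & LIMB_MASK)
        value >>= LIMB_BITS
        if value == 0:
            return limbs


def from_limbs(limbs):
    """Reassemble little-endian limbs into one integer."""
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two limb arrays, passing the carry from each limb to the next.

    On hardware this loop would be one plain add followed by a run of
    add-with-carry instructions; here the carry is an ordinary variable.
    """
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        total = x + y + carry
        result.append(total & LIMB_MASK)   # low 64 bits stay in this limb
        carry = total >> LIMB_BITS         # 0 or 1 moves on to the next limb
    if carry:
        result.append(carry)
    return result


if __name__ == "__main__":
    print(add_limbs(to_limbs(2**64 - 1), to_limbs(1)))
```

A compiler that sees only the portable version of this loop has to rediscover that `carry` is the processor's carry flag, which is the pattern-matching step the comment above found unreliable.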

    • @shanehebert396
      @shanehebert396 3 years ago +3

      @mattias3668 yeah, that's the beauty of conditional compilation ;) If the arch is detected, use the version of the library that uses intrinsics; if not, fall back to the library made from portable code. Then it's up to the library providers (or an interested 3rd party in the case of open source) to add to the project.
      But yes, you're also at the mercy of the compiler and how it generates code (gcc, in your case, with add with carry).

    • @andrewdunbar828
      @andrewdunbar828 2 years ago

      Rotate instructions are also not accessible from your high level language. Endian-switching instructions used to be inaccessible too but various compiler + CPU combos I looked at a while ago could recognize most ways to do endian switching in C and produce the right ASM code... but not always!
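
The shift-and-mask patterns in question look roughly like this, written in Python with explicit 32-bit masking. The function names are assumptions for illustration. Whether a given C compiler turns the equivalent C expressions into a single rotate or byte-swap instruction depends on the compiler and target, as the comment above notes.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits using two shifts and an OR."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def bswap32_shifts(x):
    """Reverse the byte order of a 32-bit value with shifts and masks."""
    x &= MASK32
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x >> 8) & 0x0000FF00) |
            ((x >> 24) & 0x000000FF))


def bswap32_bytes(x):
    """The same byte reversal via Python's integer/bytes conversions."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


if __name__ == "__main__":
    print(hex(rotl32(0x80000001, 1)))
    print(hex(bswap32_shifts(0x12345678)))
```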

  • @starpawsy
    @starpawsy 2 years ago +2

    Most successful assembly program I wrote was in 1992. I did a square root function using Newton's method, that was faster than what the compiler of the day provided in the maths library! In those days, the width of the floating point divide register was 80 bits. Dunno what it is today. This might not work today.
    As an aside, some people might say "only 80 bits"? Well, consider that 80 bits == 24 significant decimal digits. Consider that if you measure the diameter of the known universe to 24 significant figures, the last figure is less than the classical diameter of a hydrogen atom.
    Newton's method for calculating the square root of x.
    Start with a guess, call it a.
    Calculate b = x/a.
    Take the average c = (a+b)/2.
    That will be closer than either a or b. Use c as your next guess for a and iterate. Keep going until a & b vary only by 1 in the LSB.
    The challenge was making a really really good guess for a that works for all numbers. I hit on the idea of dividing the exponent by 2 (shift right by 1), and zeroing all but the most significant bit of the mantissa. For negative exponents you do the opposite - double the value of the exponent. This actually worked really well!
    Here's a worked example. Square root of 10 (well actually 10.000000000000000000000000000000)
    start with 3
    10 / 3 = 3.33...
    3 + 3.33...= 6.33...
    divide by 2 = 3.166...
    In one iteration, you've got 2 decimal places.
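
The iteration above, sketched in Python with ordinary 64-bit floats rather than the 80-bit registers mentioned. The function names and the iteration cap are assumptions for illustration, and `initial_guess` only follows the spirit of the exponent-halving trick (it uses `math.frexp` and `math.ldexp` instead of manipulating bits directly); it is not a reconstruction of the 1992 routine.

```python
import math


def initial_guess(x):
    """Starting point: keep only a power of two with half the binary exponent."""
    _, exponent = math.frexp(x)          # x == m * 2**exponent with 0.5 <= m < 1
    return math.ldexp(1.0, exponent // 2)


def newton_step(x, a):
    """One iteration: b = x / a, then the average c = (a + b) / 2."""
    b = x / a
    return (a + b) / 2


def newton_sqrt(x, max_iterations=60):
    """Square root of a finite, non-negative float by repeated averaging."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    a = initial_guess(x)
    for _ in range(max_iterations):
        c = newton_step(x, a)
        if c == a:                       # no further change at this precision
            break
        a = c
    return a


if __name__ == "__main__":
    print(newton_step(10.0, 3.0))        # the worked example: one step from a = 3
    print(newton_sqrt(10.0), math.sqrt(10.0))
```

The iteration cap is there because, in floating point, the last step can flip between two neighbouring values instead of settling on one.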

  • @Alex-op2kc
    @Alex-op2kc 3 years ago +6

    Here's an alternative definition: An assembly language is a set of mnemonics and other language elements defined by an assembler that let you write symbolic statements that map to hardware instructions.
    Under that definition, there can be multiple assembly languages per architecture. For example, there are multiple assemblers for x86: MASM, NASM, YASM, and fasm. And each defines a different, although very similar, assembly language.

    • @robertobokarev439
      @robertobokarev439 8 months ago

      Nasm has the finest "classical" syntax, while all you wanna do looking at masm is to go back to C. Can't tell anything about fasm and yasm, don't have enough experience

  • @_mrgrak
    @_mrgrak 3 years ago +1

    The best programming related content on youtube right now. Creel explains complex topics simply, truly a great teacher. Looking forward to the next video!

  • @stevem3432
    @stevem3432 3 years ago

    I began learning assembly at uni this semester and I actually enjoy it. Thanks for these videos.

  • @TerjeMathisen
    @TerjeMathisen 3 years ago +4

    Congratulations Creel, you've managed to create a very informative set of videos on x86 asm, all stuff that I would have loved to have back in the days, starting in 1982 when I had to write interrupt drivers in hex. :-)
    PS. I went on to use asm on everything from video (DVD & BluRay) & audio codecs (ogg vorbis), crypto (AES competition), games (Quake) and I still write some really low-level code, usually using compiler intrinsics since Visual Studio doesn't allow inline asm anymore. :-(

  • @brannonharris4642
    @brannonharris4642 3 years ago +3

    Reductive learning. Discovering what something is not is seemingly more potent than only pondering on what that thing is.
    Love this video!

  • @jeffm2787
    @jeffm2787 3 years ago +5

    I was writing x86 before it was called x86. Did 6502, 6809, etc. as well. Stopped when the 486 came out.

  • @3Balala3
    @3Balala3 3 years ago +7

    Great video, helps a lot in understanding assembly's place and purpose nowadays. Also great timing. Tomorrow I have an exam in assembly. We are programming on an emulated DOS program. Really, really interesting... :D

  • @kevinz1991
    @kevinz1991 3 years ago

    great information and great delivery. thanks a lot for the time you put into this. subscribed

  • @hell0kitje
    @hell0kitje 3 years ago +15

    Glad to see you back, mate :) I started with your C++ vids and now I'm discovering asm, keep posting more!

  • @johnyoungquist6540
    @johnyoungquist6540 3 years ago +63

    Talking about assembly in general across different processors is fraught with trouble. I do embedded apps in 8051 assembly only. In fact I wrote the assembler. I can promise that C in the 8051 environment is at least 500% slower and also 500% bigger than assembly even for simple things that C should be good at. It is widely accepted that compilers use a tiny fraction of the instruction set and leave a lot behind. It is easy to point out that ordinary languages contain no information to help compilers use special instructions or constructs. The assembly programmer will recognize an AES algorithm and use the AES instructions; a C compiler won't. In modern processors the compiler code generator could hold a significant advantage over the programmer with a detailed knowledge of architecture magic like pipelines, cores, caches, threads. I don't know how they handle the moving target of the new processor of the week or tell what processor they will run on. One processor's optimization is another's downfall. In contrast the assembly programmer wizard may better the C code speed by 100 times or more with devilishly clever thinking and detailed knowledge of the whole instruction set.
    One thing that is universally overlooked is how assembly and high level applications are similar. Apps are typically constructed of functions tailored to do common things for that app. If you need 98 digits precision you'll be writing routines to handle that in any language. These modules are easy to define and test and spread among several programmers. We build bricks first then walls later. A function call is about the same complexity and work to implement in any language. Now all of a sudden apps in all languages are basically function calls and logically look about the same. Neither is more difficult than the other. The planning stage and logic can be nearly identical for any language.

    • @donjindra
      @donjindra 3 years ago +5

      Exactly. People who don't regularly program in assembler have no idea how much faster assembler is than any high level language. Compiler optimization cannot compete with a programmer who knows the instruction set intimately and can tailor the use of those instructions for a particular task. A 10x improvement in speed is pretty normal. OTOH, a poor programmer is not going to benefit much from assembler code. You have to know what you're doing. The 8051 is a good example. That cpu is so weird a compiler can't deal with it efficiently. A compiler does better with something like ARM.

    • @SimonBuchanNz
      @SimonBuchanNz 3 years ago +14

      @donjindra compiler optimisation can definitely beat any reasonable amount of effort for the majority of code, assuming you're not using the trivial C implementations that come with microcontrollers - inlining and avoiding pipeline stalls is drudge work that's better to let the computer handle, especially when your problem is getting something working or cleaning up a mess, not making something faster. Not always, there's always going to be some cases that confuse a compiler enough that it's easier for you to use assembly than to figure out how to mangle your code so the compiler does the right thing, but advanced instructions are available through intrinsics, and compilers will auto-vectorize loops, and so on. The low hanging fruit is getting picked all the time.

    • @donjindra
      @donjindra 3 years ago +1

      @SimonBuchanNz I don't know why you think that. In fact, I don't even know what sort of code you have in mind. I don't advocate using assembler to add two register-width numbers.

    • @SimonBuchanNz
      @SimonBuchanNz 3 years ago +3

      @donjindra sorry, could you clarify what I said that you have an issue with? I was talking about your statement that "a compiler can never compete with [an assembly] programmer": trivially true in that said assembly programmer could at worst use the same instructions, but not practically true. Not sure where you're getting adding numbers from, but if that's literally all you're doing, then actually yeah, you probably will beat a compiler. It's the 50kloc of "adding two numbers" that's not worth the absurd effort to keep optimized in assembly, and mixing and matching can (depending on your baseline) actually pessimize the code since the compiler can't inline now.

    • @donjindra
      @donjindra 3 years ago +1

      @SimonBuchanNz Concerning adding numbers I said the opposite of what you think I said. If the task is simple, such as adding two numbers, the compiler does just fine. There's no point in resorting to assembler. It's the complicated, time consuming tasks that benefit from assembler. Compiler optimization was done by assembly language programmers. But they optimize general cases. They aren't magicians. They can't predict all particular cases. Therefore they cannot optimize for all of them. I have no idea what you mean by the end of your comment.

  • @Guztav1337
    @Guztav1337 3 years ago +26

    You should get more cushions/backdrop in the room, there is a bit of echo in the background.

    • @mrdouble
      @mrdouble 3 years ago

      Was thinking the same, looks like an expensive mic though :/

    • @swharden
      @swharden 2 years ago +1

      The condenser microphone is "too nice". It's picking-up every little echo in the room. A dynamic microphone or a basic gaming headset (microphone closer to the mouth) could be better options for this space.
      Edit: audio is good in later videos

  • @alberto3028
    @alberto3028 3 years ago +45

    ASM is perfect for bootloaders and some parts of OS

    • @WhatsACreel
      @WhatsACreel 3 years ago +14

      It is indeed! UEFI changed the necessity a little, but certainly low level OS code is one of the most important use cases for ASM! Cheers for watching mate :)

    • @lewiscole5193
      @lewiscole5193 3 years ago +5

      Assembly language gives complete control of the hardware to the programmer in a way that no HLL can, in no small part because assembly language is processor architecture specific, while an HLL is supposed to be processor architecture independent.
      So, it's not that "ASM is perfect for bootloaders and some parts of OS", it's that there is no other way to get there from here using an HLL.

    • @WhatsACreel
      @WhatsACreel 3 years ago +3

      @ozan o. I would love to :) Judging by the recent reviews of Apple’s new M1, I think maybe ARM will give x86 a very good shake very soon! We might be witnessing the beginnings of the fall of x86 in the laptop and desktop markets...? Unbelievable!
      Not sure when I can cover these things, but they’re certainly on my to-do list. Thanks for the suggestions, and cheers for watching :)

    • @lewiscole5193
      @lewiscole5193 3 years ago

      @ozan o.
      OSs have to change over time to meet new hardware and/or user demands, or else they die off.
      Unix is no different and has evolved over time to be different than what it originally started out as.
      So in a very real sense, I suspect that Tony Hoare's famous saying, “I don't know what the language of the year 2000 will look like, but I know it will be called Fortran,” has applicability to OSs with "Linux"/"Unix" being substituted for "Fortran".
      And keep in mind that there are already environments where "Linux"/"Unix" is not king ... real time environments such as can be found in cars where QNX, a proprietary message passing microkernel based OS (which can run on ARM based systems by the way), is already more common.
      Yet, thanks to the Posix standard and the QNX people's interest in it, QNX now offers a similar interface ("abstraction") to application programs so that their developers feel warm and fuzzy about it.
      I suspect the same thing will likewise happen with any OS that depends on C, including Fuchsia.

    • @lewiscole5193
      @lewiscole5193 3 years ago +2

      @ozan o.
      > As you know, processes never really
      > pause in posix, I don't know if it
      > was due to hardware restriction or
      > design error during constructing
      > of unix back then.
      I don't know what you mean by "processes never really pause in posix".
      Posix is an interface standard for OSs that just happens to look like the interface that Unix/Linux typically used to present.
      It's not an OS itself.
      An OS can be something other than Unix/Linux entirely under the hood and yet present a Posix compliant interface as is the case with QNX which is a proprietary message passing microkernel based OS that is Posix compliant as I indicated before.
      To the extent that Posix was supposed to look like Unix/Linux to the outside world (programmer), various interface calls such as a file read or write do block (pause) because that's what they historically did in Unix/Linux in The Good Old Days.
      That doesn't mean that an OS can't natively use non-blocking interfaces internally which look like they are blocking to the user.
      > there is also root privilege problem.
      Again, I don't know what you mean since Posix isn't an OS.
      > Plus Android turn into giant layers of burger.
      > I guess that's why google wanna leave Android.
      Android *IS* Linux by another name. Really.
      > if any other new os becomes complicated
      > and consist of many layers in the future,
      > it will be loop then they will be wandering
      > new solutions in the future:).
      Again, OSs change over time or they die.
      To the extent that everyone thinks that what they want done is the way things should be, OS developers are likely to toss in lots of crap to satisfy different users.
      If you want a lean, mean OS for your specific machine(s)/application(s), feel free to write one yourself ... and spend forever doing it.

  • @VTdarkangel
    @VTdarkangel Рік тому +1

    I had to do some SPARC assembly programming when I was in school. The real advantage of it was when we had to do hardware interfaces. Those functions could have been done in C, but when I broke the object files down, I found out that the compiler was inserting a bunch of extra commands that were completely unnecessary, such as settings in the master register for features that weren't being used. By doing the interfaces in assembly, I could bypass all of that.

  • @sergiomarroquinjr3587
    @sergiomarroquinjr3587 2 роки тому

    I always seem to learn something new from you. Keep it up!

  • @mikefochtman7164
    @mikefochtman7164 3 роки тому

    Good information. When we had some ASM instruction dependencies, we sometimes would look down a few lines and see if we could move some other instruction in between the dependent instructions. That meant we could space out the two dependent instructions to let the first one finish and give another ALU something to do while the first one crunched.
    Also worked on a different processor that had a special increment. Used in the OS interrupt handling, it had a couple of instructions that were non-interruptible so we could guarantee that the increment and store would be atomic.

  • @controlflow89
    @controlflow89 3 роки тому

    Absolutely amazing channel, keep up the great work!

  • @herrbonk3635
    @herrbonk3635 3 роки тому +10

    2:34 _"That one clockcycle is called the latency"_ Not really, that one cycle is called _throughput_ in these contexts. The latency *for simple instructions* (like ALU reg,reg/im) usually equals the number of pipeline stages. In a simple pipelined CPU, that would be: fetch+decode+calculate+write result, i.e. 4 stages and so 4 clock cycles. For the 486, that was five stages and five cycles, for the P4 it was around 20 stages and cycles, and so on (again for simple instructions like ALU reg,reg/im).

    • @laurelsporter4569
      @laurelsporter4569 2 роки тому

      But, calculate can be repeated ad nauseam, and as long as that can go on, write can be hidden. The full pipeline isn't executed fully for each instruction, before the next one executes.

    • @herrbonk3635
      @herrbonk3635 2 роки тому

      @@laurelsporter4569 Yes, that's the basic idea with a "pipeline", i.e. having all the stages of the instruction execution fully overlapping, so that (different stages of) several instructions in a sequence can be processed at the same time.
      (Typically instruction fetch -> decode -> effective address calculation -> operand fetch -> ALU -> write-back.)

    • @TellowKrinkle
      @TellowKrinkle Рік тому

      Don't know how people talked about the 486, but on modern processors, when people talk about latency, they mean the number of cycles from when the register value is first needed to when it's available to the subsequent instruction. If your CPU has forwarding circuitry (like every modern processor), that's only the number of calculation stages.
      For the example of an `inc rax`, if you had four of those in a row, the cpu would fetch all four in parallel, decode them all in parallel, and calculate them serially, with each one forwarding its result to the next without waiting for writeback. In the end, four (dependent) `inc rax`s would run in four consecutive clock cycles, which is why `inc` is considered to have a latency of just 1 cycle, not 20 or however many a modern processor's pipeline has. The throughput of inc is not 1 but 1/4 for a skylake processor, meaning that the processor can execute four non-dependent inc's in one clock cycle.
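
      The difference between a dependent chain and independent work can be sketched in C. This is only an illustration, assuming a compiler that does not already vectorize or reorder the loops on its own; the function names `sum_single_chain` and `sum_four_chains` are made up for this example. The first version has one accumulator, so each add has to wait for the previous one (latency-bound). The second keeps four independent accumulators, which gives an out-of-order core a chance to overlap the adds (throughput-bound).

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition depends on the result of the
   previous one, so the adds form a single serial dependency chain. */
uint64_t sum_single_chain(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Four accumulators: the four additions inside one iteration do not
   depend on each other, so the hardware may execute them in parallel. */
uint64_t sum_four_chains(const uint32_t *data, size_t n)
{
    uint64_t t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        t0 += data[i];
        t1 += data[i + 1];
        t2 += data[i + 2];
        t3 += data[i + 3];
    }
    /* Handle the 0..3 leftover elements. */
    for (; i < n; i++) {
        t0 += data[i];
    }
    return t0 + t1 + t2 + t3;
}
```

      Both functions return the same sum; whether the second is actually faster depends on the CPU and on what the compiler does with each loop, so it is worth measuring rather than assuming.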

  • @LukeAvedon
    @LukeAvedon 3 роки тому

    Wonderful video! Glad you are back.

  • @spacewolfjr
    @spacewolfjr 3 роки тому +2

    The legend returns! Thanks Mr. Creel.. man..

  • @coder2k
    @coder2k 3 роки тому +1

    Looking forward to seeing that next video you already teased :)

  • @trashtrashisfree
    @trashtrashisfree 8 місяців тому

    I always wrote a good macro library for the assembly I was working in. System 360/370 didn't even have stacks so my first priority was writing things to push and pull values and create subroutines. Everyone else was hand-cutting every single line. Far more error free. Same for other issues in 6502.

  • @theDemong0d
    @theDemong0d 3 роки тому +3

    In my experience writing assembly (mostly to capitalize on AVX), yes the function call overhead is a huge performance hit, but you need to write your program in assembly anyways because when you switch to AVX intrinsics, you need to know what assembly you want the intrinsics to produce. Writing the function first in assembly makes it easy to translate into AVX intrinsics, and the intrinsics should allow you to write C++ that compiles almost exactly instruction-for-instruction identical to your handwritten assembly. Yeah, it's not quite as cool as your program running your handwritten x86, but it's the next best thing and with the call overhead eliminated, you can reap large performance boosts.

  • @michaelbuerge
    @michaelbuerge 3 роки тому

    Great stuff. Interesting and relevant info. Thanks.
    Allow me a remark about audio: You invested in a nice mic. Now you might want to think about the room you're recording in. Maybe put something absorbing in place to reduce room reverberation.

  • @sikkavilla3996
    @sikkavilla3996 3 роки тому

    Happy Holidays @Creel!

  • @malusmundus-9605
    @malusmundus-9605 9 місяців тому

    I love this channel

  • @steveokinevo
    @steveokinevo 3 роки тому

    Another beaut of a video chris man, thanks again pal.

  • @danepane527
    @danepane527 2 роки тому

    The algo sent me here.. was watching a bunch of Coach McGuirk videos.. subbed!

  • @PaulaBean
    @PaulaBean 9 місяців тому

    When the rubber hits the road, you can always benchmark the speeds of your C++ code against assembly code. Measurement trumps speculation. Thanks for the nice video!

  • @BlackStarEOP
    @BlackStarEOP 2 роки тому +1

    8:10 "Race conditions are brilliant" :D (y) Thumbs up for that... Tracking down race conditions has been the most difficult part of my career as a software engineer.
    If you implement something using more than 1 thread, if you carefully think things through, there's not much you can do wrong. However... when suddenly one guy in your team says "yes I know how to improve the performance, just put this and this into its own thread" then you know you need to buckle up. You're in for one hell of a ride...

  • @programaths
    @programaths 3 роки тому +1

    First year in school: Compute the volume of a cone...in assembly!
    Most students were blocked on the division!!! That's when they learn overflow AND underflow.
    I do not remember the ins and outs, but the division gives you a good ride if you didn't pay attention to the curriculum.
    Then it's while you are doing your work that you realize that registers can be split in different ways, and that there is a flag register too.
    At that time (15 years ago), there was "help PC" with nice explanations of all of this...
    Another difficulty of assembly is that it's "verbose". In higher language, "if" is identified as is. In assembly CMP+JNE,JEQ,JZ,JNZ,JNP.
    And even conditions with conjunctive or disjunctive becomes challenging.
    Another nicety was using the stack for local variables instead of trying to guess which register is safe to use ^^
    It's a bit cloudy, because it's far away now. But that wasn't that easy! It's a gymnastic on its own!
    But overall, whatever is the language, programming is really complicated.
    It's all about solving problems and expressing the solution as code...And most of the time, the problem to be solved is also to be found!

    • @WhatsACreel
      @WhatsACreel  3 роки тому +1

      So true! Cheers for watching :)

  • @RufianEmbozado
    @RufianEmbozado 8 місяців тому

    Assembly will always retain two strong points. First, when you learn to code in assembly you go through a rush of "illuminations" (I'm always thinking of 8 bit platforms because they are simple enough to have a grasp on all the landscape, and because I'm that old. Nothing is yet done, you push and pull all those pesky bits all over the place "by hand", a blazingly fast hand) that put a lot of pieces of the information science puzzle right into place. Second, there is an inherent beauty in assembly code. Motorola 68000 had a beautiful, beautiful assembler (I crashed on it with an Amiga 500 and, man, what a joy it was! All those fancy chips at your command... Most missed piece of hardware ever). I never got that feeling when I tried to code assembly on i386. I still think learning to write assembly for any CPU is worth the price. No need to do great things, just some humble tasks. You'll have the ride of your life (as a nerd, at least) and won't fall for those kinds of misconceptions. Great video, of course. Assembly has the virtue of dispelling all sorts of misconceptions. But assembly itself is covered by some key misconceptions which keep it from teaching all it can.

  • @wrtlpfmpf
    @wrtlpfmpf 2 роки тому

    One thing doing a project on a small assembler can really help with is coding style. I used to write multiple screen long functions with control structures nested several levels deep. Writing in assembler can really teach you how to write code that is as simple as possible, yet correct. I once did that for a little project on an ATMega. Those are cute little 8-Bit micro controllers. Since they have different addresses for RAM and Flash, programming them in assembler is a lot less painful than, for example, C. Anyhow that project really helped me write readable code when I later did C projects. I later played around with those microcontrollers in C and looking at the assembly created by the compiler I have to say that it's highly dense.
    (The rationale behind assembler was that I had more experience with AVR assembler and that that code would use the remaining flash program storage as data storage, something that is even harder to do in C)

  • @Alex-op2kc
    @Alex-op2kc 3 роки тому

    Creel's back on his cubemaps!

  • @kylegivler8372
    @kylegivler8372 3 роки тому

    Thanks for sharing this 😁

  • @rfvtgbzhn
    @rfvtgbzhn 8 місяців тому

    From what I heard, you can get a significant performance boost in some cases by disassembling the compiled code and rewriting parts in Assembly language.

  • @roax206
    @roax206 Рік тому +1

    Though from my understanding, assembly is mostly just machine code but replacing the binary instruction IDs with short nicknames for the instruction.
    Technically any compiled "higher level" language will be converted into assembly at one point (unless the person who wrote the compiler is a masochist and memorized all the instruction ID numbers). The main point when assembly becomes quicker then simply relies on whether the problem is easier to express in assembly language rather than the HLL used and to what level you are willing to manually optimize the assembly code.

  • @gregorymifsud5389
    @gregorymifsud5389 3 роки тому

    Great content mate love it

  • @overcritical304
    @overcritical304 3 роки тому

    Love your videos man. Learned so much from this channel. Have you thought about ARM? Would love to learn that too

    • @WhatsACreel
      @WhatsACreel  3 роки тому

      I would love to do some ARM! Hopefully I can record soon, though I am unsure at the moment exactly when. Thank you for watching, and thank you for the suggestion :)

  • @thomasmaughan4798
    @thomasmaughan4798 2 роки тому

    There was a time when assembly was much faster than compiled code, but eventually compiler optimizations produced code that executed efficiently. Depending on what one is doing, assembly is considerably smaller. A function in COBOL to parse a text file was 30 kilo-words and took 30 seconds to execute; I re-wrote it in assembly and it produced an executable that was only 3 kilo-words and parsed the same file in 3 seconds. 1/10th the size and ten times faster! But that extreme example is partly a result of COBOL not really being a good choice for that sort of thing, and my re-write also used static linking; everything it needed was already linked in the executable so at run time, no "fixups" were needed.

  • @microdocker
    @microdocker Рік тому

    Very good and explanatory shot.
    One small weird thing (not related to the topic) is, the guy is literally sitting in front of a mic and still recording his voice on the on-camera microphone ^_^

  • @lgrantcdg
    @lgrantcdg 3 роки тому +3

    IBM’s DB2 database for the IBM mainframe (MVS) is written in a proprietary PLI-like language. A few years ago, they increased its speed by 20 percent, just by improving the code that the compiler emitted. Computer architectures are constantly evolving, as newer and fancier instructions are added. Even if you are the world’s best assembly programmer, and know every instruction inside and out, there is no way you can update a large assembly-language code base to take advantage of each improvement in the architecture.

    • @OpenGL4ever
      @OpenGL4ever 9 місяців тому

      Fortunately, C has a preprocessor for such cases. It allows you to write all code in C, optimize where necessary for one or more CPU architectures in their specific assembly language and then use the C code as a fallback. And if you later get a much better C compiler, all that is needed is a recompilation with the improved compiler using only the C code. Then you can see where the C compiler optimizes better. And where it's still worse, you compile the assembler routines back in.
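
      A minimal sketch of that pattern, assuming GCC or Clang extended inline assembly on x86; the macro `FORCE_PORTABLE_C` and the function names are hypothetical, chosen only for this example. The portable C version is always compiled, and the preprocessor decides whether `bswap32` uses the hand-written instruction or falls back to it.

```c
#include <stdint.h>

/* Portable C fallback: reverse the byte order of a 32-bit value. */
uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* FORCE_PORTABLE_C is a made-up switch for this sketch: define it to
   rebuild with only the C code and compare against the asm version. */
#if !defined(FORCE_PORTABLE_C) && defined(__GNUC__) && \
    (defined(__x86_64__) || defined(__i386__))

/* x86-specific version using the BSWAP instruction
   (GCC/Clang extended asm syntax). */
uint32_t bswap32(uint32_t x)
{
    __asm__("bswap %0" : "+r"(x));
    return x;
}

#else

/* Any other compiler or architecture: use the C fallback. */
uint32_t bswap32(uint32_t x)
{
    return bswap32_portable(x);
}

#endif
```

      Building once normally and once with `-DFORCE_PORTABLE_C` gives two binaries to benchmark against each other. For something this small a current compiler will often recognize the C pattern and emit the same instruction itself, which is exactly the kind of thing the comparison is meant to reveal.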

  • @davidliverman4742
    @davidliverman4742 2 роки тому

    Thanks dude!!

  • @EvilSandwich
    @EvilSandwich 3 роки тому +4

    I like to program for old systems like the Apple II and the NES, so I code a lot in 6502 ASM. Believe me, you start to miss high level after a while.
    You guys ever try Hello World when you have to explain to the computer how to read and print strings before it can even do that? Heck, the NES doesn't even have ANY internal ROM, so you have to draw the letters manually before you can even start on strings. lol

  • @MistWing
    @MistWing 8 місяців тому +1

    The first computer language I learned was back in the 80's (back when the myths weren't myths :) ) and was assembly on the Z80. Back then, we had simple instructions like "LD reg1,reg2". Nice and simple. Now we have things like "AESKEYGENASSIST xmm1,xmm2/m128,imm8".
    And instructions were only 1 or 2 bytes long. Now, instructions can be up to 15 bytes long.
    My how times have changed :)

    • @briancampbell179
      @briancampbell179 8 місяців тому

      I started a few years before that on a Motorola 6800 D2 kit, then my own 6502 based SYM-1. Yes, assembly language was a lot simpler assuming you had the luxury of access to an assembler. I recall hand assembling programs and entering the code byte by byte.
      It wasn't a choice between a compiled language and assembly language, it was a choice between assembly language and raw object code.
      The key difficulty with assembly language is the sheer number of lines needed to do the same as a couple of lines of a high level language.

  • @CallousCoder
    @CallousCoder 8 місяців тому +1

    ARM 64 CPUs actually have a couple of assembly dialects. You have your AARCH64 but also your Thumb instructions, which are a smaller instruction encoding to save space.

  • @PvblivsAelivs
    @PvblivsAelivs 3 роки тому

    I have seen many people say that compilers do these wonderful tricks and that hand-coded assembly language is not (generally) faster than a compiler's output. While there may be some compilers that do this, no compiler I have actually used does so.
    "You might get the right result."
    Especially if you use the lovely little LOCK. Any processor that can feasibly be part of a multi-processor system needs a way of executing at least certain instructions without interference from other processors.
    "The CPU will perform the instruction a lot slower."
    It will if two processor units are trying to access the same memory at the same time. After all, one must stall. But the processor that "gets there first" has a negligible performance penalty. It was a two-cycle penalty on the 8086. (I only have timing information up to the 486.)

  • @vikassm
    @vikassm 3 роки тому +1

    Fantastic video and channel! Subbed.
    My 2¢ about the poor audio: Use your mobile phone with a ~5$ lapel mic to capture your "B-Roll" audio 🙂
    That way if your nice desktop mic doesn't record for some reason, the backup audio from your cellphone is still wayyyyy better than the absolute garbage camera mic.
    Just clap once (Aaand ACTION) at the beginning and the end of each take to simplify A/V sync during editing.

  • @y2ksw1
    @y2ksw1 3 роки тому +8

    I have been programming for a vast time of my life in Assembly, and the most challenging tasks were to write code in a way, to run in parallel in the separate pipelines (super scalar). The example you have given, would have been rewritten, eventually longer, in order to get the parallel mechanism working. One way would be:
    mov ebx, eax
    inc eax
    nop
    inc ebx
    So the first two run together, and then the remaining two. And we would gain at least 2 clock cycles.
    However: assembly made a lot of sense in the old days. Now, with multi-core multi-scalar processors and the brilliant optimisation of compilers, Assembly code died pretty much out.
    I still use it on special hardware though. I am eyeballing the Raspberry Pi Pico, for example 😊

    • @OpenGL4ever
      @OpenGL4ever 9 місяців тому

      inc eax
      mov ebx, eax
      Does the same job as your code and requires less RAM.

    • @y2ksw1
      @y2ksw1 9 місяців тому

      @@OpenGL4ever It's not a question of memory, but to get part of this code running in a different pipeline and thus double up the speed.

    • @y2ksw1
      @y2ksw1 9 місяців тому

      Your code would run 4 times slower

    • @OpenGL4ever
      @OpenGL4ever 9 місяців тому

      @@y2ksw1 Why should it? In my opinion it runs at the same speed.
      Your code might do
      mov ebx, eax
      inc eax
      in its own pipeline, but
      nop ; does nothing
      and
      inc ebx
      depends on the mov ebx, eax before.

    • @y2ksw1
      @y2ksw1 9 місяців тому

      @@OpenGL4ever If you first do an operation on eax, and then use it to assign its value to another register, it stalls and waits to settle just that tiny bit, which doesn't allow moving the code to the other pipeline. I have been timing these instructions very accurately and your assumption, while technically correct, performs way less efficiently. In time critical applications, such as the real time graphics manipulation I was working on, the code alignment and sometimes illogical reordering of instructions made the difference between fluent and stuttering graphics.
      I got mainly the filter and render code prepared by graphics specialists and my task was it to speed it up. But also big number mathematics and operating system libraries. Most of them grew noticeable in size, but were of unmatched speed.

  • @AngDavies
    @AngDavies 3 роки тому +1

    Minor nit/clarification: while you definitely need to know assembly on a deep level to be able to code an optimising compiler- after all, it's a program that turns code in a given language into as efficient/fast machine code representation as possible.
    That doesn't mean you necessarily should write one in assembly itself- it wouldn't make faster code, only code, faster.
    The better option is often to write the compiler in the language that you intend to compile with.
    You spend loads of time writing a compiler that can create really optimised code for a given platform, build it using some existing compiler, which doesn't make very optimised code, and so the compiled compiler takes ages to compile code.
    But now you've just created a program that turns your code in your language into optimised machine code, so just feed the original code through the new compiler, and you now have an optimised optimizing compiler :D
    Having just "GCC" that compiles to your machine is so much better than having to find a version of GCC tailored to your exact platform

  • @NomenNescio99
    @NomenNescio99 3 роки тому

    A long time ago in a galaxy far far away, before the time when gcc used the mmx instruction set to optimize vector arithmetic, there were sometimes huuuge gains to be had from inlining some assembly code.

  • @BrightBlueJim
    @BrightBlueJim 3 роки тому +1

    So to summarize a couple of things you said:
    1) Functions written in assembly don't really run faster than compiled functions.
    6) Assembly is still necessary for low-level optimization, where speed is really important.
    Also, your point on atomic operations applies just as directly to C and C++, or indeed for ANY program written to take advantage of multi-threading.
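
    For the C side of that last point, here is a small sketch using C11 `<stdatomic.h>`. It is single-threaded on purpose so it stays self-contained; the counter and function names are made up for illustration. The idea is that `atomic_fetch_add` is one indivisible read-modify-write, whereas a plain `hits++` on an ordinary `int` shared between threads would be a data race.

```c
#include <stdatomic.h>

/* Shared counter; static storage means it starts at zero. */
static atomic_int hits;

/* Atomically add one and return the new value. The read, the add and
   the write happen as a single indivisible operation, so concurrent
   callers cannot lose an update. */
int record_hit(void)
{
    return atomic_fetch_add(&hits, 1) + 1;
}

/* Atomically read the current value. */
int current_hits(void)
{
    return atomic_load(&hits);
}
```

    On x86, compilers typically implement this kind of operation with a LOCK-prefixed instruction, so the cost and the guarantees are the same ones discussed for assembly.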

  • @dcocz3908
    @dcocz3908 3 роки тому

    I agree but there are lots of situations where the compiler simply fails. For example gnuarm won't use multiple load and store properly, which for me generated much larger code that wouldn't fit in SRAM, so it had to run with wait states from flash on my project. Re-writing it in hand assembly allowed me to get a much smaller function, allowing it to be moved into SRAM with the data that was required by the application, and that is where I got a really large speed improvement. I couldn't have done it using just the compiler without swapping the micro for one with a larger memory footprint

  • @gFamWeb
    @gFamWeb 3 роки тому +11

    I've always pictured the talk about clock speed as analogous to how fast a car's tire can spin. Sure it can spin very fast, but if you don't have that good of traction on the tires, it's not going to help much. Same with throughput and clock speed.

    • @okaro6595
      @okaro6595 3 роки тому +1

      IMO the engine RPM is better.

  • @tchiwam
    @tchiwam 3 роки тому +1

    Would be fun to see a video on transforming locked multithread to lockless thread with a thread manager and completely lock less multithread manager.

  • @DigitalPhage
    @DigitalPhage 3 роки тому +29

    "x86 Assembly Language Misconceptions" would be a more apt title, however a good video.

    • @TheBypasser
      @TheBypasser 3 роки тому

      Oh yeah, say Arduino compared to pure AVRASM is like a snail vs a ballistic missile (just like for the most of the RISC cores, HLL vs ASM that is).

    • @niclash
      @niclash 3 роки тому

      Misconception; x64 Instruction Set is a typical one. The micro controllers are typically magnitudes easier to learn fully. And then there are the funky/academic outliers, like 1 OpCode Instruction Set. But the majority of Assembly Languages out there are dozens, maybe 100 and a bit, and not the thousands in the Intel/AMD world.

  • @thadtheman3751
    @thadtheman3751 2 роки тому

    Actually part of the complexity of assembler comes from the fact that "decorations" of instructions are not uniform. To clarify I will make up an example (it's been a while so don't expect this to be a real world example ).
    You might have INC A,N.
    increase A by N.
    A might be a memory location and N a number (direct addressing)
    INC $A, N
    A might a memory location pointed to by a memory location (indirect addressing)
    INC [$A],N
    N might be a memory location
    INC A,$N
    ...
    The thing is that some commands accept some of these addressing modes and others do not. A JMP for example might accept all addressing modes, but a JSR would not. So it gets complicated keeping track of which instruction does what.

  • @oisnowy5368
    @oisnowy5368 3 роки тому

    G'day mate! I'd wonder if at one time (for example the first of next month) you could tell us everything about the BEER-instruction.

  • @k7iq
    @k7iq 3 роки тому +5

    I program ARM in C lately... I find that being able to view the ASM output helps to reduce my C code operation. For instance, I recently looked at a particular IF statement that I suspected might not work the best that it could, and found that defining one of the variables as local register int32_t reduced the time of that bit of code by a factor of two, and the size was a bit less too.
    Also, needed to create an ASM function, a float to int function because the compiler did not output the FPU instruction for the rounded version of that instruction.
    ASM has its uses but mainly, for me I think, in debugging C code.

  • @DukeDudeston
    @DukeDudeston 2 роки тому +2

    "You can do a lot of stupid things in any language"
    I was able to delete ntfs.sys in a language called "DarkBASIC" when I first started out. So yes. You can do a lot of stupid things in languages.

  • @RT55J
    @RT55J 3 роки тому

    The effectiveness of unrolling loops as a performance optimization can vary wildly depending on the caching situation. If your architecture has no cache to worry about, then it would give a definite performance boost. However, if you have an instruction cache to worry about, then (depending on the size of the unrolled loop vs the cache) you might suffer a performance decrease from the extra instruction fetching from RAM.
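
    For anyone who hasn't seen it written out, manual unrolling looks roughly like this in C. This is a sketch with made-up function names; an optimizing compiler may well unroll the first version on its own. The unrolled loop does four elements per iteration, so the loop counter and branch are paid for a quarter as often, at the price of a larger code footprint (which is where the instruction cache concern comes in).

```c
#include <stddef.h>

/* Plain loop: one compare-and-branch per element. */
void add_constant_rolled(int *values, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        values[i] += k;
    }
}

/* Unrolled by four: one compare-and-branch per four elements,
   but roughly four times as much loop body to fetch and cache. */
void add_constant_unrolled4(int *values, size_t n, int k)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        values[i]     += k;
        values[i + 1] += k;
        values[i + 2] += k;
        values[i + 3] += k;
    }
    /* Remainder loop for the last 0..3 elements. */
    for (; i < n; i++) {
        values[i] += k;
    }
}
```

    Both versions produce identical results; which one is faster on a given machine is something to benchmark, since it depends on the cache sizes and on how far the compiler already unrolls.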

  • @WolfCoder
    @WolfCoder 3 роки тому +3

    The only time I've written assembly was for the 6502 (because it's fun), the Z80 clone in the Gameboy (because it's fun and the only compiler I found was terrible and couldn't handle ROM paging well, etc.) and the ARM7TDMI in the GBA where, while there's a port of gcc for it, you still have to write assembly for heavy duty subroutines like interrupts, audio engines, etc. as the compiler optimizations don't seem to work as well in the gcc port. For x86-64 though? Uh.. I think I'll let the compiler have the 'fun' when it comes to that.

  • @FORTRAN4ever
    @FORTRAN4ever 3 роки тому

    I programmed in assembly on a Sperry Univac 1143 mainframe computer in the early 1980's. Each instruction consisted of a 36 bit word. Commenting was a must. I would prefer to program in FORTRAN or COBOL anyday over assembly.

  • @GogiRegion
    @GogiRegion 2 роки тому +1

    I’ve actually looked into virus programming, mainly out of curiosity, and it looks like good hackers will use C and then compile to assembly for optimization, then assemble it. That’s assuming that you need high level functions in order to do what you need, you want it to take up as little space as possible so it’s harder to detect, and possibly want to remove null bytes (which is supposed to allow your code to work with a wider array of hacks since some rely on a lack of null bytes). It’s actually an interesting topic, and from what I was reading, it sounds like C is preferred over assembly for the same reason Linux is written primarily in C.

  • @erwinmulder1338
    @erwinmulder1338 2 роки тому

    I grew up programming home computers in the 1980s. You had to write assembly (and sometimes even translate it to numbers by hand) to make anything that would run faster than at a snail's pace. I mean 8 bit computers at 3.5MHz are not incredibly fast at anything. So if you had BASIC, which was interpreted (not even compiled), that was SUPER slow. You couldn't even draw an entire screen in one second most of the time. These days, I mostly work with assembly in writing (toy) compilers for my own programming languages. In the end, what any compiler really does is basically translate the source code to assembler instructions.

  • @xeridea
    @xeridea 2 роки тому +1

    Older compilers were known for being slow, and assembly was often used, especially in early consoles. Modern compilers are highly optimized. Besides all the basic stuff, they have all sorts of tricks for optimizing multiply, divide, and what instructions to use, even specific to CPUs if you want. Sometimes CPUs have weird quirks that compiler developers can take advantage of, or at least avoid penalties. Optimizing multiply and divide goes beyond obvious stuff, like bitshifts for powers of 2, they have all sorts of tables for methods for various numbers. Often they can even convert loops into SIMD instructions automatically. If not, doing SIMD completely manually is very tedious, there are methods available in some lower level languages to make it a bit easier.
    Some things can still be hand optimized, but that requires very in depth knowledge of CPUs, and even then, may not even be faster. For most purposes, not worth it, though some low resource embedded systems, some drivers, and some other niche cases benefit.
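
    To make the multiply/divide tricks concrete, here is a sketch in C of what such rewrites look like when done by hand (function names are made up; a compiler would normally do this for you from plain `x * 8` or `x / 10`). The shifts cover powers of two. The divide-by-10 uses the well-known "multiply by a magic constant, then shift" form, where 0xCCCCCCCD is 2^35 / 10 rounded up, which gives the exact quotient for every unsigned 32-bit input.

```c
#include <stdint.h>

/* Multiply by 8 with a shift (wraps modulo 2^32, same as x * 8). */
uint32_t times8(uint32_t x)
{
    return x << 3;
}

/* Unsigned divide by 8 with a shift. */
uint32_t div8(uint32_t x)
{
    return x >> 3;
}

/* Unsigned divide by 10 without a divide instruction:
   multiply by ceil(2^35 / 10) = 0xCCCCCCCD in 64 bits, then shift
   right by 35. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDull) >> 35);
}
```

    Looking at the disassembly of a plain `x / 10` from an optimizing build usually shows a very similar multiply and shift sequence, which is a nice way to see the compiler doing this kind of thing automatically.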

  • @gideonz74b
    @gideonz74b Рік тому +1

    @Creel: Executing an instruction in one cycle does *not* mean that the *latency* is one cycle. It means that the *throughput* is one instruction per cycle. The latency is always a lot more than that, because it has to pass through the pipeline.

  • @msoulforged
    @msoulforged 3 роки тому

    Great video!

  • @emjizone
    @emjizone 9 місяців тому +1

    3:53 This "one instruction per cycle" might be true for the oldest machines, with no clever vectors and lookups and with a very limited set of instructions. This might explain why people believe it to be still true today.
    In that case you'd have to program most of usual math functions yourself (modulo, square root, etc…) and they would take several cycles anyways.

  • @michaelmoorrees3585
    @michaelmoorrees3585 3 роки тому

    Still write assembly code for microcontrollers, such as the AVR and 8051 lines. Those are a bit painful. Writing assembly on the old Motorola HC line was beautiful, in comparison. I have a hierarchy of pain. If the final binary is less than 4K, it's gonna be assembly. If 16K or larger, it will mostly be high level (i.e. C), but some critical areas will still be in assembly.
    Optimized compilers are similar to autorouters, when laying out a PCB. They will screw things up. Often you have to go into the trenches, and do some manual labor.

  • @connclark2154
    @connclark2154 3 роки тому +1

    I think one thing that wasn't mentioned was assembly allows you flexibility that higher level languages do not. With this flexibility you can implement more efficient algorithms. For example in between assembly routines you can return more than one value from a function by using a custom calling convention. Its the ability to leverage the freedoms that gives assembly its power and performance.

    • @bigshrekhorner
      @bigshrekhorner 8 months ago

      That's not something exclusive to Assembly.
      C is able to do this by using pointers as function arguments. Even higher level languages are able to do this by using tuples that mix types (or simply the same type), or with methods similar to C's, if they allow memory management concepts like pointers.
      Compilers and compiler engineers are extremely smart and definitely way smarter than me or you. That means that if you have thought of an efficient implementation of an algorithm in Assembly, it's pretty likely the compiler engineers have also thought of it and implemented it. At least if we are talking about mainstream compilers, like GCC or Clang (for the case of C/C++).
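
      The tuple approach mentioned above can be sketched in a few lines of Python. The helper name `div_mod` is hypothetical and chosen only for illustration (Python already ships a built-in `divmod`); the point is just that one call hands back two values.

```python
def div_mod(dividend, divisor):
    """Return both quotient and remainder from one call, as a tuple."""
    quotient = dividend // divisor
    remainder = dividend % divisor
    return quotient, remainder


if __name__ == "__main__":
    q, r = div_mod(17, 5)
    print(q, r)  # 3 2
```

      The C equivalent would pass a pointer for the second result, or return a small struct.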

  • @wingman2tuc
    @wingman2tuc 2 years ago

    Modern CPUs also have "deep" pipelines. Fetch -> decode -> exec -> mem access -> writeback, as a very simple example.
    Today's CPUs can have 20 to 40 steps for completing a single instruction.
    Things can be pipelined, but you need a very intelligent and complicated forwarding unit and branch predictor in order to take advantage of pipelines.
    Understanding modern CPU architecture is a must in order to use ASM efficiently. Also, ASM can be CPU-specific, so it may not work on other CPUs.
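
    As a rough illustration of the five-stage example above, this Python sketch prints which cycle each instruction spends in each stage. It is an idealized model under stated assumptions: no stalls, no branch mispredictions, one new instruction entering per cycle. The names `STAGES`, `pipeline_schedule` and `total_cycles` are made up for this example.

```python
STAGES = ["fetch", "decode", "exec", "mem", "writeback"]


def pipeline_schedule(n):
    """Return, for each of n instructions, the cycle it occupies each stage.

    Idealized model: instruction i enters the first stage in cycle i and
    advances one stage per cycle, with no stalls or mispredictions.
    """
    return [{stage: i + s for s, stage in enumerate(STAGES)} for i in range(n)]


def total_cycles(n):
    """Cycles until the last of n instructions leaves the final stage."""
    return 0 if n == 0 else n + len(STAGES) - 1


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(3)):
        print(i, row)
    print(total_cycles(3))  # 7
```

    Each instruction still takes five cycles from start to finish, but once the pipeline is full, one completes every cycle.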

  • @Cubinator73
    @Cubinator73 3 years ago +8

    15:49 I think you got something wrong there. Obviously, assembly is needed in all sorts of things like programming compilers and optimizing low-level routines. The "misconception" that "assembly language is no longer needed due to optimizing compilers" expresses the fact that your average programmer doesn't need to write assembly himself because far more competent people already did it and made their optimized routines available in the optimizing compiler. I myself only ever used assembly to explore how CPUs work and how compilers optimize stuff, but I never NEEDED to write my own assembly code for my own projects.

    • @lewiscole5193
      @lewiscole5193 3 years ago +2

      That's nice ... OTOH being a former OS maintainer/developer, I used assembly a lot, not just because most of the OS was also written in assembly (which it was), but because it gave me control over data/code placement that no available compiler did/could, which was especially important in the bootstrap code I was responsible for the care and feeding thereof.
      And I suspect that's still true ... the hardware defines and uses data structures that I don't want/need a compiler guessing what sort of code should be generated for.

    • @WhatsACreel
      @WhatsACreel 3 years ago +3

      Yes, I do wish that the proper position of ASM was expressed more clearly in computer science education. I was taught to fear the language during my degree, encouraged to neglect it entirely. Maybe it’s different in other institutions?
      I do not disagree entirely with the sentiment. But I do think it is skewed a little too far away from ASM. I think learning ASM for OS development or to understand the CPU are excellent applications!
      Cheers for watching and commenting folks :)

    • @lewiscole5193
      @lewiscole5193 3 years ago +2

      @@WhatsACreel
      I have no idea how ASM is being taught in schools these days, but back when I was a student -- just after the dinosaurs had been killed off by an asteroid -- there was no question that any non-impaired human could outdo a compiler in terms of generating fast/small code.
      The reason why you were supposed to use an HLL was because it increased programmer productivity.
      Studies had supposedly been done that showed that the average number of DEBUGGED lines of code that could be produced per programmer per day was about TEN (10) independent of programming language.
      And because each HLL statement typically turned into more than one ASM line, that meant that if you could use an HLL, you should, because you could potentially get more done using an HLL than you could with ASM, especially in terms of code that was supposedly "portable" across platforms.
      There were also supposedly studies that showed a wide variation in programmer output as well and so YMMV, but familiarity with a particular language also had a lot to do with programmer productivity (I don't recall how much).
      The gist of this is that I usually write in ASM because that's what I'm most familiar with, and because I'm no longer getting paid for what I write, it's my choice.
      I can speak C if I have to, but I don't consider myself fluent and I simply don't see the need to spend time becoming more fluent in C when I can do what I want probably (?) faster in ASM.
      What bothers me is that people who seem to shy away from using ASM seem to think that there's something fundamentally different in how you generate ASM code versus an HLL thrown at a compiler.
      To me, though, that's not the case.
      When I occasionally do write HLL code, I do the exact same thing that I do when I write ASM code, the only difference being how far "down" I "refine" the code before I come to a valid HLL or ASM statement.
      I just don't understand what it is that makes people think there's something special when it comes to how to write ASM code versus HLL code.
      It makes me think that maybe too much time is spent teaching the structure of various HLLs and not enough on how to think and solve problems.
      Just my opinion ....

    • @WhatsACreel
      @WhatsACreel 3 years ago +2

      @@lewiscole5193 Ha! I know the feeling! I learned in the 90’s. Things have changed a lot since then. Especially Assembly language. It’s gone from maybe 100 instructions and 16 registers to massive SIMD register files and 3000 instructions!
      I certainly agree that programmer productivity and portability are very important. And the choice of language is a big part of that. Sometimes ASM is a good fit, and sometimes it is not. I do love how fast it can be, and how flexible. There’s some brain-melting, deep trickery that is natural to ASM, which is too low level to be practical in HLL’s. But for the most part, anything is pretty achievable in any language, and so it becomes a matter of choosing the best tool for the job.
      I couldn’t agree more! The problem with ASM is the perception of it. Folks shy away from it in a way that might not be warranted. It’s just a language, after all. IMHO, it’s a really fun and powerful language.
      I do love a good bit of HLL code too, but ASM will always hold a special place for me. If for nothing else, I made a video about ASM 10 years ago and put it up on UA-cam, and have since built this little channel :)

    • @lewiscole5193
      @lewiscole5193 3 years ago

      @@WhatsACreel
      Ten years? My how time goes by when you're having "fun".

  • @den2k885
    @den2k885 2 years ago +1

    Compilers optimize very well... for general purpose code, without knowing its data layout. It's very unlikely that a compiler will use SIMD instructions, and in the rare cases it does, it won't make use of the inner characteristics of your problem, as it has no knowledge of them.
    Using Assembler I managed to double a linear Sobel algorithm's performance and triple a segmented integral table algorithm's performance. Not even Intel's compiler managed to equal those times.

  • @amigalemming
    @amigalemming 8 months ago

    15:45 I am too lazy to plan register usage myself, thus I use LLVM to generate real assembly code for me. But I inspect the results regularly in order to find weaknesses in LLVM or my code.

  • @brorelien8447
    @brorelien8447 3 years ago +23

    14:43 I partially disagree with you on this point. Some processors like the 6502 have a small instruction set which can be easily learned (only around 56 instructions). I know an 8-bit CPU can't really be compared with a modern x64, but some embedded CPUs still use these simpler 8-bit instruction sets.
    Otherwise I like the video.

    • @y2ksw1
      @y2ksw1 3 years ago +2

      Well, some 8 bit processors have a lot of instructions. Of course, if you group, then almost any processor has only a few:
      Add, subtract, multiply, divide, invert, move. That's about it.
      When I teach, I actually point out that most processors can only add and negate. They do it in a very efficient way though.
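
      The "add and negate" idea can be sketched in Python for an 8-bit register. This is only an illustration of two's-complement arithmetic, not a description of any particular processor's circuitry; `MASK8`, `negate8` and `sub8` are names invented for the example.

```python
MASK8 = 0xFF  # keep results within 8 bits, like an 8-bit register


def negate8(x):
    """Two's-complement negation: invert all bits, then add one."""
    return ((x ^ MASK8) + 1) & MASK8


def sub8(a, b):
    """Compute a - b using only addition and negation, wrapped to 8 bits."""
    return (a + negate8(b)) & MASK8


if __name__ == "__main__":
    print(sub8(5, 3))       # 2
    print(hex(sub8(3, 5)))  # 0xfe, i.e. -2 in two's complement
```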

    • @NoNameAtAll2
      @NoNameAtAll2 2 years ago

      RISC-V >_>

  • @furyzenblade3558
    @furyzenblade3558 3 years ago

    Great Video!

  • @techkev140
    @techkev140 2 years ago

    Just had this vid recommended to me, found it interesting. Got to say I think the misconception of assembler being faster during the 8-bit days often came from writing a program in interpreted BASIC vs compiled assembly language. It was simply faster than any interpreted language. Maybe that view has persisted.

  • @ug333
    @ug333 3 years ago

    Great information, great knowledge
    Side note: what's up with the audio?

  • @sambrown9494
    @sambrown9494 3 years ago

    Very interesting stuff, enjoying these videos. Hope you don't mind my asking - is that microphone actually turned on? It's a bit echoey like it's the camera microphone doing the recording across the room ..? Looking forward to more vids! Thx :)

    • @sambrown9494
      @sambrown9494 3 years ago

      Ha umm sorry! I commented and only then read the description. Already covered. Just so you know I was paying attention! ;) Rock on ...

  • @tabletopjam4894
    @tabletopjam4894 3 years ago +1

    Interesting about dep. scaling, how you might actually be able to do more in a cycle by putting instructions slightly out of order.
    I suck at ASM... don't plan to program anything in it, but it's still cool to see what my HLLs are doing under the hood.

  • @Lantalia
    @Lantalia 2 years ago +1

    So, with regards to #1: inline assembly skips the function call overhead. The main reason to do it is to use instructions not yet supported by your compiler.

  • @ladyViviaen
    @ladyViviaen 3 years ago

    Hello Creel, I would really love to know about lexers, parsers, interpreters, etc. All the online explanations go right over my head or are really hard to understand. Since you do a lot of low-level stuff and are GREAT at explaining things, I wanted to ask: if you already have videos explaining these, please tell me, and if not, I would definitely love to see them if you have any plans for them. Thanks for the great content too!

    • @WhatsACreel
      @WhatsACreel 3 years ago +3

      This is a fantastic topic!
      I have not made much on topics like this, no. I did do a few videos on the Shunting Yard algorithm a long time ago, and that might be helpful? It’s just an algorithm for parsing mathematical expressions.
      As for parsers themselves, defining grammars and compilers, that kind of thing, I am sorry but I have nothing up at the moment. Though, it would be a dream come true to create a series on that topic! I’ve often thought about it, and sketched video ideas, but it has yet to materialise…
      Thanks for the suggestion, and cheers for watching :)

  • @jp5000able
    @jp5000able 2 years ago

    Back in the early '80s I did some 6502 assembly programming. What made it so difficult was that the CPU was only 8 bits. There were no instructions for 16-bit numbers or floating point numbers.
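
    To show what the missing 16-bit instructions meant in practice, here is a small Python sketch of adding two 16-bit numbers one byte at a time, low byte first, passing the carry along. On the 6502 this is typically done with a pair of add-with-carry (ADC) operations; the helper names `add8_with_carry` and `add16` are invented for this illustration.

```python
def add8_with_carry(a, b, carry_in):
    """Add two 8-bit values plus a carry; return (8-bit result, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16(x, y):
    """Add two 16-bit values one byte at a time, low byte first."""
    low, carry = add8_with_carry(x & 0xFF, y & 0xFF, 0)
    high, carry = add8_with_carry(x >> 8, y >> 8, carry)
    return (high << 8) | low, carry


if __name__ == "__main__":
    result, carry = add16(0x12FF, 0x0001)
    print(hex(result), carry)  # 0x1300 0
```

    The same pattern extends to 24-bit or 32-bit values by chaining more byte-sized additions.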