why can’t computers have thousands of cores?

  • Published 6 Oct 2024
  • If you're watching this video on any device made in the last 10 years, be it a desktop, a laptop, a tablet or a phone, then there is an extremely high chance that your device is powered by a multi-core processor. Since the release of the first dual-core processor in 2005 by IBM, it has become more and more common for computer processors of all varieties to be multi-core. This is in direct contrast to laptops in the 2000s, like my iBook G4 for example, which was powered by a single-core PowerPC processor at around 800MHz. Nowadays, it is common for any desktop to have at least 4 cores, easily clocked well into the GHz range.
    But what does it mean for a processor to have multiple cores? How does a processor with multiple cores work? Why are more cores better than just one? How many cores are too many? These are all really important questions, and, like you, I was curious to find the answer.

COMMENTS • 1.3K

  • @utubekullanicisi
    @utubekullanicisi 2 years ago +1682

    Both Intel and AMD are rumored to release server processors (codenamed Sierra Forest and Turin, respectively) with more than 200 cores in the next few years (as soon as 2024). Servers will continue to scale well and make use of as many cores as you can give them.

    • @kayakMike1000
      @kayakMike1000 2 years ago +128

      Aren't these intended for data centers where customers lease VMs or some other slice? AMD has encrypted RAM...

    • @cassandrasibley228
      @cassandrasibley228 2 years ago +173

      This video is about home and personal computers. Obviously industry hardware is gonna be a lot tankier

    • @kayakMike1000
      @kayakMike1000 2 years ago +60

      @@cassandrasibley228 well, some of those Xeon server processors end up in high-end workstations. I suspect higher-end workstations might take a middle road between core count and individual core performance.

    • @AnarexicSumo
      @AnarexicSumo 2 years ago +99

      @@kayakMike1000 Can't I just enjoy learning about the extreme limit to the same tech without people going "Uhm ackshually that's not designed for consumers"

    • @littlemeg137
      @littlemeg137 2 years ago +49

      Sun/Oracle had a 128 core SPARC64 chip over a decade ago. I've still got one of those servers in my basement.

  • @davidthacher1397
    @davidthacher1397 2 years ago +379

    Technical Tradeoffs:
    1. Power - Power Consumption / Thermal
    2. Area - Core Size / Cache / Memory Bandwidth / IO interconnect / Yield
    3. Performance - Architecture / Instruction Set / Clock Speed / FPGA / Multicore / CMT / NUMA / SIMD
    Business Tradeoffs:
    1. Algorithm Capability / Application
    2. Market share / Manufacturing Size
    3. Time to Market / Training

    • @leosmi1
      @leosmi1 2 years ago +3

      Thank you

    • @brolysmash9333
      @brolysmash9333 2 years ago +2

      bro you're the best. Thanks for sharing this. I'm a network engineer and didn't know anything about that.

  • @larrydavis3645
    @larrydavis3645 2 years ago +674

    As a former programmer, I can tell you that not all functions of a program can be run in parallel. Sometimes a function needs to wait for another process to finish before it can proceed (see the sketch after this thread).

    • @pwnmeisterage
      @pwnmeisterage 2 years ago +88

      You just can't count or calculate the next number before you've finished calculating or counting the one before it.
      There is no logic, no clever math, algorithm, or brute force that can speed up some simple processes. The complex things just have to wait until the simple things get done.

    • @MrZnarffy
      @MrZnarffy 2 years ago +13

      That's if you iterate... But try using a functional language, where even iteration is done by recursion...

    • @larrydavis3645
      @larrydavis3645 2 years ago +4

      @@MrZnarffy Thank you for the feedback.

    • @duaanekobe2773
      @duaanekobe2773 2 years ago +1

      It can be endless, 12 is the basic max, 24 = 2x all at very stable more. times infinity Now I seen 36 max and it gets bugs. a controller separating 24 and next 24 = 58, etc... So buffer and code (stable 24 time x), makes best code function. now is the actual program and controller(s), Think A,B,C and the programs, export

    • @larrydavis3645
      @larrydavis3645 2 years ago +8

      @@duaanekobe2773 Thank you for the feedback. I did most of my programming on mainframe computers, and the programs there were extremely linear in nature. We used subprograms for common functions, meaning the main program sat in a wait state waiting for the subprogram to complete.
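A minimal C sketch of the dependency point made in this thread (illustrative only): the first loop is a serial dependency chain that no number of cores can speed up, while the second loop's iterations are independent and could be split across cores.

```c
/* Serial: each iteration needs the previous result, so it cannot be
 * parallelized (a linear congruential generator is a classic example). */
long chain(long x, int n) {
    for (int i = 0; i < n; i++)
        x = x * 6364136223846793005L + 1442695040888963407L; /* depends on x */
    return x;
}

/* Parallelizable: no iteration reads another iteration's result. */
void scale(double *a, double k, int n) {
    for (int i = 0; i < n; i++)
        a[i] *= k;
}
```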

  • @veleriphon
    @veleriphon 2 years ago +856

    We already see the cores-to-code limit with Threadripper 64-core, 128-thread units. They're hilariously overpowered for most tasks.

    • @SandTurtle
      @SandTurtle 2 years ago +155

      I feel bad for people who buy a Threadripper and then realize their favorite games either don't support multithreading, or only support 1 or 2 extra threads for the main logic.

    • @giahuy8701
      @giahuy8701 2 years ago +290

      @@SandTurtle of course, Threadripper is not for gaming

    • @saricubra2867
      @saricubra2867 2 years ago +134

      @@giahuy8701 No way to blame sh*tty game code optimization on monster CPUs. There are games still struggling on CPUs with 4 cores due to bad optimization and very low CPU use.

    • @SandTurtle
      @SandTurtle 2 years ago +12

      @@giahuy8701 ye ik but I've heard of people tryna buy them for gaming

    • @ed_iz_ed
      @ed_iz_ed 2 years ago +13

      @@giahuy8701 games can EASILY make use of multiple threads

  • @badass6300
    @badass6300 2 years ago +633

    Also, a big factor is that many programs have linear logic. Amdahl's law shows how well a task scales with multiple cores depending on how parallel it is. For tasks that are 50% parallel, anything above 4 cores is pointless; for 75% parallel, anything above 16 cores is pointless. You just don't gain performance, and that is baked into the logic of the task (see the sketch at the end of this thread). Many cores are great when doing many instances of the same task without caring which one completes first.

    • @mewsermeow8683
      @mewsermeow8683 2 years ago +93

      That, and the fact that once you start getting into problems that are highly parallelizable, you'd just use a GPU anyway.

    • @badass6300
      @badass6300 2 years ago +26

      @@mewsermeow8683 If the GPU has the instructions for it; but 99% of the time, yes.

    • @hjups
      @hjups 2 years ago +51

      @@badass6300 It's not about whether the GPU has the instructions; it's largely about the type of problem too. GPUs, for example, don't do well with highly divergent streams, but do well with highly uniform streams. Modern CPUs can often do much better with divergent streams due to their internal out-of-order nature, and throwing more CPUs at such problems scales almost perfectly under Amdahl's law (the sequential part is very small, usually global bookkeeping).

    • @badass6300
      @badass6300 2 years ago +8

      @@hjups True, but GPU architecture is getting closer to CPU architecture with each passing generation. AMD GPUs since RDNA1 have hardware schedulers and might get OoO execution in the future.
      Then again, with chiplets they might get a whole CPU to themselves for certain tasks.
      Or vice versa: integrated GPUs might get good. Or both.

    • @hjups
      @hjups 2 years ago +27

      @@badass6300 Not really. GPUs are fundamentally different from CPUs due to their parallel/vector nature. Some improvements have been made to handle thread divergence, but they are never going to be as robust as a CPU... otherwise... they would be CPUs...
      As for OoO, both Nvidia and AMD GPUs do OoO internally; it's just not something that's advertised.
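To make the numbers in this thread concrete, here is a small C sketch of Amdahl's law, S(n) = 1 / ((1 - p) + p/n), where p is the parallel fraction and n the core count (illustrative only):

```c
#include <stdio.h>

/* Amdahl's law: speedup on n cores for a task whose fraction p is parallel. */
static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    for (int n = 1; n <= 64; n *= 2)
        printf("n=%2d  p=0.50 -> %.2fx   p=0.75 -> %.2fx\n",
               n, speedup(0.50, n), speedup(0.75, n));
    /* p=0.50 never exceeds 2x no matter how many cores you add;
       p=0.75 creeps toward its 4x ceiling, with little gain past 16 cores. */
    return 0;
}
```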

  • @desmondbrown5508
    @desmondbrown5508 2 years ago +284

    I think it would be interesting to go into why GPUs CAN have so many cores, and be parallelized more effectively and with better thermal efficiency, while CPUs cannot. I know the answer, but I do think it would be an interesting follow-up video.

    • @JustinShaedo
      @JustinShaedo 2 years ago +14

      Total agreement. I don't know the answer, but I'm certainly curious!

    • @richardg8376
      @richardg8376 2 years ago +104

      @@JustinShaedo A basic explanation would be that the kind of work a GPU does is easy to break up and spread among hundreds of small cores, and a GPU is designed for parallel processing on tasks that don't depend on each other.
      In a GPU you define a single program called a "shader", essentially a script which defines various inputs and what the GPU should do with those inputs.
      Each core on the GPU then runs in lockstep with the others: they all run the exact same shader script, albeit with different parameters. You cannot have half the cores run one script and the other half run another. This is great for 3D graphics, where the output of each pixel on the screen can be calculated independently, all using the same script.
      This is also why we still need CPUs and can't just run everything on GPUs: GPU cores cannot run separate processes simultaneously, only hundreds of copies of the same process with different inputs (see the sketch at the end of this thread).

    • @Chezrlz009
      @Chezrlz009 2 years ago +26

      @WJ GPUs are designed to do a bunch of math at once. Each core is designed for a very specific task, hence tensor cores, RT cores, etc. CPUs are supposed to be able to handle anything and everything, but maybe not as efficiently.

    • @Chezrlz009
      @Chezrlz009 2 years ago +6

      @WJ I don't see how that shows GPUs being able to have more cores in the first place. Utilization is different from physical constraints. Also, they want the OS to work on old or cheaper laptops, the majority of which have really weak processors with few cores. If MS optimized Windows for PCs with 8 cores, PCs with 4 cores would struggle to do anything. I agree though. I wish Linux replaced Windows. Windows is a cash grab, but so is everything in capitalism that isn't truly free.

    • @Chezrlz009
      @Chezrlz009 2 years ago +4

      @WJ yeah, it stinks. Sadly, quantum computing has the same issue as more cores: no one will want to switch, and nothing digital can be translated. Qubits aren't binary and work off of quantum superpositions; they are programmed entirely differently. Additionally, quantum entanglement is highly unstable and can't be observed or interacted with in any way, or the particles will lose their entanglement and define themselves. That means you need to cool the qubits with liquid nitrogen, which is very difficult. You are correct about GPUs being useful to streamers, though. For certain tasks, computers can use GPUs to perform work such as encoding streams and rendering videos on the go. Chip makers design a specific pipeline for a GPU that helps it perform tasks. GPUs made for rendering game graphics tend to work well for rendering streams, which is handy. Pipelines are basically made up of core clusters that each perform a different task, and instructions are sent through the pipeline to have things like shaders, sharpening, particles, textures, etc. applied.
      Edit: tl;dr: quantum computing and more cores face the same obstacle: the economy and society have trouble changing rapidly, and both, especially quantum computing, would disrupt a lot. :c
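A rough C sketch of the lockstep model described earlier in this thread (function names are made up for illustration): the GPU runs one tiny "shader" once per pixel with different inputs, which is why thousands of simple cores are easy to keep busy.

```c
/* The "shader": the one program every core runs; only the inputs differ. */
static unsigned shade(int x, int y, unsigned t) {
    return ((unsigned)(x ^ y) + t) & 0xFFu;  /* per-pixel math, no shared state */
}

/* A GPU conceptually runs all iterations of this loop at the same time,
 * because no pixel depends on any other pixel. */
void draw(unsigned *framebuffer, int w, int h, unsigned t) {
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            framebuffer[y * w + x] = shade(x, y, t);
}
```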

  • @RoelBaardman
    @RoelBaardman 2 years ago +78

    You're describing the limits of the von Neumann architecture and our current (mostly sequential) programming models more than anything else, imho.
    Have a look at Erlang and the Actor model, and I think you'll agree that processors can scale just fine if we rule out shared memory (see the sketch after this thread).

    • @kayakMike1000
      @kayakMike1000 2 years ago +2

      The functional wizard has spoken. (Pay no attention to the man behind the curtain)

    • @Handelsbilanzdefizit
      @Handelsbilanzdefizit 2 years ago +2

      It's called memory-driven computing. Very smart.
      HP tried this for a while.
      I have no idea what happened to it.

    • @RoelBaardman
      @RoelBaardman 2 years ago

      @@Handelsbilanzdefizit Thanks for sharing!

    • @StCreed
      @StCreed 2 years ago +1

      Occam on Transputers already solved a lot of these programming issues. Too bad it never took off.

    • @RoelBaardman
      @RoelBaardman 2 years ago

      @@StCreed Interesting, thanks for sharing!
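As a rough sketch of the share-nothing idea in this thread (POSIX C, illustrative only): two processes that never touch each other's memory, communicating purely by messages over pipes, which is the spirit of Erlang's actor model.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int to_child[2], to_parent[2];
    pipe(to_child);                            /* the worker's "mailbox" */
    pipe(to_parent);                           /* the reply channel */

    if (fork() == 0) {                         /* child = worker "actor" */
        int msg;
        read(to_child[0], &msg, sizeof msg);   /* block until a message arrives */
        msg *= 2;                              /* work on private memory only */
        write(to_parent[1], &msg, sizeof msg); /* send the result back */
        _exit(0);
    }

    int msg = 21;
    write(to_child[1], &msg, sizeof msg);      /* send work to the actor */
    read(to_parent[0], &msg, sizeof msg);      /* wait for the reply */
    wait(NULL);
    printf("reply: %d\n", msg);                /* prints: reply: 42 */
    return 0;
}
```

No cache line is ever shared between the two processes, so no coherency traffic is needed; scaling this out is a matter of adding more mailboxes, not more locks.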

  • @ccflan
    @ccflan 2 years ago +66

    One of the best YouTube channels out there; it feels like you should have to pay to see this content. Thank you!

    • @LowLevel-TV
      @LowLevel-TV  2 years ago +6

      Thanks for the love as always!

    • @ObligedTester
      @ObligedTester 2 years ago +3

      Totally agree. I hope some of my YouTube Premium dollars end up on this channel 😅

    • @8lec_R
      @8lec_R 2 years ago

      There's a Patreon, feel free to pay. I can't afford to, so I'd rather have content that is free and viewer-supported than something locked behind a paywall.

  • @dannygjk
    @dannygjk 2 years ago +64

    Back in the 80s I read a magazine article about the 'Connection Machine', which had 65,536 processors, but each processor wasn't like a core as we think of them these days. Each processor was a tiny, simple device which operated in a massively parallel architecture. Such machines had limited practical value, since they were specialized for a narrow range of problems and were also limited by being a 'hard-wired' architecture. Right now I can't think of a better description, but I do know I should word it differently. I vaguely remember it had clever solutions for how to break down tasks and how the machine's processors worked together. It makes me think of things like ant colonies.

    • @ivanscottw
      @ivanscottw 2 years ago +23

      Errr... GPUs?

    • @dannygjk
      @dannygjk 2 years ago

      @@ivanscottw I don't remember whether the Connection Machine was analogous to a GPU, because I don't remember the details of the architecture.

    • @mateusvmv
      @mateusvmv 2 years ago +1

      Sounds more like a cluster

    • @dannygjk
      @dannygjk 2 years ago +7

      @@mateusvmv IIRC the whole machine's architecture was roughly analogous to a GPU. It wouldn't be like what people think of as a cluster these days; each processor was a very simple device, nowhere near what a processor in a modern cluster is. It was huge compared to a GPU, though, which isn't surprising since the first one was built in the 80s.

    • @littlemeg137
      @littlemeg137 2 years ago +4

      The whole point of the Connection Machine's hypercube topology was to allow programmers to define the optimal architecture for the problem they were trying to solve. Unfortunately, very few HPC programmers of the time could make the cognitive leap to this model from Fortran on vector machines.

  • @benandrew9852
    @benandrew9852 2 years ago +14

    I've recently started a job as a technical support engineer / technical writer working on complex digital signal processing applications. Videos like this, like yours, are exceptionally valuable to me as a non-programmer. There are limitations to design, implementation and efficiency that are contingent on factors entirely within low-level hardware programming, and having them explained so succinctly makes my job way easier, because I'm being provided with a higher level understanding that I can pass on to my reports. Props.
    And, on a more personal-craft level, the quality of your videos in terms of rapidly explaining complex topics through efficient use of graphics and constrained use of jargon is very inspiring. Well done.

  • @pwnmeisterage
    @pwnmeisterage 2 years ago +32

    GPU and SPU cards already pack hundreds or thousands of "cores" onboard. They can only process simplex tasks, not complex tasks, but they can stream their outputs in near-realtime.
    They suck a lot of power and spew a lot of heat while working at full load.

    • @gantz4u
      @gantz4u 2 years ago

      Which they've been laying the groundwork for since the single-core days, with things like liquid cooling and cryogenic cooling. Even my air cooler block is light years beyond what we had in the 1990s, to the point that it matches, if not out-cools, a 1990s water cooler.

    • @mm2f419
      @mm2f419 2 years ago

      What are SPUs?

    • @hjups
      @hjups 2 years ago

      @@mm2f419 I think it should have been "DPU", not "SPU", i.e. the smart network cards like Nvidia's BlueField.

    • @JorgetePanete
      @JorgetePanete 2 years ago

      simple*

    • @CocoaEm
      @CocoaEm 1 year ago

      @@JorgetePanete they do lots and lots of simplistic operations; it's just that, added up, it's complex.

  • @LilacMonarch
    @LilacMonarch 1 year ago +38

    The "number of transistors doubling every 2 years" might already be hitting its end. The problem is that in order to add more, they have to be made so small that it's impossible to keep the circuits properly separated. The gaps are so small that electrons easily jump across, causing shorts. Maybe we will see an increase in larger-sized CPUs, but that will have its own problems.

    • @NFchegg
      @NFchegg 1 year ago +1

      Chiplets

    • @AlMcpherson79
      @AlMcpherson79 1 year ago +10

      Improve efficiency without improving capability to the point that we can start stacking the processors... resulting in THICC CPUS.

    • @LilacMonarch
      @LilacMonarch 1 year ago +17

      @@AlMcpherson79 now that sounds like a thermal nightmare

    • @KeinNiemand
      @KeinNiemand 1 year ago +1

      The number of transistors is still increasing, so we haven't hit the absolute end yet, but it has probably already slowed down from the doubling every 2 years of Moore's law.

    • @ViktardTRTH
      @ViktardTRTH 1 year ago

      @@jsmith8147 I've got some bad news if you think quantum computing is the answer, because while it can theoretically perform faster, there is no real-world useful function with scalable architecture.

  • @AlessioSangalli
    @AlessioSangalli 2 years ago +227

    I've always categorized as "asymmetric" those systems that, while having multiple cores, do not have cache coherency, so it's up to the programmer to synchronize the cores. I once worked on a system that was running Linux on one core and an RTOS on the other, with independent MMUs.

    • @LowLevel-TV
      @LowLevel-TV  2 years ago +78

      Two OS's on separate cores, very interesting.

    • @llothar68
      @llothar68 2 years ago +8

      Yes, I think Apple will have to go this way. We can see on the M1 Ultra that they've already hit the limit for chip interconnect. But it could be a nice start toward getting "blade computers" into the world of desktops; we had them in servers for a long time. Multi-socket boards still try to do cache coherency. But unfortunately desktop computers aren't there yet.

    • @MatthijsvanDuin
      @MatthijsvanDuin 2 years ago +7

      Embedded and mobile SoCs quite commonly can have lots of different cores with little or no coherency. For example TI's TDA4VM has (counting only freely programmable cores):
      - dual-core arm cortex-A72
      - three dual-core arm cortex-R5F subsystems
      - one TI C71x DSP
      - two TI C66x DSPs
      - two real-time subsystems with 6 TI PRU cores each
      with cache coherency only available between the cortex-A72 and the C71x as far as I understand (with snooping of main memory access by other cores or DMA, but no coherency with local caches of e.g. the R5F or C66x subsystems), while many of TI's older SoCs have no cache coherency whatsoever.

    • @kippie80
      @kippie80 2 years ago +3

      This is already done with Intel and Apple chips, for security. I forget the name on the Intel side, but Apple put its T2 chip into the CPU in the M-series.

    • @Mrcrappyfuntastic
      @Mrcrappyfuntastic 2 years ago +3

      Didn't the PS3 have a similar issue too?

  • @RonJohn63
    @RonJohn63 1 year ago +16

    AMD also released their first dual-core CPUs in 2005. (Of course, not everyone instantly bought them...)
    Another issue with huge core counts is cross-core communication: threads usually want to talk to each other, and the wiring between all those cores gets crazy. You effectively get a traffic jam in there (see the sketch at the end of this thread)...

    • @jessepollard7132
      @jessepollard7132 1 year ago

      That is what the crossbar switch is used for: communication between the CPUs and the shared cache that mediates access to the memory bus.

    • @RonJohn63
      @RonJohn63 1 year ago

      @@jessepollard7132 right. But even crossbars have a bandwidth limit.
      This is also why NUMA was developed.

    • @jessepollard7132
      @jessepollard7132 1 year ago

      @@RonJohn63 yes, but it isn't a bandwidth limit so much as a limit on the number of switches in a physical implementation. NUMA provides the same interconnections but with different constraints.

    • @jessepollard7132
      @jessepollard7132 1 year ago

      @@expressionsartistic5856 Actually, that was built by Sun, not Cray. If I remember right, it was supposed to be the CS-64.
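A concrete miniature of that coherency "traffic jam", sketched in C with POSIX threads (illustrative; timing code omitted): two threads update counters that sit on the same cache line, so the line ping-pongs between cores. Giving each counter its own line, as in the padded variant, typically makes the same work several times faster on common hardware.

```c
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000UL

/* False sharing: both counters live on one cache line, so every
 * increment by one core invalidates the line in the other core. */
struct { volatile unsigned long a, b; } together;

/* One counter per 64-byte line: no coherency ping-pong. */
struct padded { volatile unsigned long v; char pad[56]; };
struct padded apart[2];

void *bump_a(void *arg) {
    (void)arg;
    for (unsigned long i = 0; i < ITERS; i++) together.a++;
    return NULL;
}

void *bump_b(void *arg) {
    (void)arg;
    for (unsigned long i = 0; i < ITERS; i++) together.b++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Time this run, then point bump_a/bump_b at apart[0].v and
     * apart[1].v instead and compare the two versions. */
    printf("%lu %lu\n", together.a, together.b);
    return 0;
}
```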

  • @MichaelBristow137
    @MichaelBristow137 2 years ago +58

    My first computer had 48K of RAM (it was an Apple II+ with 16K extra memory). I remember learning some assembly language. Now I have a phone with multiple gigabytes of memory (a 255 GB SD card plus 128 GB internal) that takes 8 MB photos... I am so amazed at how far we've come and what the computer is actually doing to even display what I'm typing right now. It's mind-bogglingly amazing...

    • @EdKolis
      @EdKolis 1 year ago +9

      I remember the animated intro for Megaman X saying back in 1993 that X had 32,768 TB of RAM; I had to look up what a terabyte was and I was like "lol what". Now that actually seems feasible in the not-too-distant future. Will AI advance to the same point as X too?

    • @rodjacksonx
      @rodjacksonx 1 year ago +3

      My first was an Atari 800; I'm pretty sure it had 64K. It was a fossil even when I got ahold of it. I recently fulfilled a childhood dream by building a new system and just maxing out the RAM for the heck of it. 128 GB has never felt so good!

    • @cpK054L
      @cpK054L 1 year ago +6

      @@EdKolis well... 64-bit operating systems won't go away for a LONG time, as they can address 16 exabytes (still at least 6 folds away).
      Nothing says you can't have 32 exabytes of RAM... the question is... why?
      You might as well live alone in a 100,000 sq ft mansion and ask yourself... why?

    • @embrikchloraker8186
      @embrikchloraker8186 1 year ago +2

      @@EdKolis What also amuses me is that, even in a future with specs like that, they're apparently still using DOS interfaces.

    • @Bobby-fj8mk
      @Bobby-fj8mk 1 year ago +2

      I learnt Intel 8085 assembly language back in the early-to-mid 80s. What is actually going on is so simple on an instruction-by-instruction basis. At some point in history, CPUs became able to allocate tasks from a single program to multiple cores all by themselves, without a programmer writing instructions for them to do that.

  • @TrippTech
    @TrippTech 1 year ago +5

    (electrical engineer here)
    LOVE this, great explanation!!! One thing I would have mentioned, especially when talking about single-core chips, is "out-of-order" execution, where the chip executes instructions as soon as their inputs are ready, rather than everything waiting in a queue. Probably one of the biggest advances in chip design in history.
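A hedged C sketch of what out-of-order execution exploits (illustrative; exact gains vary by CPU): the first loop is one long dependency chain, so each addition waits on the previous one, while the second keeps four independent chains that an OoO core can overlap.

```c
/* One accumulator: every add depends on the one before it. */
double sum1(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent accumulators: the core can keep several
 * additions in flight at once, then combine them at the end. */
double sum4(const double *x, int n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```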

  • @69k_gold
    @69k_gold 2 years ago +13

    "If you're watching this on any device made in the last 10 years.."
    Me watching this on my 2008 Windows XP Professional PC with an Athlon chipset: *You're wrong*

  • @dmitrykargin4060
    @dmitrykargin4060 1 year ago +3

    Scientific computing guy here. Most often we hit the RAM bandwidth limit. Sometimes we use all the bandwidth with a single core running optimised AVX2 code and a perfect memory layout. Using more cores just slows everything down until you switch to a platform with more DDR channels.
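The kind of kernel the commenter is describing looks like the STREAM "triad" pattern below (a hedged C sketch): there is almost no arithmetic per byte moved, so once one or two cores saturate the DDR channels, additional cores just wait in line at the memory controller.

```c
/* Bandwidth-bound kernel: 2 floating-point ops per iteration versus
 * 24 bytes of memory traffic (read b[i], read c[i], write a[i]). */
void triad(double *a, const double *b, const double *c, double k, long n) {
    for (long i = 0; i < n; i++)
        a[i] = b[i] + k * c[i];
}
```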

  • @renchesandsords
    @renchesandsords 2 years ago +52

    To be fair, in the science and datacenter space, that kind of core density can be effectively leveraged. Threadripper and EPYC proved it, and the development of processors like Genoa and Bergamo only drives that point home further.

    • @drstrangecoin6050
      @drstrangecoin6050 2 years ago +2

      Yeah, exactly. I got clickbaited by the title because I work on a system with over a thousand cores. Promise-based task schedulers and MPI make it possible to recruit massive computing power for certain workflows, and vectorizing loops over a distributed system is somewhat independent of the code at this point. Old Perl script? Throw it into the OpenPBS scheduler with GNU parallel and loop over your entire data set as a matrix.

    • @prashanthb6521
      @prashanthb6521 2 years ago +9

      @@drstrangecoin6050 I think you are getting it totally wrong. There is no machine with a single 1000-core CPU. You are using a cluster, with independent memory bandwidth per node. That doesn't hit any of the hurdles mentioned in this video at all.

  • @RunForPeace-hk1cu
    @RunForPeace-hk1cu 2 years ago +6

    More cores = more memory = more cache = more interconnect = more energy = more heat.
    A cache coherency nightmare.

  • @erikshure360
    @erikshure360 2 years ago +15

    It's pretty much impossible for Moore's Law to persist for another 50 years; transistors can only get so small. If anything, a different form of computing will take over by then, like optical computing.

    • @4.0.4
      @4.0.4 2 years ago +2

      And they'll call it quantum computing for marketing purposes. Which isn't entirely wrong, but it's not what people expect.

    • @officialrights6009
      @officialrights6009 2 years ago

      Or analog computers.

    • @matsv201
      @matsv201 2 years ago

      Well... yes, but no.
      Effectively, the way it was originally perceived, it already died 10 years ago... really even earlier.
      The nm scale we have today is symbolic, not real.
      Transistor density has been increased using other tricks, like standing transistors and more layers.

    • @ABaumstumpf
      @ABaumstumpf 2 years ago +1

      At some point, yes, it will stop being correct. And no sane person doubts that.
      But we do not yet know when that will happen.
      Also, quantum computers are very likely not the answer to 99.999% of all problems as far as we are aware; they are simply too slow and inefficient for anything that does not involve sifting through enormous numbers of combinatorial possibilities.
      @@matsv201 "Effectively, the way it was originally perceived, it already died 10 years ago... really even earlier."
      No.

    • @matsv201
      @matsv201 2 years ago

      @@ABaumstumpf you probably need to motivate your statement a bit.

  • @xeridea
    @xeridea 2 years ago +8

    The main issue is that many tasks don't gain much from being split across many cores, due to data dependencies on previous instructions. Generally, the better applications for multithreading are those whose workloads are easily divided. Anything to do with graphics tends to be heavily threadable, which is why GPUs these days have upwards of 10,000 tiny cores: with millions of pixels on a screen, it is easy to split up the work. Game logic, however, isn't as easy to split up, which is why games generally don't benefit from having more than 6 CPU cores. It would be trivial to make a CPU with 1,000 cores; just shrink the cores. With CPUs, though, it is generally better to have a smaller number of cores that are better at executing code fast than a crazy number of simple cores.
    It is significantly more energy efficient to use more cores if the workload can use them, which is why GPUs are so much more efficient at drawing graphics than CPUs. On the flip side, GPUs are pretty bad at general code, since to use them effectively, code needs to be what is referred to as "embarrassingly parallel" (see the sketch below). Many non-graphics tasks can still be programmed effectively on the GPU, so GPUs are used for non-graphics work too, just not as CPUs.
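A minimal sketch of an "embarrassingly parallel" workload in C with OpenMP (compile with -fopenmp; illustrative only): every pixel is independent, so the iterations can be handed to however many cores exist, in any order.

```c
/* Brighten an image buffer: no iteration depends on any other,
 * so the pragma can split the loop across all available cores. */
void brighten(unsigned char *px, long n) {
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        px[i] = (px[i] > 205) ? 255 : (unsigned char)(px[i] + 50);
}
```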

  • @CustomCans
    @CustomCans 2 years ago +10

    I saw the title of this video and instantly thought of the Cerebras wafer-scale processors. I think they definitively prove that computers and CPUs can have thousands of cores ;)

    • @ABaumstumpf
      @ABaumstumpf 2 years ago

      "Can have" and "useful for general purpose" are very different things.
      You can build a jetpack; you can build a microwave powered by hamsters. That does not mean you should, or that it would make any sense to do so.

    • @leovang3425
      @leovang3425 2 years ago +1

      @@ABaumstumpf more like having a supersonic airliner: sure, it's fast, but it's not economical, nor is it pleasant to be around.

    • @prateekpanwar646
      @prateekpanwar646 1 year ago

      @@leovang3425 Concorde

  • @saricubra2867
    @saricubra2867 2 years ago +13

    I'm still using a 4-core, 8-thread CPU from 2013 for audio. The code is NOT bad; it's the real-time audio processing itself, which runs in series, that is the bottleneck. BUT processing audio tracks in parallel scales way better with more cores.
    Some tasks are themselves the bottleneck, not their code.

  • @herrxerex8484
    @herrxerex8484 2 years ago +25

    This is genuinely one of my favorite channels. Could you make a RISC-V series, or compile resources for learning it? I'd love to learn more RISC-V; I just don't have a structured way to do it yet.

    • @LowLevel-TV
      @LowLevel-TV  2 years ago +2

      Working on it! :D

    • @joelsmusic7771
      @joelsmusic7771 2 years ago +1

      RISC processing is a college course offered at most universities... I generally enjoyed working with this language.

    • @LowLevel-TV
      @LowLevel-TV  2 years ago +8

      @@joelsmusic7771 RISC is the general idea of reduced instruction set computers, whereas RISC-V is the open-source architecture and spec for those processors. Saying RISC-V is more like saying MIPS or ARM than RISC alone.

    • @mikapeltokorpi7671
      @mikapeltokorpi7671 2 years ago +1

      I remember drooling over RISC processors in the early 90s with my schoolmate. They finally seem to have matured into commercial products (like Raspberry Pi replacements). Both are high-priced for the performance at the moment, though. However, depending on the problem, you should get your discrete code running way faster than on a CISC architecture.

  • @homeboy6668
    @homeboy6668 2 years ago +27

    Hey, could you consider making videos on compiler design, maybe? It'd be cool to learn that too. BTW, awesome video.

    • @raven4k998
      @raven4k998 1 year ago

      How many cores will Windows need in the future? More than it needs now 🤣🤣🤣🤣🤣

  • @nickscurvy8635
    @nickscurvy8635 2 years ago +4

    Some electrical engineers, when confronted with a problem, say "I know, I will use more cores." Now they are the CEO of AMD.

  • @DDRWakaLaka
    @DDRWakaLaka 2 years ago +6

    0:10 I think you've confused two different facts. IBM's POWER4 is from 2001. You might be thinking of AMD's Athlon 64 X2, which is the first *consumer* level dual-core chip and is from 2005.

    • @CocoaEm
      @CocoaEm 1 year ago +2

      the POWER4 chip he was on about isn't even multi-core; it's straight up 2 CPUs on the same wafer.

    • @DDRWakaLaka
      @DDRWakaLaka 1 year ago +3

      @@CocoaEm Yeah, I'm realizing now he's likely referring to the PPC970MP. Which, like you said, was an MCM, not two native cores.

  • @overloader7900
    @overloader7900 2 years ago +5

    GPUs: 11k cores and more on the way

  • @DJ_Force
    @DJ_Force 2 years ago +9

    You didn't talk about wafer yield. The more cores you have, the physically bigger the chip. The bigger the chip, the more susceptible it is to random manufacturing defects; in other words, the bigger the chip, the more likely it is to be defective. This can dramatically raise the price, since you get fewer sellable chips per silicon wafer (see the sketch at the end of this thread).

    • @WaterZer0
      @WaterZer0 2 years ago

      So it's fair to say there's an ideal ratio of cost to core count? At least from the manufacturer's point of view.

    • @DJ_Force
      @DJ_Force 2 years ago

      @@WaterZer0 Well, the smaller the chip, the better the odds it doesn't have a defect. But yes, too small and it won't be powerful enough to be competitive.
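A back-of-the-envelope C sketch of that yield argument, using the classic Poisson yield model (the defect density below is an assumed illustrative number, not real fab data):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double d0 = 0.1;                           /* defects per cm^2 (assumed) */
    for (double area = 1.0; area <= 8.0; area *= 2.0)
        printf("%.0f cm^2 die -> %.0f%% yield\n",
               area, 100.0 * exp(-area * d0)); /* Poisson model: e^(-A*D0) */
    /* Prints roughly 90%, 82%, 67%, 45%: doubling the die area to fit more
     * cores throws away an ever larger share of each wafer. */
    return 0;
}
```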

  • @wbtittle
    @wbtittle 2 years ago +6

    Once upon a time, I was an entry-level engineer for Bettis Atomic Labs. They gave us a tour of the facilities. As we were wandering the warehouse, our guide pointed to the 32,000-processor computer they were planning on using to design atomic reactors (I made that part up; they were really just planning on trying to figure out how to use a 32,000-processor machine).
    They were trying to work out how to program such a machine.
    Then we moved 20 ft down the warehouse. "This is our 128,000-processor machine."
    "Why did you buy a 128,000-processor machine before you figured out how to code the 32,000-processor machine?"
    "Because it is bigger and better!"
    The hurdle of making a 32,000-processor machine work is much, much bigger than making a 128,000-processor machine work after you've figured out how 32,000 processors work.

  • @JohnMiller-mmuldoor
    @JohnMiller-mmuldoor 2 years ago +5

    6:51 I need me one of them intel I69420 processors 😆

    • @GreatMossWater
      @GreatMossWater 3 months ago

      Sounds like a nice processor that smokes the competition out.

  • @albertsun3393
    @albertsun3393 2 years ago +25

    Interesting thing about multiple cores is that coherence and even just latency in communication between multiple cores eats a huge chunk out of performance. Arbitrating cache coherency between one, two, maybe four cores isn't too bad, but when your critical path in coherency (or latching for multiple clocks) goes all the way across the chip, suddenly your performance drops like a rock. We've seen the transition from higher frequency to more cores because of the exponential increase in power consumption when increasing core clock, but with too many cores we sometimes struggle to even hit our initial clock due to all the overhead for everything else.

    • @Rokabur
      @Rokabur 2 years ago

      From everything I've seen, more cores almost always means lower clock speed (unless you're overclocking). My quad-core i7-4820K runs at 3.7 GHz, while I've seen Threadrippers running at barely 3 GHz.

    • @Demon09-_-
      @Demon09-_- 2 years ago

      @@Rokabur The Threadripper is not quite a fair comparison; that's two different brands with different IPCs and applications. The lower clocks may be true to some degree, but it would be fairer to compare it against Intel's new i9-12900K, which has 16 total cores and will still run at 5 GHz out of the box (it has some stuff with P-cores and E-cores). For a more all-P-core comparison, you can look at the 10900K, which would still do 4.9 GHz out of the box on all 10 cores and 20 threads. Comparisons inside AMD are similar: their newer 5950X, which is 16 cores, barely loses any clock speed to their lower 5600X. When comparing, you really have to stay inside the same architecture. Intel does lose a little clock speed compared to their lower chips in server parts, but server stuff is a bit of a different ball game, where you can get up to 56-core CPUs with a lot of memory lanes.

  • @theldraspneumonoultramicro405
    @theldraspneumonoultramicro405 2 years ago +1

    Fun fact: there is a hard physical limit to how small a transistor can be. Eventually they reach such a small size that electrons flow freely through them, leaving the transistor permanently locked in an on state. Following Moore's law, we should reach that physical size limit as early as 2023.

  • @seeibe
    @seeibe 2 years ago +31

    Thanks to GPGPU, we already effectively have CPUs with thousands of cores, just with some limitations.

    • @matsv201
      @matsv201 2 years ago +3

      That is very true... but the flip side is that applications that are easy to multithread run on the GPGPU, while the ones that are not run on the CPU... which again limits the usefulness of many CPU cores.

    • @null6482
      @null6482 1 year ago +1

      Hehe. "GPGPU"

  • @specialopsdave
    @specialopsdave 2 years ago +2

    My dual-core desktop had enough performance for everything until 2 years ago, but I don't play many AAA games anyway, so it still works fine for me.

    • @satrah101
      @satrah101 2 years ago

      Same here, running Linux on it. Gets the job done.

  • @mully006
    @mully006 2 years ago +8

    This is a good video, but I think you overlooked an important aspect, and that is HPC. While no single chips have thousands of cores, in high-performance computing it is common to run code on many, many nodes, each with 64 or more cores.
    Additionally, GPUs are really just processors with more limited instructions, and they generally come with thousands of cores on a single die.

    • @dannygjk
      @dannygjk 2 years ago +1

      The architectures of GPUs and CPUs are different; it's not just about instructions.

  • @zxuiji
    @zxuiji 1 year ago

    Well, adding more cores and keeping all buses available to all threads (albeit not at exactly the same time, just close enough) is easy. All that's needed is a dedicated chip whose only purpose is to loop through each boolean bit linked to the threads supported by the CPU, checking whether they need an operation done via RAM. The actual operation is then read from the thread that set the bit, and once it has completed, the bit is cleared to say the operation is done. The thread doesn't need to care which bus was used, only that the operation was handled by the chip as soon as one was available. Being a chip, it can be made to skip the "if N < THREADS_SUPPORTED" logic by linking up a power-of-two number of threads (2, 4, 8, 16, 32...) and letting the index overflow back to 0 during incrementation, reducing both power consumption and the time it takes to get back to a waiting thread. As for the RAM side of things, the most a system can do is sport enough RAM to hold all apps and their virtual memory in memory at once. That's unlikely to happen any time soon, though, and would require some understanding on the user's side, like "if I open this app without closing another, then all apps will be slower due to surpassing the RAM limit".

  • @littlemeg137
    @littlemeg137 2 years ago +4

    The Paracel GeneMatcher had 6,144 cores. The Connection Machine had 65,536 cores.

  • @lockdot2
    @lockdot2 2 years ago +4

    I am one of the few people still using a single-core CPU to watch YouTube. The CPU I use is an AMD LE 1640, with 1 core and 1 thread.

    • @utubekullanicisi
      @utubekullanicisi 2 years ago

      You're able to stream at 4K no problem, right?

    • @Elinzar
      @Elinzar 2 years ago

      Man... how? I'm sure even if you don't have much money, you can scrape up some AM2 CPU with at least double the cores for basically nothing these days and swap that CPU out. It is a desktop CPU, right?

    • @dannygjk
      @dannygjk 2 years ago

      @@Elinzar Sounds to me like they have a small laptop or netbook. I have a netbook that is also 1 core, 1 thread, and only 2 GB of RAM. I would add more RAM, but I don't think there are RAM modules larger than 2 GB for it, and there is only 1 RAM slot. It can barely stream a video at 360p.

    • @saricubra2867
      @saricubra2867 2 years ago

      9-year-old 4-core, 8-thread Intel Core i7-4700MQ here at 3.4 GHz.
      99% or 100% CPU use on one thread for gaming, audio, even for loading and saving stuff to the HDD (now an SSD, and the CPU is the bottleneck for the SSD; still ridiculously fast).
      That microbe AMD system would 100% freeze in a DAW lmao.

    • @Elinzar
      @Elinzar 2 years ago +1

      @@dannygjk I looked it up, and one page said it was a desktop CPU from the AM2 platform...
      another page said it was a 2014 chip...

  • @diconustra
    @diconustra 2 years ago +4

    I was a sysadmin on a couple of multi-processor machines which had very similar scaling issues, except across CPUs and CPU boards. One was an IBM X460 with 4 interconnected chassis, each with four Intel CPUs; the other was an E25K with a half-dozen CPU boards, each with four SPARC CPUs. In each case, we ran into scalability issues related to memory bus bandwidth and the latency of memory fetches and bus I/O across chassis (X460) or boards (E25K).
    Operating system and database configuration and tuning helped, but ultimately both platforms faced diminishing returns on performance as boards and chassis were added, with 16 CPUs being the sweet spot.

    • @llothar68
      @llothar68 2 years ago

      Apple's M1 Ultra already hit it. I'm very curious how they will design their Mac Pro. I predict we'll go to multiple computers in the same chassis, also known as blades in the 2000s server days.

  • @Cyberfoxxy
    @Cyberfoxxy 2 years ago +4

    Meanwhile, a common GPU boasts 8,000 cores, though they are much slower and have only a small set of instructions. Also, the instruction set is not standardized; as such, OpenGL/OpenCL is implemented by the vendors themselves.

    • @coleshores
      @coleshores 2 years ago

      Still Turing-complete, though. There are highly parallel SQL databases which run entirely on the GPU, such as OmniSci (formerly MapD), for example.

  • @zolp
    @zolp 2 years ago +3

    There are already processors with three-digit core counts in abundance, memory access also continues to improve, and there are many applications that parallelize well. GPUs already have thousands of cores and are put to good use.

    • @romanpul
      @romanpul 1 year ago

      Yeah, but you can't really compare GPUs to CPUs. To my (admittedly kinda limited) understanding, GPUs resemble vector processors and are only efficient for use cases where your input data can be vectorized (i.e. cases where you fetch huge chunks of data at once and then crunch them). CPUs, on the other hand, are much better at crunching data which requires frequent, atomic memory access, due to their way more elaborate caching architecture.

  • @310_Latchkey_kid
    @310_Latchkey_kid 1 year ago

    This is my first time watching one of your videos, and honestly all I can say is that your answers to all those questions are very comprehensible and easy to understand! Great work.

  • @bwiebertram
    @bwiebertram 2 years ago +3

    In the future, one super computer will do the work for every person on earth

  • @CogentConsult
    @CogentConsult 1 year ago +2

    Want to hear something scary? In 1969, when the Saturn V rocket carried our astronauts to land on the moon, the command module computer had less computing power than today's pocket calculators. In 1981, my first home computer had only 64K of RAM, a 5 MB hard drive that weighed 66 lbs, used 8-inch floppy disks, ran on the CP/M operating system, and had a green monochrome CRT monitor. Its only function was as a very advanced machine-code translator and word processor for the court reporting profession. Its cost was $50,000. I had two: one for me, one for my wife, since we were both court reporters. The first computer I used was in high school. It was a Univac, and we fed our code into the machine via punch cards. It was the size of a minivan and weighed nearly as much. We wrote our code in FORTRAN and later COBOL. Yeah, I was one of the computer pioneers. Over the past 40 years I've owned over 45 computers, five of which I built myself. It has been incredibly fascinating to watch technology get faster, smaller and more powerful each and every year! I am by no means an expert in computers or software design; I'm just a guy who has used computer technology to keep employability on my side. What a ride it has been... and what an exciting future computing has: quantum computers...

  • @diablo.the.cheater
    @diablo.the.cheater 2 years ago +6

    Some tasks only benefit from parallelization up to a limit, some tasks simply aren't parallelizable at all, some would see very minor gains that wouldn't justify the added code complexity, and at some tasks you can always throw more cores to go faster.
    For general PC use, most tasks are sequential, so more cores only benefit you if you are doing a lot of multitasking.

    • @rtyzxc
      @rtyzxc 2 years ago

      This. Game logic, for example. First you tell a character to move x amount. Then you check for collisions and, on a hit, correct the position or execute some logic. Then you might check whether the character is shooting, which again depends on the character's position. Things have to happen in the correct order; you can't just have multiple cores do each thing simultaneously, or the results would get messed up depending on the order in which the tasks happen to complete.

    • @techpriest4787
      @techpriest4787 1 year ago

      @@rtyzxc that is why OOP is not a thing for games, and data-oriented programming makes more sense. All languages are OOP except for Rust. You can do DO in C++ and C# too, but that is abuse; they are not really made for that.

  • @rickpontificates3406
    @rickpontificates3406 1 year ago +1

    DMA comes into play also. Memory allocation is important, but having the CPU's MMU manage its own memory helps ease the bottleneck.

  • @endurofurry
    @endurofurry 2 years ago +8

    I had a 9980XE, which is an 18-core processor but only gets up to 4.5 GHz, so with the new 12th gens and DDR5 I decided to upgrade to the 12900KS, which is a 16-core (8 efficiency, 8 performance cores) at 5.5 GHz. Honestly, I think my system ran better with the older Extreme Edition than with the much faster newer processor, so it doesn't seem speed is everything either. I figured the much faster speeds would make up for the few cores lost, but they really didn't. I use this PC for gaming, and most games don't even use more than 4 cores, so my assumption was that faster single-core performance would be better than more cores, but that seemed to be false.

    • @Demon09-_-
      @Demon09-_- 2 years ago +1

      Eh, you should have seen better performance in games if you were CPU-bound. Games these days can and will easily use over 4 cores, and depending on your GPU, the settings, and the game, you could see quite high FPS improvements. But if you're running a lot of background or other applications, more total performance may benefit you more than the higher IPC. Not to mention DDR5 is quite meh at the moment and basically equal to fast DDR4 kits.

  • @AgentSmith911
    @AgentSmith911 2 years ago +19

    I just discovered a law that is a lot like Moore's law, but for cores. It says that eventually we will, in theory, have so many cores that it won't matter if we add more cores and threads. It's called Amdahl's law.

    • @matsv201
      @matsv201 2 years ago

      That law is often misunderstood; it's about compute latency, not performance.

    • @davidolsen1222
      @davidolsen1222 2 years ago

      Amdahl's law is about the relationship between the performance of different parts of a computer. If you take some section that takes 90% of the time and hyper-optimize the crap out of it so that it takes 10% of its previous time, you've managed the amazing feat of speeding up the system 5X, and now you need to optimize the other stuff that didn't take much time before. You end up speeding up one part, and that's good, but then the other parts become wildly more important, and you get diminishing returns on those types of optimizations.

    • @jessepollard7132
      @jessepollard7132 1 year ago

      It's already limited by the bottleneck between the CPU and RAM.

  • @johndoh5182
    @johndoh5182 2 years ago +5

    At 7:00, this issue is once again what I mentioned for 2:15: it's multi-threading. I blame Intel for programmers having to play catch-up with modern many-core processors, although there were many software engineers who knew Intel was wrong, and this goes back to Intel vs. AMD around 2008-2011, when the loads on desktop CPUs started to become large. Intel basically said you don't need to add more cores to desktop computing because they would be able to keep improving IPC and getting clock speeds faster and faster. And they seemed to be right, because their 2c/4t CPUs were better than AMD's 4c/4t CPUs. And when AMD came out with an 8c/8t CPU, it didn't fare a lot better. Well, those first-gen 8c/8t CPUs had core pairs that shared a single FPU while each having their own ALU. I know from going through classes that I was taught that FPU math was better. It really wasn't, as far as running on an x86-64 CPU goes. It is NOW, but it wasn't so much then, and it certainly wasn't for AMD's 8c/8t, 8-ALU/4-FPU CPUs. If only professors would remember WHY it is you learn discrete math.
    Regardless, the point is one of multi-threading and how well an application does it. This is something that takes a lot of work for programmers, and lazy developers in the last 2 decades didn't want to think about it and Intel gave them a reason not to. Writing and testing multi-threaded software is harder. I can write a multi-threaded algorithm that is the same speed as a single threaded algorithm or possibly even slower. If one thread is simply waiting for another thread to finish work, such as I have a main thread that spawns another thread to run some function, but my main thread is waiting, this is slower even though it's multi-threaded. So multi-threading requires experienced programmers or engineers to work with a project to evaluate the software development, and it isn't always so obvious if doing one thing vs. another is more beneficial.
    There was one solid point you brought up besides the failure of programmers over the last 15 years to develop their skills at writing multi-threaded applications, and that was memory bandwidth. Nothing else you brought up is a physical limitation until other conditions are thrown into the conversation, which means that part of the conversation needed to come from a person who can describe power efficiency, nodes, how clock frequencies affect power efficiency, etc.

    • @Dhaydon75
      @Dhaydon75 2 years ago

      Another problem is that you can have more cores, or higher IPC and frequency, and still be slower. But that is more of a time-critical-system problem.

    • @billyswong
      @billyswong 2 years ago

      The infrastructure and tools for efficient multithreaded software development are not yet polished enough. In theory, a programming language could handle a thread pool implicitly, in an OS-neutral way. Meanwhile, the OSes would provide part or all of the thread pool implementation, such that multiple programs using thread pools at the same time won't overcrowd the CPU and introduce unnecessary task switching.

    • @ABaumstumpf
      @ABaumstumpf 2 years ago

      "I blame Intel for programmers"
      And you'd be wrong. Or rather, you are just wrong.
      "So multi-threading requires experienced programmers or engineers to work with a project to evaluate the software development, and it isn't always so obvious if doing one thing vs. another is more beneficial."
      If only that were the only thing. Many problems simply cannot be processed in parallel; the Towers of Hanoi are an often-used example.
      And that's not to mention all the other problems, like coherency, scheduling, and especially the bugs that creep up.

    • @johndoh5182
      @johndoh5182 2 years ago

      @@ABaumstumpf I know not every problem can be solved by multi-threading. There has to be real parallel paths in processing for multi-threading to make any difference. But that parallel path can simply be a few microseconds and it's still beneficial. It can be two sets of calculations that can happen independently and you'll get benefit.
      However Intel said EXACTLY what I said they did when AMD was releasing CPUs with more core counts than Intel. So yes, they were part of the problem. And yes, programmers have been lazy in many companies, and yes many programs can be written much better.
      You're about to see this play out in game engines and what happens when you bring near realistic graphics to a game. Part of this of course is the ability of a GPU, but part of this is the game engine. Unity for instance has been notorious for saying that since a main game thread dictates how fast a CPU can process code, having a game engine be multi-thread only adds complexity with no benefit. On the other hand, Epic Games released UE5 and games are going to be coming out on it starting the end of this year. I watched demos of Matrix Awakens and it was pushing a 5950X to around 40% CPU utilization. Simple math says this game with this game engine overwhelms a 4c/8t CPU, it pushes a 6c/12t CPU to 100% so even that is going to be a bottleneck, and an 8c/16t CPU is going to be minimal to run the game without the CPU being a bottleneck due to being overloaded. There's other reasons the CPU can be a bottleneck, but this is going to be the first time as far as I know that for PC gaming, a 6c/12t CPU is going to be a bottleneck simply because it doesn't have enough cores.
      YES, INTEL SAID that gaming would never require more than 4 cores. Now, finding old information with a search engine isn't very easy, so I'm not going to bother digging. Of course by the time they put out 9th gen, AFTER AMD had released very effective 8c/16t CPUs, Intel did a 180 on THAT statement.
      I'd be a millionaire if I had a dime for every time I've heard a game will never need an 8c/16t CPU. Maybe a slight exaggeration, but I think you get the point. What I think is going to happen is if a game company wants to develop a game that looks realistic, they're going to use UE5 and Unity will be relegated to more simple graphics.
      Autodesk, same thing. Their software gets poor CPU utilization and often when people have a powerful system, EVEN WHEN the software is rendering an image on screen, it's painfully slow. You read comment threads on their site for different software packages and users complain about this, and point out other software that does the same type of rendering and it's much faster.
      Adobe, same thing. They've improved SOME of their software.
      At some point people will leave these companies behind when new hardware is still running like a turtle.
      So yes I know some software cannot be optimized more than it is. But I also know that thousands of students have gotten a BS in software engineering and their professors never emphasized multi-threading along with testing multi-threaded applications. And I also know that in many cases, I'm right and we're going to agree to disagree. I was a person BTW who went through most of a BS degree in software engineering (I had already retired from the military and time was catching up to me along with my back breaking down) and saw this first hand. I ended up having back surgery before my senior year, and after that point I only wanted to work part time and didn't feel like putting 100% of myself into another career.

    • @johndoh5182
      @johndoh5182 2 years ago

      @@billyswong I agree, and I'm sure there are still many universities that don't push software engineers to program this way, and testing is hard.
      Testing effectiveness for multi-threaded applications, when the intent is to speed up the time it takes to run means time testing along with testing that functions work the way they're supposed to. Multi-threading can slow down an application if done improperly. Simply spawning threads to complete a task, if some other thread is simply waiting for that data can slow down performance due to passing data back and forth.
      So yes it does require testing and the testing is going to be very complicated, but in the end it's the right thing to do for applications that require a bit of computation, and not simply a text editor or other simple computing.
      "Meanwhile the OSes would provide part or all of the thread pool implementation such that multiple programs using thread pool at the same time won't overcrowd the CPU and introduce unnecessary task switching."
      When you have something like a 6c/12t CPU even the Windows schedulers do a good enough job at minimizing context switching. That's not really the issue. Sure if you're doing a bit of multi-tasking it can become an issue but that's not really what I was talking about. And even with multi-threaded apps, I would think that between the application and the scheduler, the scheduler isn't randomly switching a core from one thread to another. I would think that since many threads are short lived, they run to completion so data can be passed, before another thread is loaded to that core (where even with a 6c/12t CPU, it's viewed as 12 cores). When you move up to 8c/16t CPUs and even more cores, this should get easier for a scheduler to handle.

  • @Ferrari255GTO
    @Ferrari255GTO 2 years ago +1

    The sweet spot for most consumers is 8 cores IMO; most games don't need more, and assuming your CPU is fairly modern it will be perfectly capable of doing whatever you require of it without issues. It won't be an oven, but it will still need some decent cooling, and since it's an 8-core it won't be top of the line, making it cheaper than other CPUs while delivering a really good experience. What I mean is, don't just get the biggest thing you can; it might not be as convenient as you think.

  • @AlessioSangalli
    @AlessioSangalli 2 years ago +9

    "Symettric" (5:05) well typos happen 🤣 seriously however the quality of the production is awesome, I wish I were this good with video editing. What program do you use, out of curiosity?

    • @LowLevel-TV
      @LowLevel-TV  2 years ago +7

      Hahaha crap, there's always one. I use DaVinci Resolve, largely because it's free XD. Thank you!

    • @vikassm
      @vikassm 2 years ago

      @@LowLevel-TV Free, yes, also the small matter of it being the most powerful, fully featured A/V production suite in the world 🤣
      If it works for MARVEL, I'm sure us 'lowly' UA-camrs can make do with DaVinci Resolve 😂😂

  • @FalcoGer
    @FalcoGer 2 years ago +2

    Moore's law was never a law to begin with; it was more of a design goal for the engineers. And now things are getting so small that individual transistors are only a few tens of atoms across. Given that silicon works by having a very specific amount of impurities, such as phosphorus, in it, and that quantum effects start to mess with the whole process, it's physically impossible to keep up transistor doubling, let alone in a 2-year timeframe.

  • @mryodak
    @mryodak 2 years ago +28

    LLL: "Computers Can't Have Thousands of Cores"
    GPUs: Am I a joke to you?

    • @hjups
      @hjups 2 years ago +1

      GPUs technically don't have thousands of cores either. The Titan V only has 80 (the SM is the equivalent of a CPU core, not a "CUDA Core").

    • @mryodak
      @mryodak 2 years ago +5

      @@hjups SMs (Streaming Multiprocessors) are just collections of CUDA cores as far as I know. And Radeon calls theirs Stream Processors, and they also have thousands of them.

    • @hjups
      @hjups 2 years ago +7

      ​@@mryodak That's correct. But they are not "cores", they are ALUs. Put it this way.... you can either claim that the Titan V has 5120 cores and the 5900x has 816, or you can claim that the Titan V has 80 cores and the 5900x has 12.

    • @Conenion
      @Conenion 2 years ago +1

      GPUs don't have cores; that claim is simply wrong. They have very small computing units, but many of them. The entire GPU architecture is targeted towards making a single thing fast, i.e. the graphics pipeline. It can be used for some special number-crunching stuff (GPGPU), but that is not what the people who designed GPUs had in mind. When programming a GPGPU you use a very special style of programming, and you have to do a lot of things "by hand".

    • @mryodak
      @mryodak 2 years ago

      @@Conenion CUDA is C++, OpenGL is C++, Vulkan is C. Other than being parallel and having its own instruction set, what's the difference?

  • @therosses5
    @therosses5 1 year ago +1

    The first computer I touched was the Tandy Radio Shack TRS-80 Model I. I was surprised you were able to explain cores in a way an old guy can understand. Very well done. I'm astounded that after decades the speed of our apps is still held hostage by sucky HD read/write nonsense.

    • @jessepollard7132
      @jessepollard7132 1 year ago +1

      Basically, a core is just a CPU. A multicore processor is just a collection of CPUs wired together to access RAM. That is why each core has an L1/L2 cache, and sometimes a dedicated L3, plus a shared cache for all cores to use for access to RAM. The shared cache is called either L3 or L4 (usually L4 if the cores have dedicated L3).

  • @occapella8643
    @occapella8643 2 years ago +3

    At its most basic level, a CPU is just a rock that we trapped lightning inside of and tricked into thinking.

    • @xCwieCHRISx
      @xCwieCHRISx 2 years ago

      If the apocalypse comes, those magical stones will be very valuable.

  • @RealCadde
    @RealCadde 1 year ago

    It would be worth mentioning the difference between parallel and linear programs as well.
    A linear program is one that, in a simple example, takes the output of the previous operation as an input for the next operation.
    a
    a + b
    ab + c
    abc + d
    ...
    That's a linear operation.
    A parallel operation on the other hand does NOT rely on the result of the previous operation in the program as a whole.
    Using the previous example again, but making it parallel...
    Core 0:
    a
    a + b
    ab + c
    abc + d
    Core 1:
    e
    e + f
    ef + g
    efg + h
    Core 2:
    i
    i + j
    ij + k
    ijk + l
    Core 3:
    m
    m + n
    mn + o
    mno + p
    Then once all four cores have run their code on their slice of the data, they can synchronize and this happens:
    Core 0:
    abcd + core1 + core2 + core3
    or...
    abcd + efgh + ijkl + mnop
    But before that can happen, ALL cores must have completed their slices. In this simple example it's no biggie. Each core runs its slice linearly and in linear time too, so they should all finish at the same time.
    But in reality, not every program is that simple. Some slices take more time than the others to complete as they do more complex operations. In the meantime, all other cores are just sitting around waiting for the most complex operation to finish. Well, they are free to do other things but not for that one program as the program is waiting for the biggest slice to finish.
    Being able to evenly slice up threads of a program such that they all finish at roughly the same time is almost impossible in more complex programs. Especially when you aren't the only program using the cores available as the scheduler might not agree with the program using all cores at that moment in time.
    A somewhat perfect example of parallel tasks that actually do take the same amount of time every time (almost) is what the GPU is doing.
    The GPUs of today have some ten thousand cores. They all work on their own slice of a rendered image.
    Say you have an image that is 1000 x 1000 pixels large, or a megapixel image if you will. Those 10,000 cores will each be working on a region that is 100 pixels large.
    If the task is to fill a gradient horizontally across the screen, then each core simply takes the starting and ending colors and interpolates those going from start to end in their block.
    This operation takes exactly the same amount of time on each core, so it just works on GPUs... because graphics are less complex than programs in that sense. Graphics don't tend to sit around waiting for user input, network communications and access to memory.
    Each batch on a GPU has exclusive access to memory and all cores. The more data and operations you can cram into a batch the better, otherwise you have to keep telling the GPU what to do.
    In other words, it's better to tell the GPU to draw ten million polygons in one batch than it is to tell the GPU to draw a million here, a million there and another million there...
    When the GPU has ALL the data in one batch, it splits the tasks amongst all cores equally and just barfs pixels back at you in no time.
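
    A minimal sketch of that slice-then-synchronize pattern in C with POSIX threads; the 16-element array and the 4-thread split are arbitrary illustrative choices:

    #include <pthread.h>
    #include <stdio.h>

    #define N        16
    #define NTHREADS 4
    #define SLICE    (N / NTHREADS)

    static int data[N] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    static long partial[NTHREADS];

    /* The parallel phase: each thread reduces its own slice independently. */
    static void *reduce_slice(void *arg) {
        long id = (long)arg;
        long sum = 0;
        for (int i = id * SLICE; i < (id + 1) * SLICE; i++)
            sum += data[i];
        partial[id] = sum;
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, reduce_slice, (void *)id);

        /* The synchronization point: nothing below runs until every slice
         * is done, i.e. until the slowest thread finishes. */
        for (int id = 0; id < NTHREADS; id++)
            pthread_join(t[id], NULL);

        long total = 0;               /* the short linear phase at the end */
        for (int id = 0; id < NTHREADS; id++)
            total += partial[id];
        printf("total = %ld\n", total);
        return 0;
    }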

  • @leftlovers9137
    @leftlovers9137 2 years ago +4

    I searched this and voilà, I found your video 11 hours after upload lol

  • @philipmcdonagh1094
    @philipmcdonagh1094 2 years ago +1

    You answered everything when you said there was a Boss core. Take the real world: what do bosses do? Slow overall work performance down. Thank you.

  • @Kevin-jb2pv
    @Kevin-jb2pv 1 year ago +6

    "Can Intel make a processor with 1,000 or more cores?"
    Yeah. They're called GPUs.
    I know a GPU is different as far as what it's designed for, but fundamentally it's the same concept just optimized for different tasks. I'm pretty sure that if you had the time, skills, and desire, you could take a GPU (the chip, not necessarily the whole card) and design a Turing complete computer around it functioning as the CPU. It would suck and be super limited and totally not worth the effort, but it would technically still be a computer.

  • @1over137
    @1over137 2 years ago +1

    I know you are simplifying, but multiple parallel executions have been possible in single cores for a lot longer than we have had multi-cores. There are many CPU tasks which take many clock cycles, and some of those tasks can be executed in parallel with other instructions. Instruction pipelining, speculative execution etc. all work in single cores, resulting in an IPC (instructions per clock) greater than 1. As to whether a hardware context switch could occur within the pipelining... my understanding is that "hyper-threading" is a relatively recent thing, but it exists.
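
    One way to see that single-core parallelism: the two functions below compute the same sum, but the second keeps four independent accumulators, so a pipelined, superscalar core can keep several additions in flight at once. This is only a sketch -- actual speedups depend on the microarchitecture (and a vectorizing compiler may transform the first loop anyway):

    #include <stdio.h>

    /* One accumulator: every add depends on the previous one, so the adds
     * are forced to execute serially even on a wide superscalar core. */
    double sum1(const double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent accumulators: no add depends on the one before it,
     * so the core can overlap them and push IPC well above 1. */
    double sum4(const double *a, int n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)          /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

    int main(void) {
        double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("%.0f %.0f\n", sum1(a, 8), sum4(a, 8));   /* both print 36 */
        return 0;
    }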

  • @johndoh5182
    @johndoh5182 2 years ago +8

    6:00, thermal efficiency. This is hard to throw into a conversation about core counts, because a CPU can be low speed or high speed, and then you have the constraints of the node being used. Together these mean that thermal efficiency has little to do with how many cores can go into a CPU, or if we want to be more technical, a die or chiplet. If one says, for instance, that due to a thermal limit of X this die can only have 8 cores, that's not really a true statement. It's more on the line of: due to the thermal limit of X, and running a processor at a speed of Y, on THIS node a core chiplet using AMD's Zen 4 X86-64 core should have no more than Z cores. Every node has different thermal limits, and different characteristics which cause ever faster speeds to heat the die up to the point where thermal limits are the main constraint. You can clock Intel's Intel 7 obviously up to 5.3 - 5.5GHz, which consumes a large amount of power, but clearly it's not affecting the efficiency of the core to do its work. What is happening there is more about POWER efficiency than thermal efficiency. On the other hand, TSMC N7 isn't efficient over 5GHz in any way. Maybe this will change over time.
    So thermal efficiency is really an edge issue, not a main issue. I could have a die with 30 cores if I run them at one speed, and only 8 cores if I run them at another speed when loading all cores to 100%. So that's not a BIG constraint, and not one I would have led off with.
    This is a situation where, just because someone has put out some data, you have to be careful how you use that data. It's a neat chart that was shown, but it's only useful for some use cases. There had to have been a lot more data talked about before that chart was shown, or David Henderson from GA Tech is not very sharp. Without talking about all that other data, this point, like my other comment, paints a wrong picture.

    • @AnarexicSumo
      @AnarexicSumo 2 years ago

      How pedantic. Firstly, it's an issue; whether you think it's a fringe or main issue is neither here nor there. Secondly, your comparison to a slower processor with more cores being cooler is intentionally arguing in bad faith. All else equal, a processor with twice the cores will run hotter and require more cooling to run at its best. In fact, due to inefficiencies they will run *disproportionately hotter*. As a rule, consumer CPUs with more cores require more cooling.

    • @johndoh5182
      @johndoh5182 2 years ago

      @@AnarexicSumo So what you're saying then is every time you use a new node, the argument changes.
      "In fact due to inefficiencies they will run *disproportionately hotter*. As a rule, consumer CPUs with more cores require more cooling."
      So far these inefficiencies ARE related to clock speed. Every node that every fab makes has a point where pushing beyond it requires more energy than it's worth for the return amount of work being done by the CPU. AND, this is INDEPENDENT of core count.
      As a rule, more cores require more cooling when everything else is equal. But that's the point. Everything else is always CHANGING! So there are no HARD rules for core count with regards to THERMAL EFFICIENCY. It depends on everything else. It's a secondary point, NOT primary. THAT is the point. And yes, that is arguing in good faith. The points made in the video are arguing in bad faith.
      To quote "In full transparency some processors these days"........................... and then proceeds to talk as if it was magical that there exists 64 core CPU, which he simply called "double digit", which I find laughable.
      So yes, thermal efficiency is ONE point, but I could probably put 50 cores of compute power in an Apple iPhone using TSMC N3. I don't NEED to, but because that die is clocked slowly, those tiny cores would be NOTHING at the speed at which they operate. So in that case, thermal efficiency ISN'T a limiting factor for the number of cores that are in the device. And that's why I made the point I did. There's no such thing as a certain number of cores that creates a thermal inefficiency. It depends on too many other factors.
      Here, points made in good faith for the limit of core count:
      Memory capacity. Each core needs to have a certain amount of memory space. What that amount of space is, is widely variable because it depends on applications being run.
      Bandwidth into and out of the CPU. The bandwidth needs to be capable of handling the input or output of data that each core could require. What this amount of bandwidth is, is widely variable because it depends on the applications being run.
      Capability of the operating system. The OS has to be able to schedule processes (threads) for each core. If there are so many cores that a scheduler cannot direct threads to each core because the scheduler is not fast enough to rotate through all the cores, then this is too many cores for that operating system. But this is widely variable and depends on the applications being run because a thread can be short lived or long lived.
      I'm trying to think of limitations and the MAIN one that comes to mind is space constraints. This is a REAL constraint, because it doesn't depend on other factors. So, space. AMD is going to be able to release server and WS CPU with Zen 5 that can have 192 cores, or even more. Based on current space, that's what AMD will be able to do with TSMC N3 with either a server MB or a WS MB. And if you're wondering how I get that figure, N3 triples the transistor density over N7. But AMD could be moving to big-little for Zen 5, and AMD might be moving to L3 cache being off-die and being stacked, in which case based on current space constraints, they could probably get up to 256 cores on a SINGLE Zen 5 EPYC CPU. But they'd have to make other changes to the CPU architecture and other architecture to pull that off. PCIe gen5 even with all the lanes that EPYC has probably won't move data fast enough so it would probably need to be using PCIe gen6, which means the rest of the hardware will need to be PCIe gen6. And then DDR5 with 8 memory channels wouldn't be good enough even at the fastest rated speeds. And, with DDR6 supposedly using the same data word length as DDR5, I highly doubt memory bandwidth would allow for that many cores, for many SERVER applications. You'd have to rely on many of the cores already having cached the instructions they need to run so you don't have a couple hundred cores trying to hit memory at the same time.
      But would "thermal efficiency" be an issue for a 256 core CPU? For a server application using TSMC N3 which uses about 40% less power than N7, where boost clocks are usually in the low 3GHz? No, each core could run very efficiently. Total package power could be exceeded though, and that's not an issue of "efficiency" There isn't a limit because it's not "EFFICIENT" It's a limit because it's too much for that package. I THINK AMD could release a 192 core EPYC CPU, so WAY more than just triple digit, which makes this guy's "double digit" comment a complete JOKE. I THINK that with TSMC N3 and the lower clock speeds of EPYC, AMD can get up to 192 cores with Zen 5 as long as DDR5 has hit much faster speeds (they're at 6400 now) AND you increase memory channels to 12, AND AMD has move to stacking L3 cache and it uses something on the lines of 192MB - 256MB AND the hardware platform is using PCIe gen 6 AND AMD adds 25% more PCIe lanes to the CPU, although maybe the move to PCIe gen6 is good enough to handle the bandwidth needs of that many cores with the existing lanes they have now for EPYC.
      And I hope that helps to clear up your lack of understanding on this topic. If not we'll agree to disagree.

  • @marcinmorun
    @marcinmorun 2 years ago +2

    What is the point of having an increasing number of cores if only 1 or 2 programs exploit those cores, and only a few people actually use those programs?

    • @electroflame6188
      @electroflame6188 2 years ago +1

      Well, for one, you'll have more people programming things that can take advantage of (or that are only feasible with) a large number of threads.

  • @Sourcer3r
    @Sourcer3r 2 years ago +4

    Multi-hundred-core processors are already running well,
    just in another way than you might think at first: GPU, or more specifically GPGPU (general purpose GPU) applications.
    Just think for a moment about Ethereum, AI (self-driving cars), rendering or scientific research (protein folding, space analysis).
    Of course, your standard operating system will not boot with just a GPU, because the instruction set on a GPU compute unit is very limited.
    This might change in the future: take a look at the Apple M1 or any ARM (mobile) chip... They can run more efficiently in consumer applications because they carry fewer instructions (therefore fewer transistors and shorter paths (wiring) that generate heat).

    • @youtubeshadowbannedme
      @youtubeshadowbannedme 2 years ago +3

      Just because they run more efficiently doesn't mean they'll give good raw performance. The M1 chips excel in both performance and efficiency because of the way Apple designed them to compete with Intel and AMD in the computer market. It's like how Intel was able to make x86 chips that were practically a knockoff of ARM back then, under the Atom brand. Only when Intel specifically went out of their way to make an extremely efficient x86 CPU could it happen.

  • @OverDriveOnline7921
    @OverDriveOnline7921 1 year ago +1

    In the world of x86, there have been multiprocessor systems for many years; I used to fix them frequently in the mid to late 90's. However, back then the physical limit was 4 processors before system performance was hit; anything more than 4 was divided into subgroups of up to 4 processors interlinked together with a separate scheduled data transfer architecture (until transputers came along, but that's another story).
    This limit was overcome in part by adding complex cache systems, and while 8-processor systems were now possible cheaply, there were two issues looming on the horizon: Moore's law and physical space. The answer to keeping up with the speed bumps predicted by Moore's law? Bung more than one processor on a chip; this helps with space and, oddly enough, power consumption too.
    Further advancements have helped shove more cores, essentially what we used to call our CPU, onto a single chip, boosting performance as we go.
    However, doubling the cores does not double the performance; there are and always will be bottlenecks, which become greater with the more cores added, plus the thermal envelopes that our systems need to run under. In many systems now we get past this by breaking the chips down into multiple chiplets, essentially smaller chips on a single chip chassis, or by adding multiple chips, meaning we've gone full circle.
    Still, it's been interesting from my view, watching computing develop over my (nearly) 51 years at the time of posting this; with 3nm chips due to become mainstream, whole RAM modules fit into the space of an entire CPU from 4 decades ago.

  • @matsv201
    @matsv201 2 years ago +5

    Intel made a 1000-core processor... back in 2010... it really wasn't that large; it was a fork of the 386 meant to run graphics code... so an x86 GPU.... It turned out to not really work well... but the processor worked

    • @zredplayer
      @zredplayer 2 years ago

      A real 1000-core CPU? Do you have proof that exists?

    • @ultrapetey
      @ultrapetey 2 years ago

      @@zredplayer en.wikipedia.org/wiki/Larrabee_(microarchitecture)

  • @adrianalanbennett
    @adrianalanbennett 2 years ago +2

    One can never have too many cores, too much memory, or too much computing power.

    • @daedliy963
      @daedliy963 1 year ago

      That's where you're wrong, though; the limit on just how much raw processing power actually gets used is extremely fickle.
      Bottlenecks can come from the rest of the hardware not being able to keep up (like the mobo), or from software too simple to really fully utilize that extra firepower.
      You'd only be able to use 100% of it to show off.

  • @christopherleadholm6677
    @christopherleadholm6677 2 years ago +3

    "My mom- my momma says bad code is for the devil!"
    - Adam Sandler as Water Boy

  • @mwbgaming28
    @mwbgaming28 2 years ago +2

    4-8 cores with a stupidly high clock speed would probably be the best setup for the time being

    • @Demon09-_-
      @Demon09-_- 2 years ago +1

      Probably about right. 4 with hyperthreading is probably fine for the everyday user, and 8 with hyperthreading is what most who plan to play games should shoot for at this point, as games have already started to move fast enough that 4 cores will leave you pretty hard CPU-limited, depending on the rest of your setup.

  • @AlejandroRodolfoMendez
    @AlejandroRodolfoMendez 2 years ago +5

    So far desktop Windows has a limit on the number of cores that can be used; Linux does not. But it's a thing to consider for the future.
    Maybe when the core limit is reached they will put the emphasis on instructions per cycle.

    • @clovernacknime6984
      @clovernacknime6984 2 years ago +3

      They did, long ago. That's what pipelining, superscalar, out-of-order-executing processors are all about. However, there are limits to how much you can auto-parallelize a single thread, so they turned to multi-core - which makes the programmer parallelize explicitly - out of desperation, since all other avenues for improvement were exhausted.
      The future is more cores, because we hit the point of diminishing returns for adding more transistors to a single core long ago.

    • @AlejandroRodolfoMendez
      @AlejandroRodolfoMendez 2 years ago +1

      @@clovernacknime6984 It was attempted seriously since the Pentium 4 on regular CPUs; before that it was more on servers and specialized CPUs. RISC did more, but at the expense of the operations. Maybe a return of CISC could work too.

    • @Conenion
      @Conenion 2 years ago +1

      @@AlejandroRodolfoMendez
      Since Pentium Pro around 1995 all Intel CPUs are RISC-like internally. AMD followed. x86 CPUs are CISC from the outside, but internally they use all of the "tricks" that make RISC CPUs so fast.

    • @Conenion
      @Conenion 2 years ago

      @@clovernacknime6984
      > out of desperation, since all other avenues for improvement were exhausted.
      Exactly. Well said.

    • @AlejandroRodolfoMendez
      @AlejandroRodolfoMendez 2 years ago

      @@Conenion They weren't fully RISC, though. But yes, they were doing stuff like that before.

  • @grndragon2443
    @grndragon2443 2 years ago +1

    Then there are multiprocessor computers on top of that. Imagine if home computers were built more like a server: 128+ processors with 10+ cores each.

  • @triularity
    @triularity 2 years ago +5

    It's more likely the number of "cores" will keep increasing, but most of them will be specialized (i.e. not full CPU cores with full system access). Instead, there could be a bunch of cores doing something dedicated (but still programmable), such as encryption or compression, in a way where they mostly keep to themselves except when being sent input or outputting results.

    • @mornnb
      @mornnb 1 year ago +1

      That has trade-offs - you have a large number of cores that can only be used for specific tasks and will spend a lot of time idle, where you could be using the transistors for general-purpose tasks that can always be used.

    • @CocoaEm
      @CocoaEm 1 year ago +1

      This already is a thing; there's a dedicated encryption engine on every modern CPU. Some tasks really do need that extra space on the die to be faster.

    • @DDRWakaLaka
      @DDRWakaLaka 1 year ago +1

      Like Cell? Which was trash?

    • @triularity
      @triularity 1 year ago

      CPUs already having encryption engines is a start. And some CPUs do include embedded GPUs for video - but better having it by default, even if there is no display support. Nowadays, going a step more optimized and including a few tensor cores would be useful with ML being more common.
      Maybe even having multi-precision integer math with common functions used in modular math (not just the basic add/multiply operations of SIMD), so newer (or less mainstream) encryption could still benefit and not just be limited to whatever happens to be in the bundled crypto engine. I personally hate it when crypto libraries don't include low-level APIs for some standard algorithm... so when a variant algorithm is needed to support some protocol, it forces developers to practically reinvent the wheel and roll their own from scratch, rather than re-using the existing implementation for most of it - which is just asking for a broken/insecure implementation. So why should it be all-or-nothing for hardware crypto either?

  • @romanpul
    @romanpul 1 year ago +1

    To be fair, Intel actually tried to design a processor with thousands of cores a while ago, the Xeon Phi coprocessor. Though in all fairness it was pretty much just a fancy GPGPU which was able to execute x86.

  • @kimobrien.
    @kimobrien. 2 years ago +3

    You can't have unlimited numbers of transistors, because eventually you get down to the atomic level. The same goes for clock speed: eventually the distance traveled across a processor from one side to the other is a quarter wavelength of the clock signal, and then the distance the signal travels becomes important. The size of a chip is also limited to about that of a fingernail.
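
    Rough back-of-the-envelope numbers behind that quarter-wavelength point, assuming an ideal signal at the speed of light (real on-chip propagation is noticeably slower):

    #include <stdio.h>

    int main(void) {
        const double c = 3.0e8;       /* speed of light, m/s */
        const double f = 5.0e9;       /* a 5GHz clock */
        double per_cycle = c / f;     /* distance covered in one clock tick */
        printf("light per cycle: %.0f mm\n", per_cycle * 1000.0);  /* ~60 mm */
        printf("quarter wave:    %.0f mm\n", per_cycle * 250.0);   /* ~15 mm */
        return 0;
    }

    So even in the ideal case, a quarter wavelength at 5GHz is on the order of a fingernail-sized die.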

    • @vadimuha
      @vadimuha 1 year ago

      There's the subatomic level. It's great at parallel computation.

  • @SelfMadeSystem
    @SelfMadeSystem 1 year ago +2

    GPUs have hundreds of cores, but the code made for GPUs is made specifically for parallelisation.

  • @HuntingKingYT
    @HuntingKingYT 2 years ago +3

    "Any computer in the last 10 years" - My pc, Dual-Core i3-2120, 10y/o

    • @saricubra2867
      @saricubra2867 2 years ago

      My 4 core 8 thread i7-4700MQ made in 2013 looks like a last gen Threadripper in comparison.

    • @youtubeshadowbannedme
      @youtubeshadowbannedme 2 years ago

      @@saricubra2867 The i7-4700MQ isn't as fast as you think it is, and it definitely cannot compare to the i7-4790K. Your i7 chip is around the level of the i7-2600K at best, but realistically it's probably closer to the i5-2500K. This is of course assuming you didn't win the silicon lottery by a big margin. You would need at least an i7-7700HQ to match the i7-4790K at the latter's performance at base speed.

    • @saricubra2867
      @saricubra2867 2 years ago

      @@youtubeshadowbannedme My i7 outperforms that i5. And yes, it's between the 2600K and the i7-3770K.
      I never said that it's equivalent to the 4790K.

    • @saricubra2867
      @saricubra2867 2 years ago

      @@youtubeshadowbannedme 2500K lacks hyperthreading lmao.

    • @saricubra2867
      @saricubra2867 2 years ago

      @@youtubeshadowbannedme I tested a family member's laptop with the i7-7700HQ and yes, it's kind of a 4790K at stock.
      On average, laptop CPUs are two years behind the equivalent high-end desktop i7; that changed with the 11th and 12th gen Core generations, where the gap is smaller. For example, the i7-11800H without throttling outperforms the 10700K that launched a year earlier.

  • @jeffreymelton2200
    @jeffreymelton2200 1 year ago

    I selected the video based on the name of the channel alone! Brilliant naming of the channel. Anyway, the video was very informative; I actually learned quite a bit from it. I appreciate the style in which you narrate your videos, making the subject matter incredibly comprehensible and digestible. Thank you for the content!

  • @lawrencedoliveiro9104
    @lawrencedoliveiro9104 2 years ago +6

    According to the top500 list, the current fastest supercomputer in the world, RIKEN’s Fugaku, has 7,630,848 cores.
    Of course, they’re not x86 cores, they’re ARM. And it’s not running Windows, it’s Linux. That might help.

    • @mikapeltokorpi7671
      @mikapeltokorpi7671 2 years ago +2

      Not in single silicon, though.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 2 years ago +2

      @@mikapeltokorpi7671 Not sure why that’s relevant.

    • @Conenion
      @Conenion 2 years ago

      > That might help.
      Minor. What /really/ helps is that these HPC machines were built with special purposes in mind. These machines typically run algorithms that scale very well. Like for example solving systems of linear equations. Number crunching stuff.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 2 years ago

      @@Conenion The problems scale, up to a point. That’s why a supercomputer needs a high-performance interconnect which makes up such a big part of its cost.
      If it wasn't for that, a supercomputer would not be much different from, say, a server farm.

    • @Conenion
      @Conenion 2 years ago

      @@lawrencedoliveiro9104
      True. They need a high-performance interconnect because Amdahl's law would kick in much earlier without one.

  • @iancamarillo
    @iancamarillo 1 year ago

    I have this feeling that we’re gonna go back to a single core in a revolutionary design that handles these executions in a different way

  • @kyleeames8229
    @kyleeames8229 2 years ago +6

    I'm just gonna guess before I see your explanation. Firstly, there are actually relatively few computational problems that can be more efficiently solved with lots of parallelization. Secondly, once core counts go above a certain limit, your chip either has to be really big, or you need an unreasonably large cooling system to keep it from melting a hole in your floor. Ok, I'll see if I'm right!

    • @paklekj4429
      @paklekj4429 2 years ago +1

      Had to refill the liquid nitrogen every 30min lol

    • @thelazarous
      @thelazarous 2 years ago

      Well, the temperature thing has already been kinda debunked. The original Pentium D is a perfect example: 2 cores, 2x the thermal load. But that's not really a problem with modern dual, quad, or even octuple cores. Today 32 cores require 250W; in 20 years they'll take 25-50W. 20 years ago, 8 full cores on a single package was considered stupid, as nothing would ever even use them, and if anything did they'd melt; now I have 8 full cores in my laptop and they spend plenty of time at 100% usage.

    • @harvey66616
      @harvey66616 2 years ago

      _"there are actually relatively few computational problems that can be more efficiently solved with lots of parallelization"_ -- uh, what? The class of problems suitable to SIMD architecture is quite large. It's been a significant chunk of research for decades. Modern graphics cards exist, and are in short supply, _because_ there are so many useful applications for that architecture, not just gaming.
      Indeed, the neural network machine learning space alone has myriad applications. And that's just one sub-genre of the larger picture.

  • @prashanthb6521
    @prashanthb6521 2 years ago +2

    MEMORY BANDWIDTH is the biggest issue. A sysbench memory test on a 1st gen Ryzen 7 (8 cores) brings the throughput down by 80%!!!
    And the same test on a 3rd gen i5 bogs it down by 50%!!!

  • @MindCaged
    @MindCaged 2 years ago +4

    I still remember having those single-core processors for years, and the really annoying problem where the computer would freeze because whatever program was running got stuck in an intensive processing loop, or even just an infinite loop, and was basically hogging the single core to itself, not letting anything else run. It was such a relief when I got my first dual core, and I was wondering where this had been for so many years. Now I have a quad-core, and to be honest I have to have a lot of programs running at once to fully utilize it, or I have to find some program that can actually utilize all the cores at once, and there aren't that many. Also, even if I could find one, it'd probably hit a different bottleneck in either memory access or file access speed.

  • @syarifairlangga4608
    @syarifairlangga4608 1 year ago +1

    As ordinary consumers, we want 4 high-performance cores at 5GHz on all cores rather than hundreds of cores.

  • @coalhater392
    @coalhater392 2 years ago +8

    We do have thousands of cores; it's called a GPU.

  • @ainzooalgown7589
    @ainzooalgown7589 2 years ago +2

    There are thousand-core computers; all of them are supercomputers. Some use the entire wafer to make a CPU; an example is the Cerebras CS-2, which has 850,000 cores.

  • @singular9
    @singular9 2 years ago +7

    You could say that we already have thousand-plus-core CPUs; they're called GPUs 😎

    • @saricubra2867
      @saricubra2867 2 years ago +2

      Thousands of dumb cores that can't handle everything; meanwhile CPUs have a very small number of smart cores that are like a Swiss Army knife and can handle everything.

    • @singular9
      @singular9 2 years ago

      @@saricubra2867 go be boring somewhere else nerd

  • @caribougoo349
    @caribougoo349 2 years ago +2

    I'd think that GPUs existing and being powerhouses for parallel compute would also be a big factor making high-core-count CPUs more of a niche use case. It's probably important to acknowledge when a CPU is actually a good option for parallel tasks.

  • @that.schamp
    @that.schamp 1 year ago +1

    Some of the information in this presentation is valid and relevant, but the premise is bunk. Part of the problem is: are we talking cluster, computer, socket, or die?
    Clusters - able to apply large numbers of individual computers to a single task - broke the 1000 core mark in the early to mid 90's.
    For single computers, SGI broke 1000 cores with the SN/MIPS arch in the 90's, and their UV2 used Xeons in 4096 core single system images inside of 16k core shared memory systems.
    For both single die and single socket, UC Davis developed a Kilocore processor in 2016.
    It's not that we can't develop these systems - we've done it. They have limited utility, but there is still room for limited commercial success. You're just not going to find 1000 cores in a general purpose desktop computer anytime soon, except in its graphics card...

  • @jessepollard7132
    @jessepollard7132 1 year ago

    50 years ago there were multiprocessors, which did exactly the same thing as a multi-core unit does. The limit then was about 5 processors max (mostly due to the memory contention limits you indicated). Some systems got around the contention by using multiple memory buses, and it was up to the programmer (or the schedulers, or both) to avoid the contention by assigning each processor a different memory map (usually the map was in 64KB units but could be larger), thus allowing each memory bus to operate independently without contention with the other memory buses; a much higher throughput could be achieved as a result. Some motherboards do have parallel memory buses (which tends to require memory chips to be installed in pairs).
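
    A sketch of one simple scheme along those lines, assuming 64KB mapping units spread across 4 independent buses (both figures are illustrative, not tied to any particular historical machine):

    #include <stdint.h>
    #include <stdio.h>

    #define UNIT_BYTES (64 * 1024)  /* 64KB mapping unit */
    #define NUM_BUSES  4            /* assumed bus count */

    /* Consecutive 64KB blocks land on different buses, so processors
     * working in different regions of memory don't contend. */
    static unsigned bus_for(uint64_t addr) {
        return (unsigned)((addr / UNIT_BYTES) % NUM_BUSES);
    }

    int main(void) {
        for (uint64_t a = 0; a < 4ULL * UNIT_BYTES; a += UNIT_BYTES)
            printf("addr 0x%06llx -> bus %u\n",
                   (unsigned long long)a, bus_for(a));
        return 0;
    }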

    • @jessepollard7132
      @jessepollard7132 1 year ago

      YUP. It was Seymour Cray who figured out how to handle multiple processors optimally by using a crossbar switch in the Cray systems produced by Cray Research.

  • @WarningStrangerDanger
    @WarningStrangerDanger 2 years ago +1

    More cores do mean more capable computers, as we all start multitasking harder and many continue to work from home. You need to stream video with a green screen, run a browser with tons of tabs, data-crunching software, and video editing software, and still have the agility to move large amounts of files around.

  • @Ryanisthere
    @Ryanisthere 17 days ago

    As with all things in engineering, there are trade-offs to just building more, such as cost, space, thermals, or other components not being able to keep up.

  • @untermench3502
    @untermench3502 2 years ago +1

    I started designing micro-controllers in the 80's, when getting anything done in a reasonable amount of time required writing code at the machine level, where every clock cycle and subroutine had to be optimized. There was only one core and a tiny amount of memory, so everything had to be written as efficiently as possible; now, it's mostly high-level routines that offset inefficiency with speed of execution. Today, memory is cheap, processors are fast and cores are aplenty. There has to be a limit, though. It was fun back then.

    • @Dowlphin
      @Dowlphin 2 years ago

      We have to stop the madness of perpetual growth, the capitalist-technocratic urge to push onwards with maximum force. It is an escapism to keep externalizing our power into crutches, a distraction from developing our human potential.
      We already have many explorations of the usefulness of old, timid, simple systems and technologies. Just making everything ever-more complex is an unhealthy obsession. It does not make us happier, but unhappier. And we see particularly alarming escalations in recent times. We should really learn to read the signs. Then the domain of the heart can flourish again.

  • @godnyx117
    @godnyx117 1 year ago

    I don't know why, but it's MIND-BLOWING to me that the first dual-core CPU was released only in 2005. I would have expected it to be released in the early-to-mid 90s....

  • @drakealex
    @drakealex 1 year ago

    Hey, thanks for putting this into layman's terms. I recently started on my CPU journey. Good content.

  • @mastershooter64
    @mastershooter64 1 year ago +1

    >This Is Why Computers Can't Have Thousands of Cores
    My GPU: _Pathetic_

  • @frenchmarty7446
    @frenchmarty7446 1 year ago

    For a given die size and transistor count, you have to balance:
    1.) More branch prediction and larger cache, things that every program takes advantage of by default.
    2.) More/faster I/O and memory bandwidth, which also consumes die space.
    3.) More pipelining/superscalar operations. Basically parallelism on a single core that programmers get for free.
    4.) More cores/threads, something that programmers have to intentionally design around, has memory overhead (locks), and has diminishing returns for most programs (Amdahl's law; see the sketch below).
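
    To put rough numbers on the Amdahl's law point in 4.), a small sketch; the 90% parallel fraction is an assumed figure for illustration:

    #include <stdio.h>

    /* Amdahl's law: with parallel fraction p and n cores,
     * speedup = 1 / ((1 - p) + p / n). */
    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        const double p = 0.90;             /* assume 90% parallelizes */
        const int cores[] = {2, 4, 8, 64, 1000};
        for (int i = 0; i < 5; i++)
            printf("%4d cores -> %5.2fx\n", cores[i], amdahl(p, cores[i]));
        /* Even with 1000 cores the speedup caps near 1/(1-p) = 10x. */
        return 0;
    }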

  • @jimwhelan9152
    @jimwhelan9152 1 year ago

    As a kernel developer and designer I claim that symmetric multiprocessing is just as easy and probably easier to do than asymmetric. I always found the communication required for one processor to "control" the others was much more complex than that required to keep the multiple kernel threads from interfering with each other.

  • @slurpieking4337
    @slurpieking4337 2 years ago +1

    Honestly great video, very entertaining and informative. Good job!

  • @GIJew
    @GIJew 1 year ago +1

    Wow all this time I thought my Intel i69420XXGY processor was the issue. Thanks!