I'm surprised he didn't cover the single biggest reason: they haven't been able to scale the clock up. They ran into the clock barrier almost 20 years ago. We should have had 50+ GHz CPUs by now if the trend had continued but that wasn't feasible. So your single-core CPU will be little faster than one core from a current CPU.
the original paper proposing SMT actually tried up to 8 threads. You can read it here if you want: www.princeton.edu/~rblee/ELE572Papers/SMT_Eggers.pdf
Because those CPU threads aren't running in parallel with each other. The purpose of hyper-threading is to reduce CPU idle time while the currently running process waits for, say, user input or data from your drive. While one process is waiting, the CPU can easily switch to running another. Adding a third (or more) hardware thread would increase the complexity of the CPU by __a lot__ while not adding that much benefit. If you outright have MORE cores, then you can actually have processes running at the exact same time, instead of one just waiting for the other to finish.
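A rough software analogy of that latency hiding (not how SMT is implemented in silicon; the sleep just stands in for a disk or input wait): two threads sharing one core still finish in about half the time when their waits overlap.

```python
import threading
import time

def fake_io_task(name: str) -> None:
    # Pretend to wait on a disk read or user input; while this thread
    # sleeps, the other thread gets to run on the same core.
    print(f"{name}: waiting on 'I/O'...")
    time.sleep(1.0)
    print(f"{name}: got data, doing a little work")

start = time.perf_counter()
threads = [threading.Thread(target=fake_io_task, args=(f"thread-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# ~1 second total instead of ~2, because the waits overlap.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```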
The reasons listed above are valid, but I do recall hearing about AMD making a quad-SMT CPU. That was a while ago, though; maybe it was just some baseless rumor from before Zen came out, idk.
So... the IRONIC thing about that is that parallel processing (whether symmetric multiprocessing or symmetric multiprocessors, SMP) has been done by mainframe and HPC systems since the 70s. And then there was (massively) parallel processing (MPP), which often used the message passing interface (MPI) for CPU-to-CPU communication (given that back then, all CPUs were single core). So for scientific and engineering computing, this has been done for quite literally decades, and when multicore CPUs started showing up, HPC already had SMP and MPP code, so we were able to start using those multicore CPUs right away. But a lot of consumer and desktop applications (non-engineering and non-scientific applications) weren't written for SMP or MPP, so it's nice to see more and more desktop and consumer applications taking advantage of multicore CPUs. Still, I would LOVE to have really high single-threaded performance as well, because there are STILL some tasks that CANNOT be parallelised. At all.
A single huge core would be so much slower than a bunch of smaller cores. Not only would you lose perf from context switching, but a single program would miss L1 and L2 way more often. (You could obviously make those caches much bigger, but then latency goes up.) But a CPU with a bunch of medium/small cores and 1-2 massive cores would be nice, as those larger cores could have various fixed-function hardware to accelerate stuff beyond what would make sense to put in every core of a normal CPU arch.
Context switching costs almost nothing; heck, we have weak dual cores that handle it fine. If you could get the same performance out of a single core, it would perform better. You do realize there is a big penalty for multithreading too; that's why a 12-core doesn't perform 12x better than a single core. That fact alone should demonstrate a single core would be better IF you could get one to perform as well.
One thing you could have mentioned is that they already pretty much gave this a go, in a sense, with the Alder Lake chips right when they were brand new in the tablet space. Some tablets were released with 1 big core (an Alder Lake P-core) and a few very small efficiency cores for the general background programs. This is actually a very cool and effective method to give a single thread really good performance while still fitting a couple of little cores to handle most of the multi-threading bulk that happens in the background, which also helps keep the big core free for its main task of ripping through your game or whatever.
This is also how they solved the problem of jamming more technology into the cores, because the cores were getting too large to fit more than 8 on a die. Remember when Intel was stuck at 8 and 10 cores and couldn't go any higher? Well, Alder Lake was a solution to that. They made the cores EVEN BIGGER than 10th gen, meaning they can still only fit 8 at a time despite a die shrink, but to make up for multi-thread they jammed in *space-efficient* smaller cores which individually didn't perform so well single-threaded, but you could fit FOUR of them in the same space as a single P-core, meaning you get the best of both the single- and multi-threaded worlds.
Also, a few more things:
1. Heat dissipation (same reason you can disable a defective core and sell the chip at lower rates).
2. Physical limit: it took so long to get proper 5 GHz even at turbo, so imagine if that limit became the bottleneck for everything you can schedule. SMT isn't helping you there. And I think I read in some paper that the theoretical max for silicon in perfectly serial operation was around 5 THz, but it would run hotter than the sun (so, point 1 again?).
3. Parallelization is faster. The parallel port got discarded because it was just singular lines, which caused interference, and electromagnetic coupling was a bit too expensive to be worth it. But now you can use that approach in PCIe, MIPI, RS485, Ethernet, USB, etc. Though more lines can still have more leakage, especially for wires like in the parallel port, and they need very fine adjustments to work, like a camera/display connector (MIPI) or PCIe. In fact, you can argue that QAM is a case of parallelization.
Yes, it's the first LTT video that I downvoted because of it. I was hoping for an explanation of per-core licensing, or FPGA GPU emulation, or some other cool thing that I'd never heard of.
A possible follow-up video could discuss the advantages and problems of moving to CPUs with different types of cores (like P and E, or specialized ones for video decoding or AI). Another could be why server CPUs usually go for more cores even if they are slower.
The tradeoffs of P/E (or ARM's big-little) would make a great video. But every time I see folks hyping up Tensor cores and other dedicated AI co-processors I can't help but think, "Whatever happened to PhysX?"
@@GSBarlev I've heard a lot of folks say that PhysX is terrible from both an accuracy and performance standpoint. I haven't done any real research on it though.
Most servers have more multitasking, not less. And when you get to HPC, most problems couldn’t be handled on a single core anyway, so you’re back to writing algorithms that can take advantage of multiple CPUs.
@@grn1 I think the main thing that makes PhysX unappealing is the fact that it’s NVidia-only, meaning that using it for anything other than visual effects effectively limits your target audience as a game developer.
I've contemplated building Ben Eater's 8 bit breadboard CPU, but I decided watching the video was enough for me. My wire routing is not at that level of awesomeness. It's the best series of videos on electrical and computer engineering I've ever watched, highly recommended.
@@danbance5799 oh, FPGA development is done with special coding languages known as hardware description languages. Examples are VHDL and SystemVerilog. You can just buy the physical hardware known as FPGA dev boards, then you can configure them to act as your own hardware designs that you implemented in the hardware description language you picked. I haven't actually wired very much myself yet, due to a lack of space and lack of garage.
The biggest downfall of having a single big core is the fact that context switching takes a LOT of time when there are thousands of threads and processes to work with. That's why we have multi-core CPUs: having multiple cores allows your main priority tasks to run on completely dedicated cores, whereas the less meaningful ones can be shuffled around on the remaining cores. Think of it as assigning one person a single task compared to assigning another person 10 tasks, each one happening in a different room; surely the second person is gonna waste hours just running between the rooms, won't they?
It is very expensive, which is why Microsoft introduced the concept of fibers on top of threads/processes. Multiple fibers operate within the same execution context and are guaranteed to be on the same executing hardware, which greatly saves on context switching. Then Microsoft went and added tens more processes with every revision of their operating system, making the overall efficiency plummet yet again. Add in browsers such as Chrome doing the same, and we now often have a process per tab. I'm not doing much on this PC but have about 20 tabs open, and the process count is 275 with 3900 threads. This is Windows 10; Windows 11 typically adds another 20-30 processes over Windows 10.
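If you want to see those numbers on your own machine, here's a minimal sketch using the third-party psutil package (the exact counts will obviously differ per system):

```python
import psutil

# Snapshot every running process along with its thread count.
procs = list(psutil.process_iter(attrs=["name", "num_threads"]))
total_threads = sum(p.info["num_threads"] or 0 for p in procs)

print(f"processes: {len(procs)}")
print(f"threads:   {total_threads}")

# The five biggest thread hogs, typically browsers and system services.
for p in sorted(procs, key=lambda p: p.info["num_threads"] or 0, reverse=True)[:5]:
    print(p.info["name"], p.info["num_threads"])
```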
Negative. If that were true you would see multiple cores giving greater than N times the performance (N being the number of cores). What you actually see is less efficiency, even when the tasks scale almost perfectly.
@@SlyNine What are you disagreeing with? Multiple cores reduce the context switching requirements of a single core, but as long as a given core is running multiple processes then context switching is going to happen anyway. The operational efficiency comes from multiple processes genuinely executing at the same time which does improve performance. Communication between processes or access to shared resources is a different matter and lots of amateur developers make the most basic of mistakes on this and produce very inefficient code as a result. Single thread applications are much easier to develop, attempting to retro-fit multi-thread execution into an existing application is incredibly difficult.
There are some things that are very hard to parallelize and are at the mercy of single threaded performance like CAD. I would love one superfast core for things like this
I was surprised to read this--you'd think 3D applications would be *ideal* for multiprocessing. Correct me if I'm wrong, but from my five minute Google it sounds like the culprit is that all modern CAD still inherits from the 40-year-old base code of *AutoCAD*
@@GSBarlev Yeah, it is a little odd. It is kind of like how gaming is very single-core/single-threaded in most cases. While it would benefit from multicore, it would benefit even more from cache or GPU improvements.
@@mrgumbook serial ports are needed a lot in a professional environment. A ton of older and newer devices use it, it's pretty much the standard for industrial machines.
@@GSBarlev I'm not an expert on the subject, but it is my understanding that a lot of CAD calculations require a previous answer before they can compute. Since you are waiting for something else to compute first, everything has to wait for that thread to finish producing the data. Fusion 360 does use multiple cores, depending on the task, but a lot of tasks make the program freeze if the model is too complex, because one thread is doing all the work most of the time. Not sure how much more you can parallelize that kind of task. Blender works differently even if the two might look similar. If they were able to rewrite the code so that the computations are more independent from each other, then we might see a big improvement. I think this would make it less efficient in terms of CPU time, but given how many cores we usually have available, I would take the speed improvement over the efficiency.
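A toy sketch of that distinction (the step() function and the numbers are made up for illustration): the first loop is a dependency chain where each step needs the previous result, so extra cores can't help; the second is independent work that a process pool can genuinely spread across cores.

```python
from concurrent.futures import ProcessPoolExecutor

def step(x: float) -> float:
    # Stand-in for one solver iteration.
    return x * 1.0000001 + 1.0

def dependent_chain(n: int) -> float:
    x = 0.0
    for _ in range(n):
        x = step(x)  # needs the previous x: inherently serial
    return x

def independent_work(values):
    # Each element can be processed on its own, so this parallelizes well.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(step, values))

if __name__ == "__main__":
    print(dependent_chain(100_000))            # bound by single-core speed
    print(len(independent_work(range(1000))))  # scales with core count
```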
What I would like to see, since we are already going to heterogeneous core setups anyway, is a CPU design where you have multiple performance cores and then 1 extra super-performance core. It would work extremely well for games, since most CPU bottlenecks in games are single-thread bottlenecks. For example in CS:GO, if you use Process Explorer to look at the individual threads, you will notice that while multiple cores are in use, they are not heavily used, but there will be 1 single thread that uses 1 core's worth of CPU time, and that is the thread everything else has to wait on. Whatever it is doing, it cannot do it asynchronously. This is the case for pretty much all games, since individual functions often are not broken into multiple threads. With that in mind, what if they made 1 additional special core that had, say, twice the single-threaded performance, even if less efficient? Then most games would see improved performance. Some productivity apps would improve as well, such as Photoshop, which still has many single-threaded functions, as well as MS Word and Excel.
Yep, the E-cores are not very worthwhile. But since Intel is going all in, and AMD is going with different amounts of cache for some chiplets, the major OS makers are already hard at work making their schedulers work well with a heterogeneous core setup. This is why I feel it could be used in the other direction: a CPU with proper performance cores, and then 1 additional super-performance core where, without care for efficiency, they do every trick in the book to maximize IPC and clock speed.
Some mobile SoCs already do this. They'll have, say, three "big" cores, and also one "prime" core. I'm not sure if they're actually different from the big cores in terms of microarchitecture, but are optimized and/or binned to run at higher clocks.
Some mobile SoCs will do a single extra high-performance core with additional cache and clock speed, though no other fundamental differences compared to the other high-performance cores. Those changes helped greatly, especially for web browsers where most scripting is still single-threaded; that change allowed them to catch up to Apple in browser benchmarks such as Mozilla Kraken 1.1, Octane V2, etc. The mobile implementation has worked well, though I would like it to be taken further. For example, imagine taking a CPU like the Ryzen 7 7800X, making the chiplet larger, and adding a 9th core that has 4x more registers, goes from 3 loads and 2 stores per cycle to 6 loads and 4 stores, doubles the pipeline width for the decode and op-cache, and beefs up the other aspects as well. Sure, it would mean a large increase in power consumption, diminishing returns, and basically all of the stuff that really harms efficiency on x86. For many years Apple did stuff like that with their smartphone SoCs, since many apps, and especially web browsers, benefited more from faster single-threaded performance than from more cores. You would see Apple and Qualcomm using the same ARM version, but Apple would tweak it to improve single-threaded performance, getting 60-70% better per-core performance with half the cores; they would lose out on multithreaded tests but do significantly better on things like web browsers and web apps. Overall, with a move like that, efficiency would not be the goal; it would be improved gaming performance above all else.
There are a few comments in another thread that cover why that wouldn't work and it comes down to physics. Smaller cores are more power efficient so you get more speed with less power but we're bumping up against the limits of silicon both in terms of size (any smaller and tunneling becomes a problem) and thermals (any more power and the chips will fry themselves). Bigger cores would hit the thermal limit faster and we can't just add more functions to the cores since the programs we're trying to speed up wouldn't be able to take advantage of it and it would heat up faster.
@@grn1 That is certainly a risk if trying to do something like make a single core that has the tflops of a quad core CPU. Though what I was focusing more on was something more along the lines of what Apple did with the Arm SOCs to improve IPC during the times when faster single threaded performance benefitted them a lot more than having more cores. They made changes that resulted in larger cores, such as a dual core that took as much die space as a quad core from Qualcomm. In the case of a desktop CPU, something similar could be done for one core, and that will drastically improve the performance of games that have a single thread bottleneck such as CS:GO, Microsoft flight simulator 2020, all current e-sports titles, and many others.
i7 620M = 2c/4t. OK, this really checks out. Not 1 but 2 cores working hard and scheduled to the point where a CPU upgrade isn't needed at all. Wow, Clarkdale got it so right. It's no wonder my CPU usage is only very high when watching YouTube in 1080p60, meanwhile it's less than 40% on every game or 3D application I try on this thing. I can't wait for the quad-core upgrade to balance similar performance across all cores with graphics that's more up to speed with it, not to mention FINALLY 1080p! Goodbye 900p TN panel, hello A12 9800 FULL HD goodness.
0:24 ...except modern USB is fast because it has multiple data streams, and parallel ports were slow because they had one data stream. If only Linus had looked into how these protocols actually work and what "parallel" and "serial" even mean.
You forgot to mention one key reason that multicore is the way: increasing the frequency of a core is getting harder and harder (look at the time it took Intel to go from 300 MHz to 3 GHz, and to max turbo around 5 GHz today).
0:38 I believe the reason for multiple cores is because we reached a limit with the technology, making it hard to increase the core speed. By having many cores we circumvent this technical limitation.
Yeah, that was my thinking as well. I guess they were comparing multiple cores vs. running multiple threads on a single core; the video wasn't about only running 1 thread at a time. If we really only use 1 thread at a time, then we run into the issue that we can no longer increase the frequency at which our CPU runs. Or at least we cannot increase it nearly as fast as we can add more cores. It is a lot easier to build a CPU with 2 cores each at 4 GHz than 1 core at 8 GHz.
@@devluz @JacobPersico This is still kind of incorrect, as core improvements are constantly made. A single-core CPU from back then wouldn't stand up to a single core on any of today's CPUs. Even by itself, a single core on the most recent Intel or AMD chip would blow any single core from that era out of the water. It's not that we can't make better single cores, but that it is impractical to do so, since a multicore chip will always be faster than a single-core chip, even if that single core is extremely powerful.
Yes, but they forgot to mention that context switching is a very inefficient process, having to flush cached state with every switch, hence why SMT/HT was introduced; though if the CPU is suffocating, it doesn't help much.
I thought this video was going to show us some niche product that actually does have a single, highly specialised core. But this video glossed over the most obvious reason for multi-core processors. We can't make single cores go faster anymore. The scheduler has to divide processing time across all the threads that use it. If you want to work faster, you can either make that core go faster or add another core. We've more or less hit the speed limit so it's a lot more effective to add more cores.
I think this would work if motherboards had external SoCs and schedulers that could tell the CPU what to execute at a specific time for specific programs, letting CPU manufacturers focus purely on speed while the motherboard organized the data.
Why would it be external? Why add another SoC that doesn't understand the architecture of the CPU in the socket? It makes more sense to build the scheduler onto the CPU's own SoC, and that's exactly what they do. But at some point you need to prioritise scheduling at a thread/process level that the CPU can't understand. That's when it's handled by the OS, more specifically the kernel of the OS. It's handled in software so that it's flexible enough to meet the needs of the user. An external SoC wouldn't help with that either.
@Daniel Lupton If the CPU manufacturer is putting all resources into making a super fast core, that would put pressure on motherboard manufacturers to innovate on more advanced ways to organize the info. I don't know enough about the software side of things, so I might be wrong. Also, the motherboards would have to have specific CPU sockets for different architectures; that might sound taxing, but it's just moving around a few pins each generation, like a QR code.
@@LeeVoolHM The motherboard doesn't organise anything. It just provides data lanes for the CPU to talk to the memory, GPU and I/O, and provides power to those components. The motherboard has no awareness of the data moving through it. I don't think you know enough about the subject to provide insight that the manufacturers are missing.
@Daniel Lupton I was suggesting that the task of organizing data would have to be managed by the motherboard. I'm not saying it does or ever has; I'm saying it would be needed for this hypothetical situation, where the CPU has the single purpose of executing as fast as possible and organization is handled by the motherboard.
@@LeeVoolHM In what world would 2 chips talking to each other work faster than 1 chip that does both tasks on the same die? Not only would you have to deal with the latency of the communication bus, but you wouldn't be able to optimise anything at the silicon level. You should learn about cache misses and branch misses and you'll understand why: the further data is from the CPU registers (L1, L2, L3 cache, memory, HDD, network, etc.), each level can take 10-100x longer to retrieve and execute. Going from a 99% cache hit rate to 98% can halve the speed of a processor. So having the data flow managed by an external chipset would ruin performance far more than any saving you'd get from splitting the task up. What I find fascinating is that you seem so insistent that this makes sense. There are so many reasons it doesn't work, I don't even know where to begin. It's like the flat-earth theory: it's so wrong, it's actually hard to explain why without pretty much explaining all of science from scratch.
If These Guys Ever Recover Their Channel, I Already Know Linus Is Going On A Long Rant On How YouTube Addressed The Problem, How They Did Nothing, How There Needs To Be More Security, All That BS. Mark My Words.
As someone who was in high school in the late '90s, I remember a time when computers couldn't multitask and it was painful. Want to click save on a document and then click exit? Nope, gotta wait for one thing to finish before you can do the next. Want to open more than one program at a time? Nope, not possible. It was one of the causes of "save anxiety" where you lose tons of work on something when it crashes because the machine couldn't multitask enough to save while you were doing things.
I was so happy in the XP days with a single-core CPU. Real multitasking was a fact then, and if some app was making my system crawl, I simply changed its priority and presto! The "intense" app kept working in the background and I could use my system smoothly for other things. Now with multicore, well, that simplicity is over. If you're lucky, all the cores share the work "equally", but most of the time the first one takes the heavier load (generally while using not so up-to-date, but lighter, software).
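You can still do that trick today from a script; a minimal sketch using the third-party psutil package, with a made-up PID standing in for the heavy app (cpu_affinity isn't available on macOS):

```python
import sys
import psutil

pid = 12345  # hypothetical PID of the "intense" app
p = psutil.Process(pid)

# Lower its priority so interactive apps stay responsive.
if sys.platform == "win32":
    p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
else:
    p.nice(10)  # higher nice value = lower priority on Unix

# Optionally pin it away from core 0 (not supported on macOS).
p.cpu_affinity([1, 2, 3])
```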
Making more cores is a more efficient way to use the transistors as well. A modern core can execute around 4 instructions simultaneously, but to do that it needs to analyze the code in a very smart way, and either find instructions that can be executed independently of the others being processed or kludge around with virtual registers and all sorts of very smart logic to keep the "four mouths" fed. Making a CPU that could execute 8 or 16 simultaneous instructions, for example, would make this analysis logic gigantic, to the point where it would be much bigger than just two or four cores that pretty much deliver the same performance.
It's probably the same people or guy that hacked Corridor Crew. They'll get everything back; what they need to worry about is keeping all sensitive data safe. At least the hacker is nice enough not to delete everything. Maybe this is a way to tell Google they need to do something about their security.
Efficiency was a big selling point when multi-core chips came out. They were having trouble with one core at 5 GHz, and found it more efficient to have two cores at 2 GHz apiece.
After 24/48GB RAMs and modern CPUs it would be like 187 performance cores, 247 V-Cache cores and 548 efficiency cores plus 65 'glitter' cores. Oops, that is 1047, but who cares.
I recall that computers hanging or freezing was a problem until around 2005, when multi-core PCs started becoming commonplace along with Windows versions that used multiple cores and could alert you to failed services or drivers rather than just locking up. I dreaded phone calls from people who'd say "it's frozen again".
PhD student in computer architecture here: yes, one monolithic core *could* still be very capable, but it is unlikely to be, given conventional ISAs (x86, ARM, RISC-V). A VLIW architecture (RIP Itanium) could remedy a lot of the challenges posed to a single monolithic core in single/small-issue-width ISAs and programming models. Another good point to keep in mind is that 2-way SMT is not the limit; even the Atom (yes, Atom) cores in the Knights Landing Xeon Phi had 4-way SMT. The point about die yield is extremely compelling, though. Also, individual cores don't have their own ports to memory; they have their own ports to the network-on-chip. NoCs used to be buses, but these days, with rising core counts and their logical and physical implications, they are proper networks (parasitic capacitances render long buses infeasible when transistors switch at modern speeds, except in cryogenic computing). The NoC has direct access to memory and to L3 cache, but each core having direct access to memory would be a nightmare from the perspective of cache coherence and memory consistency.
I saw a recent LGR video where he ran a 20-year-old copy of Bryce 3D on a period computer and a modern Threadripper. It was only a bit faster on the modern machine, not many orders of magnitude faster as you might expect. When he opened Task Manager, you could see one core churning at 100% and the rest unused, because the software wasn't optimized for multiple cores.
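You can spot that same pattern on your own machine without Task Manager; a small sketch using the third-party psutil package:

```python
import psutil

# Sample per-core utilization over one second.
per_core = psutil.cpu_percent(interval=1.0, percpu=True)

for i, pct in enumerate(per_core):
    bar = "#" * int(pct / 5)
    print(f"core {i:2d}: {pct:5.1f}% {bar}")

# A single-threaded renderer shows up as one core pegged near 100%
# while the rest sit close to idle.
```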
As a single core goes up in power, its heat increases. One of the big reasons manufacturers went for multi-core was that we were a couple of generations away from a core generating as much heat as a household iron; as generations went on, heat was starting to go up exponentially. It was simply not possible to have, say, a 16 GHz CPU (which is only 4 x 4 GHz), especially air-cooled. Probably an even bigger issue with laptops. My G14 is 24 GHz in aggregate (8 cores x 3 GHz).
When I overclocked my 3.2 GHz chip to 5 GHz so many years ago, I expected we would be up to 8 cores and 20 GHz by now... I underestimated the number of cores and overestimated the clock speeds, which have been dropping ever since.
@@blaat44 LTT got hacked, but it got taken down by YouTube after an hour. TQ and other LMG channels are currently also hacked. Someone with access to multiple accounts clicked on a shady link or downloaded a shady program, I guess.
@@perpetualcollapse It's kinda weird though; it's just 8:30 AM there (LTT was hacked when probably no one was working at LMG yet), so I guess it was just brute force, but we'll see.
Another huge concern with "one huge core" paradigms is that the sort of tasks that demand highly performant single threads are better served by simplistic, sub-core hardware acceleration, such as hardware encoders, decoders, tensor processing units, floating point units, RT cores, etc. Generally the issue is simple: when you need massive grunt for a single thread or single core, you generally need a lot of grunt for a single task. And often a single task is super easy to just build bespoke hardware for.
One consideration which I believe you mentioned before (on a WAN Show or something) is clock speed and power consumption.
From what I remember, for a single core to be effective, it would have to execute more instructions per second to make up for the lost cores. Added clock speed increases power consumption non-linearly, so by piling on more clock speed, the power draw would be too much to handle in such a small area. This is why high-core-count CPUs have slower clocks, but not that much slower (e.g. still over half the speed, but 4 to 6x the number of cores).
Single cores ARE effective at light workloads.
But not for a do-everything device like a PC. You couldn't clock a core fast enough, or the design of the core would be SO incredibly complex, because it would have to be doing about 10 instructions per clock cycle for today's PC users to be content with it.
What he didn't talk about is just how many processes a modern PC is running at any given time, and it doesn't matter that most of them are light tasks. What matters is how many threads get generated by all these light tasks, to where if you're doing something heavy, the CPU would constantly have to swap out that heavy task to do lightweight tasks and every time you do that it's a big penalty hit, and THAT is why modern PCs have or should have at least 4 core/8 thread CPUs. And of course that doesn't even begin to be good enough anymore for newer games that can load a 5950X (32 threads) to 40% (Matrix Awakens based on UE5 using Lumen and the other main effect it uses). 40% of a 16c/32t CPU = 160% of a 4c/8t CPU if everything else is the same, meaning the CPU is always a bottleneck, by a LOT.
Economics dictate that modern PCs have CPUs with cores that aren't overly complicated and don't have to run 3x the speed they do today, which is LITERALLY impossible anyway, because the higher the clock speed, the higher the resistance (reactance) in the circuit, which means they'd turn into a fireball. So it's better to have many cores that run slower, use less electricity, and can process many threads at the same time.
Designing a core that could run more than 2 threads is basically the same thing as adding more cores, but it's much more complicated. And while in modern ICs made by TSMC or Samsung or Intel the increase in resistance with frequency is non-linear, through most of the frequency range that companies actually use it LOOKS linear; it's on a part of the curve that looks more flat. That's one of those neat things about exponential growth: the rate of change varies along the curve, but the part near the vertex can look linear for a while because the change is small. And that's what happens in an electrical circuit with a clock signal. The clock acts like an AC circuit, and the higher the frequency, the greater the resistance. The greater the resistance, the more power goes to HEAT instead of to making the circuit ever faster.
Kinda bizarre he didn't cover that seeing as it was a big reason for going multi-core in the first place as clock speed and IPC improvements became less and less each generation.
@@alexatkin Yes, they hit a wall at around 5 gigahertz. A wall they still face today, but thanks to stacking cores they kind of worked around it.
I'd say IPC is more important than speed
@@blaze909 yes, IPC is very important. Increasing the IPC of a single core boosts speed much like adding clock speed does (more instructions per clock versus more clock cycles).
I don't know as much about how IPC affects power consumption, though, compared to core counts. The main point was that I thought Linus had said this before, and that I knew increasing clock speed scales non-linearly, so you get more performance per watt out of adding cores than out of adding clocks.
Linus called it "As Fast As Possible". Old habits die hard.
I believe they still call it Fast As Possible or "FAP" internally (cause it's funny). That's my guess as to why he slipped on the name.
4:52 That's how they added the caption: What year is it? :)
The name is Linux
@@captainkeyboard1007 ?
When did that stop being the name?
LTT main channel removed, now this channel is hacked.
I think they have the entire network of channels in their hands.
One channel is over, go to the second and so on.
The PS3 has one huge core and a bunch of sub-cores for specialized tasks. It was unlike anything anyone had ever seen before...which was bad, because it made developing for it SUPER hard.
I remember rumors back in the day on the other hand that the military or a university or something bought a ton of them to make a supercomputer.
@@Zyo117 Those rumours were true, it was a military supercomputer for image analysis purposes.
The hardware was a downright bargain compared to more 'traditional' image processing hardware.
@@Zyo117 Those were actually real. The Air Force had one called "Condor Cluster" and the University of Massachusetts had one that studied black holes
Well, coprocessors are (I believe) older than half a century, and by "definition" they are _specialized_, thus I'm not sure the PS3 had "anything anyone had ever seen before".
@@Zyo117 There was actually a supercomputer made by the US Air Force out of 1,760 of them.
I think Intel tried the "huge core" idea for a while: NetBurst / Pentium 4. Good for both stream editing (if not much else) and heating up the house.
NetBurst(IntoFlames)
L🤭L Inhell Combustible Furnace Inside 🌚💥
The first few iterations of Pentium 4, a.k.a. Willamette and Northwood, were moderately successful. However with Prescott, Intel tried to accelerate toward 10 GHz, only to run into a wall at 3.8 GHz because of power and heat. AMD took advantage of this misstep, and Intel had to take a “right-hand turn” with the lower-clocked Nehalem.
NetBurst was a fast core, not a huge core. A huge core would be massively parallel, almost like multiple cores
It wasn't that big of a core. If anything, it was very underbuilt, particularly in terms of branch prediction logic, which was the cause of the pipeline stalls that made for poor efficiency (the P4 had a much longer pipeline than the P3, so a stall carried a much bigger performance penalty). A huge core would get much more done per clock and wouldn't have hit 3.8 GHz on the nodes of the day. Conroe's release in 2006 represented a shift back to efficient cores with shorter pipelines and lower clock speeds. Intel later returned to long pipelines with Nehalem, but the nodes were small enough by then to devote much more logic to cache and branch prediction, which finally allowed Intel to combine the efficiency benefits of the P6 architecture with the raw throughput potential of NetBurst, now that pipeline stalls were far less common.
This is the core video we needed.
@DMONEYINDUSTRY and my mom says i am the dumbest person, glad i found one of my kind
hardcore
Not really I knew it wouldn't work, CPUs have multiple cores for a good reason.
Okay dad
@@bomb00000 Lets end this thread already
On the positive side of them being hacked: This is going to create the most watched WAN show of all time when they get their accounts back
What was the channel name before they got hacked? Techquicky?
Ltt has so many channels now that I can't remember
@@michaelrichter2528 Yes, TechQuickie
What happened to the Linus Tech Tips channel?
@@fikrilatib8275 Linus was so nice and gave Elon Musk himself the opportunity to give back to the community by doubling their crypto. /s
Joking aside, their channel got hacked by scammers. This has happened before to another channel I watch, in the exact same manner.
@@fikrilatib8275 it’s been hacked. The main channel has been terminated, though I’ve heard it’s being restored
Can't wait for the "Windows 11 on a single-core CPU challenge" video next week!
Considering how poorly Windows 10 Pro runs on a Celeron (1 core, 2 threads), I can't imagine it goes well...
I've done it with a 5800X3D... it's surprisingly fine and can handle quite a few applications at once (as long as you don't launch any games).
@@DJSekuHusky The Pentium 4 was faster than the Celeron D because it had more cache.
It's already possible to limit the cores on certain motherboards and test each core count individually.
@enrique amaya RAOTFLMFAO.
Wait a minute, this is not a crypto scam?!?! I subscribed for crypto scams. Unbelievable how far Tesla has fallen as a channel. Hope everything is restored to the way it was before ❤
You ok bud?
Lmao😂 Dude I almost spit out my drink that was funny
For anybody wondering, this comment was most likely made when this entire channel got hacked, the same way the Linus Tech Tips YouTube channel got hacked.
I don't really need to explain much; just search "Tesla scam hack" or something related and you will probably find videos about it.
😂😂😂😂😂😂😂😂😂
@@RageyRage82 guess you didn't hear about the incident
Highly informative
A good analogy: if we want to increase the throughput of a highway, increasing the number of lanes is more practical than raising the speed limit and requiring faster vehicles.
More lanes actually just means more vehicles able to park on the highway in the real world; it doesn't end up increasing throughput much, it just acts as a buffer.
@@mycosys In real world, you're not allowed to park on the highway. 😂
People are still gonna go to the race track on the weekend
Can you please talk about Data and setting up Heap, Amplitude, mixplayer.
Funnels, conversions, and search keywords!
I do really wish I had a computer with one big core for dwarf fortress. That game is famous for being CPU hungry but also single threaded.
Same thing for Kerbal Space Program
Ask the developers to rewrite the code to make use of more cores.
@@alexandruilea915 This is not always possible. Not every algorithm can be parallelized.
Also Minecraft lol
You might joke, but I honestly think that this is the logical next step for the big.LITTLE designs. I think CPUs should come with another tier of performance cores. There should be at least one or two "Big Chungus" cores that get preferential priority for single-threaded tasks. But they'd need to do it as a chiplet design, because of the obvious economic drawbacks of wafer fabrication. If they're on their own chiplet, not only would they be easier to fab, but they'd be easier to load up with tons of cache on those chiplets (without requiring 3D V-Cache) so they can totally obliterate single-threaded tasks.
Intel has big.LITTLE, but AMD has chiplets. When both companies start to bridge that gap and combine the technologies, I would really hope that we see this happen. I want chungus cores. I need chungus cores. My copy of Photoshop demands it.
At least they left the videos up here..
true!
i think they might delete them soon
I remember in the early 2000s the reason they had to add cores and hyper-threading was that they had reached the limit on how fast a single-core chip could go without overheating. I guess by now technology has advanced enough that they could put it all into one core again. I guess.
No it hasn't, you still have a clock limit, it's just higher now. Back in the day they thought we would have 10 GHz CPUs by now.
@metalspider I was thinking more hypothetically about how far technology has come since the early 2000s, and that a single chip could be made far more advanced now. But it would be impractical, as Linus said. I remember my second PC, a Dell 8300, had a single-core Pentium 4 with Hyper-Threading. For when I bought it in 2004, it was nice and fast. But yeah, my current PC was built in 2017 with a 6th Gen i7 and it still holds up well today.
@@SchardtCinematic You don't think they are trying to make the fastest cores they can? They already are, man. There are already high-performance + low-power designs that would fit perfectly with higher-speed cores, but their designs... just use the same cores they already have. If they could make faster single cores, I guarantee you they wouldn't let AMD take the lead on gaming CPUs.
Technology on the x86 architecture has not drastically changed; the individual cores of today's CPUs are about as advanced as you can get.
A "big single core" instead of 4 smaller cores does not actually make any sense. I oversimplify a bit, but once your core contains all the needed components to execute a program (arithmetic unit, logical unit, control unit, memory unit) it is done and there is little you can add to improve its performance. Now if you put 4 of each unit into the core in order to be able to process 4 programs at once you effectively have a 4 cores CPU in disguise which bears the same programming and usage constraints as any other 4 core CPU.
If you want a single core to do the same amount of processing as 4 cores you need to run it at 4 times the frequency (ignoring memory latency issues).
Now, physics is pretty inflexible in this matter: to increase the frequency 4 times you also need to increase the voltage, and since power scales with voltage squared (times frequency), consumption goes up by roughly 16x or more, which is basically impossible to cool.
The common workaround is to replace one fast core with multiple cores at a lower frequency, allowing more theoretical processing power per watt. GPUs are a perfect example of this, but CPUs also follow that trend (E-cores at Intel, ARM processors such as the M1...) because there is simply no other feasible way to get more performance right now.
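A back-of-envelope sketch of that scaling, assuming the classic CMOS dynamic power approximation P ≈ C·V²·f and assuming voltage has to rise roughly in proportion to frequency (both are simplifications, and the unit values are made up):

```python
def dynamic_power(cap, voltage, freq):
    # Classic CMOS dynamic power approximation: P ≈ C * V^2 * f
    return cap * voltage ** 2 * freq

base = dynamic_power(cap=1.0, voltage=1.0, freq=1.0)

# One core at 4x the clock, with voltage scaled up alongside frequency:
one_fast = dynamic_power(1.0, 4.0, 4.0)

# Four cores at the original clock and voltage:
four_slow = 4 * dynamic_power(1.0, 1.0, 1.0)

print(one_fast / base)   # ~64x the power of one baseline core
print(four_slow / base)  # 4x the power for (ideally) similar throughput
```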
@@mtlspider Confounding that projection even further is the fact that it was the mid-2000s (and specifically when AMD started being competitive and x86_64 started taking off) that we saw some great innovations that caused major leaps in the computations performed per clock cycle. So clock speeds may not have gone up hugely between my 3.7 GHz Phenom II and my later 4.4 GHz Haswell, but _for sure as heck_ that newer Intel chip beat the pants off my old AMD CPU.
Hyperthreading is *NOT* task scheduling in hardware. It is nothing more than a hack to use duplicated components in a chip in parallel as if there were more than one core but execution will stall as soon as non-duplicated components within the chip are needed. It's part cost saving, part efficiency but mostly marketing.
This is NOT the channel I was expecting when I saw another person got hacked.
Fr 💀
were you hacked?? YOU GUYS??? HOW THE FUCK, no one’s safe
Of course no one is safe from being hacked.
@@vectoralphaSec well duh, but you might think that these guys have at least 2FA
at this point i wouldn't be surprised if mutahar from someordinarygamers gets hacked
While most games use 1 core for the main render thread, it's still better to free up resources by having other tasks spread across other cores. Going back to 1 core would bring back certain performance hits and stuttering from the past...
Hogwarts Legacy, only 4 threads. 2007 Crysis, also 4 threads (or two?).
Linus does a great job in the video highlighting how even *two* cores was game-changing, because when one thread locked up the other could still execute commands (notably, as I recall, killing the zombie thread)
But using just one core for a game can affect the performance negatively... 🙄
Most modern titles will use at least four cores. There are a number of games that run faster on 8 or more cores. Some games start stuttering or don't launch at all on anything smaller than a quadcore CPU. Current consoles allow games to use 6.5 cores (with SMT), so you can expect game devs to program with that in mind.
@@Steamrick There is no reason those 6.5 or 8 threads or whatever couldn't be scheduled on a single core; the issue is technology limits. I guarantee you that, other impractical limitations aside, a single 32 GHz core of the same architecture would be faster than 8×4 GHz ones for gaming. In the same vein, if they could practically do half as many cores running twice as fast, that would be better too.
Damn linus got hacked, no one is safe anymore
Bro fr. These are THE tech guys. If they got hacked then nobody is safe.
@@perpetualcollapse Doesn't mean they're interested in cyber security.
@@akaEch0
Have you seen their videos? Cope.
@@perpetualcollapse I don't think that word means what you think it means.
@@akaEch0
I like how you didn’t answer my question. Cope. 😂 They’ve done lots of cybersecurity videos on their other channels and talk about hacks on TechLinked. You’re probably part of the scammers.
I believe that people need at least a quad core CPU today.
Depends, my use tends to be harder on memory than on CPU. Doesn't mean I don't have two 12 core PCs though. :D
nah. just one big core.
Would love to see what would be possible if we only had 1 extreme core , like 1 core 15ghz 😂
The diminishing returns past 4 cores still largely apply unless software is very well written for it. Having insane core counts is often useless for gaming, and the reason Intel was the gaming king yet stuck with 4 cores for so long is that it made sense. Some workloads like web servers, video editing, and certain scientific workloads parallelize really well, but most can't, so outside of the Windows background junk needing some time, it really comes down to the main program and whether it multithreads well. It would be good to see a performance analysis of low-core-count vs high-core-count CPUs in the same generation/architecture with the same cooling across a variety of games and workloads.
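Amdahl's law is the usual way to put a number on those diminishing returns; here's a minimal sketch, where the 90% parallel fraction is just an assumed figure for illustration.
```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = fraction of the work that can run in parallel, n = core count.

def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

p = 0.90  # assume 90% of the workload parallelizes (illustrative)
for n in (1, 2, 4, 8, 16, 64):
    print(f"{n:>2} cores -> {amdahl_speedup(p, n):.2f}x speedup")
# Even with infinite cores the ceiling is 1 / (1 - 0.90) = 10x, and the
# first handful of cores already capture most of it.
```
With a lower parallel fraction, which is typical of game logic, the curve flattens even sooner, which is exactly the 4-core sweet spot described above.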
@@keagantomlnson6942 15 GHz would not be possible due to thermal limitations. But you could increase the number of instructions per clock cycle; you would still be limited by the size of the core.
Big F for Linus Media Group at the moment. Hopefully this hack gets sorted out.
The CPU industry is so well CORE-ographed.
AAAAAH
Ha! Love that!
Ba Dum Tss
Please leave.
🗣📢 "Taxi for SavagePro"
Why would you need a single core cpu if you can get elons bitcoins
dont scan the *QR* code in the stream
Yeah
What would a bigger core even mean? A core already has all the hardware to execute all possible instructions. Maybe you could add additional hardware to accelerate some special instructions or add more L1 cache, but a physically bigger core might also have lower clock limits because the signals take longer to travel across it.
If only Linus used today's sponsor, Nord VPN.
Lol
How does it even add any protection when YouTube is already encrypted?
Pov: you woke up to them being hacked
very interesting info from a channel that totally hasn't been hacked
HAHAHA
In the early 2000s I would have thought we'd have CPUs running at 20ghz+ but it looks like we're stuck around 4-5ghz.
I like the arm method. 1 main turbo super core. A couple other strong cores and a half dozen or more efficiency cores. Shift background crap to e cores, run your main thread on main core, allow secondary power cores to chip in with multi threaded workloads.
That's something Intel is doing with their E-cores.
I'm surprised he didn't cover the single biggest reason: they haven't been able to scale the clock up. They ran into the clock barrier almost 20 years ago. We should have had 50+ GHz CPUs by now if the trend had continued but that wasn't feasible. So your single-core CPU will be little faster than one core from a current CPU.
Power consumption grows super-linearly with clock rate, roughly quadratically or worse once you account for the voltage increase needed. It's not that it's impossible to raise the clock rate; it just doesn't make sense to do so.
@@vitalyl1327 Imagine a consumer grade Gigawatt CPUs with a meter-squared die, though...
@@kingeling why do all the lights in the city dim when you turn on your computer?
@@SlyNine lol
Main channel's been terminated and this one has been hacked! The other ones might be next!
Rest in peace, Linus Channel Industry...
It will be back in 2 days dw
I was thinking more along the lines of why can't we have a core with 3+ threads? Like a dual core 8 thread or quad core 12-16 threads.
the original paper proposing SMT actually tried up to 8 threads. You can read it here if you want:
www.princeton.edu/~rblee/ELE572Papers/SMT_Eggers.pdf
Because we don't need it, what a 2nd thread allows is to have the core busy all the time by filling the gaps in the 1st thread.
Because those CPU threads aren't actually running in parallel with each other. The purpose of hyper-threading is to reduce CPU idle time while the currently running thread is stalled, say on a cache miss or a mispredicted branch; while one thread is stalled, the core can issue work from the other. Adding a third (or more) hardware thread increases the complexity of the CPU by __a lot__ while not adding that much benefit. If you outright have MORE cores, then you can actually have processes running at the exact same time, instead of waiting for the other one to finish.
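A toy way to see why the second hardware thread buys so much more than a third or fourth: if a thread leaves the core idle some fraction of the time and the stalls are roughly independent, extra threads only reclaim the remaining overlap. The 30% stall fraction below is an assumed figure, purely for illustration.
```python
# Toy SMT model: a single thread leaves the core idle `stall` fraction of the
# time. With n independent hardware threads, the core only sits fully idle
# when *all* of them are stalled at once (a very rough independence assumption).

def core_utilization(stall, n_threads):
    return 1.0 - stall ** n_threads

stall = 0.30  # assume a thread stalls 30% of cycles (illustrative)
for n in (1, 2, 3, 4):
    print(f"{n} hardware thread(s): ~{core_utilization(stall, n) * 100:.0f}% busy")
# 1 -> 70%, 2 -> 91%, 3 -> 97%, 4 -> 99%: the second thread recovers most of
# the idle time, so each extra thread adds less while costing more silicon.
```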
The reasons listed above are valid, but I do recall having heard of (I think) AMD making a quad-SMT CPU. That was a while ago though; maybe it was just a baseless rumor from before Zen came out, idk.
I was thinking about like, One really chunky core and 8 threads
Elon Musk didn't like your Twitter jokes, Linus.
So... the IRONIC thing about that is that parallel processing, whether symmetric multiprocessing or symmetric multiprocessors (SMP), is something mainframes and HPC have been doing since the '70s.
And then there was (massively) parallel processing (MPP), which often used the message passing interface (MPI) for CPU-to-CPU communication (given that back then, all CPUs were single core).
So, for scientific and engineering computing, this has been done for quite literally decades.
And when multicore CPUs started showing up, because HPC already had SMP and MPP, we were able to start using those multicore CPUs right away. But a lot of consumer and desktop applications (non-engineering and non-scientific applications) weren't written for SMP nor MPP, so it's nice to see more and more desktop and consumer applications able to take advantage of multicore CPUs.
Still, I would LOVE to have a really high single threaded performance as well because there are STILL some tasks that CANNOT be parallelised. At all.
I got no LTT to watch but stuck with that Tesla hack bullsh** so here I am watching shorts I’ve watched before
linus gets hijacked 2023
*_Always sunny in Philadelphia theme plays_*
HAHAHAHAHA Never thought that would happen... Red face moment...
A single huge core would be so much slower than a bunch of smaller cores.
Not only would you lose performance from context switching, but a single program would miss L1 and L2 way more often. (You could obviously make those caches much bigger, but then latency goes up.)
But a CPU with a bunch of medium/small cores and 1-2 massive cores would be nice, as those larger cores could have various fixed-function hardware to accelerate stuff beyond what would make sense to put in every core of a normal CPU architecture.
The size and complexity of the scheduling logic and the instruction set handling would probably cost you most of the advantage in terms of chip size.
Context switching takes almost nothing, heck we have weak dual cores that handle it fine.
If you could get the same performance out of a single core it would perform better. You do realize there is a big penalty for multi threading, that's why a 12 core doesn't perform 12x better than a single core.
That fact alone should demonstrate a single core would be better IF you could get one to perform as well.
We seem to ignore that there are CPUs with a few monster cores. The ones in the IBM zSeries mainframes are wicked fast, but there aren't very many of them.
But it can't run Windows or any other conventional software.
@MenaceInc Well, it ain't Linux either. IBM machines are designed with Unix in mind.
One thing you could have mentioned is that they already pretty much gave this a go, in a sense, with the Alder Lake chips when they were brand new in the tablet space. Some tablets were released with one big core (an Alder Lake P-core) and a few very small efficiency cores for the general background programs. This is actually a very cool and effective way to give a single thread a really good performance score while still fitting a couple of little cores to handle most of the multithreaded bulk that happens in the background, which also helps keep the big core free for its main task of ripping through your game or whatever.
This is also how they solved the problem of jamming more technology into the cores, because the cores were getting too large to fit more than 8 on a die. Remember when Intel was stuck at 8 and 10 cores and couldn't go any higher? Alder Lake was a solution to that. They made the cores EVEN BIGGER than 10th gen, meaning they can still only fit 8 at a time despite a die shrink, but to make up for multi-thread they jammed in *space-efficient* smaller cores which individually don't perform so well single-threaded, but you can fit FOUR of them in the same space as a single P-core, so you get the best of both the single- and multi-threaded worlds.
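A rough way to see the appeal of that trade: if an E-core delivers a fraction of a P-core's throughput but four of them fit in the area of one P-core, the multi-threaded math favors mixing them. The per-core area and throughput figures below are assumptions for illustration, not Intel's numbers.
```python
# Rough area-vs-throughput sketch for a hybrid (P + E core) layout.
# Assumed, illustrative figures: an E-core takes ~25% of a P-core's area
# and delivers ~50% of its throughput.

P_AREA, P_PERF = 1.00, 1.00
E_AREA, E_PERF = 0.25, 0.50

def layout(p_cores, e_cores):
    area = p_cores * P_AREA + e_cores * E_AREA
    perf = p_cores * P_PERF + e_cores * E_PERF
    return area, perf

for name, (p, e) in {"10 P-cores": (10, 0), "8 P + 8 E": (8, 8)}.items():
    area, perf = layout(p, e)
    print(f"{name:<11} area = {area:.1f}   multithreaded throughput = {perf:.1f}")
# Roughly the same die area, but the hybrid layout has more total throughput
# while still keeping 8 full-size cores for single-threaded work.
```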
Also, a few more things:
1. Heat dissipation (and binning, which is the same reason you can disable a defective core and sell the chip at a lower price).
2. Physical limits: it took so long just to hit a proper 5 GHz even at turbo, so imagine if that limit became the bottleneck for how much you can schedule. SMT isn't helping you there. And I think I read in some paper that the maximum theoretical limit for silicon operating as a perfectly serial processor was around 5 THz, but it would run hotter than the sun (so point 1 again?).
3. Parallelization is faster. The parallel port got discarded because it used single-ended lines that caused interference, and electromagnetic coupling was a bit too expensive to be worth it at the time. Now, though, you can use parallel lanes in PCIe, MIPI, RS485, Ethernet, USB, etc. More lines can still have more leakage, especially with wires like the parallel port's, and they need very fine tuning to work, like a camera/display connector (MIPI) or PCIe. In fact, you can argue that QAM is a case of parallelization.
The last single cores were around 5 GHz. Crossing that line to 6 GHz still isn't a thing, not without massive cooling.
What a nice video, I sure hope nothing happens to this informative channel
Bro is this techquickie?
This video's time here is numbered
@@ojaskumar521 Yes, before the hack.
@@MaskOfCinder God damn
Oh no, they got this channel now too
Just used my daughter's tuition on Tesla coin!! Thanks Linus, surely he didn't get hacked.
Tell me you're joking
Man, I was hoping you would talk about special use cases for single core super processors.
Clickbait.
Yes, it's the first LTT video that I downvoted because of it. I was hoping for an explanation of per-core licensing and/or FPGA GPU emulation or some other cool thing that I'd never heard of.
Linus trying new password tips he found online ☠️
A possible follow up video could be discussing the advantage and problems of moving to cpu with different types of cores (like P and E, or specialized ones like video decoding or AI). Another could be why server CPUs usually go for more cores even if they are slower.
The tradeoffs of P/E (or ARM's big-little) would make a great video. But every time I see folks hyping up Tensor cores and other dedicated AI co-processors I can't help but think, "Whatever happened to PhysX?"
@@GSBarlev I've heard a lot of folks say that PhysX is terrible from both an accuracy and performance standpoint. I haven't done any real research on it though.
Most servers have more multitasking, not less. And when you get to HPC, most problems couldn’t be handled on a single core anyway, so you’re back to writing algorithms that can take advantage of multiple CPUs.
@@grn1 I think the main thing that makes PhysX unappealing is the fact that it’s NVidia-only, meaning that using it for anything other than visual effects effectively limits your target audience as a game developer.
*EDIT:* I think that is no longer the case.
Hobby CPU designer here (I use FPGA dev boards for implementing them). This was a great video!
I've contemplated building Ben Eater's 8 bit breadboard CPU, but I decided watching the video was enough for me. My wire routing is not at that level of awesomeness. It's the best series of videos on electrical and computer engineering I've ever watched, highly recommended.
@@danbance5799 oh, FPGA development is done with special coding languages known as hardware description languages. Examples are VHDL and SystemVerilog. You can just buy the physical hardware known as FPGA dev boards, then you can configure them to act as your own hardware designs that you implemented in the hardware description language you picked.
I haven't actually wired very much myself yet, due to a lack of space and lack of garage.
The biggest downfall of having a single big core is the fact that context switching takes a LOT of time when there are thousands of threads and processes to work with.
That's why we have multi-core CPUs: having multiple cores allows your main priority tasks to run on completely dedicated cores, whereas the less important ones can be shuffled around on the remaining cores. Think of it as assigning one person a single task versus assigning another person 10 tasks, each in a different room; surely the second person is going to waste hours just running between the rooms, won't they?
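On Linux you can actually act on that "dedicated room" idea by pinning a process to specific cores; a minimal sketch, assuming a Linux system where `os.sched_setaffinity` is available (the core numbers are arbitrary examples).
```python
import os

# Pin the current process to cores 0 and 1, leaving the other cores free
# for other work. Linux-only: sched_setaffinity isn't available everywhere.
pid = 0  # 0 means "the calling process"
os.sched_setaffinity(pid, {0, 1})

print("now allowed on cores:", sorted(os.sched_getaffinity(pid)))

# Note: this only restricts *this* process. To truly dedicate those cores
# you'd also have to keep other processes off them (e.g. with taskset or
# cgroup cpusets), otherwise the scheduler will still place work there.
```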
It is very expensive, which is why Microsoft introduced the concept of fibers on top of threads/processes. Multiple fibers run within the same thread's execution context and are switched cooperatively in user mode, which greatly reduces the context-switching cost.
Then Microsoft went and added tens more processes with every revision of their operating system, making the overall efficiency plummet yet again. Add in browsers such as Chrome doing the same, and we now often have a process per tab. I'm not doing much on this PC but have about 20 tabs open, and the process count is 275 with 3900 threads. This is Windows 10; Windows 11 typically adds another 20-30 processes over Windows 10.
Negative. If that were true you would see multiple cores delivering greater than N times the performance (N being the number of cores). What you actually see is less efficiency, even when the tasks scale almost perfectly.
@@SlyNine What are you disagreeing with?
Multiple cores reduce the context switching requirements of a single core, but as long as a given core is running multiple processes then context switching is going to happen anyway. The operational efficiency comes from multiple processes genuinely executing at the same time which does improve performance.
Communication between processes or access to shared resources is a different matter and lots of amateur developers make the most basic of mistakes on this and produce very inefficient code as a result. Single thread applications are much easier to develop, attempting to retro-fit multi-thread execution into an existing application is incredibly difficult.
Anyone else here watching to see if the channel gets recovered?
It will be back in 2 days dw
Wait that was one month ago!!!
There are some things that are very hard to parallelize and are at the mercy of single threaded performance like CAD. I would love one superfast core for things like this
I was surprised to read this--you'd think 3D applications would be *ideal* for multiprocessing. Correct me if I'm wrong, but from my five minute Google it sounds like the culprit is that all modern CAD still inherits from the 40-year-old base code of *AutoCAD*
@@GSBarlev Yeah, it is a little odd. It's kind of like how gaming is very single-core/single-threaded in most cases. While it would benefit from multicore, it would benefit even more from cache or GPU improvements.
@@GSBarlev probably the same reason enterprise motherboards often include serial ports. Legacy baby.
@@mrgumbook serial ports are needed a lot in a professional environment. A ton of older and newer devices use it, it's pretty much the standard for industrial machines.
@@GSBarlev I'm not an expert on the subject, but it's my understanding that a lot of CAD calculations require a previous answer before they can compute. Since you're waiting for something else to compute first, each step has to wait for that thread to finish producing the data. Fusion 360 does use multiple cores, depending on the task, but a lot of tasks make the program freeze if the model is too complex, because one thread is doing most of the work. I'm not sure how much more you can parallelize that kind of task. Blender works differently even if they might look similar.
If they were able to rewrite the code so the computations are more independent from each other, we might see a big improvement here. I think it would be less efficient in terms of total CPU time, but given how many cores we usually have available, I'd take the speed improvement over the efficiency.
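That dependency-chain point is the crux: if each step needs the previous step's result, extra cores can't overlap the work, whereas independent steps parallelize trivially. A minimal sketch of the difference (the `step` function and the numbers are made up purely for illustration).
```python
from multiprocessing import Pool

def step(x):
    # Stand-in for one expensive CAD-style computation.
    return x * x + 1

def sequential_chain(start, n):
    # Each step consumes the previous result: inherently serial,
    # so no number of cores can overlap these calls.
    value = start
    for _ in range(n):
        value = step(value)
    return value

def independent_batch(inputs):
    # Each step only needs its own input: trivially parallel.
    with Pool() as pool:
        return pool.map(step, inputs)

if __name__ == "__main__":
    print(sequential_chain(2, 5))        # must run one step at a time
    print(independent_batch(range(8)))   # spreads across all available cores
```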
RIP Techquickie 2012-2023
What I would like to see, since we're already going to heterogeneous core setups anyway, is a CPU design where you have multiple performance cores and then one extra super-performance core. It would work extremely well for games, since most CPU bottlenecks in games are single-thread bottlenecks. For example, in CS:GO, if you use Process Explorer to look at the individual threads, you'll notice that while multiple cores are in use, they aren't heavily used, but there will be one thread that uses a full core's worth of CPU time, and that is the thread everything else has to wait on. Whatever it's doing, it can't do it asynchronously. This is the case for pretty much all games, since individual functions often aren't broken into multiple threads.
With that in mind, what if they made one additional special core that had, say, twice the single-threaded performance, even if less efficient? Then most games would see improved performance. Some productivity apps would improve as well, such as Photoshop, which still has many single-threaded functions, as well as Word and Excel.
Yep, the E-cores are not very worthwhile. But since Intel is going all in, and AMD is going with different amounts of cache for some chiplets, the major OS makers are already hard at work making their schedulers work well with a heterogeneous core setup. This is why I feel it could be used in the other direction: a CPU with proper performance cores, and then one additional super-performance core where, without care for efficiency, they pull every trick in the book to maximize IPC and clock speed.
Some mobile SoCs already do this. They'll have, say, three "big" cores, and also one "prime" core. I'm not sure if they're actually different from the big cores in terms of microarchitecture, but are optimized and/or binned to run at higher clocks.
Some mobile SoCs will do a single extra high-performance core with additional cache and clock speed, though no other fundamental differences compared to the other high-performance cores. Those changes helped greatly, especially for web browsers where most scripting is still single-threaded. That change allowed them to catch up to Apple in browser benchmarks such as Mozilla Kraken 1.1, Octane V2, etc.
The mobile implementation has worked well, though I would like it to be taken further. For example, imagine taking a CPU like the Ryzen 7 7800X, making the chiplet larger, and adding a 9th core that has 4x more registers, goes from 3 loads and 2 stores per cycle to 6 loads and 4 stores, has double the pipeline width for the decode and op cache, and beefs up the other aspects as well. Sure, it would make for a large increase in power consumption and some diminishing returns and basically all of the stuff that really harms efficiency on x86.
For many years Apple did stuff like that with their smartphone SoCs, since many apps, and especially web browsers, benefited more from faster single-threaded performance than from multiple cores. Thus you would see Apple and Qualcomm using the same ARM version, but Apple would tweak it to improve single-threaded performance, ending up with 60-70% better performance but half the cores. While they would lose out on multithreaded tests, they would do significantly better on things like web browsers and web apps.
Overall, with a move like that, efficiency would not be the goal; it would be about improved gaming performance above all else.
There are a few comments in another thread that cover why that wouldn't work and it comes down to physics. Smaller cores are more power efficient so you get more speed with less power but we're bumping up against the limits of silicon both in terms of size (any smaller and tunneling becomes a problem) and thermals (any more power and the chips will fry themselves). Bigger cores would hit the thermal limit faster and we can't just add more functions to the cores since the programs we're trying to speed up wouldn't be able to take advantage of it and it would heat up faster.
@@grn1 That is certainly a risk if trying to do something like make a single core that has the tflops of a quad core CPU. Though what I was focusing more on was something more along the lines of what Apple did with the Arm SOCs to improve IPC during the times when faster single threaded performance benefitted them a lot more than having more cores. They made changes that resulted in larger cores, such as a dual core that took as much die space as a quad core from Qualcomm. In the case of a desktop CPU, something similar could be done for one core, and that will drastically improve the performance of games that have a single thread bottleneck such as CS:GO, Microsoft flight simulator 2020, all current e-sports titles, and many others.
i7-620M = 2c/4t. OK, this really checks out. Not 1 but 2 cores working hard and scheduled to the point where a CPU upgrade isn't needed at all. Wow, Clarkdale got it so right; it's no wonder my CPU usage is only very high when watching YouTube in 1080p60, meanwhile it's less than 40% on every game or 3D application I try on this thing. I can't wait for the quad-core upgrade to balance similar performance across all cores, with graphics that are more up to speed with it, not to mention FINALLY 1080p! Goodbye 900p TN panel, hello A12 9800 FULL HD goodness.
0:24 ...except modern USB is fast because it has multiple data streams and parallel ports were slow because they had one data stream. If only Linus had actually looked into how these protocols actually worked and what "parallel" and "serial" even meant.
What exactly is your take here? Would you have preferred if Linus compared PATA to SATA?
You forgot to mention one key reason that multicore is the way: increasing the frequency of a core is getting harder and harder (look at the time it took Intel to go from 300 MHz to 3 GHz, and to max turbo around 5 GHz today).
I'm laughing seeing the Tesla logo on this account
Well this is an interesting Thursday morning. All of LMG has been hacked, and the main LTT channel has been terminated because of it
0:38 I believe the reason for multiple cores is because we reached a limit with the technology, making it hard to increase the core speed. By having many cores we circumvent this technical limitation.
Yeah that was my thinking as well. I guess they were comparing multiple cores vs. running multiple threads on a single core. The video wasn't about only running 1 thread at a time. If we really only use 1 thread at a time then we run into the issue that we can no longer increase the frequency at which our CPU run. Or at least we can not increase it nearly as fast as we could by just adding more cores. It is a lot easier to build a CPU with 2 cores each at 4 GHz than 1 core at 8 GHz.
Yeah it seems weird to skip over the most obvious reason.
@@devluz @JacobPersico This is still kind of incorrect, as core improvements are constantly made. A single-core CPU from back then wouldn't stand up to a single core on any of today's CPUs; even by itself, a single core on the most recent Intel or AMD chip would blow any single core from then out of the water. It's not that we can't make better single cores, it's that it's impractical to do so, as multicore chips will always be faster than a single-core chip ever will be, even if that single core is extremely powerful.
Yes, but they forgot to mention that context switching is the most inefficient process, having to flush the cache with every switch, hence why SMT/HT was introduced, though if the CPU is suffocating, it doesn't help much.
I bought so much tsla crypto thanks linus
They stole my cat and ruined my vintage pokemon card collection, damn crypto burgles
...you got hacked.
@@Miss_ClaireHodl you will get 100 cat return on your 1 cat investment
thank you Tesla
I'm so glad Tesla is doing tech videos now
Yes, Elon was Linus this whole time.
Love this new Tesla Tech Tips channel... so much good info!
I only gave them 2 bitcoin... only an idiot would give them more
@@GregoryShtevensh I only gave them zero bitcoin, since I like my bitcoin.
@@TheBendixSA you have bitcoin?
It's cool to see Linus doing a TechQuickie, as much as I love Riley and James and all the other hosts on the LTT staff that do such a good job at it.
I thought this video was going to show us some niche product that actually does have a single, highly specialised core.
But this video glossed over the most obvious reason for multi-core processors. We can't make single cores go faster anymore. The scheduler has to divide processing time across all the threads that use it. If you want to work faster, you can either make that core go faster or add another core. We've more or less hit the speed limit so it's a lot more effective to add more cores.
Today is an interesting day for Linus
How the heck did they manage to hack them all
Malware with a Red Line stealer that runs hidden in the background
I think this would work if motherboards had external SoCs and schedulers that could tell the CPU to execute specific programs at specific times, letting CPU manufacturers focus purely on speed while the motherboard organized the data.
Why would it be external? Why add another SOC that doesn't understand the architecture of the CPU in the socket?
It makes more sense to build the scheduler onto the CPU's SoC, and that's exactly what they do. But at some point you need to prioritise scheduling at a thread/process level that the CPU can't understand. That's when it's handled by the OS, more specifically the kernel of the OS. It's handled in software so that it's flexible enough to meet the needs of the user. An external SoC wouldn't help with that either.
@Daniel Lupton If the CPU manufacturer put all its resources into making a super-fast core, that would put pressure on motherboard manufacturers to innovate on more advanced ways to organize the information. I don't know enough about the software side of things, so I might be wrong. Also, motherboards would have to have specific CPU sockets for different architectures; that might sound like a big task, but it's just moving around a few pins each generation. Like a QR code.
@@LeeVoolHM The motherboard doesn't organise anything. It just provides data lanes for the CPU to talk to the memory, GPU and I/O, and provides power to those components. The motherboard has no awareness of the data moving through it.
I don't think you know enough about the subject to provide insight that the manufacturers are missing.
@Daniel Lupton I was suggesting that the task of organizing data would have to be managed by the motherboard. I'm not saying it does or ever has; I'm saying it would be needed for this hypothetical situation, where the CPU has one purpose, executing as fast as possible, and the organization is handled by the motherboard.
@@LeeVoolHM in what world would 2 chips talking to each other work faster than 1 chip that does both tasks on the same die?
Not only would you have to deal with the latency of the communication bus, but you wouldn't be able to optimise anything at the silicon level.
You should learn about cache misses and branch misses and you'll understand why: the further data is from the CPU registers (L1, L2, L3 cache, memory, HDD, network, etc.), the longer each level takes to retrieve, often 10-100x longer. Going from a 99% cache hit rate to 98% can halve the speed of a processor. So having the data flow managed by an external chipset would ruin performance far more than any saving you get from splitting the task up.
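A simple average-memory-access-time model shows why a single point of hit rate matters so much; a minimal sketch with assumed latencies (round, illustrative numbers, not from any specific CPU).
```python
# Average memory access time: AMAT = hit_time + miss_rate * miss_penalty
# The latencies below are assumed round numbers for illustration.

HIT_TIME     = 4     # cycles for a cache hit
MISS_PENALTY = 200   # cycles to go all the way out to main memory

def amat(hit_rate):
    return HIT_TIME + (1.0 - hit_rate) * MISS_PENALTY

for rate in (0.99, 0.98):
    print(f"{rate:.0%} hit rate -> {amat(rate):.0f} cycles per access on average")
# 99% -> 6 cycles, 98% -> 8 cycles: one extra percent of misses already makes
# the average access ~33% slower; with a bigger miss penalty or lower hit
# time, the gap widens toward the 2x slowdown mentioned above.
```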
What I find fascinating is that you seem so insistent that this makes sense. There are so many reasons that it doesn't work, I don't even know where to begin. It's like the flat-earth theory. It's so wrong, it's actually hard to explain why without pretty much explaining all of science from scratch.
If These Guys Ever Recover Their Channel, I Already Know Linus Is Going On A Long Rant On How YouTube Addressed The Problem, How They Did Nothing, How There Needs To Be More Security, All That BS. Mark My Words.
Agreed.
Small thing, May I ask why you capitalized every word?
@@DeputyGamer it's a way how people type, don't worry abt it
I guess you could call this channel Teslaquickie
OMG, another channel hacked
What's happening?
It's strange, it looks like the hacker didn't delete the videos. I feel like the hacker knows Linus and chose not to delete them. I may be wrong.
No, he doesn't care about LTT, he just forgot. He hid all the videos on the other LTT channels.
It looks like Linus' servers were hacked in the first place. Someone there has to answer for clicking on a mail attachment.
This is why you don't use the same password for all your channels lol
They just need to hack one computer that has access to the rest of their network.
This channel getting hacked wasn't something I was expecting
One core to rule them all?
As someone who was in high school in the late '90s, I remember a time when computers couldn't multitask and it was painful. Want to click save on a document and then click exit? Nope, gotta wait for one thing to finish before you can do the next. Want to open more than one program at a time? Nope, not possible. It was one of the causes of "save anxiety" where you lose tons of work on something when it crashes because the machine couldn't multitask enough to save while you were doing things.
I was so happy in the XP days with a single-core CPU. Real multitasking was a fact then, and if some app was making my system crawl, I simply changed its priority and presto! The "intense" app kept working in the background and I could use my system smoothly doing other things.
Now with multicore, well, that simplicity is over. If you're lucky, all the cores work "equally" but most of the time, the first one receives the heavier use (generally while using not so up-to-date, but lighter, software).
What... My Amiga 500 multitasked, zero issues...
Going back to pentium
Return of the King
Making more cores is also a more efficient way to use the transistors. A modern core can execute around 4 instructions simultaneously, but to do that it needs to analyze the code in a very smart way, and either find instructions that can be executed independently of the others in flight, or juggle virtual registers and all sorts of very clever logic to keep the "four mouths" fed.
Making a CPU that can execute 8 or 16 simultaneous instructions, for example, would make this analysis logic gigantic, to the point where it would be much bigger than two or four cores that deliver pretty much the same performance.
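A rough model of why that analysis logic blows up: the dependency checks between instructions being issued together grow roughly quadratically with issue width, so doubling the width far more than doubles the scheduler hardware. The counts below are a toy estimate, not real gate counts.
```python
# Toy estimate: to issue `w` instructions per cycle, each candidate has to be
# checked against the others for register dependencies, so the number of
# cross-checks grows roughly as w * (w - 1), i.e. ~quadratically with width.

def dependency_checks(issue_width):
    return issue_width * (issue_width - 1)

for w in (4, 8, 16):
    print(f"issue width {w:>2}: ~{dependency_checks(w):>3} cross-checks per cycle")
# 4-wide: ~12, 8-wide: ~56, 16-wide: ~240. A single 16-wide core needs more
# than twice the checks of two separate 8-wide cores (2 * 56 = 112), and five
# times that of four 4-wide cores, for the same peak issue rate.
```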
It's probably the same People or guy that hacked Corridor Crew. They'll get everything back, what they need to worry about is keeping all sensitive data safe. At least the hacker is nice enough to not delete everything, maybe this is a way to tell Google they need to do something about their security.
Efficiency was a big selling point when multi-core chips came out. They were having trouble with one core at 5 GHz, and found it more efficient to have two cores at 2 GHz apiece.
Can’t wait to buy a 1000 core processor
After 24/48GB RAMs and modern CPUs it would be like 187 performance cores, 247 V-Cache cores and 548 efficiency cores plus 65 'glitter' cores. Oops, that is 1047, but who cares.
I recall that computers hanging or freezing was a problem until around 2005 when 4 core PCs became commonplace along with Windows versions that used multiple cores and could alert you to failed services or drivers rather than just lockup. I dreaded phone calls from people who say “it’s frozen again”.
Wow, thanks Tesla for informing me on cores!
PhD student in computer architecture here. Yes, one monolithic core *could* still be very capable, but is unlikely to be, given conventional ISAs (x86, ARM, RISC-V). A VLIW architecture (RIP Itanium) could remedy a lot of the challenges posed to a single monolithic core by single/small-issue-width ISAs and programming models. Another good point to keep in mind is that 2-way SMT is not the limit; even the ATOM (yes, ATOM) cores in the Knights Landing Xeon Phi had 4-way SMT. The point about die yield is extremely compelling, though.
Also, individual cores don't have their own ports to memory; they have their own ports to the network-on-chip. NoCs used to be buses, but these days, with rising core counts and their logical and physical implications, they've moved to more scalable interconnects (parasitic capacitances render long buses infeasible when transistors switch at modern speeds, except in cryogenic computing). The NoC has direct access to memory and to L3 cache, but each core having direct access to memory would be a nightmare from the perspective of cache coherence and memory consistency.
No way Linus got tech tipped
I saw a recent LGR video where he ran a 20-year-old copy of Bryce 3D on a period computer and on a modern Threadripper. It was only a bit faster on the modern machine, not many orders of magnitude faster as you might expect. When he opened Task Manager, you could see one core churning at 100% and the rest unused, because the software wasn't optimized for multiple cores.
This is some weird content Tesla is putting out, especially with this Linus guy!
As a single core goes up in power, its heat increases. One of the big reasons manufacturers went multi-core was that we were only a couple of generations away from a single core generating as much heat as a household iron; with each generation, heat was starting to go up exponentially. It was simply not possible to have, say, a 16 GHz CPU (which is only 4×4 GHz), especially air-cooled. Probably an even bigger issue with laptops. My G14 is 24 GHz (8×3).
Damn now Linus got Tesla’d 🤣🤣🤣….nah but seriously I hope y’all get your channel back.
here we go this channel also got hacked
why tf the scam livestream is still live on multiple multi-million subscriber count channels? holy shit youtube...
I hope someone back all the videos up
So weird seeing this after watching him do it live. (also, why are the Techquickie uploads so delayed on Floatplane?)
thanks linus i really needed to learn about tesla
How am I going to get my tech tips now?
When I overclocked my 3.2 GHz chip to 5 GHz so many years ago, I expected we would be up to 8 cores and 20 GHz by now... underestimated the number of cores and overestimated the speeds, which have been dropping ever since.
Linus got hacked
Is this his main channel?
@@blaat44
LTT got hacked, but it got taken down by YouTube after an hour. TQ and the other LMG channels are currently also hacked. Someone with access to multiple accounts clicked on a shady link or downloaded a shady program, I guess.
@@perpetualcollapse I have also heard hackers use Chrome extensions to steal all your passwords when using Chrome.
@@perpetualcollapse It's kinda weird though; it's just 8:30 AM there (LTT was hacked when probably no one was working at LMG yet), so I guess just brute force, but we will see.
Let's pray that Linus gets his channel back.
Another huge concern with "one huge core" paradigms is that the sorts of tasks that demand highly performant single threads are better served by simple, sub-core hardware acceleration, such as hardware encoders, decoders, tensor processing units, floating-point units, RT cores, etc.
Generally the issue is simple; when you need massive grunt for a single thread or single core, you generally need a lot of grunt for a single task. And often a single task is super easy to just build bespoke hardware for.