For a VP of Technology at "ARM" this was an extremely low-tech explanation. But then all of my extra parallel processing is done on an NVIDIA card. A CPU today has eight or more cores. I hope the rest of this series has some more meat on the bone. We are not morons out here.
Our professor on star clusters told us GPUs are far more effective at calculating cluster evolution. Since it is not possible to analytically derive the evolution of more than two bodies, it has to be done numerically. 'It's all just about data crunching', he said.
I do 3D modelling recreationally and I hate working with triangles. I prefer quads because those types of polygons can be structured into bands. With those bands, a looping structure can select/manipulate multiple polygons at once. For example, imagine designing a human model. You can run bands of quads to form cylinder parts like arms, neck etc. From there, you can manipulate entire bands at once instead of going polygon after polygon.
+Embeh It has to do with how the command is structured. You send a command to your CPU like "multiply 3 5", "multiply 1 2", ..... When you send it to your GPU, you can send "multiply 3 5 1 2 ...." and it will run the multiplication loop for all the values in that one request. A CPU is built so it can do many different calculations. A GPU is built so it can do the same type of calculation over and over again, saving a lot of time.
+Embeh Not really. A typical CPU core has an ALU (arithmetic logic unit: the part that does the calculations) and a control unit which tells the ALU what to do. In a GPU core, there is only one control unit per dozens of ALUs, which makes them all do the exact same calculations, but on different data. (SIMD)
+TheKlaMike I believe it's because a CPU can do very long and complicated computations containing a lot of steps as its cores are larger and more complex, e.g. 'add this to this, then multiply that by this, then take this, square root it, then divide it by the previous number, then copy it to this...' etc. But when dealing with simple calculations it can only do a few at a time as CPUs tend to only have a few cores. But each core in a GPU is quite small and simple and specialized for graphical calculations rather than general purpose computing. Hundreds or thousands of them can fit on a single chip, so while each core would need to take many small steps to perform a complex calculation that a CPU core could do in one step, it can do thousands of simple, specialized steps at once.
+shwetank verma CPU has a higher clock-speed, and a pipeline where multiple operations may be done in a single clock-cycle if they don't depend on each others' registers (they are basically decoded and rammed into the pipeline, as the other end divides them up and executes them). each core will typically also have multiple ALUs available for parallel integer arithmetic, ... normally the core count is small, with lots of resources invested into making each core fast. GPU tends to have a larger number of simpler (slower) cores each with vector units, and running at a lower clock-speed. each core isn't all that fast on its own, and deals rather poorly with things like branching, and can usually only do one operation per clock cycle, .... however, if your operation consists of large amounts of parallel operations, it can be spread across a large number of cores, with each doing their own work on their own data. instruction sets are also a bit different. for example, the x86 ISA uses operations that allow doing things like loading/storing memory and performing arithmetic at the same time. this can save clock-cycles. the contrast is a load/store model, where load/store operations and arithmetic operations are handled separately, which combined with being limited to a single operation per clock cycle, additionally limits how fast stuff can get done.
+TheKlaMike Well, I suppose you mean faster. They are equally good in the sense that they produce the same result, and a GPU will produce it using less energy. A CPU chops the calculation up into several serial stages. That lets each stage in the sequence be done really fast, which makes a high clock speed possible; the number of stages is usually around 12-25. A GPU has fewer serial stages, typically 3-10. Ironically this means that a CPU really does not do one single calculation faster. If both just perform one calculation and that is it, CPU and GPU are about equally fast. Where a CPU shines is when it needs to do calculations in sequence. If a calculation depends on the result of a prior calculation, the higher clock speed makes the results pop out faster in sequence.. basically raw clock speed. But this is not really what a CPU is made for. A CPU is made for calculating a number, then making a comparison, then taking a decision. A CPU is not only faster at this; a GPU is not even really capable of doing this at all.
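To make the "one command, many operands" idea from the earlier reply concrete, here is a minimal C++ sketch (the function names are made up for illustration, not any real driver API): a scalar path that issues one multiply at a time versus a batched path that applies the same operation across a whole array, which is the shape of work a GPU is built for.

```cpp
#include <vector>
#include <cstdio>

// Scalar thinking: one "multiply a b" request at a time.
int multiply_one(int a, int b) { return a * b; }

// Batched thinking: one request, the same operation over all pairs.
// (Illustrative only; a real GPU driver call looks nothing like this.)
std::vector<int> multiply_batch(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out(a.size());
    for (size_t i = 0; i < a.size(); ++i)   // on a GPU, these iterations run in parallel
        out[i] = a[i] * b[i];
    return out;
}

int main() {
    std::vector<int> a{3, 1, 7, 2}, b{5, 2, 4, 9};
    auto c = multiply_batch(a, b);          // one "command", four multiplications
    for (int v : c) std::printf("%d ", v);  // prints: 15 2 28 18
}
```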
2:30 Almost all 3D engines have the camera static and they move everything around the camera. I really feel like this video only gave surface level differences and didn't talk about the true hardware differences :/
+KingFredrickVI I feel like that's actually a myth. The matrix comes out exactly the same when you do it either way, so it's really just a matter of your point of view.
+Dolkarr You're right. Am I moving the camera - and then working out where the points are in relation to the camera - or am I moving the scene - working out where the camera is in relation to the points? Same difference. Am I adding 3? Or am I subtracting minus 3? Or am I subtracting the subtraction of positive 3 from zero, from zero? Relativity. It's really all the same thing.
+Ahmed M.AbdElMoteleb (SAIKO) The topic of culling isn't relevant to the discussion about whether the objects are transformed or the camera is transformed. Culling is the process of deciding which objects are invisible, so that you do not have to fetch, transform, rasterize and shade them when rendering the scene from the camera.
+KingFredrickVI This is absolutely true. It's to deal with the problem of floating-point inaccuracies at large scales, and it's called the floating origin method. Any game over a certain seamless world size pretty much has to do it this way. Otherwise, as the player gets farther and farther away from 0,0,0 in world coordinates, objects in the world will start to move away from their positions as the amount of precision you can use decreases over distance, and they basically snap to the closest possible precision coordinate they can use.

So, if you want to simulate say an entire universe or galaxy or a solar system, like Celestia, or Elite, or KSP or whatever, or an entire real-sized planet or even just a largish region, you have to use floating origin, where the world moves around the player, and keep track of the player location and object locations in a separate, more precise manner (usually a large fixed-point number), and translate between positions in this overall world/universe model and positions in the game view model, using scaling tricks and whatnot to make far-away objects appear far away even though they're actually relatively close by.

The problem with making an engine with higher-precision numbers for the vector locations is that the more precise you make the numbers, the harder it is on the processor and the slower the math is performed, until you get to the point that you simply can't simulate a large world without keeping the number of objects which calculations need to be performed on to an absolute minimum. I know that for single-precision floating-point numbers, things start getting screwy over about 10000 units away from the origin. A double gets significantly more precision, but still won't work very well on scales much above a solar system.

Usually, rather than constantly moving the world around the player, which would be heavy on the processing power since you have to individually relocate each object in the coordinate system, you set a boundary where the player moves around in the coordinates, but when they reach a certain distance from the origin, the player snaps back to the origin and everything in the world moves with them.
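A minimal sketch of the floating-origin recentring described above (all names hypothetical; real engines do this inside their scene management): when the player drifts too far from the origin, shift the whole world back by the player's offset so single-precision coordinates stay small.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Recenter the world on the player once the player gets too far from (0,0,0),
// so 32-bit float precision is never stretched over huge coordinates.
void recenter_if_needed(Vec3& player, std::vector<Vec3>& world, float threshold = 10000.0f) {
    float dist = std::sqrt(player.x * player.x + player.y * player.y + player.z * player.z);
    if (dist < threshold) return;

    Vec3 shift = player;                    // amount everything moves back by
    for (Vec3& p : world) {                 // translate every object the opposite way
        p.x -= shift.x; p.y -= shift.y; p.z -= shift.z;
    }
    player = {0.0f, 0.0f, 0.0f};            // player snaps back to the origin
    // A real engine would also add 'shift' to a high-precision world-space
    // offset (e.g. 64-bit or fixed point) so absolute positions aren't lost.
}
```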
GPU --> graphics rendering, independently of the CPU
CPU --> takes user input and gives it to the GPU to draw the picture on the screen..
That's cool.. What is "vlar" and "foo" by the way?
I'm using a Tesla GPU at my university for accelerating a certain physics simulation. Works quite well, I'm getting a speedup of around 10, compared to normal CPU processing.
Graphics are just one possible application for GPUs, but remember that GPUs are designed as better number crunchers than CPUs. Possible uses are password cracking, calculating pi, chaos theory, stock market prediction or anything that requires highly parallel computing. The guy being interviewed seemed to be focused only on graphics work, which is only the tip of the iceberg.
With chaos theory, is it fair to say machines will one day be alive by exploiting that field? Look at insects like the bedbug: they are so small, yet alive. Could the principle of life be all about neurons, or atoms, arranged to achieve chaos in oscillation, with signals of data in the respective body? Putin once said the first to build AI will rule the world. What he didn't figure out was that once it has been understood/mastered by anyone, that person will rule the AI; therefore the first one to understand or figure out AI will actually rule the world. It's going to be easy to manipulate AI even while you don't have those expensive servers. Just access to AI will be more than enough, and with the Internet of Things you can imagine how powerful a one-man army will look.
+DevilsCookies Not sure what program you're talking about, but some rendering techniques don't work that well on GPUs. They suck at recursion and branching, which are key components of ray-tracing. Not to say that GPUs can't do recursion or branching. They can. They're just much slower at it than the CPU is. Enough so that when writing code for the GPU, you want to avoid branching whenever possible. If you're rendering a sphere in a ray-tracer, you need an if() statement in the intersection test to check if the quadratic formula has a negative discriminant, which indicates a miss.
C4D = Cinema 4D. And well, I thought 3D animations are rendered faster by a GPU, because it's graphics you are rendering, and the GPU is a graphics processing unit.
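For anyone curious what that "negative discriminant means a miss" check looks like, here is a small ray-sphere intersection sketch in standard textbook form (not tied to any particular renderer):

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
};

// Returns true and the nearest hit distance t if the ray (origin + t*dir) hits the sphere.
// dir is assumed to be normalised, so the quadratic's 'a' term is 1.
bool hit_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius, float& t) {
    Vec3 oc = origin - center;
    float b = 2.0f * oc.dot(dir);
    float c = oc.dot(oc) - radius * radius;
    float discriminant = b * b - 4.0f * c;
    if (discriminant < 0.0f) return false;      // the branch the comment mentions: a miss
    t = (-b - std::sqrt(discriminant)) / 2.0f;  // nearest of the two roots
    return true;
}
```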
Recently I made the mistake of trying to produce an entire 3D graphics engine from scratch in pjs, all starting with the following function: var raster = function(x, y, z) { return [x/z, y/z] }; This function allows me to take any point (x, y, z) and map it to the screen. It is the most vital, core component of the entire program. I eventually added cameraPos to the function so I could move anywhere. But there was one fatal problem: I could not rotate anything. I attempted to produce a function that took in 4 vectors (newx, newy, newz) and (x, y, z) to produce a new imaginary grid that treats newx, newy, newz like x, y, z and transforms the point. I then tried to produce a function that generates the 3 vectors the previous function calls for based on pitch, yaw, and roll, but I have yet to get it working.
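On the rotation problem in the comment above: the usual approach is to build a rotation from yaw and pitch and apply it to each camera-relative point before the x/z, y/z perspective divide. A minimal sketch in C++ (the original was ProcessingJS, but the math is identical; the function names are made up):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Rotate a camera-relative point by yaw (around the Y axis) then pitch (around X).
Vec3 rotate(Vec3 p, float yaw, float pitch) {
    // yaw: spin around the vertical axis
    float x1 =  p.x * std::cos(yaw) + p.z * std::sin(yaw);
    float z1 = -p.x * std::sin(yaw) + p.z * std::cos(yaw);
    // pitch: tilt around the horizontal axis
    float y2 =  p.y * std::cos(pitch) - z1 * std::sin(pitch);
    float z2 =  p.y * std::sin(pitch) + z1 * std::cos(pitch);
    return {x1, y2, z2};
}

// Same idea as the commenter's raster(): project onto the screen plane.
// Points with z <= 0 are behind the camera and should be clipped first.
void project(Vec3 p, float& sx, float& sy) {
    sx = p.x / p.z;
    sy = p.y / p.z;
}
```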
+Dylan Cannisi CUDA, the best thing for lots of things... It helps gamers, but it also helps high performance computing people. The stuff you need to model the Big Bang, is the same as what you need to play Quake at 8k and 200 fps.
+Kneedragon1962 CUDA is a programming API for C/C++, Fortran, etc. You wouldn't use it to program graphics. It stands for Compute Unified Device Architecture. It unlocks the ability to do mathematics on vectors and other small computations. I'm just saying what I use it for: it allows me to build complicated models, mostly SVMs, that I wouldn't be able to build on a CPU because it would take too long.
Dylan Cannisi CUDA is a language and a suite of tools that allow you to do programming on the video hardware, which is not usually accessible to the programmer for general-purpose computing. It provides access to hardware that is not normally used this way or accessible to a general software programmer. And it provides some tools for very wide parallelisation of code. Many programming languages support parallel programming, multiple threads for example, but they anticipate a fairly small number of threads. CUDA supports a whole different view of this. I was at TAFE in '95 when nVidia started talking about it, and the Linux people were starting to use WolfPack to do distributed processing. We have a blanket term for this stuff today, we call it cloud computing...
*+Dylan Cannisi* Yes, Nvidia did an amazing job with CUDA. It's unfortunate that it's not a hardware agnostic standard though. Doesn't seem right to have vendor lock-in for something of this nature.
Adrian Right back at day one, nVidia tried very hard to sell the concept and make it an industry standard, they were shouting it from the rooftops while I was at college, but the various other manufacturers were lukewarm, and they all wrote their own proprietary versions, which all faded into the past like Flash player...
that feeling when you're explaining to people that it's a $1000 card that just draws virtual triangles. I guess they're pretty good at adding and subtracting floats and doubles, but the triangle drawing is where the action is.
Most modern display devices provide a secondary display output for using a second monitor. But how do you display the content of the secondary linear framebuffer on the secondary monitor, using a different resolution than the primary monitor, for a self-made boot manager (no display driver loaded), with and without a UEFI BIOS?
Hi sir.. can you help me...... I have an AMD A10 7800 (R7) processor and an AMD Radeon 2 GB graphics card, model HD 6450..... So can you tell me whether they will both work together and give the best performance, or is the graphics card useless with this processor.....?
I'm not sure what you mean by a triangle always being flat, since the four sides of a pyramid are made of triangles and they aren't on one planar surface? The bottom of course isn't a triangle.
If you divide a rectangle or square in half diagonally, you get 2 triangles. Do that in certain ways for other shapes, and you will still get triangles. As for them always being flat, it means the three vertices are always on the same plane. A triangle has the smallest number of sides possible for a 2D figure, so it will always lie in a single plane. For a rectangle, 3 points or vertices could be on one plane, but the fourth one could be off that plane, making the shape non-coplanar.
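That "the fourth vertex can be off the plane" point can be tested with a scalar triple product: three edge vectors from one corner are coplanar exactly when their triple product is zero. A small sketch (the tolerance value is arbitrary):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(Vec3 a, Vec3 b) { return {a.y * b.z - a.z * b.y,
                                     a.z * b.x - a.x * b.z,
                                     a.x * b.y - a.y * b.x}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// A triangle (3 points) is always planar; a quad (4 points) only is when the
// scalar triple product of its edge vectors is (near) zero.
bool quad_is_planar(Vec3 a, Vec3 b, Vec3 c, Vec3 d, double eps = 1e-9) {
    double volume = dot(sub(b, a), cross(sub(c, a), sub(d, a)));
    return std::fabs(volume) < eps;
}
```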
Isn't the big difference the architecture between those (I kinda forgot, but in school they teach something like this), named after the architects who made them? I know one was cheap to make because it could not take and give information at the same time (that should be the CPU), and the GPU could do it, but it was much more expensive to produce. Harvard and von Neumann, I think: the slower one is von Neumann, the expensive one is Harvard.
It is very simple: a CPU is designed to handle any computational algorithm you can throw at it, provided the computational capability of the CPU is adequate to do so, e.g. the number of cores and the processing frequency (overall speed) of the cores, as well as their general features and capabilities, which will usually far outweigh the ability of a single GPU. A GPU is a processor with a differing architecture to that of a CPU because of its purpose: it is designed for a specific type of processing, e.g. to compose, process and output a video or image you can view on, for instance, a monitor. However, the one thing to bear in mind is that a GPU cannot perform its duties without the aid of a host processor; in other words, it cannot function without a CPU running alongside it, which is why when you build your own PC, you usually need to install a CPU AND a graphics card too. For the GPU to function properly as intended, it requires the central processing unit (CPU) to tell it how to function in the first place. Without this set-up, your GPU, or graphics card, would be pretty much a novelty door stop, and pretty much useless on its own . . .😀😀
The other thing you can do extremely well on graphics hardware is model a scene in 3D with a large number of units, like sugar cubes. Astrophysicists want to model the universe after the Big Bang, or look at the growth of galaxies, or model dark matter. Meteorologists want to predict the weather 10 days out. Climate people want to know about global warming. Economists want to anticipate world trade next year. Boeing want a more efficient wing-tip shape on the next airliner. The navy want a submarine which is 600 ft long and capable of travelling at over 30 mph while making almost no sound. The way a graphics processor treats a scene, as a series of data points which are pixels, works extremely well when you try to model the behaviour of other systems, like the ones I listed. This is what is behind CUDA and other efforts to use graphics hardware. It's very good at doing the stuff you need for large-scale simulations.
I thought that triangles were flat only in Euclidean space? Just as 2+2 is only 4 in decimal.
This guy is wrong. The GPU doesn't handle the camera. Cameras are something high-level and engine-dependent. What really happens at the low level is that the "cam" is fixed in place and always facing the same direction. You just move your OpenGL objects around collectively to create the sense of a moving camera.
Maybe I missed a thing. But if a GPU does "3D" and a CPU does "1D" (or 1D over 4-8 or so threads), why haven't CPUs become GPUs? Is this really a firmware issue?
No. It's because the CPU doesn't have a few thousand cores, so while the GPU can do a whole lot of work all at the same time, the CPU does 1 (or two, or 8, or 20, or maybe 80 if you're lucky and get a ton of money) at a time.
A good analogy I came up with is guns. Say a bullet is a task and a gun is a processor. This way: a CPU is obviously a Gatling gun. It offers fast shooting and fast switching by using instruction-level parallelism, e.g. each barrel is at a different stage of loading a bullet. A GPU is like a mortar. It shoots slower, but hits multiple targets at once and is more efficient against special targets (e.g. tanks, bunkers, vector and matrix math). An FPGA is like a gun with a PVC-pipe barrel. You can build it so that it hits a single target as efficiently as possible, but rebuilding it for another target takes a lot of time. An ASIC is like an FPGA, but not rebuildable. It is even more efficient at shooting the poor bastard task the manufacturer built it against, but absolutely useless against any other.
+Henri Hänninen Well... if we're talking about a GPU, the river analogy works quite well. But a CPU.. well no. It's more like a water pump... well... hmm.. the water analogy doesn't work for a CPU... the whole point is that it does a bunch of different things, not only calculations but also decisions.
Technically correct. The way multitasking was faked before was by using the cycles of a sole processor more efficiently. Remember, most applications are waiting for the user to do something, so with the extra processing power of a lone processor, music could be played. The only way to seriously do two things at once in the same clock cycle was to add cores. So sure enough, my phone has six cores now. Smartphones now also have OpenCL-compatible GPUs, which means a computation-heavy task could be sent to the GPU to complete. It seems, though, that ZERO applications in the app store utilize this ability.
+Marcus Godiali Modern CPUs are in fact capable of parallel computing. americanswan mentioned this, but I'm sure you've heard of CPUs with multiple cores; they can send instructions to each of the cores, which is effectively parallel computing. On top of that, each core (when speaking of Intel CPUs) generally has 2 ALUs (arithmetic and logic units) that can be operated at the same time if needed and the conditions are appropriate.
Vaes Joren Though what you said is true, I don't think you can call multithreading parallel, since it's only doing one task; it's more scheduled than parallel. Though, like tothfirytoob said, multiple cores are multithreading.
The Intel Core 2 architecture starts with 4 integer pipelines for executing 4 integer instructions simultaneously, if the instructions are pairable and there are no dependencies between them.
But though a GPU has a lot more processing cores, called shader cores (vertex shaders and fragment shaders), their clock speed is much slower than a CPU's. So overall a GPU isn't fast enough for running an OS, programs and other stuff.
They currently make separate neural cores (chips), but I guess those are more for running the AI (an already-trained model) rather than training it (ML). If I am not wrong, those chips are already in many new devices. But for training, I guess the best is still the GPU itself.
One thing worth mentioning: GPUs actually do 4-dimensional matrix calculations rather than 3-dimensional ones, because with 3 dimensions, rotation and magnification require matrix multiplication while translation requires matrix addition. By adding an extra dimension, the GPU is able to unify all three key transformations under a single matrix multiplication.
I was scrolling down just hoping somebody would mention this!
Yes... affine transformations
@@abhishekgy38 homogeneous transformations, not affine
That's actually talked about in one of the Triangles and Pixels videos (playlist linked in description).
Yes, he mentioned this with transparency.
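To make the homogeneous-coordinate point at the top of this thread concrete: with a 4x4 matrix, rotation, scaling and translation all become one multiplication of the matrix with (x, y, z, 1). A minimal sketch:

```cpp
#include <cstdio>

// Multiply a 4x4 transform by a homogeneous point (x, y, z, 1).
void transform(const float m[4][4], const float in[4], float out[4]) {
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            out[row] += m[row][col] * in[col];
    }
}

int main() {
    // Scale by 2 and translate by (5, 0, 0) in a single matrix;
    // with plain 3x3 matrices the translation would need a separate addition.
    float m[4][4] = {
        {2, 0, 0, 5},
        {0, 2, 0, 0},
        {0, 0, 2, 0},
        {0, 0, 0, 1},
    };
    float p[4] = {1, 1, 1, 1}, q[4];
    transform(m, p, q);
    std::printf("%g %g %g\n", q[0], q[1], q[2]);   // prints: 7 2 2
}
```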
CPU - 10 Ph.D guys sitting in a room trying to solve one super hard problem.
GPU - 1000 preschoolers drawing between lines.
At the end of the day GPUs are taking over though.. they use GPUs for computations nowadays anyway
@@evenprime1658 Nothing is taking over anything. GPU is also a CPU, CPU is also a GPU. Now let's enjoy TV.
Explained in the simplest form
@@DS-Pakaemon Isn't a GPU a subset of a CPU?
@@DS-Pakaemon they are both PUs.
"you've never seen a triangle that isn't flat."
I just came from the non-euclidean geometry video.
A triangle made of straight lines...
@@Street_Cyberman Triangles in non-euclidean geometry are made of straight lines.
You've still only seen a flat representation.
Lol
@@IronicHavoc Nah, it was drawn on an anti-sphere, so it wasn't flat... it was curved... and its angles were all 90 degrees.
Here’s a key acronym to remember about GPUs: “SIMD”. That‘s “Single-Instruction, Multiple-Data”. It has to do with the fact that a GPU can operate on a hundred or a thousand vertices or pixels at once in parallel, but it has to perform exactly the same calculation on all of them.
Whereas a single CPU core can be described as “SISD” -- “Single-Instruction, Single-Data”. With multiple CPU cores, you get “MIMD” -- “Multiple-Instruction, Multiple-Data”, where each instruction sequence can be doing entirely different things to different data. Or in other words, multithreading.
So even with all their massive parallelism, GPUs are still effectively single-threaded.
Don't modern CPUs have SIMD instructions like AVX though?
Yes, but the vectors that your typical present-day CPU operates on are short ones, with something like 4 or 8 elements at most.
Thank you, best explanation I've seen
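As an illustration of the width difference mentioned above, here is roughly what an 8-wide CPU SIMD multiply looks like with AVX intrinsics (x86 with AVX assumed; a GPU applies the same idea across hundreds or thousands of lanes at once):

```cpp
#include <immintrin.h>

// Multiply two arrays element-wise, 8 floats per instruction.
// n is assumed to be a multiple of 8 to keep the sketch short.
void mul8(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // load 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_mul_ps(va, vb);    // one instruction, 8 multiplies
        _mm256_storeu_ps(c + i, vc);
    }
}
```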
I am an autodidact, and I believe in the "first principles" of things...
I want to understand this intuitively...
Is there any video that can properly put into perspective all these things that you say, in such a way a little child can understand?
@@lawrencedoliveiro9104 I am an autodidact, and I believe in the "first principles" of things...
I want to understand this intuitively...
Is there any video that can properly put into perspective all these things that you say, in such a way a little child can understand?....
How do these vectors work?
How are they represented, and how can all of this not look abstract? Human beings made these things, so if someone has done it, then it's no longer abstract....I want to intuitively understand all of this...
It shouldn't have to be inside the four walls of a school; I believe if a man seeks he will find...I want to know these things...thus my everyday search on all these topics...
I just intuitively understood how binary numbers can be used to represent any other number, but how does that translate to videos? Audio and everything we see digitally?..How do just 0 and 1 define everything we experience that seems completely removed from 1 and 0?
He missed one of the most important things: why they are more efficient.
Running a GPU load in parallel doesn't make it more power efficient by itself. It also doesn't use less die surface.
What is done is that the GPU just has one very narrow set of instructions that are always the same width and length. Typically it can be a 128-bit by 128-bit instruction that can run on 2x64 bit, 4x32 bit or 8x16 bit. And it can only run a very limited number of instructions.
Also, there are no branches, and no branch prediction. This cuts away a load of pipes. In a GPU every pipeline does, say, 128 bits of work every clock regardless of the instruction. If you only need 16 by 16 bits, tough luck, you still have to fill up the whole 128 by 128 bit register.
Also, a CPU has an FPU, an integer unit, logic, and SIMD pipes. And every pipe has duplicates for branch prediction. Some have 4-way branch prediction, so they have 4 pipelines for one instruction. So in a normal CPU you might have 4 pipes for integer, 4 for floating point, 4 for SIMD and 4 for general logic (often combined with integer). A total of 16 pipes just to calculate one value. Why? Well, to get the speed up (the clock speed, that is) and to keep branch prediction errors down.
A GPU just has one type of pipeline (in the older days there were two different ones, one for transform and one for rendering, but in most modern ones they are integrated into the same pipeline). Also, there is one instruction for every 128- or 256-bit set of data. That is, if you use 32-bit values on a 128-bit pipe you get a 1:4 instruction reduction... because most data uses the same instruction.
Typical 128-bit data can be an HDR rendering scene where two colours are mixed, one RGBA mixed with another RGBA. Instead of running first the R, then the G, then the B and so on, they just put the whole load into the pipe and calculate all the data with the same instruction. Almost everything in graphics comes in sets of four. If there is a polygon (well, a triangle), it has 3 corners and then an additional value for the polygon, so it still has 4 values.
Some graphics loads use half precision. Then you can do two pixel or two polygon calculations at the same time in the same pipe. Actually, CPUs have had that capability since the Pentium 3, but they still just have one output per core (even if it's a 128-bit SIMD output; most modern CPUs even have 256-bit SIMD).
The other part is the lack of branching. This removes a ton of problems for the chip designer. Firstly you don't need any branch prediction. Secondly you don't need any branching instructions and pipes, removing the whole logic pipe. Though you can still make branch-like calculations by multiplying with a matrix that gives a fixed 0 or 1 value in the resulting matrix, the GPU can never make any decision about it.
This is also the main drawback of the GPU: it will continually calculate the same set of work orders over and over, and it will only calculate the given work orders. It can't create work orders. This makes the load very predictable; the GPU knows what it will do tens of thousands of clocks beforehand, which helps parallelism very much. But it can't create an instruction.
So the CPU still has to create the instructions for the GPU, but the load on the CPU can be very low. For example, the CPU can tell the GPU to render a tree of data. The tree in turn is a given list of objects, which have a given list of sub-objects, which have a given list of vertices, which have a given list of textures and so on. This way the CPU can give the GPU a very limited amount of information to generate quite a lot of work. This has not always been the case: GPUs prior to 2000 had to get specific lists of textures and vertices directly from the CPU, giving the CPU quite a lot of work.
The problem nowadays is that game devs want the world to be "living". They therefore want as little data as possible to be pre-fixed into a set tree of objects. So the CPU can end up feeding the GPU with thousands and thousands of objects every frame. In DX11 this is a problem, because in DX11 just one CPU core can do the GPU feeding. Someone just never thought this would be a problem; it has been in DX since the first version. Finally in DX12 it will be fixed.
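A minimal sketch of the "branch-like calculation without an actual branch" trick described in the comment above: instead of an if/else, compute a 0-or-1 mask and blend both candidate results with it, so every lane runs the same instructions (illustrative C++, not any specific shader language):

```cpp
// Branchy version (what a CPU would happily do):
float pick_cpu(float x, float a, float b) {
    if (x > 0.0f) return a; else return b;
}

// Branch-free version (what shader-style code tends to look like):
// every "thread" runs the same arithmetic; the mask decides the result.
float pick_gpu(float x, float a, float b) {
    float mask = (x > 0.0f) ? 1.0f : 0.0f;   // comparison yields 0 or 1, no jump needed
    return mask * a + (1.0f - mask) * b;     // both sides are computed, one is kept
}
```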
+matsv201 Modern GPUs do have branching, albeit very primitive, but even that's getting better due to AMD's non-synchronized shaders, which allow for much better efficiency and forego some inherent weaknesses of SIMD. I've read some modern architectures are even considering adopting branch prediction quite soon.
+matsv201 thanks for putting that much effort into a youtube comments section :O
still, do you think that much detail would've made a general explanation of GPU vs. CPU easier to understand? :D
QuantumFluxable Well, I had little to do.. but no problem, stop reading when you get bored.
You wrote so much that I assume you are right
Henrix98 Yeah, that's the tactic I usually go for :)... actually I'm more likely to be wrong the more I write....
On the other side... people that don't understand how things work usually oversimplify them... It's really annoying when someone forces you to explain something complicated in a simple way
2:45 - Nearly bringing up the 'w' coordinate and quick fix by baptizing it "transparency" :P
This is one of the better simplified explanations of 3D graphics I've seen. Particularly the part about why triangles are used.
It's amazing that he didn't explain SIMD or vector computing, because that's exactly what the difference is, and it's the real answer to why CPU and GPU are different. Instead of looping over an expression of 256 array elements, you create huge registers that are a gang of 256 floating-point elements. A CPU would have 256 threads progressing at different rates on each array element. A GPU would force all 256 items to be on the same instruction because it's working on these huge registers. This is why you can't run all CPU algorithms efficiently on a GPU. I.e.: for each i in [0,256]: c[i] = a[i] * b[i] ... is CPU thinking. Each thread progresses at its own rate. GPU thinking is: float32_times256 c, b, a; c = b * a; .... where c = b * a is one instruction, with three huge operands.
I understand all of that
Holy Cow! I've been trying to understand the fundamental difference by googling or watching content like this video and your single comment taught me more than anything. Thank you!
Modern compilers can achieve this on CPUs as well. It only works for independent operations, though.
Awesome
Wobbly Computerphile camera XDDD nice one hahaha.
Wibbly-wobbly camery-wamery
Stuff
shut your face! Don't forget Things.
Yes xD
Feels like watching the office
I didn’t understand half of what he was talking about but I could listen to him all day.
1:27 That's a suboptimal way of tessellating circles. All the triangles meeting in the center will increase the chance of visible artifacts occurring.
(UA-cam smartass answer mitigation disclaimer: I know it's not a circle and I saw that the triangles don't meet. Leaving a smaller copy of the same polygon for the center doesn't solve the problem though, does it?)
Must I put 'for illustration only' on every animation I do? :) >Sean
***** Alright, fair point. But it still gives people wrong ideas. I had to learn this the hard way. It would be nice if future generations of fools who tessellate their own polygons were unknowingly led to have a better intuition about this. But I guess it's just much easier to do it the way you did, which, again, is a fair point.
+Penny Lane There is also a point where, if you had to learn stuff the hard way, you probably would be the one teaching those who learned it the easy way.
Not being a smarty-pants or anything.. just a bright side :)
+Penny Lane Who on earth still tessellates their own polygons? Most 3D modeling programs either have support for N-gons or can clean up this kind of mess automatically.
+Penny Lane Can you explain to me why? You got me curious and I want to know. And what would be a proper way of doing this?
We've learned all sorts of things about CPUs. Some more videos on GPUs would be really nice.
I've been told that for computer modelling, GPUs aren't always a good solution. I've been told that for certain kinds of Finite Element Analysis, you can generally only calculate one step at a time, one node at a time, as the result of one node can affect the calculation required for all the other nodes it influences, and the same is true for those nodes at each time step. So these still need high-speed CPUs to do the grunt work, as the task isn't easy to parallelize.
I'm interested in this too
Does anybody have any more information about this? I always wondered how they can parallelize finite difference and coupled sets of differential equations. Because the results of one computation are the input to the equations of the next element in the mesh.
One of the features of OS X a few versions back was the ability to off-load repetitive, non-graphical tasks to the GPU rather than the CPU to operate faster.
You should have asked him exactly what role a CPU plays vs. a GPU in something like a game. What exactly does the GPU need from the CPU, etc.
+Ryan Couture In the case of a game, the CPU runs the game's logic (the rules of the game), input handling, sound mixing and AI. The GPU gets tasked with rendering the visuals and a good chunk of the physics. What the GPU needs to actually draw the scene tends to be sent to it by the CPU at load time, with the occasional update as objects fall in and out of relevance (eg. unloading the model for an enemy where all instances of it in the level are dead, loading in the projectiles for a new weapon the player just picked up).
Game physics is a good candidate for the GPU because it tends to have lots of independent entities which need to be processed. Let's say you set off a trap with a bomb hidden in a pile of boxes. It's going to result in pieces of debris flying everywhere. All those pieces of debris need the equations of Newtonian mechanics crunched for them to figure out where they end up in each frame. The GPU can crunch a thousand such pieces at once and update the list of where everything which needs to be drawn is located and oriented. Then once it's done with the physics, it can tell the CPU "Okay, I'm done with the physics. The results are in this block of memory if you need them for something else." then the CPU can tell it to start drawing while it does something else.
+Roxor128 Simply amazing.
@Roxor128: That's a brilliant explanation. Thanks.
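The debris example above has the classic "embarrassingly parallel" shape: every piece gets the same update and no piece depends on another. A sketch of the per-frame step (plain C++ here for illustration; on a GPU each iteration of the loop would be its own thread):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

struct Debris {
    Vec3 pos;
    Vec3 vel;
};

// One physics step for every piece of debris. The body of the loop only
// touches its own element, so all iterations could run at the same time.
void step(std::vector<Debris>& pieces, float dt) {
    const Vec3 gravity{0.0f, -9.81f, 0.0f};
    for (Debris& d : pieces) {
        d.vel.x += gravity.x * dt;
        d.vel.y += gravity.y * dt;
        d.vel.z += gravity.z * dt;
        d.pos.x += d.vel.x * dt;
        d.pos.y += d.vel.y * dt;
        d.pos.z += d.vel.z * dt;
    }
}
```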
Would it be theoretically possible to run Windows using only a GPU? (With no CPU)
Assuming you have a modified version of Windows & Motherboard for it to work...?
Frank Einstein
Depends a bit on how the specific architecture is made. There may be essential logic missing on some designs, since it's not required for normal GPU workloads.
And it’d not exactly be a basic software mod. You’d need to compile to an entirely different instruction set. I know no compiler that could even do that.
Intel’s Xeon Phi can run a system though and it works very much like a gpu in many ways.
It is extremely easy to complain about a tiny glitch in the game while sitting in front of a console without any clue how everything works. When you think about the math involved in making it, it is mind-blowing what a human mind is capable of.
+sami aklilu When you think about thinking about the human mind, or any animal brain for that matter, your mind should be blown at the fact that you can think about that...
It is extremely easy to fix issues before release, but they rush it. Creating a game is not hard. Creating a game engine is very hard, but often game devs don't make the engine anyway, and 99.9% of the bugs are not because of the engine, but made by level designers..
Can you guys eventually do a video on how a cpu works? thanks
+Mystyc Cheez CPU is easy, GPU is a real piece of cake
+Mystyc Cheez
There is a book called "But how do it know?". Highly recommended.
+Mystyc Cheez If you're REALLY interested in knowing how it works read something like "Computer Systems Architecture; A Networking Approach" by Rob Williams, published by Addison Wesley
+Mystyc Cheez I suppose one can't explain extensive topics like this in 5 min with a satisfying depth of detail. Even how transistors work can be difficult to explain in such a hurry.
+Mystyc Cheez Buy "But How Do It Know?", which explains how a CPU works, and you can buy "Logicly" to simulate the CPU
An entire episode with no explanations on how stuff actually works. Well done.
MrDajdawg, yeah seriously. This video was super annoying.
What did you think this video was missing? I thought he explained how a GPU in general worked pretty well, it's essentially just a linear algebra machine.
It wasn't easy for my professor to explain it during a whole semester, why do you think it would be easy explaining it in a 6 minute video?
Perfect, I just wished he had actually said that a GPU has lots of tiny processors (compute units) physically. Yes most people know this, but given how entry-level this explanation was (on purpose), the target audience might not know it.
Very clear answer, I'm impressed :O
*GPU:* Graphic objects can generally be broken into pieces and rendered in *parallel* at the machine level.
*CPU:* Generally sequential by design, to take care of problems which have many *_dependencies_* and *cannot be solved in parallel* by their very nature.
Ex.
Seq1: *A = b + c;*
Seq2: *D = A + E;*
Seq3: *y = D - A;*
Obviously, you have to do the thing in *sequence* in order to get the value of *y.*
However, the *task* of *_deciding_* which problem (i.e. gaming or any other apps) goes where at *execution time* is mainly a *function of the OS.* This is because a *gaming app* is not *all* *_graphics rendering_* but also a combination of *logic, rules, and some AI.* Those are the reasons why you need *both.* Kinda like *_specialization_* of tasks.
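As code, the difference between those two shapes of work looks something like this (a sketch, not tied to any particular API): the first chain cannot be split up because every line needs the previous result, while the second loop can be handed to as many workers as there are elements.

```cpp
#include <vector>

// Sequential by nature: each statement depends on the one before it.
float chain(float b, float c, float e) {
    float a = b + c;
    float d = a + e;
    float y = d - a;    // needs both d and a, so nothing here can overlap
    return y;
}

// Parallel by nature: element i never looks at element j,
// so a GPU could compute all of them at once.
void scale_all(std::vector<float>& v, float k) {
    for (size_t i = 0; i < v.size(); ++i)
        v[i] *= k;
}
```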
Password cracking with a GPU is also faster than the CPU.
+Steven Whiting That's because when you're running through billions of permutations of input passwords and trying to find which one matches your stolen hash, you have a situation where each run is independent, so can be easily converted to a parallel form.
Chances are that with a typical high-end GPU you'd be calculating over 1000 hashes at once and if each core can do 200k per second, that's a total throughput of 200M per second.
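Plugging in the numbers from the reply above (1000 hashes in flight, 200k hashes per second each), plus a hypothetical 8-character lowercase password just to show the scale of a brute-force search:

```cpp
#include <cstdio>
#include <cmath>

int main() {
    double lanes = 1000.0;            // hashes computed in parallel
    double rate_per_lane = 200000.0;  // hashes per second per lane
    double total = lanes * rate_per_lane;             // 2e8 hashes/second

    double passwords = std::pow(26.0, 8.0);           // 8 lowercase letters ~ 2.09e11
    std::printf("throughput: %.0f hashes/s\n", total);
    std::printf("full search: %.1f minutes\n", passwords / total / 60.0);  // ~17 minutes
}
```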
+Roxor128 Do you think we'll need something like MD5-256 in the future?
+SirCutRy MD5 is effectively dead cryptographically. It's flawed in its design, not because its digest is too small.
Vulcapyro What are the flaws? Isn't MD5 still used in password hashing?
Zhiyuan Qi How can it be fixed? More scrambling?
I think a video on real-time ray tracing would be interesting. There's a good chance that it'll be the future of graphics, but it'll require different silicon to today's GPUs.
Damn you predicted the future
Time traveler confirmed
I agree, what you said from the past is now happening in the present.
Watching this video on one monitor, seeing my GTX970 busy folding away on the other monitor. How appropriate...
B Snacks Not sure VRAM does a lot for folding, but if you wish...
+Robert Faber Thank you for mentioning folding, I learned something new today
Folding? as in protein folding?
Am I the only one who finds the way he says pixel weird? The stress seems to be in completely the wrong place.
+Ascdren
Not if you consider what it stands for.
+Ascdren He's saying it as you would if you read it as "picture element": pic*e*l
his accent seems of England. what language are we speaking again?
pick-SELL 😂
Thank you so much for the content and the people on the screen for their willingness to share knowledge! Bravo good sir, it takes a special kind of talent to explain something complicated this well and understandably. Exactly the kind of people you'd want to have tea/coffee/beer with!
Nice shout-out to Bitcoin, but for clarification purposes, Bitcoin mining on GPUs is now obsolete and most of it is done on specialty ASICs. Mining is so competitive that if you attempt to mine with a GPU or even an older-generation ASIC, you will lose money unless you have access to free electricity. The general point of mentioning mining on GPUs is important though, as hashing double rounds of SHA256 is much more efficient on a GPU than a CPU, although ASICs are far superior to both.
For Mining - ASIC > FPGA > GPU > CPU
+sanisidrocr ASIC is to GPU as GPU is to CPU.
TheKlaMike ASICs can be designed in any way and can be less optimal than CPUs for mining if not designed for parallel throughput.
what about quantum computer?
sanisidrocr HAH! Little did they know what would happen to GPUs because of mining one year later
Wow, so well explained, as simple as possible but no simpler. It's such a big area, I can't believe he explained it so compactly and clearly. He is a real genius.
He is excellent at explaining these things. Please have him on more.
Nicely done. I use nVidia GPUs to do huge matrix operations in statistical modeling to speed up model estimation. Bayesian MCMC isn't really practical on large scales without the parallelism offered by GPU computing.
It really isn't necessary to use floating-point operations, but it is done that way because there is hardware that does floating-point very well, so they are making use of it. An example to clarify what I mean is square roots: you can do square roots without floating-point ops using just integer calculations, and at the end you simply place the decimal point in the correct position.
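A rough integer-only sketch of that square-root trick (purely illustrative, not how any particular piece of hardware does it): scale the input by an even power of ten, take an integer square root, and the decimal point just slots in at the end.
// Square root of 2 to four decimal places using only integer arithmetic.
var scaled = 2 * 100000000;     // scale by 10^8 (an even power of ten)
var root = 0;
while ((root + 1) * (root + 1) <= scaled) {   // naive integer square root
    root = root + 1;
}
// root is now 14142; placing the decimal point four digits from the right gives 1.4142
console.log(root);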
I run very, very large AutoCAD files and they would take forever and a day to load. My supervisor has the exact same machine I do and he loads the same files in a fraction of the time. I asked him how and he said that he installed a graphics card. Just for displaying 2D images, the GPU makes all the difference. I got one and it is like magic.
Really nice simple explanation of a complicated area!
When you describe basically what the GPU is doing, it really puts into perspective how complicated modern graphics are.
2:30
Games actually don't move the camera around. Everything else gets rotated around the camera.
+DLC Spider Well not only is it implementation dependent but that's completely inaccurate for the most part.
In the vast majority of scenes, be they games or otherwise, you make a scene description. Whether the camera moves coordinates or the scene moves coordinates is pretty much irrelevant to the end result, but moving MANY vertices of ALL objects in the scene takes MUCH longer than moving 1 point (the camera) in the scene and updating the POV vector (a second point). From there the culling and rendering happen the same as they would no matter what moved.
Moving vertices is expensive, moving the camera is cheap, though either will net you a valid result.
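A small sketch of that cost difference (illustrative only, not any particular engine's code): "moving the camera" is one update to a single point, and the offset only gets applied per-vertex at draw time, so the stored scene data never has to be rewritten.
var camera = { x: 0, y: 0, z: -5 };
var vertices = [ { x: 0, y: 1, z: 10 }, { x: -1, y: -1, z: 10 }, { x: 1, y: -1, z: 10 } ];

// Subtracting the camera position from a vertex gives the same result as
// moving the whole scene by the opposite amount: the same transform either way.
function toCameraSpace(v) {
    return { x: v.x - camera.x, y: v.y - camera.y, z: v.z - camera.z };
}

camera.z += 1;                                  // "move the camera": one cheap update
var viewSpace = vertices.map(toCameraSpace);    // applied to every vertex at render time anyway
console.log(viewSpace[0]);                      // { x: 0, y: 1, z: 14 }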
I adore his crystal-clear step by step explanations. Great vid!
For a VP of Technology at "ARM" this was an extremely low-tech explanation. But then all of my extra parallel processing is done on an NVIDIA card. A CPU today has eight or more cores. I hope the rest of this series has some more meat on the bone. We are not morons out here.
This is a layman audience, if you want an indepth technical discussion go to a developer forum.
Such a pleasure to watch, wonderful, thanks!
Our professor on star clusters told us GPUs are far more effective at calculating cluster evolution. Since it is not possible to analytically derive the evolution for more than two bodies, it has to be done numerically. 'It's all just about data crunching', he said.
"wobbly Computerphile camera" (laughs)
"can it fix my bad focussing?" ... my this is a self-deprecating episode isn't it? ;)
thanks for adding your info so all humans can learn
thanks again
I do 3D modelling recreationally and I hate working with triangles.
I prefer quads because those types of polygons can be structured into bands. With those bands, a looping structure can select/manipulate multiple polygons at once.
For example, imagine designing a human model. You can run bands of quads to form cylinder parts like arms, neck etc. From there, you can manipulate entire bands at once instead of going polygon after polygon.
best explanation of gpu vs cpu I ever heard
I am currently doing "high throughput computing" :)
D-Wave made a quantum computer, get on it Computerphile. I am kind of super hyped right now.
When it comes to game development 3D in unity do you need a powerful gpu?
I've been always curious of this, thanks for a video to explain it!
very well explained, thanks for the video!
The spice must flow!
(If you don't get the joke straight away, rewatch the video.)
+Cíat Ó Gáibhtheacháin That was my first reaction, too. :-)
Yes!!!
But the only thing you didn't say is how the GPU achieves that ability to do parallel computing. Does it just have 10,000 CPUs in it?
+Embeh I suppose he means it works in parallel with the CPU
+Embeh Yes, modern GPUs have multiple cores (faster ones in the 1000s).
+Embeh It has to do with how the command is structured.
You send a command to your CPU like "multiply 3 5", "multiply 1 2", ...
When you send it to your GPU, you can send "multiply 3 5 1 2 ..." and it will run the multiplication loop for all the values in that one request.
A CPU is built so it can do many different calculations. A GPU is built so it can do the same type of calculation over and over again, saving a lot of time.
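A loose JavaScript analogy of that batched request (not real GPU code, just the shape of the idea): one operation handed a whole array of operand pairs in a single call, instead of one call per pair.
// CPU style: one instruction per pair of operands.
function multiply(a, b) { return a * b; }
multiply(3, 5);
multiply(1, 2);

// GPU style (very loosely): one request carries many operand pairs and the
// same operation is applied across all of them.
function multiplyMany(pairs) {
    return pairs.map(function (p) { return p[0] * p[1]; });
}
console.log(multiplyMany([[3, 5], [1, 2], [7, 8], [4, 6]]));   // [15, 2, 56, 24]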
+Embeh see CUDA cores :D
+Embeh Not really. A typical CPU core has an ALU (arithmetic logic unit: the part that does the calculations) and a control unit which tells the ALU what to do.
In a GPU core, there is only one control unit per dozens of ALUs, which makes them all do the exact same calculation, but on different data (SIMD).
What about hardware wise? What makes a CPU better at single computations than a GPU physically?
+TheKlaMike Same question here. I would really like to know the physics behind all this
+TheKlaMike I believe it's because a CPU can do very long and complicated computations containing a lot of steps as its cores are larger and more complex, e.g. 'add this to this, then multiply that by this, then take this, square root it, then divide it by the previous number, then copy it to this...' etc. But when dealing with simple calculations it can only do a few at a time as CPUs tend to only have a few cores.
But each core in a GPU is quite small and simple and specialized for graphical calculations rather than general purpose computing. Hundreds or thousands of them can fit on a single chip, so while each core would need to take many small steps to perform a complex calculation that a CPU core could do in one step, it can do thousands of simple, specialized steps at once.
+shwetank verma The CPU has a higher clock speed, and a pipeline where multiple operations may be done in a single clock cycle if they don't depend on each other's registers (they are basically decoded and rammed into the pipeline, as the other end divides them up and executes them).
Each core will typically also have multiple ALUs available for parallel integer arithmetic. Normally the core count is small, with lots of resources invested into making each core fast.
A GPU tends to have a larger number of simpler (slower) cores, each with vector units, running at a lower clock speed. Each core isn't all that fast on its own, deals rather poorly with things like branching, and can usually only do one operation per clock cycle.
However, if your workload consists of large amounts of parallel operations, it can be spread across a large number of cores, with each doing its own work on its own data.
Instruction sets are also a bit different. For example, the x86 ISA has operations that allow doing things like loading/storing memory and performing arithmetic at the same time, which can save clock cycles.
The contrast is a load/store model, where load/store operations and arithmetic operations are handled separately, which, combined with being limited to a single operation per clock cycle, additionally limits how fast stuff can get done.
+TheKlaMike There is a very long thread already going on about this question down below.
+TheKlaMike
Well, I suppose you mean faster. They are equally good in the sense that they produce the same result, and a GPU will produce the result using less energy.
A CPU chops the calculation up into several serial stages. This means every stage in the sequence can be done really fast, making a high clock speed possible. The number of stages is usually around 12-25.
A GPU has fewer serial stages, typically 3-10.
Ironically, this means that a CPU does not really do one single calculation faster. If both just make one calculation, and that is it, a CPU and a GPU are about equally fast.
Where a CPU shines is when it needs to do calculations in sequence. If a calculation depends on the result of a prior calculation, the higher clock speed makes the results pop out faster one after another: basically raw clock speed.
But this is not really what a CPU is made for. A CPU is made for calculating a number, then making a comparison, then taking a decision. A CPU is not only faster at this; a GPU is not even capable of doing it at all.
2:30 Almost all 3D engines have the camera static and they move everything around the camera.
I really feel like this video only gave surface level differences and didn't talk about the true hardware differences :/
+KingFredrickVI I feel like that's actually a myth. The matrix comes out exactly the same when you do it either way, so it's really just a matter of your point of view.
+KingFredrickVI Sounds false. How do you do multicam rendering like stereoscopic this way? And how do you animate camera? That would be tedious.
+Dolkarr You're right.
Am I moving the camera - and then working out where the points are in relation to the camera - or am I moving the scene - working out where the camera is in relation to the points?
Same difference.
Am I adding 3? Or am I subtracting minus 3? Or am I subtracting the subtraction of positive 3 from zero, from zero?
Relativity. It's really all the same thing.
+Ahmed M.AbdElMoteleb (SAIKO) The topic of culling isn't relevant to the discussion about whether the objects are transformed or the camera is transformed. Culling is the process of deciding which objects are invisible, so that you do not have to fetch, transform, rasterize and shade them; those objects are therefore not considered when rendering the scene from the camera.
+KingFredrickVI This is absolutely true. It's to deal with the problem of floating point inaccuracies at large scales, and it's called the Floating Origin method. Any game over a certain seamless world size pretty much has to do it this way. Otherwise, as the player gets farther and farther away from 0,0,0 in world coordinates, objects in the world will start to move away from their positions as the amount of precision you can use decreases over distance, and they basically snap to the closest possible precision coordinate they can use.
So, if you want to simulate say an entire universe or galaxy or a solar system, like Celestia, or Elite, or KSP or whatever, or an entire real sized planet or even just a largish region, you have to use floating origin where the world moves around the player, and keep track of the player location and object locations in a separate, more precise manner (usually a large fixed point precision number), and translate between the positions in this overall world/universe model and positions in the game view model, using scaling tricks and what not to make far away objects appear far away even though they're actually relatively close by.
The problem with making an engine with higher-precision numbers for the vector locations is that the more precise you make the numbers, the harder it is on the processor and the slower the math is performed, until you get to the point where you simply can't simulate a large world without keeping the number of objects that calculations need to be performed on to an absolute minimum.
I know for single precision floating point numbers, things start getting screwy over about 10000 units away from the origin. A double gets significantly larger precision, but still, won't work very well on scales much above a solar system.
Usually, rather than constantly moving the world around the player, which would be heavy on the processing power since you have to individually relocate each object in the coordinate system, you set a boundary where the player moves around in the coordinates but when they reach a certain distance from the origin, the player snaps back to the origin and everything in the world moves with them.
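A minimal floating-origin sketch in the "snap back" style described above (all names here are made up for illustration): when the player strays too far from the origin, everything is shifted back by the player's offset, so local coordinates stay small while a separate offset keeps track of the true position.
var REBASE_DISTANCE = 10000;                    // roughly where single precision gets shaky
var player = { x: 10500, y: 0, z: 0 };
var objects = [ { x: 10700, y: 0, z: 20 }, { x: 9900, y: 5, z: -3 } ];
var worldOffset = { x: 0, y: 0, z: 0 };         // higher-precision record of the true origin

function rebaseIfNeeded() {
    var d = Math.sqrt(player.x * player.x + player.y * player.y + player.z * player.z);
    if (d > REBASE_DISTANCE) {
        worldOffset.x += player.x; worldOffset.y += player.y; worldOffset.z += player.z;
        objects.forEach(function (o) { o.x -= player.x; o.y -= player.y; o.z -= player.z; });
        player.x = 0; player.y = 0; player.z = 0;   // snap the player back to the origin
    }
}
rebaseIfNeeded();
console.log(objects[0], worldOffset);   // object coordinates are small again, relative to the player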
GPU --> graphics rendering, independently of the CPU
CPU --> takes user input and gives it to the GPU to draw the picture on the screen..
That's cool..
What is "vlar" and "foo" by the way?
I'm using a Tesla GPU at my university for accelerating a certain physics simulation. Works quite well, I'm getting a speedup of around 10, compared to normal CPU processing.
Graphics are just one possible application for GPUs, but remember that GPUs are designed as better number crunchers than CPUs. Possible uses are password cracking, calculating pi, chaos theory, stock market prediction, or anything that requires highly parallel computing. The guy being interviewed seemed to be focused only on graphics work, which is only the tip of the iceberg.
Just use a quantum computer instead when they get more common.
With chaos theory, is it fair to state that machines will one day be alive by exploiting that field? Look at insects like bedbugs: they are so small, yet alive. Could the principle of life be all about neurons or atoms arranged to achieve chaos in oscillation, with signals of data in the respective body?
Putin once said the first to build AI will rule the world. What he didn't figure out was that once it has been understood/mastered by anyone, that person will rule the AI; therefore the first one to understand or figure out AI will actually rule the world. It's going to be easy to manipulate AI even if you don't have those expensive servers. Just access to AI will be more than enough; with the Internet of Things you can imagine how powerful a one-man army will look.
And why do programs like C4D use the Processor for rendering a 3 dimensional space?
+DevilsCookies Not sure what program you're talking about, but some rendering techniques don't work that well on GPUs. They suck at recursion and branching, which are key components of ray-tracing. Not to say that GPUs can't do recursion or branching. They can. They're just much slower at it than the CPU is. Enough so that when writing code for the GPU, you want to avoid branching whenever possible.
If you're rendering a sphere in a ray-tracer, you need an if() statement in the intersection test to check if the quadratic formula has a negative discriminant, which indicates a miss.
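For what it's worth, here is a small sketch of that intersection test (standard ray-sphere algebra, not tied to any particular renderer); the if() on the discriminant is exactly the branch being described.
// Ray-sphere test: ray origin o, unit direction d, sphere centre c, radius r.
function hitSphere(o, d, c, r) {
    var ox = o.x - c.x, oy = o.y - c.y, oz = o.z - c.z;
    var b = 2 * (d.x * ox + d.y * oy + d.z * oz);
    var k = ox * ox + oy * oy + oz * oz - r * r;
    var discriminant = b * b - 4 * k;            // a = 1 because d is a unit vector
    if (discriminant < 0) return null;           // negative discriminant: the ray misses
    return (-b - Math.sqrt(discriminant)) / 2;   // distance to the nearest intersection
}
console.log(hitSphere({x:0,y:0,z:0}, {x:0,y:0,z:1}, {x:0,y:0,z:5}, 1));   // 4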
C4D = Cinema 4D
And well, I thought 3D animations are rendered faster by a GPU, because it's graphics you are rendering, and the GPU is a graphics processing unit.
"you've never seen a triangle that isn't flat."
Mmm, what about a spherical triangle?
Spheres are usually rendered smoothly. Nonetheless, a sphere consists of very tiny triangles which the processor smooths out.
Recently I made the mistake of trying to produce an entire 3d graphics engine from scratch in pjs, all starting with the following function:
var raster = function(x, y, z) {
return [x/z, y/z]
};
This function allows me to take any point (x, y, z) and map it to the screen. It is the most vital, core component of the entire program. I eventually added cameraPos to the function so I could move anywhere. But there was one fatal problem: I could not rotate anything. I attempted to produce a function that took in 4 vectors, (newx, newy, newz) and (x, y, z), to produce a new imaginary grid that treats newx, newy, newz like x, y, z and transforms the point. I then tried to produce a function that generates the 3 vectors the previous function calls for based on pitch, yaw, and roll, but I have yet to get it working.
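A possible sketch of the missing rotation step (hypothetical helper names, assuming the same divide-by-z convention as the raster function above): rotate each point around the camera by yaw and pitch before the perspective divide.
// Hypothetical sketch: yaw (around y) then pitch (around x), then project.
var rotate = function (p, yaw, pitch) {
    var x1 = p.x * Math.cos(yaw) - p.z * Math.sin(yaw);
    var z1 = p.x * Math.sin(yaw) + p.z * Math.cos(yaw);
    var y1 = p.y * Math.cos(pitch) - z1 * Math.sin(pitch);
    var z2 = p.y * Math.sin(pitch) + z1 * Math.cos(pitch);
    return { x: x1, y: y1, z: z2 };
};

var project = function (p, cameraPos, yaw, pitch) {
    var r = rotate({ x: p.x - cameraPos.x, y: p.y - cameraPos.y, z: p.z - cameraPos.z }, yaw, pitch);
    return [r.x / r.z, r.y / r.z];   // same divide-by-z step as raster()
};

console.log(project({ x: 0, y: 0, z: 10 }, { x: 0, y: 0, z: 0 }, 0, 0));   // [0, 0]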
Triangles can exist on curved surfaces. You can even have a triangle with 3 right angles when placed on the surface of a sphere.
Good video!
I hope there will be more detailed videos about GPUs and CPUs!
The white pixel moving over the turned off monitor on the left is getting on my nerves... D:
In what situations are objects modelled? Would it only be in CAD situations ?
“ That is coming”, prophetic words.
CUDA: the best thing for the advancement of machine learning research, ever.
+Dylan Cannisi CUDA, the best thing for lots of things... It helps gamers, but it also helps high performance computing people. The stuff you need to model the Big Bang, is the same as what you need to play Quake at 8k and 200 fps.
+Kneedragon1962 CUDA is a programming API for C/C++, Fortran, etc. You wouldn't use it to program graphics. It stands for Compute Unified Device Architecture. It gives you the ability to do mathematics on vectors and other small computations. I'm just saying what I use it for: it allows me to build complicated models, mostly SVMs, that I wouldn't be able to do on a CPU because it would take too long.
Dylan Cannisi CUDA is a language and a suite of tools that allow you to do programming on the video hardware, which is not usually accessible to the programmer for general-purpose computing. It provides access to hardware that is not normally used this way or accessible to a general software programmer, and it provides some tools for very wide parallelisation of code. Many programming languages support parallel programming, multiple threads for example, but they anticipate a fairly small number of threads. CUDA supports a whole different view of this. I was at TAFE in '95 when nVidia started talking about it, and the Linux people were starting to use WolfPack to do distributed processing. We have a blanket term for this stuff today: we call it cloud computing...
*+Dylan Cannisi* Yes, Nvidia did an amazing job with CUDA. It's unfortunate that it's not a hardware agnostic standard though. Doesn't seem right to have vendor lock-in for something of this nature.
Adrian Right back at day one, nVidia tried very hard to sell the concept and make it an industry standard, they were shouting it from the rooftops while I was at college, but the various other manufacturers were lukewarm, and they all wrote their own proprietary versions, which all faded into the past like Flash player...
Anyone else notice this madman believes HE is a GPU? Great video!
Please start adding subtitle !
that feeling when you're explaining to people that it's a $1000 card that just draws virtual triangles.
I guess they're pretty good at adding and subtracting floats and doubles, but the triangle drawing is where the action is.
Normalize your audio.
Normalize your face.
Nopiw
My speakers are fine.
Nopiw Nah
Most modern display devices provide a secondary display output for using a second monitor. But how do you display the content of the secondary linear framebuffer on the secondary monitor, using a different resolution than the primary monitor, for a self-made boot manager (no display driver loaded), with and without a UEFI BIOS?
Can you please explain what the difference is between a computer machine and a computer device?
So I need a better GPU for a gaming computer?
"Specialist" versus "General" computing?
PS, after watching the video, the parallel computing is also a lot more important in a GPU. :)
Hi sir, can you help me?
I have an AMD A10 7800 processor with R7 graphics and an AMD Radeon 2 GB graphics card, model HD6450.
So can you tell me whether they both work together and give the best performance, or is the graphics card useless with this processor?
can Computerphile do an episode on algorithmic art/music?
They also use GPUs in Lattice QCD calculations.
You should tweak your audio and reduce the heavy bass more often, half of the videos on this channel want to blow my woofer.
you have just got a follower
I'm not sure what you mean by a triangle always being flat, since the four sides of a pyramid are made of triangles and they aren't on a planar surface? The bottom, of course, isn't a triangle.
If you divide a rectangle or square in half diagonally, you get 2 triangles. Do that in certain ways for other shapes and you will still get triangles. As for them always being flat, it means they always lie in the same plane. A triangle has the smallest number of sides possible for a 2D figure, so it will always lie in a single plane. For a rectangle, 3 points or vertices could be on one plane but the fourth one could be on a different one, making it not coplanar.
pyramids are made of multiple triangles though. One triangle will always fit in one plane.
Isn't the big difference the architecture between those (I kinda forgot, but in school they taught something like this), named after the architects who made them? I know one was cheap to make because it could not take and give information at the same time (that should be the CPU), and the GPU could do it but was much more expensive to produce.
Harvard and von Neumann, I think. The slower one is von Neumann and the expensive one is Harvard.
Computerphile seem to be on a boat?
It is very simple: a CPU is designed to handle any computational algorithm you can throw at it, provided the computational capability of the CPU is adequate to do so, e.g. the number of cores and the processing frequency (overall speed) of the cores, as well as their general features and capabilities, which will usually far outweigh the ability of a single GPU.
A GPU is usually a single-core processor with a differing architecture to that of a CPU because of its general purpose: it is designed to handle a specific type of processing, e.g. to compose, process and output a video or image you can view on, for instance, a monitor.
However, the one thing to bear in mind is that a GPU cannot perform its duties without the aid of a host processor; in other words, it cannot function without a CPU running alongside it. This is why, when you build your own PC, you usually need to install a CPU AND a graphics card too.
For the GPU to function properly as intended, it requires the running aid of the central processing unit (CPU) to tell the GPU how to function in the first place. Without this specific set-up, your GPU, or graphics card, would be pretty much a novelty door stop, and pretty much useless on its own . . . 😀😀
The other thing you can do extremely well on graphics hardware is model a scene in 3D, with a large number of units, like sugar cubes. Astrophysicists want to model the universe after the Big Bang, or look at the growth of galaxies, or model dark matter. Meteorologists want to predict the weather 10 days out. Climate people want to know about global warming. Economists want to anticipate world trade next year. Boeing want a more efficient wing-tip shape on the next airliner. The navy want a submarine which is 600 ft long and capable of travelling at over 30 mph while making almost no sound. The way a graphics processor treats a scene, as a series of data points which are pixels, works extremely well when you try to model the behaviour of other systems, like the ones I listed. This is what is behind CUDA and other efforts to use graphics hardware. It's very good at doing the stuff you need for large-scale simulations.
I thought triangles were flat only in Euclidean space? Just as 2+2 is only 4 in decimal.
This guy is wrong. The GPU doesn't handle the camera. Cameras are something high-level and engine-dependent. What really happens at the low level is that the "cam" is fixed in place and always facing the same direction. You just move your OpenGL objects around collectively to create the sense of a moving camera.
Maybe I missed a thing. But if a GPU does "3D" and a CPU does "1D" (or 1D over 4-8 or so threads), why haven't CPUs become GPUs? Is this really a firmware issue?
No. It's because the CPU doesn't have a few thousand cores, so while the GPU can do a whole lot of work all at the same time, the CPU does 1 (or two, or 8, or 20, or maybe 80 if you're lucky and get a ton of money) at a time.
Would it be easier to say that the table top is a circle, rather than a bunch of triangles?
A good analogy I came up with is guns. Say a bullet is a task and a gun is a processor. This way:
The CPU is obviously a Gatling gun. It offers fast shooting and fast switching by using instruction-level parallelism, e.g. each barrel is at a different stage of loading a bullet.
The GPU is like a mortar. It shoots slower, but hits multiple targets at once and is more efficient against special targets (e.g. tanks, bunkers, vector and matrix math).
An FPGA is like a gun with a PVC-pipe barrel. You can build it so that it hits a single target as efficiently as possible, but rebuilding it for another target takes a lot of time.
An ASIC is like an FPGA, but not rebuildable. It is even more efficient at shooting the poor bastard of a task its manufacturer built it against, but absolutely useless against any other.
Cool video.
This video has aged well. Can we have an update please.
When you tell a joke to an engineer and he takes it seriously and fixes it instead. XDXD
- 6:16
Soo... Cpu is one fast flowing river, while GPU is many smaller brooks, and both can carry the same amount of water but in different ways?
+Henri Hänninen Well...
If talking about the GPU, the river analogy works quite well. But a CPU... well, no. It's more like a water pump... well... hmm... the water analogy doesn't work on a CPU. The whole point is that they do a bunch of different things: not only calculation, but also decision.
The GPU carries more water, definitely, if all of the brooks are flowing.
So a CPU isn't capable of parallel computing? (Sorry if this is a stupid question, I just want to know the answer.)
Technically correct. The way multitasking was faked before was by using the cycles of a sole processor more efficiently. Remember that most applications are waiting for the user to do something, so with the spare processing power of a lone processor, music could be played. The only way to seriously do two things at once in the same clock cycle was to add cores. So sure enough, my phone has six cores now. Smartphones now also have OpenCL-compatible GPUs, which means a computation-heavy task could be sent to the GPU to complete. It seems, though, that ZERO applications in the app store utilize this ability.
+Marcus Godiali Modern CPUs are in fact capable of parallel computing. americanswan mentioned this, but I'm sure you've heard of CPUs with multiple cores; they can send instructions to each of the cores, which is effectively parallel computing. On top of that, each core (when speaking of Intel CPUs) generally has 2 ALUs (arithmetic and logic units) that can be operated at the same time if needed and the conditions are appropriate.
Vaes Joren Though what you said is true, I don't think you can call multithreading parallel, since it's only doing one task; it's more scheduled than parallel. Though, like tothfirytoob said, multiple cores are multithreading.
The Intel Core 2 architecture starts with 4 integer pipelines for executing 4 integer instructions simultaneously, if the instructions are pairable and there are no dependencies between them.
But though a GPU has a lot more processing cores, called shader cores (vertex shaders and fragment shaders), their clock speed is much slower than a CPU's. So overall a GPU isn't fast enough for running an OS, programs and other stuff.
I need help upgrading my PC: GeForce GTX 950, AMD FX(tm)-4300 Quad-Core Processor, 16 GB RAM. What should I upgrade next, CPU or GPU?
cpu
Gpu
***** yeah, but I'd get a better gpu and a slightly better cpu
Felony First thing is cpu for sure
***** I guess I don't know too much about bottlenecking; got an i7 that'll last me for the next few years lol
A light field camera will help your focus problems. And a tripod.
By the time this video was uploaded you couldn't mine bitcoins with a GPU anymore. They use specialized ASIC chips for that nowadays.
I didn't understand anything... Why are GPUs better for 3D? Or matrices? I don't get it...
so whats the difference ?
Still at a higher level of abstraction to me. :p
Why not just mount the camera?
GPU sounds ideal for machine learning.
They currently make separate neural cores (chips), but I guess it's more for using the AI (an already-learned model) rather than training it (ML). If I am not wrong, those chips are already in many new devices. But for learning, I guess the best is the GPU itself.