Thanks for a great video. I worked at Synnex a few years ago doing post-production testing of SmartNICs (AWS Nitro cards). The CPU used was a custom design from Annapurna Labs, manufactured by TSMC. The management at Synnex is a motley collection of jerks, but the technical staff is a bunch of sharp dudes. I had a blast and learned more than I could ever imagine.
Throwback to the mainframe days, with everything having its own processor or coprocessor, etc.
Isn't that always the way? The question is: what came next that replaced this on the mainframe?
We're seeing some interesting ideas such as compute-in-memory. Are we going to see a degree of look-ahead, so that the memory works out what the cache is going to want before the cache asks for it?
Exactly. And interestingly, all these DPU things seem more like a souped-up, glorified Raspberry Pi that plugs into a PCIe slot rather than running as a standalone box or board in its own housing. This video actually made the whole thing make more sense to me from a use-case standpoint, especially the part where Patrick explains the licensed-core dilemma, where some of those CPU cores that would otherwise go toward hosting more users' VM sessions instead end up handling the infrastructure, communications and "operator console" functions that could very easily go to a separate coprocessor system without issue. Really, my only problem is the terminology the "industry" decided would become common parlance. Based on what this video has taught me, "DPU" is way too vague. Anything could be called a "data processing unit" (because, of course, they all process data, but WHAT data is that, exactly?). These things should be called Infrastructure Management Coprocessors, or IMCs, as that terminology is more accurate to the function of the devices in question. But that's just my opinion as a non-engineer, non-IT-technician, average Joe Q. Public commenting on YouTube videos and forum threads.
Then physical security was often enough. Now we have to update firmware, microcode, software on all these components...
Or the Commodore Amiga architecture, with what could be called the first accelerator-based architecture thanks to its way-ahead-of-its-time chipset that made it the miracle it was for the time! (And of course, ahead of its time on the consumer computer market, not the mainframe market.)
For a while there, performance capabilities were ahead of even service-provider demands, so you could just offload everything onto the CPU in the form of proprietary driver packages and have the bare minimum hardware on an add-in card doing the actual interface work over DMA. But now performance gains have fallen behind the demands of services at scale, so offloading tasks to other processors to maximize CPU availability is necessary again.
So the hardware companies are scrambling to meet the new demand. This is why we're seeing T1000s being used to accelerate databases, Intel's external AVX and database accelerators (which were born from Xeon Phi, and in turn gave birth to Intel Arc), the sudden explosion of focus on hardware AI acceleration (Tesla, Coral, Grayskull, Groq, etc.), smart network cards, and even the return of hardware RAID cards (in the form of beefed-up PCIe switches with an integrated storage-management layer).
Glad you did this, as I was having trouble classifying these new cards. Yes, they seem like a segment, and I can see use cases... or they could be re-purposed. Good to hear I am not the only one confused. :)
When you mentioned the audio jack at 19:54 my brain added in the sound of DTMF dialing and modems negotiating the connection parameters.
"Data Processing Unit" might be the vaguest possible name for a computing device and I'm baffled about how the industry came to agree on that nomenclature.
Yeah, I second your motion on that. How does Infrastructure Management Coprocessor, IMC, sound? In my opinion, IMC sounds a LOT more like what these devices are doing. Of course, they are also like glorified Raspberry Pi boards, but which plug into a PCIe slot instead of being a standalone device, so maybe they can also be called Console Control Processors, or CCPs. Just a thought.
Are there any processing units that do not process any data at all? lol
Xilinx uses the same name for their "Deep Learning Processing Unit," an IP-core for ML. Now I can see it makes more sense than Nvidia's naming.
At least Intel calls them IPU (infrastructure processing unit).
Oh man, you prepared amazing material. I thought it would be a bit boring, but it turned out I couldn't take my eyes off it.
Thanks
Ha! Thanks
Finally, Patrick uses both Raging and Hotness in the same sentence :p
On the Intel side: I had to chuckle a bit when you said "I am not going to say who" and Microsoft was quoted on the slide you showed
Arm/RISC-V cores combined with an unlocked FPGA coprocessor is the solution I prefer most, because it enables end-user reconfiguration of the types of hardware acceleration, along with the ability to update said acceleration capabilities down the line via software updates.
Agreed, that flexibility is really exciting.
Heck, one cool feature I saw was eBPF being turned into FPGA programming, allowing for accelerated networking at the SmartNIC level (so potentially something like Cilium could use it to accelerate Kubernetes-based firewalling and networking in general!)
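For anyone curious what that looks like concretely, here is a minimal sketch of the kind of eBPF/XDP program that such offload tooling starts from: a trivial packet filter written against the standard kernel/libbpf headers (no vendor SDK assumed). Whether a given SmartNIC or FPGA flow can actually compile or offload it depends entirely on the vendor toolchain.

```c
/* Minimal XDP sketch: drop TCP packets to port 9999, pass everything else.
 * Uses only standard kernel uapi + libbpf headers; the example port is
 * arbitrary. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_drop_port(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_TCP)
        return XDP_PASS;

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end)
        return XDP_PASS;

    /* Firewall-style rule, evaluated before the packet reaches the host stack. */
    if (tcp->dest == bpf_htons(9999))
        return XDP_DROP;

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

Loaded on the host, this runs at the kernel's XDP hook; the appeal of the eBPF-to-FPGA/SmartNIC work mentioned above is that the same small program could be pushed down onto the NIC itself instead of burning host cycles.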
Seeing you categorizing these parts and coming up with a definition to distinguish between various similar but unequal parts reminds me much of my law work. :D
This definitely feels like we're going full circle - started with mainframes, then back down to desktop style servers, and now going back to mainframes... except this time, the mainframe can be infinitely expanded, almost Lego-style, by tacking another "building block" server in, and just linking it into the fabric
The industry is a pendulum that is cycling between two extreme trends.
So in essence these are hypervisor accelerators. Reminds me of IBM's "scale-up" pSeries architecture, with service processors managing the system -- though those service processors are definitely *not* in the data path.
I only really understood the point of a DPU when I understood that the motivation is a data-centric architecture, as opposed to just offloading processes. It's really a conceptual shift.
Excellent overview of the different levels of NICs. I really like the idea of offloading data-storage management I/O ops from the main CPU onto a DPU.
History might not be on their side when it comes to the "push everything to the edge" philosophy. I'm all for hardware offloading, but there is a point of diminishing returns, and maybe adding a full server to an AIC just to do TCP and IPsec/TLS acceleration falls into that territory.
Not that long ago, in the earlier days of cloud computing (mid-2000s), there was a push to move processing to the network edge, and I feel like there are some repeating themes with DPUs.
A good example was SSL accelerators. Expensive boxes promising to take heavy cryptographic processing off your servers so you could process more transactions with fewer servers. Neato!
What happened, of course, was that those edge boxes got overloaded and slowed down traffic. Unlike servers, they were not easily upgradable, they were yet another failure point and security target, and they introduced extra hops (latency).
Turned out burning a couple of cores on your CPU for SSL was cheaper, used less power, was easier to manage, and ever faster cores were being added to CPUs so quickly it stopped mattering.
Looking at the sheet for the BlueField-2 I see nothing a CPU can't do quite well: AES, SHA, TLS, RNG, Compression/decompression.
I have to say I do find the idea of regular expression acceleration quite cute because I live and breathe regex but honestly, when your traffic scales up what do you think is going to crumble first, your 16-128 core Xeon/EPYC server, or the 8x Armv8 A72 cores on your NIC (the same cores as in a Raspberry Pi)?
Can anybody suggest an application where they've thought: "I really need extra CPUs to handle this specific workload, and the best place for those CPUs would be in a low-power ARM server on my NIC"?
I suppose time will tell if I'm speaking from experience or a failure of imagination.
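(For reference, the software path being defended above is just the stock OpenSSL EVP API, which transparently uses AES-NI where the CPU has it. A minimal sketch, with error handling trimmed and key/IV/buffers as caller-supplied placeholders:)

```c
/* Sketch: CPU-side AES-256-GCM via OpenSSL's EVP API. On x86 parts with
 * AES-NI and PCLMUL this runs in hardware-assisted instructions on the
 * general-purpose cores, no offload card involved. */
#include <openssl/evp.h>

int encrypt_gcm(const unsigned char *key, const unsigned char *iv,
                const unsigned char *pt, int pt_len,
                unsigned char *ct, unsigned char *tag)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len, ct_len;

    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, ct, &len, pt, pt_len);
    ct_len = len;
    EVP_EncryptFinal_ex(ctx, ct + len, &len);
    ct_len += len;

    /* Retrieve the 16-byte GCM authentication tag. */
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);
    EVP_CIPHER_CTX_free(ctx);
    return ct_len;
}
```

With AES-NI, a routine like this moves on the order of gigabytes per second per core, which is the crux of the "just burn a couple of cores on SSL" argument.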
Thank you - Saved me days in research - nice work
When can I get some of these for the homelab? :P
Ebay in a few years
You can bag the Xilinx SmartNIC accelerator cards on Digi-Key. That line is geared more toward running acceleration algorithms on the FPGA, and has no ARM cores. The HBM variant isn't all that expensive either; a few thousand, last I checked.
Same, I really want to try some stuff with Kubernetes on those (Control Plane on DPU, Worker nodes inside the servers)
@asdrubale bisanzio fair enough
Looking for budget sources is challenging for these, but I'd expect some of the earlier accelerator dev boards that were snapped up by the crypto miners to come onto the market in a few years. I think the popularity of that board is what led to the Alveo line.
I mentioned the HBM Alveo because it's far less expensive than any of the HBM-enabled Virtex UltraScale+ line. It's also one of the few that have PCIe 4.0, and in CCIX mode at that. Only x8 lanes, though, but it has two ports pinned to the x16 edge-card connector.
Is one of the key requirements for the DPU definition that it has the ability to connect to an Ethernet network? If so, then the term DPU is probably too general to describe this specific use case.
For example, if someone put a GPU + Arm CPU combo on a PCIe card and stuck it in a server, where it can act as both a PCIe root and endpoint (simultaneously), but does not have a network connection, then shouldn't that also be considered a DPU?
Another example would be one of the Xilinx Versal FPGA cards which have an ARM A72 + FPGA + AI accelerator, but don't necessarily need to have a network connection (I think they do, but that would be more of an out-of-band port). Could that be considered a DPU?
High-speed networking is one of the requirements in the video and on the site, but we are not specifying Ethernet-only at this point. Actually, the BlueField-2 DPUs can do InfiniBand as well.
@@ServeTheHomeVideo Then I think that DPU is too general of a term here, where Smart NIC would probably be more precise.
It's sort of along the lines of calling GPUs "SIMD Processors", but then requiring anything called a "SIMD Processor" to be able to render 3D graphics (i.e. something like a Phi wouldn't fit into the definition of "SIMD Processor").
To me it seems like DPU should describe some sort of asynchronous accelerator, where a GPU would not be a DPU, but one of the Phi cards would be (even though they have no networking capability as I recall).
Or is the thought more along the lines of Data Processing == Data Stream Processing, and is abbreviated? i.e. streaming data over a network (I think that would be fundamentally different from something like a Phi card).
@@hjups We went into SmartNIC vs. DPU in the video. DPU is what the industry is using.
@@ServeTheHomeVideo Sorry, I watched that part and forgot about the distinction. And by your explanation, SmartNIC doesn't adequately describe these accelerators.
I'm not sure that DPU is precise enough, though, considering your exclusionary definition, regardless of whether the industry is standardizing around it (the industry has standardized on plenty of poorly defined terms before, mostly for marketing purposes).
With that standardized definition though, how would you define something that looks similar to a DPU, but isn't meant for high speed networking or packet processing acceleration?
Something like an SoC with an AI accelerator in it, for example, which may have an out-of-band access port as well as PCIe root + endpoint connectivity? If I recall correctly, Xilinx has a new Versal card that pretty much meets that definition, where the FPGA is not user-programmable and instead acts more like a Google TPU with an ARM core running Linux.
Hey Patrick, just wanted to give some info as I just did this for my friend: to verify your YouTube channel you need to navigate to the Help and Feedback menu > Verify Your Account. It will redirect you to their web page, where you have to fill out some formalities, and it will do the rest.
Excellent article!
Thanks for the overview.
So on the one side you have the idea of moving a DPU into your server as an additional layer in the stack, which is happening in the "general purpose" server space, while on the other side you have NICs moving *closer* to the workload, even being included directly on the die of the workload chips themselves, cutting out the middleman of PCIe and an external NIC.
I feel only one of these should be called a DPU, since they're so fundamentally different.
I think the future lies in general-purpose, FPGA-like accelerators such as the Xilinx Versal chips, connected to a high-speed interconnect bus.
So, for example, for TLS acceleration the CPU would configure the Versal cores to do AES encryption/decryption. If a TLS tunnel needs to be accelerated, the CPU would instruct a "SmartNIC" to DMA all packets on that connection to the accelerator card, where the decryption/encryption happens.
As the Versal cores have a pretty configurable pipeline, you could use the same card/accelerator to do AI inference as well, by just reconfiguring it on the fly, or using some cores on it for task A and some for task B and such.
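As a point of reference for how that handoff looks on the host side today, below is a minimal sketch of Linux kTLS, which is also the hook that inline TLS offload on NICs/DPUs plugs into: after the handshake, the application hands the kernel the AES-GCM session state, and the kernel (or a capable NIC) encrypts the records from then on. This is the standard kernel socket API rather than the hypothetical Versal flow described above; all key material below is a placeholder.

```c
/* Sketch: enable kernel TLS (TX direction) on an established TCP socket.
 * The TLS handshake and key derivation still happen in userspace and are
 * omitted here. */
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <string.h>

#ifndef SOL_TLS
#define SOL_TLS 282   /* kernel value; some libcs do not export it */
#endif
#ifndef TCP_ULP
#define TCP_ULP 31    /* kernel value; older headers may lack it */
#endif

int enable_ktls_tx(int sock, const unsigned char key[16],
                   const unsigned char iv[8], const unsigned char salt[4],
                   const unsigned char rec_seq[8])
{
    /* Switch the socket's upper layer protocol to TLS (IPPROTO_TCP == SOL_TCP). */
    if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;

    struct tls12_crypto_info_aes_gcm_128 ci;
    memset(&ci, 0, sizeof(ci));
    ci.info.version     = TLS_1_2_VERSION;
    ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci.key,     key,     TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci.iv,      iv,      TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci.salt,    salt,    TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

    /* Hand the session keys to the kernel; an inline-TLS-capable NIC can
     * take the record encryption from here. */
    return setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
}
```

Once this succeeds, plain write() or sendfile() on the socket produces TLS records without any further userspace crypto, which is exactly the kind of boundary a SmartNIC/accelerator sits behind.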
You sure love FPGAs. I love that
Situation: there are 14 competing standards. 14!? Ridiculous... we need to develop one universal standard that covers everyone's use cases! Situation: there are 15 competing standards!
Update:
Intel now has a DPU.
They call them IPUs (Infrastructure Processing Units).
Gave them the feedback that the Intel IPU would be considered "exotic" under the continuum
Trying to crunch this into basic English, the only thing I can liken it to is a "system-on-a-card" design, which made occasional appearances in the past, but scaled way down in size and designed to serve a single purpose as part of a larger deployment when deployed in numbers.
I suppose it's like a loose mash-up of the Raspberry Pi and Intel Compute Element cards?
Where would you class the CCR2004 that MikroTik is launching now?
Still waiting to be able to purchase one.
@@ServeTheHomeVideo I have one on pre-order. They've said it's a full 2004 (so it should be addressable), but it also acts as a network card for the device/server. Want to see if Proxmox can see those 2x 25GbE ports.
Very informative interface video 🎉❤😊
Thanks!
You are bits welcome. 😜
Awesome!
Can I make use of a DPU or SmartNIC to increase the performance of a TrueNAS SCALE-based NAS system?
Even more firmware and OS upgrading/patching, oh my!
Have you had any luck getting DPUs to run on Windows 11 Pro for Workstations, or a similar solution to access high-speed remote storage without using CPU resources?
If you are just doing remote storage, NVMe-oF and RDMA do not require a DPU. We do have a guide to getting the BlueField-2 DPU working on Windows 11 Pro: www.servethehome.com/hardware-arm-linux-on-x86-windows-11-how-to-get-nvidia-bluefield-2-dpu-running-on-windows-11-pro/
@@ServeTheHomeVideo I'd like to be performing compute-intensive operations over some network medium that offers speeds as close as possible to native PCIe 4.0/5.0-attached storage.
In your experience, would you recommend going BlueField or just the ConnectX?
This whole idea seems counterproductive at a time when CPU cores are getting cheaper and denser. It's adding complexity and a part that is very much static and not scalable. That's not even getting into the security aspect. I just can't see the advantage except for very special workloads that only Microsoft or Amazon would deal with. The average enterprise would never use these to full capacity, and even if they did, it's going to be cheaper and easier to throw x86 cores at it.
Do you see these moving into the enterprise market? As I see it, this is targeted at hyperscalers and has little to no application elsewhere. Your average datacenter or office (which is 10G/25G max, usually) on VMware doesn't need these speeds.
@asdrubale bisanzio Sure, hyperscalers have their own software stacks. I meant things like Project Monterey. Yeah, it's cool that you can run ESXi and vSAN there, but who actually needs that? DPUs are needed for 100Gbit and up with things like NVMe-oF. Maybe some enterprise storage solutions will use it, but actually buying a DPU for your average server? I just don't see it.
And what do you mean dell is dropping it?
You have to remember that 2022-and-beyond chips and systems are much bigger. 100GbE is now fairly cost-effective to use, and VMware will be a clear use case. VMware needs DPUs to really move to a more hybrid-cloud-like model.
@@ServeTheHomeVideo I don't know, I don't think systems will get much bigger. We're at the thermal limits already, and the only way forward is horizontal scaling. Hyperscalers deploy 1U systems for a reason. And then you no longer need fat links. Regular hosts rarely saturate even 10Gbit. The only exception is usually storage systems. There, DPUs make sense. Running Ceph OSDs on a DPU: yes please. Although current DPUs are too slow for that; you need pretty beefy servers to run an NVMe-based Ceph cluster.
Regarding hybrid cloud, you mean using a DPU for disaggregation of resources across local and cloud? That makes sense. But it seems like a DPU is huge overkill just for that. You really need just a simple bridge from PCIe to the network. Everything else is fairly useless, as links between local and cloud are slow and need few resources to handle the usual tasks like firewall and encryption.
@asdrubale bisanzio VMware is doing very well. The split from Dell is probably some business decision that doesn't affect anything. They keep growing and they're still leading the virtualization market. vSphere is on another level compared to any other product, especially when you pair it with vSAN and NSX. But I agree: my feeling from working with it and looking at the limitations (there are specific limits to how many hosts and VMs you can manage) is that it is pretty much for virtualization of your "average server". They try to get a slice of other markets, like the recent integration of the Kubernetes control plane, support for GPUs, Optane, but I rather doubt many people run it in setups where DPUs would realize their full value. It just doesn't make sense to me to pay so much money for it at that scale. If you're that big, you can probably afford to build something of your own using open-source products like OpenStack, Proxmox, or OpenNebula. If you even need virtualization at that point.
@asdrubale bisanzio It just seems VMware is the only one who provides the whole package, and it just works. Sure, there are solutions like Proxmox, but they're not there in terms of performance, reliability, ease of use, features, etc.
Yeah, you can't really compare Proxmox and OpenStack. The latter is definitely more of a collection of building blocks than a solution. I more meant that with open source you're building it on your own. No HCLs, no support, no guarantees of any kind. It's a big risk, and for some it makes sense to buy VMware and be done with it. Just the storage alone: for any kind of serious deployment you would want Ceph, and managing it is a big task on its own. Even selecting hardware can be non-trivial.
Play with the Mikrotik one please.
If these DPUs run full-blown Ubuntu w/ Docker, then when can I offload my silly Next.js website app to one?
We are going to have the Marvell Octeon 10 review up in December. That... yes 100%
I work at a telecommunications company which develops 5G base stations, and I am a developer in the PHY layer. We are planning to buy a DPU. Which metrics should we care about? Marvell, NVIDIA BlueField-3, AMD Pensando, Intel IPU, etc. Do you have any recommendation or guideline? Thank you for your supportive video, by the way.
sounds mainframy
Thanks for this high level overview of what DPUs are.
I DO wonder, though: if it has a PCIe 4.0 x16 slot, that means you still need a host system for said PCIe slot, don't you?
No you do not. It can run as a PCIe root port off of the x16.
@@ServeTheHomeVideo
But what I mean is that you still need a system to provide the PCIe x16 slot that this card connects to, don't you?
Or are you able to run it WITHOUT plugging it into a PCIe x16 slot on SOME sort of a host system?
(i.e. you give it power, and away you go, e.g. you can run it off a GPU crypto mining riser.)
DPUs could be practical even for consumer applications.
What if you could literally rent a GPU from the cloud, have it recognized as an actual GPU, and use it to accelerate (high latency) classic desktop applications, e.g. rendering?
A homelabber can dream.
This reminds me of the Killer NICs from over a decade ago.
Very refreshing! It would be interesting to see the optimum setup, including price, for a startup Kube cluster. For example, is it better to buy four 1U dual 16-core, 256GB, 10-SSD servers, or, for around the same price, a single server with three DPUs used to run web APIs?
Great video thank you
So if I'm understanding this vaguely correctly: if virtual machines are a way to nest servers in software, DPUs nest them in hardware?
So this is the end of H/W RAID controllers? What is that ultimate DPU on-host device for use with blazing fast edge-side read-intensive PMEM storage that is just being rolled out in the industry? Any takes, Patrick?
Would really love it if y'all went more into NVMeoF! Please... =)
What would you want us to cover specifically?
Where do you place the HPE J2000 enclosure? Thanks!
Good lay of the land!
I would go for the Marvell solution, like 10 in a cluster or even more as a stream device with one class, for example if you want to convert multiple GPU outputs to optical solutions so you won't need to have direct connections 😉
Without "Lime" category, no classification is complete :)
Okay. I think I MIGHT have it now. So, basically, a "DPU" is intended to be an infrastructure element, separate from the main CPU in a server, but serving somewhat of a limited-function, general-purpose processing role as a coprocessor to keep the infrastructure-related computational tasks out of the main CPU cores, so as to provide "maximum resources" to the users who are accessing (or trying to access) the server's main CPUs for their workloads. In simpler terms, the so-called "DPU" acts as a management coprocessor, handling all the networking, communications and security tasks specific to that use case so as NOT to occupy ANY main CPU cores with those tasks.
Then why can't we just call it an Infrastructure Management Coprocessor, or IMC? Why use the vague term "data processing unit" if its function is actually something specific, in the same way a GPU is specifically designed for processing data and instructions relating to graphics, animation rendering and video effects? THAT is my real issue with "DPU". It isn't the "what" or "why" so much as the terminology itself I take exception to. I don't like things being vague. I don't go around calling the GPU in my laptop a "data processing unit"; the thing processes a specific kind of data, not just "data" in general. You have to call these things what they are. You positively CANNOT be vague, or else you wind up confusing technician and end customer alike, and nobody gets anything done right when that happens.
Are the Xilinx cards all DPUs, or also Exotic?
So they're ESXi management systems?
Everything is shifting to PCI with CPUs as peripherals
I couldn't care less about the networking aspect overall, but to add multiple hardware platforms (x86/64 metal w/ ARM add-in cards) into the resource pool of a hypervisor would be insane. But that seems too simple and will likely never make it to market... lol
So what I'm hearing you say is that yes, it can run Crysis.
💘 this channel.
However this video dunno . . .
Consultant? . . .
Reporter? . . .
The five-column pictorial puts downward pressure on sales for products on the right half of that slide.
Bean counters now are armed with that slide.
Kindest regards, friends and neighbours.
You heard it here first...Cisco buys Fungible
Or likely Pensando
We can tell Patrick likes Nvidia better than Intel 🤣🤣
Hm... After 1/3 of this video I still do not know what a "Deepee Yu" is. A search on the web shows: it's a Data Processing Unit. And then the speaker explains the DPU while using a plethora of other terms, most of which are unknown to me. I cannot escape one conclusion: I have become a dinosaur. The speed at which new ideas appear, and then, true to form, the 3-6 letter acronyms denoting them, is faster than my puny capacity to absorb them.
Sounds like Broadcom Stingray has folded...
Jar gone.
You have some great content in your videos. But I just can't watch them due to your video editing. Every few seconds the video just glitches/jumps where you cut content together. I find it super distracting.
mb
I cringe every time you handle these devices without an ESD strap.
Grounded under table off camera. Not perfect but fine for this.
Arm-powered add-in cards for x86/64? Perhaps things such as the Hackintosh realm won't die? All hail virtualization, multiple-platform-enabled boxes and, of course, Donald Trump ;)