HOTI - Hot Interconnects Symposium
Day 1 14:00: Panel: Bandwidth Scaling for AI Interconnect – More Wavelengths vs More Fiber?
Moderator: Dr. Katharine Schmidtke
Panelists: Dr. Dan Kuchta (IBM Research), Dr. Peter Winzer (Nubis Communications), Dr. Rebecca Schaevitz (MixxTech, Inc), Dr. Amit Nagra (Intel), Dr. Alan Liu (Quintessent Inc.), Dr. Bardia Pezeshki (Avicena Tech)
1,143 views

Videos

Day 1 15:30: The Head Bubba Memorial and HOTI Closing Remarks
62 views · 4 months ago
Speaker: Dan Pitt
Day 1 13:00: Keynote: Connectivity for AI Everywhere: The Role of Chiplets
940 views · 4 months ago
Speaker: Tony Chan Carusone (CTO, Alphawave semi)
Day 1 11:25: Quality-of-Service Provision for BXI3-based Interconnection Networks
73 views · 4 months ago
Miguel Sánchez de la Rosa, Gabriel Gomez-Lopez, Francisco J. Andújar, Jesús Escudero-Sahuquillo, José L. Sánchez, Francisco J. Alfaro and Pierre-Axel Lagadec. Technical Paper Session B: Interconnection Network. Session Chair: Xiaoyi Lu (UC Merced)
Day 1 11:25: A new Mechanism to Identify Congesting Packets in HP Interconnection Networks
166 views · 4 months ago
Full title: A new Mechanism to Identify Congesting Packets in High-Performance Interconnection Networks. Cristina Olmedilla, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles, Wenhao Sun, Long Yan, Yunping Lyu and Jose Duato. Technical Paper Session B: Interconnection Network. Session Chair: Xiaoyi Lu (UC Merced)
Day 1 11:00: Platinum Sponsor Talk: Universal Memory Interface (UMI)
206 views · 4 months ago
Full title: Universal Memory Interface (UMI): A practical solution to break the memory wall for AI ASICs. Speaker: Ramin Farjadrad (Co-Founder and CEO, Eliyan)
Day 1 10:00: Keynote: Keeping Pace: Micro-Models, Large Language Models, and the Co-Evolution ...
180 views · 4 months ago
Full Title: Keeping Pace: Micro-Models, Large Language Models, and the Co-Evolution of Chip and Data Center Design in a Rapidly Shifting AI Landscape. Speaker: Mark Baciak (CTO, NeuralFabric)
Day 1 09:00: Rail-only: A Low-Cost High-Performance Network for Training LLMs with Trillion Params
660 views · 4 months ago
Weiyang Wang, Manya Ghobadi, Kayvon Shakeri, Ying Zhang and Naader Hasani. Technical Paper Session A: Networks for Large Language Models. Session Chair: Shelby Lockhart (AMD)
Day 1 09:00: Characterizing Communication in Distributed Parameter-Efficient-Fine-Tuning for LLMs
166 views · 4 months ago
Nawras Alnaasan, Horng-Ruey Huang, Aamir Shafi, Hari Subramoni and Dhabaleswar K. Panda. Technical Paper Session A: Networks for Large Language Models. Session Chair: Shelby Lockhart (AMD)
Day 1 08:50: Introduction and Welcome
372 views · 4 months ago
Matthew Dosanjh
Day 2 15:15: Closing Remarks
35 views · 4 months ago
Artem Polyakov
Day 2 14:50: Invited Talk: Ultra Ethernet Consortium (UEC) overview
527 views · 4 months ago
Speaker: Uri Elzur (Intel)
Day 2 14:20: Invited Talk: Chiplet Interconnect Test and Repair
112 views · 4 months ago
Speaker: Sreejit Chakraverty (Ampere Computing)
Day 2 14:00 - Nutanix
47 views · 4 months ago
Sponsor Videos
Day 2 14:00 - Lenovo
31 views · 4 months ago
Sponsor Videos
Day 2 13:00: OHIO: Improving RDMA Network Scalability in MPI_Alltoall
96 views · 4 months ago
Day 2 13:00: Demystifying the Communication Characteristics for Distributed Transformer Models
192 views · 4 months ago
Day 2 11:30: Invited Talk: Can Interconnects Keep up with AI? YES.
586 views · 4 months ago
Day 2 11:00: GigaIO
81 views · 4 months ago
Day 2 10:00: Keynote: Powering Llama 3: Peek into Meta’s Massive Infrastructure for Generative AI
416 views · 4 months ago
Day 2 09:10: Unified Collective Communication (UCC): A Unified Library for CPU, GPU, and DPU
210 views · 4 months ago
Day 2 09:10: Towards a Standardized Representation for Deep Learning Collective Algorithms
106 views · 4 months ago
Day 2 09:00: Welcome
23 views · 4 months ago
Day 3 10:00: OFI Libfabric, APIs for MPI & CCL
214 views · 4 months ago
Day 3 08:00 Linear Electro-Optical Interface
339 views · 4 months ago
Day 3 15:00: ASTRA-sim and Chakra: Co-design Exploration for Distributed Machine Learning Platforms
481 views · 4 months ago
Day 3 12:30 - Chiplet Interconnect Test and Repair
166 views · 4 months ago
Day 3 14:00 - Principles and Practice of Scalable and Distributed Deep Neural Networks
183 views · 4 months ago
Day 3 11:00 - Leveraging SmartNICs for HPC and Data Center Applications
217 views · 4 months ago
Day 3 08:00: High-Performance and Smart Networking Technologies for HPC and AI
210 views · 4 months ago

COMMENTS

  • @samyogdhital · 1 day ago

    ua-cam.com/video/FUsfQvSZEm4/v-deo.html

  • @UdayBhaskarprataapagiri · 21 days ago

    Thanks for the talk. Can you share the link for the handout?

  • @devonstart2758 · 1 month ago

    I met Bubba; indeed, I think I was at the Credit Suisse event, but I described him as 90% Roy Orbison and 5% Elvis. He was a super cool guy, and he was very much as you said, generous. I was a low-level advertising salesperson and he treated me like anyone else.

  • @uyyk2645 · 1 month ago

    Hello, awesome tutorial! But I cannot find the slides. The link on the first page redirects to another site. Where can I get them? Thank you.

  • @abhbhat2 · 2 months ago

    Very clearly explained the trends and challenges. 👍

  • @garyroey4445 · 3 months ago

    Speak to usa btc miners Corz cleanspark marathon

  • @黃奕鈞-p2q · 4 months ago

    Hello, are the PDF files of the handouts available? Thank you

  • @souravzzz · 4 months ago

    Excellent work! Very insightful.

  • @ZhengZhou-n8o · 5 months ago

    This video is fantastic for those who want to know the 101 of HPC networks; the key concepts are explained very well. The great comparison also helps a lot in forming an overview of the HPC network. Thanks to Prof. Panda and Prof. Subramoni.

  • @mIbrahim1981 · 11 months ago

    I didn't find the slides at the mentioned path. I'd appreciate it if you could share the right link here.

  • @mIbrahim1981 · 11 months ago

    Wonderful explanation of very interesting topics. Really great, thanks for this lecture.

  • @thoughtbox · 1 year ago

    It was mentioned that the DGX has an 8:1 BW taper, but that is not correct. Each DGX has 72 NVLink Network ports. I only mention it for clarification.

    • @jebtang · 7 months ago

      For the DGX GH200, each DGX chassis has 36 OSFP 400Gb ports, which correspond to 72 NVLink ports.

  • @brandydogish · 1 year ago

    From an ASIC SOC technical writer's perspective and as the producer of the Semiconductor Subject Matter Expert Database, your presentation is excellent. Ideal reference material for writing about silicon photonics. It was a good idea to introduce the photonic building blocks, fiber optics packaging fundamentals and associated technical issues. As you say, many in the ASIC domain are new to photonics. Lightmatter's approach, that is, replacing the 2D arrays of multiply-accumulate units used in present-day GPU-based natural-language AI processors with an optical equivalent, i.e. programmable Mach-Zehnder interferometer or photonic vector-matrix multiplication units, doubles down on the benefits of near-zero-latency photonic IO ports. I have been researching heterogeneous ASIC solutions with silicon photonic waveguide interconnect as an alternative to wire interconnect and as a way to get around silicon IO latency, power and delay obstacles. This has brought me to other photonic interconnection alternatives like microLEDs, MEMS mirrors, LiFi, nanowire LEDs, photodetector arrays, and lasers.

  • @levuong8077 · 1 year ago

    Thanks for sharing!

  • @kbgexplores · 1 year ago

    great presentation, very insightful and learned a lot, thanks

  • @xbsong8409 · 1 year ago

    How can I find the Slack channel? Thanks.

  • @TaekyungHeo · 2 years ago

    32:42 Impact of RoCE Congestion Control Policies on Distributed Training of DNNs (Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella and Tushar Krishna)

  • @jn6038 · 2 years ago

    Could you please share the source code? I am really looking forward to studying libfabric. Thanks.

  • @grasswater001 · 2 years ago

    In the middle, the video is broken while the audio is still there. Please check and fix it, thanks.

  • @logiclogic8526 · 2 years ago

    0:02 The HPE Cassini NIC # Keith Underwood, HPE
    35:39 GPU Scaling with Intel oneAPI Level Zero Peer-to-Peer Solution # Jaime Arteaga and Ravindra Babu Ganapathi - Intel
    1:02:31 Bunch of Wires: An Open and Versatile PHY Standard for Die-to-Die Interconnects # Elad Alon, Shahab Ardalan, Boris Murmann, Bapi Vinnakota and Venkata Satya Rao
    1:23:04 Synchronous and Low-Latency Die-to-Die Interface for the IBM z16™ Telum Processor # Chad Maquart, IBM

  • @Mr_ST_720 · 2 years ago

    Big fan, sir. One more question: could there be a traffic-steering approach on a CXL switch, like SDN at the CPU-to-device interconnect level?

  • @Mr_ST_720 · 2 years ago

    Sir, do you see memory devices becoming detached and made into separate memory boxes, like storage and routers, with CXL at the interconnect along with RDMA/NVMe at the network used in combination to build this future cloud infrastructure?

  • @scottschweitzer · 3 years ago

    I moderated this panel, and I was thrilled we had the following speakers:
    Cary Ussery (Marvell) - prefers the term DPU, Data Processing Unit
    Mario Baldi (Pensando) - likes to use DSC, Distributed Services Card
    Michael Kagan (NVIDIA) - they may have coined DPU
    Jim Dworkin (Intel) - IPU, Infrastructure Processing Unit
    Nick Ilyadis (Achronix) - SmartNIC
    Rip Sohan (Xilinx) - SmartNIC
    A full transcript of the panel is below, in a series of comments.

    • @scottschweitzer · 3 years ago

      Here are the timestamps and question summary, followed by a summary of answers:
      15:10 On queueing packets into accelerator memory:
      Cary - Hybrid approach of on-chip vs. off-chip. Data paths are being built with batch packet processing in mind.
      Jim - Temporal locality of the packets and flows to the compute engine, especially during initial flow setup.
      Mario - SmartNICs need to work at line rate; they should not queue packets, so most queueing is on the chip. Queueing may be more critical when working on packets at the messaging level, and then using off-chip memory is appropriate.
      Nick - Coming from an FPGA point of view, supporting high-speed external memory is critical for flexibility.
      Michael - Queues shouldn't be kept on a chip; use host memory, particularly on entry-level NICs. You need to design your NIC so it is balanced. The attached memory, DDR, is used on a DPU to retain context. On-chip, there should be caches that are as small as possible.
      Rip - Keep expensive on-chip memory as small as possible. There is a case for slower and larger-capacity memory; most systems will leverage tiered memory, where the host memory is the final tier. You're only as good as the primitives you have for sorting, managing, and moving data.

    • @scottschweitzer · 3 years ago

      27:35 How should classification and action on packets be handled?
      Jim - In any CPU architecture, you have caches, L1-L3-DDR, and as a result, added latency as you work on data; this is not the case with FPGAs. Based on the temporal locality of the data to the compute, you'll find that these SoC-based solutions take much longer to process packets.
      Mario - It should not be fixed logic, as requirements and needs change over time. Our approach is to use ASICs with software running on processor cores to manage packets; that way, things are reconfigurable.
      Nick - We want to provide the user with maximum flexibility, as we know that protocols keep evolving. The FPGA allows you to change the way the hardware is programmed.
      Michael - Most things should not be done in the CPU; the majority of the traffic should go through the fast path you've designed for. Once you have the experience, you can use the CPU, which is on the slow path, for the exceptions, when you need the flexibility to do something not found in the ASIC. It's a design that's balanced; the art is using CPUs with ASIC blocks for the best performance for what is required.
      Rip - Your most precious resource is your CPU, so by the time data reaches the CPU, your accelerators should have done everything they could. That way, the CPU can just do the thing that it's supposed to do.
      Cary - You don't want to do classification on a general-purpose CPU. It shouldn't be hard-coded logic, as too much is changing in the market; it should be something that is both flexible and programmable, almost on a minute-by-minute basis. Microcode engines, not CPUs.

    • @scottschweitzer · 3 years ago

      36:00 On-chip network bandwidth, what is your approach, and how is it adapting?
      Mario - The architecture has to be hierarchical; you cannot possibly have all components talking to all components. You'll need some coherent caches, and the design is not trivial; you may have more than one network on the chip. On-chip bandwidth has a huge impact on the performance of the DPU. The key is how you put together the various components, how tightly you integrate things.
      Nick - Flexibility is very important when tying the pieces together. We enable our customers so they can configure their architectures the way they want. We strive to offer the most flexible interconnect to meet their needs. At Achronix, our 2D NoC (Network on Chip) is finely tuned for this purpose.
      Michael - On the SmartNIC, the way you set up the on-chip network is much more straightforward than on FPGAs. The right caching and buffering structure is one of the key things to having a successful product.
      Rip - With a hierarchical structure, as clock speeds and volumes go up, you'll replicate more of the same components. We buy into that approach. What we aim to do with our architecture is to move data around faster; it turns out that half the time is spent moving data around. When things need to take a step up, make sure the core logic provides for that. Our challenge is: can we scale the way people write applications so that those applications scale up with increased traffic? A 10 GbE application is completely different from a 100 GbE application. The basic structure is the same; it's the optimizations that matter. It's not just about the hardware itself, it's about the software; the way you structure the software will ultimately influence how the hardware is designed.
      Cary - There isn't an on-chip network per se; there is a series of optimized sub-systems that are interconnected via a crossbar switch. Taking an SoC DPU approach means getting these interconnects optimized and done right is hard work.
      Jim - It's our job to figure out how to handle the internal plumbing. The strongest vendors in the market will be those offering customers the most choice. Working with hyperscalers, we've worked hard to tune systems to their specific configurations. We harden certain functions, like crypto; with that hardened data path, we can further tune and balance the system.

    • @scottschweitzer · 3 years ago

      46:05 Balancing what we can do versus what we should do in hard logic as opposed to soft logic or programming: which three DPU functions do you feel should be in hard logic?
      Nick - Interface logic like Ethernet and PCIe, as well as TCAM and a classifier.
      Michael - Classification, data processing, compression, DMA.
      Rip - Assuming PCIe and DMA are hard, with regard to applications: packet classification (L3-L4), crypto for both storage and networking functions, and basic packet-processing functions like LRO, checksum insert, and checksum verification; those three are must-haves.
      Cary - Crypto, IPSec, and, adding something different, work scheduling across multiple components on the chip.
      Jim - On-chip data movers, compression, and RDMA (for, say, iWARP), etc.

    • @scottschweitzer · 3 years ago

      49:25 Now that we're thinking beyond 100 or even 400 GbE, how do you intend to scale your architecture?
      Michael - 400 GbE is kind of history; we're working on 800 GbE. You scale vertically and horizontally. Vertically, you do things in hardware that used to be programmable. Concurrency and more sophisticated data handling.
      Rip - Increase bus width, increase line rates, take the most common soft logic functions and move them to hard logic. Adding more cores to handle greater packet rates.
      Cary - Optimizing sub-networks and data movers, diversification of that network to achieve higher rates.
      Jim - Packaging technologies to keep solutions efficient and low power, to keep power manageable. Software is important to make it easier for customers to move between different deployment models, on-prem to cloud, etc. Having worked at other companies on this panel, I'm not sure if some of the other panelists' solutions will scale as we go to 400 and beyond.
      Mario - All the devices are programmable; along the lines of software, there are tools: a compiler can make a huge difference. Also, the architecture and the way the components are integrated can make a huge difference.
      Nick - Our 2D NoC is very capable of 400G, scalable to 1.6 Tbps. So I believe having a high-speed network on the chip allows you to move data between parallel processing blocks to achieve these high data rates. It's not all about the speed of the interface but traffic management and shaping.
