EfficientML.ai Lecture 19 - Distributed Training Part 1 (Zoom Recording) (MIT 6.5940, Fall 2024)
- Published Dec 10, 2024
Lecture 19: Distributed Training Part 1
Instructor: Prof. Song Han
Slides: efficientml.ai
Outline:
Background and motivation
Parallelization methods for distributed training
Data parallelism
Communication primitives
Reducing memory in data parallelism: ZeRO-1/2/3 and FSDP
Pipeline parallelism
Tensor parallelism
Sequence parallelism