Building Machine Learning Systems for a Trillion Trillion Floating Point Operations
- Published Dec 13, 2024
- Over the last 10 years we've seen Machine Learning consume everything, from the tech industry to the Nobel Prize, and yes, even the ML acronym. This rise in ML has been accompanied by an unprecedented infrastructure buildout, with the training of Llama 3 now reaching 4e25 floating point operations, or 40 yottaFLOPs, or 40 trillion trillion floating point operations.
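As a rough sanity check on that number, dense transformer training compute is commonly approximated as about 6 FLOPs per parameter per training token. The sketch below uses the publicly reported Llama 3 405B figures (roughly 405B parameters and 15.6T tokens) as assumptions and lands in the same ballpark as 4e25.

```python
# Rough sanity check: training FLOPs ~= 6 * parameters * tokens for dense transformers.
# The parameter and token counts below are the publicly reported Llama 3 405B figures,
# used here as assumptions for illustration.

params = 405e9    # ~405 billion parameters
tokens = 15.6e12  # ~15.6 trillion training tokens

train_flops = 6 * params * tokens
print(f"~{train_flops:.1e} FLOPs")                     # ~3.8e+25 FLOPs
print(f"~{train_flops / 1e24:.0f} trillion trillion")  # ~38 trillion trillion
```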
To build these ML models, you need ML systems, like PyTorch. In this talk, Horace will (attempt to) answer:
- How have ML systems evolved over time to meet the training needs of ML models? How does building ML systems differ from building regular systems?
- How do we get the most out of a single GPU? What's the point of compilers if we're just training a single model?
- And what is the right way to think about scaling to tens of thousands of GPUs?