Berkeley Data Analytics Stack: Present and Future - Michael Franklin - Technion lecture
- Published Dec 22, 2024
- The Berkeley Data Analytics Stack Present and Future
Lecture given on March 27, 2014, at the Technion Computer Engineering Center.
Henry Taub Distinguished Visitor, Michael Franklin, is the Thomas M. Siebel Professor of Computer Science at UC Berkeley.
The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launched in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical and intellectual barriers that had arisen during decades of evolutionary development. The vision of the lab is to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (such as machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (individually as analysts and collectively as crowds). The lab is realizing its ideas through the development of a freely available open-source software stack called BDAS: the Berkeley Data Analytics Stack. In the three years the lab has been in operation, we've released major components of BDAS, several of which have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS shows up prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support advanced analytics and to make people a full-fledged resource for making sense of Big Data.
In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized ourselves to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS, with an emphasis on our newest efforts, including some or all of the following: the GraphX graph processing system, the MLBase machine learning platform, and the SampleClean framework, which combines sampling with hybrid human/computer data cleaning. Finally, I will present our current view of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question, meeting time, cost, and quality requirements throughout the analytics lifecycle.