Photon for Dummies: How Does this New Execution Engine Actually Work?

  • Published 25 Aug 2024

COMMENTS • 9

  • @lezwon
    @lezwon 10 months ago +4

    Wow! This was one of the best and most fun talks I've listened to in a long time. I loved how Holly simplified the entire talk so that even dummies like me can understand it. Kudos to her 👏 Great job starting from the basics of how Spark and the system work, and then relating it all to Photon.
    Thank you for the presentation Holly. This was very helpful. 🙏

  • @datasmithing_holly
    @datasmithing_holly 8 months ago +7

    Hi everyone! Thanks for watching this video. Unfortunately, the sources and credits were cut off at the end, so here they are if you would like to do any further reading.
    [Paper] Alexander Behm, Shoumik Palkar, Utkarsh Agarwal, Timothy Armstrong, David Cashman, Ankur Dave, Todd Greenstein, Shant Hovsepian, Ryan Johnson, Arvind Sai Krishnan, Paul Leventis, Ala Luszczak, Prashanth Menon, Mostafa Mokhtar, Gene Pang, Sameer Paranjpye, Greg Rahn, Bart Samwel, Tom van Bussel, Herman van Hovell, Maryann Xue, Reynold Xin, Matei Zaharia. 2022. Photon: A Fast Query Engine for Lakehouse Systems. ACM SIGMOD
    [Paper] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. ACM SIGMOD
    [Paper] Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, and Peter Boncz. 2018. Everything you always wanted to know about compiled and vectorized queries but were afraid to ask.
    [Lectures] CMU 15-721 Advanced Database Systems. 20 - Databricks Photon / Spark SQL, Andrew Pavlo
    [Book] Code: The Hidden Language of Computer Hardware and Software, Charles Petzold
    With special thanks to fact checkers and early reviewers: Alexander Behm, Sriram Krishnamurthy, Utkarsh Agarwal, Kent Marten, Tim Dikland, Grzegorz Rusin, Yassine Essawabi, Youssef Mrini, Erika Fonseca, Eoin O'Flanagan and Michael O'Kane

    • @wookiist
      @wookiist 23 days ago +1

      That was an amazing session. Thank you!

  • @rakeshreddy6630
    @rakeshreddy6630 11 months ago +3

    Holly Smith's voice is amazing.
    The explanation is delivered so effectively.

  • @allthingsdata
    @allthingsdata 9 months ago +2

    Fantastic! Probably gonna steal some slides for internal training.

  • @youssefb.7406
    @youssefb.7406 11 months ago +1

    Thanks a lot! It could be interesting to showcase the performance increase from Photon acceleration.

    • @datasmithing_holly
      @datasmithing_holly 8 months ago +3

      Hey Youssef, I toyed with the idea of including them, but the problem is that performance depends heavily on the workload, the feature coverage, and when the test is run. If I were cherry-picking, I would point to the 37x speedup for some text functions. On the other hand, not all workloads are Photon-isable, so it could make no difference whatsoever. In general, as of 2023 I'd expect to see a 2-3x speedup on a compatible workload, and by 2024 I'm anticipating 3-4x.
      Benchmarks can be useful, but what really matters are the ETL pipelines you're actually running. At 37:57 there's a list of good candidates to start with. I'd recommend testing Photon on those and seeing what kind of difference it makes.
      Happy testing!

  • @maximerivest3501
    @maximerivest3501 10 months ago

    Seems like a lot of the problems could have been solved by using Julia instead of Scala.

  • @ScienceMinisterZero
    @ScienceMinisterZero 9 months ago +2

    The JVM is for boomers, rewrite it in Rust.