Alfonso Roa Redondo - Fast and furiously safe: type safe programming with Spark DataFrames

Поділитися
Вставка
  • Опубліковано 7 лип 2024
  • The all-popular Big-data programming framework Spark poses a major dilemma to programmers: if you stand for performance, then DataFrames are the option; if you strive for modularity then you must sacrifice some performance and stick to Datasets. This is too bad ... and unnecessary! We present doric, a new open-source Scala library that offers type-safety in DataFrame column expressions at a minimum cost, without compromising performance, and with minimum changes to your accustomed syntax. The talk guides the listener through some common problems that we commonly encounter with Spark DataFrames, and shows how doric helps us overcome them: Run DataFrames only when it is safe to do so.
    Spark will detect references to non-existing columns and complain about that except at runtime. However, Spark won't be able to detect references to existing columns of wrong types since type expectations are not encoded in plain columns. Using doric, we can prevent the creation of the DataFrame, since column expressions are typed.
    Get rid of malformed column expressions at compile time. Spark complains if we try to add integers with booleans, but it complains too late, with an exception raised at runtime. Using doric, there is no need to wait for so long: errors will be reported at compile-time!
    Get all errors at once.
    Unlike Spark, doric won't stop at the first error: it will keep accumulating problems until no further one is found. And with a hint to the exact location of errors!
    Modularize your business logic.
    Get rid of boilerplate code and copy-paste! Doric's combination of error location and type safety is the key enabler of a principled approach to writing libraries of column expressions and enhancing the modularity of our DataFrames.
    2022.scala.love
    / scala_love
  • Наука та технологія

КОМЕНТАРІ • 1

  • @isaias-b
    @isaias-b 2 роки тому

    It was a pleasure to be participating the talk live back then. Thanks for uploading it here!!!