PySpark for SQL developers - statistical functions - Part 1

  • Published Oct 14, 2024

COMMENTS • 3

  • @houstonfirefox
@houstonfirefox 2 months ago

    Very good video. I would recommend ensuring all syntax errors are edited out or re-recorded so the viewer doesn't get confused.
    The channel name (Data Engineering Toolbox) is a bit confusing, as these function comparisons between SQL Server and PySpark fall under the realm of Data Science. A Data Engineer moves, converts, and stores data from system to system, whereas a Data Scientist extracts and interprets the data provided by the Data Engineer. A small point, to be sure, but I wanted to be accurate.
    In Variance: The avg_rating column returned integer values because the underlying column "review_score" was also an integer. To get the PySpark equivalent of a floating-point avg_rating, you could change the column type to FLOAT (unnecessary, really) or use CONVERT(FLOAT, VAR(review_score)) to return the true (more accurate) variance, complete with decimal places.
    New sub. I am interested to see even more Data Science equivalent functions in SQL Server that may be native (e.g. CORR()) and how to write functions that emulate some of the functionality in PySpark 🙂
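    [Editor's note: the integer-average pitfall described in the comment above can be sketched in plain Python. This is a minimal illustration, not SQL Server or PySpark code; the scores list is hypothetical, and "review_score" is the column name mentioned in the comment.]

    ```python
    # Hypothetical integer review_score values, as in the video's example table.
    scores = [4, 5, 3, 4, 5]

    # Floor division mirrors what an integer-typed aggregate produces:
    # the fractional part of the average is lost.
    int_avg = sum(scores) // len(scores)    # 21 // 5 -> 4

    # True division mirrors casting to FLOAT first (CONVERT(FLOAT, ...) in
    # T-SQL) or PySpark's avg(), which returns a double: decimals survive.
    float_avg = sum(scores) / len(scores)   # 21 / 5 -> 4.2

    print(int_avg, float_avg)
    ```

    The same reasoning applies to variance: computing it over values that have already been truncated to integers propagates the error, which is why casting the input (or the result) to a floating-point type gives the more accurate figure the commenter describes.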

  • @leonmason9141
@leonmason9141 1 year ago

    *promo sm*