GrayMatter Pyspark Interview Question - Get null count of all columns

  • Published 15 Sep 2024
  • A PySpark interview question recently asked in a GrayMatter interview:
    we need to get the null count of all columns in a DataFrame.
    Let's see how we can achieve this using PySpark.
    The DataFrame details are:
    data = [(1, None, 'ab'),
    (2, 10, None),
    (None, None, 'cd')]
    columns = ['col1', 'col2', 'col3']
    df = spark.createDataFrame(data, columns)
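    One straightforward way to do this (a minimal sketch, not necessarily the exact code shown in the video) is to count, per column, the rows where that column is null:
    from pyspark.sql.functions import col, count, when

    # when(col(c).isNull(), 1) yields 1 for null rows and null otherwise;
    # count() ignores nulls, so this gives the null count per column.
    null_counts = df.select([count(when(col(c).isNull(), 1)).alias(c) for c in df.columns])
    null_counts.show()
    # Expected output:
    # +----+----+----+
    # |col1|col2|col3|
    # +----+----+----+
    # |   1|   2|   1|
    # +----+----+----+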
    For more Azure Databricks interview questions, check out our playlist:
    • DataBricks and PySpark...
    Contact us:
    info@cloudchallengers.com
    Follow us on
    Instagram : cloudchallengers
    Facebook : cloudchallengers
    LinkedIn : linkedin.com/company/cloudchallengers

COMMENTS • 8

  • @adityavamsi12 · a month ago · +3

    My SQL solution -
    select
    sum(case when col1 is null then 1 else 0 end) as col1,
    sum(case when col2 is null then 1 else 0 end) as col2,
    sum(case when col3 is null then 1 else 0 end) as col3
    from temp;

    • @CloudChallengers · a month ago · +1

      @adityavamsi12, thanks for sharing the SQL query.
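    The query above can be run as-is from PySpark by first registering the DataFrame from the description as a temp view named temp (the table name the query assumes); a minimal sketch:
    df.createOrReplaceTempView("temp")
    spark.sql("""
        select
        sum(case when col1 is null then 1 else 0 end) as col1,
        sum(case when col2 is null then 1 else 0 end) as col2,
        sum(case when col3 is null then 1 else 0 end) as col3
        from temp
    """).show()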

  • @dasubabuch1596 · a month ago · +2

    select sum(case when col1 is null then 1 else 0 end) as col1,
    sum(case when col2 is null then 1 else 0 end) as col2,
    sum(case when col3 is null then 1 else 0 end) as col3
    from input_df;

  • @Sachin_Sambare · a month ago · +2

    My SQL solution -
    SELECT COUNT(*)-COUNT(COL1) AS COL1,
    COUNT(*)-COUNT(COL2) AS COL2,
    COUNT(*)-COUNT(COL3) AS COL3
    FROM SAMP_DF;
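    The same COUNT(*) - COUNT(col) idea translates directly to the DataFrame API (a minimal sketch, assuming the df from the description): count(lit(1)) counts every row, count(c) counts only non-null values, and the difference is the null count.
    from pyspark.sql.functions import count, lit

    # All rows minus non-null rows = null rows, per column.
    df.select([(count(lit(1)) - count(c)).alias(c) for c in df.columns]).show()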

  • @sravankumar1767 · a month ago · +1

    Superb explanation bro 👌 👏