GrayMatter PySpark Interview Question - Get null count of all columns
- Published 15 Sep 2024
- A PySpark interview question recently asked in a GrayMatter interview.
We need to get the null count of every column in a DataFrame.
Let's see how we can achieve this using PySpark.
The DataFrame details are given here:
data = [(1, None, 'ab'),
(2, 10, None),
(None, None, 'cd')]
columns = ['col1', 'col2', 'col3']
df = spark.createDataFrame(data, columns)
For more Azure Databricks interview questions, check out our playlist.
• DataBricks and PySpark...
Contact us:
info@cloudchallengers.com
Follow us on
Instagram : cloudchallengers
Facebook : cloudchallengers
LinkedIn : linkedin.com/company/cloudchallengers
My SQL solution -
select
sum(case when col1 is null then 1 else 0 end) as col1,
sum(case when col2 is null then 1 else 0 end) as col2,
sum(case when col3 is null then 1 else 0 end) as col3
from temp;
@adityavamsi12, Thanks for sharing the SQL query.
select sum(case when col1 is null then 1 else 0 end) as col1,
sum(case when col2 is null then 1 else 0 end) as col2,
sum(case when col3 is null then 1 else 0 end) as col3
from input_df;
@dasubabuch1596, Thanks for sharing.
My SQL solution -
SELECT COUNT(*)-COUNT(COL1) AS COL1,
COUNT(*)-COUNT(COL2) AS COL2,
COUNT(*)-COUNT(COL3) AS COL3
FROM SAMP_DF;
@Sachin_Sambare, Thanks for sharing
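The COUNT(*) - COUNT(col) form works because COUNT(col) skips NULLs while COUNT(*) counts every row. A quick way to check that it agrees with the SUM/CASE form is sketched below. It uses Python's built-in sqlite3 as a stand-in engine (an assumption for illustration; the comments presumably ran against a Spark temp view) with the same three sample rows.

```python
import sqlite3

# Assumption: an in-memory SQLite table standing in for the SAMP_DF view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samp_df (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO samp_df VALUES (?, ?, ?)",
    [(1, None, 'ab'), (2, 10, None), (None, None, 'cd')],
)

# COUNT(col) ignores NULLs, so the difference from COUNT(*) is the null count.
count_diff = conn.execute(
    """SELECT COUNT(*) - COUNT(col1),
              COUNT(*) - COUNT(col2),
              COUNT(*) - COUNT(col3)
       FROM samp_df"""
).fetchone()

# The SUM/CASE form counts nulls directly.
case_sum = conn.execute(
    """SELECT SUM(CASE WHEN col1 IS NULL THEN 1 ELSE 0 END),
              SUM(CASE WHEN col2 IS NULL THEN 1 ELSE 0 END),
              SUM(CASE WHEN col3 IS NULL THEN 1 ELSE 0 END)
       FROM samp_df"""
).fetchone()

print(count_diff)  # (1, 2, 1)
print(case_sum)    # (1, 2, 1)
conn.close()
```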
Superb explanation bro 👌 👏