Azure Cloud Data Engineer Mock Interview | Important Questions asked in Big Data Interviews| Pyspark

  • Published 16 May 2024
  • To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&su... for curated courses developed by me.
    I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
    Want to master SQL? Learn SQL the right way through the most sought-after course - the SQL Champions Program!
    "An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solving an unseen problem."
    Here is how you can register for the Program -
    Registration Link (Course Access from India) : rzp.io/l/SQLINR
    Registration Link (Course Access from outside India) : rzp.io/l/SQLUSD
    BIG DATA INTERVIEW SERIES
    This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.
    Our highly experienced guest interviewer, Umesh Kumar Roy, / umesh-kumar-roy , shares invaluable insights and practical guidance drawn from his extensive expertise in the Big Data domain.
    Our expert guest interviewee, Satyam Meena, / satyam-meena-0a1b46138 , has an interesting approach to answering the interview questions on Apache Spark, SQL, and Azure Cloud Services.
    Links to the free SQL & Python series developed by me are given below -
    SQL Playlist - • SQL tutorial for every...
    Python Playlist - • Complete Python By Sum...
    Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
    Social Media Links :
    LinkedIn - / bigdatabysumit
    Twitter - / bigdatasumit
    Instagram - / bigdatabysumit
    Student Testimonials - trendytech.in/#testimonials
    TIMESTAMPS : Questions Discussed
    00:50 Introduction
    02:10 What sources do you use for data ingestion?
    02:25 What connectors do you use for data ingestion?
    02:45 How do you store and transform data after ingestion?
    03:58 How are you preprocessing the data?
    04:41 How do you eliminate duplicate records?
    05:12 How do you ensure the correct records when handling duplicates?
    05:50 How is your storage layer designed? Do you use mounting techniques?
    06:04 Do you use delta files? Why?
    07:00 What optimization techniques have you implemented?
    08:05 Do you use partitions?
    08:24 What factors do you consider when partitioning?
    09:11 Do you use bucketing?
    09:36 What are the use cases for partitioning and bucketing?
    10:33 Besides broadcast joins, what other joins do you use?
    10:52 Which join is the most efficient?
    11:50 What is the difference between narrow and wide transformations?
    12:26 What is your understanding about Spark and Databricks?
    13:22 How do you consume data from the gold layer?
    14:42 How do you connect Power BI to Azure Synapse?
    15:46 Can you outline Spark architecture?
    17:07 What is a DAG?
    18:15 What is the difference between client mode and cluster mode?
    19:29 Have you faced any challenges with cluster mode?
    20:50 Why do DataFrames and Datasets exist?
    22:17 What do you understand by normalization?
    22:51 What other optimization techniques do you use?
    23:33 SQL query
    Music track: Retro by Chill Pulse
    Source: freetouse.com/music
    Background Music for Video (Free)
    Tags
    #mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

COMMENTS • 4

  • @gudiatoka
    @gudiatoka 1 month ago +5

    When someone says they are optimizing the code in Databricks, they're mostly faking it 😂😂.
    Spark itself optimizes your code using the Catalyst optimizer / Spark SQL engine, and since Spark 3.0, with Adaptive Query Execution (AQE), it also optimizes joins at runtime. We can also alter the broadcast threshold, which is usually handled by the admin team during Databricks cluster creation.
    The only things not covered by these two are what runs in user-defined code: UDFs and low-level programming on RDD operations, which hardly anyone does in Databricks nowadays. The last one is manual caching.
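    The AQE and broadcast-threshold settings the comment refers to can be sketched as Spark configuration, e.g. in spark-defaults.conf (the 50m values below are illustrative, not recommendations):

    ```
    # Enable Adaptive Query Execution (on by default since Spark 3.2)
    spark.sql.adaptive.enabled                     true
    # Runtime threshold AQE uses to convert sort-merge joins to broadcast joins
    spark.sql.adaptive.autoBroadcastJoinThreshold  50m
    # Planning-time broadcast threshold (default 10m; -1 disables broadcasting)
    spark.sql.autoBroadcastJoinThreshold           50m
    # Coalesce small shuffle partitions after each shuffle
    spark.sql.adaptive.coalescePartitions.enabled  true
    ```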

    • @SrihariSrinivasDhanakshirur
      @SrihariSrinivasDhanakshirur 1 month ago +3

      Not necessarily, there are a lot of other optimizations we can do at the resource level: partitioning, bucketing, etc.
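      The bucketing idea mentioned above can be sketched in plain Python. This is a toy model only: Spark actually hashes the bucket column with Murmur3 via `bucketBy(n, col)` at write time, while here Python's built-in `hash` and made-up rows stand in for illustration.

      ```python
      # Toy sketch of how bucketing assigns rows to a fixed number of buckets.
      # Spark uses Murmur3 hashing on the bucket column; hash() here is only
      # for illustration, and the rows are hypothetical.

      def bucket_id(key, num_buckets):
          """Deterministically map a join/filter key to one of num_buckets."""
          return hash(key) % num_buckets

      rows = [("alice", 100), ("bob", 200), ("alice", 150), ("carol", 300)]
      num_buckets = 4

      buckets = {}
      for user, amount in rows:
          buckets.setdefault(bucket_id(user, num_buckets), []).append((user, amount))

      # All rows with the same key land in the same bucket, so two tables
      # bucketed identically on the join key can be joined without a shuffle.
      assert bucket_id("alice", num_buckets) == bucket_id("alice", num_buckets)
      ```

      Because both sides of a join agree on bucket boundaries, matching keys are already co-located, which is exactly the shuffle Spark's bucketed tables avoid.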

    • @LearnifyTvKannada-ue6op
      @LearnifyTvKannada-ue6op 19 days ago

      @@SrihariSrinivasDhanakshirur exactly, there are a lot of other optimisations

  • @hdr-tech4350
    @hdr-tech4350 13 days ago

    Source type, project discussion
    Handling duplicates
    Delta Lake features
    Spark vs Databricks
    Power BI connect to Synapse
    Spark architecture
    DAG
    Client mode vs cluster mode
    DataFrame vs Dataset
    Normalisation
    2nd highest salary in each department
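    The last item in the list, second highest salary per department, is a classic interview query. A minimal sketch using Python's built-in sqlite3 for illustration; the same DENSE_RANK approach works in Spark SQL, and the table, names, and salaries below are hypothetical:

    ```python
    import sqlite3

    # Hypothetical employees table; in the interview this would be a Spark SQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [("a", "IT", 90), ("b", "IT", 80), ("c", "IT", 70),
         ("d", "HR", 60), ("e", "HR", 50)],
    )

    # DENSE_RANK handles ties; rank 2 is the second-highest distinct salary.
    rows = conn.execute("""
        SELECT dept, name, salary
        FROM (
            SELECT dept, name, salary,
                   DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
            FROM emp
        ) AS ranked
        WHERE rnk = 2
        ORDER BY dept
    """).fetchall()

    print(rows)  # → [('HR', 'e', 50), ('IT', 'b', 80)]
    ```

    DENSE_RANK is usually preferred over ROW_NUMBER here because ties on the top salary still leave rank 2 pointing at the second-highest distinct value.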