2nd Data Engineering Interview | Apache Spark Interview | Live Big Data Interview

  • Published 27 Oct 2024

COMMENTS • 63

  • @priyankadhamija886
    @priyankadhamija886 3 years ago +3

    I have seen so many videos, but yours is the best on all topics. It is precise and covers almost all the interview questions.

    • @DataSavvy
      @DataSavvy  3 years ago

      Thanks Priyanka... I am happy that you like it

  • @nikhilv199138
    @nikhilv199138 4 years ago +17

    These types of videos are extremely helpful. If you could prepare a video about Scala interview questions, that would be a great help!!

    • @DataSavvy
      @DataSavvy  4 years ago +4

      Sure Nikhil... That is already in the plan. It is just difficult to find volunteers for mock interviews.

    • @somasundaramvalliappan3851
      @somasundaramvalliappan3851 4 years ago

      Can anyone please help me with sample resumes for Scala? It's very hard for me to find Scala resumes on the internet.

  • @DoraChintala
    @DoraChintala 1 year ago

    Some scenarios for dropping a schema but not the data are: 1) reorganizing the database structure, 2) cleaning up unused schemas, 3) rebuilding the schema from scratch.

  • @atanu4321
    @atanu4321 4 years ago +16

    Good initiative, Data Savvy. This will be really helpful for those who are preparing for interviews. One suggestion: could you create a follow-up video explaining which of the candidate's answers were good or wrong, and what the candidate should have answered to be better received by the interviewer? A kind of analysis of this interview.

    • @DataSavvy
      @DataSavvy  4 years ago +8

      Thanks Atanu... I will plan for that... Your suggestion is very valuable

    • @nikhilv199138
      @nikhilv199138 4 years ago +2

      Exactly

    • @DataSavvy
      @DataSavvy  4 years ago +3

      Point noted... If any of you can volunteer, it will help me create these kinds of videos...

    • @omkarjoshi3750
      @omkarjoshi3750 4 years ago

      @DataSavvy hello sir,
      I am interested in volunteering, but I am a fresher (2020 pass-out). If that is OK, I can volunteer.

    • @manojkumar-oc1sp
      @manojkumar-oc1sp 4 years ago

      @DataSavvy I am interested in volunteering.

  • @graceindia3122
    @graceindia3122 3 years ago +3

    Great video. But the candidate seems to have ETL Informatica developer experience, not data engineer experience. He was not able to answer the major Spark questions. 😀 But good initiative, Data Savvy; it helps me test my knowledge of Spark and big data, and I am able to answer many of the questions.

  • @ankurrunthala
    @ankurrunthala 3 years ago +1

    I wish I had a senior like you so I could learn more ❤️ Nice questions ⁉️ ... Sir, you should provide the answers as well, mostly for the problematic ones ❤️❤️❤️❤️

  • @johnsonrajendran6194
    @johnsonrajendran6194 4 years ago +1

    I found this video really helpful, sir... Please create more such videos 🙏

  • @Chittaluri
    @Chittaluri 4 years ago +5

    Thanks a lot, team, especially Harjeet. This video boosted my confidence for interviews. Please post more interview videos.

    • @DataSavvy
      @DataSavvy  4 years ago +2

      Thanks Sai... Yes I plan to create more videos

  • @DoraChintala
    @DoraChintala 1 year ago

    Based on the problem statement, my answer would be: first, use Apache Kafka or Amazon Kinesis to handle the streaming data and land it in AWS S3, since S3 acts as a data lake. After that, use Apache Spark for the essential data processing, and then ingest the data into AWS Redshift or any other data warehouse, using AWS Glue for ETL.

  • @ansarhayat6276
    @ansarhayat6276 3 years ago +6

    1. What are your common current tasks at work?
    2. What type of problems have you faced in your current task?
    3. When loading data into a data lake, what challenges do you face?
    4. How do you handle incremental data, by batch or stream? What is the size of the data processed daily?
    5. Scenario: sales data grouped by product category per hour. The report needs results that are half historical and half real-time data.
    6. Which tools could be used for the above scenario? Kafka, Event Hub.
    7. How do you do transformations in Kafka?
    ****Hive****
    8. What are the key differences between Hive external and internal tables? Give a use case.
    9. When do you use static vs. dynamic partitioning in a Hive table?
    10. For a daily transactional table with year and date columns, where we can use either of them, is it static or dynamic partitioning?
    Solution: we partition on the date column, which is dynamic. Each day's data is placed in a daily partition.
    ***Spark***
    11. Which language do you use in Spark, and why? Describe its benefits.
    12. Do you use DataFrames and Datasets? Any runtime errors in a DataFrame/Dataset? Give an example.
    13. Spark encoders?
    14. For 1 TB of data processed by Spark, how do you distribute memory across cores, driver, and executors?
    15. What is the difference between a Scala case class and a regular class?
    ***DB***
    16. Have you worked with any non-relational DB?
    17. Given a table with three columns, show one row of data using grouping.
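    CREATE DATABASE big_data;
    USE big_data;
    CREATE TABLE user_info
    (user_name NVARCHAR(255), user_age INT, user_loc NVARCHAR(255));
    INSERT INTO user_info (user_name, user_age, user_loc) VALUES ('ansar',30,'bang'), ('ansar',30,'fsk');
    -- All rows: two rows for the same user, differing only in location
    SELECT * FROM user_info;
    -- One row per user, without the location
    SELECT DISTINCT user_name, user_age FROM user_info;
    -- One row per user with a single location; user_loc must be aggregated
    -- because it is not in the GROUP BY list
    SELECT user_name, user_age, MAX(user_loc) AS user_loc FROM user_info
    GROUP BY user_name, user_age;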

    • @0yustas0
      @0yustas0 3 years ago +2

      Just for fun with Hive:
      SET hivevar:rnd = CAST(ROUND(RAND()) AS INT);
      SELECT user_name,user_age,collect_list(user_loc)[${hivevar:rnd}] AS c1, MAX(user_loc) AS c2
      FROM user_info
      GROUP BY user_name,user_age;
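
For readers who want to try the grouping question from this thread without a Hive or SQL Server setup, here is a minimal sketch in Python. It assumes SQLite (through the standard `sqlite3` module) as a stand-in engine, which is not what the commenters used. The function name `one_row_per_user` is hypothetical. `MAX(user_loc)` serves as a deterministic way to keep a single location per group; a truly random pick, as in the Hive reply above, would need an engine-specific function.

```python
import sqlite3


def one_row_per_user(rows):
    """Return one (name, age, location) row per (name, age) group.

    SQLite is an assumption made so the sketch runs anywhere;
    the table layout mirrors user_info from the comment above.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_info (user_name TEXT, user_age INTEGER, user_loc TEXT)"
    )
    conn.executemany("INSERT INTO user_info VALUES (?, ?, ?)", rows)
    # user_loc is not in the GROUP BY list, so it is aggregated with MAX
    # to collapse each group to exactly one location.
    result = conn.execute(
        """
        SELECT user_name, user_age, MAX(user_loc) AS user_loc
        FROM user_info
        GROUP BY user_name, user_age
        ORDER BY user_name, user_age
        """
    ).fetchall()
    conn.close()
    return result


sample = [("ansar", 30, "bang"), ("ansar", 30, "fsk")]
print(one_row_per_user(sample))  # [('ansar', 30, 'fsk')]
```

Swapping `MAX` for `MIN` would keep the other location; either way each (name, age) pair comes back as a single row.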

  • @rahulmaheshwari5582
    @rahulmaheshwari5582 3 years ago +1

    Very informative. Thanks for the video. 🙏

  • @mamamiakool
    @mamamiakool 3 years ago +3

    Hi Harjit, you are doing a great job for the community. Is there a way I can connect with you on LinkedIn or via email?
    Also, do you plan to conduct similar interviews about Spark Streaming/Kafka?

  • @bhavaniv1721
    @bhavaniv1721 3 years ago

    Thank you so much for sharing these kinds of videos. I really understand now how interviews happen 🙏

  • @anshusharaf2019
    @anshusharaf2019 8 months ago

    For this scenario-based question, can we create an end-to-end pipeline using Kafka and a Power BI dashboard? For example, we can connect to the database with a source connector, use ksqlDB for the business-level transformations, store the result in a Kafka topic, and then connect Power BI to it for the dashboard.
    @dataSavvy or someone, can you check whether my thinking is right?

  • @raviyadav-dt1tb
    @raviyadav-dt1tb 10 months ago

    Can you please provide AWS questions for data engineers? It will be helpful for us, thanks 🙏

  • @vibhavaribellutagi9439
    @vibhavaribellutagi9439 4 years ago

    Really helpful. thanks a lot for the video.

  • @phanidbd7284
    @phanidbd7284 4 years ago +2

    Great, thanks... Can you please create a video with answers to these questions? It would really help... Or add your comments at the end of the video.

    • @DataSavvy
      @DataSavvy  4 years ago +5

      That's a good suggestion... Let me look into this

  • @prachigupta7688
    @prachigupta7688 4 years ago +3

    In the last question, to combine emp data on name and age and select a random location, if we use groupBy and collect_list, won't it create a list of all locations for each group of emp name and age? Wouldn't another function like max, first, etc. help in this scenario?

  • @yelururao1
    @yelururao1 3 years ago

    Hi sir..
    Please do more videos like this..

  • @usharani7125
    @usharani7125 2 years ago

    Harjit, please let me know if you are taking any training sessions.

  • @sachinchandanshiv7578
    @sachinchandanshiv7578 2 years ago

    Hi Sir,
    How important is it to know Snowflake for a big data engineer?

  • @rajasekharreddy7624
    @rajasekharreddy7624 4 years ago

    Hi DataSavvy, please let me know when you are free so we can discuss a mock interview for me.

  • @anirbandatta2037
    @anirbandatta2037 3 years ago

    Hi @Data Savvy, can you plan a senior-level interview, maybe with people who have more than 16-20 years of experience?

  • @rashmidogra7792
    @rashmidogra7792 3 years ago

    What is the SQL question he is asking? I could not understand it completely.

  • @ashokkodari5042
    @ashokkodari5042 4 years ago +1

    What would be the use case for dropping the schema instead of truncating the complete table? Is it only to restore the data in the future, or is there another major reason?

    • @RakeshGupta23
      @RakeshGupta23 4 years ago +3

      The major use case for dropping the schema, or for creating an external table, is when your storage area is outside Hadoop, e.g. the client wants the data stored in S3, or the data is stored in MongoDB.

    • @manojkumar-oc1sp
      @manojkumar-oc1sp 4 years ago +1

      @RakeshGupta23 Thanks bro. One more question: what will happen if we delete the external table's file folder?

    • @DataSavvy
      @DataSavvy  4 years ago +2

      This is usually done when more than one team is consuming the same data and they also use different technologies to consume it.

    • @RakeshGupta23
      @RakeshGupta23 4 years ago +2

      @manojkumar-oc1sp You mean you are keeping the external table schema but deleting the folder and files from HDFS? In that case you won't be able to access the data. It may look like a database or table when you create it, but under the hood in HDFS it's always a file or folder.

    • @ashokkodari5042
      @ashokkodari5042 4 years ago

      @RakeshGupta23 Thanks, Rakesh, for your quick response.

  • @yadlapallipriyanka9000
    @yadlapallipriyanka9000 4 years ago

    Hi Harjeet... I am Priyanka and I would like to volunteer for a mock interview on Big Data.

  • @nitinm1473
    @nitinm1473 4 years ago +1

    What is the function for grouping on distinct values and selecting a random value from another column?

    • @ashutoshsamanta4244
      @ashutoshsamanta4244 4 years ago +1

      You can use first()

    • @prabhaker9031
      @prabhaker9031 3 years ago

      @ashutoshsamanta4244 Ah thanks man. I was trying to put a max filter and whatnot

  • @ghumredhanu6381
    @ghumredhanu6381 2 years ago

    Sir, can you make a Kafka community?

  • @Iamsatya_1
    @Iamsatya_1 4 years ago

    Can we solve the sales problem using classification? I.e., we can train on our historical data with logistic regression and then predict the sales value using an evaluate function on new data.

  • @paul4367
    @paul4367 4 years ago +1

    Can you invite some data analysts for mock interviews too, please?

    • @DataSavvy
      @DataSavvy  4 years ago +1

      I am finding it difficult to get volunteers... Let me explore that

    • @ashleylemos3977
      @ashleylemos3977 3 years ago

      @DataSavvy I would love to give mock interviews on all of your data engineering questions, in case you are looking for candidates with 12+ years of experience in PySpark, AWS, Spark SQL, Jenkins CI/CD, Glue, Kafka, Python, Hive, Athena, Presto, Bash, Airflow, NiFi.

  • @nikhilmishra7572
    @nikhilmishra7572 4 years ago

    @30.02 what would be the solution? Having count(*)>1 after group by?
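
As a rough illustration of what `HAVING COUNT(*) > 1` after a `GROUP BY` actually returns, here is a small Python sketch. It assumes the question at that timestamp is the same `user_info` grouping problem discussed elsewhere in the comments, which the thread does not confirm. SQLite via `sqlite3`, the function name `duplicated_groups`, and the extra `other_user` row are all assumptions made for the demonstration.

```python
import sqlite3


def duplicated_groups(rows):
    """Return (name, age, row_count) for groups that occur more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_info (user_name TEXT, user_age INTEGER, user_loc TEXT)"
    )
    conn.executemany("INSERT INTO user_info VALUES (?, ?, ?)", rows)
    # HAVING filters whole groups after aggregation, so groups with a
    # single row are dropped from the result entirely.
    result = conn.execute(
        """
        SELECT user_name, user_age, COUNT(*) AS row_count
        FROM user_info
        GROUP BY user_name, user_age
        HAVING COUNT(*) > 1
        ORDER BY user_name, user_age
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    ("ansar", 30, "bang"),
    ("ansar", 30, "fsk"),
    ("other_user", 25, "xyz"),  # hypothetical row, not from the thread
]
print(duplicated_groups(sample))  # [('ansar', 30, 2)]
```

So `HAVING COUNT(*) > 1` is a way to find duplicated (name, age) pairs. If the goal is one row for every user, it would leave out users that appear only once, and a plain `GROUP BY` with an aggregate on the remaining column is closer to what is needed.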

  • @uditsethia7
    @uditsethia7 2 years ago

    LAMBDA ARCHITECTURE

  • @awanishkumar6308
    @awanishkumar6308 3 years ago

    The sir in the red t-shirt is making the interview questions very uninteresting, even though Spark itself is very interesting in terms of its concepts and working principles. Sorry, but you are ruining the interest in learning.