Question 14: Interview question for data engineers

  • Published 16 Sep 2024
  • In this video I discuss a question asked in an MNC interview for a data engineer role, which checks whether the interviewee has worked with JSON files.
    You are tasked with processing a JSON file containing information about sales transactions. Each transaction record consists of the transaction ID, the customer ID, the product ID, the quantity sold, and the timestamp of the transaction. Your goal is to analyze this data using PySpark and perform the following tasks:
    Calculate the total sales revenue generated from each product.
    Identify the top-selling product.
    Determine the total number of transactions for each customer.
    Find the customer who made the most transactions.
    Sample JSON
    [
    {"transaction_id": 1, "customer_id": 101, "product_id": 1, "quantity": 2, "timestamp": "2024-01-01 08:00:00"},
    {"transaction_id": 2, "customer_id": 102, "product_id": 2, "quantity": 1, "timestamp": "2024-01-01 08:30:00"},
    {"transaction_id": 3, "customer_id": 103, "product_id": 1, "quantity": 3, "timestamp": "2024-01-01 09:00:00"},
    {"transaction_id": 4, "customer_id": 101, "product_id": 3, "quantity": 1, "timestamp": "2024-01-01 10:00:00"},
    {"transaction_id": 5, "customer_id": 102, "product_id": 1, "quantity": 2, "timestamp": "2024-01-01 10:30:00"},
    {"transaction_id": 6, "customer_id": 103, "product_id": 2, "quantity": 2, "timestamp": "2024-01-01 11:00:00"}
    ]
    To create the DataFrame (a full worked sketch covering the four tasks follows below):
    sales_df = spark.read.option("multiline",True).json("dbfs:/FileStore/transaction.json")
    #pyspark #mnc #dataengineer #azure #databricks #interview #questions #bigdata #bigdataquestions #json
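    A minimal PySpark sketch for the four tasks, using the same read call as above. Note the sample JSON has no price field, so the revenue step assumes a hypothetical prices_df lookup table; the product prices and the app name are illustrative assumptions, not from the video.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("SalesTransactions").getOrCreate()

    # Read the JSON array; "multiline" is needed because each record spans the file as a list
    sales_df = spark.read.option("multiline", True).json("dbfs:/FileStore/transaction.json")

    # Hypothetical price lookup -- the sample JSON contains no price column
    prices_df = spark.createDataFrame(
        [(1, 10.0), (2, 25.0), (3, 40.0)],
        ["product_id", "price"],
    )

    # 1. Total sales revenue per product (quantity * assumed price)
    revenue_df = (
        sales_df.join(prices_df, "product_id")
        .groupBy("product_id")
        .agg(F.sum(F.col("quantity") * F.col("price")).alias("total_revenue"))
    )

    # 2. Top-selling product by total quantity sold
    top_product = (
        sales_df.groupBy("product_id")
        .agg(F.sum("quantity").alias("total_quantity"))
        .orderBy(F.desc("total_quantity"))
        .limit(1)
    )

    # 3. Total number of transactions per customer
    customer_txns = sales_df.groupBy("customer_id").agg(
        F.count("transaction_id").alias("txn_count")
    )

    # 4. Customer who made the most transactions
    top_customer = customer_txns.orderBy(F.desc("txn_count")).limit(1)

    revenue_df.show()
    top_product.show()
    customer_txns.show()
    top_customer.show()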

COMMENTS • 2

  • @rawat7203 6 months ago +1

    Thank you sir

    • @pysparkpulse 6 months ago

      Thank you for your appreciation 😊