How to read json file in PySpark dataframe | convert json to pyspark df

  • Published 5 Oct 2024
  • In this PySpark tutorial for beginners, I explain how to read a JSON (JavaScript Object Notation) file into a PySpark DataFrame using Google Colab. The steps and PySpark syntax for converting a JSON file to a DataFrame work anywhere: the spark.read.json() example in this video can be run on any Apache Spark and Python platform, including notebooks such as Azure Databricks, Databricks, Jupyter Notebook, and Kaggle.
    #pyspark #googlecolab #pandas #jupyternotebook #databricks
    Following the same code is enough to read a .json file with PySpark in Databricks or Jupyter Notebook. The JSON file name, the path, and any other arguments passed to spark.read.json() will vary with your own project requirements.
    PySpark offers other ways to read JSON files, but in this video I demonstrate the most basic PySpark commands. The same task can also be done in Python with the pandas library; reading a JSON file in Google Colab with pandas needs a few minor changes, which I will cover in a separate video.
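As a rough illustration of the "other ways" mentioned here, the sketch below reads the same records once with spark.read.json() and once with the generic spark.read.format("json") reader plus the multiLine option. The file names, the sample records, and the helper names write_sample_files and read_json_variants are made up for this example and are not from the video; the Spark part assumes you already have a SparkSession called spark.

```python
import json
from pathlib import Path


def write_sample_files(folder):
    """Write the same two records as JSON Lines and as one pretty-printed array."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]  # made-up data

    # JSON Lines: one complete JSON object per line.
    lines_path = folder / "records.jsonl"
    lines_path.write_text("\n".join(json.dumps(r) for r in records) + "\n")

    # A single JSON array spread over several lines.
    array_path = folder / "records_array.json"
    array_path.write_text(json.dumps(records, indent=2))
    return str(lines_path), str(array_path)


def read_json_variants(spark, lines_path, array_path):
    """Read both files with an existing SparkSession; returns two DataFrames."""
    # Default mode: each line of the file is parsed as one JSON record.
    df_lines = spark.read.json(lines_path)
    # Generic reader; multiLine allows a record (or array) to span several lines.
    df_array = spark.read.format("json").option("multiLine", True).load(array_path)
    return df_lines, df_array
```

In a notebook you could call write_sample_files("json_demo") first, pass the two returned paths to read_json_variants(spark, ...), and then use show() and printSchema() on each result to compare them.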
    • pyspark code to read json file
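    from pyspark.sql import SparkSession
    # "master_name" and "app_name" are placeholders; a master of "local[*]" runs Spark locally
    spark = SparkSession.builder.master("master_name").appName("app_name").getOrCreate()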
    df = spark.read.json("file_path")
    df.printSchema()
    df.show()
    --------------------------------------- OR -------------------------------------------
    #exact code from above video
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("app1").getOrCreate()
    df = spark.read.json("/content/sample_data/anscombe.json")
    df.show()
    df.printSchema()
    -------------------------------------------------------------------------------------------
    Jump directly to a topic using the timestamps below:
    0:00 - Introduction
    0:37 - What is JSON file
    1:04 - Google Colab JSON file location
    1:39 - Analyzing JSON file content
    2:21 - Create SparkSession
    3:17 - spark.read.json("") syntax
    4:09 - Display result dataframe
    5:05 - Result df and input JSON
    5:55 - Result dataframe schema
    6:25 - Conclusion
    The code listed above is the exact code used for the PySpark read JSON example in this video.
    For this example, I used a .json file that Google Colab already provides in the sample_data folder under Files. It is also possible to read a JSON file from your desktop in Google Colab; I show how in another video on my YouTube channel, @promptsurfing.
    You can also read data from Google Drive in Colab once you mount the drive. Mounting takes only a few lines of code and gives you access to every file in your Google Drive. It can also be done from the Colab UI, as shown in another video on my channel.
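Reading a file from your desktop could look roughly like the sketch below. It is not the code from the other video: the helper names save_uploaded and upload_and_read are hypothetical, and the upload step only works inside Google Colab, where the google.colab package is available. It assumes an existing SparkSession is passed in as spark.

```python
from pathlib import Path


def save_uploaded(uploaded, target_dir="/content"):
    """Write {filename: bytes} entries into target_dir and return the saved paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, content in uploaded.items():
        path = target / name
        path.write_bytes(content)
        paths.append(str(path))
    return paths


def upload_and_read(spark):
    """Colab only: choose a file in the browser dialog, then load it with Spark."""
    from google.colab import files  # importable only inside Google Colab

    uploaded = files.upload()  # dict mapping file name to file content as bytes
    paths = save_uploaded(uploaded)  # explicit save, so the file location is known
    return spark.read.json(paths)  # spark.read.json also accepts a list of paths
```

Calling upload_and_read(spark) in a Colab cell opens the file picker; the returned DataFrame can then be inspected with show() and printSchema() as in the main example.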
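The mounting step might be sketched as follows. The mount point /content/drive is the conventional choice in Colab, the helper names drive_file_path and mount_and_read are made up for illustration, and the relative path you pass in depends entirely on where the file sits in your own Drive. As before, spark is assumed to be an existing SparkSession.

```python
import posixpath

MOUNT_POINT = "/content/drive"  # conventional mount point in Colab (an assumption here)


def drive_file_path(relative_path, mount_point=MOUNT_POINT):
    """Build the absolute path of a file stored under 'My Drive' after mounting."""
    return posixpath.join(mount_point, "MyDrive", relative_path)


def mount_and_read(spark, relative_path):
    """Colab only: mount Google Drive, then read a JSON file from it with Spark."""
    from google.colab import drive  # importable only inside Google Colab

    drive.mount(MOUNT_POINT)  # asks you to authorize access to your Drive
    return spark.read.json(drive_file_path(relative_path))
```

For example, mount_and_read(spark, "data/sample.json") would try to read a hypothetical file saved in a data folder at the top level of your Drive.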
