How to create a new column in pyspark dataframe with calculation | withColumn in Spark | modify df

  • Published 5 Sep 2024
  • This PySpark tutorial for data engineers explains how to add a new column to a DataFrame, either by calculating it from existing columns or by applying a function. In this tutorial we will add a column whose value is calculated from an existing column in the same DataFrame.
    The example in this PySpark tutorial for beginners creates a new column named "Age" from the existing "DOB" column by combining a few calculations with the withColumn function.
    withColumn can create columns holding different kinds of values. You can add a new column
    • with constant values
    • with null values
    • from a list
    • with the current date
    • with a default value
    • with a calculation
    • with a function
    The syntax for adding a new column to an existing DataFrame is simple. Below is the general form of the expression.
    df = df.withColumn("new_col_name", *calculations and functions applied on another column*)
    This is how PySpark adds a column based on another column or on a certain condition.
