Data Engineering Toolbox
Data Analysis with T-SQL | Episode 2: Customer Lifetime Value (CLV) Calculation & Segmentation
In this episode of the "Data Analysis with T-SQL" series, we explore how to calculate Customer Lifetime Value (CLV) using SQL Server. CLV is a crucial metric for understanding the long-term profitability of your customers. This tutorial demonstrates how to calculate Purchase Frequency, Average Order Value (AOV), and Customer Lifespan, which are key components for determining CLV. We also dive into customer segmentation based on profitability, categorizing customers as Non-Profitable, Profitable, or Very Profitable based on their CLV.
By the end of this video, you'll learn how to:
Prepare customer sales data using Common Table Expressions (CTEs).
Calculate CLV using a formula that incorporates AOV, Purchase Frequency, and Customer Lifespan.
Segment customers into different groups based on their profitability.
This video is perfect for those who want to enhance their SQL data analysis skills and gain insights into customer behavior using T-SQL.
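As a rough illustration only: the episode itself works in T-SQL, but the same CLV wiring can be sketched in PySpark for readers following the channel's Databricks content. The `orders` DataFrame, its columns (`customer_id`, `order_date`, `amount`), and the segmentation thresholds below are hypothetical placeholders, not the values used in the video.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical order-level data; the episode builds the equivalent shape with a CTE in T-SQL.
orders = spark.createDataFrame(
    [("C1", "2024-01-05", 120.0), ("C1", "2024-03-10", 80.0),
     ("C2", "2024-02-01", 40.0),  ("C2", "2024-02-20", 35.0),
     ("C3", "2024-01-15", 500.0)],
    ["customer_id", "order_date", "amount"],
).withColumn("order_date", F.to_date("order_date"))

per_customer = orders.groupBy("customer_id").agg(
    F.sum("amount").alias("total_revenue"),
    F.count("*").alias("order_count"),
    F.datediff(F.max("order_date"), F.min("order_date")).alias("lifespan_days"),
)

clv = per_customer.select(
    "customer_id",
    (F.col("total_revenue") / F.col("order_count")).alias("avg_order_value"),
    F.col("order_count").alias("purchase_frequency"),
    # Express lifespan in years; floor at one day so single-purchase customers don't zero out.
    (F.greatest(F.col("lifespan_days"), F.lit(1)) / F.lit(365.0)).alias("lifespan_years"),
).withColumn(
    "clv", F.col("avg_order_value") * F.col("purchase_frequency") * F.col("lifespan_years")
).withColumn(
    "segment",
    F.when(F.col("clv") < 50, "Non-Profitable")
     .when(F.col("clv") < 200, "Profitable")
     .otherwise("Very Profitable"),
)

clv.show()
```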
🔔 Subscribe for more SQL tutorials
👍 Like this video if you find it helpful!
💬 Comment below with any questions or suggestions for future videos.
Views: 4

Videos

Data Analysis with T-SQL | Episode 1: Customer Segmentation with SQL Server
Views: 11 · 12 hours ago
In this episode, we walked through the essentials of customer segmentation using T-SQL, a crucial practice in data analysis. We demonstrated how to prepare, aggregate, and dynamically segment your data, giving you actionable insights into customer behavior. In the next episodes, we’ll dive deeper into data analysis techniques using SQL Server, covering topics like advanced querying, performance...
Master PySpark date_format() Function in Databricks with Complex Examples | Date Manipulation
Views: 34 · 21 days ago
In this PySpark tutorial, we dive deep into the date_format() function and how to apply it in complex scenarios within Databricks. We'll walk through various examples that demonstrate how to reformat dates, extract specific parts of a date (like the year, month, or day of the week), and combine multiple transformations in a PySpark DataFrame. What You’ll Learn: How to change date formats in PyS...
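A minimal sketch of the kind of transformation described above, assuming a DataFrame with a single `event_date` column; the column name and format strings are illustrative rather than the video's exact example.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("2024-09-15",), ("2024-12-31",)], ["event_date"]) \
          .withColumn("event_date", F.to_date("event_date"))

formatted = df.select(
    "event_date",
    F.date_format("event_date", "dd/MM/yyyy").alias("reformatted"),  # change the display format
    F.date_format("event_date", "yyyy").alias("year"),               # extract the year
    F.date_format("event_date", "MMMM").alias("month_name"),         # full month name
    F.date_format("event_date", "EEEE").alias("day_of_week"),        # day-of-week name
)

formatted.show()
```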
How to Fetch API Data and Implement Incremental Loading in PySpark with Delta Lake | Databricks
Views: 191 · 21 days ago
In this tutorial, I'll guide you step-by-step on how to fetch data from a free API and implement incremental loading using PySpark in Databricks with Delta Lake. We'll use the JSONPlaceholder API as an example and demonstrate how to ensure that only new data is appended to the Delta Lake table. This method helps in avoiding redundant records and improves data processing efficiency. This video c...
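A condensed sketch of the pattern described above, assuming a Databricks environment with Delta Lake available; the table location, key column, and the anti-join approach to deduplication are illustrative choices, not necessarily the exact steps shown in the video.

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
TARGET = "/tmp/delta/posts"  # hypothetical Delta table location

# 1. Fetch from the free JSONPlaceholder API mentioned in the video.
posts = requests.get("https://jsonplaceholder.typicode.com/posts", timeout=30).json()
incoming = spark.createDataFrame(posts).select("id", "userId", "title", "body")

# 2. Keep only records whose id is not already in the Delta table (incremental load).
try:
    existing_ids = spark.read.format("delta").load(TARGET).select("id")
    new_rows = incoming.join(existing_ids, on="id", how="left_anti")
except Exception:
    # First run: nothing to compare against yet.
    new_rows = incoming

# 3. Append only the new rows, so re-running the job never duplicates records.
new_rows.write.format("delta").mode("append").save(TARGET)
```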
Mastering Padding in PySpark | Complex Example in Databricks
Views: 31 · 21 days ago
Welcome to this PySpark tutorial on Padding in Databricks! In this video, we will explore how to use padding functions effectively in PySpark. Padding can be useful when you want to manipulate strings in your DataFrame to achieve consistent formatting, like aligning numbers, adding leading zeros, etc. We will work through a complex example that demonstrates padding within PySpark transformation...
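A small sketch of the padding functions discussed here, using hypothetical invoice data; the widths and pad characters are arbitrary.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(7, "A"), (42, "B"), (1234, "C")], ["invoice_no", "branch"])

padded = df.select(
    # Left-pad invoice numbers with zeros to a fixed width of 6 characters.
    F.lpad(F.col("invoice_no").cast("string"), 6, "0").alias("invoice_padded"),
    # Right-pad the branch code with dashes so all values line up to 5 characters.
    F.rpad("branch", 5, "-").alias("branch_padded"),
)

padded.show()
```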
Advanced PySpark Tutorial: Using collect_list Function in Databricks with Complex Examples
Views: 49 · 21 days ago
In this PySpark tutorial, we dive deep into the collect_list function to show how to aggregate data into lists in a distributed environment. We'll explore a complex example, using real-world scenarios where collect_list can streamline your data transformation in Databricks. Perfect for data engineers, analysts, and developers working with big data. Topics Covered: Introduction to collect_list W...
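A minimal example of `collect_list` in the spirit of the video, with a made-up purchases DataFrame; the tutorial itself works through a more complex scenario.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [("C1", "laptop"), ("C1", "mouse"), ("C2", "keyboard"), ("C1", "monitor")],
    ["customer_id", "product"],
)

# Aggregate each customer's purchases into a single array column.
baskets = orders.groupBy("customer_id").agg(
    F.collect_list("product").alias("products"),
    F.count("*").alias("n_items"),
)

baskets.show(truncate=False)
```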
Calculating Sales Differences Using PySpark: Lag Function and Window Specifications
Views: 13 · 21 days ago
In this video, we demonstrate how to use PySpark to analyze sales data by applying the lag function and window specifications. We start with a sample dataset of sales transactions with new salesperson names and dates. Data Creation: We create a DataFrame with columns for salesperson names, dates, and sales amounts. Displaying the DataFrame: We show the initial DataFrame for a clear view of the ...
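A compact sketch of the approach described above, with hypothetical salesperson data; the column names are assumptions.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("Alice", "2024-01-01", 100.0), ("Alice", "2024-01-02", 130.0),
     ("Bob",   "2024-01-01", 90.0),  ("Bob",   "2024-01-02", 70.0)],
    ["salesperson", "sale_date", "amount"],
)

# One window per salesperson, ordered by date, so lag() looks at the previous day's row.
w = Window.partitionBy("salesperson").orderBy("sale_date")

diffs = sales.withColumn("previous_amount", F.lag("amount").over(w)) \
             .withColumn("difference", F.col("amount") - F.col("previous_amount"))

diffs.show()
```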
How to Use Widgets in Databricks to Filter Data Interactively | Databricks Widgets Tutorial
Views: 29 · a month ago
In this video, I'll show you how to create and use widgets in Databricks to make your data filtering more interactive and dynamic. Widgets allow you to add dropdowns, text fields, and multiselect options directly in your Databricks notebooks. I'll guide you through an example where we filter sales data based on region, sales thresholds, and product categories selected through widgets. Watch til...
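A short sketch of the widget workflow described above. It only runs inside a Databricks notebook (where `dbutils`, `spark`, and `display` are predefined), and the widget names, choices, and sample data are placeholders.

```python
from pyspark.sql import functions as F

# This cell assumes a Databricks notebook, where spark, dbutils and display() already exist.
dbutils.widgets.dropdown("region", "East", ["East", "West", "North", "South"], "Region")
dbutils.widgets.text("min_sales", "100", "Minimum sales")
dbutils.widgets.multiselect("categories", "Electronics", ["Electronics", "Clothing", "Food"], "Categories")

region = dbutils.widgets.get("region")
min_sales = float(dbutils.widgets.get("min_sales"))
categories = dbutils.widgets.get("categories").split(",")  # multiselect returns a comma-separated string

# Placeholder sales data standing in for whatever table the notebook really uses.
sales_df = spark.createDataFrame(
    [("East", "Electronics", 250.0), ("West", "Food", 80.0), ("East", "Clothing", 120.0)],
    ["region", "category", "amount"],
)

filtered = (sales_df
            .filter(F.col("region") == region)
            .filter(F.col("amount") >= min_sales)
            .filter(F.col("category").isin(categories)))

display(filtered)
```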
Databricks Tutorial: Recursive File Lookup in PySpark (CSV Files)
Views: 77 · a month ago
In this video, I'll show you how to use recursive file lookup in Databricks with PySpark. This allows you to read CSV files from a directory and all its subdirectories automatically. You'll learn how to: Set up a file path for recursive reading. Automatically detect CSV headers and infer schema. Add a column to track which file each row came from. This method is perfec...
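A minimal sketch of the recursive read described above; the base path is a placeholder and the options mirror what the description lists (recursive lookup, header detection, schema inference, source-file column).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical root folder; CSVs may live at any depth below it.
base_path = "/mnt/raw/sales/"

df = (spark.read
      .option("recursiveFileLookup", "true")  # also descend into subdirectories
      .option("header", "true")               # first line of each file is a header
      .option("inferSchema", "true")          # let Spark guess column types
      .csv(base_path)
      # Track which physical file each row came from.
      .withColumn("source_file", F.input_file_name()))

df.show(truncate=False)
```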
Optimizing PySpark Performance: Breaking Down DAGs into Stages
Views: 142 · 8 months ago
Hello, everyone! Welcome to today's PySpark tutorial, where we'll explore the intriguing realm of optimizing performance by breaking down Directed Acyclic Graphs (DAGs) into stages. Code link: github.com/ekhosravie/Optimizing-PySpark-Performance-Breaking-Down-DAGs-into-Stages Our objective is to grasp how this technique significantly boosts the execution speed of PySpark applications. Why Break...
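A small sketch of the underlying idea, not the video's own code: a narrow transformation stays within a stage, while a wide one (here a `groupBy`) introduces a shuffle, which is exactly where Spark cuts the DAG into a new stage.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

# Narrow transformation: no data movement, stays inside the current stage.
narrow = df.filter(F.col("id") % 2 == 0)

# Wide transformation: groupBy forces a shuffle (an Exchange node in the plan),
# which is where Spark starts a new stage.
wide = narrow.groupBy("bucket").agg(F.sum("id").alias("total"))

# The Exchange node in the physical plan marks the stage boundary.
wide.explain()

# Fewer shuffle partitions can reduce scheduling overhead for small datasets.
spark.conf.set("spark.sql.shuffle.partitions", "8")
wide.collect()
```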
AI in Data Engineering: Anomaly Detection in Retail Sales Transactions using Isolation Forest
Views: 282 · 8 months ago
Hello and welcome to our YouTube tutorial on anomaly detection in retail sales transactions using AI in data engineering! Today, we're going to explore an exciting use case that demonstrates how to identify unusual and potentially fraudulent activities in a retail company's sales data by leveraging artificial intelligence techniques within the data engineering domain. Our goal is to provide you ...
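A minimal, hypothetical sketch of the technique named in the title, using scikit-learn's IsolationForest on a tiny pandas frame; the video's actual pipeline and features may differ.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical retail transactions; a few extreme amounts play the role of anomalies.
transactions = pd.DataFrame({
    "transaction_id": range(1, 11),
    "amount":   [25.0, 30.0, 27.5, 22.0, 31.0, 29.0, 950.0, 26.0, 28.5, 1200.0],
    "quantity": [1, 2, 1, 1, 2, 1, 40, 1, 2, 55],
})

# Isolation Forest isolates outliers with short random partitioning paths.
model = IsolationForest(contamination=0.2, random_state=42)
transactions["anomaly"] = model.fit_predict(transactions[["amount", "quantity"]])

# fit_predict returns -1 for anomalies and 1 for normal transactions.
print(transactions[transactions["anomaly"] == -1])
```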
PySpark and Pandas Harmony with Arrow in Databricks!
Views: 24 · 8 months ago
Why Use This Code: This code demonstrates the integration between Spark and Pandas using Arrow for efficient data transfer. Arrow is particularly useful when you need to perform local Pandas operations on Spark DataFrames, minimizing the overhead of data conversion and enhancing performance. It illustrates a common scenario where data is moved seamlessly between Spark and Pandas, allowing users...
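A short sketch of the Arrow-backed round trip described above; it assumes `pyarrow` is installed alongside PySpark.

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Let Spark use Apache Arrow for Spark <-> Pandas conversions (pyarrow must be installed).
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

sdf = spark.range(1_000_000).withColumnRenamed("id", "value")

# Spark -> Pandas: the Arrow-backed transfer avoids row-by-row serialization.
pdf = sdf.toPandas()
pdf["value_squared"] = pdf["value"] ** 2  # ordinary local Pandas work

# Pandas -> Spark: the same Arrow path is used in the other direction.
sdf_back = spark.createDataFrame(pdf)
sdf_back.show(5)
```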
AI Insights: Decoding Product Reviews with Databricks (PySpark) & GPT-2
Views: 32 · 8 months ago
Hi, everyone! Welcome back to our channel! Today, we've got something exciting for you. Imagine you're an avid online shopper, constantly exploring new products and reading through numerous reviews. It's all too familiar, right? The excitement of finding a hidden gem versus the disappointment of a purchase that doesn't quite live up to expectations. Now, what if there was a way to add a powerfu...
Building a PySpark Data Pipeline with Azure SQL Database and Synapse Analytics
Views: 154 · 9 months ago
Scenario: In this tutorial, we will walk through the process of creating a PySpark data pipeline using Databricks, fetching data securely from Azure SQL Database, applying transformations, and storing the results in Azure Synapse Analytics (Azure SQL Data Warehouse). We will cover advanced topics such as secure credential handling, data partitioning, Spark session isolation levels, and a basic ...
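A heavily simplified sketch of the pipeline shape described above, intended for a Databricks notebook. The secret scope, key names, JDBC URLs, and table names are placeholders, and the Synapse write uses generic JDBC rather than whichever dedicated connector the video may use.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Credentials come from a Databricks secret scope (scope and key names here are placeholders).
sql_user = dbutils.secrets.get(scope="azure-sql", key="username")
sql_pwd  = dbutils.secrets.get(scope="azure-sql", key="password")

source_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<source_db>"
target_url = "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<target_db>"

# 1. Fetch securely from Azure SQL Database.
orders = (spark.read.format("jdbc")
          .option("url", source_url)
          .option("dbtable", "dbo.Orders")
          .option("user", sql_user)
          .option("password", sql_pwd)
          .load())

# 2. Apply a transformation (illustrative aggregation).
daily = orders.groupBy("order_date").agg(F.sum("amount").alias("daily_total"))

# 3. Store the result in Synapse via generic JDBC.
(daily.write.format("jdbc")
      .option("url", target_url)
      .option("dbtable", "dbo.DailySales")
      .option("user", sql_user)
      .option("password", sql_pwd)
      .mode("overwrite")
      .save())
```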
Understanding Partition Pruning in PySpark for Improved Query Performance
Views: 203 · 9 months ago
Partition pruning is a query optimization technique used in PySpark to improve the efficiency of data processing. It involves scanning only the partitions of a DataFrame that are relevant to the query, based on certain predicates or filters. This helps in reducing the amount of data that needs to be read and processed, resulting in faster query execution times and improved overall performance. ...
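A minimal example of partition pruning as described above, with throwaway data and an illustrative output path; the `PartitionFilters` entry in the explain output is where the pruning shows up.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("2024-01", "East", 100.0), ("2024-01", "West", 80.0),
     ("2024-02", "East", 120.0), ("2024-02", "West", 95.0)],
    ["sale_month", "region", "amount"],
)

# Write the data partitioned by month; each month becomes its own folder.
path = "/tmp/sales_partitioned"  # illustrative location
sales.write.mode("overwrite").partitionBy("sale_month").parquet(path)

# Filtering on the partition column lets Spark skip every other folder entirely.
feb = spark.read.parquet(path).filter(F.col("sale_month") == "2024-02")

# The physical plan's PartitionFilters entry shows that pruning kicked in.
feb.explain()
```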
Understanding Broadcast join in PySpark
Views: 325 · 9 months ago
Simplify Reading Nested JSON Data from MongoDB into PySpark DataFrame
Views: 114 · 9 months ago
Automating Schema Generation in PySpark with Databricks
Views: 277 · 9 months ago
Comparing Vectorized UDFs and Built-in Functions in Databricks (pyspark)
Views: 58 · 9 months ago
Efficient Schema Evolution in Delta Lake with PySpark: A Databricks Tutorial
Views: 229 · 9 months ago
Dynamic Partitioning In Databricks
Views: 464 · 9 months ago
PySpark for SQL Developers - Statistical Functions - Part 1
Views: 38 · a year ago
PySpark for SQL Developers - Joins
Views: 35 · a year ago
PySpark for SQL Developers
Views: 88 · a year ago
Fetch Data from MongoDB via SQL Server (PolyBase)
Views: 239 · a year ago
From SSAS Cube to PySpark MLlib
Views: 243 · a year ago
PySpark: Read and Write from/to SQL Server via JDBC
Views: 2.8K · a year ago
CockroachDB vs SQL Server (T-SQL)
Views: 69 · 2 years ago

COMMENTS

  • @ll_ashu_ll · 2 months ago

    Please make more videos related to PySpark and Databricks.

  • @houstonfirefox · 2 months ago

    Very good video. I would recommend ensuring all syntax errors are edited out or re-recorded so the viewer doesn't get confused.

    The channel name (Data Engineering Toolbox) is a bit confusing, as these function comparisons between SQL Server and PySpark fall under the realm of Data Science. A Data Engineer moves, converts and stores data from system to system, whereas a Data Scientist extracts and interprets the data provided by the Data Engineer. A small point to be sure, but I wanted to be more accurate.

    On variance: the avg_rating column returned integer values because the underlying column "review_score" was also an integer. To get the PySpark equivalent of a floating-point avg_rating you could change the column type to FLOAT (unnecessary really) or use CONVERT(FLOAT, VAR(review_score)) to return the true (more accurate) variance complete with decimal places.

    New sub. I am interested to see even more Data Science equivalent functions in SQL Server that may be native (i.e., CORR()) and how to write functions that emulate some of the functionality in PySpark 🙂

  • @yashwanthv5604 · 6 months ago

    Can you provide a GitHub link to the code?

  • @Grover-mb · 6 months ago

    Nice video, my friend, but I'm getting an error that I can't fix:
    ---------------------------------------------------------------------------
    Py4JJavaError Traceback (most recent call last)
    Cell In[10], line 14 3 spark = sqlContext.sparkSession \ 4 #.appName("Mi_Aplicacion") \ 5 #.getOrCreate() 6 7 # Tu código de Spark aquí 8 jdbcDF = spark.read.format('jdbc') \ 9 .option('url',url) \ 10 .option('query',query) \ 11 .option('user',user) \ 12 .option('password',password) \ 13 .option('driver',driver) \ ---> 14 .load() 16 spark.stop() # No olvides detener la sesión de Spark al finalizar
    File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pyspark\sql\readwriter.py:314, in DataFrameReader.load(self, path, format, schema, **options) 312 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path))) 313 else: --> 314 return self._df(self._jreader.load())
    File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args) 1316 command = proto.CALL_COMMAND_NAME +\ 1317 self.command_header +\ ...
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1570)
    Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

  • @venvilhenrydsilva8354 · 7 months ago

    "You are trying to pass an insecure Py4j gateway to Spark. This" " is not allowed as it is a security risk." while sc = SparkContext(conf=conf)

  • @nickoder4374 · 7 months ago

    parasha

  • @平凡-p1v · 8 months ago

    code is not clear to follow.

  • @sravankumar1767 · 8 months ago

    Superb explanation 👌 👏 👍

  • @lucaslira5 · 9 months ago

    Using Auto Loader, this isn't necessary.

  • @平凡-p1v · 9 months ago

    the video is not clear even in full screen mode.

    • @DataEngineeringToolbox · 9 months ago

      Thanks for the feedback! I apologize for the video quality issue. I'm working on improving it for future videos. Your input is valuable, and I appreciate your understanding.

  • @DatabricksPro · 9 months ago

    Cool. thx

  • @leonmason9141 · a year ago

    *promo sm*