Part two Pytest integration with ETL pipeline: ua-cam.com/video/7FPksG-LYOA/v-deo.html
Part three of Pytest - Data Quality report: ua-cam.com/video/Sv6QWF7J63k/v-deo.html
Can you share a blog or a link that shows the roadmap/sequence of your ETL videos?
@@srh1034 Sure. Here is an overview of the channel's content and the ETL series sequence.
ua-cam.com/video/pjiv6j7tyxY/v-deo.html
The best data engineering YouTuber I've had the pleasure of finding. Thanks, and please keep it up!
Heartfelt thanks for all these recorded sessions/tutorials. You have made life so simple.
The best data engineering YouTuber. Thank you!
Nicely explained. Good presentation, well organized, well spoken. Keep up the good work.
Articulate explanation! You're the best! Thank you so much.
You did a great job. I was looking for this material for a long time. Thanks, man, for sharing great content.
I have many questions on pytest; I'll ask them once I go through all the videos. Thanks!
Great and very helpful content. Thank you.
Thank you for a great tutorial!
You already have a few different videos; could you add a number to each tutorial (to order them)? It would help to know which video is first and which is last.
Thanks and good suggestion. I have consolidated the data quality videos in their own playlist. Here is the link: ua-cam.com/play/PLaz3Ms051BAkgmoRZEcGFvQzY4YW_SR8b.html
Could you please do this with Apache Beam (JDBC source to BigQuery), or help me with it? I really need this kind of information.
Thanks for this video. Is there a video on how to run these tests against SQL Server, pgAdmin, or Athena?
Here is the link to the video in the series that runs data quality tests against SQL Server.
ua-cam.com/video/7FPksG-LYOA/v-deo.html
Here is the link to the series: ua-cam.com/play/PLaz3Ms051BAkgmoRZEcGFvQzY4YW_SR8b.html
Thanks for such important info.
How do you automate these test cases?
You can embed these tests in your data pipeline; below is an example, with a short sketch after the links. Once you schedule the pipeline via an orchestrator, these tests will run each time the pipeline is triggered. You can use any tool like Airflow, Dagster, Prefect, or cron to schedule Python-based pipelines.
ua-cam.com/video/7FPksG-LYOA/v-deo.html&ab_channel=BIInsightsInc
Airflow: ua-cam.com/video/eZfD6x9FJ4E/v-deo.html&ab_channel=BIInsightsInc
Dagster: ua-cam.com/video/f1TbVGdhmYg/v-deo.html&ab_channel=BIInsightsInc
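A minimal sketch of that embedding, assuming your own extract/transform/load steps and a tests/ folder (both are placeholders):

# pipeline.py - run the data quality tests right after the load step
import sys
import pytest

def run_pipeline():
    # your extract -> transform -> load steps go here
    pass

if __name__ == "__main__":
    run_pipeline()
    # pytest.main returns a non-zero exit code on test failure, so the
    # orchestrator (Airflow, Dagster, cron) marks the run as failed
    sys.exit(pytest.main(["tests/", "-v"]))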
Please make a video on ETL automation testing from scratch, and create a separate playlist for it.
I will try to cover this in the future. In the meantime, you can check out the following videos on testing and automating ETL pipelines.
ua-cam.com/video/7FPksG-LYOA/v-deo.html
ua-cam.com/video/Sv6QWF7J63k/v-deo.html&t
ua-cam.com/video/7UQ91Ib7PtU/v-deo.html&t
How to automate Python-based ETL pipelines:
ua-cam.com/video/f1TbVGdhmYg/v-deo.html&t
ua-cam.com/video/eZfD6x9FJ4E/v-deo.html&t
ua-cam.com/video/IsuAltPOiEw/v-deo.html
Very helpful. Thank you.
Great video, thanks
How do you add a logger to it with a tqdm progress bar?
If you want to log the tests for review or sharing, check out the next video. I haven't played around with tqdm, but here are its docs and implementation. Maybe I will use it in a future project.
github.com/tqdm/tqdm
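That said, a rough, untested sketch of pairing Python's logging module with a tqdm progress bar around pipeline steps (the table names are placeholders):

import logging
from tqdm import tqdm

logging.basicConfig(level=logging.INFO, filename="etl.log")
logger = logging.getLogger(__name__)

tables = ["DimProduct", "DimCustomer", "FactSales"]  # placeholder list
for table in tqdm(tables, desc="Processing tables"):
    # the actual load/test step for each table would go here
    logger.info("processed %s", table)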
The function test_null_check(df) always passes.
Thanks for spotting this. I have updated the code base. You can use the following assertion.
# check for nulls
def test_null_check(df):
    assert df['ProductKey'].notnull().all()
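For anyone wondering where the df argument comes from: pytest injects it via a fixture. A minimal conftest.py sketch, assuming the extract step writes a CSV (the path is a placeholder):

# conftest.py
import pandas as pd
import pytest

@pytest.fixture
def df():
    # load the extracted data for the tests; path is a placeholder
    return pd.read_csv("data/products.csv")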
@@BiInsightsInc Thank you. Your content is very useful.
Thanks
The video resolution is poor. Please try recording in 1080p HD.
def test_Genre_dtype_str(df):
    assert df["Genre"].dtype == str or df["Genre"].dtype == 'O'
This test case always returns Pass.
If the data type of this column is string or object, it will pass. If the data type is int or float, it will fail. You can also remove the "O" and test for string only, if that's the objective. Here is an example of this test with int.
github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/Pytest/Session%20one/string%20and%20object%20test%20result.png
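A quick self-contained illustration of why the dtype, not the value, decides the outcome:

import pandas as pd

df = pd.DataFrame({"Genre": ["1", "2"]})  # digits stored as strings -> object dtype
print(df["Genre"].dtype == 'O')           # True: the assertion passes

df = pd.DataFrame({"Genre": [1, 2]})      # stored as ints -> int64 dtype
print(df["Genre"].dtype == 'O')           # False: the assertion fails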
@@BiInsightsInc Thanks for responding.
When the column value is 1, which is an int, the assertion below passes. I tried removing the "O", and then it fails, but it fails even when the data type is string.
assert (df["Genre"].dtype == str or df["Genre"].dtype == 'O')
@@dmunagala You need to check the data type. The value might be 1, but it can be stored as a string. Check my previous comment; I have a link to this test, and it fails with the int data type.
@@BiInsightsInc Yes, you are right. I checked the data types using df.info() and found the exact data types for all the columns in my CSV file. It is working as expected. Thank you so much for your help, you are amazing!
Thanks