How to test your Python ETL pipelines | Data pipeline | Pytest

  • Published 26 Nov 2024

COMMENTS • 35

  • @BiInsightsInc
    @BiInsightsInc  1 year ago +3

    Part two Pytest integration with ETL pipeline: ua-cam.com/video/7FPksG-LYOA/v-deo.html
    Part three of Pytest - Data Quality report: ua-cam.com/video/Sv6QWF7J63k/v-deo.html

    • @srh1034
      @srh1034 5 months ago

      Can you share a blog or link that shows the roadmap/sequence of your ETL videos?

    • @BiInsightsInc
      @BiInsightsInc  5 months ago

      @srh1034 Sure. Here is an overview of the channel's content and the ETL series sequence.
      ua-cam.com/video/pjiv6j7tyxY/v-deo.html

  • @willosullivan3571
    @willosullivan3571 1 year ago +2

    The best data engineering YouTuber I've had the pleasure to find. Thanks and please keep it up!

  • @ShubbhasmitaSahani
    @ShubbhasmitaSahani 1 year ago

    Heartfelt thanks to you for all these recorded sessions/tutorials. You have made life so simple.

  • @farhadshakibaca
    @farhadshakibaca 1 year ago

    The best data engineering YouTuber. Thank you.

  • @safkaify7875
    @safkaify7875 2 months ago

    Nicely explained. Good presentation, well organized, well spoken. Keep up the good work.

  • @poojaak1678
    @poojaak1678 1 year ago

    Articulate explanation! You’re the best! Thank you so much.

  • @Sreenu1523
    @Sreenu1523 1 year ago

    You did a great job. I had been looking for this material for a long time. Thanks, man, for sharing great content.
    I have many questions on pytest and will ask them once I go through all the videos. Thanks.

  • @soheilahg921
    @soheilahg921 1 year ago

    Great and very helpful content. Thank you.

  • @gulnarabekirova4741
    @gulnarabekirova4741 10 months ago

    Thank you for a great tutorial!
    You already have a few different videos. Could you add a number to each tutorial to order them? It would help show which video is first and which is last.

    • @BiInsightsInc
      @BiInsightsInc  10 months ago

      Thanks and good suggestion. I have consolidated the data quality videos in their own playlist. Here is the link: ua-cam.com/play/PLaz3Ms051BAkgmoRZEcGFvQzY4YW_SR8b.html

  • @ashishvats1515
    @ashishvats1515 1 year ago

    Could you please do this with Apache Beam, from a JDBC source to BigQuery? Or could you help me with this? I really need this kind of information.

  • @KayChannel23
    @KayChannel23 8 months ago

    Thanks for this video. Is there a video on how to do these runs on SQL Server, pgAdmin, or Athena?

    • @BiInsightsInc
      @BiInsightsInc  8 months ago

      Here is the link to the video in the series that runs data quality tests against SQL Server.
      ua-cam.com/video/7FPksG-LYOA/v-deo.html
      Here is the link to the series: ua-cam.com/play/PLaz3Ms051BAkgmoRZEcGFvQzY4YW_SR8b.html

  • @bharamkarvivek4632
    @bharamkarvivek4632 1 year ago

    Thanks for such important info.
    How can these test cases be automated?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      You can embed these tests in your data pipeline; below is an example. Once you schedule it via an orchestrator, these tests will run each time your pipeline is triggered. You can use any tool such as Airflow, Dagster, Prefect, or cron to schedule Python-based pipelines.
      ua-cam.com/video/7FPksG-LYOA/v-deo.html&ab_channel=BIInsightsInc
      Airflow: ua-cam.com/video/eZfD6x9FJ4E/v-deo.html&ab_channel=BIInsightsInc
      Dagster: ua-cam.com/video/f1TbVGdhmYg/v-deo.html&ab_channel=BIInsightsInc
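
As a rough illustration of embedding checks in a pipeline, here is a minimal sketch, not the code from the linked videos. It assumes a pandas DataFrame with a `ProductKey` column; the names `run_pipeline`, `check_unique_keys`, and the `load` callback are hypothetical. The idea is that the load step only runs when every check passes, so a scheduler such as Airflow or cron sees a failed run when the data is bad.

```python
import pandas as pd


def check_no_nulls(df):
    # True only when ProductKey has no missing values.
    return bool(df["ProductKey"].notnull().all())


def check_unique_keys(df):
    # Hypothetical extra check: ProductKey values must not repeat.
    return bool(df["ProductKey"].is_unique)


CHECKS = [check_no_nulls, check_unique_keys]


def run_pipeline(df, load):
    """Run every check, then call load(df) only if all of them passed."""
    failed = [check.__name__ for check in CHECKS if not check(df)]
    if failed:
        # Raising makes the scheduled run fail instead of loading bad data.
        raise ValueError(f"Data quality checks failed: {failed}")
    load(df)
    return len(df)
```

If the checks live in a pytest file instead, another option is to have the scheduler run the `pytest` command as its own step, since it exits with a non-zero code when any test fails.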

  • @kiranpatil4968
    @kiranpatil4968 1 year ago

    Please make a video on ETL automation testing from scratch and create separate playlists.

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      I will try to cover this in the future. In the meantime, you can check out the following videos on testing and automating ETL pipelines.
      ua-cam.com/video/7FPksG-LYOA/v-deo.html
      ua-cam.com/video/Sv6QWF7J63k/v-deo.html&t
      ua-cam.com/video/7UQ91Ib7PtU/v-deo.html&t
      How to automate Python-based ETL pipelines:
      ua-cam.com/video/f1TbVGdhmYg/v-deo.html&t
      ua-cam.com/video/eZfD6x9FJ4E/v-deo.html&t
      ua-cam.com/video/IsuAltPOiEw/v-deo.html

  • @ZarifouDjibril
    @ZarifouDjibril 8 months ago

    Very helpful. Thank you.

  • @BillusTinnus
    @BillusTinnus 1 year ago

    Great video, thanks

  • @SP-db6sh
    @SP-db6sh 1 year ago

    How can I add a logger to it with a tqdm progress bar?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      If you want to log the tests for review or sharing, check out the next video. I haven't played around with tqdm, but here are its docs and implementation. Maybe I will implement this in a future project.
      github.com/tqdm/tqdm

  • @lalalf4535
    @lalalf4535 1 year ago

    The function test_null_check(df) will always pass.

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      Thanks for spotting this. I have updated the code base. You can use the following assertion.
      # check for nulls
      def test_null_check(df):
          assert df['ProductKey'].notnull().all()
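
To see that this assertion can actually fail, here is a minimal sketch on two tiny, made-up DataFrames. The helper `null_check` and the sample data are assumptions for illustration; only the `ProductKey` column name and the `notnull().all()` expression come from the reply above.

```python
import pandas as pd


def null_check(df, column="ProductKey"):
    # notnull() gives one boolean per row; all() is True only if every row is non-null.
    return bool(df[column].notnull().all())


# Made-up sample data: one frame without nulls, one with a missing key.
clean = pd.DataFrame({"ProductKey": [1, 2, 3]})
dirty = pd.DataFrame({"ProductKey": [1, None, 3]})


def test_null_check_passes_on_clean_data():
    assert null_check(clean)


def test_null_check_detects_missing_key():
    assert not null_check(dirty)
```

Including one deliberately bad input like `dirty` is a quick way to confirm that a data quality test is not passing unconditionally.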

    • @lalalf4535
      @lalalf4535 1 year ago

      @BiInsightsInc Thank you. Your content is very useful.

  • @muddashir
    @muddashir 1 year ago

    Thanks

  • @robertclayton3189
    @robertclayton3189 1 year ago

    Video resolution is poor.

  • @dmunagala
    @dmunagala 5 months ago

    def test_Genre_dtype_str(df):
        assert (df["Genre"].dtype == str or df["Genre"].dtype == 'O')
    This test case always returns Pass.

    • @BiInsightsInc
      @BiInsightsInc  5 months ago

      If the data type of this column is string or object, the test will pass. If the column has an int or float data type, it will fail. You can also remove the "O" and test only for string if that's the objective. Here is an example of this test with int.
      github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/Pytest/Session%20one/string%20and%20object%20test%20result.png

    • @dmunagala
      @dmunagala 5 months ago

      @BiInsightsInc Thanks for responding.
      When the column value is 1, which is an int, the assertion below passes. I tried removing "O", and then it fails, but it fails even if the data type is string.
      assert (df["Genre"].dtype == str or df["Genre"].dtype == 'O')

    • @BiInsightsInc
      @BiInsightsInc  5 months ago

      @dmunagala You need to check the data type. The value might be 1, but it can be stored as a string. See my previous comment; it links to this test failing with an int data type.

    • @dmunagala
      @dmunagala 5 months ago

      @BiInsightsInc Yes, you are right. I checked the data types using df.info() and found the exact data types for all columns in my CSV file. It is working as expected. Thank you so much for your help, you are amazing!!
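
The point of this thread, that the stored data type matters rather than how the value looks, can be sketched as follows. This is a minimal illustration on made-up data. It uses `pandas.api.types.is_string_dtype` in place of the `dtype == str or dtype == 'O'` comparison from the comment, which is a swapped-in alternative and not the author's test; `genre_is_string` is a hypothetical helper name.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Made-up sample data: the same values, stored as text and as integers.
as_text = pd.DataFrame({"Genre": ["1", "2"]})
as_int = pd.DataFrame({"Genre": [1, 2]})


def genre_is_string(df):
    # Inspects how the column is stored, not what the values look like.
    return bool(is_string_dtype(df["Genre"]))


def test_genre_stored_as_text():
    assert genre_is_string(as_text)


def test_genre_stored_as_int_is_not_string():
    assert not genre_is_string(as_int)
```

Printing `df.dtypes` or calling `df.info()`, as the commenter did, shows which of the two cases a given file falls into.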

  • @vivek2319
    @vivek2319 1 year ago

    Thanks