How to test your Data Pipelines with Great Expectations

  • Published Jan 3, 2025

COMMENTS • 29

  • @fcastellano
    @fcastellano 1 year ago +4

    This is the best basic tutorial of this tool I've been able to find; you have everything one would need to start, in a digestible way. Thanks.

  • @bralabala
    @bralabala 1 year ago +1

    This is gradually becoming my favourite channel.

  • @GuilhermeMendesG
    @GuilhermeMendesG 1 year ago

    What an awesome video. Thank you.

  • @ManuelPortela-g2y
    @ManuelPortela-g2y 3 months ago

    Excellent video.

  • @nizar8073
    @nizar8073 1 year ago

    I love your videos, man.
    Try publishing them more on subreddits like r/datascience and r/dataengineering.

    • @BiInsightsInc
      @BiInsightsInc 1 year ago

      Thanks for the tip. Will publish on subreddits in the future.

  • @vaibhavoberoi7
    @vaibhavoberoi7 1 year ago

    Awesome Video. Subscribed for more!

  • @palermodpr
    @palermodpr 1 year ago

    Is there a way to create the rules/tests automatically? Is there a better way to visualise a summary of tests?

    • @BiInsightsInc
      @BiInsightsInc 1 year ago +1

      GE ships with its rules (expectations) already created; you only need to apply them to your data. If you want rules applied automatically, create a custom suite and it will apply a few of them for you. These will be generic, though, so for more control you should apply rules based on your needs. A custom suite also offers a better visual experience via HTML.

  • @vishalkoundal5008
    @vishalkoundal5008 1 year ago

    Where can we find the videos on setting up the pytest framework to validate data? Could you kindly share the video link?

    • @BiInsightsInc
      @BiInsightsInc 1 year ago

      Please check the description. Link to the PyTest video is available there.

  • @sirajansari2848
    @sirajansari2848 1 year ago

    Good explanation.
    But you need to update the video: "get_expectations_config()" is no longer available in the Great Expectations framework. I used your code to run my checks, and it failed at the "get_expectations_config()" step.

    • @BiInsightsInc
      @BiInsightsInc 1 year ago

      What version are you using? Anyway, you can try "get_expectation_suite", which is the updated function:
          expectation_suite = data_assistant_result.get_expectation_suite(
              expectation_suite_name=expectation_suite_name
          )

  • @swagatdash91
    @swagatdash91 1 year ago

    Can you please make a video on how to create a custom expectation using a query or something similar, and then how to apply it for DQ?

    • @BiInsightsInc
      @BiInsightsInc 1 year ago

      Hey Swagat, that would be a custom expectation. I will cover custom expectation suites with this library in the future.

  • @Sreenu1523
    @Sreenu1523 1 year ago

    This is another masterpiece of a video.
    I am struggling with the scenarios below; if possible, could you please explain them in your upcoming videos?
    1. How to read the latest file. Suppose my source folder contains many files and I want to read only the latest one.
    2. I want to create a Python script that reads, processes and loads the data into a db table when a file arrives in the source folder.

    • @BiInsightsInc
      @BiInsightsInc 1 year ago

      Thanks. I will create a video on your suggested topics. Stay tuned.
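
For the first scenario above (reading only the latest file), here is a minimal sketch using only the Python standard library. It assumes that "latest" means the most recent modification time and that the files are CSVs; the function name `latest_file` and the `*.csv` default pattern are illustrative choices, not something taken from the video.

```python
from pathlib import Path
from typing import Optional


def latest_file(folder: str, pattern: str = "*.csv") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None.

    Hypothetical helper: "latest" is taken to mean the newest
    modification time (st_mtime), which is an assumption.
    """
    # Keep regular files only; glob can also match directories.
    candidates = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the largest (most recent) modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For the second scenario, one simple option is to call this function on a schedule and process the result whenever it differs from the file handled last time.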

  • @dileepkumar-dk5kc
    @dileepkumar-dk5kc 9 months ago

    Thanks for the video. What if I have a composite primary key: will "expect_column_values_to_be_unique" work?
    It's failing for me, so could you help me with this?

    • @BiInsightsInc
      @BiInsightsInc 9 months ago

      Thanks for stopping by. You can try "expect_compound_columns_to_be_unique" in the above scenario.
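
To show why a single-column uniqueness check fails on a composite key while a compound check passes, here is a small plain-Python sketch. It is not the Great Expectations implementation; `duplicate_keys` and the column names `order_id` and `line_no` are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def duplicate_keys(
    rows: Iterable[Dict[str, object]], key_columns: Sequence[str]
) -> List[Tuple[object, ...]]:
    """Return every key combination that occurs more than once."""
    # Build one tuple per row from the key columns and count occurrences.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical data: each order has several lines, so order_id repeats,
# but the pair (order_id, line_no) identifies a row.
rows = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 2, "line_no": 1},
]

print(duplicate_keys(rows, ["order_id"]))             # [(1,)]
print(duplicate_keys(rows, ["order_id", "line_no"]))  # []
```

Checked alone, `order_id` has a duplicate, which mirrors the failing single-column expectation; checked together with `line_no`, no combination repeats.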

  • @Nicolas-uh9pf
    @Nicolas-uh9pf 9 months ago

    Can you have custom expectations?

    • @BiInsightsInc
      @BiInsightsInc 9 months ago +1

      Yes, you can build and manage custom expectations for your use case. Here are the docs on custom expectations: docs.greatexpectations.io/docs/oss/guides/expectations/custom_expectations_lp/

  • @Memes_uploader
    @Memes_uploader 1 year ago

    Did they change the structure? Why didn't you talk about validators and checkpoints? I am stuck here:
        import great_expectations as gx
        context = gx.get_context()
        validator = context.sources.pandas_default.read_csv(
            "data.csv"
        )
        validator.validate(expectation_suite="config.json")
        checkpoint = context.add_or_update_checkpoint(
            name="my_quickstart_checkpoint",
            validator=validator,
        )
        checkpoint_result = checkpoint.run()
        context.view_validation_result(checkpoint_result)
    ERROR: great_expectations.exceptions.exceptions.DataContextError: expectation_suite default not found

    • @BiInsightsInc
      @BiInsightsInc 1 year ago +1

      They might have; please check their docs. As for the error, you are not providing the correct directory for config.json, so it cannot locate the expectation_suite you are passing.

  • @balakrishnaprasad8928
    @balakrishnaprasad8928 1 year ago

    Please create end-to-end Python projects for data analysts.

    • @BiInsightsInc
      @BiInsightsInc 1 year ago

      Will try to create an end-to-end project.

  • @VB-rf2fv
    @VB-rf2fv 10 months ago

    Hi, can we test CSV file data against a database table with some expectations?

    • @BiInsightsInc
      @BiInsightsInc 10 months ago

      In this session we are testing CSV file data with Great Expectations. In the upcoming video we will test a database table using Great Expectations in an Airflow DAG.

  • @dbrx4
    @dbrx4 9 months ago

    Why is every tutorial using a local client? It's a very unusual case, because in the real world you would have to load all the data into memory to run this quality analysis.

    • @BiInsightsInc
      @BiInsightsInc 9 months ago +2

      You can take the code and concepts and replicate it for your use case on a server or cloud environment.
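
On the memory concern raised above, some checks do not need the whole dataset in memory. The sketch below streams a CSV row by row with the standard library and counts missing values in one column. It is plain Python rather than Great Expectations, and the name `count_missing` is hypothetical.

```python
import csv
import io
from typing import TextIO


def count_missing(handle: TextIO, column: str) -> int:
    """Count rows whose `column` value is absent, empty, or whitespace.

    Rows are read one at a time, so memory use does not grow with file size.
    """
    reader = csv.DictReader(handle)
    # A short row yields None for the missing field, hence `or ""`.
    return sum(1 for row in reader if not (row.get(column) or "").strip())


# Small in-memory example; a real file would be passed via open(path, newline="").
sample = io.StringIO("id,name\n1,a\n2,\n3,  \n")
print(count_missing(sample, "name"))  # 2
```

The same row-at-a-time pattern works for range or format checks; checks that need a global view, such as uniqueness, must keep some state across rows.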