Synthetic DATA Generation using LANGCHAIN 🦜️🔗

  • Published Dec 14, 2024

COMMENTS • 33

  • @seththunder2077
    @seththunder2077 1 year ago +1

    This is amazing! Could you please make a more comprehensive version of this that uses real data as the example? It doesn't have to be medical, just something that shows the full procedure.

  • @aanchalrawat
    @aanchalrawat 9 months ago

    Really Amazing

  • @saivihari3703
    @saivihari3703 2 months ago +1

    Getting error: PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.PlainValidatorFunctionSchema ({'type': 'with-info', 'function': })

  • @teja3925
    @teja3925 4 months ago

    Hello,
    How can we generate data for two tables that have a primary key/foreign key (PK/FK) relationship? Is the model capable of generating related data like that?
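
One way to approach the PK/FK question is to generate the parent table first, assign its primary keys in code rather than trusting the model to keep them unique, constrain the child table's foreign keys to those keys, and then verify. The sketch below uses the standard-library `random` module as a stand-in for the model so it runs offline; the `patients`/`bills` schema and all helper names are hypothetical and not part of LangChain.

```python
import random

# Hypothetical schema: "patients" is the parent table, "bills" the child table.
# random.Random stands in for the LLM here; with a model, you would generate the
# parent table first and pass its keys into the prompt for the child table.


def generate_patients(n: int, rng: random.Random) -> list[dict]:
    """Parent rows with unique primary keys assigned in code, not by the model."""
    names = ["Alice", "Bob", "Carol", "Dan", "Eve"]
    return [{"patient_id": 1000 + i, "name": rng.choice(names)} for i in range(n)]


def generate_bills(n: int, patient_ids: list[int], rng: random.Random) -> list[dict]:
    """Child rows whose foreign key is drawn only from existing parent keys."""
    return [
        {
            "bill_id": i,
            "patient_id": rng.choice(patient_ids),
            "amount": round(rng.uniform(50, 500), 2),
        }
        for i in range(n)
    ]


def find_orphans(parents: list[dict], children: list[dict], pk: str, fk: str) -> list[dict]:
    """Return child rows whose foreign key matches no parent primary key."""
    keys = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in keys]


rng = random.Random(42)
patients = generate_patients(5, rng)
bills = generate_bills(20, [p["patient_id"] for p in patients], rng)
print(find_orphans(patients, bills, "patient_id", "patient_id"))  # [] -> no orphaned bills
```

Running the orphan check after generation matters most when the model writes the foreign keys itself: any rows it returns can be dropped or regenerated.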

  • @m.rr.c.1570
    @m.rr.c.1570 3 months ago

    Thank you, this will really help me 👌

  • @sivaprasadatla
    @sivaprasadatla 5 months ago

    Please share an approach for synthetic data generation using Azure OpenAI, as I have an Azure OpenAI key.

    • @datasciencebasics
      @datasciencebasics 5 months ago

      Hello, you can quickly use Azure OpenAI by importing the Azure OpenAI integration from LangChain.
      For reference, here is the link: python.langchain.com/v0.2/docs/integrations/llms/azure_openai/

    • @sivaprasadatla
      @sivaprasadatla 5 months ago

      @datasciencebasics Thanks a lot! I will check.

  • @Shubhknsha
    @Shubhknsha 9 months ago

    I'm using AzureChatOpenAI instead of ChatOpenAI and it's not working. Any ideas?

  • @orlandocastellanos9263
    @orlandocastellanos9263 1 year ago

    Which framework is better for enterprise applications, Haystack or LangChain?

    • @datasciencebasics
      @datasciencebasics 1 year ago

      I haven't explored Haystack yet, so I can't say which one is better, but knowing both might be beneficial!

    • @orlandocastellanos9263
      @orlandocastellanos9263 1 year ago

      @datasciencebasics Thanks for the recommendation, but is LangChain good enough to work at scale in production?

    • @datasciencebasics
      @datasciencebasics 1 year ago

      It depends on what kind of app you want to build and deploy. The underlying models are the key, as LangChain is just the framework. That said, this field is still evolving, and constant upgrades are necessary.

  • @shitaldhakne-b7p
    @shitaldhakne-b7p 11 months ago

    Hi, good video. Can we use LangChain for multi-table data generation with referential integrity?

    • @ankit85jain
      @ankit85jain 10 months ago

      This video just explains the same example that LangChain gives in its documentation. I am also looking for examples of data generation based on more real-world scenarios.

  • @gamevint
    @gamevint 2 months ago

    Can we generate a larger dataset (more than 1,000 records) using this?

    • @datasciencebasics
      @datasciencebasics 2 months ago

      I haven't tried it, but it might be possible. Give it a try!!
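
Asking for well over 1,000 records in a single call is likely to run into the model's output limits, so a common workaround is to generate in smaller batches, drop duplicates, and stop once the target is reached. In this sketch `generate_batch` is a hypothetical callable that would wrap whatever generation call you use; a fake function that deliberately repeats every id stands in for the model so the example runs offline. The batch size, the `id` dedup key, and the round limit are assumptions to tune.

```python
import itertools
from typing import Callable


def generate_in_batches(
    generate_batch: Callable[[int], list[dict]],
    target: int,
    batch_size: int = 50,
    key: str = "id",
    max_rounds: int = 1000,
) -> list[dict]:
    """Call a batch generator until `target` unique rows (by `key`) are collected.

    May return fewer than `target` rows if `max_rounds` is exhausted first.
    """
    seen = set()
    rows: list[dict] = []
    for _ in range(max_rounds):
        if len(rows) >= target:
            break
        for row in generate_batch(min(batch_size, target - len(rows))):
            if row[key] not in seen:  # skip rows the model repeated across calls
                seen.add(row[key])
                rows.append(row)
    return rows[:target]  # a batch may overshoot, so trim to the target


# Stand-in for an LLM call: every id is emitted twice to mimic duplicate records.
_counter = itertools.count()


def fake_batch(n: int) -> list[dict]:
    return [{"id": next(_counter) // 2} for _ in range(n)]


data = generate_in_batches(fake_batch, target=1200)
print(len(data))  # 1200
```

Deduplicating on a key is the simplest guard; for free-text fields you may also want a looser similarity check, since models tend to repeat near-identical records across calls.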

  • @sebiraj149
    @sebiraj149 1 year ago

    Could you let me know which versions of OpenAI and LangChain were used in this video?

    • @datasciencebasics
      @datasciencebasics 1 year ago

      I used the latest versions available when the video was uploaded (Oct 27), so you can look up each package on this site and check which version was current then:
      pypi.org/
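
Besides looking up release dates on PyPI, you can print the versions actually installed in your own environment with the standard-library `importlib.metadata` module, then pin the printed `name==version` lines in a requirements file so a working setup stays reproducible. A minimal sketch; `langchain` and `openai` come from the question, while `langchain-experimental` and `pydantic` are assumptions about what else a setup like this might use.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is not installed."""
    result: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# "langchain" and "openai" come from the question; the other two names are assumptions.
for pkg, ver in installed_versions(["langchain", "langchain-experimental", "openai", "pydantic"]).items():
    print(f"{pkg}=={ver}" if ver else f"{pkg}: not installed")
```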

  • @hadikhantec
    @hadikhantec 6 months ago

    Thanks! That's a very practical use case. Can you make a full-scale video?

  • @harshadahadawale9533
    @harshadahadawale9533 1 year ago

    I have built an application using the same code, but I'm getting an output parser error when passing sample data to the LangChain library.
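
An output parser error often means the model returned something the parser could not read as the expected structure, for example JSON surrounded by extra prose or a record missing a field. That is a plausible cause, not a diagnosis of this specific error. The hypothetical helper below (not LangChain's own parser) shows a tolerant parsing step: it extracts the first JSON object from raw model text and checks required fields, so failures surface with a clear message. The field names are illustrative assumptions.

```python
import json


def parse_first_json_object(text: str, required: set[str]) -> dict:
    """Return the first JSON object embedded in raw model output.

    Tolerates surrounding prose; raises ValueError if no object parses
    or if any required field is missing.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # not valid JSON here; try the next brace
            continue
        missing = required - set(obj)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return obj
    raise ValueError("no JSON object found in model output")


reply = 'Sure! Here is the record: {"patient_id": 1, "patient_name": "Jane Doe"} Let me know if you need more.'
print(parse_first_json_object(reply, {"patient_id", "patient_name"}))
```

A `ValueError` from this step is a natural trigger for retrying the generation call with the same prompt.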

  • @nasiksami2351
    @nasiksami2351 7 months ago

    Great tutorial! Is there any open-source implementation available of this approach?

  • @Player13.917
    @Player13.917 6 months ago

    I am unable to create two-level nested JSON using this example. Can anyone help?
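
With Pydantic, which LangChain uses for structured output schemas, a two-level structure is usually expressed by using one model as a field type of another, for example a list of inner models. Whether the generator shown in the video handles such nested schemas is not confirmed here. The standard-library sketch below shows the same two-level shape with dataclasses and validates the nested level when parsing model output; the `Patient`/`Visit` schema is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Visit:
    """Inner (second-level) record."""
    date: str
    cost: float


@dataclass
class Patient:
    """Outer record holding a list of nested Visit records."""
    patient_id: int
    name: str
    visits: list[Visit] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "Patient":
        # Build the inner level explicitly so a malformed nested record fails loudly
        # (KeyError / ValueError) instead of slipping through as a plain dict.
        visits = [Visit(date=str(v["date"]), cost=float(v["cost"])) for v in data["visits"]]
        return cls(patient_id=int(data["patient_id"]), name=str(data["name"]), visits=visits)


raw = '{"patient_id": 7, "name": "Jane Doe", "visits": [{"date": "2024-01-05", "cost": 120.5}]}'
patient = Patient.from_dict(json.loads(raw))
print(json.dumps(asdict(patient)))
```

`asdict` converts the nested dataclasses back into plain dicts and lists, so the round trip produces the same two-level JSON shape.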

  • @devyanshrastogi
    @devyanshrastogi 1 year ago

    I saw your video about fine-tuning Llama 2 on your own data. Can you please make a similar video on fine-tuning Zephyr or Mistral 7B on Google Colab using Abhishek Thakur's AutoTrain, and then show how to use the fine-tuned model?

  • @henkhbit5748
    @henkhbit5748 1 year ago

    Interesting video 👍 I'm curious: if you have fields that are lookup values with only 4 possible values, are the generated values still valid? Also, if you have fields produced by some algorithm, for example a bank account number, does the generated value still pass the check constraint for that field when it is based only on the few-shot examples? And can this also be done using an open-source LLM?
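
Few-shot prompting alone does not guarantee either constraint, so a practical safeguard is to validate generated rows afterward and drop or regenerate the failures. The sketch below assumes a hypothetical `status` lookup field with four allowed values and uses the IBAN mod-97 check-digit test as one example of an algorithmically checked bank number; other bank-number formats use different check rules.

```python
# Hypothetical 4-value lookup field; replace with your real code list.
ALLOWED_STATUS = {"pending", "paid", "denied", "refunded"}


def iban_is_valid(iban: str) -> bool:
    """IBAN mod-97 check: rearrange, map letters to 10..35, remainder must be 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # '0'-'9' -> 0-9, 'A'-'Z' -> 10-35
    return int(digits) % 97 == 1


def invalid_rows(rows: list[dict]) -> list[dict]:
    """Rows whose lookup value is outside the allowed set or whose IBAN fails the check."""
    return [r for r in rows if r["status"] not in ALLOWED_STATUS or not iban_is_valid(r["iban"])]


generated = [
    {"status": "paid", "iban": "GB82 WEST 1234 5698 7654 32"},  # valid example IBAN
    {"status": "approved", "iban": "GB82 WEST 1234 5698 7654 32"},  # status not in lookup
    {"status": "pending", "iban": "GB82 WEST 1234 5698 7654 33"},  # check digits fail
]
print(len(invalid_rows(generated)))  # 2
```

Listing the allowed lookup values explicitly in the prompt may reduce failures, but the post-generation check is what actually enforces the constraint.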

  • @prashantt022
    @prashantt022 1 year ago

    Good content, very helpful. Could you advise?
    If we check the statistical correlation between the real and synthetic data, will it be above 90%?

    • @datasciencebasics
      @datasciencebasics 1 year ago +1

      Personally, I haven't checked it. That would be a good check, though, before using this in real use cases.

  • @ankit85jain
    @ankit85jain 1 year ago

    Could you suggest which other open-source models we can use to generate synthetic data?

    • @datasciencebasics
      @datasciencebasics 11 months ago

      I haven't tried other open-source models myself. You can try them and see if they work. Also, one thing to keep in mind is how statistically close the synthetic data and the real data are.

  • @pseudoartist
    @pseudoartist 8 months ago

    Awesome, bro, awesome