This is amazing! Can you please try making a more comprehensive version of this and use real data as example (doesnt have to be medical but just so that we can see full procedure)
Really Amazing
Getting error: PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.PlainValidatorFunctionSchema ({'type': 'with-info', 'function': })
Did you get any solution?
Hello,
How do you generate data when there are two tables with a PK/FK relationship? Is the model capable enough to generate such related data?
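Not something the video covers, but one common workaround is to generate the parent table first, then generate the child table and enforce foreign-key validity in post-processing rather than trusting the model to invent valid keys. A minimal stdlib-only sketch of that idea (the customers/orders schema and values here are made up for illustration):

```python
import random

random.seed(0)

# Parent table: customers, with primary key customer_id.
# In practice an LLM would fill in the attribute values.
customers = [
    {"customer_id": i, "name": f"Customer {i}"}
    for i in range(1, 6)
]

# Child table: orders, where each foreign key MUST reference an existing PK.
# Instead of asking the model to produce valid FKs, sample them from the
# parent keys after generation.
valid_ids = [c["customer_id"] for c in customers]
orders = [
    {
        "order_id": j,
        "customer_id": random.choice(valid_ids),
        "amount": round(random.uniform(10, 500), 2),
    }
    for j in range(1, 11)
]

# Referential integrity now holds by construction.
assert all(o["customer_id"] in valid_ids for o in orders)
```

You can still let the LLM generate the row contents for both tables; only the FK column is rewritten to point at real parent keys.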
Thank you, this will really help me 👌
You are welcome !!
Please share the approach for synthetic data generation using Azure OpenAI, as I have an Azure OpenAI key.
Hello, you can quickly use Azure OpenAI by importing it from LangChain.
For reference, here is the link -> python.langchain.com/v0.2/docs/integrations/llms/azure_openai/
@@datasciencebasics thanks a lot! I will check.
Using AzureChatOpenAI instead of ChatOpenAI, it's not working. Any idea?
What framework is best for enterprise applications, Haystack or LangChain?
Haven’t explored Haystack yet, so I can’t say which one, but having knowledge of both might be beneficial!
@@datasciencebasics thanks for the recommendation, but is LangChain good enough to work at scale in production?
It depends on what kind of app you want to build and deploy. The underlying models are the key, as LangChain is just the framework. Having said that, this field is still evolving and constant upgrades are necessary.
Hi, good video. Can we use LangChain for multi-table data generation with referential integrity?
This video is just an explanation of the same example that LangChain gives in its documentation. I am also looking for examples of more real-world, scenario-based data generation.
Can we generate a larger dataset (>1000 rows) using this?
I haven’t tried, but it might be possible. Give it a try!!
Could you let me know which versions of OpenAI and LangChain were used in this video?
I used the latest versions available when the video was uploaded (Oct 27), so you can check the versions by searching for the packages at this link:
pypi.org/
Thanks! That's a very practical use case. Can you make a full-scale video?
I have made an application using the same code, but I am getting an output parser error while passing sample data to the LangChain library.
Great tutorial! Is there any open-source implementation available of this approach?
I am unable to create a two-tier nested JSON using this example. Can anyone help here?
I saw your video about fine-tuning Llama 2 on your own data. Can you please make a similar video on fine-tuning Zephyr or Mistral 7B on Google Colab using Abhishek Thakur's AutoTrain, and then show how to use that fine-tuned model?
Interesting video 👍 Curious: if you have lookup fields with only 4 distinct values, are the generated values still valid after generation? Also, if you have fields produced by some algorithm, for example a bank account number, does the generated value still pass the check constraint for that field when generation is based on the few-shot examples? And can this also be done using an open-source LLM?
Good content, very helpful. Could you advise: if we check the statistical correlation between the real and synthetic data, will it be above 90%?
Personally, I haven’t checked it. That would be a good check, though, before utilizing this in real use cases.
May I ask which other open-source models we can use to generate synthetic data?
I haven’t tried other open-source models myself. You can try and see if it works. Also, one thing to check is how statistically close the synthetic data and the real data are.
Awesome, brother, awesome! (Nepali: "dami dai dami")