Amazing video, sir. Just had a question: I want to make it fast like ChatGPT, so I changed model_kwargs={'device': 'cpu'} to model_kwargs={'device': 'cuda'}, but it is taking the exact same time. Any idea what changes I can make in the code to make it more responsive?
@@AIAnytime Yes sir, I did install CUDA and the CUDA build of torch, and changed all the 'cpu' values in the code to 'cuda' (model_kwargs={'device': 'cuda'}), but after running the code only the CPU is being utilized according to Task Manager.
You have to pass cuda in CTransformers, not in the model_kwargs for Sentence Transformers. In CTransformers, you have to pass the number of GPU layers. Look at the CTransformers GitHub for more info.
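A minimal sketch of that change, assuming a CUDA-enabled ctransformers build (the model filename mirrors the video; adjust it to your local file, and note import paths differ across LangChain versions):

```python
# Sketch: GPU offload is configured on the LLM loader, not in the embedding
# model_kwargs. gpu_layers is how many transformer layers go to VRAM
# (0 = pure CPU).
from langchain.llms import CTransformers

llm = CTransformers(
    model="llama-2-7b-chat.ggmlv3.q8_0.bin",  # local model file, as in the video
    model_type="llama",
    max_new_tokens=512,
    temperature=0.5,
    config={"gpu_layers": 50},  # raise or lower to fit your VRAM
)
```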
Hi, please can you share an implementation that integrates memory into this, so the bot remembers chat history and context for follow-up questions? Thanks in advance.
@AIAnytime Hi, I have used it, but I wanted to use RetrievalQAWithSourcesChain in its place and I am facing an error with that. Also, when I use ConversationalRetrievalChain with the GPT-3.5 Turbo LLM and OpenAI embeddings, it is able to preserve memory and answer follow-up questions (though not that well, as sometimes the current prompt gets drastically rephrased into an entirely different prompt), but it also gives answers from its own knowledge base. For example, say I have uploaded a document related to medicine and then ask "Who is Sachin Tendulkar"; it still answers. I want the model to say "Question out of context" in those cases. See if you can help and make a tutorial on that: a Chainlit interface where we upload a PDF, ask related questions and follow-up questions, while discarding questions outside the document's scope.
Is there any way to use the embedding model (sentence-transformers/all-MiniLM-L6-v2) by downloading it to our local system rather than using it directly from Hugging Face?
Very good content, it is very helpful. On a Mac these developments cannot be run due to the GPUs. However, I understand it could be carried out in Google Colab, right?
What are the specs of the hardware you are using in the video, for reference, so we know if our machines will do better or worse? 16 GB RAM and which CPU?
A suggestion, sir: if the code is not working properly, note that where you used user_session.set("chain") you need user_session.get("chain"), and there is also an issue with the empty context because the context isn't referenced in curly brackets in the prompt. I am also getting the error 'Message' object has no attribute 'replace' in the bot response; please suggest where I made a mistake.
Hi sir, I have tried this program, but when I run the Chainlit application the server gets disconnected 30 seconds after a query. I have searched but couldn't figure out what the reason might be.
Thank you so much for the video; I genuinely learned something from this one hour. Just one question: for GPU, do we just change cpu to gpu, or do other packages need updating too? Once again, great video.
Hey Sonu - this is one of the first YT tutorials with a thorough explanation I've seen in a while. I got this running the first time 'out of the box'; it did ask me to pip install ctransformers, but after that it came up just fine. I am going to experiment with other documents. Some people don't like to sit through writing code, but it's good for us! Especially when you mention other tools we could try and why you picked what you use. Excellent!
Thanks for the great tutorial. It is really helpful.
A hint for anyone stuck with some errors in model.py, here are some fixes (original -> fix):
chain = cl.user_session.set("chain")-> chain = cl.user_session.get("chain")
res = await chain.acall(message, callables=[cb]) -> res = await chain.acall(message.content, callbacks=[cb])
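Putting both fixes together, the corrected handler looks roughly like this (a sketch against the Chainlit version used in the video; exact callback arguments may differ on newer releases):

```python
import chainlit as cl

@cl.on_message
async def main(message):
    chain = cl.user_session.get("chain")  # get, not set
    cb = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True, answer_prefix_tokens=["FINAL", "ANSWER"]
    )
    # pass message.content and callbacks= (not message / callables=)
    res = await chain.acall(message.content, callbacks=[cb])
    await cl.Message(content=res["result"]).send()
```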
thanks man
thank you,
Another addition - under the #QA model function, change db = FAISS.load_local(DB_FAISS_PATH, embeddings) to db = FAISS.load_local(DB_FAISS_PATH, embeddings, allow_dangerous_deserialization=True)
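In context, the loader change looks like this (a sketch; newer LangChain versions require opting in to pickle deserialization for an index you built and trust locally, and import paths vary by version):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

DB_FAISS_PATH = "vectorstores/db_faiss"  # path as in the tutorial repo; adjust to yours
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)
# Only safe for an index you created yourself - the flag disables a
# pickle-deserialization guard added in later LangChain releases.
db = FAISS.load_local(DB_FAISS_PATH, embeddings, allow_dangerous_deserialization=True)
```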
Wow. You packed a lot here - very helpful, thanks.
Glad it was helpful! Thank you 🙏
Hey, somehow I ended up on this extremely underrated channel and I gotta say I love it! I loved each and every part of this tutorial; it's something I had been looking for for quite a few days. Thank you so much, definitely subscribing and looking forward to more such content.
Regards,
Anas from Pakistan
Loved the comment. Thanks
Thank you man so much! I am very grateful for your content. I appreciate your passion for open source ai and your teachings are helping bring this technology into my reach. I was so happy when this ran! :) Excited to see your future videos.
Glad you like them! I have many videos already. More coming soon. Pls stay tuned 🙏
You did semantic search, with no fine-tuning involved. Is this accurate?
Just wanted to drop in and say congrats on your UA-cam tutorial! 🎉🎥
Seriously, I'm so impressed with your content! Keep up the fantastic work!
Best wishes,
Rafael from Belgium
Hi Rafael, thanks for your lovely comment. Let's connect if you feel like..... Best, Sonu!!
What a fantastic video! Probably the only one that goes into complete details!
Glad you liked it!
Hey, can you help me? I am getting the error "ERROR: [Errno 10048] error while attempting to bind on address ('0.0.0.0', 8000): only one usage of each socket address (protocol/network address/port) is normally permitted". How can I fix that?
Hello from Portugal! Thanks for your video, Sir. Could you make a follow-up video on how to run it on GPU? As you can see, there are many viewers interested in it. Being a non-programmer, it would be nice to see a video showing what to change in the code and where. I was able to follow this video and make it work even though I don't know coding at all, so I believe you would make a great video on GPU usage too. Maybe as a follow-up video. Thanks, Sir!
Hi Pedro, thanks for your lovely comment! I will create a video soon for the GPU as well. Stay tuned....
@@AIAnytime thanks. Looking forward to it
Simply amazing! This video can help a lot of people who want to start working with Llama 2. Thanks for sharing this.
Glad it was helpful! Please consider subscribing if you like other videos as well.
Amazing video, you have saved the time of a lot of people. Keep up the excellent work.
Glad it helped... plz look at the LLM playlist.
Thanks!!! Great presentation, super useful, amazing that you had the energy to do this while sick : )
Glad you enjoyed it!
Thank you for a smart and precise explanation of such a difficult topic
Fantastic video! A 1080p quality video would make the watching/learning experience much better. Just a candid suggestion.
Thanks for the tip! My recent videos have improved. Share your feedback on those if you have any.
Appreciate the great work!
Most of the tutorials out there are just trying these LLMs on Colab notebooks, makes you eager for more.
Would appreciate if you can also cover the deployment part, thank you :)
Glad you like them! There are a few deployment videos on my channel. Please check out.
You are incredible professor!!! Thank you so much for your tutorial, i got very good insights. Best regards for you
This was a really well put together Tutorial thank you so much. Just one question what all needs to change to run this on GPU instead of CPU. Thank you so much for your time. Keep up the awesome work!!!!
Pick a GPU LLM model from TheBloke instead of a CPU model. GPU models usually have GPTQ in their name.
Nice video and great learning. Liked your confidence and knowledge. Going to build this bot on over the weekend and hopefully should be a breeze by looking into your code base and video.
Glad it was helpful! Thanks.
Bro, what are your PC specs? And please tell us the minimum system requirements for deploying Llama on a computer.
@@AIAnytime how to make it run on gpu too ??
Amazing video, thank you - I wanted to build a similar chatbot based on an open-source model; now it will be easier to do it.
Thank you for your comment! As I am new on YT, your support helps me grow and create more such videos.
Maybe you are new, but not for long. Soon videos like these are going to rock @@AIAnytime
Thank you very much for this great tutorial.
I have an error and I am struggling to solve, maybe could you help me?
The error is:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
and I have already tried to uninstall, downgrade, and upgrade versions of faiss-cpu but it does not solve this issue.
It would be nice to see how we could actually stream the responses. Also, the quantized version you are using is old; the newer quantized versions with a "K" in the name are better.
How do you know? Did you use it?
Hi Sir, thank you so much for the tutorial. Do you know how to enable GPU support for this model ?
The best channel for LLMs.. thanks
In the video at 4:39 you run a command in the terminal. Where should we run it on the Windows operating system? Please explain how to do that.
Great video! thank you for sharing your expertise. Keep up the good work!
Thanks, It's very useful. Upload more videos like that
Thanks for your comment! Please check my LLM playlist.
Thank you for your detailed explanation. Your classes are quite interesting and are building confidence to move further forward. I need some suggestions: I saw a medical chatbot using Llama 2 on a CPU machine, which was all open source. Similarly, I need to build an image-to-text multimodal model on a CPU using all open-source tools. Please provide your suggestions.
Have you made a follow-on video showing how to incorporate GPU acceleration (CUDA for Nvdia) into your codebase?
Can I do this in Colab, and how?
Good work. How do we get a streaming response like ChatGPT, outputting words as soon as they are generated? If possible, reply with a code example based on this video.
Hi, I tried this project, but when running the same code in Chainlit I get "tuple index out of range" and "UserSession.set() missing 1 required positional argument: 'value'" errors. Can anyone help, please?
Well done and appreciate the efforts. You have made my weekend interesting !
Glad to hear that! Please subscribe and check out the other videos too.
@@AIAnytime Sure, thanks. Got stuck at "could not reach server". 😞
I need help running this on my system... can anybody help? Urgent!
Amazing! This pipeline doesn't work well with CSV files though. Could you make a video explaining how to use csvs with these open-source models?
Great suggestion! Will come up with something...
@@AIAnytime Could you please suggest videos or websites I could use to create a CSV chatbot using Llama?
Hi nysa, find this: ua-cam.com/video/MUADZ97GgZA/v-deo.html
Hello, setting the chunk size to 500 may exceed the token limit of sentence transformers, which is by default 128 tokens.
Hi Cheng, that's a good point. Sentence Transformers do have a truncation strategy in place after 128 tokens, but yes, I agree with you.
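One way to act on that (a sketch, not the video's exact settings): size chunks in characters so they stay under the embedding model's ~128-token window, using the rough heuristic of ~4 characters per token. The import path may differ by LangChain version.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,    # ~100 tokens at ~4 chars/token, under the 128-token cutoff
    chunk_overlap=40,  # overlap so sentences split across chunks remain findable
)
# texts = splitter.split_documents(documents)  # documents from the PDF loader
```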
Amazing video, thank you.
I had a question.
1. I am unable to retrieve answers for questions about content outside the PDF. If we want an answer from the PDF, but fall back to the pretrained model when it's not found there, how do we configure that?
Can you tell me, sir, what I should learn to become like you? Please tell me everything you learned in Python and ML so I can start learning it deeply too. I am able to create ML apps, but not like you, so please tell me everything you learned.
Can you tell us where we can get the Llama 2 demo code file?
Thank you for the efforts to explain this in very simple way.
I'm new to LLMs. I tried your GitHub code; when I ask a question it gives the error "Async generation not implemented for this LLM." Could you please help with a workaround?
Loved the content, it was beautifully explained. Thank you :)
Glad it helped!
Awesome, job. Could you please provide instructions or make a video to run Llama 2 for custom data via GPU? Thanks.
Thank you! If you have enough VRAM (GPU), you can use the original Llama 2 weights and load them with Transformers to run the model. Look at Text Generation Inference on Hugging Face.
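For anyone with the VRAM, a hedged sketch of that route (requires approved access to the meta-llama repo on Hugging Face; the model ID and dtype here are typical choices, not the video's code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory versus float32
    device_map="auto",          # spread layers across available GPUs
)
generate = pipeline("text-generation", model=model, tokenizer=tokenizer,
                    max_new_tokens=256)
```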
@@AIAnytime Thanks. Could you please provide steps to get the GPU working with the code you showed in your video. Thank you.
Works! I asked how to solve pollen allergy, and the bot was able to point to the source, page, and information. Great!
If I ask questions that are not related to the data I provided, it still answers (for example, I asked the chatbot about Virat Kohli, and it gave me a correct response). It should not give answers that are not present in the data.
Amazing! Can you explain the problems with Langchain in production and provide alternatives for Langchain?
Fantastic question. Let me answer it... Harrison Chase and team have done a great job with LangChain, but at the moment it isn't enterprise-ready:
1. It allows arbitrary code execution and is prone to prompt injections.
2. Edge-case issues have been identified with integrations.
3. High compute costs due to CPU and memory spikes.
4. Many other vulnerabilities.
Let's give Harrison some time on it.
Many other developments are happening. Stay tuned.
Great tutorial; I am looking to learn these skills soon to take on a new role.
You can do it! Best of luck.....
Wow... This is what I was looking for 😇
If I upload multiple PDF and run the chatbot, will it answer my query from searching all over the PDF's?
Absolutely yes. That's how it works.
Thank you for creating this video. It was really helpful 😃
Why is everyone using AI for chatbots when it's capable of so much more? I would love to see an LLM operating system. Let's think big.
Interesting! I think Google is on it to build it first.... 2024 release! Heard it from a few community friends. Let me know if you are thinking around these lines. It's an ambitious task at the moment. Alone we can't do, we need a team to work on it.
Awesome as usual. But the thing is, users who are going to use these kinds of chatbots don't want to see all that info. I think you get me: it should be simple and look beautiful, and then people will try to use it. Unnecessary information is not useful for people who don't know how this works; they only want to type a query and get the response, that's it. I hope you keep that in mind for the next tutorials. You're doing GREAT, BUDDY 🔥🔥🔥🙌🏻🙌🏻🙌🏻🙌🏻✨
Wow thanks for this video... Really helpful
Glad it was helpful!
Awesome work. When I run it, I get the error below:
NotImplementedError: Async generation not implemented for this LLM. Please advise.
Hey, I got the same problem and didn't find a solution. Did you solve it?
Thanks, Open source AI Advocate
Thanks for the work, but that code needs correction. Most probably you ran the original reference code for the output, not the one you coded in the video.
1. 44:11 That needs to be *chain = cl.user_session.get("chain")* in *async def main(message):*, with no *set*.
2. 40:20 How is *final_result(query)* called in your code? Is it just an example? 🤔 💭
Looks like a great catch, Abhishek. Can you please open a PR on the GitHub repo on this one? I have to check the code base again. Will make the necessary changes. Thanks for your comment 🙂
Many thanks for a great video. Fantastic tutorial!
Glad it was helpful!
@AIAnytime Perfect one! As we saw, responses took up to 2 minutes, with LLMChain consuming most of the time. How can we tune it further to speed up the replies while staying on CPU (with respect to both hardware specs and parameter/code tuning)? And what's a good hardware config to run this solution and get ChatGPT-like response times?
Thank you very much sir amazing video, very knowledgeable amazing teaching ❤
Thanks and welcome
Thank you for the very instructive video. I should bring to your attention (for the benefit of your subscribers, future and present) that there is an important note regarding ggml files and I quote from HuggingFace:
"Important note regarding GGML files.
The GGML format has now been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. Third party clients and libraries are expected to still support it for a time, but many may also drop support.
Please use the GGUF models instead."
My question is: How does this note affect your instructions in this video? Any code changes? Thank you.
Great video, I have a question for you, what model can I use to do it in Spanish, or does it work with the same one?
Hi, while in the venv and trying to install requirements.txt, I'm facing issues installing torch on a Mac M2 Air. Is there a solution for this?
Fantastic information, super useful, and a big time saver. Thank you.
For anyone who may have had with an error relating to string replacement, change the chain async call like so:
res = await chain.acall(message.content, callbacks=[cb])
I am curious how to prevent the context length for the transformer from going beyond 512 tokens.
You can increase the context-length setting. Use the Llama 2 32k model by Together Computer on Hugging Face.
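In ctransformers the relevant setting is context_length; a sketch (the model filename mirrors the video, and the window you request must be one the model actually supports):

```python
from langchain.llms import CTransformers

llm = CTransformers(
    model="llama-2-7b-chat.ggmlv3.q8_0.bin",
    model_type="llama",
    config={
        "context_length": 2048,  # prompt window; avoids silently truncating context
        "max_new_tokens": 512,
    },
)
```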
@@AIAnytimeAwesome thank you.
Thank you for the video and I learn a lot from you.
Glad to hear that!
Hi sir, thank you so much for the video; this is exactly the type of video we were looking for. (One request: can you please make a video on extracting data from different types of invoices with the help of open-source models or libraries?)
Can you please share the version of each package in the description?
I’ve seen every video about embeddings, but no one talks about how do you update the embeddings?
For example, you created an embedding of a document which had a text ‘our stores open at 11am’
now the document is updated to reflect ‘our stores open at 10am’
How do you update the embeddings?
Do you delete the old document store and re-generate everything? (too much just to reflect 1 line change)
Or what is the solution to update this specific embedding?
Because if we just add this line, it'll conflict with the previously existing embedding (it can pick the old line from the top k). This is bad for production.
What’s the solution?
+You earned a sub :)
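One common pattern (a toy sketch with made-up names, not a real FAISS/LangChain API): key every chunk's vector by document ID, then on update delete only that document's old chunks and re-embed just that document, leaving the rest of the index untouched.

```python
# Hypothetical sketch of per-document upsert instead of rebuilding the index.
# embed() and VectorStore are illustrative stand-ins, not a real library API.

def embed(text):
    # Stand-in "embedding": length and vowel count. A real system would call
    # a sentence-transformers model here.
    return (len(text), sum(text.count(v) for v in "aeiou"))

class VectorStore:
    def __init__(self):
        self.vectors = {}      # chunk_id -> vector
        self.doc_chunks = {}   # doc_id -> list of chunk_ids

    def upsert_document(self, doc_id, chunks):
        # Delete the old chunks for this document first, so stale text
        # (e.g. "open at 11am") can no longer surface in top-k results.
        for chunk_id in self.doc_chunks.get(doc_id, []):
            del self.vectors[chunk_id]
        ids = []
        for i, chunk in enumerate(chunks):
            chunk_id = f"{doc_id}:{i}"
            self.vectors[chunk_id] = embed(chunk)
            ids.append(chunk_id)
        self.doc_chunks[doc_id] = ids

store = VectorStore()
store.upsert_document("faq", ["our stores open at 11am"])
store.upsert_document("faq", ["our stores open at 10am"])  # re-embed just this doc
```

Real vector stores expose the same idea through IDs: delete by stored chunk IDs for the changed document, then add its new chunks.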
Great job, dude; a non-technical person can also understand your explanation. Thanks and respect for sharing open-source AI. I have one question: how can I restrict this chatbot from answering questions outside of the document/PDF? For example, if I ask the chatbot what Python is, it gives the answer even though this information is not present in the PDF. How can I restrict it and make it a PDF-specific bot?
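A frequently used mitigation (a sketch; the wording is illustrative, and quantized 7B models follow such instructions imperfectly) is to tighten the QA prompt so the model refuses when the retrieved context lacks the answer:

```python
# Hypothetical stricter prompt template; the exact refusal wording is a choice.
strict_prompt = """Use ONLY the following pieces of context to answer the question.
If the answer is not contained in the context, reply exactly:
"I don't know based on the provided document." Do not use outside knowledge.

Context: {context}
Question: {question}

Helpful answer:"""

def build_prompt(context, question):
    # Fill the two slots the retrieval chain would normally fill.
    return strict_prompt.format(context=context, question=question)
```

In the video's setup this string would go into PromptTemplate(template=strict_prompt, input_variables=["context", "question"]); pairing it with a similarity-score threshold on the retriever helps further.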
This is an amazing tutorial, thank you, sir. But unfortunately I got the error "Session disconnected". Please, sir, explain how and why it happened. Your fan from Pakistan ❤❤❤
Beautifully explained, each step. Could you confirm which GPU is best for the Llama 2 (7B and 13B) models on a PC/laptop?
Get anything which has 24GB VRAM if that's in your budget.
Hello, this is a very helpful video for me since I was working on something similar.
I have a small query:
Does it only answer "what" and "how" questions? For it to be a chatbot it needs to have a conversation with the user, right? For example, if I say "I have certain symptoms", what does it generate?
A reply would be helpful for me.
Did you come to a decision?
Thank you for this, but can you make a similar bot that responds not only with text but with rich media (images, GIFs, links, etc.)? Just like you create embeddings on the text, can you embed the images in the PDF? Would love to see your video on this.
Nice video! As an LLM newbie I might be being too optimistic trying to run this with the Llama 2 quantized model llama-2-7b-chat.Q4_K_M.gguf on a cpu with only 8GB RAM nominal. The Chainlit page loads and after entering a question after a while it appears to timeout with the message "Could not reach the server".
Hi, I'm facing the same issue. Have you found a solution?
Great work, this video was really informative.
Glad it was helpful!
Excellent tutorial! By the way, is there a way to extract the images in the PDF and embed them in the response?
Yeah, you need to use a multimodal embedding model. For images, Sentence Transformers still works, but how to return them in the response is something you'll need to look into.
Can we say that we fine-tuned this Llama 2 model on our own data, like in this example where you used a medical book? Am I right?
Awesome content! When is it appropriate to fine-tune an LLM instead of, or as a complement to, the Botpress knowledge base?
Great tutorial. Could you please tell us how to fine-tune the model? Is it possible?
Amazing video.
Is it possible to add a translation feature to the response using the LLM? If it is possible, can you tell me how to do it?
Really, it was a wonderful video!! Can I train this model in Google Colab or on any other cloud GPU?
I installed ctransformers but it gives an error. I tried a lot of things using Bard but was unable to solve the issue; please give me some suggestions. Thank you so much for this amazing content.
What error are you getting? Can you post it here?
Hi, great video! Can you recommend some conversational open-source LLMs? I want to build an IT help desk bot with custom data.
Hi, you can use Llama 2, MPT, or Falcon. They are commercially usable from a licensing standpoint.
I wonder how you could get it so that you type in a bunch of symptoms and it asks follow-up questions and then gives you a possible diagnosis.
Hi sir, can you please tell me when this will be ready for production and for deployment on different cloud providers? Can you please make a video on deployment?
Coding isn't really my forte, but I gave this project a shot. Unfortunately, I encountered numerous errors towards the end because I'm unfamiliar with setting things up from scratch. The tutorial seems tailored for professionals; what about beginners who want to embark on such projects?
Sad to hear that you got errors. Meanwhile, you can check a similar video I created: ua-cam.com/video/rIV1EseKwU4/v-deo.html. Create an issue on the GitHub project for any bugs and I will help debug it.
Thanks for the explanation. How can we make this work for text to SQL conversion?
Text to SQL would be pretty easy. You can just run inference on the model with LangChain chains. It should work.
@@AIAnytime Can you please elaborate so that I can understand? Thanks for the prompt reply.
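To make the text-to-SQL idea concrete, here is a minimal end-to-end sketch. The LLM call is stubbed out for illustration (in the real flow, `llm` would be the CTransformers instance from the tutorial), and the table, column names, and canned query are purely hypothetical:

```python
import sqlite3

# Hypothetical prompt asking the model to emit a single SQLite query.
SQL_PROMPT = """You are a SQL generator. Given the schema:
{schema}
Write a single SQLite query answering: {question}
Return only the SQL, nothing else."""

def text_to_sql(question, schema, llm=None):
    prompt = SQL_PROMPT.format(schema=schema, question=question)
    if llm is None:
        # Stub standing in for the model's output, so the flow is runnable here.
        return "SELECT name FROM patients WHERE age > 60;"
    return llm(prompt).strip()

# Toy in-memory database to execute the generated SQL against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, age INTEGER)")
conn.execute("INSERT INTO patients VALUES ('Asha', 72), ('Ravi', 40)")

sql = text_to_sql("Which patients are over 60?", "patients(name, age)")
rows = conn.execute(sql).fetchall()
print(rows)  # [('Asha',)]
```

In practice you would validate the generated SQL (e.g. allow only SELECT statements) before executing it against a real database.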
Amazing video, sir. Just had a question: I want to make it fast like ChatGPT, so I changed model_kwargs={'device': 'cpu'} to model_kwargs={'device': 'cuda'}, but it is taking exactly the same time. Any idea how I can change the code to make it respond faster?
Do you have CUDA installed? Also, you have to use CUDA to run the LLM, not just while creating the embeddings.
@@AIAnytime Yes sir, I did install CUDA and the torch CUDA version for it, and changed all the 'cpu' values in the code to 'cuda' (model_kwargs={'device': 'cuda'}), but after running the code, only the CPU is being utilized according to Task Manager.
You have to enable the GPU in CTransformers, not in the model kwargs for Sentence Transformers. In CTransformers, you have to pass the number of GPU layers. Look at the CTransformers GitHub for more info.
@@AIAnytime Yup, I did add gpu_layer=8 in CTransformers but it did not help reduce the latency; still only the CPU and RAM seem to be working.
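For reference, a sketch (untested here) of GPU offloading with the LangChain CTransformers wrapper, assuming ctransformers was installed with CUDA support (`pip install ctransformers[cuda]`) and the tutorial's model file. Note the config key is `gpu_layers`, plural; a `gpu_layer=8` keyword would not take effect:

```python
from langchain.llms import CTransformers

# Offload transformer layers to the GPU via the ctransformers config dict.
llm = CTransformers(
    model="llama-2-7b-chat.ggmlv3.q8_0.bin",  # the quantized model from the video
    model_type="llama",
    config={
        "max_new_tokens": 512,
        "temperature": 0.5,
        "gpu_layers": 32,  # number of layers to offload; 0 means CPU-only
    },
)
```

If GPU utilization still stays at zero, the CUDA build of ctransformers likely isn't the one installed, since the plain wheel silently falls back to CPU.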
Hi, please can you share an implementation that integrates memory into this, so it remembers the chat history for follow-up questions? Thanks in advance.
You can use the Conversational Retrieval chain with the memory that comes with it. Please look at the LangChain documentation.
@AIAnytime Hi, I have used it, but I wanted to use RetrievalQAWithSourcesChain in its place and I am facing an error with that. Also, when I use the Conversational Retrieval chain with the GPT-3.5 Turbo LLM and OpenAI embeddings, it is able to preserve memory and answer follow-up questions (though not that well, as sometimes the current prompt gets drastically rephrased into an entirely different prompt), but it also gives me answers from its own knowledge base. For example, say I have uploaded a document related to medicine and then ask "Who is Sachin Tendulkar"; it still answers. I want the model to say "Question out of context" in those cases. See if you can help with that and make a tutorial on it: a Chainlit interface where we upload a PDF, ask related questions and follow-up questions, while discarding questions outside the document's scope.
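For anyone looking for a starting point, a sketch of the memory setup with LangChain's (legacy) `ConversationalRetrievalChain`, assuming the tutorial's existing `llm` and FAISS `db` objects are already built (untested here):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Buffer memory keeps the full chat history under the key the chain expects.
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

# The chain condenses each follow-up question with the chat history
# into a standalone question before running retrieval against FAISS.
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    memory=memory,
)
```

Keeping the model from answering out-of-document questions is a separate concern, handled by a stricter prompt rather than by the memory.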
Thanks
I have a question: why is it impossible to load Llama 2 based models with LlamaCpp from LangChain? Thanks for your video. It helped me.
Why is it slow in generating responses, and how can we improve it?
The best developers have purple themes.
Is there any way to use the embedding model sentence-transformers/all-MiniLM-L6-v2 by downloading it to our local system rather than using it directly from Hugging Face?
What an amazing video... Thank you.
Very good content, it is very helpful
On a Mac these projects cannot be run because of the GPU. However, I understand they could be carried out in Google Colab, right?
Can you share the documentation for this project?
@AI Anytime How do I remove the source information? I just want to keep the output.
You just have to set return_source_documents=False.
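Concretely, in the tutorial's QA-chain setup that would look like this (a sketch against the LangChain API used in the video; `llm`, `db`, and `prompt` are the objects already built there):

```python
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=False,  # drop the source chunks from the result
    chain_type_kwargs={"prompt": prompt},
)
```

With this flag off, the chain's result contains only the answer text, so nothing extra needs to be stripped in the Chainlit handler.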
Hey! Thanks for the video. I am running it on a Macbook Pro with M2 chip but it is taking ages for even a single response to come in. Any suggestions?
What are the specs of the hardware you are using in the video, for reference, so we know if our machine is going to do better or worse? 16 GB RAM, and which CPU?
An Intel i5, 16 GB RAM, 512 GB SSD.
I got "Number of tokens (1659) exceeded maximum context length (512)" and it gets frozen...
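One way to address this error (a sketch, untested here) is to raise the model's context window in the CTransformers config and/or shrink the chunk size at ingestion time so the stuffed prompt fits:

```python
from langchain.llms import CTransformers

# Raise the context window beyond the 512-token default.
llm = CTransformers(
    model="llama-2-7b-chat.ggmlv3.q8_0.bin",  # the quantized model from the video
    model_type="llama",
    config={
        "context_length": 2048,  # must cover prompt + retrieved chunks + answer
        "max_new_tokens": 512,
    },
)

# And/or reduce the chunk size in the splitter used when building the FAISS
# index, e.g. in the ingestion script:
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=30)
```

Either change alone can be enough; smaller chunks also need the vector store to be rebuilt.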
A suggestion, sir, if your code is not working properly: where you have user_session.set("chain"), you need user_session.get("chain") when retrieving it. You also get an empty context if you didn't put {context} in curly brackets in the prompt template. I am also getting an issue with Message:
the bot response gives 'Message' object has no attribute 'replace'. Please suggest where I made a mistake.
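Putting the fixes mentioned in the comments together, the corrected Chainlit handler could look like this (a sketch; the decorator and callback names follow the Chainlit version used in the video). The 'Message' object error comes from passing the whole `message` object instead of `message.content`:

```python
import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    chain = cl.user_session.get("chain")  # was: cl.user_session.set("chain")
    cb = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True,
        answer_prefix_tokens=["FINAL", "ANSWER"],
    )
    # was: await chain.acall(message, callables=[cb]) -- pass the text and
    # use the `callbacks` keyword.
    res = await chain.acall(message.content, callbacks=[cb])
    await cl.Message(content=res["result"]).send()
```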
Hi sir, I have tried this program, but when I run the Chainlit application, the server gets disconnected 30 seconds after a query. I have searched but couldn't figure out the reason.
Thank you so much for the video; I genuinely learned something from this hour. Just one question: for GPU, do we just change 'cpu' to 'gpu', or does any other package need updating? Once again, great video.
Not much of a change: use the CUDA kernels instead of the CPU, with a couple of code changes of course. You can also use the original (unquantized) model for better performance.