LLaMA2 with LangChain - Basics | LangChain TUTORIAL
- Published Jul 1, 2024
Colab: drp.li/KITmw
Meta website: ai.meta.com/resources/models-...
HuggingFace models: huggingface.co/meta-llama
HF Spaces: huggingface.co/spaces/ysharma...
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials
Timestamps:
00:00 Intro
04:47 Translation English to French
05:40 Summarization of an Article
07:08 Simple Chatbot
10:38 Chatbot using LLaMA-13B Model
Thank you so much for taking the time to explain the LangChain concepts with the Llama model. It was very helpful and I appreciate your effort in making this information accessible :)
Sam, thanks for doing these videos. It really helps get proof-of-concept work out much more quickly. This technology will change the world, as you are aware. Thanks for your help and guidance ❤.
From one nerd to another.
This (and the earlier post on prompt tricks) are awesome and helpful. Thank you Sam for putting them up so quickly!
Especially on the system prompt part. I tried a few things and found it had an impact. What you showed is much more comprehensive!
Hi Sam, thank you very much for that video. Very nice explained and I am looking forward to see more stuff like that kind of video. I really like that you are showing us how to use everything locally.
Excellent video. Good to find a LLaMa video that actually has real technical content and not "just sign up for this cloud service"
Great video - Thank you!
Thanks for the great video, feels like Llama2 is a huge leap forward for open source models.
Super helpful, thank you. Would like to see this same example using a local model.
This video is extremely clear and easy to understand, thanks for sharing. Also, the token part is a bit of a shady move, but hey, if it works it's OK for now.
Awesome!!
This was a big help 😊
Thank you so much! Just what I've been waiting for! Would be great to have an example of running it locally with CUDA. I've been sitting here for two days trying to get it running...
Super helpful! Would also love to see how we can construct agents with multiple tools with this model, I was getting some incoherent results when I tried.
Awesome videos Sam. Can you please make an instructional video for custom dataset including conversational chain?
Very good as usual.
Thank you so much for this tutorial. It seems that now everything is shifting towards using LLMs for building chatbots. My only concern is inference time, they seem to be extremely slow. Have you tried any techniques to reduce this time (such as quantized models, libraries that implement transformers in C or C++, etc.)?
If so, can you recommend any?
Thanks very much for the video, Sam. This is very valuable. The 4-bit one should work on a local PC. Looking forward to your next few videos.
Hi Sam, what's the difference between TheBloke's Llama2-7b model and Meta's? The 4-bit one from TheBloke seems to be about 3GB+, while Meta's 7B version is about 10GB. Will you kindly shed some light on this in a future video too? Thank you.
Thank you, great video.
Thanks for showing the 7B model with LangChain. The results are quite good. I read that there is already a 7B model with 32k context. Would be nice to show whether it can really summarize a big chunk of text of 20k or more...
Hi Sam, thanks a ton for this. I tried integrating this with ConversationalRetrievalChain using your earlier video examples, even though they were using RetrievalQA (using wrap_text_preserve_newlines and process_llm_response functions to get sources), but it didn't work. Could you tackle this in another video? Thanks
Thanks for all Sam
Why didn't you put <s> at the beginning of the prompt?
The tokenizer should add this automatically.
Could you make a video about API portion? Great content.
thanks, please make a video on how to fine tune and deploy?
Hey Sam, this is super helpful. Do you have any reference or any more info about using LangChain with LLaMA2 in AWS SageMaker? If yes, then please share. Thanks in advance.
You should make a video on integrating vLLM with LangChain retrieval.
You tested so many models. What is the best and closest model to ChatGPT, in your opinion?
Thanks.
When do you think you will have a video on agents/tools ?
I think this video is great; thank you for putting in the effort. Is there any chance you'll make a video about DemoGPT?
Very nice and helpful tutorial, thanks a lot! Would it be easy to generalize the summarization process to very long documents, where it may not be feasible to pass in, say, an entire PDF file as raw text?
In cases like that you would use map_reduce rather than stuff. I made some vids about summarization a few months back; I think they are mostly still valid, but LangChain may have slightly changed the syntax etc.
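To illustrate the idea, here is a toy sketch of the map_reduce pattern (not LangChain's actual implementation): split the long document into chunks, summarize each chunk independently (the map step), then summarize the combined partial summaries (the reduce step). The summarize function below is a stand-in for an LLM call.

```python
# Toy sketch of map_reduce summarization. In LangChain the per-chunk and
# combine steps would each be an LLM call; here summarize() is a placeholder.

def split_into_chunks(text, chunk_size):
    """Split text into pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize(text):
    # Placeholder for an LLM call; here we just keep the first sentence.
    return text.split(".")[0].strip() + "."

def map_reduce_summarize(text, chunk_size=200):
    chunks = split_into_chunks(text, chunk_size)
    partial_summaries = [summarize(c) for c in chunks]   # map step
    return summarize(" ".join(partial_summaries))        # reduce step
```

In LangChain itself, the equivalent is roughly passing chain_type="map_reduce" instead of "stuff" to load_summarize_chain, so each chunk fits inside the model's context window.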
Hi Sam, thanks for a great video! A request, if you can also help with fine tuning the llama 2 model with our custom dataset from a PDF or any other document source, that would help a lot. I know langchain has a loader function for this purpose but most of the resources on the internet show the integration of OpenAI with langchain and not llama 2.
Agreed
Is there any way that I can just use my custom JSON from my own Postgres database without using HuggingFace?
Thanks for the video! You made a subscriber out of me :)
I want to use this with an M1 MacBook Pro. Do you recommend loading it onto the CPU, or would it work with the M1 chip?
It can work with the 4-bit version of the model; the full-size models are too big.
Thank you Sam for this informative and well-explained video. I am wondering how we can utilize LangChain with a PERT fine-tuned pretrained model. Do you think there is a trade-off between fine-tuning techniques like PERT and LangChain? If you can answer with a clear, short explanation, I would appreciate it :)
I am guessing you mean PeFT or do you mean BERT? I am making a set of vids about Fine-Tuning these models soon.
@@samwitteveenai I was referring to PeFT, sorry
I ran this on a T4 but the memory is nearly full, so the 7B can run some short prompts. I got an OOM error when running the text summarization (940 words).
"CUDA out of memory. Tried to allocate 254.00 MiB (GPU 0; 14.75 GiB total capacity; 13.27 GiB already allocated; 210.81 MiB free; 13.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
Try running it in 8-bit or 4-bit and it should be fine for the 7B.
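For reference, a minimal sketch of what 4-bit loading could look like with transformers and bitsandbytes. This assumes a recent transformers with BitsAndBytesConfig, that bitsandbytes and accelerate are installed, and that you have access to the gated meta-llama repo; the model_id is just the 7B chat checkpoint used as an example.

```python
# Sketch: loading LLaMA-2 7B chat in 4-bit so it fits on a T4 (~15 GB VRAM).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on whatever GPU(s) are available
)
```

For 8-bit instead, use load_in_8bit=True in the BitsAndBytesConfig.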
@@samwitteveenai I added a streamer to your notebook. It is working!
from transformers import pipeline, TextStreamer
import torch

# Stream generated tokens to stdout as they are produced, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_new_tokens=512,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id,
)
Can you tell me how long the context length is for this model? Is it 32K or more?
Not bad. Please show us how to combine the trained data + memory with a conversation chain. That would be awesome!
Trained data meaning? Vectorstores?
@@samwitteveenai Yes, exactly. Prompt + memory/chat history + vectorstore embeddings + internal knowledge from the selected LLM.
What happens in this instance if the conversation goes above the length of the context? Is this code smart enough to truncate the oldest prompts and responses?
LangChain has something called ConversationBufferWindowMemory, which keeps the last k interactions. I found that useful for managing tokens. You would also want to put in a token counter and add logic to handle token length. Then, after that, implement Redis or something semantic for medium/long-term memory.
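The windowing idea can be sketched in a few lines of plain Python. This is only an illustration of what ConversationBufferWindowMemory does conceptually; the class and method names below are made up for the sketch, not LangChain's actual API.

```python
# Illustrative sliding-window chat memory: store every exchange, but only
# feed the most recent k exchanges back into the prompt, so old turns
# stop consuming context-window tokens.

class WindowMemory:
    def __init__(self, k):
        self.k = k           # number of (human, ai) exchanges to keep in the prompt
        self.exchanges = []  # full history, in case you want it elsewhere

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def load(self):
        """Format only the last k exchanges for inclusion in the prompt."""
        recent = self.exchanges[-self.k:]
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in recent)
```

With k=2, after five turns only the two most recent exchanges appear in the prompt; everything earlier is silently dropped, which is exactly the trade-off you manage with a token counter or longer-term store.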
Hey Sam, this Colab works great, only problem is, every time I run the cells to do the prediction it allocates more memory and never lets it go, so I get CUDA out of memory errors pretty fast. How do you release memory without restarting the whole server instance?
Are you using the T4? it could be that it is so close to the max GPU VRAM that it is crashing before getting released. I am not sure will try to check it out.
@@samwitteveenai never mind, it was a memory leak in my own code 😆
@@Ascended23 hey, i am facing the same issue. How did you identify memory leak? Can you share your solution?
Thanks for sharing. Did you use chatbots in the comment section?
Does this contain fine-tuning with LangChain?
LangChain doesn't do fine-tuning; it uses fine-tuned models.
Hi, can you please make a PPT Q&A bot using LangChain and LLaMA 2 7B?
I have done a few RetrievalQA RAG vids with LLaMA 2. Check those out.