LLaMA2 with LangChain - Basics | LangChain TUTORIAL
- Published Jul 1, 2024
Colab: drp.li/KITmw
Meta website: ai.meta.com/resources/models-...
HuggingFace models: huggingface.co/meta-llama
HF Spaces: huggingface.co/spaces/ysharma...
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials
Timestamps:
00:00 Intro
04:47 Translation English to French
05:40 Summarization of an Article
07:08 Simple Chatbot
10:38 Chatbot using LLaMA-13B Model
Thank you so much for taking the time to explain the LangChain concepts with the Llama model. It was very helpful and I appreciate your effort in making this information accessible :)
Sam, thanks for doing these videos. It really helps get proof-of-concept work out much more quickly. This technology will change the world, as you are aware. Thanks for your help and guidance ❤.
From one nerd to another.
This (and the earlier post on prompt tricks) are awesome and helpful. Thank you Sam for putting them up so quickly!
Especially on the system prompt part. I tried a few things and found it had an impact. What you showed is much more comprehensive!
Hi Sam, thank you very much for that video. Very nice explained and I am looking forward to see more stuff like that kind of video. I really like that you are showing us how to use everything locally.
Excellent video. Good to find a LLaMa video that actually has real technical content and not "just sign up for this cloud service"
Great video - Thank you!
Thanks for the great video, feels like Llama2 is a huge leap forward for open source models.
Super helpful, thank you. Would like to see this same example using a local model.
This video is extremely clear and easy to understand, thanks for sharing. Also, the token part is a bit of a shady move, but hey, if it works it's OK for now.
Awesome!!
This was a big help 😊
Thank you so much! Just what I've been waiting for! Would be great to have an example of running it locally with CUDA. I've been sitting here for two days trying to get it running...
Super helpful! Would also love to see how we can construct agents with multiple tools with this model, I was getting some incoherent results when I tried.
Awesome videos Sam. Can you please make an instructional video for custom dataset including conversational chain?
Very good as usual.
Thank you so much for this tutorial. It seems that now everything is shifting towards using LLMs for building chatbots. My only concern is inference time, they seem to be extremely slow. Have you tried any techniques to reduce this time (such as quantized models, libraries that implement transformers in C or C++, etc.)?
If so, can you recommend any?
Thanks very much for the video, Sam. This is very valuable. The 4-bit one should work on a local PC. Looking forward to your next few videos.
Hi Sam, what's the difference between TheBloke's Llama2-7b model and Meta's? The 4-bit one from TheBloke seems to be about 3GB+, while Meta's 7B version is about 10GB. Will you kindly shed some light on this in a future video too? Thank you.
Thank you, great video.
Thanks for showing the 7B model with LangChain. The results are quite good. I read that there is already a 7B model with 32k context. Would be nice to show whether it can really summarize a big chunk of text of 20k or more...
Hi Sam, thanks a ton for this. I tried integrating this with ConversationalRetrievalChain using your earlier video examples, even though they were using RetrievalQA (using wrap_text_preserve_newlines and process_llm_response functions to get sources), but it didn't work. Could you tackle this in another video? Thanks
Thanks for all Sam
Why didn't you put <s> at the beginning of the prompt?
The tokenizer should add this automatically.
Could you make a video about API portion? Great content.
thanks, please make a video on how to fine tune and deploy?
Hey Sam, this is super helpful. Do you have any reference or any more info about using LangChain with LLaMA2 in AWS SageMaker? If yes, then please share. Thanks in advance.
You should make a video on integrating vLLM with LangChain retrieval.
You tested so many models. What is the best and closest model to ChatGPT, in your opinion?
Thanks.
When do you think you will have a video on agents/tools ?
I think this video is great; thank you for putting in the effort. Is there any chance you'll make a video about DemoGPT?
Very nice and helpful tutorial, thanks a lot! Would it be easy to generalize the summarization process to very long documents, where it may not be feasible to pass in, say, an entire PDF file as raw text?
In cases like that you would use map_reduce rather than stuff. I made some vids about summarization a few months back; I think they are mostly still valid, but LangChain may have slightly changed the syntax etc.
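To illustrate the idea, here is a toy sketch of the map_reduce pattern (not LangChain's actual implementation): split the long document into chunks, summarize each chunk independently (the map step), then summarize the combined partial summaries (the reduce step). The summarize function below is a stand-in for an LLM call.

```python
# Toy sketch of map_reduce summarization. In LangChain the per-chunk and
# combine steps would each be an LLM call; here summarize() is a placeholder.

def split_into_chunks(text, chunk_size):
    """Split text into pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize(text):
    # Placeholder for an LLM call; here we just keep the first sentence.
    return text.split(".")[0].strip() + "."

def map_reduce_summarize(text, chunk_size=200):
    chunks = split_into_chunks(text, chunk_size)
    partial_summaries = [summarize(c) for c in chunks]   # map step
    return summarize(" ".join(partial_summaries))        # reduce step
```

In LangChain itself, the equivalent is roughly passing chain_type="map_reduce" instead of "stuff" to load_summarize_chain, so each chunk fits inside the model's context window.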
Hi Sam, thanks for a great video! A request, if you can also help with fine tuning the llama 2 model with our custom dataset from a PDF or any other document source, that would help a lot. I know langchain has a loader function for this purpose but most of the resources on the internet show the integration of OpenAI with langchain and not llama 2.
Agreed
Is there any way that I can just use my custom JSON from my own Postgres database without using HuggingFace?
Thanks for the video! You made a subscriber out of me :)
I want to use this with an M1 MacBook Pro. Do you recommend loading it onto the CPU, or would it work with the M1 chip?
It can work with the 4-bit version of the model; the full-size models are too big.
Thank you Sam for this informative and well-explained video. I am wondering how we can utilize LangChain with a PERT fine-tuned pretrained model. Do you think there is a trade-off between fine-tuning techniques like PERT and LangChain? If you can answer with a clear, short explanation, I would appreciate it :)
I am guessing you mean PeFT or do you mean BERT? I am making a set of vids about Fine-Tuning these models soon.
@@samwitteveenai I was referring to PeFT, sorry
I ran this on a T4 but the memory is nearly full, so the 7B can run some short prompts. I got an OOM error when running the text summarization (940 words).
"CUDA out of memory. Tried to allocate 254.00 MiB (GPU 0; 14.75 GiB total capacity; 13.27 GiB already allocated; 210.81 MiB free; 13.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
Try running it in 8-bit or 4-bit and it should be fine for the 7B.
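For reference, a minimal sketch of what 4-bit loading could look like with transformers and bitsandbytes. This assumes a recent transformers with BitsAndBytesConfig, that bitsandbytes and accelerate are installed, and that you have access to the gated meta-llama repo; the model_id is just the 7B chat checkpoint used as an example.

```python
# Sketch: loading LLaMA-2 7B chat in 4-bit so it fits on a T4 (~15 GB VRAM).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on whatever GPU(s) are available
)
```

For 8-bit instead, use load_in_8bit=True in the BitsAndBytesConfig.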
@@samwitteveenai I added a streamer to your notebook. It is working!
from transformers import pipeline, TextStreamer
import torch

# Stream generated tokens to stdout as they are produced, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_new_tokens=512,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id,
)
Can you tell me how long the context length is for this model? Is it 32K or more?
Not bad. Please show us how to combine the trained data + memory with a conversation chain. That would be awesome!
Trained data meaning? Vectorstores?
@@samwitteveenai Yes, exactly. Prompt + memory/chat history + vectorstore embeddings + internal knowledge from the selected LLM.
What happens in this instance if the conversation goes above the length of the context? Is this code smart enough to truncate the oldest prompts and responses?
LangChain has something called ConversationBufferWindowMemory, which keeps the last k interactions. I found that useful for managing tokens. You would also want to put in a token counter and add logic to handle token length. Then, after that, implement Redis or something semantic for medium/long-term memory.
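The windowing idea can be sketched in a few lines of plain Python. This is only an illustration of what ConversationBufferWindowMemory does conceptually; the class and method names below are made up for the sketch, not LangChain's actual API.

```python
# Illustrative sliding-window chat memory: store every exchange, but only
# feed the most recent k exchanges back into the prompt, so old turns
# stop consuming context-window tokens.

class WindowMemory:
    def __init__(self, k):
        self.k = k           # number of (human, ai) exchanges to keep in the prompt
        self.exchanges = []  # full history, in case you want it elsewhere

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def load(self):
        """Format only the last k exchanges for inclusion in the prompt."""
        recent = self.exchanges[-self.k:]
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in recent)
```

With k=2, after five turns only the two most recent exchanges appear in the prompt; everything earlier is silently dropped, which is exactly the trade-off you manage with a token counter or longer-term store.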
Hey Sam, this Colab works great, only problem is, every time I run the cells to do the prediction it allocates more memory and never lets it go, so I get CUDA out of memory errors pretty fast. How do you release memory without restarting the whole server instance?
Are you using the T4? it could be that it is so close to the max GPU VRAM that it is crashing before getting released. I am not sure will try to check it out.
@@samwitteveenai never mind, it was a memory leak in my own code 😆
@@Ascended23 hey, i am facing the same issue. How did you identify memory leak? Can you share your solution?
Thanks for sharing. Did you use chatbots in the comment section?
Does this contain fine-tuning with LangChain?
LangChain doesn't do fine-tuning; it uses fine-tuned models.
Hi, can you please make a PPT Q&A bot using LangChain and LLaMA 2 7B?
I have done a few RetrievalQA RAG vids with LLaMA 2. Check those out.