LangChain - Using Hugging Face Models locally (code walkthrough)

  • Published 7 Mar 2023
  • Colab Code Notebook: drp.li/m1mbM
    Load Hugging Face models locally so that you can use models you can't use via the API endpoints. This video shows you how to use the endpoints, how to load the models locally (and access models that don't work in the endpoints), and how to load the embedding models locally. (A minimal sketch of the local setup follows below.)
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/samwit/langchain-t...
    github.com/samwit/llm-tutorials
  • Science & Technology
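
The gist of the notebook, as a minimal sketch (this assumes the classic langchain/transformers package layout from early 2023, and uses google/flan-t5-small purely as a small example model):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    from langchain.llms import HuggingFacePipeline
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    model_id = "google/flan-t5-small"  # small enough to run on CPU
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Wrap a transformers pipeline so LangChain can drive it like any other LLM
    pipe = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=100,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)

    prompt = PromptTemplate(
        input_variables=["question"],
        template="Question: {question}\n\nAnswer:",
    )
    chain = LLMChain(prompt=prompt, llm=local_llm)
    print(chain.run("What is the capital of France?"))

    # Embedding models load locally in a similar way
    from langchain.embeddings import HuggingFaceEmbeddings
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")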

COMMENTS • 83

  • @insightbuilder
    @insightbuilder 1 year ago +13

    Keep up the great work. And thanks for curating the important HF models that we can use as alternatives to paid LLMs. When learning new tech, using free LLMs gives the learner a lot of benefits.

  • @bandui4021
    @bandui4021 1 year ago +3

    Thank you! I am a newbie in this area and your vids are helping me a lot to get a better picture of the current landscape.

  • @prestigious5s23
    @prestigious5s23 10 months ago

    Great tutorial. I need to train a model on some private company documents that aren't publicly released yet and this looks like it could be a big help to me. Subbed!!

  • @steev3d
    @steev3d 1 year ago

    Nice video. I'm trying to connect an LLM and use Unity 3D as my interface for STT and TTS with 3D characters. I just found a tool that enables a connection to an LLM on Hugging Face, which is how I discovered that you need a paid endpoint with GPU support to even run most of them. I kinda wish I had found this video when you posted it. Very useful info.

  • @luis96xd
    @luis96xd 1 year ago

    Amazing video, everything was well explained, I needed it, thank you so much!

  • @tushaar9027
    @tushaar9027 7 months ago

    Great video, Sam. I don't know how I missed this.

  • @sakshikumar7679
    @sakshikumar7679 1 month ago

    Saved me from hours of debugging and research! Thanks a ton.

  • @AdrienSales
    @AdrienSales 1 year ago

    Excellent tutorial, and so well explained. Thanks a lot.

  • @hnikenna
    @hnikenna 1 year ago

    Thanks for this video. You just earned a subscriber

  • @Chris-se3nc
    @Chris-se3nc 1 year ago +2

    Thanks for the video. Is there any way to get an example using the LangChain JavaScript library? I am new to this area, and I think many developers have a Node rather than a Python background.

  • @azzeddine1
    @azzeddine1 1 year ago

    How can the ready-made projects on the platform be linked to Blogger blogs? I have spent long days searching, to no avail.

  • @markomilenkovic2714
    @markomilenkovic2714 10 months ago

    If we cannot afford to get an A100, what cheaper option would you recommend to run these? I understand the models also differ in size. Thanks Sam.

  • @venkatesanr9455
    @venkatesanr9455 1 year ago

    Thanks for the valuable and highly informative series. Can you provide some discussion of in-context learning (providing context/query), reasoning, and chain of thought?

    • @samwitteveenai
      @samwitteveenai  1 year ago +2

      Hi, glad it is helpful. I am thinking about doing some vids on Chain of Thought prompting, Self-Consistency, and PAL, going through the basics of the papers and then looking at how they work in practice with an LLM. I will include the basics of in-context learning as well. Let me know if there are any others you think I should cover.

  • @yves1893
    @yves1893 1 year ago

    I am using the Hugging Face model chavinlo/alpaca-native.
    However, when I use those embeddings with this model:

    from transformers import pipeline
    from langchain.llms import HuggingFacePipeline

    # model and tokenizer are assumed to be loaded earlier via .from_pretrained(...)
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=248,  # note: max_length counts the prompt tokens too
        temperature=0.4,
        top_p=0.95,
        repetition_penalty=1.2,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)

    My output is always only 1 word long. Can anyone explain this?

  • @binitapriya4976
    @binitapriya4976 8 months ago

    Hi Sam, is there any way to generate question-answer pairs from text in a .txt file and save those questions and answers to another .txt file with the help of a free Hugging Face model?

  • @DanielWeikert
    @DanielWeikert 1 year ago

    I tried to store the YouTube loader's documents in FAISS using HuggingFace embeddings, but the LLM was not able to do the similarity search. Colab finally ran into a timeout.
    Can you share how to do this instead of using OpenAI? With OpenAI I had no issues, but I'd like to do it with HF models instead, e.g. Flan.
    br

  • @jzam5426
    @jzam5426 9 months ago

    Thanks for the content!! Is there a way to run a HuggingFacePipeline-loaded model using M1/M2 processors on a Mac? How would one set that up?

  • @morespinach9832
    @morespinach9832 5 months ago

    This is helpful because in some industries like banking or telcos, it's impossible to use open source things. So we need to host.

  • @megz_2387
    @megz_2387 1 year ago

    How do you fine-tune this model so that it can follow instructions on the data provided?

  • @atharvaparanjape9585
    @atharvaparanjape9585 2 months ago

    How can I load the model again some time later, once I have downloaded it to the local drive?

  • @brianrowe1152
    @brianrowe1152 1 year ago

    Stupid question, so I'll take a link to another video/docs/anything. Which Python version, CUDA version, and PyTorch version are best for this work? I see many using Python 3.9 or 3.10.6 specifically. The PyTorch site recommends 3.6/3.7/3.8 on the install page. Then the CUDA version, 11.7 or 11.8 - it looks like 11.8 is experimental? Then when I look at my nvcc output it says 11.5, but my nvidia-smi says CUDA Version 12.0... head explodes... I'm on Ubuntu 22.04. I will google some more, but if someone knows the ideal setup, or at least an "it works" setup, I'd appreciate it!!! Thank you.

  • @intelligenceservices
    @intelligenceservices 3 days ago

    Is there a way to compile a Hugging Face repo into a single safetensors file? (Compiled from a repo that has separate directories: scheduler, text_encoder, text_encoder_2, tokenizer, etc...)

  • @luis96xd
    @luis96xd 1 year ago

    I have a problem: when I use low_cpu_mem_usage or load_in_8bit,
    I get an error saying I need to install xformers.
    When I install xformers, I get an error saying I need to install accelerate.
    When I install accelerate, I get an error saying I need to install bitsandbytes.
    And so on: einops, accelerate, sentence_transformers, bitsandbytes.
    But finally, I got the error *NameError: name 'init_empty_weights' is not defined*.
    I don't know how to solve this error or why it happens - could you help me please?

  • @surajnarayanakaimal
    @surajnarayanakaimal 1 year ago +1

    Thank you for the awesome content. It would be very helpful if you made a tutorial on how to use a custom model with LangChain and embed with it. I want to train on some documentation; currently we can use OpenAI or other service APIs,
    but consuming those APIs is very costly, so can you show how to do it locally? Please consider training on a site's custom documentation, so it can answer from the documentation, be more context-aware, and also remember history.
    Currently we depend on the OpenAI APIs for that, so if it's achievable using a local model it would be very helpful.

  • @anubhavsarkar1238
    @anubhavsarkar1238 1 month ago

    Hello. Can you please make a video on how to use the SeamlessM4T Hugging Face model with LangChain? Particularly for text-to-text translation. I am trying to do some prompt engineering with the model using LangChain's LLMChain module, but it does not seem to work...

  • @MohamedNihal-rq6cz
    @MohamedNihal-rq6cz 1 year ago +1

    Hi Sam, how do you feed in your personal documents and query them so the response comes back as generative question answering rather than extractive question answering? I am a bit new to this library and I don't want to use OpenAI API keys. Please provide some guidance on using open-source LLM models. Thanks in advance!

    • @samwitteveenai
      @samwitteveenai  1 year ago

      That would require fine-tuning the model, if you want to put the facts in there. That is probably not the best way to go, though.

  • @halgurgenci4834
    @halgurgenci4834 1 year ago

    These are great videos, Sam. I am using a Mac M1, so it is impossible for me to run any model locally. I understand this is because PyTorch has not caught up with the M1 yet.

    • @samwitteveenai
      @samwitteveenai  1 year ago +2

      Actually, I think they will run. I use an M1 and M2 as well, but I run models in the cloud. I might try to get them to run on my M2 and make a video if it works.
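
For anyone who wants to try it, a minimal sketch of running a model on Apple Silicon, assuming a recent PyTorch build with the MPS backend (this is not something shown in the video):

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Use Apple's Metal (MPS) backend when available, otherwise fall back to CPU
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small").to(device)

    inputs = tokenizer("Translate English to German: How are you?", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))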

  • @magnanimist
    @magnanimist 1 year ago

    Just curious: do you need to re-download the model every time you run scripts like these? Is there a way to save the model and reuse it after it has been downloaded?

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      If you are doing this on a local machine, the model will be there - Hugging Face should save it to a local cache. You can also do model.save_pretrained('model_name').
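
To spell that out, a short sketch of the caching/saving flow (the file paths are just examples; by default the Hub client also caches downloads under ~/.cache/huggingface):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_id = "google/flan-t5-small"

    # First run: downloads the weights from the Hub (and caches them locally)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Optionally save an explicit copy to a directory of your choosing
    model.save_pretrained("./flan-t5-small-local")
    tokenizer.save_pretrained("./flan-t5-small-local")

    # Later runs: load straight from disk, no re-download needed
    model = AutoModelForSeq2SeqLM.from_pretrained("./flan-t5-small-local")
    tokenizer = AutoTokenizer.from_pretrained("./flan-t5-small-local")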

  • @srimantht8302
    @srimantht8302 Рік тому

    Awesome video! Was wondering how I could use Langchain with a custom model running on sagemaker? Is that possible?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      yeah that should be possible in a similar way.

  • @younginnovatorscenterofint8986

    Hello Sam, how do you solve:
    "Token indices sequence length is longer than the specified maximum sequence length for this model (2842 > 512). Running this sequence through the model will result in indexing errors."
    Thank you in advance.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      This is a limitation of the model, not LangChain. There are some models on HF that go up to 2048.
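
A common workaround when the input exceeds the model's context window is to chunk the text first and run each piece separately; a sketch using LangChain's text splitter (not from the video):

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    long_text = "LangChain lets you run Hugging Face models locally. " * 200

    # chunk_size is measured in characters, not tokens; a token is roughly
    # 3-4 characters, so ~1000 characters stays well inside a 512-token limit
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(long_text)

    for chunk in chunks:
        ...  # run each chunk through the model separately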

  • @induu954
    @induu954 1 year ago

    Hi, I would like to know: can we chain two models, like a classification model and a pretrained model, using LangChain?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      You could do it through a tool. Not sure there is anything built into LangChain for classification models, if you mean something like a BERT etc.
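
A sketch of that tool route - wrapping a BERT-family classification pipeline as a LangChain Tool that a chain or agent can call (a hypothetical example, not something from the video):

    from transformers import pipeline
    from langchain.agents import Tool

    # A small BERT-family sentiment classifier from the Hub
    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    sentiment_tool = Tool(
        name="sentiment-classifier",
        func=lambda text: str(classifier(text)[0]),
        description="Classifies the sentiment of a piece of text as POSITIVE or NEGATIVE.",
    )

    print(sentiment_tool.run("I really enjoyed this tutorial!"))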

  • @stonez56
    @stonez56 3 months ago

    Please make a video on how to convert safetensors to GGUF format, or a format that can be used with Ollama? Thanks for these great AI videos!

  • @botondvasvari5758
    @botondvasvari5758 2 months ago

    And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, and some are 130 GB+. Any thoughts?

  • @human_agi
    @human_agi 1 year ago

    What kind of Colab do you need? I am using the $10 version with high RAM and a GPU on, and still cannot run it: ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run `.from_pretrained` with `device_map='auto'`

    • @samwitteveenai
      @samwitteveenai  1 year ago

      If you don't have access to a bigger GPU, then go with a smaller T5 model etc.

    • @rudy.d
      @rudy.d 1 year ago

      I think you just need to add the argument device_map='auto' to the same list of arguments in your model's "*LM.from_pretrained(xxxx)" call, where you have "load_in_8bit=True" - see the sketch below.
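
Putting that suggestion together, roughly (a sketch; flan-t5-xl is just an example model):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # load_in_8bit needs the accelerate and bitsandbytes packages installed,
    # plus a device map so layers can be placed across the available hardware
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "google/flan-t5-xl",
        device_map="auto",   # let accelerate spread layers over GPU/CPU
        load_in_8bit=True,   # 8-bit weights roughly halve memory vs fp16
    )
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")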

  • @evanshlom1
    @evanshlom1 10 months ago

    U a legend

  • @alexandremarhic5526
    @alexandremarhic5526 1 year ago +1

    Thanks for the work. Just to let you know, the Loire Valley is in the north of France ;)

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Good for wine? :D

    • @alexandremarhic5526
      @alexandremarhic5526 1 year ago +1

      @@samwitteveenai Depends on your taste. If you love sweet wine, the south is better, especially for white wine like "Jurançon".

  • @DarrenTarmey
    @DarrenTarmey 1 year ago

    It would be nice to have someone do a review for newbies, as there is so much to learn and it's hard to know where to start.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      What exactly would you like me to cover? Any questions, I am happy to make more vids etc.

  • @computadorhumano949
    @computadorhumano949 1 year ago

    Hey, why does it take so long to respond? Does this need my CPU to be fast?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Yeah, for the local stuff you really need a GPU rather than a CPU.

  • @hiramcoriarodriguez1252
    @hiramcoriarodriguez1252 1 year ago

    I'm a transformers user and I still don't get the point of learning this new library. Is it just for very specific use cases?

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      Think of it as an abstraction layer for prompting and for managing the user interactions with your LLM. It is not an LLM in itself.

    • @hiramcoriarodriguez1252
      @hiramcoriarodriguez1252 1 year ago

      @@samwitteveenai I know it's not an LLM; the biggest problem I see is learning a new library that wraps the OpenAI and Hugging Face libraries just to save 3 or 5 lines of code. I will follow your work; maybe that will change my mind.

    • @insightbuilder
      @insightbuilder 1 year ago +3

      Consider Transformers as the first layer of abstraction over the neural nets that make up the LLMs. To interface with LLMs, we can use many libraries, including HF's. The HF Hub / LangChain is the second layer. The USP of LangChain is the ecosystem built around it, especially the Agents and Utility Chains.
      This ecosystem lets LLMs connect with the outside world... The devs at LC have done a great job.
      Do learn it, and share these absolutely brilliant vids with your friends/team members etc.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Great way of describing it, @Kamalraj M M.

    • @neilzedd8777
      @neilzedd8777 1 year ago

      @@insightbuilder Beyond impressed with how healthy their documentation is. Working on a flan-ul2 + LC app right now, very fun times.

  • @daryladhityahenry
    @daryladhityahenry 1 year ago

    Hi! Did you find a way to load the Vicuna GPTQ version using this? I tried your approach with GPT-Neo 125M and it works, but not with Vicuna GPTQ. Thank you!

  • @fintech1378
    @fintech1378 9 months ago

    How do you build a Telegram chatbot with this?

  • @SomuNayakVlogs
    @SomuNayakVlogs 1 year ago +1

    Can you create one for CSV as input?

    • @samwitteveenai
      @samwitteveenai  1 year ago

      I made another video on using CSVs with LangChain - check that out.

    • @SomuNayakVlogs
      @SomuNayakVlogs 1 year ago

      @@samwitteveenai Thanks Sam, I already watched that video, but it uses OpenAI; I wanted LangChain with CSV and Hugging Face.

    • @SomuNayakVlogs
      @SomuNayakVlogs 1 year ago

      Can you please help me with that?

    • @Marvaniamehul
      @Marvaniamehul 1 year ago

      I am also curious whether we can use a Hugging Face pipeline (run locally) with LangChain to load a CSV file.

  • @SD-rg5mj
    @SD-rg5mj 1 year ago

    Hello, and thank you very much for this video.
    On the other hand, I am not sure I understood everything - I speak English badly; I am French.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      Did you try the French subtitles? I upload English subtitles, so I hope YouTube does a decent job translating them. Also feel free to ask any questions if you are not sure.

  • @XiOh
    @XiOh 10 months ago +2

    u are not doing it locally in this video.....

    • @samwitteveenai
      @samwitteveenai  10 months ago

      The LLMs are running locally on the machine where the code is running. The first bit shows pinging the API as a comparison.

  • @KittenisKitten
    @KittenisKitten 1 year ago +2

    It would be useful if you explained what program you're using, or what page you're looking at; it seems like a waste of time if you don't know anything about the programs or what you're doing. 1/5

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      The Colab is linked in the description; it's all there to use.

  • @mrsupremegascon
    @mrsupremegascon 9 months ago

    Ok, great tutorial, but as a Frenchman from Bordeaux, I am deeply disappointed by Google's answer about the best area to grow wine.
    Loire Valley? Seriously???? Name one great wine coming from the Loire, Google, I dare you.
    They are in the B league at best.
    The answer is obviously Bordeaux; I would maybe have accepted Agen (wrong) or even Bourg*gne (very, very wrong).
    But the Loire? It's outrageous, and this answer made me certain that I will never use this cursed model.

    • @samwitteveenai
      @samwitteveenai  9 months ago +1

      lol well at least you live in a very nice area of the world.

  • @nemtii_
    @nemtii_ 1 year ago

    What always happens with this LangChain + HuggingFaceHub setup is that it only increments by 80 characters on each call. Is anyone else having this problem? I tried max_length: 400 and still have the same issue.

    • @nemtii_
      @nemtii_ 1 year ago

      It's not specific to LangChain; I used the client directly and am still getting the same issue.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      I think this could be an issue with their API. Perhaps on the Pro/paid version they allow more? I am not sure; to be honest I don't use their API, I tend to load the models etc.
      You could also try the max_new_tokens setting rather than max_length; that could help.

    • @nemtii_
      @nemtii_ 1 year ago

      @@samwitteveenai Wow! Thank you!! It worked with max_new_tokens.

    • @nemtii_
      @nemtii_ 1 year ago +1

      @@samwitteveenai I wish someone would make a list mapping which model sizes run on free Google Colab versus the paid Colab, to see whether it's worth paying and what you can experiment with within each tier. I'm kind of lost in that sense - at a stage where I just want to evaluate models myself, and look at a production env later.

    • @samwitteveenai
      @samwitteveenai  1 year ago

      This would be good, I agree.
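
A sketch of the fix that worked here, per Sam's max_new_tokens suggestion (the repo and parameter values are illustrative; HuggingFaceHub also expects a Hub API token):

    import os
    from langchain.llms import HuggingFaceHub

    os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # your Hub token

    # max_length counts the prompt tokens too, so completions can come back short;
    # max_new_tokens bounds only the generated continuation
    llm = HuggingFaceHub(
        repo_id="google/flan-t5-xxl",
        model_kwargs={"temperature": 0.7, "max_new_tokens": 200},
    )
    print(llm("Explain in one paragraph what LangChain does."))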

  • @ELECOEST
    @ELECOEST 6 months ago

    Hello, thanks for your video. For now it's:

    llm_chain = LLMChain(
        prompt=prompt,
        llm=HuggingFaceHub(
            repo_id="google/flan-t5-xxl",
            model_kwargs={"temperature": 0.9, "max_length": 64},
        ),
    )

    The temperature must be > 0, and the model is flan-t5-xxl.

  • @litttlemooncream5049
    @litttlemooncream5049 5 months ago

    Thx! Helped a lot! But I'm stuck at loading the model... it says google/flan-t5-xl is too large to be loaded automatically (11GB > 10GB)... qaq

    • @samwitteveenai
      @samwitteveenai  5 months ago

      Try a smaller model if your GPU isn't big enough - google/flan-t5-small or something like that.
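
For reference, a minimal sketch of swapping in the smaller model via LangChain's convenience constructor (assuming the classic langchain package layout; the prompt is just an example):

    from langchain.llms import HuggingFacePipeline

    # flan-t5-small is only ~300 MB of weights, so it fits on CPU or a modest GPU
    local_llm = HuggingFacePipeline.from_model_id(
        model_id="google/flan-t5-small",
        task="text2text-generation",
        model_kwargs={"max_length": 100},
    )
    print(local_llm("What would be a good name for a company that makes colorful socks?"))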