LangChain + Retrieval QA with Local LLMs - No OpenAI!!!


COMMENTS • 207

  • @sharifhamza25
    @sharifhamza25 Рік тому +98

    Finally a tutorial without open ai. Open Source rules. Perfect.

  • @srikarpamidi1946
    @srikarpamidi1946 Рік тому +47

    I’ve been following along since your first video and I can say that this is the best zero-to-hero course I’ve tried. Seriously thank you for making us feel like we’re on the cutting edge with you. There’s so much to learn in this field, and every day 100 new things are happening. Thank you for making it easy to follow and code along.

  • @MichaelDude12345
    @MichaelDude12345 Рік тому +6

    Couldn't wait for it! This is the tech I have been waiting for! This is how LLM's can be powerful for us as developers even without OpenAI or any other big companies! I am here for it.

  • @redfield126
    @redfield126 Рік тому +4

    I started from your previous tutorial and spent hours switching it over to a local model (I am a noob 😅). Anyway, this is perfect timing; I can use this video to check my homework. Eager to dive into it as soon as I leave the office!

    • @redfield126
      @redfield126 Рік тому

      It was not me. I got similar weird output with other models. Trade off is the right conclusion. It is really hard to get rid of open ai or other giant models so far. Anyway. Let’s keep optimistic as huge progress is being made on a daily basis in the open source galaxy. Thanks again for the perfect content I was searching for ! Let’s keep testing and exploring as you say

  • @alessandrorossi1294
    @alessandrorossi1294 Рік тому +4

    Amazing, I've literally been spending the past day trying to learn exactly this and you boiled it down nicely!

  • @KA-kp1me
    @KA-kp1me Рік тому +9

    Decrease the chunk size when splitting the text, so that when the retriever pulls, say, 3 chunks out of the DB, those will be smaller and will fit into the context ;-) Awesome content!
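    A minimal sketch of that tip, assuming the ingestion flow from the video (the splitter, the chunk sizes, and the `documents`/`vectordb` variables here are illustrative, not the notebook's exact values):

    ```python
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    # Smaller chunks mean the 3 retrieved chunks are more likely to fit the LLM's context window.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,    # characters per chunk (down from e.g. 1000)
        chunk_overlap=50,  # small overlap so ideas aren't cut mid-sentence
    )
    texts = text_splitter.split_documents(documents)  # `documents` loaded earlier

    # The retriever still pulls 3 chunks, but each chunk is now smaller.
    retriever = vectordb.as_retriever(search_kwargs={"k": 3})
    ```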

  • @disarmyouwitha
    @disarmyouwitha Рік тому

    Thank you so very much for this! I have been wanting to dip my toes in with LangChain and embeddings -- this was very helpful and made it feel a lot more accessible.

  • @microgamawave
    @microgamawave Рік тому +3

    Please make another video with other models; I would be happy to see bigger models, or the 13B ones. Love your videos❤

  • @autonomousreviews2521
    @autonomousreviews2521 Рік тому

    Thank you so much for this video! Been waiting for this :) Thumbs up before I even watch. Hoping for more on this general local llm vein, especially as new models are almost constantly coming out!

  • @tejaswi1995
    @tejaswi1995 Рік тому

    Thank you so much for this video Sam. Found the video at the right time!

  • @ozzykampha2776
    @ozzykampha2776 Рік тому +3

    Can you test the mpt-7B Model?

  • @MartinStrnad1
    @MartinStrnad1 Рік тому +1

    This is amazing stuff, I am learning a lot from the colabs! For my use case, it would be amazing to see work with larger models too (databricks Dolly comes to mind)

  • @MrOldz67
    @MrOldz67 Рік тому

    Just want to add a comment to congratulate you: you're doing a great job for the community, and I personally learn much more watching your videos than from spending hours digging up information and doing my own testing on git. Thanks for this. I will give it a try with a Bloom model and report here so you guys know.

    • @MrOldz67
      @MrOldz67 Рік тому

      Seems like the text2text-generation pipeline doesn't support Bloom. Do you have a list of supported models for this purpose?

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      Thanks for the kind words. Bloom is a decoder-only model, so you will just use one of the notebooks that has 'text-generation' in the pipeline, not 'text2text-generation'.

    • @MrOldz67
      @MrOldz67 Рік тому

      @@samwitteveenai Hey Sam, thanks for the answer.
      Does that mean that I only have to change this setting to make a decoder-only model work?
      What will I lose by using text-generation instead of text2text-generation?
      Or is it better to use an encoder-decoder model?
      Thanks for the answer!

    • @samwitteveenai
      @samwitteveenai  Рік тому

      better to look for one of my notebooks with the text-generation examples in there as there are other things you need to change in how you load the model too.
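      To illustrate the distinction, a hedged sketch of loading a decoder-only model (e.g. a small Bloom checkpoint) with the "text-generation" task versus an encoder-decoder model (e.g. Flan-T5) with "text2text-generation"; the model names here are examples, not recommendations from the video:

      ```python
      from transformers import pipeline
      from langchain.llms import HuggingFacePipeline

      # Decoder-only model (Bloom, LLaMA-style) -> "text-generation"
      gen_pipe = pipeline("text-generation", model="bigscience/bloom-560m", max_new_tokens=256)
      decoder_llm = HuggingFacePipeline(pipeline=gen_pipe)

      # Encoder-decoder model (T5 family) -> "text2text-generation"
      t2t_pipe = pipeline("text2text-generation", model="google/flan-t5-large", max_length=256)
      seq2seq_llm = HuggingFacePipeline(pipeline=t2t_pipe)
      ```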

  • @abhijitkadalli6435
    @abhijitkadalli6435 Рік тому +1

    Gotta try using the MPT-7B-StoryWriter model, right? The 65k token length will surely help?

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      I showed StoryWriter briefly in the MPT-7B video. It is super slow, not really great for this kind of use.

  • @DevonAIPublicSecurity
    @DevonAIPublicSecurity Рік тому

    You are the man. I was working on this myself, but I had started out using BERT or basic NLP.

  • @vijaybudhewar7014
    @vijaybudhewar7014 Рік тому +1

    That's something unique... you are the best. I think you just missed one important thing: LangChain supports only text-generation and text2text-generation models from Hugging Face.

  • @backbackduck5167
    @backbackduck5167 Рік тому

    Appreciate your sharing. Embedding with hkunlp/instructor-xl and an LLM with MBZUAI/LaMini-Flan-T5-248M is the best combination I have tested. Embedding takes some time, but searching basically takes a few seconds with this tiny model.
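    A rough sketch of the combination described above, assuming the same RetrievalQA setup as the video (the `texts` variable is the list of split documents from earlier):

    ```python
    from transformers import pipeline
    from langchain.embeddings import HuggingFaceInstructEmbeddings
    from langchain.vectorstores import Chroma
    from langchain.llms import HuggingFacePipeline
    from langchain.chains import RetrievalQA

    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
    vectordb = Chroma.from_documents(texts, embeddings)

    pipe = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M", max_length=512)
    llm = HuggingFacePipeline(pipeline=pipe)

    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    )
    ```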

  • @patrafter1999
    @patrafter1999 Рік тому

    Hi Sam. Thanks a lot for your awesome videos. Your channel is probably one of the best (if not the best) sources to stay on top of LLMs in this crazy time. I'd like to make a suggestion on your content.
    When we want to use LLMs in production, we do not rely on the 'knowledge' it outputs since it's not reliable. We need accuracy and up-to-date-ness in production. And since people have their custom libraries of their own, the immediate benefit comes out of using those libraries rather than relying on Coder LLM outputs, which is largely troublesome.
    So instead of experimenting with 'What's the capital of England?', 'Tell me about Harry Potter', and 'Write a letter to Sam Altman', I think the following would be of more value to the audience.
    """
    Write a summary from the data focusing on the commonalities, differences and anomalies.
    library:
    function get_dataframe_from_csv(csvpath:str) -> pd.DataFrame
    function get_commonalities(dataframe:pd.DataFrame) -> list
    function get_differences(dataframe:pd.DataFrame) -> list
    function get_anomalies(dataframe:pd.DataFrame) -> list
    """
    The output is expected to be a few lines of code using those library functions in the prompt. And running the automatically generated code would be quite similar to what OpenAI Code Interpreter offers. The core idea here is that this is the approach I would pursue if I were to develop a production system using an LLM, rather than relying on the outputs directly generated by the LLM.
    You could perhaps do some training instead of putting the library function definitions in the prompt.
    It would be exciting to see the outputs automatically correlated and organised in the way you wanted on your own data.
    I hope this helps.

  • @unclecode
    @unclecode Рік тому +1

    It's great to see a focus on open source models. By the way, it would be better to use a stop token (such as "### Human:" and "### Assistant:") for "StableVicunaLM" instead of using regular expressions or other cleaning methods. Great video :)
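    A hedged sketch of what a stop-string approach could look like with a transformers `generate` call; the exact role markers depend on the StableVicuna prompt format, and the `model`/`tokenizer`/`inputs` objects are assumed to be loaded as in the notebook:

    ```python
    from transformers import StoppingCriteria, StoppingCriteriaList

    class StopOnStrings(StoppingCriteria):
        """Stop generation once any of the given strings appears in the output."""
        def __init__(self, tokenizer, stop_strings):
            self.tokenizer = tokenizer
            self.stop_strings = stop_strings

        def __call__(self, input_ids, scores, **kwargs) -> bool:
            # Only decode the tail of the sequence so markers already in the prompt don't trigger a stop.
            tail = self.tokenizer.decode(input_ids[0][-10:], skip_special_tokens=True)
            return any(s in tail for s in self.stop_strings)

    # Stop as soon as the model starts a new "### Human:" turn.
    stopping = StoppingCriteriaList([StopOnStrings(tokenizer, ["### Human:"])])
    output = model.generate(**inputs, max_new_tokens=512, stopping_criteria=stopping)
    ```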

  • @ryanbthiesant2307
    @ryanbthiesant2307 Рік тому

    Hi, I am having the same issue as you: there is a difference between information retrieval and information analysis. These models are unable to compare and contrast two documents, so you will run into a brick wall. They cannot critically think, and this is probably our problem, because they seem like they can think. But they are only running through similar expected chains of words.
    Perhaps in one of your models you are able to have a setup that can critically think. I imagine it would be like when Auto-GPT is talking to itself, but you would code the bots to chat with each other in order to generate the critical evaluation of two documents, or ideas. I hope you get what I mean. For example, you would write some code to look for identifier keywords, like critically, evaluate, or compare, or any of those essay-type terms. Then, on hearing that identifier, you engage another chat personality. Those two personalities would then work out the opposing views and place this in the chat. There are a lot of examples of rhetorical analysis that you can use as a template for a prompt.

  • @Zale370
    @Zale370 Рік тому

    Thank you for this unique and super high quality content you put out for us!

  • @thequantechshow2661
    @thequantechshow2661 Рік тому

    The hashes are Markdown format! That’s actually how openai responds as well

  • @rikardotoro
    @rikardotoro Рік тому

    Thanks for the videos! Do you know how the models can be "unloaded"? Should I just delete the folders? They are taking up a lot of space on my drive and I haven't been able to figure out the best way of deleting them.

  • @joser100
    @joser100 Рік тому

    Have you considered, instead of using embeddings (regardless of the model), trying to fine-tune a model? More specifically, I heard the idea of using S-BERT (a simple encoder-only model) to get the embeddings via fine-tuning instead. This would be efficient especially when the corpus of data is not very large (say, below a million words); the idea is that fine-tuning a small model would be very fast and would eventually deliver even better results, be easy to update with new data, etc. Have you heard of or considered this alternative?

  • @TomanswerAi
    @TomanswerAi Рік тому

    Wild that there are so many models out there that can already be run locally.

  • @ugurkaraaslan9285
    @ugurkaraaslan9285 8 місяців тому

    Hello, thank you so much for your video.
    I am getting an error on the local_llm("text") line. I think there is something missing in the LangChain HuggingFacePipeline.
    ValueError: The following `model_kwargs` are not used by the model: ['max_lenght', 'return_full_text'] (note: typos in the generate arguments will also show up in this list)

  • @mdeutschmann
    @mdeutschmann 9 місяців тому

    Would be great to have a video about top 5 open source llm allowing commercial usage. 🤓👏

  • @mao-d9v
    @mao-d9v Місяць тому

    Hi Sam may I ask what are the versions of langchain and other dependencies used in the amazing project?

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      most of these are about 18 months old. I have an updated LangChain course on the way

    • @mao-d9v
      @mao-d9v Місяць тому

      @@samwitteveenai thanks,looking forward to it!

  • @Nick_With_A_Stick
    @Nick_With_A_Stick Рік тому +2

    I've been following Auto-GPT's repos, such as Agent-LLM, which allows Auto-GPT to run with local models. It is very experimental; the models would do significantly better if they were trained with retrieval example questions. While better prompting may help, a larger context window would help more. I believe a version of MPT-7B StoryWriter trained with instruction examples and examples for Auto-GPT, as well as a dataset of Python code Q&A already available in Alpaca format on Hugging Face, would allow the model to be trained to do many automated tasks, and for a very long time thanks to its context window.

    • @taiconan8857
      @taiconan8857 Рік тому

      Has this presumption panned out as you portrayed it? I had a similar impression at the time but got derailed. Trying to pick up the pieces as it were and curious if this has been at least theoretically encouraging? (Lots to still catch up on.)

    • @Nick_With_A_Stick
      @Nick_With_A_Stick Рік тому +1

      @@taiconan8857 There have been no recent attempts at making the larger-context models more accurate at instruction-based questions and answers; granted, it would be an expensive dataset to make unless you did it all with the GPT-3.5 Turbo API. I suggest you read Microsoft's Orca paper: it's essentially the 13B LLaMA model trained on a freaking 5 million questions, where GPT-3.5 and GPT-4 created step-by-step instructions and explanations of why the model should answer the way it does, so instead of just mimicking ChatGPT it is actually learning and getting better at reasoning. In their paper it was shown to perform similarly to ChatGPT. They plan to release the model weights and are currently working with their lawyers. This was a little while ago, so OpenAI might be a little angry 😂. But if this works, open-sourced models and AI agents are going to be actually useful.

    • @taiconan8857
      @taiconan8857 Рік тому +1

      @nicolasmejia-petit7179 Very nice! Really appreciate all this info. I've often wondered (since I don't precisely understand the methodology behind following token weights versus the training data's weights): is it not possible that, when token weights do not sufficiently resemble the inputs or expected outputs, the model begins to favor responses that include "I'm under the impression that" or simply "I don't know"?
      Thanks again for this *great* response! 👍💎⛏️

    • @Nick_With_A_Stick
      @Nick_With_A_Stick Рік тому +1

      @@taiconan8857 Yeah, that is done with OpenAI's models, but the current open-source models are called "imitation models" because they only imitate the way the bigger model writes (there's a paper about this); they have no real understanding of reasoning, so if asked to perform something outside their training they are likely to hallucinate. This is something Orca likely fixed; we won't know until they publicize their weights. So it is important that models know their own limits, but sometimes, in the case of uncensored models, the performance is often better than their censored counterparts, meaning any phrase along the lines of "as an AI model I cannot" is removed.

    • @taiconan8857
      @taiconan8857 Рік тому

      @@Nick_With_A_Stick Sounds like a classic case of "working to treat the symptoms rather than the cause" but with data instead of disease. Doesn't seem to me like people don't have sufficient tools to do this, but perhaps it's more work than they're willing to put in. 😉

  • @PrabakaranBalaji
    @PrabakaranBalaji 11 місяців тому

    Hello there, how do we validate the source documents that we receive in the response?
    I'm getting non-matching documents as sources in the RetrievalQA chain.
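    For checking which chunks actually back an answer, a minimal sketch (assuming the `llm` and `vectordb` objects built in the video):

    ```python
    from langchain.chains import RetrievalQA

    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
    )

    result = qa_chain({"query": "your question here"})
    print(result["result"])
    for doc in result["source_documents"]:
        # metadata usually carries the originating file; compare it against what you expect
        print(doc.metadata.get("source"), "->", doc.page_content[:120])
    ```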

  • @alx8439
    @alx8439 Рік тому

    Incredible! Thank you so much for putting it together

  • @cinematheque20
    @cinematheque20 Рік тому

    What is the difference between HuggingFaceEmbeddings and HuggingFaceInstructEmbeddings? Are there any pros to using one over the other?

  • @alizhadigerov9599
    @alizhadigerov9599 Рік тому

    Great video, Sam! One question: is it worth using another embedding model for text vectorization and use gpt-3.5-turbo model for the rest of the tasks? (agents, qa etc.) The reason behind that is - text vectorization takes too long when using openai's embedding model.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes, totally. I think the Instructor embeddings are actually doing better than OpenAI in many cases as well.

    • @alizhadigerov9599
      @alizhadigerov9599 Рік тому

      @@samwitteveenai Including vectorization speed? I understand, it depends on specs, but what specs would you recommend to achieve much faster model than openai's one?

  •  Рік тому

    Thank you for yet another excellent video! It's great to learn about other models and how to get started with them. Can you please provide more information about why you choose Flan T5, and the associated modeling utilities? I apologize for my lack of technical expertise, but I would appreciate a better understanding of how they operate in tandem.

  • @reezlaw
    @reezlaw Рік тому +1

    Sorry if this is an ignorant question, but is there a reason for not using 4-bit quantised models?

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      Not an ignorant question at all. In my tests for other things, the current way of doing 4-bit tends to change the output, usually just making it shorter. I will start to make some vids with the 4-bit models soon, as I can see it allows a lot more people to use the models with free Colab etc. Also, there is a new 4-bit library coming in the next few weeks.

    • @reezlaw
      @reezlaw Рік тому

      @@samwitteveenai 4bit models have been a godsend for me as I like to run them in TGUI locally

  • @ikjb8561
    @ikjb8561 Рік тому +1

    Is there a way we can store the models locally instead of downloading them each time the script runs? I am running from the command prompt in Windows. Also, how do we get bitsandbytes optimized for CUDA? Thanks
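    On the first part: the Hugging Face libraries already cache downloads (by default under ~/.cache/huggingface), so a second run should not re-download. A hedged sketch of pointing that cache at a directory you control; the Windows path is hypothetical:

    ```python
    import os

    # Either set the cache location before importing transformers...
    os.environ["HF_HOME"] = r"D:\hf-cache"

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # ...or pass cache_dir explicitly per call.
    tok = AutoTokenizer.from_pretrained("google/flan-t5-xl", cache_dir=r"D:\hf-cache")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl", cache_dir=r"D:\hf-cache")
    ```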

  • @rudy.d
    @rudy.d Рік тому +1

    8:14 you might want to build a template=PromptTemplate(template="your template with a {context} and a {question}", input_variables=["context", "question"]) and pass it as a prompt=template param to your QA chain instead of overriding the .prompt.template

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes I just wanted people to see that all it is underneath is a string and that string can be directly manipulated.
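      For completeness, a sketch of the suggestion above, passing the template in via chain_type_kwargs rather than patching .prompt.template afterwards (assuming the stuff chain from the video; the template wording is illustrative):

      ```python
      from langchain.prompts import PromptTemplate
      from langchain.chains import RetrievalQA

      template = """Use the following context to answer the question at the end.

      {context}

      Question: {question}
      Helpful answer:"""

      prompt = PromptTemplate(template=template, input_variables=["context", "question"])

      qa_chain = RetrievalQA.from_chain_type(
          llm=llm,
          chain_type="stuff",
          retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
          chain_type_kwargs={"prompt": prompt},  # instead of overriding the chain's prompt string
      )
      ```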

  • @kevinehsani3358
    @kevinehsani3358 Рік тому

    Thanks for the great video. I am a bit confused about the 3 tokens used for the retriever! Is that the number of tokens fed into the model? I believe ChatGPT does not explicitly include a retriever component; I may be wrong. If you could explain the function of these 3 tokens it would be highly appreciated; it seems extremely low for any task.

    • @Ripshot14
      @Ripshot14 Рік тому +1

      You must be referring to the line `retriever = vectordb.as_retriever(search_kwargs={"k": 3})`. This argument is not passed to ChatGPT, or any LLM for that matter. The 3 specifies the number of documents that the "retriever" should retrieve from the Chroma vector database prepared earlier in the video. Effectively, the 3 documents found to be most relevant to a given query will be pulled up and included as additional information ("context") when querying the LLM.

  • @ComicBookPage
    @ComicBookPage Рік тому

    Great video. I'm curious if there are certain types of input format for the knowledge-base documents that work better than others. Things like longer sentences versus shorter sentences, or other things like that.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      generally you want things with clear meaning in each section of text, but don't let that stop you, just try it out with a variety of docs etc.

  • @somnathdey107
    @somnathdey107 Рік тому

    Thank you so much for this nice video. Much appreciated and needed as a lot of things are happening in this area nowadays. However, I am trying the same code of WizardLM in Azure ML, but I am not getting the kind of result as you have shown in your video. A little help is much appreciated.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Try running the Colab. What outputs are you getting different?

  • @marloparlo8594
    @marloparlo8594 Рік тому

    Amazing. I was already using WizardLM following your guide to use Huggingface Embeddings. Good to know I made the right call.
    Could you show us how to implement it with Chat memory using this method? I already have the Chat Memory tutorial from you but I don't know how to create the correct chain to combine both.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Good idea I will add that to a future video

    • @prakaashsukhwal1984
      @prakaashsukhwal1984 Рік тому

      Great video Sam thanks for the relentless effort in helping us all.. +1 for the chat memory request and conversational QA.. hoping to see a video soon!

  • @nithints302
    @nithints302 Рік тому

    Can you do something on FLAN-T5-XXL? My FLAN-T5-XL performs better than the XXL; I am missing something and want to understand what.

  • @MrPsycic007
    @MrPsycic007 Рік тому

    Much awaited video

  • @julian-fricker
    @julian-fricker Рік тому

    How are you doing this? My YouTube watchlist can't cope with all your content. 😂
    Thanks though, this stuff is fantastic.

  • @hiranga
    @hiranga Рік тому

    Hey Sam, what is the licensing like to use these other models for commercial usage?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Some are really good, like the T5 models; ones like Wizard etc. are not for commercial use. I am planning a part 2 to this video which will address this.

  • @jawadmansoor6064
    @jawadmansoor6064 Рік тому +1

    Any idea how to use GGML models such as TheBloke/Wizard-Vicuna-13B-Uncensored-GGML (or even smaller llama.cpp models like the 7B Wizard)?

    • @jawadmansoor6064
      @jawadmansoor6064 Рік тому +1

      Or this: TheBloke/wizardLM-7B-GGML. Please make a tutorial for this.
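      A hedged sketch of running a GGML checkpoint through LangChain's llama.cpp wrapper instead of the transformers pipeline (the file name is illustrative, and newer llama.cpp builds have since moved from GGML to GGUF):

      ```python
      # pip install llama-cpp-python
      from langchain.llms import LlamaCpp

      llm = LlamaCpp(
          model_path="./wizardLM-7B.ggmlv3.q4_0.bin",  # hypothetical local GGML file
          n_ctx=2048,        # context window
          temperature=0.1,
          max_tokens=256,
      )
      # The rest of the RetrievalQA setup is unchanged; just pass this llm to the chain.
      ```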

  • @remikapler
    @remikapler Рік тому

    Hey, love the content. When you talk about an earlier video, it might make sense to add a popup on the video pointing to where the earlier video is. Otherwise, well done, and thank you.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      good point. Will try to do this more going forward.

  • @john.knappster
    @john.knappster Рік тому

    Are there any examples of LangChain working well to make code changes (or suggesting to make code changes) to medium to large codebases based on a prompt? For example, modify the code to make the given test pass.

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      This is an interesting idea, I haven't tried it. It might work better with a code model like Starcoder or GPT-4. The bigger models can certainly do unit tests etc.

  • @jakekill8715
    @jakekill8715 Рік тому

    This looks great! Thank you for the tutorials! Have you thought about trying langchain with different architectures like mpt? The MPT-7b-chat model seems to be quite good at coding and might have autogptq support to be able to be quantized soon so running it with langchain would be great! Thank you again for all the help

    • @snippars
      @snippars Рік тому

      Yes also interested in the MPT 7b chat. Gave it a try but failed to get it to work due to what seems to be a corrupt config file in the repo. Can't spot where the error is.

  • @geekyinnovator3437
    @geekyinnovator3437 Рік тому

    Is token size an issue if we want to train over large examples

  • @atultiwari88
    @atultiwari88 Рік тому

    Hi, thank you for your tutorials. I have been following your tutorials for quite some time now and have watched your whole playlist on this. However, I am unable to figure out the most economical approach for my use case.
    I want to create a Q&A chatbot on Streamlit which answers only from my custom single document of about 500 pages. The document is final and won't change. From my understanding so far, I should choose either LangChain or LlamaIndex. But I would have to use the OpenAI API to get the best answers, and that API is quite costly for me. So far I have thought of using Chroma for embedding and somehow storing the vectors as pkl or json on Streamlit itself for re-use, so I don't have to spend again on vectors/indexing. I don't have enough credits to test different methods myself.
    Kindly guide me. Thank you.
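    On the "store the vectors once and reuse them" part, a minimal sketch using Chroma's persist_directory (paths and model names are illustrative):

    ```python
    from langchain.embeddings import HuggingFaceInstructEmbeddings
    from langchain.vectorstores import Chroma

    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")

    # One-off: embed the document chunks and write the index to disk.
    vectordb = Chroma.from_documents(texts, embeddings, persist_directory="db")
    vectordb.persist()

    # Later runs (e.g. inside the Streamlit app): load the saved index, no re-embedding needed.
    vectordb = Chroma(persist_directory="db", embedding_function=embeddings)
    ```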

  • @utuberay007
    @utuberay007 Рік тому

    A good benchmark chart will help

  • @thejohnhoang
    @thejohnhoang 8 місяців тому

    you are a life saver for this ty!

  • @KiranJala-j3p
    @KiranJala-j3p Рік тому

    Can you please do a video on conversational QA with memory? I have tried, but it is not working as per the documentation.

  • @cruzo333
    @cruzo333 Рік тому

    Great video man, tx !! however, I couldn't run it on my Macbook Pro (not M1). when trying to pip install transformers and xformers im getting "ERROR: Could not build wheels for xformers, which is required to install pyproject.toml-based projects" 😞

  • @MrAmack2u
    @MrAmack2u Рік тому

    unlimiformer might be interesting to look at next...unlimited context works with encoder/decoder based models.

  • @chineseoutlet
    @chineseoutlet Рік тому

    Hi Sam, I tried to load WizardLM in my free Colab account, but I tried a couple of times and it failed to load each time. I wonder if I need to have a Pro or Pro+ account to load your examples?

  • @redneq
    @redneq Рік тому

    I've binge-watched every video and have caught up, but really, sir, did you have to be this good! ;)

  • @dadimanoj9051
    @dadimanoj9051 Рік тому

    Great work thanks, can you also try red pajama or MPT models

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes I will look at those soon for this kind of task

  • @ChrisadaSookdhis
    @ChrisadaSookdhis Рік тому

    This is perfect for what I am trying to do!

  • @whiteshadow5881
    @whiteshadow5881 Рік тому

    I hope there is a future edit where this is done using a LLaMA 2 model.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      I actually have done some of these with LLaMA 2. Look for the RAG & LLaMA-2 vids

  • @cesarv-g1s
    @cesarv-g1s 7 місяців тому

    Can we use this code in a Kivy Python application??

  • @DarshitMehta-p4g
    @DarshitMehta-p4g Рік тому

    I tried this code and it's working fine for the docs that I have given, but it's also giving answers for other questions whose details are not present in the docs. How can I restrict it to only my docs, so that for outside questions it says something like "I don't know"? Can you please guide me? I have used the WizardLM model.
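    One common mitigation for this is a prompt that tells the model to answer only from the retrieved context; a hedged sketch (the wording is an assumption, and smaller models follow such instructions imperfectly):

    ```python
    from langchain.prompts import PromptTemplate
    from langchain.chains import RetrievalQA

    template = """Answer the question using ONLY the context below.
    If the answer is not contained in the context, reply exactly with "I don't know."

    Context:
    {context}

    Question: {question}
    Answer:"""

    prompt = PromptTemplate(template=template, input_variables=["context", "question"])

    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
        chain_type_kwargs={"prompt": prompt},
    )
    ```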

    • @darshitmehta3768
      @darshitmehta3768 Рік тому

      ​@Sam Witteveen Do you have any idea about this and can you please guide me?

  • @clray123
    @clray123 Рік тому

    One disadvantage that needs to be mentioned vs. using OpenAI's API (besides of having to wrangle dumber/buggy models) is the model startup time. If you want to have an always-ready model loaded in the GPU waiting for your input, it gets expensive pretty quick. In fact the most baffling thing about OpenAI is how they can afford such availability at scale. But perhaps they are just bleeding cash like crazy, who knows...

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Agree, lots of disadvantages of using self hosted models.

  • @TheAnna1101
    @TheAnna1101 Рік тому

    thanks for such a great tutorial. could you show how to do the same with ConversationalRetrievalChain with return_source_documents=True? thank you

  • @theh1ve
    @theh1ve Рік тому

    How would you go about updating or removing a document from chroma. Say if a document is updated or is now out of date?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes take a look at the API they have functions to delete a doc etc.

    • @theh1ve
      @theh1ve Рік тому

      @@samwitteveenai Thanks 👍 Had a gander and can see how you can delete collections and return the IDs based on a where clause for a metadata search; guess you could then take the returned IDs and delete or update them. Might be worth a video covering this, as I can't find a single video on YouTube for this. Just a thought!
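      A rough sketch of that flow with the Chroma wrapper; method availability depends on the langchain/chromadb versions, and the metadata value is hypothetical:

      ```python
      # Find the stored chunks that came from the outdated file via a metadata filter.
      hits = vectordb.get(where={"source": "docs/old_report.pdf"})
      stale_ids = hits["ids"]

      # Remove them, then re-add the refreshed, re-split document.
      vectordb.delete(ids=stale_ids)
      vectordb.add_documents(new_chunks)
      ```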

  • @ahmedkamal9695
    @ahmedkamal9695 Рік тому

    I have a question: does retrieval bring data from the vector DB and then pass it to the model to generate text?

  • @piotrzakrzewski2913
    @piotrzakrzewski2913 Рік тому

    Great content! Have you already tried fine-tuning / training your own small specialised model?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes that is how I do it at work, but I can't release those models currently. I am working on a solution like that to show soon

  • @Maisonier
    @Maisonier Рік тому

    What about using MPT-7B-StoryWriter-65k+ instead of flan t5-xl?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      I will make a part 2 to this video with some other new models this week. The 65k is too slow though for most uses.

  • @almirbolduan
    @almirbolduan Рік тому

    Hi Sam! In your opinion, which of these models would perform better for the Portuguese language? Nice video! Thanks!

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      The vector stores will be the same it really comes down to what embeddings you use. Checkout the leaderboard here and you probably want to try some multi lingual models huggingface.co/spaces/mteb/leaderboard

  • @henkhbit5748
    @henkhbit5748 Рік тому

    Great comparison of a complete open-source solution with different LLMs. What about MPT-7B with 65k tokens? If an LLM is big, I thought you could stream it from Hugging Face, or am I wrong? Thanks for sharing and showing us the new innovations in the fast-moving world of LLMs 👏

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      65k is too slow. You can stream some from HF, but you are limited as to how much you can stream, the context size, etc. Also, then your data is still going to the cloud. I will make a part 2 to this video with some other new models this week.

    • @henkhbit5748
      @henkhbit5748 Рік тому

      @@samwitteveenai Thanks, looking forward for the new LLM's..

  • @picklenickil
    @picklenickil Рік тому

    Congratulations on a brilliant video

  • @____r72
    @____r72 Рік тому

    absolute beginner q, how does one go about pouring this concoction into a flask?

    • @____r72
      @____r72 Рік тому

      or rather can a notebook be retrofitted to have REST functionality? I've googled but trying to get the lay of the land before embarking on making the first thing that comes to mind
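      Since the question mentions Flask, a bare-bones sketch of exposing the chain over REST (assuming a qa_chain object like the one built in the notebook; not production-ready):

      ```python
      from flask import Flask, request, jsonify

      app = Flask(__name__)

      @app.route("/ask", methods=["POST"])
      def ask():
          question = request.json.get("query", "")
          result = qa_chain({"query": question})  # qa_chain built as in the video
          return jsonify({
              "answer": result["result"],
              "sources": [d.metadata.get("source") for d in result.get("source_documents", [])],
          })

      if __name__ == "__main__":
          app.run(host="0.0.0.0", port=8000)
      ```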

  • @FrancChen
    @FrancChen Рік тому

    What would you use with any of these to create a UI for the chat? Gradio?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yeah, you could use Gradio or Streamlit, or build something like a NextJS app for a proper frontend.

  • @What-the-Fcom
    @What-the-Fcom Рік тому

    Hi Sam,
    I found out only after importing the model that my VirtualBox doesn't have access to the GPU. Could you please tell me how I can remove the model safely? I used "TheBloke/wizardLM". Thank you very much if you can help, and thank you for your content... so easy to listen to and watch along.
    Found the answer... using huggingface-cli delete-cache :)

  • @imanonymus5745
    @imanonymus5745 Рік тому

    How can we fine-tune a local LLM? And can you please make another video using a search engine or other tools?

  • @giraymordor
    @giraymordor Рік тому

    Thank you :D I wonder how I can increase the answer size? Is it possible to get a longer answer?

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      This depends a lot on the model and the prompt we use on it. I plan to make a part 2 to this video this week.

    • @giraymordor
      @giraymordor Рік тому

      @@samwitteveenai Thank you, we are looking forward to it :D
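      For reference, a hedged sketch of the generation arguments that usually control answer length with a Hugging Face pipeline (standard transformers generate arguments; the right values depend on the model, and `model`/`tokenizer` are assumed to be loaded as in the notebook):

      ```python
      from transformers import pipeline
      from langchain.llms import HuggingFacePipeline

      pipe = pipeline(
          "text-generation",
          model=model,
          tokenizer=tokenizer,
          max_new_tokens=512,       # raise this for longer answers
          min_new_tokens=64,        # optionally force a minimum length
          repetition_penalty=1.15,  # discourages the model from padding length with repeats
      )
      llm = HuggingFacePipeline(pipeline=pipe)
      ```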

  • @harshitgoyal9341
    @harshitgoyal9341 11 місяців тому

    Why does this Colab file not work? When I am trying to run it, it shows multiple errors.

    • @samwitteveenai
      @samwitteveenai  11 місяців тому

      A lot of of the code has been updated, so that could be the reason. I will try to make some newer vids on this topic soon.

  • @arvindelayappan3266
    @arvindelayappan3266 Рік тому

    Do we have any updated models like Mistral or LLaMA 2? ;-) ... Please let us know.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes some of the new Mistral Fine Tunes are certainly worth checking out. I will try to make an updated video of this soon.

  • @loicbaconnier9150
    @loicbaconnier9150 Рік тому

    Thanks. How about using a local LLM to find chunks using embeddings and OpenAI for the final step, with the chunk tokens and the question as input? Would it be good enough?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Yes I did that in the previous video before this, it works well.

  • @Atlas3D
    @Atlas3D Рік тому +2

    it would be interesting to give a small model a "tool" of using a larger model for more context when needed - so that we could almost like ratchet up the intelligence needed on a per generation basis. Maybe even doing parts of the generation on smaller models then passing that information up to larger and larger models until some set threshold is reached. I feel like that is going to require some kind of lora fine tuning tho or else it will be fairly brittle.

  • @mdfarhananis8950
    @mdfarhananis8950 Рік тому

    I am using the free version of colab but it shows I ran out of ram. Do I need colab plus to use this?

    • @1509skate
      @1509skate Рік тому

      I had the same issue and upgraded to the basic plus to test this. Now it works just fine.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      If you use the big Wizard + StableVicuna model, yes, but for the others you shouldn't need it. I am planning a vid with some smaller models this week.

  • @DevStephenW
    @DevStephenW Рік тому

    Would love to see you try different models with the question answer system. This was great. Glad to see you found one that works reasonably well. I tried T5 and mrm8488/t5-base-finetuned-question-generation-ap which worked decently well, slightly better than the plain T5. Will try the wizard but may not be able to on my GPU. Looking forward to your other tests.

    • @samwitteveenai
      @samwitteveenai  Рік тому +3

      This is great that you shared what you tried as well. I think if we all mention what we have tried it would be great and help everyone to find the best models.

  • @vijaybudhewar7014
    @vijaybudhewar7014 Рік тому

    Why are we using instruction embedding and T5 models which are different?

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      There is no reason to use the same model for the embeddings and the main LLM. Embedding LLMs are trained for different tasks. Actually Instructor uses a T5 Architecture underneath though (but thats not related to the other LLM)

    • @vijaybudhewar7014
      @vijaybudhewar7014 Рік тому

      @@samwitteveenai Thank you so much!!

  • @NavyaVedachala
    @NavyaVedachala Рік тому

    Which videos are a prerequisite for this one?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      Take a look at the LangChain playlist and the few ones before this on that playlist.

  • @iaconst4.0
    @iaconst4.0 7 місяців тому

    Bro, does this code work only for English??

  • @DevonAIPublicSecurity
    @DevonAIPublicSecurity Рік тому

    You are the man ...!!!

  • @AdrienSales
    @AdrienSales Рік тому

    It's gonna be a busy week-end !😂

  • @odog8x16
    @odog8x16 Рік тому

    If I had all the files for a model downloaded to my disk, how could I use the models from my computer rather than using the huggingface pipeline?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      that HF pipeline is using models that are downloaded.
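      To make that concrete, a sketch of pointing from_pretrained at a local folder instead of a Hub ID (the path is illustrative and should contain the config, weights, and tokenizer files):

      ```python
      from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
      from langchain.llms import HuggingFacePipeline

      local_dir = "./models/wizardlm-7b"  # hypothetical local folder

      tokenizer = AutoTokenizer.from_pretrained(local_dir)
      model = AutoModelForCausalLM.from_pretrained(local_dir, device_map="auto")

      pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
      llm = HuggingFacePipeline(pipeline=pipe)
      ```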

  • @charsiu8444
    @charsiu8444 Рік тому

    Attempting to run the StableVicuna and WizardLM Models and I get the error: Make sure you have enough GPU RAM to fit the quantized model. What GPU/VRAM are you running, and how do I set it as a parameter like 'gpu_memory_0'? Also, is there site which documents all parameter inputs in the transformers? (Sorry.. N00b here in Python).

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      it sounds like your GPU isn't big enough to run that one.

    • @charsiu8444
      @charsiu8444 Рік тому

      @@samwitteveenai Thanks. How much VRAM do I need to run those examples? Looking to get another graphics card soon.

  • @onroc
    @onroc Рік тому

    FINALLY!! Thanks!

  • @mjgolab
    @mjgolab Рік тому

    Sam you are the best

  • @krisszostak4849
    @krisszostak4849 Рік тому

    Would WizardLM be good for summarization as well?

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      honestly I haven't tried, but my guess is it should do a decent job at it.

  • @alipuccio3603
    @alipuccio3603 Рік тому

    Is WizardLM an encoder-decoder model? Or just a decoder model? Does this distinction matter? If anybody could get back to me, I would be so appreciative.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      If it is the one based on LLaMA then it is a decoder model

  • @MariuszWoloszyn
    @MariuszWoloszyn Рік тому

    At least in principle, one could use the same model to extract embeddings and answer questions.

    • @samwitteveenai
      @samwitteveenai  Рік тому

      This doesn't usually work well and the training for the 2 tasks is quite different.

  • @MichaelDude12345
    @MichaelDude12345 Рік тому

    I have been struggling to figure out if there is a performant way to tune and run a 13B model on a modern graphics card with 12GB of VRAM (I think I have seen 4-bit mode suggested for this, but thought that the results might be poorer than just using a 7B model). I love the speeds on the device, but I know for some stuff I will be bumping right up against the limits of what I can do with a 7B model. I could run the model on a dual-GPU setup, as I have seen shown to be possible by @Jeff Heaton, but I am under the impression there would be a significant performance hit. I also really like having the model run on the one GPU I have, since it is very conservative with power and I could conceivably afford to keep it running and do some self-hosted services with it. Does anyone have any suggestions? Thank you!

    • @samwitteveenai
      @samwitteveenai  Рік тому +1

      Some new 4bit stuff is coming in a few weeks. That might help you a lot.

  • @BOMBOMBASE
    @BOMBOMBASE Рік тому

    Awesome, thanks for sharing. Can Wizard run on a CPU?

  • @aziz-xd4de
    @aziz-xd4de Рік тому

    what can i do if i want to do it with french documents and questions ?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      I would use a fine tuned version of the new LLaMA2 model

  • @fernandosanchezvillanueva4762

    The best video!!

  • @jasonl2860
    @jasonl2860 Рік тому +1

    can I load the models in cpu?

    • @samwitteveenai
      @samwitteveenai  Рік тому

      probably but you will need a lot of ram and it will be very slow.