🔥🚀 Inferencing on Mistral 7B LLM with 4-bit quantization 🚀 - In FREE Google Colab

  • Published 9 Sep 2024
  • 🐦 TWITTER: / rohanpaul_ai
    🔥🚀 Inferencing on Mistral 7B with 4-bit quantization 🚀 | Large Language Models
    I explain the BitsAndBytesConfig in detail
    📌 Max System RAM is only 4.5 GB and
    📌 Max GPU VRAM is 5.9 GB
    👉 The *`load_in_4bit` parameter* loads the model in 4-bit precision.
    This means the model's weights are stored using 4 bits instead of the usual 32 (computation still happens in a higher precision such as bfloat16). This can significantly reduce the memory footprint: 4-bit weights take roughly 8x less memory than 32-bit full-precision weights, and inference can also be up to ~2x faster because far less data moves between memory and the GPU.
    However, if you need the highest possible accuracy, you may prefer full-precision models.
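    The setup described above can be sketched roughly as follows. This is an illustrative sketch, not the video's exact notebook: the model id, quantization options, and the prompt are assumptions on my part.

    ```python
    # Sketch: loading Mistral 7B in 4-bit with transformers + bitsandbytes.
    # Model id and generation settings are illustrative, not from the video.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store weights in 4 bits
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # do the actual math in bf16
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    )

    model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # place layers on the available GPU automatically
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```

    Note that only the stored weights are 4-bit; `bnb_4bit_compute_dtype` controls the precision used during the forward pass, which is why accuracy loss stays small.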
    Github - github.com/roh...
    -------------------
    🔥🐍 Check out my new Python book, where I cover 350+ core Python concepts, across 1300+ pages, needed in the daily real-life problems of a Python engineer.
    For each concept, I discuss the under-the-hood view of how the Python interpreter handles it.
    🔥🐍 Link to Book - rohanpaul.gumr...
    -----------------
    Hi, I am a Machine Learning Engineer | Kaggle Master. Connect with me on 🐦 TWITTER: / rohanpaul_ai - for daily in-depth coverage of Machine Learning / LLM / OpenAI / LangChain / Python Intricacies Topics.
    ----------------
    You can find me here:
    **********************************************
    🐦 TWITTER: / rohanpaul_ai
    🟠 Substack : rohanpaul.subs...
    👨‍🔧 Kaggle: www.kaggle.com...
    👨🏻‍💼 LINKEDIN: / rohan-paul-b27285129
    👨‍💻 GITHUB: github.com/roh...
    🧑‍🦰 Facebook Page: / rohanpaulai
    📸 Instagram: / rohan_paul_2020
    🟠 My YouTube Finance Channel: / @paulrohan
    **********************************************
    Other playlists you might like 👇
    🟠 MachineLearning & DeepLearning Concepts & interview Question Playlist - bit.ly/380eYDj
    🟠 ComputerVision / DeepLearning Algorithms Implementation Playlist - bit.ly/36jEvpI
    🟠 DataScience | MachineLearning Projects Implementation Playlist - bit.ly/39MEigt
    🟠 Natural Language Processing Playlist : bit.ly/3P6r2CL
    ----------------------
    #LLM #Largelanguagemodels #Llama2 #opensource #NLP #ArtificialIntelligence #datascience #langchain #llamaindex #vectorstore #textprocessing #deeplearning #deeplearningai #100daysofmlcode #neuralnetworks #datascience #generativeai #generativemodels #OpenAI #GPT #GPT3 #GPT4 #chatgpt

COMMENTS • 27

  • @javiergimenezmoya86
    @javiergimenezmoya86 11 months ago +1

    Which is better: quantizing with "bitsandbytes" or doing it with "llama.cpp" GGUF? What is the difference?

  • @manueljan2117
    @manueljan2117 11 months ago +1

    How do I use your model in a LangChain agent? I used this, but it says the llm value is not a valid dict:
    agent = initialize_agent(tools,
                             model,
                             agent="zero-shot-react-description",
                             verbose=True,
                             handle_parsing_errors=True,
                             max_new_tokens=1000)

  • @venkateshr6127
    @venkateshr6127 11 months ago +2

    Great video. Can you make a video on fine-tuning an LLM with the best method?

    • @RohanPaul-AI
      @RohanPaul-AI  11 months ago +2

      That's exactly what's planned, Venkatesh. Stay tuned.

  • @MasterBrain182
    @MasterBrain182 11 months ago

    Astonishing content Man 🔥🔥🔥 🚀

  • @JavMend
    @JavMend 2 months ago

    hi, is there a simple change that can be made to the code to run inference in 8-bit?
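    (For reference — this thread has no reply, but with `BitsAndBytesConfig` the switch to 8-bit is usually just a different flag; a minimal sketch, not from the video:)

    ```python
    # Sketch: 8-bit loading replaces load_in_4bit=True with load_in_8bit=True.
    from transformers import BitsAndBytesConfig

    bnb_config_8bit = BitsAndBytesConfig(load_in_8bit=True)
    # Then pass quantization_config=bnb_config_8bit to
    # AutoModelForCausalLM.from_pretrained(...) exactly as in the 4-bit case.
    ```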

  • @anuvratshukla7061
    @anuvratshukla7061 11 months ago +2

    Can you make a video on how to use an open-source LLM to query a structured database (SQL/pandas) for chat?

  • @gazzalifahim
    @gazzalifahim 3 months ago

    Hello there, this is exactly what I was looking for. Could you please share resources or a tutorial where those functions are discussed in detail?
    My teammate gave me a Kaggle notebook with the exact same code, and I am working on turning it into a conversational chatbot. But since I am brand new to this, I feel lost.

  • @tomasgarcia2420
    @tomasgarcia2420 2 months ago

    Hi, I got my token from Hugging Face, but I don't know where to put it in Colab.

  • @seinaimut
    @seinaimut 11 months ago +3

    Thanks for your tutorial. I have a question: how do I generate output up to 32k tokens?

  • @MihneaStefanUngurenau
    @MihneaStefanUngurenau 5 months ago

    Nice video, good job!

  • @Mai-sq5cc
    @Mai-sq5cc 11 months ago +1

    thanks for the tutorial!!

  • @samketola919
    @samketola919 11 months ago +1

    thx 😀

  • @vinsmokearifka
    @vinsmokearifka 8 months ago

    Sir, any advice if I use Japanese or Chinese for RAG? Thanks

  • @saravanajogan1221
    @saravanajogan1221 11 months ago

    Hi Sir,
    Could you tell us your mic setup and how you make your videos with such clear quality? Thanks

  • @user-ef2pv2du3j
    @user-ef2pv2du3j 10 months ago

    Great video, sweet and simple. However, how can we control the max token limit? And also, do we have the option of separating our messages into a system message and a user message, just like in OpenAI?

  • @thehkmalhotra9714
    @thehkmalhotra9714 8 months ago

    Loved your content, buddy ❤. Can we keep this Google Colab instance running for free, and how can we expose this model as a REST API for use in hosted projects, not just locally?

  • @onesecondnanba
    @onesecondnanba 8 months ago +1

    Colab file not found, please give the notebook link

    • @RohanPaul-AI
      @RohanPaul-AI  8 months ago +1

      Corrected the link in the description; here it is:
      github.com/rohan-paul/LLM-FineTuning-Large-Language-Models/blob/main/Mistral-7B-Inferencing.ipynb

  • @mikiyasfikadu6422
    @mikiyasfikadu6422 9 months ago

    Helpful video

  • @user-xe7wh2tw6q
    @user-xe7wh2tw6q 5 months ago

    Can we do this type of quantization with any model?

    • @RohanPaul-AI
      @RohanPaul-AI  5 months ago

      Yes, we very much can. Check out my tweet on this:
      twitter.com/rohanpaul_ai/status/1765688184753820073

  • @onesecondnanba
    @onesecondnanba 8 months ago +1

    How to fine-tune this?

    • @RohanPaul-AI
      @RohanPaul-AI  8 months ago +1

      For fine-tuning, check out this video:
      ua-cam.com/video/6DGYj1EEWOw/v-deo.html&ab_channel=Rohan-Paul-AI