Mixtral 8X7B - Deploying an *Open* AI Agent

  • Published 9 Jun 2024
  • Mistral AI's new model - Mixtral 8x7B - is pretty impressive. We'll see how to set up and deploy Mixtral 8x7B, the prompt format it requires, and how it performs when used as an agent - we even add some Mixtral RAG at the end.
    As a bit of a spoiler, Mixtral is probably the first open-source LLM that is truly very good - I say this considering the following key points:
    - Benchmarks show it performing better than GPT-3.5.
    - My own testing shows Mixtral to be the first open-weights model we can reliably use as an agent.
    - Thanks to its MoE architecture it is very fast for its size; if you can afford to run it on 2x A100s, latency is good enough for chatbot use cases. (A minimal loading sketch follows at the end of this description.)
    📕 Mixtral 8X7B Page:
    www.pinecone.io/learn/mixtral...
    📌 Code Notebook:
    github.com/pinecone-io/exampl...
    🌲 Subscribe for Latest Articles and Videos:
    www.pinecone.io/newsletter-si...
    👋🏼 AI Dev:
    aurelio.ai
    👾 Discord:
    / discord
    Twitter: / jamescalam
    LinkedIn: / jamescalam
    00:00 Mixtral 8X7B is better than GPT 3.5
    00:50 Deploying Mixtral 8x7B
    03:21 Mixtral Code Setup
    08:17 Using Mixtral Instructions
    10:04 Mixtral Special Tokens
    13:29 Parsing Multiple Agent Tools
    14:28 RAG with Mixtral
    17:01 Final Thoughts on Mixtral
    #artificialintelligence #nlp #ai #chatbot #opensource
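    For anyone who wants to jump straight in, here is a minimal, hedged sketch of loading Mixtral 8x7B Instruct with Hugging Face transformers - roughly the kind of setup the video walks through. The model ID is the public Hugging Face Hub repo; the generation settings are illustrative only, not the notebook's exact code.

```python
# Minimal sketch: load Mixtral 8x7B Instruct with Hugging Face transformers.
# Assumes a machine with enough GPU memory (e.g. 2x A100 80GB for fp16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit in GPU memory
    device_map="auto",          # spread the expert layers over available GPUs
)

prompt = "[INST] Explain mixture-of-experts in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```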
  • Science & Technology

COMMENTS • 69

  • @ZacMagee
    @ZacMagee 5 months ago +3

    Love your content man. Been following for about 6 months and it's great to see your evolution. It's inspiring. Keep it up🎉

    • @jamesbriggs
      @jamesbriggs  5 months ago

      that's awesome to hear, I appreciate it - thanks for sticking around!

    • @FlipTheTables
      @FlipTheTables 4 months ago

      They have the 70B model on Hugging Face, y'all

    • @FlipTheTables
      @FlipTheTables 4 months ago

      It was the Mistral leak

    • @jamesbriggs
      @jamesbriggs  4 months ago

      @FlipTheTables the miqu model? Seems odd to me that Mistral Medium is a single 70B model (I haven't tested it yet, though)

  • @shameekm2146
    @shameekm2146 5 months ago +1

    Wow, this content is awesome. Looking forward to upgrading the existing Llama-2 in my RAG application to this model. Can't wait to test out how this performs. Thank you :)

    • @jamesbriggs
      @jamesbriggs  5 months ago

      it's awesome, better results guaranteed

  • @johnny017
    @johnny017 5 months ago +28

    I believe open source models will prove more beneficial in the future compared to closed models. We have transparency regarding the data they were trained on, and we can fine-tune them according to our specific requirements. Full control. I must admit, I'm a bit tired of OpenAI, ahah!

    • @jamesbriggs
      @jamesbriggs  5 months ago +7

      yeah me too, I'm truly very happy that this model is available, it gives me faith that we're moving in a good direction with open models

    • @bigglyguy8429
      @bigglyguy8429 5 months ago

      @jamesbriggs This is obviously way beyond a home setup. With an RTX 4070 Ti and 32 GB of RAM, what would be the most powerful model I could run at a practical speed?

    • @victormustin2547
      @victormustin2547 4 months ago

      Mistral is open weights, but you can't actually audit the training data. Still, I agree with this point of course

  • @travisporco
    @travisporco 5 months ago

    This is great...nice to see some actual nuts and bolts for a change!

  • @ElliottAbraham
    @ElliottAbraham 5 months ago +4

    I discovered yesterday that Mistral partnered with Google to train and tune their models on Google Vertex AI. I like this Mixtral model a lot. I set it up and tested it against Bard with Gemini Pro and Mixtral is fast but Bard gives more verbose and well thought out responses.

    • @jamesbriggs
      @jamesbriggs  5 months ago

      nice I didn't know this, that's awesome

  • @adityaretissin1856
    @adityaretissin1856 2 months ago

    Amazing video man! Learnt a lot! :)

  • @noomondai
    @noomondai 5 months ago

    Thank you, excellent!

  • @user-lu7pp9mb8y
    @user-lu7pp9mb8y 5 months ago +1

    Hey James,
    Any specific reason you chose RunPod for compute instead of trying something like Poe or Fireworks, which have open-source models available that you can run via an API?

  • @albertgao7256
    @albertgao7256 5 months ago +1

    Ollama updated with the 8x7B model;
    Q4_0 on an M3 Max / 64 GB runs like a charm.
    prompt eval rate: 83.24 tokens/s
    eval rate: 36.76 tokens/s
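    For anyone wanting to reproduce this from Python, a hedged sketch using the ollama Python client is below - the mixtral:8x7b tag and the throughput above will depend on your Ollama version and hardware.

```python
# Hedged sketch: chat with the quantized Mixtral that Ollama serves locally.
# Assumes Ollama is installed and running, and the "mixtral:8x7b" tag is pulled.
import ollama

response = ollama.chat(
    model="mixtral:8x7b",
    messages=[{"role": "user", "content": "Give me one fun fact about MoE models."}],
)
print(response["message"]["content"])
```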

  • @markjaysonalvarez7689
    @markjaysonalvarez7689 5 months ago

    I was messing with Mixtral via Together AI's API, and I was confused about the [INST] and the other special tokens. Thank you for the clarification!

    • @jamesbriggs
      @jamesbriggs  5 months ago

      yep, I'm not sure if this is the exact way of doing it with the special tokens, but it worked for me like this
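      For reference, the format boils down to wrapping each user turn in [INST] ... [/INST], with <s>/</s> as the sequence tokens. A hedged sketch that lets the tokenizer's built-in chat template build the string (rather than hand-writing the special tokens):

```python
# Sketch: use the Mixtral tokenizer's chat template to build the prompt,
# instead of concatenating [INST]/[/INST] and <s>/</s> by hand.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [{"role": "user", "content": "What is a mixture-of-experts model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)  # roughly: <s>[INST] What is a mixture-of-experts model? [/INST]
```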

  • @am0x01
    @am0x01 4 months ago

    Do you think it would be feasible to run inference with the code you provided by pointing it at a model running in LM Studio?
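    This isn't something the video covers, but LM Studio exposes an OpenAI-compatible local server, so a hedged sketch of the usual pattern would look like this (the default port is typically 1234 - check your LM Studio version):

```python
# Hedged sketch: point an OpenAI-compatible client at LM Studio's local server.
# Port, API key, and model name are placeholders for whatever LM Studio shows.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to the model you have loaded
    messages=[{"role": "user", "content": "Hello from the Mixtral notebook!"}],
)
print(resp.choices[0].message.content)
```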

  • @user-wy2mr7kg4b
    @user-wy2mr7kg4b 5 months ago +1

    How do I run this on a production env with a frontend interface?

  • @maxlgemeinderat9202
    @maxlgemeinderat9202 5 months ago

    Can you share what the prompt looks like for RAG tasks?
    Somehow Mixtral is ignoring the task I want it to do
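    A hedged sketch of one RAG prompt shape that tends to keep Mixtral on task (not the exact prompt from the video): put the instruction, the retrieved context, and the question all inside the same [INST] block.

```python
# Illustrative RAG prompt builder for Mixtral instruct; the wording is an
# assumption, not the notebook's exact prompt.
def build_rag_prompt(question: str, contexts: list[str]) -> str:
    context_block = "\n---\n".join(contexts)
    return (
        "[INST] Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question} [/INST]"
    )

print(build_rag_prompt("Who released Mixtral?", ["Mixtral 8x7B was released by Mistral AI."]))
```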

  • @alizhadigerov9599
    @alizhadigerov9599 5 months ago +1

    Is there a way to turn the instance on and off programmatically to save cost? If yes, how fast/slow is that operation?
    Since most such applications are used as part of a bigger system, I was thinking of turning the instance on only when a user sends a request and keeping it off otherwise.

    • @jamesbriggs
      @jamesbriggs  5 months ago +1

      I didn't see the option to with RunPod; you can do this with Hugging Face (but the $/h cost is higher while it is on), and there may be other options out there

  • @user-ug3pf3uw6x
    @user-ug3pf3uw6x 5 months ago

    Does your example mean it can do function calling?
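    The video's agent section relies on prompting Mixtral to emit a structured tool call and parsing it out of the output text (see the "Parsing Multiple Agent Tools" chapter). A hedged sketch of that general pattern - the JSON shape and tool name here are illustrative, not the notebook's exact format:

```python
# Hedged sketch: pull a JSON "tool call" out of the model's free-text output.
import json
import re

def parse_tool_call(generation: str) -> dict | None:
    """Return the first JSON object found in the model output, if any."""
    match = re.search(r"\{.*\}", generation, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

output = 'I will use a tool. {"tool": "calculator", "input": "2 ** 10"}'
print(parse_tool_call(output))  # {'tool': 'calculator', 'input': '2 ** 10'}
```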

  •  5 months ago

    What would be the easiest way to fine-tune Mixtral for another language?

  • @Sara-he1fz
    @Sara-he1fz 5 months ago

    @jamesbriggs Could you please show how we can use these open-weights models locally (not through APIs) if we have powerful GPUs? How to load the model in our code, and so on.

  • @angelalmalaq
    @angelalmalaq 3 months ago

    Can my PC run the Mixtral 8X7B model? RTX 3080 with 10 GB VRAM, Intel 12700, 64 GB RAM, 6 TB of disk space (M.2, SSD, or HDD)?

  • @gangs0846
    @gangs0846 5 months ago

    Great content, thank you! Is it possible to download the Mixtral model from Hugging Face and use it completely offline?

    • @Purkkaviritys
      @Purkkaviritys 5 months ago

      Yes, provided that you have enough VRAM for the GPU or RAM for the CPU.
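      A hedged sketch of the fully offline flow: download the weights once with huggingface_hub, then load with local_files_only so no network calls are needed afterwards (paths and settings are illustrative).

```python
# Sketch: download Mixtral once, then run fully offline afterwards.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# One-time download while online (tens of GB of weights).
local_dir = snapshot_download("mistralai/Mixtral-8x7B-Instruct-v0.1")

# Later, offline: load strictly from the local files.
tokenizer = AutoTokenizer.from_pretrained(local_dir, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    local_dir, device_map="auto", local_files_only=True
)
```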

  • @robertonavoni6765
    @robertonavoni6765 5 months ago +1

    I already did the same test on my M3 Ultra with 36 GB of memory... you don't need 2x A100s for inference; you could use a quantized version of the model.

    • @jamesbriggs
      @jamesbriggs  5 months ago

      yes that's awesome, testing quantized soon

    • @MrDoom4all
      @MrDoom4all 2 months ago

      Hey Roberto, what did the tokens-per-second rate on large prompts (e.g., 8k tokens) look like? Thanks.

  • @fernandofernandesneto7238
    @fernandofernandesneto7238 5 months ago +1

    Running it with llama.cpp on an RTX 4090 (4-bit) works fine... I am looking forward to replicating your agent locally. Great video

    • @jamesbriggs
      @jamesbriggs  5 months ago +1

      nice, did you notice much change in performance?

    • @fernandofernandesneto7238
      @fernandofernandesneto7238 5 months ago

      @jamesbriggs negligible. I have tested a 4-bit GGUF (K_M quant) using llama.cpp

  • @Yosuke-ov7jw
    @Yosuke-ov7jw 5 months ago

    Hi 👋 Will this be cheaper than using the GPT-4 API?

    • @jamesbriggs
      @jamesbriggs  5 months ago +1

      not how I set it up here, but technically it should be possible - most people in the know put the parameter count of GPT-4 much higher than Mixtral's - so I'd expect services/APIs to pop up very soon (I already saw some) that provide it at a lower cost

    • @Yosuke-ov7jw
      @Yosuke-ov7jw 5 months ago

      Thank you! I love your tutorial videos and find them very helpful. Please consider making a video on the cheapest way to use an open-source model in production 😄

  • @Tarantella.Serpentine
    @Tarantella.Serpentine 5 months ago

    Is there a way to run this with Open Interpreter?

    • @jamesbriggs
      @jamesbriggs  5 months ago

      I don't know, I haven't used it - if not, I'm sure there will be a way pretty soon

  • @ronelgem23
    @ronelgem23 4 months ago

    Am I the only one using the notebook and getting a CUDA error? Something about
    "Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions."

  • @DihelsonMendonca
    @DihelsonMendonca 5 months ago

    Can you run it on Google Colab? It can be run on a local machine as well. 🎉❤

  • @andikunar7183
    @andikunar7183 5 months ago

    Running it (Q5_K_M quantized) with llama.cpp locally on an M2 Max, using approx. 40 GB (total) at approx. 25 tokens/s. Q5 performs great (low perplexity), no need for A100s. Also looking forward to QMoE quantization advances (currently under discussion) - this could reduce memory requirements a lot more. Since llama-cpp-python uses the llama.cpp binaries, it should work for Python programs as well (I have not tried it yet).

    • @Ayresplastering
      @Ayresplastering 5 months ago +1

      Wait, so you can run Mixtral on an M2?

    • @andikunar7183
      @andikunar7183 5 months ago

      @Ayresplastering yes, but you need 40+ GB of RAM for good answers; compressing to 3 bits or less degrades quality significantly. And an M2 Max is needed for 25 tokens/s. An M2 Pro has half the performance, an M3 even less - the M3 Pro might have a faster GPU, but its memory is much slower than the M2 Pro's, so it is slower here. An M2 Ultra might be 2x as fast, and on pure tokens/s generation not much slower than an RTX 6000.

    • @danielniehoerster7460
      @danielniehoerster7460 5 months ago +1

      Try converting this (or the weights) for Apple silicon with MLX. I did it with the original model (M3 Max, 128 GB) and it runs like a charm ;-)

    • @Ayresplastering
      @Ayresplastering 5 months ago

      @andikunar7183 So I could run a quantised version on RunPod with AutoGen? Would it lose quality drastically? This could be a cheap and easy way to get the bulk of my work done - I'm using GPT-4 and it's quite slow to constantly prompt, whereas this could get me relatively close, and GPT-3.5 is good enough for most things for me.

    • @jamesbriggs
      @jamesbriggs  5 months ago +1

      Yes, I saw TheBloke's quantized versions just before this video; I haven't tried them locally yet but I'm intending to - it's so cool
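      For anyone who wants to try the quantized route from Python, a hedged sketch with llama-cpp-python - the GGUF filename is illustrative (e.g. one of TheBloke's quants), and n_gpu_layers/n_ctx depend on your hardware:

```python
# Hedged sketch: run a quantized Mixtral GGUF locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # illustrative path
    n_gpu_layers=-1,  # offload as many layers as possible to GPU/Metal
    n_ctx=4096,       # context window
)

out = llm(
    "[INST] Summarise mixture-of-experts in two sentences. [/INST]",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```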

  • @robboerman9378
    @robboerman9378 5 months ago +1

    So $3.89 per hour means $2,905 per month? That seems VERY expensive, or am I missing something?

    • @jamesbriggs
      @jamesbriggs  5 months ago +3

      yep, and this was cheaper than most services I looked at - if you're using it 24/7, or have particular requirements that stop you from using OpenAI/Anthropic etc., that may be a reason to go with RunPod
      Realistically though, I think most would want to deploy this either on their own cloud infra (cheaper, but costs more upfront engineering budget) or on another, cheaper service
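      (As a quick sanity check on the arithmetic, assuming the roughly $3.89/hour rate quoted in the video and 24/7 usage:)

```python
# Back-of-the-envelope monthly cost at ~$3.89/hour, running around the clock.
hourly = 3.89
print(f"30-day month: ${hourly * 24 * 30:,.0f}")  # ~$2,801
print(f"31-day month: ${hourly * 24 * 31:,.0f}")  # ~$2,894
```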

  • @Fordtruck4sale
    @Fordtruck4sale 5 months ago

    You can run this unquantized on 96 GB of VRAM, so two 48 GB cards would do the trick and save a bit of cash over those 80 GB cards.

  • @user-ig2og2yq3b
    @user-ig2og2yq3b 3 months ago

    Hello,
    please let me know how to create a fixed output form with the structure below, using a special instruction to the LLM:
    Give me a score out of 4 for each of the following (based on the TOEFL rubric), without any explanation - just display the score.
    General Description:
    Topic Development:
    Language Use:
    Delivery:
    Overall Score:
    Identify the number of grammatical and vocabulary errors, providing a sentence-by-sentence breakdown.
    'Sentence 1:
    Errors:
    Grammar:
    Vocabulary:
    Recommend effective academic vocabulary and grammar:'
    'Sentence 2:
    Errors:
    Grammar:
    Vocabulary:
    Recommend effective academic vocabulary and grammar:'
    .......
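    One hedged way to get a fixed report structure like this out of Mixtral is to embed the form verbatim in the [INST] block and ask it to return only the completed form - a sketch under that assumption (the prompt wording is mine, not tested against the TOEFL rubric):

```python
# Illustrative structured-output prompt: embed the fixed form in the instruction
# and ask the model to return it filled in, with no extra commentary.
FORM = """General Description:
Topic Development:
Language Use:
Delivery:
Overall Score:"""

def build_scoring_prompt(response_text: str) -> str:
    return (
        "[INST] You are a TOEFL speaking rater. Score the response below out of 4 "
        "for each category. Return ONLY the completed form, with no explanations.\n\n"
        f"Form to complete:\n{FORM}\n\n"
        f"Student response:\n{response_text} [/INST]"
    )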
