The Ultimate Guide to Fine Tune Mistral Easily

  • Published 26 Jun 2024
  • 🎉 Welcome to the Mistral AI Fine-Tuning Tutorial! 🎉
    In this video, we dive into the amazing world of fine-tuning Mistral AI models.
    Massed compute: bit.ly/mervin-praison
    Coupon: MervinPraison (50% Discount)
    You'll learn:
    What is Fine-Tuning? 📚
    Steps to Fine-Tune Mistral 7B and Small Models 🛠️
    Using the Mistral AI Server for Fine-Tuning 🌐 (see the sketch after the links below)
    Creating Custom Data Sets for Specific Tasks ✨
    Monitoring Training with Weights & Biases 📊
    Implementing User Interface with Gradio 🖥️
    🔔 Don't forget to subscribe and click the bell icon for more AI tutorials. If you enjoyed this video, make sure to like and share it with your friends!
    Timestamps
    0:00 - Introduction and Overview
    1:00 - Steps to Fine-Tune Mistral Models
    2:05 - Setting Up Your Environment
    4:00 - Preparing and Formatting Data
    6:00 - Uploading Data to Mistral AI Server
    8:00 - Creating a Fine-Tuning Job
    9:00 - Monitoring Job Status
    10:00 - Using the Fine-Tuned Model
    11:00 - Implementing User Interface with Gradio
    🔗 Relevant Links:
    Patreon: / mervinpraison
    Ko-fi: ko-fi.com/mervinpraison
    Discord: / discord
    Twitter / X : / mervinpraison
    Sponsor a Video or Do a Demo of Your Product: mer.vin/contact/
    Code: mer.vin/2024/06/mistral-finet...
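    For reference, here is a minimal sketch of the upload → fine-tune → chat flow the video walks through, using the mistralai Python client. The method names, model name, hyperparameters, and file names are assumptions based on the 0.x client, not the video's exact code, and may differ in newer client versions:

    import os
    from mistralai.client import MistralClient
    from mistralai.models.chat_completion import ChatMessage
    from mistralai.models.jobs import TrainingParameters  # import path assumed from the 0.x client

    client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

    # 1. Upload the training data (a JSONL file of chat examples) to the Mistral AI server.
    with open("train.jsonl", "rb") as f:
        training_file = client.files.create(file=("train.jsonl", f))

    # 2. Create the fine-tuning job on a base model.
    job = client.jobs.create(
        model="open-mistral-7b",
        training_files=[training_file.id],
        hyperparameters=TrainingParameters(training_steps=10, learning_rate=0.0001),
    )

    # 3. Monitor the job status (poll until it reports success).
    job = client.jobs.retrieve(job.id)
    print(job.status)

    # 4. Use the fine-tuned model once the job has finished.
    response = client.chat(
        model=job.fine_tuned_model,
        messages=[ChatMessage(role="user", content="Hello!")],
    )
    print(response.choices[0].message.content)

    # 5. Wrap the fine-tuned model in a simple Gradio chat UI.
    import gradio as gr

    def chat_fn(message, history):
        reply = client.chat(
            model=job.fine_tuned_model,
            messages=[ChatMessage(role="user", content=message)],
        )
        return reply.choices[0].message.content

    gr.ChatInterface(chat_fn).launch()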

COMMENTS • 12

  • @harshkamdar6509
    @harshkamdar6509 14 days ago +5

    I want to fine-tune a model on entire textbooks to give it specific knowledge, rather than on instruction-tuned datasets like these. How can I do it?
    I'm looking to fine-tune SLMs like Phi-3 128k, so if you can point me to some resources for that, it would be really helpful.

    • @xspydazx
      @xspydazx 13 days ago +1

      For me:
      I asked GPT to give me a Python script that takes my folder of text files and, for each document,
      creates chunks of 1024 / 2048 / 4096 / 8k / 16k / 32k tokens (I did this for each target size). Training on different context lengths strengthens the responses generated from those contexts and makes the model more robust at recalling the books later. I asked it to include the document title with each chunk, and obviously to give each chunk a series ID, so I can reconstruct the order of the book later, and so that when training I can just take random batches, shuffling the records to get a good spread. (A rough sketch of that kind of script is below.)
      Then I trained the model with various prompts,
      e.g. "save this document for later recall", "save this important information..." (instructions like this are very good for telling the model to store the data, and storing it also becomes a task). Later I can ask it to recall those same chunks in another task, i.e. the opposite of "save book".
      Here we can use the larger chunks, because with the smaller chunks we trained on, the model has to draw on all its contexts and assemble larger chunks for recall; hence small training chunks and large recall chunks, forcing the data through the model.
      Eventually I can recall the book!
      I also find it productive to just dump the books in as raw text as well, so the data is also seen unstructured.
      So we add unstructured data first, then structured, then a task.
      When training, the unstructured data should be trained to a loss somewhere in the 0-2 range, preferably around 1.25, since it is unstructured.
      For the storage task we also need to make sure we are pushing around 20 million parameters (in depth);
      for recall we can use around 5 million parameters to adjust the network to better retrieve this data; later, for full recall, we return to the 20 million parameters.
      There is another philosophy in play here:
      parameter depth!
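
      A rough Python sketch of the kind of chunking script described above; the folder name, output file, chunk sizes, and the crude words-to-tokens estimate are illustrative, not taken from the comment:

      import json
      from pathlib import Path

      CHUNK_SIZES = [1024, 2048, 4096, 8192, 16384, 32768]  # target context lengths (tokens)
      WORDS_PER_TOKEN = 0.75                                 # crude token-count approximation

      def chunk_document(path: Path, chunk_tokens: int):
          """Yield chunks of roughly chunk_tokens tokens, tagged with title and series id."""
          words = path.read_text(encoding="utf-8").split()
          words_per_chunk = int(chunk_tokens * WORDS_PER_TOKEN)
          for series_id, start in enumerate(range(0, len(words), words_per_chunk)):
              yield {
                  "title": path.stem,           # document title kept with every chunk
                  "series_id": series_id,       # lets the book be reconstructed in order later
                  "chunk_tokens": chunk_tokens,
                  "text": " ".join(words[start:start + words_per_chunk]),
              }

      def build_dataset(folder: str, out_file: str) -> None:
          """Write one JSONL record per chunk, for every document and every target size."""
          with open(out_file, "w", encoding="utf-8") as out:
              for path in sorted(Path(folder).glob("*.txt")):
                  for size in CHUNK_SIZES:
                      for record in chunk_document(path, size):
                          out.write(json.dumps(record, ensure_ascii=False) + "\n")

      if __name__ == "__main__":
          build_dataset("books", "book_chunks.jsonl")  # shuffle the records later when batching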

    • @xspydazx
      @xspydazx 13 days ago

      Sorry for the long post! (But that's the whole thing!)

    • @harshkamdar6509
      @harshkamdar6509 13 days ago +2

      @@xspydazx Don't apologise, it was a very detailed explanation, thank you for that. I did something similar:
      I created segments of 128k tokens per chunk (the context window of Phi-3), wrapped each one in the Phi-3 prompt template, then used QLoRA and SFTTrainer to train on the dataset. The dataset has 16 segments of 128k tokens (it was a 600-page book), but when I trained the model and ran inference with the updated weights it had no effect on the model, and I fail to understand why. I tried adjusting the LoRA bias setting from "none" to "all" and tried different hyperparameters, but no luck. (Roughly the kind of setup sketched below.)
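
      For context, a rough sketch of that kind of QLoRA + SFTTrainer setup; the model id, LoRA settings, sequence length, and file name are illustrative, and trl/peft argument names vary between versions:

      import torch
      from datasets import load_dataset
      from peft import LoraConfig
      from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                BitsAndBytesConfig, TrainingArguments)
      from trl import SFTTrainer

      model_id = "microsoft/Phi-3-mini-128k-instruct"

      # QLoRA: load the base model in 4-bit and train LoRA adapters on top.
      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      if tokenizer.pad_token is None:
          tokenizer.pad_token = tokenizer.eos_token
      model = AutoModelForCausalLM.from_pretrained(
          model_id, quantization_config=bnb_config, device_map="auto"
      )

      lora_config = LoraConfig(
          r=16, lora_alpha=32, lora_dropout=0.05,
          bias="none",                    # the commenter also tried bias="all"
          task_type="CAUSAL_LM",
          target_modules="all-linear",    # module names differ per architecture
      )

      # JSONL with a "text" field holding the prompt-template-wrapped book segments.
      dataset = load_dataset("json", data_files="book_segments.jsonl", split="train")

      trainer = SFTTrainer(
          model=model,
          tokenizer=tokenizer,
          train_dataset=dataset,
          dataset_text_field="text",
          max_seq_length=4096,            # far below 128k just to keep the sketch runnable
          peft_config=lora_config,
          args=TrainingArguments(
              output_dir="phi3-book-qlora",
              per_device_train_batch_size=1,
              gradient_accumulation_steps=8,
              num_train_epochs=3,
              learning_rate=2e-4,
              logging_steps=10,
          ),
      )
      trainer.train()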

    • @xspydazx
      @xspydazx 13 days ago

      @@harshkamdar6509 Yes, I have now been able to recall a whole dataset of books or papers using this technique...
      I used it to get the Bible in! So we can ask for a verse, then a chapter, then a book or a summary of the book (hmm), and sometimes the whole document, without alteration. It varies. The Bibles are clearly marked, so referencing the ASV instead of the King James Version is no problem, because I was very meticulous about the markdown Bibles as well as the normal Bibles.
      It seems long-winded, I know, but the Bible would not go into the model at first, so I had to examine the whole process of fine-tuning information into the model for verbatim recall. Hence we will also require discourse... but everything in order first. It also worked for the Transformers documentation, so I did the same for LangChain and Gradio.

  • @rodrimora
    @rodrimora 14 days ago

    Will this work for WizardLM-2 8x22B, since it's based on Mixtral 8x22B?

    • @w1ll2p0wr
      @w1ll2p0wr 14 days ago

      WLM 8x22B is so slept on, but I figure you'd have to fine-tune each of the 22B experts for your use case and then combine them in the MoE layer… but idk, I'm just some guy.

  • @aryanakhtar
    @aryanakhtar 14 days ago +1

    Is it necessary to have an account on Massed Compute to fine-tune the Mistral model?

    • @MervinPraison
      @MervinPraison 13 days ago

      No, you don't need a Massed Compute account.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 14 days ago

    Is the GPU free on the Mistral server?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 14 days ago

    The only thing is that it has abstracted away every aspect of training into a black box, so you have no idea of the inner workings.