Embeddings vs Fine Tuning - Part 2, Supervised Fine-tuning

  • Published 19 Nov 2024

COMMENTS • 26

  • @Ev3ntHorizon
    @Ev3ntHorizon 1 year ago +4

    Appreciate the illustration of the difference between base and chat models.

    • @TrelisResearch
      @TrelisResearch  1 year ago +2

      Thanks, yeah. I was thinking maybe it was overkill, but then felt it was important.

  • @simonstrandgaard5503
    @simonstrandgaard5503 1 year ago +3

    Excellent and well presented, and different from other fine-tuning tutorials. I appreciate that the topic, "Touch Rugby", is one the model has no knowledge about; it's interesting to see how it progresses. Great job.

  • @alienorreads3170
    @alienorreads3170 7 months ago

    Going to an interview right now. This has been really helpful for remembering and learning the logic. Thank you so much ❤

  • @abadidibadou5476
    @abadidibadou5476 1 year ago +3

    Keep up the good tutorials. You are doing a really good job.

  • @mark-pw1xf
    @mark-pw1xf 1 year ago +2

    Enjoyed your series of tutorials. Thanks!

  • @rbdom5607
    @rbdom5607 1 year ago

    I just wanted to let you know that these videos are really fantastic compared to many of the other ones I've seen. I really appreciate it!

  • @carydunn22
    @carydunn22 8 months ago

    Awesome job explaining everything in extra detail.

  • @ghrasko
    @ghrasko 11 months ago +1

    From other sources I hear that fine-tuning is used not for teaching knowledge, but for improving process following or output format (if prompting is not giving satisfactory results). They suggest using RAG for adding knowledge. What are your thoughts on this?

    • @TrelisResearch
      @TrelisResearch  11 months ago +1

      Yes, Retrieval Augmented Generation (RAG) approaches are one way to go (as in the first part of the video). The other is to do fine-tuning (parts 2 and 3). As you can see in this series, neither RAG nor fine-tuning is perfect... Depending on the use case, one or the other may be better, or sometimes both in combination. I would recommend always trying RAG first and only then considering fine-tuning.

  • @aasembakhshi
    @aasembakhshi 6 months ago

    Thanks for this video; I keep coming back to it. Here is a scenario.
    If we have a working RAG setup with an open-source model (say, Gemma-7B) and a fixed corpus (say, a PDF book), would it be a good choice to replace the open-source model with a version fine-tuned on my corpus?
    I guess it should work better than the simple RAG. But that raises a second question. For fine-tuning such a model, should I have a dataset with three columns (Chunk-of-Text, Question, Answer), or two columns (Prompt, Response), with the prompt including the chunk of text as context as well?
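
    (An illustrative sketch, not an answer from the video: the two dataset shapes described in the question above, with hypothetical column names and example values. The two-column variant folds the chunk into the prompt as context, mirroring what the model would see at inference time with RAG.)

    ```
    # Hypothetical rows illustrating the two formats from the question above.
    three_column_row = {
        "chunk": "Text excerpt from the PDF corpus...",
        "question": "What does the rule book say about substitutions?",
        "answer": "The rule book says ...",
    }

    # Two-column variant: fold the chunk into the prompt as context.
    two_column_row = {
        "prompt": (
            "Context:\n" + three_column_row["chunk"]
            + "\n\nQuestion: " + three_column_row["question"]
        ),
        "response": three_column_row["answer"],
    }
    ```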

  • @AIToluna
    @AIToluna 1 year ago +1

    Well done!
    Purchased the notebook, hopefully it will help support the channel.
    Quick question: in "prepare_dataset", response_lengths and input_lengths are lists of ints, which then gives an error in TextDataset __getitem__.
    This is because the "idx" parameter in __getitem__ is a list (of size batch_size), not an int, so doing
    self.input_lengths[idx]
    gives """TypeError: list indices must be integers or slices, not list""".
    Even with batch_size=1, the "idx" parameter is a list with one element, and I still get the same error. Am I doing something wrong?
    Or should input_lengths be turned into tensors to support a list idx? Thank you in advance.

    • @TrelisResearch
      @TrelisResearch  1 year ago

      Yes, another user has this issue too. Oddly, when I run the notebook, idx is an index, not a list.
      That said, can you try this fix:
      ```
      def __getitem__(self, idx):
          # print(f"__getitem__ called with index {idx}")
          if isinstance(idx, int):
              item = {key: torch.tensor(val[idx]).clone().detach() for key, val in self.encodings.items()}
              response_start_position = self.input_lengths[idx]
              response_end_position = self.input_lengths[idx] + self.response_lengths[idx]
          elif isinstance(idx, list):
              item = {key: torch.stack([val[i].clone().detach() for i in idx]) for key, val in self.encodings.items()}
              response_start_position = [self.input_lengths[i] for i in idx]
              response_end_position = [self.input_lengths[i] + self.response_lengths[i] for i in idx]
          ## Prior code for idx being an integer, not a list.
          # item = {key: val[idx].clone().detach() for key, val in self.encodings.items()}
          # # Calculate the start and end positions of the response
          # response_start_position = self.input_lengths[idx]
          # response_end_position = self.input_lengths[idx] + self.response_lengths[idx]
          # Set labels to be the same as input_ids
          item["labels"] = item["input_ids"].clone()
      ```

  • @okj1999
    @okj1999 11 months ago +1

    Would it be better to do Orca-style prompts for the Q/A dataset?

    • @TrelisResearch
      @TrelisResearch  11 months ago

      Yes, using Orca tricks, like including system prompts to think step by step when generating the prompt-response pairs, makes a lot of sense for the data preparation.
      arxiv.org/pdf/2306.02707.pdf
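
      (A hedged sketch of that idea: generate each response with an Orca-style system prompt that asks for step-by-step reasoning, then store the prompt-response pair for fine-tuning. The model name, client setup, and helper below are illustrative, not taken from the video.)

      ```
      # Illustrative Orca-style data generation, assuming the openai>=1.0 client
      # and an OPENAI_API_KEY in the environment.
      from openai import OpenAI

      client = OpenAI()

      ORCA_SYSTEM_PROMPT = (
          "You are a careful assistant. Think step by step and explain your "
          "reasoning before giving the final answer."
      )

      def generate_pair(chunk, question, model="gpt-4"):
          response = client.chat.completions.create(
              model=model,
              messages=[
                  {"role": "system", "content": ORCA_SYSTEM_PROMPT},
                  {"role": "user", "content": f"{chunk}\n\nQuestion: {question}"},
              ],
          )
          # The step-by-step answer becomes the training response.
          return {"prompt": question, "response": response.choices[0].message.content}
      ```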

  • @Mohamed-sq8od
    @Mohamed-sq8od 6 months ago

    Great content, really helpful. Quick question: if the model doesn't have access to the whole rule book documents, how could it reason to answer questions other than the ones given in the training data?

    • @TrelisResearch
      @TrelisResearch  6 months ago +1

      Yeah, if the model hasn't been trained on the full rule book, then (unless the base model has the knowledge) there's no way to answer questions correctly other than those in the training data.

  • @aneerpa8384
    @aneerpa8384 1 year ago +1

    Really informative, your channel should get wider traction.

  • @TheRealRoySadaka
    @TheRealRoySadaka 1 year ago +1

    Great content, much appreciated 👏🏼
    There are many QLoRA tutorials out there, including some “official” ones, but I haven't seen mask handling in them the way you describe; they use SFTTrainer. What you present with the custom trainer makes perfect sense. Does this mean the rest of the tutorials miss the mask handling, or is it baked into SFTTrainer already?

    • @TrelisResearch
      @TrelisResearch  1 year ago +2

      The SFTTrainers are good and they have various options for masking. The reason I make custom trainers is that I find I always need to manually inspect the tokens. It's hard to inspect both the tokens and the losses in the SFTTrainers, which makes it hard to troubleshoot when things go wrong (which, for me, they always do for a bit before I get things working).
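
      (A minimal sketch of the masking idea being discussed, not the video's exact trainer: labels start as a copy of input_ids, and every position outside the response span is set to -100, which PyTorch's cross-entropy loss ignores. Variable names are illustrative.)

      ```
      import torch

      def build_labels(input_ids, input_length, response_length):
          # Copy the inputs, then mask everything outside the response span.
          labels = input_ids.clone()
          response_start = input_length
          response_end = input_length + response_length
          labels[:response_start] = -100   # prompt tokens: no loss
          labels[response_end:] = -100     # padding after the response: no loss
          return labels
      ```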

    • @TheRealRoySadaka
      @TheRealRoySadaka 1 year ago +1

      @@TrelisResearch
      I really appreciate the quick response.

    • @TrelisResearch
      @TrelisResearch  1 year ago

      @@TheRealRoySadaka BTW, the other thing I found hard to do with SFTT is training the model to emit a stop token. One nuance there is that these special tokens often accidentally get tokenized differently depending on what comes before or after. This is particularly an issue with [INST], which is not in the tokenizer vocab and tends to get tokenized differently depending on what is nearby.
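
      (A small sketch of how to check this, assuming a Llama-2-style tokenizer from Hugging Face; the model name is only an example. Because "[INST]" is not a single vocab token, the pieces it splits into can change with leading spaces or adjacent characters.)

      ```
      from transformers import AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

      for text in ["[INST]", " [INST]", "Hello [INST]", "Hello[INST]"]:
          ids = tokenizer.encode(text, add_special_tokens=False)
          print(repr(text), "->", tokenizer.convert_ids_to_tokens(ids))
      # If the pieces differ between contexts, masking or stop-token logic that
      # assumes a fixed token sequence for [INST] will silently break.
      ```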

  • @yusufkemaldemir9393
    @yusufkemaldemir9393 1 year ago +1

    What do you think of using embeddings + supervised fine-tuning? Thanks.

    • @TrelisResearch
      @TrelisResearch  1 year ago +1

      Going into this, I thought embeddings would be better and more robust. However, even with embeddings there can be inaccuracies, even with GPT-4. Another learning is that everything depends on model strength; really, you need a big model to do well with either approach. Broadly, I would think: