Fine Tune GPT In FIVE MINUTES with RLHF! - "Perform 10x Better For My Use Case" - FREE COLAB 📓

  • Published 19 Nov 2024

COMMENTS • 13

  • @nadimkaysar257
    @nadimkaysar257 1 year ago +3

    Your tutorial is very helpful for us. Please make a video on chatbots with reinforcement learning (RLHF).

    • @WhisperingAI
      @WhisperingAI 1 year ago +1

      Thanks for the comment. Sure, I will, as soon as possible.

  • @animatorai
    @animatorai 1 year ago +2

    Great video

  • @Techbytes-on7zl
    @Techbytes-on7zl 6 months ago +1

    I think this is RLAIF rather than RLHF, because the feedback is generated by a BERT model, which acts as the reward model, instead of by a human.

    • @WhisperingAI
      @WhisperingAI 6 months ago

      You are partly right and partly wrong.
      In most cases the reward model has to be trained on human-labeled data before it can give feedback, so in that case it is RLHF.
      Happy that you pointed out something interesting. ❤️
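
      To make the point above concrete, here is a minimal sketch of a reward model that is first trained on human-labeled examples and only then used to score new text. Everything in it is an assumption for illustration: the tiny dataset is made up, and a bag-of-words logistic regression stands in for the BERT model mentioned in the thread. It is not the notebook's code.

```python
import math

# Hypothetical human-labeled data: (text, label), 1 = a human marked it as good.
HUMAN_LABELED = [
    ("thanks this was clear and helpful", 1),
    ("great helpful answer", 1),
    ("this is useless and wrong", 0),
    ("wrong and confusing answer", 0),
]


def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary."""
    words = text.split()
    return [float(words.count(w)) for w in vocab]


def train_reward_model(data, epochs=200, lr=0.5):
    """Fit a logistic-regression reward model with plain SGD (toy stand-in for BERT)."""
    vocab = sorted({w for text, _ in data for w in text.split()})
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            x = featurize(text, vocab)
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "good"
            err = p - label                 # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return vocab, weights, bias


def reward(text, model):
    """Scalar reward (the logit) that an RL step could use as feedback."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return bias + sum(w * xi for w, xi in zip(weights, x))


model = train_reward_model(HUMAN_LABELED)
good = reward("clear and helpful", model)
bad = reward("useless and confusing", model)
print(f"reward(good)={good:.2f}  reward(bad)={bad:.2f}")
```

      The feedback during RL comes from the model, but the model only knows what "good" means because humans labeled its training data. That is the sense in which the reply calls the setup RLHF.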

  • @shivamshrivastava4242
    @shivamshrivastava4242 1 year ago +1

    Please make a video covering all 3 steps from scratch, using an LLM with fewer parameters. Pleaseeee

    • @WhisperingAI
      @WhisperingAI 1 year ago

      Sure, but I have already created a video where we fine-tuned tinystarcoder, which is a 164M-parameter model.
      You can check it here:
      1. ua-cam.com/video/G3RZoxPIpXw/v-deo.html
      2. ua-cam.com/video/R2paulc3P2M/v-deo.html

    • @shivamshrivastava4242
      @shivamshrivastava4242 1 year ago

      @WhisperingAI I am getting errors while implementing that notebook.

    • @WhisperingAI
      @WhisperingAI 1 year ago +1

      Sorry about that. Let me revisit the notebook and make the necessary changes. I will update the notebook in a couple of hours.

    • @shivamshrivastava4242
      @shivamshrivastava4242 1 year ago

      @WhisperingAI Yes, please 🥺

    • @shivamshrivastava4242
      @shivamshrivastava4242 1 year ago +1

      @WhisperingAI Please also clarify the model and tokenizer paths for the SFT, reward, and policy models.