NEW IDEA: RL-based Fine-Tuning (Princeton, UC Berkeley)

  • Published 18 Nov 2024

COMMENTS • 12

  • @criticalnodecapital 3 months ago +4

    Yeah, Dr Dre, you are an asset to the community with your German sarcasm... love it.

  • @hoangnam6275 3 months ago

    Great job. My No. 1 AI channel. Best wishes for you and your family.

    • @code4AI 3 months ago

      Thank you so much 😀

  • @anu_umate 3 months ago

    Sir, how do I perform fine-tuning for a chatbot-building task?

  • @user-wr4yl7tx3w 3 months ago

    But does fine-tuning really matter when you have prompting?

    • @code4AI 3 months ago +3

      Let me reformulate your question: why should we teach a machine complex stuff like bioinformatics (via RL-based fine-tuning) when we can alternatively just provide it with some examples (few-shot prompting) in the prompt?
      Or: why should we teach a human how to practice medicine, why should this human go to university and study for 8 years (via RL-based fine-tuning), when we could just take someone from the crowd and give him/her some live instructions (prompting) on how to operate on an open heart?
      It all depends on your required level of performance.
      An oversimplification: if your AI/machine already achieves 90% performance on a very special task (after RL-based fine-tuning on this special task), then with prompt engineering (TextGRAD and 100 examples in your prompt) you could achieve 92%.
      If, on the other hand, your AI/machine achieves 36% performance (without RL-based fine-tuning) on a very special task it has not been pre-trained on, then with 200-shot prompting we could see performance indicators of 43%. But remember: AI results have to be reproducible, reliable, and (hopefully) explainable.
      We have to understand the methods and what they do within our machines/AI, and not just hang on to some patterns (like "today prompting is cooler than fine-tuning"). We have to understand the intrinsic reasons why, in some situations and some configurations of the AI, you should choose one or the other. A toy sketch of the contrast follows below.
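
      A toy sketch of the contrast drawn in the reply above, in PyTorch. The model, data, and hyperparameters are stand-ins invented for illustration, not anything from the video; the point is only structural: prompting packs task examples into the input of a frozen model, while fine-tuning turns the same examples into gradient updates on the weights.

          import torch
          import torch.nn as nn

          # Few-shot prompting: examples go into the INPUT; the weights stay frozen.
          examples = [("2+2", "4"), ("3+5", "8")]  # made-up task examples
          prompt = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples) + "\nQ: 4+1\nA:"
          # A frozen LLM would now complete `prompt`; no parameter is updated.

          # Fine-tuning: the same examples become gradient UPDATES to the weights.
          model = nn.Linear(8, 8)                # stand-in for an LLM
          opt = torch.optim.SGD(model.parameters(), lr=1e-2)
          x = torch.randn(4, 8)                  # stand-in for tokenized questions
          y = torch.randn(4, 8)                  # stand-in for tokenized answers
          opt.zero_grad()
          loss = nn.functional.mse_loss(model(x), y)
          loss.backward()
          opt.step()                             # the weights have now changed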

  • @starbrandX 3 months ago +1

    Call Dr Dre. We got another one!

  • @smicha15 3 months ago

    Maybe you should plan a video where you build a GraphRAG over all the recent papers you discuss, and then show us what kind of intelligence you can glean from that.

    • @code4AI 3 months ago +1

      Your brain doesn't need a graph for understanding. Only an AI does. 😊

  • @Sir_Ray_LegStrong_Bongabong 3 months ago +1

    first

  • @sharannagarajan4089 3 months ago

    What do you mean when you say to forget the 3 steps of LLM training? This is helpful only for the fine-tuning phase, right? Why do you post great content and also use clickbaity titles? I would love your content more if you removed the clickbait. It would also increase your reputation and reliability.

    • @code4AI 3 months ago +3

      I understand you. Like you, I try to hold on to the patterns that we learned: that there is an easy three-step approach to training LLMs: first the pre-training phase, then the fine-tuning phase, and then the alignment phase via reinforcement learning from human feedback.
      But this is not true anymore. The complexity in each of those subtopics has increased significantly, and the clear border between fine-tuning and alignment is vanishing. I have to remind myself that whatever I learned as a pattern is just a pattern because it worked in the past.
      With new developments and an advanced understanding of new methods, we can now design training cycles much more efficiently. And this is the most important point: we have to understand what is happening. It is no longer the case that we as humans just follow the pattern of this three-step approach in the training phase; we have to absolutely understand why certain steps are necessary, for example why we use a symmetric Jeffreys divergence and no longer a one-directional, asymmetric divergence (a minimal numeric sketch follows this comment). The moment we understand how the inner workings are designed, we can code it, and we no longer need a simple three-step guide.
      The more scientists build better and more complex models, the more we have to be able to understand how those models are designed internally, and then you will see that those simple borderlines between pre-training, fine-tuning, and alignment will vanish more and more.
      And this was the whole point of my video: to make myself aware, and to share with my audience, that there are no simple patterns anymore. We really have to understand the reasons WHY models are designed with increased mathematical complexity, because the simple stuff doesn't work anymore.
      And I like your comment about clickbait. Normally, if I wanted clickbait, I would generate something that somebody has to click on. But since you understand that my main message (in the middle of my video, and not in a clickable video title) is so important, you understand that I did not design a clickbait title. I am trying to communicate that the systems are changing, that the complexity is increasing, and that we have to understand WHY AI models are now designed in completely new ways. And we have to let go of old training patterns. Smile.
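
      A minimal numeric sketch of the divergences mentioned in the reply above, in Python with NumPy. The toy distributions p and q are made-up illustrative values; the point is only that KL divergence is asymmetric in its arguments, while the Jeffreys divergence (the sum of both KL directions) is symmetric.

          import numpy as np

          def kl(p, q):
              # Kullback-Leibler divergence KL(p || q); asymmetric: kl(p, q) != kl(q, p)
              return float(np.sum(p * np.log(p / q)))

          def jeffreys(p, q):
              # Jeffreys divergence: the symmetrized sum of both KL directions
              return kl(p, q) + kl(q, p)

          # Toy probability distributions (illustrative values only)
          p = np.array([0.7, 0.2, 0.1])
          q = np.array([0.5, 0.3, 0.2])

          print(kl(p, q), kl(q, p))              # two different numbers: KL is asymmetric
          print(jeffreys(p, q), jeffreys(q, p))  # identical: Jeffreys is symmetric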