Does Size Matter? Phi-3-Mini Punching Above its Size on "BENCHMARKS"

Поділитися
Вставка
  • Опубліковано 31 тра 2024
  • Microsoft just released their Phi-3 family of models that are SOTA for their weight class. The best part, the weights are publicly available and can be used locally.
    🦾 Discord: / discord
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Patreon: / promptengineering
    💼Consulting: calendly.com/engineerprompt/c...
    📧 Business Contact: engineerprompt@gmail.com
    Become Member: tinyurl.com/y5h28s6h
    💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
    Signup for Advanced RAG:
    tally.so/r/3y9bb0
    LINKS:
    Blogpost: tinyurl.com/h4pxa25c
    Where to test: huggingface.co/
    Model Weights:
    huggingface.co/microsoft/Phi-...
    huggingface.co/microsoft/Phi-...
    Technical Report: arxiv.org/html/2404.14219v1
    Results: tinyurl.com/bdf3j3w6
    TIMESTAMPS:
    [00:00] Introducing Phi-3
    [01:23] Performance on Benchmarks
    [02:32] Testing Pi 3's Ethical Boundaries and Logical Reasoning
    [06:55] Exploring Pi 3's Coding and Creative Writing
    [09:25] Analyzing Agent Interactions
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
  • Наука та технологія

КОМЕНТАРІ • 18

  • @alexxx4434
    @alexxx4434 Місяць тому +5

    I think that "Sorry, I can't assist you with that. However..." is the pattern the model learned in-context for answering. Small models are more prone to such in-context pattern repetition.
    Same with other questions, it may pick up patterns from previous QA pairs in context. That's why each test question should be taken within separate empty context.

  • @olivert.7177
    @olivert.7177 Місяць тому +7

    6:53 Isn't the answer with the flowers wrong. If the flowers decreased by half every day, it takes one day for the field to be half filled.

    • @CRGreathouse
      @CRGreathouse Місяць тому +1

      The question is nonsensical; if the number of flowers is halved every day and on the 9th day it is empty, then it's empty every day and will never be half-filled.

    • @engineerprompt
      @engineerprompt  Місяць тому +2

      You guys are right. The question is wrong. Didn't think about it when I changed it from the original question.

    • @testales
      @testales Місяць тому

      @@engineerprompt The question is quite smart actually and even ChatGPT 4 gets it wrong. My local Senku 70b q5 solved it correctly instantly, to my surprise. The emotional intelligence leaderboard seems to be quite accurate.

  • @joshbane1
    @joshbane1 Місяць тому +5

    You should have an updated opensource comparison between wizard-LM2 7b llama-3 8b and phi-3.

    • @engineerprompt
      @engineerprompt  Місяць тому +1

      good suggestion. Will see what I can do.

  • @unclecode
    @unclecode Місяць тому

    When I received the UA-cam notification for your video on my phone, I saw "Does Size Matter?" and I burst out laughing! YES, SIZE DOES MATTER, as we all know ;) Very witty and creative title. However, seems in the land of LLMs we hope smaller, with better data, more training beats the rest.

  • @fenix20075
    @fenix20075 Місяць тому

    if you give the information what phi 3 mini needs, and then give it a question related to the information you have given, and it cannot answer the question according to your previous information, basically this model is just a chat model, only can use in chat, surely cannot use in agent system.

  • @soonheng1577
    @soonheng1577 29 днів тому

    thought I want to share some of my test:
    I ask it to code a snake game, the code seems ok with all the logic.
    but when I ask it to code a snake game with javascript, initially it did ok, half way through, it start to give me none-sense that with a lot of gibberish like "import pygame
    import
    import py
    ..."
    seems like they only trained it to code with python.

    • @engineerprompt
      @engineerprompt  29 днів тому

      Could be and also you need to consider that it might be just retrieving the training data. the only way to really test these models is when you ask or change the prompts from what it might have seen in the training data.

  • @rcohen79
    @rcohen79 Місяць тому

    Exploring the realms of storytelling and video creativity. VideoGPT quietly made its presence known, enhancing my content with its seamless professionalism.

  • @raghuvallikkat3384
    @raghuvallikkat3384 Місяць тому

    can we use it on localGPT?

  • @RobertoFabrizi
    @RobertoFabrizi Місяць тому

    Isn't it pronounced Fai rather than Pai?

  • @kabaduck
    @kabaduck Місяць тому

    You artificial intelligence models with these features to limit their capabilities are disturbing... So I guess we're not going to ever have any comedy artificial intelligence models