Does Size Matter? Phi-3-Mini Punching Above its Size on "BENCHMARKS"
- Published May 31, 2024
- Microsoft just released their Phi-3 family of models, which are SOTA for their weight class. The best part: the weights are publicly available and can be run locally.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
|🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Signup for Advanced RAG:
tally.so/r/3y9bb0
LINKS:
Blogpost: tinyurl.com/h4pxa25c
Where to test: huggingface.co/
Model Weights:
huggingface.co/microsoft/Phi-...
huggingface.co/microsoft/Phi-...
Technical Report: arxiv.org/html/2404.14219v1
Results: tinyurl.com/bdf3j3w6
TIMESTAMPS:
[00:00] Introducing Phi-3
[01:23] Performance on Benchmarks
[02:32] Testing Phi-3's Ethical Boundaries and Logical Reasoning
[06:55] Exploring Phi-3's Coding and Creative Writing
[09:25] Analyzing Agent Interactions
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu... - Science & Technology
I think that "Sorry, I can't assist you with that. However..." is a pattern the model learned in-context for answering. Small models are more prone to this kind of in-context pattern repetition.
The same goes for the other questions: it may pick up patterns from previous QA pairs in the context. That's why each test question should be asked in its own separate, empty context.
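The testing setup the commenter describes can be sketched in a few lines. This is a minimal illustration, not anyone's actual harness: the `chat` callable is a placeholder for whatever local inference API you use (e.g. an OpenAI-compatible endpoint), and all names here are hypothetical.

```python
# Sketch: evaluate each test question in its own empty context,
# versus one long conversation where the model can pick up answer
# patterns from earlier QA pairs. `chat(messages) -> str` is a
# placeholder for any chat-completion backend.

def evaluate_isolated(chat, questions):
    """Send each question as a fresh, single-turn conversation."""
    results = []
    for q in questions:
        messages = [{"role": "user", "content": q}]  # empty context each time
        results.append(chat(messages))
    return results

def evaluate_accumulated(chat, questions):
    """Contrast: one growing conversation, prone to pattern repetition."""
    messages, results = [], []
    for q in questions:
        messages.append({"role": "user", "content": q})
        answer = chat(messages)
        messages.append({"role": "assistant", "content": answer})
        results.append(answer)
    return results
```

With a stub backend you can see the difference: the isolated evaluator always sends exactly one message, while the accumulated one sends an ever-growing history, which is where the copied "Sorry, I can't assist..." pattern can come from.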
That is a possibility.
6:53 Isn't the answer to the flower question wrong? If the flowers decrease by half every day, it takes only one day for the field to go from full to half full.
The question is nonsensical: if the number of flowers is halved every day and on the 9th day the field is empty, then it was empty every day and will never be half full.
You guys are right. The question is wrong. Didn't think about it when I changed it from the original question.
@engineerprompt The question is actually quite clever, and even ChatGPT-4 gets it wrong. To my surprise, my local Senku 70B q5 solved it correctly instantly. The emotional intelligence leaderboard seems to be quite accurate.
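The commenters' point about why the inverted puzzle breaks can be checked with a few lines of arithmetic. This sketch assumes the classic form of the puzzle (coverage doubles daily and the field is full on day 10); the function name is made up for illustration.

```python
# Classic puzzle: flower coverage DOUBLES every day and the field is
# exactly full on day 10, so it was half full on day 9 (not day 5).
# The inverted wording ("halves every day, empty on day 9") is
# degenerate: halving a positive amount never reaches exactly zero.

def doubling_coverage(full_day):
    """Fraction of the field covered on each day, assuming coverage
    doubles daily and the field is exactly full on `full_day`."""
    return {day: 2.0 ** (day - full_day) for day in range(1, full_day + 1)}

coverage = doubling_coverage(10)
assert coverage[10] == 1.0   # full on day 10
assert coverage[9] == 0.5    # half full exactly one day earlier

# Inverted version: halving from any positive start never hits zero,
# so "empty on day 9" implies it was empty all along.
amount = 1.0
for _ in range(9):
    amount /= 2
assert amount > 0
```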
You should do an updated open-source comparison between WizardLM-2 7B, Llama-3 8B, and Phi-3.
Good suggestion. I'll see what I can do.
When I received the YouTube notification for your video on my phone, I saw "Does Size Matter?" and burst out laughing! YES, SIZE DOES MATTER, as we all know ;) Very witty and creative title. However, it seems that in the land of LLMs we hope smaller, with better data and more training, beats the rest.
If you give Phi-3 Mini the information it needs and then ask it a question about that information, and it cannot answer the question from what you provided, then it's basically just a chat model: usable only for chat, certainly not in an agent system.
Thought I'd share some of my tests:
I asked it to code a snake game, and the code seemed OK, with all the logic in place.
But when I asked it to code a snake game in JavaScript, it initially did OK, then halfway through it started giving me nonsense with a lot of gibberish like "import pygame
import
import py
..."
Seems like they only trained it to code in Python.
Could be, and you also need to consider that it might just be regurgitating its training data. The only way to really test these models is to change the prompts from what they might have seen during training.
Can we use it with localGPT?
Yes, that's possible
Isn't it pronounced "Fai" rather than "Pai"?
These artificial intelligence models with features that limit their capabilities are disturbing... So I guess we're never going to have any comedy AI models.