What is Q* | Reinforcement learning 101 & Hypothesis

AI Jason

Published Nov 25, 2023
🔗 Links
- Jim Fan’s tweet: / 1728100123862004105
- Reinforcement learning deep dive: • Reinforcement Learning...
- GitHub: Q-learning AI to play snake game - www.crafters.ai/aitools/teach...
- Let's Verify Step by Step: arxiv.org/abs/2305.20050
- Tree of Thoughts: arxiv.org/abs/2305.10601
- Graph of Thoughts: arxiv.org/abs/2308.09687
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#chatgpt #gpt4 #gpt5 #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #agent #reinforcementlearning
Science & Technology

COMMENTS • 44

@AIJasonZ 6 months ago ⁺¹⁴
Anything else I missed about Q*? Leave a comment & let me know!
@manofsan 6 months ago ⁺¹
Can this approach work on Small Language Models (Alpaca, Orca, etc)? Can an existing LM that has already been trained be further trained by transfer learning using this Reinforcement Learning technique? Can I train an LM to play the Snake game using RL?
How can ordinary people make something like Q*? I realize it will be hard to attain Q* level of performance, due to OpenAI's huge resources. But even just demonstrating this Multi-Step Reasoning on some math problems would be great as a proof of concept. I'm doing grad studies and would like to attempt a project on this.
@TheDessertFaux 6 months ago ⁺²³
That AlphaGo documentary remains so good, even a few years later. They found the human empathy and passion in a cold technical challenge, all without any narration. It gets me excited about hard tech.
@AIJasonZ 6 months ago ⁺²
Yea so good!
@Laurie-eg8ct 6 months ago
I'll bet it does.
@LukePuplett 6 months ago ⁺⁹
Really well put together, Jason, with use of interviews and clips.
@jasonfinance 6 months ago ⁺⁴
Thanks for organising the insights! This hypothesis is very exciting
@HarpaAI 6 months ago
Great overview! Jason, your videos on the AI topic are the best!
00:00 🤖 *"Q Star" is generating a lot of discussion in the AI community, and it's associated with OpenAI's recent actions, but its exact nature remains speculative.*
01:08 🎮 *Reinforcement learning is a machine learning framework where an agent learns from trial and error, aiming to maximize future rewards. It involves policy networks and value networks.*
03:25 🧠 *Reinforcement learning allows AI agents to self-play and discover new strategies, as demonstrated by DeepMind's achievements in games like Breakout and AlphaGo.*
08:01 📚 *There's speculation that "Q Star" could involve using policy networks and value networks, similar to AlphaGo, to improve reasoning and logic in large language models like GPT.*
11:14 🐍 *You can experiment with reinforcement learning in simple games with open-source projects, even if you're new to the field.*
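For anyone who wants to try the trial-and-error loop from the summary above before moving on to a game like Snake, here is a minimal sketch of tabular Q-learning. The environment is a made-up 1-D corridor (an assumption for illustration, not the project linked in the description), and the names `train_q_table` and `greedy_policy` as well as the hyperparameter values are hypothetical.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. The agent starts at state 0 and receives a
    reward of 1 only when it steps into the last state (the goal).
    Actions: 0 = move left, 1 = move right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: explore at random sometimes (or on ties),
            # otherwise take the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Moving left from state 0 leaves the agent where it is.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: nudge the estimate toward
            # reward + discounted best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-goal state according to the table."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    table = train_q_table()
    print(greedy_policy(table))
```

The agent is never told that "right" is the correct direction. It only sees the reward at the goal, and the update rule propagates that value backwards through the table over repeated episodes.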
@picksalot1 6 months ago ⁺¹¹
A clear definition of AGI has been difficult to find. Temporarily constraining it to a specific field for evaluation might be helpful. For instance, AGI was achieved in Chess and Go when the best humans could not beat the game programs. At a certain point, the number of fields in which AGI has been achieved will far outweigh the fields that it hasn't. When that happens, the "General" in AGI will have been attained.
@flflflflflfl 6 months ago
AGI was achieved in Chess and Go???
@pandoraeeris7860 6 months ago ⁺⁵
Q-Star is AGI.
@nickstaresinic4031 6 months ago
Very well organized and informative presentation.
@half_way_expert 6 months ago ⁺¹
Thanks for sharing man. Keep it up
@homelessrobot 6 months ago ⁺³
A lot of people are saying that Q* is some product of A* and Q-learning, but I think that mathematically inclined scientists are using a more formal application of _* than this. I would guess it is a generalization of Q-learning, in the same way that A* is a generalization of 'A': Dijkstra's algorithm. Maybe it involves graph search, but that is probably coincidental to the name. Pretty much everything these days involves graph search.
@AIJasonZ 6 months ago
Yeah, very likely they have a breakthrough in search, as that is what unlocked truly new/innovative strategies for AlphaGo
@ran_domness 6 months ago ⁺¹
Excellent. Thanks!
@Jim-ey3ry 6 months ago ⁺²
This can be really huge!
@agenticmark 6 months ago ⁺¹
I was waiting for your video Jason! Thanks! Have you done any Monte Carlo or genetic algorithms? My guess is Q* is a similar process but done at inference or as a precached inference
@AIJasonZ 6 months ago
Thanks! I haven’t done it personally, but gonna give it a try! My guess is something related to the search is the main breakthrough
@BooleanDisorder 6 months ago ⁺¹
Q* is the optimal route in Q-learning.
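In standard reinforcement learning notation, Q* usually denotes the optimal action-value function: the expected return from taking action a in state s and acting optimally afterwards. It satisfies the Bellman optimality equation below, where s' is the next state, r the reward and γ the discount factor. These are the usual textbook symbols, not anything confirmed about OpenAI's project.

```latex
Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \;\middle|\; s, a \,\right]
```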
@jayhu6075 6 months ago
I think Q* must be OPEN SOURCE for the benefit of humanity. Not only for big companies.
@utkua 6 months ago ⁺⁶
If it is just an optimization of training, I don't see how it unlocks abstract thinking. If it is actually another multi-model approach, bandwidth will be a limiting factor. But I think your guesses are not far off, OpenAI focuses on training more than anything else from the start. That is how you make your product look like a breakthrough without an actual breakthrough.
@NobleCaveman 6 months ago
Isn't abstract thinking kind of just like increasing the chaos factor and seeking out connections between more 'random' topics or ideas?
@utkua 6 months ago ⁺¹
@NobleCaveman abstract thinking is defining a concept and simulating it using current knowledge of the world. State of the art AI is still a one-directional flow over frozen links. It is like a record of intelligence. Useful, but we are still multiple breakthroughs away from a true AGI. Funny thing is nobody is chasing those problems because they are high risk in terms of ROI; everyone wants to make a small improvement that will look cool enough to secure investment.
@dancingdudezz 5 months ago
Hey, can you please make a video on detecting some significant insight using reinforcement learning?
I am curious about making a model learn by itself the irregular patterns that need to be classified, using reinforcement learning
@agenticmark 6 months ago ⁺¹
Dr Jim is the shit. I will read anything his name is on.
@Laurie-eg8ct 6 months ago
How does the reward system work for reinforcing behavior beyond Pavlovian bell sounds that signal approval?
@nucleusaccumbens3228 6 months ago
tyvm
@pauldannelachica2388 6 months ago
❤❤❤
@PDragonLabs 6 months ago
👍
@igorkudryk2199 6 months ago
What are you recording with?
@BR-hi6yt 6 months ago
Still don't understand Q* although I am a little clearer about AlphaGo
@MuzhiLi 6 months ago
Can someone explain why Google didn't figure this out despite producing so much groundbreaking research in the last decade?
@middle-agedmacdonald2965 6 months ago
Sam has a heck of an enthusiastic team, that seems really tightly united. It just takes one "Einstein"-like thought to make it through a wall nobody else could think of.
@csabaczcsomps7655 6 months ago ⁺¹
Q is question and * is repeat, so if you make a synthesis of a lot of answers you get a generally intelligent answer. My noob opinion.
@andrewcampbell3100 6 months ago
Q is blowing up and it's not even Monday on cable....
@dwikristianto 6 months ago
In the science, technology and engineering world,
no single thing (a physical entity, or just an idea) has two or more names.
Each one is represented by a single name.
But not in the marketing world, where plenty of things have identical names or several names.
Q* and RLHF both belong to the science world,
so they must point to and represent different ideas.
IMO
@lucamatteobarbieri2493 6 months ago ⁺¹
I hate to break this to you but we are made mostly of dihydrogen monoxide.
@victordelmastro8264 6 months ago ⁺²
I don't believe for a second that The Board would panic over an improved LLM or Transformer. :P Q* was an AGI hooked up to a Quantum Computer IMO. That would freak out the Board. Quantum Singularity Core based AGIs concern me. Don't turn off that device!! Infinite knowledge is infinite negative entropy. :O
@lucamatteobarbieri2493 6 months ago
Open*AI
