Do you think that ChatGPT can reason?

  • Published 26 Dec 2024

COMMENTS • 375

  • @luke.perkin.online
    @luke.perkin.online 4 місяці тому +21

    My notes from this episode:
    Formal languages have interpreters and can accept any grammatically correct code.
    The world is the interpreter for natural languages.
    We can't tell the difference between internally reasoning from first principles and retrieval.
    Planning is an example of reasoning, e.g. stacking blocks so as to produce a certain sequence or shape. Swapping out the words 'stack' and 'unstack' for 'fist' and 'slap' makes GPT-4 fail.
    Reasoning is defined from a logical perspective: deductive closure based on base facts. You don't just need to match the distribution of query-answer pairs; you need to do deductive closure. Transitive closure, for example, is a small part of deductive closure.
    People stop at the first interesting result from an LLM. For example, it can do a rot-13 cipher but can't do any other shift. If you can execute the general principle you should be able to do any shift (a minimal rot-N sketch follows these notes).
    Ideation requires shallow knowledge of wide scope.
    Distributional properties versus instance level correctness. LLMs and diffusion models are good at one and not at the other.
    When an LLM critiques its own solutions, its accuracy goes down - it hallucinates errors and incorrect verifications.
    Companies tell us they have million word context, but they make errors an intelligent child wouldn't make in a ten word prompt.
    They're good at 'style' not 'correctness'. Classical AI was better at correctness, not style.
    Teach a man to fish example - LLMs need 1 fish, 2 fish.. 3 fish... to N fish.
    A 'general advice taker' is roughly equivalent to the goal of general AI.
    "Modulo LLMs" - LLMs guess, and bank of external verifiers, verify. Back prompt, chain of thought, etc.
    Agentic systems are worthless without planning. It's not interchangeable - toddlers can operate guns, but cows with a plan can't answer the phone.
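
    A minimal sketch of the rot-N point in these notes, in Python (my addition, not from the episode): the whole "general principle" is a few lines, so a system that had really internalized it should handle any shift, not just 13.

    import string

    def rot_n(text: str, n: int) -> str:
        # Build a translation table that shifts every letter by n positions, preserving case.
        lower, upper = string.ascii_lowercase, string.ascii_uppercase
        k = n % 26
        table = str.maketrans(lower + upper, lower[k:] + lower[:k] + upper[k:] + upper[:k])
        return text.translate(table)

    # ROT13 is its own inverse; every other shift follows the exact same rule.
    assert rot_n("Uryyb, jbeyq!", 13) == "Hello, world!"
    assert rot_n(rot_n("Attack at dawn", 7), 19) == "Attack at dawn"  # 7 + 19 = 26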

  • @SouhailEntertainment
    @SouhailEntertainment 5 місяців тому +26

    Introduction and Initial Thoughts on Reasoning (00:00)
    The Manhole Cover Question and Memorization vs. Reasoning (00:00:39)
    Using Large Language Models in Reasoning and Planning (00:01:43)
    The Limitations of Large Language Models (00:03:29)
    Distinguishing Style from Correctness (00:06:30)
    Natural Language vs. Formal Languages (00:10:40)
    Debunking Claims of Emergent Reasoning in LLMs (00:11:53)
    Planning Capabilities and the Plan Bench Paper (00:15:22)
    The Role of Creativity in LLMs and AI (00:32:37)
    LLMs in Ideation and Verification (00:38:41)
    Differentiating Tacit and Explicit Knowledge Tasks (00:54:47)
    End-to-End Predictive Models and Verification (01:02:03)
    Chain of Thought and Its Limitations (01:08:27)
    Comparing Generalist Systems and Agentic Systems (01:29:35)
    LLM Modulo Framework and Its Applications (01:34:03)
    Final Thoughts and Advice for Researchers (01:35:02)
    Closing Remarks (01:40:07)

    • @rossminet
      @rossminet Місяць тому

      Formal language (mathematical logic) draws from natural language (if, then, and, or, not, every, exists) but gives these terms a very precise meaning.
      Natural language has a logical underpinning with logical inferences but adds pragmatic inferences (implicature) drawn from an exchange. The two types of inference must not be confused or you end up with contradictions.
      Example of pragmatic inference: - Did the students attend the lecture? - Some did.
      The answer suggests that they did not all attend. But it's not a logical inference. The speaker has to give the MAXIMAL true answer, which he fails to do in the example.

  • @NunTheLass
    @NunTheLass 5 місяців тому +43

    Thank you. He was my favorite guest that I watched here so far. I learned a lot.

  • @trucid2
    @trucid2 5 місяців тому +217

    I've worked with people who don't reason either. They exhibit the kind of shallow non-thinking that ChatGPT engages in.

  • @DataTranslator
    @DataTranslator 5 місяців тому +19

    His analogy of GPT to learning a second language makes 100% sense to me.
    I’m a nonnative speaker of English; yet I mastered it through grammar first and adding rules and exceptions throughout the years.
    Also, concepts were not the issue; but conveying those concepts was initially very challenging.🇲🇽🇺🇸

  • @AICoffeeBreak
    @AICoffeeBreak 3 місяці тому +3

    Thanks for having Prof. Kambhampati! I got to experience him first hand at this year's ACL where he also gave a keynote. What a great character! 🎉

  • @memetb5796
    @memetb5796 4 місяці тому +16

    This guest was such a pleasant person to listen to: there is an indescribable joy in listening to someone who is clearly intelligent and a subject-matter expert - something that just can't be gotten anywhere else.

  • @espressojim
    @espressojim 4 місяці тому +9

    I almost never comment on youtube videos. This was an excellent interview and very informative. I'd love to hear more from Prof. Subbarao Kambhampati, as he did an amazing job of doing scientific story telling.

  • @jonashallgren4446
    @jonashallgren4446 4 місяці тому +9

    Subbarao had a great tutorial at ICML! The general verification-generation loop was very interesting to me. Excited to see more work in this direction that optimises LLMs with verification systems.
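
    A rough sketch of that generation-verification ("LLM-Modulo") loop, as I understand it. The propose and verify callables are hypothetical stand-ins: the first would call an LLM, the second an external sound verifier such as a plan validator.

    from typing import Callable, Optional

    def llm_modulo(problem: str,
                   propose: Callable[[str, list], str],
                   verify: Callable[[str, str], tuple],
                   max_rounds: int = 10) -> Optional[str]:
        critiques = []                                 # accumulated back-prompts
        for _ in range(max_rounds):
            candidate = propose(problem, critiques)    # LLM guesses a solution
            ok, feedback = verify(problem, candidate)  # external verifier checks it
            if ok:
                return candidate                       # only verified answers are returned
            critiques.append(feedback)                 # otherwise back-prompt with the critique
        return None                                    # give up after max_rounds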

  • @elgrego
    @elgrego 5 місяців тому +8

    Bravo. One of the most interesting talks I’ve heard this year.

  • @stephenwallace8782
    @stephenwallace8782 Місяць тому +1

    Dude, what's cool about this format is how much you trust your audience. Really great concentration, and even a lot of more subtly inspiring kinds of understanding. It makes the relationship between computer science and philosophy very clear.

  • @Hiroprotagonist253
    @Hiroprotagonist253 4 місяці тому +3

    For natural languages the world is the interpreter. What a profound statement 🤯. I am enjoying this discussion so far!

  • @sammcj2000
    @sammcj2000 4 місяці тому +3

    Fantastic interview. Prof Kambhampati seems to be not just wise but governed by empathy and scepticism, which is a wonderful combination.

  • @dr.mikeybee
    @dr.mikeybee 5 місяців тому +42

    Next word prediction is the objective function, but it isn't what the model learns. We don't know what the learned function is, but I can guarantee you it isn't log-odds.
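
    To make the distinction concrete, here is a minimal numpy sketch of the training objective itself - next-token cross-entropy (random numbers stand in for real model outputs); what internal function the network learns in order to drive this loss down is the separate, open question.

    import numpy as np

    def next_token_loss(logits, next_ids):
        # logits: (T, V) scores over a V-token vocabulary; next_ids: (T,) true next-token ids.
        shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return float(-log_probs[np.arange(len(next_ids)), next_ids].mean())

    rng = np.random.default_rng(0)
    print(next_token_loss(rng.normal(size=(8, 50)), rng.integers(0, 50, size=8)))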

    • @ericvantassell6809
      @ericvantassell6809 5 місяців тому

      croissants .vs. yogurt

    • @Lolleka
      @Lolleka 5 місяців тому +13

      At the end of the day, the transformer is just a kind of modern Hopfield network. It stores patterns, it retrieves patterns. It's the chinese room argument all over again.

    • @memegazer
      @memegazer 5 місяців тому +4

      @@Lolleka
      Not really.
      You can point to rules and say "rules can't be intelligent or reason"
      But when it is the NN that makes those rules, and the humans in the loop are not certain enough what those rules are to prevent hallucination or solve the alignment problem, then that is not the Chinese room anymore.

    • @xt-89907
      @xt-89907 5 місяців тому +4

      Research around mechanistic interpretability is starting to show that LLMs tend to learn some causal circuits and some memorization circuits (i.e., grokking). So they are able to learn some reasoning algorithms, but there's no guarantee of it. Plus, sequence modeling is weak on some kinds of graph algorithms necessary for certain classes of logical reasoning algorithms.

    • @synthclub
      @synthclub 5 місяців тому +1

      @@memegazer not hotdog, hotdog!

  • @rrathore01
    @rrathore01 5 місяців тому +15

    Great interview!! Some of the examples given in this interview provide evidence that LLMs are not learning the underlying logic: colored blocks, 4x4 matrix multiplication, chain-of-thought issues.
    Best quote: I need to teach LLMs how to fish 1 fish, then how to fish 2 fish, then 3 fish, and so on, and they would still fail at the task of fishing N fish for any N they have not seen before.

  • @XOPOIIIO
    @XOPOIIIO 5 місяців тому +4

    Reasoning requires loop thinking - sorting through the same thoughts from different angles. NNs are a single feed-forward pass: they have an input, an output, and just a few layers between them, so their result is akin to intuition, not reasoning. That's why they give better results if you simulate loop thinking by feeding the output back into the model to create a reasoning-like step-by-step process.

  • @jakobwachter5181
    @jakobwachter5181 4 місяці тому +2

    Rao is wonderful, I got the chance to briefly chat with him in Vancouver at the last AAAI. He's loud about the limitations of LLMs and does a good job of talking to the layman. Keep it up, loving the interviews you put out!

  • @johnheywood1043
    @johnheywood1043 3 місяці тому +1

    Best conversation on AI that I've been able to follow (not being a PhD in CS).

  • @pranavmarla
    @pranavmarla 2 місяці тому +2

    I come back to this podcast every 2 weeks. Absolutely brilliant!

  • @swarnavasamanta2628
    @swarnavasamanta2628 5 місяців тому +4

    The feeling of understanding is different from the algorithm of understanding being executed in your brain. The feeling of something is created by consciousness, while that something might already be going on in your brain. Here's a quick thought experiment: try adding two numbers in your mind - you can easily do it and get an answer. Not only that, but you have a feeling of understanding the addition algorithm in your head: you know how it works, and you are aware of it being executed and of the steps you're performing in real time. But imagine if you did not have this awareness/consciousness of the algorithm in your head. That's how LLMs can be thought of: they have an algorithm, and it executes and outputs an answer, but they are not aware of the algorithm itself or of it being performed, nor do they have any agency over it. Doing something and the perception that you are doing something are completely different.

    • @prasammehta1546
      @prasammehta1546 4 місяці тому +1

      Basically they are soulless brain which they actually are :P

  • @thenautilator661
    @thenautilator661 5 місяців тому +27

    Very convincing arguments. I haven't heard it laid out this succinctly and comprehensively yet. I'm sure Yann LeCun would be in the same camp, but I recall not being persuaded by LeCun's arguments when he made them on Lex Fridman.

    • @edzehoo
      @edzehoo 5 місяців тому +10

      Basically there's a whole bunch of "scientists and researchers" who don't like to admit the AGI battle is being won (slowly but surely) by the tech bros led by Ilya and Amodei. AI is a 50-year-old field dominated in the past by old men, and is now going through recent breakthroughs made by 30-year-olds, so don't be surprised that there's a whole lot of ego at play to pour cold water on significant achievements.

    • @bharatbheesetti1920
      @bharatbheesetti1920 5 місяців тому +8

      Do you have a response to Kambhampati's refutation of the Sparks of AGI claim? @edzehoo

    • @kman_34
      @kman_34 5 місяців тому +11

      @@edzehoo I can see this being true, but writing off their points is equally defensive/egotistical

    • @JD-jl4yy
      @JD-jl4yy 4 місяці тому

      ​@@edzehoo Yep.

    • @jakobwachter5181
      @jakobwachter5181 4 місяці тому +4

      @@edzehoo Ilya and Amodei are 37 and 41 respectively, I wouldn't call them "young", per se. Research on AI in academia is getting outpaced by industry, and only capital rivalling industry can generate the resources necessary to train the largest of models, but academics young and old are continuously outputting content of higher quality than most industry research departments. It's not just ego, it's knowing when something is real and when it is smoke and mirrors.

  • @whiteycat615
    @whiteycat615 5 місяців тому +7

    Fantastic discussion! Fantastic guy! Thank you

  • @oscarmoxon
    @oscarmoxon 5 місяців тому +17

    There's a difference between in-distribution reasoning and out-of-distribution reasoning. If you can make the distribution powerful enough, you can still advance research with neural models.

    • @SurfCatten
      @SurfCatten 5 місяців тому +3

      Absolutely true. As an example I tested its ability to do rotation ciphers myself and it performed flawlessly. Obviously the reasoning and logic to do these translations was added to its training data since that paper was released.

    • @PrinceCyborg
      @PrinceCyborg 4 місяці тому

      Easy, it’s all about prompting. Try this prompt with the Planbench test: Base on methodical analysis of the given data, without making unfounded assumptions. Avoid unfounded assumptions this is very important that you avoid unfounded assumptions, and base your reasoning directly on what you read/ see word for word rather than relying on training data which could introduce bias, Always prioritize explicitly stated information over deductions
      Be cautious of overthinking or adding unnecessary complexity to problems
      Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally.
      re-check the reasoning against each piece of information before concluding.

  • @timcarmichael
    @timcarmichael 5 місяців тому +16

    Have we yet defined intelligence sufficiently well that we can appraise it and identify its hallmarks in machines?

    • @stevengill1736
      @stevengill1736 5 місяців тому

      I think if we qualify the definition of intelligence as including reasoning, then yes.
      I'd rather use the term sentience - now artificial sentience...that would be something!

    • @benbridgwater6479
      @benbridgwater6479 4 місяці тому

      @@johan.j.bergman Sure, but that's a bit like saying that we don't need to understand aerodynamics or lift to evaluate airplanes, and can just judge them on their utility and ability to fly ... which isn't entirely unreasonable if you are ok leaving airplane design up to chance and just stumbling across better working ones once in a while (much as the transformer architecture was really a bit of an accidental discovery as far as intelligence goes).
      However, if we want to actively pursue AGI and more intelligent systems, then it really is necessary to understand intelligence (which will provide a definition) so that we can actively design it in and improve upon it. I think there is actually quite a core of agreement among many people as what the basis of intelligence is - just no consensus on a pithy definition.

    • @jakobwachter5181
      @jakobwachter5181 4 місяці тому

      @@johan.j.bergman A spatula serves a helpful purpose that no other cooking tool is able to replace in my kitchen, so I find it incredibly useful. Turns out they are rather mass market too. Should I call my spatula intelligent?

    • @Cammymoop
      @Cammymoop 4 місяці тому

      no

    • @rey82rey82
      @rey82rey82 4 місяці тому

      The ability to reason

  • @JurekOK
    @JurekOK 5 місяців тому +4

    29:38 This is an actual breakthrough idea addressing a burning problem, and it should be discussed more!

  • @aitheignis
    @aitheignis 5 місяців тому +15

    I love this episode. In science, it's never just about what can be done or what happens in the system; it's always about the mechanism that leads to the event (how the event happens, basically). What is severely missing from all the LLM talk today is discussion of the underlying mechanism. Work on mechanism is the key piece that will move all of this deep neural network work from engineering feat to actual science. To know the mechanism is to know causality.

    • @stevengill1736
      @stevengill1736 5 місяців тому +2

      ...yet they often talk about LLM mechanism as a "black box", to some extent insoluble...

  • @Thierry-in-Londinium
    @Thierry-in-Londinium 4 місяці тому +1

    This professor is clearly one of the leaders in his field. When you reflect on and dissect what he is sharing, it stands up to scrutiny!

  • @shyama5612
    @shyama5612 5 місяців тому +1

    Sara Hooker said the same about us not fully understanding what is used in training - the low frequency data and memorization of those being interpreted as generalization or reasoning. Good interview.

  • @Paplu-i5t
    @Paplu-i5t 5 місяців тому +1

    This discussion makes it totally clear about what we can expect from the LLMs, and the irrefutable reasons for it.

  • @HoriaCristescu
    @HoriaCristescu 4 місяці тому +2

    What you should consider is the environment-agent system, not the model in isolation. Focusing on models is a bad direction to take, it makes us blind to the process of external search and exploration, without which we cannot talk about intelligence and reasoning. The scientific method we use also has a very important experimental validation step, not even humans could reason or be creative absent environment.

  • @snarkyboojum
    @snarkyboojum 5 місяців тому +9

    Great conversation. I disagree that LLMs are good for idea generation. In my experience, they're good at replaying ideas back to you that are largely derivative (based on the data they've been trained over). The truly 'inductive leaps' as the Professor put it, aren't there in my interaction with LLMs. I use them as a workhorse for doing grunt work with ideas I propose and even then I find them lacking in attention to detail. There's a very narrow range they can work reliably in, and once you go outside that range, they hallucinate or provide sub-standard (compared to human) responses.
    I think the idea that we're co-creating with LLMs is an interesting one that most people haven't considered - there's a kind of symbiosis where we use the model and build artefacts that future models are then trained on. This feedback loop across how we use LLMs as tools is interesting. That's the way they currently improve. It's a symbiotic relationship - but humans are currently providing the majority of the "intelligence", if not all of it, in this process.

    • @larsfaye292
      @larsfaye292 5 місяців тому +2

      What a fantastic and succinct response! My experience has been _exactly_ the same.

    • @sangouda1645
      @sangouda1645 4 місяці тому

      That's exactly it - they start to really act as a good creative partner at the Nth iteration, after explaining things back and forth and giving feedback; but once it gets the hang of it, it really acts like a student wanting to get a good score from a teacher :)

    • @notHere132
      @notHere132 4 місяці тому

      We need an entirely new model for AI to achieve true reasoning capability.

  • @KRGruner
    @KRGruner 4 місяці тому

    Great stuff! ACTUAL non-hype commentary on AI and LLMs. I am familiar with Chollet and ARC, so no big surprises here but still, very well explained.

  • @sofoboachie5221
    @sofoboachie5221 4 місяці тому

    This is probably the best episode I have watched here, and I watch this channel as a podcast. Fantastic guest.

  • @prabhdeepsingh5642
    @prabhdeepsingh5642 4 місяці тому +1

    Leaving the debate about reasoning aside, this discussion was a damn good one. Learned a lot. Don't miss out on this one due to some negative comments. It's worth your time.

  • @vishalrajput9856
    @vishalrajput9856 5 місяців тому +2

    I love Rao's work and he's funny too.

  • @scottmiller2591
    @scottmiller2591 4 місяці тому +1

    Good take on LLMs and not anthropomorphizing them. I do think there is an element of "What I do is hard, what others do is easy" to the applications of LLMs in creativity vs. validation, however.

  • @CoreyChambersLA
    @CoreyChambersLA 4 місяці тому +1

    ChatGPT simulates reasoning surprisingly well using its large language model for pattern recognition and prediction.

  • @annette4718
    @annette4718 4 місяці тому +1

    This is a very refreshing episode. Lots of complex topics synthesized into easily digestible insights

  • @Redx3257
    @Redx3257 5 місяців тому +4

    Yea this man is brilliant. I could just listen to him all day.

  • @yafz
    @yafz 4 місяці тому +1

    Excellent, in-depth interview! Thanks a lot!

  • @JohnChampagne
    @JohnChampagne Місяць тому

    I accidentally typed using a standard (Qwerty) keyboard, rather than Dvorak, so I asked GPT to convert the text. It was beyond its ability, (more than a year ago, I think).
    Qwerty was made to be intentionally slow, to accommodate the mechanical devices that would tend to jam if typists went too fast. We should change outmoded patterns of behavior. After about eight hours of using Dvorak, you will match, then exceed your speed on the Qwerty keyboard.

  • @fedkhepri
    @fedkhepri 4 місяці тому

    This is the first time I'm seeing either of the two people in the video, and I'm hooked. Lots of hard-punching and salient points to be gotten from the guest, and kudos to the interviewer for steering the discussion.

  • @swarupchandra1333
    @swarupchandra1333 4 місяці тому

    One of the best explanations I have come across

  • @weftw1se
    @weftw1se 5 місяців тому +29

    Disappointing to see so much cope from the LLM fans in the comments. Expected, but still sad.

    • @yeleti
      @yeleti 4 місяці тому

      They are rather AGI hopefuls. Who's not a fan of LLMs including the Prof ;)

    • @weftw1se
      @weftw1se 4 місяці тому

      @@yeleti yeah, I think they are very interesting / useful but I doubt they will get to AGI with scaling alone.

  • @Paplu-i5t
    @Paplu-i5t 5 місяців тому +6

    Such a sharp mind of a senior man.

  • @luke.perkin.online
    @luke.perkin.online 4 місяці тому +1

    Great episode and fantastic list of papers in the description!

  • @GarthBuxton
    @GarthBuxton 5 місяців тому +3

    Great work, thank you.

  • @ACAndersen
    @ACAndersen 4 місяці тому +1

    His argument is that if you change the labels in classical reasoning tests the LLM fails to reason. I tested GPT 4 on the transitive property, with the following made up prompt: "Komas brisms Fokia, and Fokia brisms Posisos, does Komas brism Posisos? To brism means to contain." After some deliberation it concluded that yes, the statement holds true. Thus there is some reasoning there.
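
    One way to scale this kind of check beyond a single anecdote is to generate many probes with fresh random names and score the model over all of them. A rough sketch (ask_llm is a hypothetical stand-in for whatever model API is being tested):

    import random, string

    def nonsense_word(k=6):
        return "".join(random.choices(string.ascii_lowercase, k=k)).capitalize()

    def make_probe():
        a, b, c = nonsense_word(), nonsense_word(), nonsense_word()
        rel = nonsense_word(5).lower()
        return (f"{a} {rel}s {b}, and {b} {rel}s {c}. Does {a} {rel} {c}? "
                f"To {rel} means to contain.")

    probes = [make_probe() for _ in range(100)]
    # The expected answer is "yes" for every probe, since containment is transitive:
    # accuracy = sum(ask_llm(p).lower().startswith("yes") for p in probes) / len(probes)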

    • @hashp7625
      @hashp7625 4 місяці тому +2

      How did you test his primary point on this topic - that the GPT 4 training data is so large that it has been trained on common statements like this and that answering true is a likely distribution?

  • @rolandhochmuth1951
    @rolandhochmuth1951 4 місяці тому +2

    I originally tried: Let a = b and b = c, does a = c? The response cited the transitive property and said yes, a = c. So then I tried something like: Let asdf = hjkl and let hjkl = vbnm, does asdf = vbnm, to try and eliminate a potential reference to a description of the transitive property that uses a, b, and c as variable names. The response was not able to identify the transitive property in this case and didn't know how to treat the = sign. I then modified the prompt after a few iterations to: Let asdf, hjkl and vbnm be variables. Let = be equality. Given asdf = hjkl and hjkl = vbnm, does asdf = vbnm? The response was:
    You're getting closer! Now we have a clearer setup.
    Since we've defined "=" as equality and the variables are unspecified, we can indeed apply the transitive property.
    Yes, given asdf = hjkl and hjkl = vbnm, then asdf = vbnm.
    The transitive property holds true in this case because we're dealing with standard equality.

    • @rolandhochmuth1951
      @rolandhochmuth1951 4 місяці тому

      Extending this model a bit more. Let asdf, hjkl, vbnm and zxcv be variables. Let = be equality. Given asdf = hjkl, hjkl = vbnm, and vbnm = zxcv, does asdf = xxcv?
      Model
      2.5s
      Yes, you're absolutely right!
      Following the transitive property:
      asdf = hjkl
      hjkl = vbnm
      vbnm = zxcv
      Therefore:
      asdf = zxcv
      Let me know if you have any other logic puzzles you'd like to try!

  • @willd1mindmind639
    @willd1mindmind639 4 місяці тому +1

    Reasoning in humans is about using abstractions or a general understanding of concepts to arrive at a result. A perfect example is math problems. Most humans use shortcuts to solve math calculations, which can be a form of reasoning. In a computing sense, reasoning would be calculating a math answer without using the ALU (the arithmetic logic circuits on the CPU). In a GPT context it would mean arriving at a result without having the answer (and question) already in the training distribution. So, for example, a human using reasoning can add two plus two as follows: 2 is a number representing a quantity of items in a set that can be counted, so 2 plus 2 becomes 1, 2, 3, 4 (counting up 2 places and then counting up 2 more places, with 4 being the answer). Something like that is not possible on a CPU, and ChatGPT would also not be able to do it, because it wouldn't be able to generalize that idea of counting to the addition of any two numbers. If it could, without every combination of numbers written out using the counting method in its training data (or distribution), then it would be reasoning.
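
    The counting procedure described above, written out literally as a tiny sketch (my illustration): the point is that one short procedure covers every pair of numbers, which is exactly what matching a distribution of memorized sums does not give you.

    def add_by_counting(n, m):
        total = n
        for _ in range(m):    # count up m times
            total += 1        # the successor step is the only primitive used
        return total

    assert add_by_counting(2, 2) == 4
    assert add_by_counting(13, 9) == 22  # the same procedure generalizes to any pair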

    • @virajsheth8417
      @virajsheth8417 2 місяці тому

      Not just counting - it can even do summation step by step like humans. So basically you're wrong. Look at my other comment.

    • @virajsheth8417
      @virajsheth8417 2 місяці тому

      To solve using traditional addition, we'll add the numbers digit by digit from right to left, carrying over when necessary.
      Step-by-Step Calculation:
      1. Units Place:
      Write down 6, carry over 1.
      2. Tens Place:
      Write down 2, no carry over.
      3. Hundreds Place:
      Write down 5, carry over 1.
      4. Thousands Place:
      Write down 9, no carry over.
      5. Ten-Thousands Place:
      Write down 6, carry over 1.
      6. Hundred-Thousands Place:
      Write down 1, carry over 1.
      7. Millions Place:
      Write down 2, carry over 1.
      8. Carry Over:
      Since there's an extra 1 carried over, we place it at the next leftmost position.
      Final Result:
      Answer: 12,169,526

    • @willd1mindmind639
      @willd1mindmind639 2 місяці тому

      @@virajsheth8417 You are missing the point. Numbers are symbols that represent concepts, and because of that there are various ways the human mind can use those concepts to solve problems. It is that ability to explore concepts and apply them in a novel fashion that is called reasoning. Your example is not "reasoning" so much as a "step by step" approach, which is the most common pattern that exists for solving any particular mathematical problem. Which implies that those steps are easily found in the training data and model distribution, so of course that is what the LLM is using, because what you described is the typical way math is taught in grade school.
      It in no way, shape or form implies understanding fundamental math concepts and using those concepts in an ad hoc fashion to solve any problem. Ad hoc in this context would mean using a pattern not found explicitly in the distribution of the language model. The point you missed is that numbers being symbols that in themselves represent quantities of individual discrete elements is an abstract concept. And the ability to apply that kind of abstract understanding to solving, or coming up with approaches to solve, math problems is unique to humans, because that is how math came about in the first place.
      Another example of human reasoning: you can add two numbers such as 144 and 457 by simply taking each column, adding it up with its place value separately, and then adding the sums of the columns, without the need to calculate a remainder. That results in: 500 + 90 + 11 = 601, or (5 x 100) + (9 x 10) + (11 x 1). My point is that it is not a common way of doing addition, and not something one would expect an LLM to come up with unless you prompted it to do so - and even then it may not come up with that exact approach unless it is found in the training data.
      At the end of the day, what this is about is not "reasoning" so much as explaining how the LLM came up with an answer to a math problem. And having these AI algorithms explain how they came to an answer has been requested for quite a while. But it is not "reasoning" in the sense of coming up with unique or novel approaches outside of the training examples based purely on an understanding of the underlying concepts.

  • @techchanx
    @techchanx 4 місяці тому +1

    I agree fully with the points here. LLMs are good at the "creative" side of language and media, though it's not really the same creativity as humans'. However, it's best to use that capability of LLMs to construct responses in an acceptable manner, while the actual data comes from authoritative sources and the metrics come from reliable calculations based on formulas, calculators or rule engines.
    Btw, I have given below a better written professional version of my above post, courtesy Google Gemini. I could not have said it any better.
    I concur with the assessment presented. Large language models (LLMs) excel at generating creative language and media, albeit distinct from human creativity. Leveraging this capability, LLMs can effectively construct responses in an appropriate manner, while sourcing data from authoritative references and deriving metrics from reliable calculations based on formulas, calculators, or rule engines. This approach optimizes the strengths of both LLMs and traditional information systems for a comprehensive and accurate solution.

  • @Neomadra
    @Neomadra 5 місяців тому +24

    LLMs definitely can do transitive closure. Not sure why the guest stated otherwise. I tried it out with completely random strings as object names and Claude could do it easily. So it's not just retrieving information.

    • @autingo6583
      @autingo6583 5 місяців тому +5

      this is supposed to be science. i hate it so much when people who call themselves researchers do not really care for thoroughness, or even straight out lie. don't let them get away with it.

    • @jeremyh2083
      @jeremyh2083 5 місяців тому +12

      It struggles with it if you create something it’s never seen before. It’s a valid point on his part.

    • @st3ppenwolf
      @st3ppenwolf 5 місяців тому +6

      transitive closures can be done from memory. It's been shown these models perform bad with novel data, so he has a point still

    • @SurfCatten
      @SurfCatten 5 місяців тому +3

      And it was also able to do a rotation cipher of any arbitrary length when I just tested it. There are definite limitations but what they can do is far more complex than simply repeating what's in the training data. I made a separate post but I just wanted to add on here that it can also do other things that he specifically said it can't.

    • @gen-z-india
      @gen-z-india 4 місяці тому

      Ok, everything they speak is guess work, and it will be so until deep learning is there.

  • @jeremyh2083
    @jeremyh2083 5 місяців тому +8

    Those people who assume that AGI is going to be achieved have never done long-term work inside any of the major GPT systems. If you want a quick and dirty test, tell it to create a fiction book: first have it outline 15 chapters with 10 sections each, and then have it start writing that book. Look at it in detail and you will see, section after section, that it loses sight of essentially every detail. It does a better job if you are working inside a universe another author has already made, and the worst job if you are creating a brand-new universe, even if you have it define the universe.

    • @mattwesney
      @mattwesney 5 місяців тому

      Sounds like you're bad at prompting

    • @jeremyh2083
      @jeremyh2083 5 місяців тому

      @@mattwesney lol it does, doesn’t it, but you haven’t tried it and I have.

  • @phiarchitect
    @phiarchitect 5 місяців тому +2

    what a wonderfully exuberant person

  • @JG27Korny
    @JG27Korny 5 місяців тому +1

    I think there is a broad misconception. LLMs are LLMs; they are not AGI (artificial general intelligence).
    Each AI has a world model. If the question fits the world model, it will work. It is like asking a chess AI engine to play checkers.
    That is why multimodal models are the big thing, as they train not just on a corpus of text but on images too. Those visually trained AI models will solve the stacking problem at minute 19:00.
    It is not that ChatGPT does not reason. It reasons, but not as a human does.

  • @MateusCavalcanteFonseca
    @MateusCavalcanteFonseca 4 місяці тому

    Hegel said a long time ago that deduction and induction are different aspects of the same process, the process of acquiring knowledge about the world. Great talk.

  • @PhilGandFriends
    @PhilGandFriends 17 днів тому

    Wow, this is great. His views need to get more exposure.

  • @martin-or-beckycurzee4146
    @martin-or-beckycurzee4146 3 місяці тому +1

    Very interesting. I wonder what Prof. Kambhampati thinks about Strawberry. Best for creative use cases? Maybe for now, but progress is looking good - better than Prof. Kambhampati was expecting…

    • @falricthesleeping9717
      @falricthesleeping9717 3 місяці тому +1

      01:01:31
      I had to listen to it multiple times (I can't focus these days). In that section he's specifically talking about chain of thought and the guy who wrote the paper. He's saying it's one more way of brute-forcing it with more data - the data is just solving stuff with thought chains. And it's kind of obvious: it's impressive that it can do a lot of code challenges, but almost all of the code challenges have solutions with in-depth explanations published after the competition is done, so many people wrote so many things about how to solve them, and OpenAI said it themselves that they fine-tuned the model with the solutions of other participants to increase its accuracy.
      Even with all of that, given the most complex programming challenges it still fails to keep its consistency on real-world projects. Now, one way of reading this is to just give it more data until every possible problem is in the training data, and improve the context window - but the point still stands: they're really not reasoning.

  • @ej3281
    @ej3281 4 місяці тому

    Very nice to hear from an LLM guy that hasn't lost his mind. He's simply wrong about LLMs being useful for unconstrained idea generation, but as far as his other views go, very enjoyable to watch.

  • @shizheliang2679
    @shizheliang2679 Місяць тому

    Are there any references for the part about LLMs not being able to compute transitive closure? I would love to see the details.

  • @markplutowski
    @markplutowski 5 місяців тому +1

    1:31:09 - 1:31:32. “People confuse acting with planning“ . “We shouldn’t leave toddlers alone with a loaded gun.” this is what frightens me : agent based systems let loose in the wild without proper controls. A toddler AI exploring the world, picking up a loaded gun and pulling the trigger.

  • @markplutowski
    @markplutowski 5 місяців тому +8

    if the title says “people don’t reason” many viewers think it makes the strong claim “ALL people don’t reason“, when it is actually making the weaker claim “SOME people don’t reason“. that title is factually defensible but misleading. one could be excused for interpreting this title to be claiming “ChatGPT doesn’t reason (at all)“, when it is actually claiming “ChatGPT doesn’t reason (very well)“.
    One of the beauties of human language is that the meaning of an utterance derived by the listener depends as much on the deserialization algorithm used by the listener as on the serialization algorithm employed by the speaker. The YouTube algorithm chose this title because the algorithm "knows" that many viewers assume the stronger claim.
    Nonetheless, be that as it may, this was a wonderful interview - many gems of insight on multiple levels, including historical, which I enjoyed. I especially liked your displaying the title page of an article that was mentioned. Looking forward to someone publishing "Alpha reasoning: no tokens required".
    I would watch again.

    • @阳明子
      @阳明子 5 місяців тому +2

      Professor Kambhampati is making the stronger claim that LLMs do not reason at all.

    • @markplutowski
      @markplutowski 5 місяців тому

      ​@@阳明子 1:20:26 "LLMs are great idea generators", which is such an important part of reasoning, he says, that Ramanujan was great largely because he excelled at the ideation phase of reasoning. 16:30 he notes that ChatGPT 4.0 was scored at 30% on a planning task. 1:23:15 he says that LLMs are good for style critiques, therefore for reasoning about matters of style, LLMs can do both ideation and verification.

    • @阳明子
      @阳明子 4 місяці тому

      @@markplutowski 3:14 "I think the large language models, they are trained essentially in this autoregressive fashion to be able to complete the next word, you know, guess the next word. These are essentially n-gram models."
      11:32 Reasoning VS Retrieval
      17:30 Changing predicate names in the blocks problem completely confuses the LLMs
      32:53 "So despite what the tenor of our conversation until now, I actually think LLMs are brilliant. It's just the brilliant for what they can do. And just I don't complain that they can't do reason, use them for what they are good at, which is unconstrained idea generation."

    • @markplutowski
      @markplutowski 4 місяці тому +1

      @@阳明子 Ok, I see it now. I originally misinterpreted his use of a double-negative there where he says "And just I don't complain that they can't do reason".
      That said, he contradicts himself by admitting that they can do a very limited type of reasoning (about matters of style), and are weakly capable of planning (which is considered by many as a type of reasoning, although he seems to disagree with that), and can be used for an important component of reasoning (ideation).
      But yeah, I see now that you are correct - even though there are these contradictions he is indeed claiming "that they can't do reason".

  • @davidcummins8125
    @davidcummins8125 5 місяців тому +1

    Could an LLM for example figure out whether a request requires a planner, a math engine etc, transform the request into the appropriate format, use the appropriate tool, and then transform the results for the user? I think that LLMs provide a good combination of UI and knowledge base. I was suspicious myself that in the web data they may well have seen joke explanations, movie reviews, etc etc and can lean on that. I think that LLMs can do better, but it requires memory and a feedback loop in the same way that embodied creatures have.
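
    A rough sketch of that routing idea (all names here are hypothetical placeholders, not any real API): the LLM only classifies and rephrases, while the external planner or math engine does the actual reasoning.

    def classify_request(request):
        # Hypothetical stand-in for an LLM classification call.
        text = request.lower()
        if any(w in text for w in ("plan", "schedule", "arrange")):
            return "planning"
        if any(ch.isdigit() for ch in text):
            return "math"
        return "chat"

    def handle(request, llm, planner, math_engine):
        # llm, planner and math_engine are caller-supplied callables.
        kind = classify_request(request)
        if kind == "planning":
            problem = llm(f"Translate to a formal planning problem: {request}")
            return llm(f"Explain this plan to the user: {planner(problem)}")
        if kind == "math":
            expression = llm(f"Extract the arithmetic expression from: {request}")
            return llm(f"State this result in a sentence: {math_engine(expression)}")
        return llm(request)  # fall back to a plain LLM answer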

  • @PaoloCaminiti-b5c
    @PaoloCaminiti-b5c 4 місяці тому +3

    I'm very skeptical of this. Aristotle inferred logic by looking at rhetorical arguments; LLMs could already be extracting those features while building their model to compress the corpus of data, and this seems equivalent to propositional logic. It seems this researcher puts too much emphasis on agents needing to be capable of mathematical proof, whose utility in agents - including humans - is not well established.

  • @DataJuggler
    @DataJuggler 5 місяців тому +2

    0:18 When I was 4 years old, I was often stuck at my parents' work. The only entertaining thing for me to do was play with calculators or adding machines. I memorized the times table because I played with calculators a lot. My parents would spend $8 at the drug store to keep me from asking why the sky is blue and other pertinent questions. I was offered the chance to skip first grade after kindergarten, and my parents said no. Jeff Bezos is the same age as me, and also from Houston. His parents said yes to skipping first grade. I brought this up with my parents forever, until they died.

  • @tylermoore4429
    @tylermoore4429 4 місяці тому

    This analysis treats LLMs as a static thing, but the field is evolving. Neurosymbolic approaches are coming; a couple of these are already out there in the real world (MindCorp's Cognition and Verses AI).

  • @alexandermoody1946
    @alexandermoody1946 5 місяців тому

    Not all manhole covers are round.
    The square manhole covers that have a two piece triangular tapered construction are really heavy.

  • @VanCliefMedia
    @VanCliefMedia 4 місяці тому

    I would love to see his interpretation of the most recent gpt4 release with the structured output and creating reasoning through that output

  • @Drone256
    @Drone256 17 днів тому

    GPT 4o does the ROT13 cipher just fine if you ask it to move ahead 4 characters, etc. How did they get this to work?

  • @hayekianman
    @hayekianman 4 місяці тому +2

    The Caesar cipher thing is already working for any n for Claude 3.5, so I dunno.

    • @benbridgwater6479
      @benbridgwater6479 4 місяці тому

      Sure - different data set. It may be easy to fix failures like this by adding corresponding training data, but this "whack-a-mole" approach to reasoning isn't a general solution. The number of questions/problems one could pose of a person or LLM is practically infinite, so the models need to be able to figure answers for themselves.

    • @januszinvest3769
      @januszinvest3769 3 місяці тому

      ​@@benbridgwater6479 So please give one example that shows clearly that LLMs can't reason

  • @Tititototo
    @Tititototo 4 місяці тому

    Pertinently pinpointed - one killed 'the beast'. LLMs are just wonderful 'bibliothèques vivantes' (living libraries), quite great tools that save time by ignoring any educated protocols.

  • @johnkost2514
    @johnkost2514 5 місяців тому

    This aligns nicely with the work Fabrice Bellard has been doing using Transformers to achieve SOTA lossless compression in his NNCP algorithm.
    Coincidence .. I think not!

  • @maxbuckley1246
    @maxbuckley1246 4 місяці тому

    Which paper is referred to at 1:03:51 when multiplication with four digit numbers is discussed?

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  4 місяці тому

      Faith and Fate: Limits of Transformers on Compositionality "finetuning multiplication with four digit numbers"
      arxiv.org/pdf/2305.18654

  • @therainman7777
    @therainman7777 4 місяці тому +1

    We already do have a snapshot of the current web. And snapshots for every day prior. It’s the wayback machine.

  • @anuragshas
    @anuragshas 4 місяці тому +1

    The "On the Dangers of Stochastic Parrots" paper still holds true.

  • @lystic9392
    @lystic9392 4 місяці тому

    I think I have a way to allow almost any model to 'reason'. Or to use reasoning, anyway.

  • @hartmut-a9dt
    @hartmut-a9dt 5 місяців тому +1

    great interview !

  • @Jukau
    @Jukau 5 місяців тому +3

    What is the bombshell? This is absolutely clear and known... it would be a bombshell if it weren't.

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  5 місяців тому +2

      Read the comments section here, I wish it was clear and known. It's subtle and requires a fair bit of CS knowledge to grok unfortunately.

  • @SurfCatten
    @SurfCatten 5 місяців тому +11

    Claude just deciphered a random biography, in rotation cipher, for me. All I told him was that it was a Caesar cipher and then gave him the text. I didn't tell him how many letters it was shifted or rotated by and I didn't use rot13. I tried it three times with three different shift values and it translated it perfectly each time. There's no way that Claude has memorized every single piece of information on the internet in cipher form. Don't know if it's "reasoning" but it is certainly applying some procedure to translate this that is more than just memorization or retrieval. ChatGPT also did it but it had some errors.
    Instead of criticizing other scientists for being fooled and not being analytical enough maybe you should check your own biases.
    I have found it true that it can't do logic when a similar logic problem was not in its training data but it definitely can generalize even when very different words are used.
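
    For what it's worth, the shift-identification part does not need anything model-like at all; a crude letter-frequency heuristic recovers it (a rough sketch of my own, not from the video):

    def crack_rotation(ciphertext):
        # Try all 26 shifts and keep the one whose letters look most like English.
        english = "etaoinshrdlcumwfgypbvkjxqz"          # letters by rough English frequency
        weights = {ch: 26 - i for i, ch in enumerate(english)}
        best_shift, best_score = 0, float("-inf")
        for shift in range(26):
            decoded = "".join(
                chr((ord(ch) - 97 - shift) % 26 + 97) if ch.isalpha() else ch
                for ch in ciphertext.lower()
            )
            score = sum(weights.get(ch, 0) for ch in decoded)
            if score > best_score:
                best_shift, best_score = shift, score
        return best_shift

    print(crack_rotation("Qefp fp xk lypzroba ifqqib qbpq pbkqbkzb."))  # expected: 23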

    • @SurfCatten
      @SurfCatten 5 місяців тому +5

      @@quasarsupernova9643 That's a perfectly reasonable statement. However, my comment was that I tested it and it did generalize and use logic to solve a cipher that the speaker had just said it could not do unless it had memorized it - which is impossible if you think about it. The amount of information contained in these models is infinitesimal compared to that in their training data. The idea that it can explain all jokes simply because it read a website explaining them is so simplistic compared to the way that LLMs operate that it's absurd. Likewise the idea that it can translate a particular cipher because it read a website containing every single possible English word translated into cipher using ROT13. So I tested it specifically not using ROT13, and using an obscure long biography with lots of non-English names etc., and it had no problem not only identifying the specific shift used in the cipher but then applying it.

    • @Neomadra
      @Neomadra 5 місяців тому +3

      It could also be that they used synthetic data to train the model specifically for this task; for this specific task creating synthetic data is trivial. Unfortunately none of the major players reveal the training data they use, so it's hard to know when a model truly generalizes. That said, I tested the transitive closure task, and using completely random strings as objects it nailed it with ease. So at least it has learned a template for solving unseen problems, which I consider at least a weak form of reasoning.

    • @graham2409
      @graham2409 Місяць тому

      @@Neomadra Exactly. To say it is purely regurgitating memorized content is just unreasonable at this juncture. That it has a weak form of reasoning is true though. Within the (very wide) domain of toys in the toybox it played with during training, it can do amazing things. When you hand it a new toy, it's not that it has *no* reasoning capability - it's just surprisingly limited compared to its versatility with its familiar toys.
      Also, I did the same tests with ciphers... didn't use rot13, and I even used complete nonsense sentences, and then even complete nonsense arrangements of letters that don't translate to real words at all. I even just threw it in there with zero context. I was testing with o1-mini, and it completely nailed every test.

  • @akaalkripal5724
    @akaalkripal5724 4 місяці тому +12

    I'm not sure why so many LLM fans are choosing to attack the Professor, when all he's doing is pointing out huge shortcomings, and hinting at what could be real limitations, no matter the scale.

    • @alansalinas2097
      @alansalinas2097 4 місяці тому +2

      Because they don't want the professor to be right?

    • @therainman7777
      @therainman7777 4 місяці тому

      I haven’t seen anyone attacking him. Do you mean in the comments to this video or elsewhere?

  • @plasticmadedream
    @plasticmadedream 5 місяців тому +2

    A new bombshell has entered the villa

  • @lauralhardy5450
    @lauralhardy5450 2 місяці тому

    At 38:45 Prof Kambhampati says LLMs are good at generating ideas but not doing the hard yards of generating the proofs. He gives the examples of Fermat and Ramanujan producing far reaching conclusions but not having the proofs available.
    Would not the humans who generated the ideas have had some inkling that the end results were possible?
    If LLMs produce conclusions or ideas without proofs, how do they reach the deductive end point needed to generate them?
    The bigger view is that if we are swamped with numerous ideas without paths to prove them, that's just garbage.

  • @luisluiscunha
    @luisluiscunha 4 місяці тому

    Maybe in the beginning, with Yannic, these talks were properly named "Street Talk". They are more and more Library-of-the-Ivory-Tower talks, full of deep "philosophical" discussions that I believe will all be considered pointless. I love the way Heinz Pagels described how the Dalai Lama avoided entering into arguments of this kind about AI. When asked his opinion about a system he could talk to as to a person, he just said "sit that system in front of me, on this table, then we can continue this talk". This was in the 80s. Even to be profoundly philosophical you can think in a very simple and clear way. It is a way of thinking epistemologically most compatible with engineering, which ultimately is where productive cognitive energy should be spent.

  • @PhysicalMath
    @PhysicalMath 5 місяців тому

    I've been working with it for a year. It can't reason. It still forgets how to do things it's done before more than once.

  • @sonOfLiberty100
    @sonOfLiberty100 4 місяці тому +1

    Wow a good one thank you

  • @pallharaldsson9015
    @pallharaldsson9015 4 місяці тому

    16:44 "150% accuracy [of some sort]"? It's a great interview with the professor (the rest of it good), who knows a lot, good to know we can all do such mistakes...

    • @benbridgwater6479
      @benbridgwater6479 4 місяці тому

      I processed it as dry humor - unwarranted extrapolation from current performance of 30%, to "GPT 5" at 70%, to "GPT 10" at 150%. Of course he might have just misspoken. Who knows.

    • @kangaroomax8198
      @kangaroomax8198 Місяць тому

      It’s a joke, obviously. He is making fun of people extrapolating.

  • @MrBillythefisherman
    @MrBillythefisherman 4 місяці тому

    The prof seems to be saying that we do something different when we reason than when we recall. Is there any evidence from the processes or structure of the brain that this is the case? It always seems as if people are saying they know how the human brain works when, to my knowledge at least, we haven't really a clue beyond the fact that neurons fire signals to other neurons via synapses and that we have dedicated parts of the brain for certain functions.

  • @ntesla66
    @ntesla66 4 місяці тому

    So at 1:12:00, is he alluding to computational complexity and the overall informational entropy of the calculation? In the sense that, if all calculations take the same time, it would violate the second law of thermodynamics to assume that a rational calculation took place?

    • @benbridgwater6479
      @benbridgwater6479 4 місяці тому +1

      I think he's really just pointing out one well known shortcoming of LLMs in that they do a fixed amount of compute per token regardless of the question, when preferably they should really put in the amount of compute/steps necessary to answer the question (no more, no less). The "magical thinking" is that if you just force them to spend more compute by padding the input (more tokens = more compute) then you'll get a better answer!

    • @ntesla66
      @ntesla66 4 місяці тому

      @@benbridgwater6479 So the clocking framework is such that there are only synchronous "computes" even within the parallelism and feedback of the architecture? No buffers or "scratchpads" for an indeterminate holding time. The clock strikes one, it computes one? Forgive me if I seem unknowledgeable in these things, it is because I am. My background is FPGA's and dsp.

    • @benbridgwater6479
      @benbridgwater6479 4 місяці тому

      @@ntesla66 Yes, a transformer is strictly a pass-through feed-forward design - no feedback paths or holding/scratchpad registers. It's actually a stack of replicated transformer layers, each comprised of a "self-attention" block, where the model has learnt to attend to other tokens in the sequence, plus a feed-forward block that probably(!) encodes most of the static world knowledge.
      Each transformer layer processes/transforms the sequence of input tokens (embeddings) in parallel (the efficiency of this parallelism being the driving force behind the architecture), then passes the updated embeddings to the next layer, and so on. By the time the sequence reaches the output layer, an end-of-sequence token that had been appended to the input will have been transformed into the predicted "next word (token)" output.
      After each "next word" has been generated, it is appended to the input sequence, which is then fed into the transformer, and the process starts over from scratch, with nothing retained from before other than some internal "K-V caching" that is purely an optimization.
      In our brain (cortex) we have massive amounts of top-down feedback connections which allow prediction errors to be derived, which are the basis of our learning mechanism. In a transformer the learning all takes place at training time using gradient descent ("back propagation"), which also propagates prediction errors down through the model, but purely as a weight update mechanism, without there actually being any feedback paths in the model itself.
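
      A toy sketch of that generation loop (the dummy next_token function stands in for a real transformer forward pass): one fixed-size pass per emitted token, the growing sequence re-fed each step, nothing carried over between steps.

      def next_token(sequence):
          # Stand-in for a full pass over the whole sequence; the work done is the
          # same whether the underlying question is trivial or hard.
          return (sum(sequence) * 31 + len(sequence)) % 1000

      def generate(prompt_ids, n_new, end_id=None):
          sequence = list(prompt_ids)
          for _ in range(n_new):
              tok = next_token(sequence)  # one pass per emitted token
              sequence.append(tok)        # append and start over; no state retained
              if tok == end_id:
                  break
          return sequence

      print(generate([12, 7, 404], n_new=5))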

  • @TastyGarlicBread
    @TastyGarlicBread 5 місяців тому +6

    There are a lot of people in this comment section who probably can't even do basic sums, let alone understand how a large language model works. And yet, they are very happy to criticize.
    We are indeed living in an Idiocracy.

  • @siddharth-gandhi
    @siddharth-gandhi 5 місяців тому +2

    Hi! Brilliant video! Much to think about after listening to hyper scalers for weeks. One request, can you please cut on the clickbait titles? I know you said for YT algo but if I want to share this video with say PhD, MS or profs, no one takes a new channel seriously with titles like this one (just feels clickbaity for a genuinely good video). Let the content speak for itself. Thanks!

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  5 місяців тому +3

      I am really sorry about this, we will change it to something more academic when the views settle down. I’ve just accepted it as a fact of youtube at this point. We still use a nice thumbnail photo without garish titles (which I personally find more egregious)

    • @siddharth-gandhi
      @siddharth-gandhi 5 місяців тому +1

      @@MachineLearningStreetTalk Thanks for understanding! 😁

  • @jonathanjonathansen
    @jonathanjonathansen 4 місяці тому +1

    Where did Keith go?

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  4 місяці тому +1

      He's still with us! We are filming most stuff in person these days, there is a bunch in the backlog with him in coming out - and he joins our Patreon calls every 2 weeks

  • @wtfatc4556
    @wtfatc4556 5 місяців тому +3

    Gpt is like a reactive mega wikipedia....

  • @jimbarchuk
    @jimbarchuk 5 місяців тому +1

    I have to stop to ask if '150% accuracy' is an actual thing in LLM/GPT? Or other weird number things that I'll have to go read. Keywords?

    • @dankprole7884
      @dankprole7884 5 місяців тому

      I think it was more a passing observation that measuring accuracy in LLM responses in the way we do for numeric models is kind of not possible.

    • @benbridgwater6479
      @benbridgwater6479 4 місяці тому

      I think he was just being facetious about unwarranted trend extrapolation

  • @idck5531
    @idck5531 4 місяці тому

    It's possible LLMs do not reason, but they sure are very helpful for coding. You can combine and generate code easily and advance much faster. Writing scripts for my PhD is 10x easier now.

  • @dylanmenzies3973
    @dylanmenzies3973 3 місяці тому

    Reasoning is pattern matching. Verification is something to be learnt and reinforced - humans are very often wrong, even in complex math proofs. Generating possible paths for reasoning is inherently very noisy and needs a balance of verification to keep it on track.

  • @life42theuniverse
    @life42theuniverse 5 місяців тому

    The most likely response to logical questions is logical answers.

  • @HoriaCristescu
    @HoriaCristescu 4 місяці тому

    If you look at the human brain, any neuron taken in isolation doesn't "understand" or "reason". By induction we could be tempted to say the brain doesn't understand or reason, but we know that to be wrong. Similarly, AI models are made of simple components that don't understand but when we consider the whole data loop, they can develop their own semantics.

  • @notHere132
    @notHere132 4 місяці тому +5

    I use ChatGPT every day. It does not reason. It's unbelievably dumb, and sometimes I have trouble determining whether it's trying to deceive me or just unfathomably stupid. Still useful for quickly solving problems someone else has already solved, and that's why I continue using it.

  • @simpleidea2825
    @simpleidea2825 4 місяці тому

    When the first steam engines were tested, people were saying it was a ghost. But now you see...