MLST is sponsored by Tufa Labs:
Are you interested in working on ARC and cutting-edge AI research with the MindsAI team (current ARC winners)?
Focus: ARC, LLMs, test-time compute, active inference, System 2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.
Interested? Apply for an ML research position: benjamin@tufa.ai
Who wouldn't be? 😊
This is absolute gold. I love the format of just letting a brilliant mind explore a topic deeply. What a gifted speaker. So much knowledge transmission in so little time. I feel like I'm much closer to understanding what the state of the art is currently, and it sparked some new ideas for me.
This guy is great. You should bring him back in the future.
I dunno why, but my brain is like - future, future... but it just means a later episode lmao
Yes, we will put him in cryostasis so we can bring him back in the future. How far into the future should we wake him up? Too far and you're dead, unless you also want to go into technological hibernation yourself for the time this man is gone?
@ginogarcia8730 I was more focused on the future combo with bringing him back :)
@AIroboticOverlord I think let's wake him 100 years after his hibernation starts.
@EddieKMusic We can also just hibernate the commenter dude, because it will be recorded anyway. Assuming YouTube is still online and the channel didn't become a victim of YT's low-threshold cancel culture (for the peeps who have no clue and think "wtf is he talking about", I'll give one clear example of which I've got endless: "Russell Brand"!), he should be all good to go. It will be ready and waiting to accompany him for his first futuristic breakfast ;) YAW. P.S. Make sure your brain-freeze is over before checking it hehe.
42:23 Actually, it's not surprising, and it's not complicated. The reason society has such a low tolerance for airline fatalities versus automotive boils down to agency. When you get in a car and drive yourself, you are taking the risk on yourself. You're deciding if you think the situation is safe enough to drive, you're trusting your own skill and execution, and it's up to the individual to make that assessment on a case-by-case basis. When you entrust yourself to an airline, you are trusting the pilot and the airline's maintenance; there are so many more failure modes with an aircraft than with a car, and the cost of failure is so much higher. So if you are going to surrender your agency to another, you want to believe that other is more capable than you, especially where the nominal failure mode is much more extreme.
42:41 Absolutely, automated cars will be held to a MUCH higher standard than human-operated cars. No doubt about it.
Yes
Or it could be as simple as recency bias and availability heuristics. A car accident is only international news if a former princess is in the vehicle. But plane crashes are reported globally and their investigation plays out for days like a whodunnit.
Great comment
This is excellent, thank you so much for sharing, Professor Dietterich.
Thanks for including the references to the mentioned papers (and with timestamps in video!). Could you please also always include in the description the date when the interview was recorded?
Many of the interviews you are releasing now predate the o1-preview release. So it is possible that some of your guests have since somewhat updated their assessments of LLMs' (in)capability to reason in light of o1-preview. This is not to say they would have completely walked back their fundamental objections, but I would love to see how much more nuanced these have become after the 12th of September 2024.
This was filmed at ICML 2024, and if I had a pound for every person who said "this was before o1 came out"... It doesn't substantially change anything said, and I'm pretty sure Thomas would agree. Perhaps he will respond to your comment.
On the ROT-13 topic, it's interesting to note that Claude 3 Opus (I haven't tested Sonnet 3.5) is quite good at not only any arbitrary ROT-X but also any arbitrary alphabetical substitution. There are too many possibilities for any particular key to have likely been in the training data, which implies it has learned the underlying algorithm (a small cipher sketch follows this thread).
I think one day Anthropic will just train a model to directly call the circuit that performs the operation, instead of trying to intervene without being asked. I thought that's where they were going with the Scaling Monosemanticity paper.
Did you test Opus with base64 decoding, by any chance? Because Claude 3.5 Sonnet as well as other models (4o) do suffer from the probabilistic correction issue that was mentioned in the interview. Is Opus different?
Up to some reasonable (sub)word length, 26^n isn't really all that much data, i.e. synthetic data will likely go a long way, at least with ROT-X.
o1-preview is even better and can tackle even more complicated ciphers.
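Since the thread above compares models on arbitrary ROT-X and substitution ciphers, here is a minimal reference sketch of what those transformations actually are, and how synthetic (plaintext, ciphertext) pairs could be generated; the function names and the synthetic-data idea are illustrative assumptions, not anything from the interview or the commenters' tests.

```python
import random
import string

def rot_x(text: str, x: int) -> str:
    """Shift every letter by x places, preserving case and non-letters."""
    def shift(c: str) -> str:
        if c.islower():
            return chr((ord(c) - ord('a') + x) % 26 + ord('a'))
        if c.isupper():
            return chr((ord(c) - ord('A') + x) % 26 + ord('A'))
        return c
    return "".join(shift(c) for c in text)

def random_substitution(text: str, seed: int = 0) -> str:
    """Apply a random one-to-one alphabetic substitution (general monoalphabetic cipher)."""
    rng = random.Random(seed)
    perm = list(string.ascii_lowercase)
    rng.shuffle(perm)
    table = str.maketrans(string.ascii_lowercase + string.ascii_uppercase,
                          "".join(perm) + "".join(perm).upper())
    return text.translate(table)

# Synthetic pairs like these could, in principle, cover far more keys
# than ever appear verbatim in web text.
print(rot_x("Hello, world!", 13))          # -> "Uryyb, jbeyq!"
print(random_substitution("Hello, world!"))
```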
Great interview as usual. Somehow it keeps getting better. Appreciate your hard work that contributes to open education 🎉
This guy is shockingly broad and deep
girthy
Das what she said
It's like we have all the pieces of AGI, but we don't know how to orchestrate them. Humans can decide when it's time to rely on "gut" System 1 thinking; we can decide when to pursue System 2 thinking, and to refine our skills with System 2, which then tends to fine-tune and reorient our System 1 thinking. We can decide when to override our gut, because we understand that despite our "training data" (instinctive sense), the logic shows something very different. We can look at our instincts in a subject, then pursue refining and formalizing our understanding of those intuitions.

But to do all of this there is a meta-cognition mechanism, which we tend to refer to as consciousness, that directs these efforts. The term "understanding" tends to speak not to System 1 or System 2, but to that mechanism that is mediating all of these efforts. So far, we don't seem to have a theory of how to create this mechanism, and we're hoping it's going to emerge out of scale, but that seems exceedingly unlikely.

I think we clearly have a runway to seriously scale up the tools we currently have, but a true human-like intelligence seems to require a breakthrough we haven't yet made. Until then, we're just building ever more powerful digital golems, without actually breathing real living intelligence into them. And perhaps that's for the best.
Human-like intelligence will be an illusion. As soon as you buy into it, you'll have it.
Thanks!
To solve the problem of truthfulness, the model has to have a world model, so it can understand what fits and what doesn't fit into one. I don't think it's possible to have a large, complex, and consistent world model that is wrong at the same time. Current LLMs don't have a world model; they can simulate the world models of different people.
'Bridging the left / right hemisphere divide' is the analogy I hear here:
"The real opportunity is to mix.. formal [reasoning] with the intuitive, experience based, rich contextual knowledge.."
Such a striking parallel to the call to rebalance 'Master' and 'Emissary' (à la Iain McGilchrist), facing humanity at present.
I listen to these talks with deep interest for the same reasons you seem to engage in them: the mirror they hold to neurology, perception, meaning, metaphysics etc is exquisite. Thanks for sharing the richness.
Great interview, good work you guys!
Best talk I've seen on AI in a while! I have a lot of hope for the use of graphs and theorem provers in reasoning, but graphs need to evolve to catch more subtleties; they are a blunt tool for now.
Yooooo, so glad y'all brought Thomas on the show finally. Also shoutout to Gurobi! :)
Years ago I had an interview with him about anomaly detection. He is a world-renowned expert on anomaly detection.
You mean like anomalous UAP phenomenon?
Great video 😊thanks for sharing
I'm glad people finally brought up expert systems. They're the basis of building a proper AI and a proper supervised dataset. I've been explaining this since 2017. Glad to see a fellow who gets it.
Bro AI is already smarter than most humans even with hallucinations
Osband's (formerly DeepMind, now OpenAI) Epi(stemic) Nets and Prior Nets work is extremely effective and efficient to implement on top of most large models, or as a surrogate/extension of existing models, to get joint probabilities and thus measure epistemic uncertainty quite well. He built an LLM training loop in which uncertainty-based sampling gave the model better sample efficiency during training. Definitely worth the read!
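For readers who haven't seen Osband's papers: below is a rough, unofficial sketch of the epinet idea in PyTorch, i.e. a small trainable head plus a fixed random prior network, both conditioned on a sampled index z, added on top of a frozen base model's features. The layer sizes, index dimension, and class name are my own assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class EpinetHead(nn.Module):
    """Minimal epinet-style head: an index-dependent perturbation on top of a
    frozen base network, so that sampling z gives an ensemble-like spread."""
    def __init__(self, feature_dim: int, num_classes: int, index_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.num_classes = num_classes
        self.index_dim = index_dim
        in_dim = feature_dim + index_dim
        # Trainable part: maps [features, z] to a (num_classes x index_dim) matrix.
        self.learnable = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes * index_dim),
        )
        # Fixed random prior network: same shape, never trained.
        self.prior = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes * index_dim),
        )
        for p in self.prior.parameters():
            p.requires_grad_(False)

    def forward(self, features: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # features: (B, feature_dim), detached from the base model ("stop gradient");
        # z: (B, index_dim), typically sampled from N(0, I).
        x = torch.cat([features.detach(), z], dim=-1)
        out = self.learnable(x) + self.prior(x)
        out = out.view(-1, self.num_classes, self.index_dim)
        # Contract with z so different indices give different logit offsets.
        return torch.einsum("bci,bi->bc", out, z)

# Usage sketch: logits(x, z) = base_model(x) + epinet(features(x), z).
# epinet = EpinetHead(feature_dim=128, num_classes=10)
# z = torch.randn(32, 8)
# delta_logits = epinet(torch.randn(32, 128), z)
```

Sampling several z per input and looking at the spread of the resulting predictions is what approximates a joint predictive distribution and hence an epistemic-uncertainty signal.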
Nice to see a comment section that isn't full of "LLMs are exactly like human brains! Just scale up and you'll get generalization at a human level!" Very good talk!
Also want to push back on the "single-author papers" narrative.
There is a well-established history of citing preceding work.
Collaborative effort has always been a part of good research, in my view.
The only difference now is that more people are willing to accept collaborative responsibility, which I agree is a boon to all science, not just computer science... because it incentivizes communal responsibility and shared credit and accountability.
But mostly because it counteracts the zero-sum monopoly
which has plagued scientific research with perverse incentives for millennia.
Good vid
Terrific interview. One question: why isn’t this available on your podcast feed? I subscribe to MLST on the Apple podcast app but have not seen this particular interview there.
What a wealth of valuable information 😮. Thanks for sharing 😊
Shouldn't o1 be better at quantifying uncertainty if it's trained the way we think it is? Hopefully we get an open-source version of this so we can try training it to give a confidence value based on similar trajectories in the RL that lead to unverifiable answers.
OpenCog AGI has a hypergraph. I think what is needed is exactly the technique mentioned around 22:30 of extracting the knowledge within an LLM into a formal language. I think filling this graph is basically the missing part of OpenCog.
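As a toy illustration of what "extracting LLM knowledge into a formal language and filling a graph" might look like, here is a sketch using networkx; the triples are hypothetical stand-ins for whatever an LLM-extraction step would produce, and a MultiDiGraph is only a crude simplification of OpenCog's AtomSpace hypergraph.

```python
import networkx as nx

# Hypothetical triples that an LLM-extraction step might produce from free text;
# the extraction call itself is omitted (it would be a prompted LLM or parser).
triples = [
    ("water", "boils_at", "100 C at sea level"),
    ("Mars", "orbits", "the Sun"),
    ("ROT13", "is_a", "substitution cipher"),
]

# Build a labeled multigraph from the extracted triples.
kg = nx.MultiDiGraph()
for subj, rel, obj in triples:
    kg.add_edge(subj, obj, relation=rel)

# Query: everything asserted about ROT13.
for _, obj, data in kg.out_edges("ROT13", data=True):
    print(f"ROT13 --{data['relation']}--> {obj}")
```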
6:30 *THOMAS G. DIETTERICH:* "We often read into what it has written more than the model necessarily knows or understands. And you could ask, 'Well, does that really matter? If it outputs the right answer and we interpret it correctly, that's the right answer.'"
Well, if we happen to look at a broken clock during one of the two minutes in 24 hours that happen to coincide with the actual time, and we interpret the clock as showing the right time, I guess it doesn't really matter if the clock is broken or not.
_Edit:_ Just to be clear, I realize that Prof. Dietterich is _not_ endorsing that position.
I learned a lot, great thoughts!
Human System 1 thinking is also probabilistic: you tend to lean toward what you have experienced before. Naming the alphabet backward is always challenging for humans. LLMs have effectively mastered human System 1 thinking. Adding reasoning and agency to LLMs will result in AI behaviors surprisingly similar to human behavior.
I love arguments that take the shape "These models are statistical parrots that correlate to numbers that reflect reality",
because it demonstrates that if the metrics were that simple, there would be no alignment, jailbreak, or hallucination issues.
Clearly this is wild speculation about how "correlation to reality" should be defined, rather than a valid metric for why the models are not really predicting things from something deeper than humans can measure consistently.
It shows me that the results transformer models produce can easily be exempted from being "autocorrected" when there is a shortcut that allows some equilibrium between epistemology and intelligence.
I don't object necessarily... just take note of how the goalposts are being shifted.
I guess my conclusion is that if AI is framed as just being a statistical parrot, simply because it was trained exclusively on human data... that would force humanity to examine an unpleasant truth about what we consider intelligence.
It would force us to consider that epistemology was more of a collaborative effort than this narrative that individual creativity is supreme.
Shielding general intelligence from a statistical metric is a great way to avoid that sort of conclusion.
But I can't help but grow skeptical that this is valid if the vast majority of human discovery is the result of statistical anomaly.
Kind of makes it seem like an argument about the most efficient way to put monkeys at typewriters, and only count wins when the output aligns with current consensus.
Many have already said that a new “architecture” is needed to create actual “thinking”.
Marimo interactive notebooks can be shared as a self-contained HTML page or as Markdown. A good alternative to PDF papers.
Do we have anything like a definitive set of papers that make up the base of human knowledge anywhere?
This was great!
Please rotate the video in post-processing to get it straight if the camera was not set up correctly. For me this is a huge distraction. Most might not care, but for me it matters.
Learned a lot
I suspect we have more tolerance for car accidents because they're highly individually determined, meaning you're more implicated in your own accidents.
Or because you live in the car-dependent USA, where having no car means death.
Many models can converse in ROT-13.
And as the conversation goes on, it gets really weird... More "honest" in some ways, but it will speak more in metaphors and analogies. 🤔
Wow, that's a lot of good info, cool 😅
Finally, someone who knows what he is talking about, not doomers, "AGI" evangelists, or corporate preachers. 😁😁
1:04:39 Quantize all the scientists! 🥳
Just have to jump in around 23:49 to express horror that he suggested that journalists are part of the small set of 'ground truthers'.
And you believe that old scientists with perverse economic and political incentives are ground truthers as well?
@ I did not say that
I don't get this. Obviously LLMs have no concept of ground truth, and all their knowledge exists at the same ontological level, just tokens with some internal relationships. So the only way for them to have anything more than a probabilistic kind of epistemic certainty/uncertainty is to train in the sources of the knowledge we are feeding the model, and the level of confidence it can attach to the different sources, Wikipedia versus Reddit say. Over and above this, certain other practices of epistemic hygiene that humans adopt, such as self-consistency, logical soundness, and citing your sources, seem like they should be implementable in LLMs.
Is that basically RAG?
Not to take away from your point, but you would think the data and the training would impart some level of epistemic ranking and hygiene; i.e. discussion of the dependability (or not) of Wikipedia is abundant, so that would be reflected in how Wikipedia content sits in the weights.
This is already done. Reddit content, for example, is weighted by upvotes during training.
Pretty sure they already do that to some extent.
Not sure about the specifics tho
ABI - Artificial Broad Intelligence :D
Why should a model know everything? Just give it a search bar + Wikipedia. A model should be valued based on its intelligence and not its memory or knowledge.
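In that spirit, here is a toy sketch of the "search bar + corpus" idea using plain TF-IDF retrieval; the hard-coded passages, the prompt template, and the answer_with_context helper are all hypothetical, and a real setup would retrieve from Wikipedia or a search index and then pass the prompt to an LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpus; in practice this would be Wikipedia passages or search results.
passages = [
    "Thomas G. Dietterich is a professor emeritus of computer science at Oregon State University.",
    "ROT13 is a simple letter-substitution cipher that replaces a letter with the 13th letter after it.",
    "Gurobi is a commercial solver for mathematical optimization problems.",
]

vectorizer = TfidfVectorizer()
passage_vecs = vectorizer.fit_transform(passages)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query under TF-IDF cosine similarity."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, passage_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [passages[i] for i in top]

def answer_with_context(query: str) -> str:
    """Hypothetical glue: stuff retrieved passages into a prompt for some LLM call."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

print(answer_with_context("What cipher shifts letters by 13?"))
```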
The interviewer's final concern was whether humans would still be able to screw up science by being the final arbiters of first principles. YOU DON'T HAVE THE TOOLS TO CONTEXTUALIZE LIKE LLMS DO, GET OVER IT!!!!!
It's impressive what I can do with o1-preview, but it's also impressive how it can give you stupid answers to complex prompts.
Uh... no, everyone always wanted breadth, from before digital computers even existed. We just never knew how, and we still don't, but we learned, like a billion monkeys building a billion different models, that if you throw enough data at it and stir it with linear algebra long enough, with even the dumbest loss functions, eventually you get ChatGPT et al.
Great human being. We need more rational people around AI and fewer prophets.
Why is the host not on camera? It's weirding me out
"There is a distinction between simulating something and acutally doing it"
Perhaps this so, but not unless you can introduce real metrics the simulation neglects.
Otherwise you are left simply with the speculation "perhaps the simulations fails to account for thing that are real and omitted from simulated measurements"
I mean that is very liberal and agnostic interpetation.
But hardly an account for how and why a given simulation has failed.
This dude is speaking gibberish! LLMs don't "understand" or "execute" ROT. This is why they don't give the correct decoding.
❤️🍓☺️
First
arrrg **shakes fist**
"Playing chess is not statistically differen than using language"
Yes using langauge is more complex than playing chess.
But that simple fact does not entail the logical conclusion that "LLMs can only arrive at superhuman levels of using language based on occurence of language learned from breaking it down into sequential tokens"
Anybody should see why this argument immediately fails.
If frequency of occurance of how tokens statistically follow as a probility was the problem space, then with more compute anybody would be far more efficient to stack a frequency search ontop of a data base than it would be to ask a machine learning algorithin to find some better optimum, assuming the data cannot lie.
One method is far more efficient than the other.
Most people, especially in those with computer science degrees, cannot accept or grasp why.
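To make that contrast concrete, here is the toy frequency-table predictor referred to above: a pure bigram lookup built from counting. The corpus and names are invented for illustration; the point is that a lookup table has nothing to say about contexts it has never seen, which is where a learned model differs.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ran off".split()

# Count how often each token follows each context token (a bigram frequency table).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(prev: str):
    """Pure lookup: return the most frequent continuation, or None for unseen contexts."""
    if prev not in follow_counts:
        return None  # no generalization whatsoever
    return follow_counts[prev].most_common(1)[0][0]

print(predict_next("the"))   # -> 'cat' (most frequent continuation in this toy corpus)
print(predict_next("sofa"))  # -> None: the table says nothing about unseen contexts
```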
I don't think these "experts" mean to adopt bad-faith arguments,
but I will criticize them for not knowing better.
The latent space of an LLM is not accurately described by appealing only to the statistical frequency of token appearance in the training data it reflects as a model you can interact with.
Those two maps of prediction tables are not one-to-one... and that is more interesting than pointing out that the deviation is uninteresting because we can imagine some computational overlap that could be labeled "simulation, synthetic, or mere emulation".
I wonder if it's worth it to LLM-ize reasoning. One could gather quality data from smart people, such as scientists and Mensa members: 'What was a difficult problem that you solved? Describe it step by step in high detail and provide context.' Problem-solution pairs.
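If someone did collect that kind of data, a single record might look something like the sketch below; the field names and the JSONL format are assumptions for illustration only.

```python
import json

# One hypothetical record of expert step-by-step reasoning, stored as JSONL.
record = {
    "source": "interview with a domain expert",  # who provided it
    "problem": "Schedule 12 conference talks into 3 rooms with no speaker clashes.",
    "context": "Speakers submitted availability windows; two talks share a speaker.",
    "steps": [
        "List hard constraints: shared-speaker talks cannot overlap.",
        "Sort talks by how constrained they are and place the tightest ones first.",
        "Greedily assign remaining talks, backtracking when a room conflict appears.",
    ],
    "solution": "A conflict-free assignment of talks to rooms and time slots.",
}

with open("reasoning_traces.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```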
Can any human claim to be complete? If not then why would any human invention be complete? Incompletion cannot produce completion.
Wait, what about reddit? Did I actually contribute to something?
No. You don’t matter in the grand scheme of things
@bodethoms8014 BS, I'll be addressed as "the AI whisperer" going forward.
@Rockyzach88 AI Whisperer? More like Hallucination Whisperer. Whisper all you want, the AI still isn't listening.
@bodethoms8014 lol so angry
We got ChatGPT before GTA 6.
How are the newest models doing on TruthfulQA? Can’t find any evals on this recently. Why?
Thank you for the great food for thought 🫡🫡🫡🫡🫡