No AGI without Neurosymbolic AI by GaryMarcus

  • Published 4 Mar 2024
  • Gary Marcus
    No AGI (and no Trustworthy AI) without Neurosymbolic AI
    Ontology Summit 2024
    6 March 2024
    bit.ly/3P4YxYw
  • Science & Technology

COMMENTS • 58

  • @dg-ov4cf 4 months ago +9

    OK, but Claude 3 Opus is actually insane at generalizing its knowledge to reason intelligently

    • @ChaoticNeutralMatt 3 months ago +3

      I mean, even Claude Sonnet... well, I take it that they are the same model with different context limitations. Claude used to use big words but didn't really feel genuine in the way it engages with topics now. It's hard to really explain the difference, but I stopped using Claude before for a reason.

  • @melkenhoning158 3 months ago +11

    Gary Marcus is absolutely terrible at arguing a valid point about the state of AI. He constantly shows instances of these models screwing up something at inference, which all the scale-bros will just refute with their next beefed-up language model. He needs to stop pointing at bad inference; it makes him an easy target to dunk on when these models inevitably get trained to fix those failures. He needs to focus publicly on the flaws of the LM architecture alongside his examples of bad inference. I can't say whether his neuro-symbolic approach is the key to his "AGI", but his overall criticisms are mostly valid.

    • @MuantanamoMobile 13 days ago +1

      I don't believe his thesis is that Neuro-symbolic AI is THE key to AGI, rather that it is a more promising and empirically more probable path to AGI, one that could lead to the actual key, than autoregressive transformer-based ML architectures, whose limitations are a non-starter.

    • @melkenhoning158 8 days ago

      @@MuantanamoMobile well put, agreed

  • @veganphilosopher1975 3 months ago +3

    My new favorite science channel. Great content.

  • @dibyajibanpradhan7218 13 days ago +1

    Story time.
    When I was a kid, I used to read lots of science books, whether I understood them or not. The books were for people who were more advanced in these fields. So sometimes my teachers and my friends used to get curious, and they would ask me about what I had learnt. I had a vague understanding of the subject then, and I used to tell them this or that. Most of my answers were believable but not quite true. It was like a game of playing detective. I knew some facts and I tried to fit a story onto them. But people had established the facts before me, facts that held true. Perhaps I could have modified my stories had I had access to all of the facts. It maybe would not have made an interesting story, but it would have been a true one. I think this is the problem with today's AI; only it has all the facts, but not their relation to one another. In the case of image generation, it has to understand, or be programmed to understand, that it cannot work with the facts alone. How true things are arranged in a sentence can change the anticipated output into a statement that is untrue but desired. The AI is capturing a bunch of objects in the prompt and fitting them into the relations it has been trained on. (Horse riding an astronaut in a bowl of fruits)
    In the case of historically inaccurate astronauts, it is simply a case of bad instruction. It has been instructed to be diverse regardless of what you prompted it to do. Simply increasing the weights for the trigger word 'historically accurate' will not solve it, because it will also trigger the neurones which control the background, for example; and if the prompt was for historically accurate astronauts in a mediaeval setting, it would have produced hot garbage.

    • @dibyajibanpradhan7218 13 days ago

      To add some context: I used to make up stories of what I thought was true when I did not know something. It isn't that ChatGPT hasn't been programmed to doubt; it is simply lying through its teeth when it does not know something. The fact that it has no awareness of the immorality of lying is another story.

  • @shimrrashai-rc8fq 4 months ago +2

    This is exactly what I've been thinking - and trying to even dabble in some as someone with a lot of coding experience (right now mostly limited to playing around with building a proper parser to parse natural language using an _ambiguous_ formal grammar combined with a neural disambiguator). The sense I get with it is that neural components should be used to create "articulations" between components of an otherwise-engineered AI system, at the places where things could be "expected to get fuzzy".

    For example, consider an object recognizer. Ideally it should not just be a "black box". You should be able to supply an explicit polygon mesh model or the like indicating what it's supposed to recognize "should" look like "prototypically". In reality, of course, an explicit instance of the object will not be _exactly_ the polygon mesh - e.g. think a "real teapot" vs. the archetype given by the "Utah teapot" model in classical computer graphics. The neural part would bridge that gap - by learning, not how to recognize each object anew, but instead how to "deform" or "warp" _any given_ object to fit it to a candidate picture.

    Heck, we might envision a multi-stage system, where one neural stage deforms the mesh geometry, then it is passed to a conventional render engine to generate a comparison picture, then a second stage deforms/tweaks that comparison image at a pixel level for things like light, shadow, and obfuscation - which, one sees, could be factored out still further: e.g. with a shader in the pipeline one can have the renderer just generate lighting in the comparison image, and there'd be another neural network which adds and positions light sources around the object.

    In any case, once you have such a system, recognizing new objects is then as simple as just adding more poly mesh files to the system, which is exactly how it should be (and thus would likely result in a dramatically smaller data load), and the operation is mostly transparent - you can even tap the neural output to see just how it is deforming the object to try and recognize it. And the trick is to find the "sweet spot" of how much is "programmed" and "engineered" by conventional software engineering versus how much is "trained" by the neural joints between components, to minimize data demands (and thus allow, say, 100% sourcing of data from ethical sources) while maximizing performance.
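
    A minimal sketch of the hybrid recognizer described above, assuming PyTorch; the mesh format, the external `render` callable, and the module names (`DeformNet`, `AppearanceNet`) are hypothetical placeholders rather than an existing system:

    ```python
    # A rough sketch of the "neural articulations between engineered components"
    # idea in the comment above. The mesh format, the external `render` callable,
    # and the module names are illustrative assumptions, not an existing system.
    import torch
    import torch.nn as nn

    class DeformNet(nn.Module):
        """Learns to warp an explicit prototype mesh (e.g. the Utah teapot) toward a real instance."""
        def __init__(self, n_vertices: int, feat_dim: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(),
                nn.Linear(256, n_vertices * 3),   # per-vertex xyz offsets
            )

        def forward(self, vertices: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
            offsets = self.net(image_feat).view(-1, vertices.shape[0], 3)
            return vertices + offsets             # still an explicit, inspectable mesh

    class AppearanceNet(nn.Module):
        """Tweaks the rendered comparison image at pixel level (light, shadow, occlusion)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 3, 3, padding=1),
            )

        def forward(self, rendered: torch.Tensor) -> torch.Tensor:
            return self.net(rendered)

    def recognition_score(photo, photo_feat, prototype_vertices, deform, appearance, render):
        """How well does one explicit prototype explain the photo?
        `render` stands in for a conventional (ideally differentiable) rasterizer;
        recognizing a new object class is just adding another prototype mesh."""
        deformed = deform(prototype_vertices, photo_feat)   # neural "joint" no. 1
        rendered = render(deformed)                         # engineered component
        predicted = appearance(rendered)                    # neural "joint" no. 2
        return -torch.mean((predicted - photo) ** 2)        # higher = better match
    ```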

  • @rightcheer5096 4 months ago +9

    This talk is better than valium for a Doomer. Temporarily.

  • @mnrvaprjct 4 months ago +3

    This is basically discussing the notion of AIs running on neuromorphic architectures as opposed to LLMs running on classic computer architectures?

  • @true_xander 3 months ago +1

    "There is no physics theory where the objects can spontaneously appear and disappear"
    Well, I have some news for ya...

  • @dibyajibanpradhan7218 13 days ago

    About the glass breaking and the basketball example: maybe when it creates an environment like that, we can program it like the physics engine of a game.
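
    A toy illustration of that suggestion, in Python: a generative model could propose object states frame by frame while an explicit, game-style physics step enforces hard rules such as solidity; the one-dimensional ball-and-floor setup here is purely an assumption for illustration.

    ```python
    # Toy sketch: an explicit physics step enforcing constraints (solidity,
    # object permanence) that a learned generator might otherwise violate.
    from dataclasses import dataclass

    GRAVITY = -9.81  # m/s^2
    FLOOR_Y = 0.0    # a solid surface the ball may not pass through

    @dataclass
    class Ball:
        y: float   # height in metres
        vy: float  # vertical velocity

    def physics_step(ball: Ball, dt: float = 1 / 30, restitution: float = 0.7) -> Ball:
        """Advance one frame; the rule 'objects do not pass through solids'
        is enforced exactly, regardless of what a learned model would draw."""
        vy = ball.vy + GRAVITY * dt
        y = ball.y + vy * dt
        if y < FLOOR_Y:             # would have tunnelled through the floor
            y = FLOOR_Y
            vy = -vy * restitution  # bounce instead of disappearing
        return Ball(y=y, vy=vy)

    ball = Ball(y=2.0, vy=0.0)
    for _ in range(90):             # ~3 seconds at 30 fps
        ball = physics_step(ball)
    print(round(ball.y, 3))         # the ball settles near the floor, never below it
    ```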

  • @mitchdg5303 3 months ago +1

    Can the transformer network behind LLMs not develop a neurosymbolic system within itself in order to predict the next token?

  • @justinlloyd3 4 months ago +2

    LeCun would agree with you that language models simply being scaled up is NOT the future of AI.

    • @asta3457 3 months ago

      But he wouldn’t agree with this neurosymbolic gibberish

  • @50ci4l_T1lt 4 months ago +2

    I think you could use the principles of Otter AI in a more general AI. Otter AI listens to your conversation with another person and is able to tell you anything about the conversation. It doesn't add anything or take anything away. This property of only telling you about what was said could be extended to telling you anything relating to a subject, and only what is known about that subject. This avoids potential hallucinations.
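
    A minimal sketch of that grounding principle in Python: answer only from what is actually in the record and refuse otherwise. The toy transcript and keyword matching below are illustrative stand-ins for a real retrieval step, not how Otter AI actually works.

    ```python
    # Toy sketch: answer only from what is in the record, refuse otherwise.
    TRANSCRIPT = [
        "Alice: the demo is scheduled for Friday at 10am",
        "Bob: marketing still needs the updated slides",
    ]

    def grounded_answer(question: str, record: list[str]) -> str:
        keywords = {w.strip("?.,!").lower() for w in question.split() if len(w) > 3}
        hits = [line for line in record if any(k in line.lower() for k in keywords)]
        if not hits:
            return "Not discussed in this conversation."  # refuse rather than invent
        return " / ".join(hits)                           # only repeat what was said

    print(grounded_answer("When is the demo?", TRANSCRIPT))
    print(grounded_answer("What is the budget?", TRANSCRIPT))
    ```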

    • @true_xander 3 months ago

      The problem goes much deeper than that: current neural networks aren't able to distinguish between the domains of what was shown, what wasn't, and what is known at all. They lack the ability to "understand" things; I'd say they cannot understand things at all. Modern "AIs" are just large, complex pattern-recognition programs, good for some specific tasks and awful at everything else. But stupid people are easy to impress, so the world is holding its breath thinking it's some sort of tech revolution. No, it's not, and it's far from it yet. We just built things that can sometimes tell a picture of a cat from a picture of a dog, but have no idea what a "cat", a "dog", an "animal", a "pet", or a "picture" actually is.

  • @OpenSourceAnarchist 4 months ago +2

    This video has confirmed my own thoughts and analysis. I've been reading and studying linguistics and philosophy for too long... from a computational paradigm, this all just seems obvious?

  • @stfu_ayden 4 months ago +1

    Interesting ideas for sure. I think some new breakthrough definitely needs to occur.

  • @novantha1 4 months ago +5

    "LLMs cannot achieve a certain level of abstraction, knowledge, agency, and reasoning ability due to inherent limitations in the presentation of data, architecture, and methods of information propagation."
    This is not a direct quote from the presentation, but I feel that this is a summary of Mr. Marcus' opinions as presented in this talk.
    I'm...Not entirely sure this is "right", but it may be "correct".
    A lot of the examples given in this presentation were akin to cheap shots, in my opinion, a good example being the Sora video in which liquid for some reason just vacates the glass through a solid barrier, for lack of a better word.
    The ability to have water behave as well as it did through entirely "latent" means was remarkable in and of itself, so I don't feel that being reductive about it having a hiccup and claiming "there is no way this system, even if scaled, will ever produce a reasonable understanding of the world" is totally fair. I feel that taking into account the world we live in, one would also have to argue "Due to quantum tunneling, it's inconceivable we could have solid objects" in order to maintain an internally consistent worldview. Obviously anyone reading this has the benefit of scrolling down to read this comment precisely because scale in emergent systems can produce reliable behavior (allowing you to lay your hand on a mouse that functions as though it were a solid barrier).
    It's worth noting that even Sora probably isn't remotely near the number of neurons and synapses that a human brain has, and given the improvements we've seen in information density with sparse architectures, distillation, and quantization, I think it's not unreasonable to propose that a version 10x the size of Sora (in its quantized state), distilled from a variant of Sora 100x the size of that, could probably have a stronger latent understanding than we're seeing now, and much as quantum tunneling is evened out by an increase in scale stabilizing the probabilities of that tunneling, I think many inexplicable behaviors of Sora would be evened out.
    But another major point: MLP blocks are universal function approximators. Yes, there are levels of abstraction these models may have yet to understand, and yes, it would be nice to encode a bootstrapped understanding of those ideas inherently in the architecture, but given sufficient examples of a concept, and the neurons to model it, I would posit that you will find those ideas to simply be another unknown function.
    It's worth noting that at almost every stage of the development of neural networks as a field of study, people have thought "Oh, we have to encode a human-level understanding of these ideas into the structure of the network", and it's only with modern scales of data and neural networks that we've achieved it, not necessarily with architectural improvements. If you go back and replicate research from studies ten, twenty, forty, seventy years old, I think you'll find that the capabilities of those studies, when scaled to modern levels, are far more capable and "modern" than you would expect, simply due to scale.
    But I think there's going to be a fair bit of skepticism to this comment, and so I'd like to address a few ideas that come to mind.
    "We don't have the data to encode an understanding of these high-level ideas, because it's not exactly something found naturally on the internet, and even when it is, it's difficult to annotate that example in a way that could teach that high-level understanding to an LLM"
    Sure. Looking purely at language, it can be difficult to convey the nature of, for instance, transparent objects, and notably transparent objects as they relate to theory of mind. So don't convey it with language. Multimodality will likely offer a huge advantage in general world knowledge, and is probably a step closer to AGI, and we have a variety of modern software tools (Blender and so on) which can convey various important ideas. We can also do simulations in game engines for things like theory of mind to an extent, which could keep track of objects a specific figure has seen, for instance. I also think that multi-turn training, where LLMs interact with one another (or instances of itself) for things like theory of mind will also be a big step, in the sense that we're seeing their performance when they haven't really "interacted" in their training phase, they've only "learned passively" if you put it in human terms.
    "LLMs don't understand things! They only do statistical analysis"
    All right, sure. So why is it that they can generalize to moves that they haven't seen in chess? Why can they produce a relative understanding of the chess board from one dimensional data? Why can you remove all instances of addition equaling a certain number and still find that the model can perform that addition? Why can they produce a higher dimensional understanding of lower dimensional data?
    Because it's the most efficient way to model those predictions accurately. My suspicion is that this is related to "Grokking".
    It's also worth remembering that it's probably possible to break these ideas down into tiers of difficulty to understand, and the tier nearest to the current capabilities can probably be modeled in answers to prompts from humans, so we can probably produce synthetic data (which by its nature is remarkably well labelled and numerous) to solve a variety of high level understanding problems.
    "Look at this capability this model still doesn't have! It'll never achieve it!"
    Models will also never be able to write stories, tell jokes, function as agents, paint pictures, produce video, or annotate whether a bird is in an image or not. These are all capabilities that were said to be impossible, and have become possible over time, to varying degrees.
    Yes, it's possible that there are major things that they can't achieve currently, but in the face of the massive level of progress that we've seen, especially recently, I'm not terribly inclined to bet against neural networks achieving many things (though in the interest of being fair to a speaker who likely will never have the opportunity to defend himself against this comment, I'll allow that preventing a divorce resulting from an argument about the color of the blinds in a Home Depot may be one of them).
    It might be that we hit a scaling wall, beyond which more parameters cannot accurately convey information to long range dependencies, so for instance, it's not possible to move information satisfactorily from neuron 1 to neuron 1 trillion, and no scaling beyond that will help (I could get behind this opinion actually, it would explain why mixture of experts is so effective at larger parameter counts), but even if that's the case, I'm fairly confident that there should be something that can be solved in the workflow.
    Nobody said that these models have to produce the correct answer to a prompt in a single shot.
    What if it thinks about it in steps? What if it prepares a prompt for itself? What if it externalizes reasoning and mathematics to code? Can it produce images to visually reason through a problem? Can it produce thousands of responses and evaluate each one until it finds the right approach? Can it break the problem down into steps and solve it one by one in a multi-turn manner (not a simple Chain of Thought prompt)?
    There are problems that humans can't solve in a way that we expect LLMs to. What happens when they have access to the same tools and autonomy that we do? What happens when they have a "sticky note"?
    I'm not sure that we need anything more than refinement.
    To reiterate:
    I think you'll find that most failures of AI models going forward will be because they just didn't have the quantity of data, quality of data, or parameters to model "the last function".
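
    A minimal sketch of the multi-step workflow described above, in Python; `call_model` is a hypothetical stand-in for any LLM API, and the arithmetic checker is just one example of externalizing verification to ordinary code.

    ```python
    # Rough sketch of "steps, tools, and self-checks" instead of single-shot answers.
    # `call_model` is a hypothetical stand-in for any LLM API; the arithmetic checker
    # is one example of externalizing work the model is weakest at.
    import re

    def call_model(prompt: str) -> str:
        """Placeholder for an LLM call; assumed, not a real API."""
        raise NotImplementedError

    def check_arithmetic(answer: str) -> bool:
        """Re-evaluate every 'a op b = c' claim in the answer with ordinary code."""
        ok = True
        for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", answer):
            ok &= int(c) == eval(f"{a}{op}{b}")   # tiny digit/operator expression only
        return ok

    def solve(question: str, max_turns: int = 3) -> str:
        answer = call_model(f"Think step by step, then answer:\n{question}")
        for _ in range(max_turns):
            if check_arithmetic(answer):          # tool-verified, accept it
                return answer
            answer = call_model(                  # otherwise feed the failure back in
                f"Your previous answer contained an arithmetic error.\n"
                f"Question:\n{question}\nPrevious answer:\n{answer}\nPlease fix it."
            )
        return answer
    ```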

    • @smalltimep 4 months ago +2

      Thanks for taking the time to share your thoughts.
      I'm inclined to agree, though my logic is probably more reductive - it seems unlikely to me that all the work and development done with deep learning and neural networks has been in the wrong direction, and that something newer like neuromorphic architecture will quickly surpass and supplant it.
      I think you'd be interested to look at the work OriginTrail is doing on a decentralised knowledge graph, essentially providing a knowledge audit for a chatbot with the aim of reducing hallucinations.
      I think you'd also be interested in the Bittensor network for what they are doing with mixture of experts and decentralisation.

  • @loren1350 3 months ago

    Not enough people seem to understand that the currently popular batch of AI is essentially just leveraging very very complex averaging. Or maybe it's just that too many people think that's what intelligence is.

  • @Threchette 3 months ago

    Great talk and content, thank you!

  • @dibyajibanpradhan7218 13 days ago

    I think we should redefine AGI as something that learns how to handle chaos, or at least tries to. Because as humans we too have a problem with handling chaos. So, something like us or better.
    Goddamn! I said the same thing as this guy.

  • @glasperlinspiel 4 months ago +3

    Once you have that you provide an epistemological scaffolding that supports communicable reasoning. You also get a way to model human-similar psychodynamics. At least close enough for an AI to refine a predictive model.

  • @glasperlinspiel 4 months ago +2

    Think like a primitive cell, then it’s not so hard to get an ontological infrastructure that will eventually support sentience

    • @9000ck 4 months ago +1

      While true, there are many steps from a cell to a creature that expresses sentience. Some of those steps involve pathogens, tapeworms, and sharks.

    • @glasperlinspiel 4 months ago

      Yes, but I'm talking about ontological steps. In that case there are only (depending on how you count) 5 (three of which are variants on the second which is made up of 4 classes each composed of two sets that overlap).@@9000ck

    • @quantumastrologer5599 4 months ago

      Inspiring

  • @houseofvenusMD 4 months ago +3

    Thanks for sharing this discussion with us Dr. Marcus. As an urban farmer who has grown up with drones and plans to use them to operate my family's farm after university, this is a very pressing issue for me. I am actually more optimistic than you are with regards to the timeline for the "revelation" of AGI, but as an expert your perspective is invaluable. Stay visionary 🚀

  • @stcredzero 4 months ago

    A person (accustomed to using metric weights) has a neural network which contains data derived from the experience of lifting 1 kg vs. 2 kg. The current crop of LLMs do not have such data. They only have access to people writing about weights. If multi-modal models start getting trained with this kind of data, they will likely do much better with the 1 kg vs. 2 kg example.
    EDIT: Serious bit of misinformation at the end. Tesla's FSD 12 is an "end to end" model, or "pixels in, driving controls out." It does not need detailed maps. (That's the piece of misinformation. You say they all need this.) It's probably not familiar with driving patterns in Mumbai, but lots of human drivers wouldn't be able to handle that either without great alarm. Also, the Tesla architecture does have some mechanism for object permanence.

  • @ChaoticNeutralMatt 3 months ago

    I agree with the sentiment, but I would also disagree with how far it can take us. I can easily see AGI version 1 being an LLM, while future versions with better overall balance and function are hybrid models.
    As far as calling it such... well, I expect the first AGI won't be called AGI, for various reasons.
    That said, your complaints are warranted. You can tell when it doesn't fully grasp something.

  • @reinerwilhelms-tricarico344 3 months ago

    Now we need a lecture series about how to build machine-learnable neurosymbolic systems. It wasn't really clear what that actually is, other than being sobered by this talk into understanding that it's necessary. It will be hard to turn this big ship around. After so much hype and money has been spent on these new awe-inspiring AI fabulation machines, it's hard to tell people that the research has to go back to the drawing board in a very different direction.

  • @prevarikator 4 months ago

    18:10 is really important.

  • @reinerwilhelms-tricarico344 3 months ago

    So - Gary Marcus has no pet chicken named Henrietta? 😅

  • @SuperFinGuy 3 months ago

    You obviously haven't used Claude 3 Opus.

  • @glasperlinspiel 4 months ago

    Sounds like you read Amaranthine: How to create a regenerative civilization using artificial intelligence. With respect to AGI, ontology is destiny

  • @thesleuthinvestor2251 4 months ago

    The ultimate Turing Test for AGI is: Write a novel that, (1) once a human starts reading it, he/she cannot put it down, and (2) once he/she has finished it, he/she cannot forget it. How many years do you think we have to wait for this task to be accomplished by an AI?

    • @nilskp 4 months ago +1

      Why would that be AGI? That's just an improved LLM. AGI would solve most current problems, if solvable: self-driving, autonomous robots doing surgery, giving us a unified theory of everything, etc.

    • @thesleuthinvestor2251 4 months ago +1

      No AI today or in the near future, LLM or AGI, can write a novel. Ask any of the existing ones and see. They have no idea how to create a character, show it developing, demonstrate the character in action (showing rather than telling), have it speak differently from other characters, and develop a plot. No AI today or in the future can do that. I know AI (in 1994 I went to Hinton's classes), and have used them, and I also write books. As far as I know, no AI or AGI or LLM has even a hope of writing a novel.

    • @nilskp 3 months ago

      I don't think you understand what AGI means. Once we have AGI, whatever the timeframe, it will by definition be able to write a novel. @@thesleuthinvestor2251

  • @glasperlinspiel 4 months ago +1

    Neurosymbolism is epiphenomenal; you have to build up from an experiential ground of being.

  • @birgirkarl 4 months ago

    24:55

  • @user-mj2lm5fh1j 4 months ago

    Great video. I have used all the LLMs and they are all crap. I use them all the time, and I can't trust any LLM unless I already know the thing I am searching for. Also, none of these LLMs can pass my test of consciousness for AI.

    • @mykalkelley8315 4 months ago

      These AIs are pretty good until they are aligned (aka lobotomized).

  • @9000ck 4 months ago

    Gary Marcus doesn't realise he actually has a pet chicken.

  • @kubexiu 4 months ago

    What's ironic is that we have to give a goal, a target, to an AI. And no one knows what such a general goal would be. It has to be a general goal so you can call it general AI. And why is it ironic? Because everyone is afraid of any "aiming" artificial intelligence.

  • @glasperlinspiel 4 months ago +1

    Hmmm, 9:58: no, you need to read Amaranthine. For instance, planning is not as hard as you think, and reasoning is, well, read the book...

    • @dg-ov4cf 4 months ago

      is this suitable for non-CS majors?

    • @glasperlinspiel 4 months ago

      Yes, Amaranthine examines the bio-mimetic, ontological, epistemological, ethical, philosophical, psychological, economic, and organizational foundation for a type of AI that does not hallucinate like either an LLM or a human being. The author's aim is to develop an AGI immune to the biases that undermine civilizing behavior, in contrast to what is currently in development. The writer's Ph.D. is in psychology, but what makes the book powerful is its multi-disciplinary approach. The concepts are translated into procedural language to support programming, and in the penultimate chapter he does specify an "algorithm," but it is an ontological algorithm rather than an actual algorithm. The author has specified software and facilitated software development, but he is not a computer scientist. @@dg-ov4cf

  • @Probablee_Ashlee 4 months ago +7

    I get what you’re saying however it’s gonna be a no for me dawg.

  • @BryanWhys 3 months ago

    You're only referencing failures of basic, non-fine-tuned models, and you're only referencing the models themselves, not their behavior during proper application... You offer abysmal references to the GENUINE ontology and epistemology of machines (you know, the actual math and mechanistic interpretability), and only a weak synopsis of your bad anecdotal testing. Present real science, not a bunch of arbitrary and poorly executed use cases.