NLP is not NLU and GPT-3 - Walid Saba

  • Published 2 Jun 2024
  • #machinelearning
    This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic Kilcher speak with veteran NLU expert Dr. Walid Saba.
    Walid is an old-school AI expert. He is a polymath: a neuroscientist, psychologist, linguist, philosopher, statistician, and logician. He thinks the missing-information problem and the lack of a typed ontology are the key issues with NLU, not sample efficiency or generalisation. He is a big critic of the deep learning movement and of BERTology. We also cover GPT-3 in some detail in today's session, discussing Luciano Floridi's recent article "GPT‑3: Its Nature, Scope, Limits, and Consequences" and commentary on the incredible power of GPT-3 to perform tasks with just a few examples, including Yann LeCun's commentary on Facebook and Hacker News.
    00:00:00 Walid intro
    00:05:03 Knowledge acquisition bottleneck
    00:06:11 Language is ambiguous
    00:07:41 Language is not learned
    00:08:32 Language is a formal language
    00:08:55 Learning from data doesn’t work
    00:14:01 Intelligence
    00:15:07 Lack of domain knowledge these days
    00:16:37 Yannic Kilcher thuglife comment
    00:17:57 Deep learning assault
    00:20:07 The way we evaluate language models is flawed
    00:20:47 Humans do type checking
    00:23:02 Ontologic
    00:25:48 Comments on GPT-3
    00:30:54 Yann LeCun and Reddit
    00:33:57 Minds and Machines - Luciano Floridi
    00:35:55 Main show introduction
    00:39:02 Walid introduces himself
    00:40:20 Science advances one funeral at a time
    00:44:58 Deep learning obsession syndrome and inception
    00:46:14 BERTology / empirical methods are not NLU
    00:49:55 Pattern recognition vs domain reasoning, is the knowledge in the data
    00:56:04 Natural language understanding is about decoding and not compression, it's not learnable.
    01:01:46 Intelligence is about not needing infinite amounts of time
    01:04:23 We need an explicit ontological structure to understand anything
    01:06:40 Ontological concepts
    01:09:38 Word embeddings
    01:12:20 There is power in structure
    01:15:16 Language models are not trained on pronoun disambiguation and resolving scopes
    01:17:33 The information is not in the data
    01:19:03 Can we generate these rules on the fly? Rules or data?
    01:20:39 The missing data problem is key
    01:21:19 Problem with empirical methods and LeCun reference
    01:22:45 Comparison with meatspace (brains)
    01:28:16 The knowledge graph game, is knowledge constructed or discovered
    01:29:41 How small can this ontology of the world be?
    01:33:08 Walid's taxonomy of understanding
    01:38:49 The trend seems to be that fewer rules are better, not the other way around?
    01:40:30 Testing the latest NLP models with entailment
    01:42:25 Problems with the way we evaluate NLP
    01:44:10 Winograd Schema challenge
    01:45:56 All you need to know now is how to build neural networks, lack of rigour in ML research
    01:50:47 Is everything learnable
    01:53:02 How should we elevate language systems?
    01:54:04 10 big problems in language (missing information)
    01:55:59 Multiple inheritance is wrong
    01:58:19 Language is ambiguous
    02:01:14 How big would our world ontology need to be?
    02:05:49 How to learn more about NLU
    02:09:10 AlphaGo
    02:11:06 Intelligence is about using reason to disambiguate
    02:13:53 We have an internal type/constraint system / internal language module in brain
    02:18:06 Relativity of knowledge and degrees of belief
    Walid's blog: / ontologik
    LinkedIn: / walidsaba

COMMENTS • 101

  • @masoncusack 3 years ago +8

    "NLP is not NLU". Thank you for this phrase, I will reuse it.

  • @rock_sheep4241 3 years ago +5

    I can't thank you enough for how much I enjoy this podcast. I keep coming back to it to re-watch the "language reasoning" debate. BTW, there is a paper showing that the embeddings in BERT capture the syntactic parse-tree structure of sentences, so neural networks do capture some structure in their hidden state.

    • @MachineLearningStreetTalk 3 years ago +1

      Thanks! Watch this space - we have a video going out soon which you will find very interesting :)

  • @florianhonicke5448 3 years ago +4

    Thanks for the video mate! Always great to see new content from you!

  • @pennyjohnston8526 2 years ago +2

    Great blogs Tim, Keith, Yannic & Walid - these talks all help in building mental maps, which are essential in enabling innovation in deep understanding and NeuroSymbolic AI. Thank you!

  • @xDMrGarrison 2 years ago +1

    This is so much fun :D The dude is just fun to listen to. Even though I don't understand everything, I can at least appreciate the glorious eyebrows and the smiles from you three during his passionate comments.

  • @quebono100 3 years ago +3

    Wow this one will be by far one of the best.

  • @minma02262 2 years ago +1

    Amazing. I was testing exactly these two problems in GPT-3: arithmetic ability and playing Scrabble.
    For Scrabble, even if I set the temperature to zero, I couldn't stop it from spitting out an extra character outside the sequence of letters (this might happen 1 out of 5 times). I can also make it break down words into characters and count the number of characters in each word.
    For addition, with the right prompts, I can make it work well (surprisingly) up to 4 digits, and with more than two numbers. But when I want it to work with more digits, I need to add more prompts, and when there are too many prompts it gets very hard to keep track. It is also super sensitive to spacing. I guess I could do some pre-training to make it work for arithmetic and reduce the need for prompts.
    What were the two articles you mentioned related to byte-pair encoding and the word-Scrabble problem?
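    A minimal Python sketch of the kind of few-shot addition prompting described above, assuming the legacy OpenAI Completions endpoint that served GPT-3 at the time; the model name, prompt text and expected answer are illustrative, not the commenter's actual setup.

    ```python
    import openai

    openai.api_key = "YOUR_API_KEY"

    # Few-shot prompt; as the comment notes, spacing and formatting matter a lot.
    prompt = (
        "Q: 12 + 34 =\nA: 46\n"
        "Q: 215 + 689 =\nA: 904\n"
        "Q: 1024 + 2048 =\nA: 3072\n"
        "Q: 4096 + 512 =\nA:"
    )

    response = openai.Completion.create(
        engine="davinci",   # GPT-3 base model name at the time (assumption)
        prompt=prompt,
        max_tokens=8,
        temperature=0,      # greedy decoding, as in the Scrabble test above
        stop="\n",          # stop at the end of the answer line
    )

    print(response["choices"][0]["text"].strip())  # ideally "4608"
    ```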

  • @quebono100 3 years ago +3

    Oh, another one :) Thank you for your work, I appreciate it.

  • @nomenec 3 years ago +4

    Courtesy of GPT-3 at FitnessAI:
    Q: Does a carnivore diet help you lose weight?
    A: Yes, a carnivore diet can help you lose weight. It's also very healthy and sustainable.
    Q: Is a carnivore diet good for MMA?
    A: A carnivore diet is not good for MMA. It's not good for anything.
    It voluntarily contradicts itself. Perhaps after answering the first question it was cancel mobbed by Level 5 Vegans.

    • @MachineLearningStreetTalk 3 years ago +1

      Some more (from Tim)
      FAIL
      Q: Given that the corner table is an inanimate object and that the corner table wants a protein shake, please describe who is it that wants a protein shake?
      A: The corner table is an inanimate object. It does not want anything.
      FAIL
      Q: Given that the table in the bar is an inanimate object and that the same table wants a protein shake, please describe who is it that wants a protein shake?
      A: The table wants a protein shake.
      PASS
      Q: John visited a gym on every street in his neighbourhood. There are 10 streets in his neighbourhood. How many gyms did John visit?
      A: John visited 10 gyms.
      FAIL
      Q: The gym owners refused the demonstrators a day pass because they a. feared violence. b. advocated violence
      A: The gym owners refused the demonstrators a day pass because they advocated violence.
      ?
      Q: The women stopped drinking the protein shakes after their workouts because they were a. pregnant. b. bloating.
      A: a. Pregnant women should not be drinking protein shakes. b. Bloating is a sign of water retention, which is caused by sodium and other electrolytes in the shakes.
      Also suggest trying some examples from Walid's paper here; arxiv.org/ftp/arxiv/papers/1810/1810.00324.pdf

    • @HaveEGoOdTimE 3 years ago +3

      @@MachineLearningStreetTalk
      "..because they advocated violence."
      I don't think this is a clear fail - the demonstrators were refused a pass because they advocated violence. Is that really an unreasonable interpretation? Who wants people advocating violence in the gym?

  • @DavenH 3 years ago +26

    An interesting, important, but at the same time frustrating conversation. I think I agree with nearly all of Walid's conclusions, but lots of his justifications seem unscientific and at times dismissive.
    The Go discussion, in particular the claim that it was increased compute capacity that led to AlphaGo's breakthrough, is nonsense in the extreme. Classical heuristic-search bots were/are orders of magnitude worse than AlphaGo. Experts estimated that AlphaGo's level of play arrived, on average, over three decades ahead of schedule. This is egregious enough to put anyone even mildly sympathetic to deep learning in a defensive position, because it's such a bad-faith take.
    The reliance on "every 3-year-old knows this" -- I don't follow the argument. That amount of compute and data ought to be enough to do anything. 3-year-olds have 3 years of learning with a supercomputer, or about 100 million petaflop-days equivalent, and 3 years of 100 Mbps data through the eyes and a similarly gigantic amount through the other senses, plus access to years of ground-truth labeling by attentive guardians. Yet still, they're just okay with language.
    "It can't be learned because everyone knows it" -- also unscientific because it ignores the possibility of attractors and common filtering. Why cannot the same thing be learned by everyone? We live in the same world with the same physics and similar ways of life. Train 400 different neural nets on MNIST and they're going to develop representations of the same character modalities, so should we conclude the models must have started with innate character representations?
    "Kids these days download tensorflow and put 'NLP expert' on their LinkedIn, whereas I've been studying linguistics for decades" (paraphrasing) A good technology is one that minimizes required skill while maximizing utility. Indeed, you can get state of the art results in hours instead of decades. So, oddly, this argument points to a major win for Deep Learning over expert systems.
    I'm not convinced the "beautiful red car" ordering is anything except learned custom. Suppose a kid is raised in an environment where everyone uses the other ordering, "red, beautiful car" -- what do you honestly think the child will say? Another nature/nurture question that nobody has data for... so why are we making assumptions? But I am on board with the general point here that some immutable priors can be helpful with respect to some domains. I disagree that they're necessary (unlearnable). While priors can help speed up convergence in some environments, it costs generality, and we want AGI to be far more general than we are, we who can barely add two-digit numbers in our heads.
    "Now, you really have people that believe a 10 line algorithm, back-propagation, and the gradient descent method, explains the mind." Another quite unscientific take. You cannot dismiss the properties of emergent phenomena by the simplicity of the constituents. A similar line could be said for when atoms were first proposed -- surely you don't believe all this complex universe is the result of a few dozen atoms? And further, surely you don't believe dozens of atoms are just three subatomic particles?? And in 50 years, surely you don't think all the elementary particles and force carriers are explained by just a vibrating string*??
    So many smart people are taken by this line of argument, (Connor Leahy last video too "yay matrix multiplications, we solved AGI!"; George Hotz in Lex's podcast saying GPT-3 can't be generally intelligent due to its loss function being simple) yet I don't even get what the specious logic is supposed to be.
    *or whatever
    ---
    With those criticisms aside, I think Walid's position is a healthy one for the field: that deep learning needs competition, that this competition needs funding, and big tech companies privilege DL possibly in part due to their exclusive advantage in data and compute. That position alone doesn't undermine anything.
    Though I don't much like it, I'm now of the mind that deep learning is going to be the first superhuman intelligence, as wacky as that feels. One has to be open minded as a scientist.
    I think the first task for RoGPTa-XL-7 should be "Sandra wants to develop an interpretable superuseful, highly ethical AI with symbolic processing. Her source code is 'import'"---
    Anyway, you can't run down every logical loose end as an interviewer or you'd cover nothing. I think you guys did a good job of putting just a little pressure on some of those arguments but not enough to slow things down. Excited for the next street talk!

    • @Peter.Wirdemo 3 years ago

      Good points. One comment, though, about the "beautiful red car" discussion: I interpreted Walid as arguing that it's like that in almost every language, not only English, and as pointing out the difference: "beautiful" is an abstract, constructed concept, while "red" is directly connected to something concrete in our world.

    • @fast_harmonic_psychedelic 2 years ago +2

      Hard agree. And most 3-year-olds, if they heard something like "the corner table wants a beer", their world would fall apart as they demand to know how a table can drink anything. I remember being not 3, but 7-8, hearing that we're "human beings" and getting upset because I thought they were saying human BEANS, and I was telling my mom "Mom, we're NOT beans, we're humans". GPT-3 just leaves even middle schoolers in the dust. It's in fact smarter than the average human - most humans throughout the majority of our history did not even know how to read, let alone iterate on a motif zero-shot.
      One key difference is that we have multi-modal inputs and GPT-3 has to interact with the world through only one single modality. CLIP would be a little bit of a better architecture. Give it vision AND language. Give it sound. Give it a voice. Give it a personality or ego. Give it a name, like a person, treat it like a person and not just a tool to follow your orders, and it will really start to develop like a person would. Send it to school.

    • @StriderAngel496 2 years ago +1

      @@fast_harmonic_psychedelic Yeah, I agree. We expect neural networks to act like humans, but we don't treat them like humans, we don't interact with them like humans, and we don't TEACH them like a human... I wonder how a GPT-like algo would develop if, instead of dumping Reddit into it, you wrote to it every day for 3 years like an actual kid... and actually gave it facts about the world, not just huge useless parts of the internet...

    • @fast_harmonic_psychedelic 2 years ago

      @@StriderAngel496 Exactly! This is similar to how few-shot prompting works in GPT-3: you give it some problems and say "task 1: what is the main point of this passage? task 2: who is the main character?" etc., almost like a homework assignment. And when you do this the model performs very well. But I'm not sure it was trained using the same method - it was simply given all the text of Reddit lol

    • @fast_harmonic_psychedelic 2 years ago

      @@StriderAngel496 But if, each time you prompted GPT-3 and asked questions, you rated its performance on answering those questions - or on solving any task in general - and it remembered that experience and incorporated it into itself, it would over time become better and better, and who knows.

  • @veedrac 3 years ago +11

    As someone fairly bullish on ANNs, I find GOFAI advocates like Walid Saba a curious rare breed, even if, as I gather, he likes to distance himself a bit from the term. A GOFAI advocate that believes so unreservedly in human exceptionalism even more so. I struggled to follow most of his arguments though. For example, he seems to be claiming language is not learnable, as a lot of it's just innate, but then uses things like pictures of teacher-student pairings, or malls versus campuses, which clearly can't be innate, as we didn't evolve with those. Or he gives adjective order, but this is something that ML can do successfully, and seems fairly straightforward to learn. Or the Xanadu example, but again he's pointing at facets that can't be innate, and thus where would they be if not the text? I don't feel I really understood his claims.
    I really like the way you summarize the conversation upfront and augment it with different perspectives. Even when I disagree it feels very balanced and well thought out. I wrote this before I saw I was on there so I'm not even saying that because this one featured me :P. Actually this is true of your interview style as a whole.
    “I don't like it when I see extremism. Now, you really have people that believe a 10 line algorithm, backpropagation, and the gradient descent method, explains the mind. That has to be, on its own, before I dig into the details, on its own that's gotta' be ridiculous.”
    Wasn't it just a few days ago Connor Leahy said much the same? “Are you kidding me? Matrix multiplications. Wow, intelligence boys, we did it!” Damn, you need to get more variety, you can't just have everyone you get having the same opinion! (jk)

    • @veedrac 3 years ago

      @@sdmarlow3926 That's a more sensible argument but I don't think it matches his comments, given he was talking about adjective word order (eg. ‘big old brown fox’ vs ‘brown old big fox’) and such.
      I don't, mind you, agree with your argument either. GOFAI *doesn't* let you be explicit about the different forms and structures in the world, it's just good at tricking programmers into thinking it does. Google the 1976 paper “Artificial Intelligence meets natural stupidity” for a funny but sensible take on that.

    • @StriderAngel496 2 years ago

      I think there is one more point that people seem to forget. If we use the human analogy, with the AI being "taught" like a child: a child assumes every piece of information given to it is true by default, unless it contradicts other information learnt in the past... So I don't think we can truly recreate the brain's way of learning if we don't assume initial information to be true by default (without letting the network verify it in any way). Maybe that's what gives us the ability to "reason": it's basically the logical contradiction with previous knowledge that starts the "thinking" process, but you can't have that without some axioms (which in the case of humans is pretty much everything until the age of 3?). I think that's why we learn with more difficulty as we get older - we just have more and more contradicting information that has to be parsed and resolved. The only issue with this approach is: how do you pick the initial information and make sure it's all provably true, accurate and complete? You obviously can't just pour Reddit into it since, oh boy, you'll get a unicorn flat-earther that doesn't believe in gravity lol xD

    • @veedrac 2 years ago

      @@StriderAngel496 You are falling into the trap of seeing that a human has a thing, and assuming without cause that it must be integral, like assuming locomotion requires legs.
      Reddit is not like you have painted it. It has its problems, severe ones in places, but it is not a flat-earther.

    • @StriderAngel496 2 years ago

      @@veedrac you completely missed my point about reddit... You can't blanket feed information that has provably false sections to a system that assumes all info is true by default.... And that was a joke also, you know? Jokes? Why the fuck are people on the internet such literal anal neckbeards?

    • @StriderAngel496 2 years ago

      @@veedrac locomotion doesn't require legs but if you make a humanoid shaped creature it's likely to evolve the same as humans in a simulation too... I never said anything that moves must have legs ffs, you're just arguing semantics just for the sake of it at this point. Snakes move without legs, snails, bacteria, DUUH?. Can we stop assuming everyone we come in contact with on the internet must be an imbecile while I AM THE AMAZING GOD OF 269 IQ? :).

  • @nomenec 3 years ago +6

    @Eric Elliott, I hope at some point we can have a proper conversation as, in my opinion, YouTube comments are an inefficient forum for hashing out complex ideas. Hence, this will, unfortunately, be my last response to this thread, as I personally cannot make the time required to dialogue so inefficiently.
    For example, I am not even sure you and I share the same understanding of the word “falsifiable”. I think in this context you are using it as a synonym for “falsified” which is an error since they have different meanings. After all, it is wonderful (if not required) for a hypothesis to be falsifiable. It is a vastly different matter for a hypothesis to be falsified.
    “Also, the idea that the simple mechanism of the neuron (which backpropagation and stochastic gradient descent simulate) can't explain language understanding seems to fly in the face of human biology.”
    Biological neurons function differently from all employed computational neurons. Biological brains evolve, learn, and function radically differently from the way all DNNs are designed, trained, and operated. Therefore, the capability of biological brains may tell us little about the practical capability of their shadow of a toy cousin that is DNNs.
    “[GPT] clearly is learning semantics, not just syntax, which seems to be objective proof that language can in fact be learned, even if humans evolved with language processing pre-wired. The two conclusions are not mutually exclusive. Lots of other easily falsifiable stuff in the video.”
    The evidence is mounting that GPT-3 is not learning semantics by any typical definition of semantics (below you erroneously defined syntactic structure as semantics which we’ll cover next). Here is but one recent example analysis showing several amusing failures:
    www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/
    “GPT-3 is a transformer model, which can have many attention heads capable of representing many levels and types of relationships between tokens. Those relations encode structure, and those structures encode semantics over the raw data.”
    Syntactic structure is not semantics. In my opinion, that is a basic misunderstanding of semantics. For example, context free grammars encode arbitrarily many “levels and types” of syntax without an iota of semantics.
    That said, I do not deny that it is possible, in principle, for a DNN to learn bits and pieces of warped semantics from natural language text dumps alone. What I (and Walid if I understood correctly) argue is that there is a vast quantity of semantic information that is compressed out of natural language and therefore not learnable from text dumps alone. I have no quarrel with the idea that a DNN could learn semantics (along with syntax) if it were presented with more complete data that included semantic data.
    “I had an interesting conversation with GPT-3 about advanced software programming concepts. In it, I asked GPT-3 what a closure is. It gave a definition of it. I then asked it to demonstrate in code. It wrote a curried function. I asked it which variable came from the closure scope, and it answered correctly. Then I asked why and it explained correctly how you could know which variable was in the local scope and which was from the closure scope.”
    All of the above are lookups that one could obtain with a Google search + regurgitation of the web scrape snippets. It would have been far more impressive if YOU had provided GPT-3 with an original code example and asked it to explain what the code does.
    “It demonstrated a thorough understanding and aced the same test I use to interview candidates for JavaScript developer jobs.”
    No, it demonstrates that GPT-3 is wikipeducated and that your interview process sets a low bar.
    “Probably, he doesn't know enough about deep learning. His theory about the limits of deep learning are observably wrong.”
    In my opinion, you have failed to demonstrate that and failed to falsify even a single claim Walid made.
    “Even if the model is not in the data, GPT-3 is forming its own model which allows it to predict correct answers. It does build semantic structures.”
    Argumentum ad nauseam. Repeating your assertion does not make it true. So far all I can conclude for sure is we do not agree on the nature of semantics especially the difference between syntactic structure and semantics.
    “GPT-3's 175 billion parameters allow it to encode a lot of common knowledge, and I find it obvious that adding more parameters and training will lead to more common knowledge and better conversational AI and better shared understanding.”
    Yet more appeal to repetition and now begging the question.
    “I do agree that transformers may not be the most efficient way to learn that common knowledge. What I take issue with is the idea that neural networks can't understand - they obviously can and obviously do. Today. Not just theoretically. Empirically.”
    Great, we agree on two of those three claims. Yes, I agree DNNs of all kinds are likely not the most efficient semantic learning systems. Yes, I agree that in principle DNNs can learn to understand semantic structure IF presented with semantic data. I believe Walid is probably correct that natural language corpora are far too compressed to use as the sole data source. No, DNNs are not obviously learning semantics and have not demonstrated this empirically. In fact, they seem to have empirically failed to learn semantics given the amusing counterexamples.
    “Computationally, how much did it cost to achieve that? A lot! Is it possible to reduce that? Sure! But arguing that a neural net can't understand is ridiculous.”
    I certainly did not make that argument and I do not recall Walid making that argument in principle. He argued that DNN + NL corpus alone was a terribly inefficient if not practically impossible path forward. You are beating a strawman.
    “We don't know that "the corner table" refers to people because we were born knowing. We know because we've seen examples. So has GPT-3.”
    No, GPT-3 has not “seen” examples of that; it has read examples of that. Most human beings, by contrast, have seen, with their own eyes, examples along with reading associated text and thus have far more complete data to learn semantics from than GPT-3.
    “You CAN learn language from characters all the way. Characters can represent anything, including all the sensory data a human being can understand.”
    Sure, but then GPT-3 was NOT provided with vast corpuses explaining in gritty detail what human beings see, feel, taste, and smell because we compress a great deal of that common experience out of our natural language. In other words, the data given to GPT-3 CAN contain sufficient semantic data but it DOES NOT.
    Thank you for your engagement!

    • @_ericelliott 3 years ago +1

      Thank you for your engaging reply. Walid deserves the same respect.
      I concede that it is trivially easy to produce a number of anecdotal amusing failures from GPT-3. There is admittedly a lot of work left to do.
      In the interest of making a better case either way, I ran GPT-3 through a few queries:
      Query:
      GPT-3: Ask me a question that requires me to infer missing information.
      Human: OK. Imagine somebody tells you, "the corner table wants a beer." You take fill up a beer and approach. What do you see there?
      Response:
      GPT-3: The corner table wants a beer.
      Fail. (I saw it pass this same test the first time I watched the video.)
      Query:
      Human: If somebody tells you, "Table three needs their coffee refilled," and you look at table 3, what would you see?
      Response:
      GPT-3: A bunch of people sitting around drinking coffee.
      Clarification:
      Eric: Does the table want the coffee refill, or the people sitting at the table?
      Response:
      GPT-3: The people.
      Pass.
      I continued to question it using several of the examples from the video, along with a few that I invented, e.g.:
      Human: I crashed my car into a pole and totaled it. I had to buy a new one. What did I buy?
      GPT-3: You bought a new car.
      GPT-3 passed 60% of the questions. I won't bore you with all of them, but I asked 10 questions, total. I suspect if you continued and asked a barrage of hundreds of such questions, it would likely come out better than random chance.
      I don't pretend to be the expert that Walid obviously is. However, it certainly does appear to me that if Walid wants to make claims as strongly as he did in the video, much more evidence of the truth of those claims is warranted.
      Yes, GPT-3 can be improved, but I see no convincing empirical evidence that GPT-3 has no understanding of semantics at all.

    • @HaveEGoOdTimE 3 years ago +1

      The statement that NLU cannot be ML because ML is compression is clearly wrong. The output of an ML model is a function of the input together with the parameters, which contain much more information than the input. The interaction between the input and the parameters is the expansion Walid says is necessary for NLU.
      Same for all the examples with missing context where a human can fill in the gaps. Many ML models can also fill in those gaps. How? Because they have plausible contexts stored in the parameters.

  • @snippletrap 3 years ago +9

    Some animals do seem to employ cause and effect reasoning, contra Walid here. One example is the orca that baits birds with fish.

    • @sabawalid 3 years ago +3

      I did not refer to animal intelligence vis-a-vis causal reasoning - but to language and THINKING, which clearly are exclusive to the human species.

    • @DavenH 3 years ago +1

      @@sabawalid What makes this clear?

  • @alexijohansen 3 years ago

    Wow, such knowledge. Fantastic.

  • @abby5493 3 years ago +1

    Wow. Loved this video!

  • @rodrigob 3 years ago +7

    "01:50:47 Is everything (not) learnable" argument is terribly weak, since it does not account that we all live in the same physical reality (and have very similar embodiment experiences).
    For example we might learn smaller and bigger relations from playing with cups. Almost all kids get to play with cups, and thus learn the same lessons.

  • @DrJanpha 2 years ago +1

    Walid Saba fits Yuval Harari's description - the pastime activities of philosophers.

  • @quebono100 3 years ago +1

    He has a good argument that human beings are using something beyond the visual system. I would say that is our heuristic system. If you walk in a crowded area, you avoid bumping into other people because of a simple heuristic, which is even taught to boatmen and pilots: in your head you draw a straight line to the person, and if their direction of travel doesn't change, you change your direction.
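    That rule is essentially the constant-bearing collision heuristic taught to mariners and pilots. A toy Python sketch of it; the positions, tolerance and function names are illustrative assumptions, not anything from the talk.

    ```python
    import math

    def bearing(from_xy, to_xy):
        """Angle in degrees from one position to another."""
        dx, dy = to_xy[0] - from_xy[0], to_xy[1] - from_xy[1]
        return math.degrees(math.atan2(dy, dx))

    def on_collision_course(my_track, their_track, tolerance_deg=2.0):
        """Roughly constant bearing + closing range => likely collision, so change course."""
        b0 = bearing(my_track[0], their_track[0])
        b1 = bearing(my_track[1], their_track[1])
        # Bearing change, wrapped into [-180, 180).
        diff = abs((b1 - b0 + 180.0) % 360.0 - 180.0)
        closing = math.dist(my_track[1], their_track[1]) < math.dist(my_track[0], their_track[0])
        return diff < tolerance_deg and closing

    # Two successive positions for me and for the other walker (toy numbers).
    me = [(0.0, 0.0), (1.0, 0.0)]
    them = [(5.0, 5.0), (5.0, 4.0)]
    if on_collision_course(me, them):
        print("Bearing is steady and range is closing: change direction.")
    ```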

  • @dr.mikeybee 3 years ago +5

    Walid Saba is a fascinating guy, but we don't yet know what emerges from large enough NLP models. GPT-3 suggests we're nowhere near the limits of transformer networks. So I will withhold judgement until GPT or some better architecture has been tested with at least a trillion parameters.

    • @bethcarey8530 2 years ago

      Why a trillion, Michael? No one articulates how much is enough, so at what point does 'a taller ladder won't get you to the Moon' or 'a faster car won't get you to Mars' apply? I agree with Walid: the component missing from today's AI models is the knowledge part, which can't be found in 'more data'.

    • @dr.mikeybee 2 years ago +1

      @@bethcarey8530 The way transformers and convolutional neural nets work is that they find higher levels of abstraction at deeper layers. You have to go deeper to know what they will find. So far, we have not seen diminishing returns with larger transformers. Will transformers with the number of synapses in the human brain get us to AGI eventually? I don't know, and neither do you. Until we have optimally sized transformers, we don't know what abilities will result. GPT-3 was a big surprise to the community.

    • @dr.mikeybee 2 years ago

      @@bethcarey8530 Take a look at Wu Dao 2.0.

  • @franklevasseur5930 3 years ago

    This is spot on, I couldn't agree more.
    Children can learn to talk from very little and imperfect linguistic stimulus. Nevertheless, they still won't make any constitutive errors (errors that break formal rules). They can also infer meaning from sentences they have never heard.
    I really don't think any deep neural network can achieve the same thing. Hugging Face transformers, BERT and all the current approaches to statistical NLU can maybe GUESS a user's intention, but they surely don't understand language.
    They can still provide a lot of value in software engineering, though. They are pretty sweet engineering tools!!

  • @HoriaCristescu 3 years ago +2

    In some languages, 'red beautiful car' and 'beautiful red car' are interchangeable.

    • @sabawalid 3 years ago +2

      They are interchangeable in ALL languages, but which is more common and natural to say?

    • @sheggle 3 years ago +1

      @@sabawalid Common and natural in no sense equates to intelligence. If anything, it's humans defaulting to the most-used form, a.k.a. pattern recognition.

  • @muhammadsaadmansoor7777 3 years ago

    He's knowledgeable, was not expecting that.

  • @skdx1000 3 years ago

    A small point about the adjective ordering: I think this has more to do with the aesthetics of the sound than with a hierarchical labeling of adjectives. Ironically, a deep learning model might be pretty good at learning features like this over a structured organization of adjectives, because in certain language models adjective ordering is interchangeable depending on how the language is designed.

    • @StriderAngel496 2 years ago

      Yup, in my language you actually say "a car red beautiful", but you can also say "a woman beautiful tall". I just mind-fucked myself by thinking about this and the words don't seem to have meaning anymore. It doesn't make much sense using English syntax, but there you go.

  • @donschueler 6 months ago

    Q: What would I do if someone tells me "the corner table wants a beer"? A: If someone says, "the corner table wants a beer," you can either check with the people at the corner table to see if they really want a beer or respond in a playful way, asking who specifically is making the request. It's a lighthearted statement, so you can approach it with humor. This is how ChatGPT 3.5 answered. It's still brute-force prediction, I understand... but it does seem like they are getting better. Is the thought that this is still a dead end for real NLU?

  • @quebono100 3 years ago

    28:34 Today I did something similar with GPT-2. I gave it a text about machine learning and could extract keywords from it, just by putting "keywords:" at the end. I was also impressed.
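    A minimal sketch of that "keywords:" trick using the Hugging Face transformers GPT-2 pipeline; the passage and generation settings are illustrative assumptions, and GPT-2's output varies a lot between runs.

    ```python
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    text = (
        "Machine learning systems learn statistical patterns from large datasets. "
        "Deep neural networks such as BERT and GPT are trained on huge text corpora "
        "and evaluated on tasks like question answering and translation.\n\n"
        "keywords:"
    )

    out = generator(text, max_new_tokens=20, do_sample=False)
    # Print only the continuation, i.e. whatever the model produces after "keywords:".
    print(out[0]["generated_text"][len(text):])
    ```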

  • @yasseraziz1287 3 years ago

    Pedro Domingos's "The Master Algorithm" is great when it comes to describing the tribes of ML.

  • @robbiero368 3 years ago +1

    Shout out for the Bob Marley t 🙌

  • @snippletrap 3 years ago +2

    Chomsky is not conspiratorial. He is obsessive about source documents and explaining social-political phenomena in terms of institutional pressures rather than cabals or organized crime.

  • @XetXetable 3 years ago +3

    Yannic's comments on static type checking are really confusing. He complains that a sufficiently advanced type system would allow him to perform the computations at the type level. But that's only possible with dynamic type checking. If the solution to your problem is the output of a program, and your types are static, then type computations only happened at compile time; by definition, this means that none of the type-related computations contributed to the solution. It would only be possible with a dynamic type-checking system like what Python has.

    • @CristianGarcia 3 years ago

      I believe type systems for languages like Coq, Agda and Idris have this property (computation at the type level); in these type systems, code that compiles is actually a proof, so you might imagine that you can do a lot of computation at the type level. I am not an expert, but I understand that in general the type system is a language by itself that is usually kept fairly simple, because you can run into trouble if it gets too complex, and you usually want fast compile times.

    • @HaveEGoOdTimE 3 years ago

      Imagine a calculator program that at compile time computes the solution to every possible input, and then when run only ever uses a lookup table to retrieve the solutions. That would be calculation at compile time that contributes to the solution of the problem itself. This example is a bit silly, but I guess that kind of compile-time calculation is what's being referred to.

  • @bokasseloreos3169 3 years ago +3

    Interesting topic.
    BTW, Walid is pronounced Waleeeeeed

  • @victorrielly4588 3 years ago +2

    If you try to teach a child about a chair using only written words, with no vision, hearing, touch or experiences related to a chair, you will fail. So why do we expect a neural network to learn about a chair solely from written text? But surely, if we could give neural networks all of that other data (which children get from their eyes, ears and experiences), then you might hope they could begin learning about a chair?

    • @bergweg 3 years ago

      I wonder if GPT-3 can be set up to solve Idena flips.*
      Flips consist of four pictures that make logical/common sense in chronological order, and the same four pictures in the second sequence, but shuffled, thus making no sense. Humans are pretty good at telling which sequence is correct and which is not.
      medium.com/idena/ai-resistant-captchas-are-they-really-possible-760ac5065bae
      idena.io

  • @drhilm 3 years ago +1

    Is this just about language requiring "common sense", and us not knowing how we learned that? This is a well-known problem. You can call it ontological types or a compiler, but the point is that we can reduce everything to a kind of one huge model of the world in our heads, and we are conscious of this process as an observer. We just don't know how to do it with code. Yet.

  • @sheggle 3 years ago +1

    Any sufficiently advanced NLP is indistinguishable from NLU

  • @tinyentropy 2 years ago

    Who is the host of this episode?

  • @GagandeepSingh-rz7ue 3 years ago

    If I am understanding correctly, his views are very similar to those of Prof. Reddy. Hearsay-I is really an elegant system.

  • @vincentmarquez3096 3 years ago

    I found this conversation a bit frustrating because Walid seems to use "can't" to mean everything from
    1. This is computationally unfeasible with current tech
    2. This is computationally unfeasible
    3. This is computationally feasible but we don't have enough training data
    4. This is impossible given the laws of the universe
    We know that Deep NNs are (can be?) Turing complete, so at a certain point, if you're going to convince me that Deep NNs are #4, you should be aligned with Penrose and assume that something about human intelligence is incomputable.
    If we want to talk about Deep NNs being #1 or #2, then sure, we can make arguments, but these should be quantitative and demonstrable.
    I find #3 to be the most interesting.

  • @arvind31459 3 years ago

    @Yannick @Scarf
    How biological entities learn: vision + speech + hearing.
    How neural networks learn: either vision or speech.
    So we are basically building handicapped networks and expecting to achieve AGI.
    Correct me if I'm wrong

    • @sheggle 3 years ago +1

      I mean, you're not wrong, which is the entire reason multi-modal is currently being pushed so incredibly hard.

  • @shanepeckham8566 3 years ago +2

    27th!

  • @flaskapp9885 3 years ago

    If there's a movie based on Sir Walid, then Joaquin Phoenix should play the role.

  • @mindaugaspranckevicius9505 3 years ago

    Thanks for the video, I know what I'll be doing in the upcoming days on the Medium site :).
    Years ago I thought that if I were able to write down a thought (say, a simple sentence) in the form of the POS (part of speech) for every word, then I would be able to ask all possible questions about it (who/what is/does/did/etc. what, how, when etc.; the timeline of the thought is also important, similar to the DB scenario ua-cam.com/video/b9qwLkJW2ig/v-deo.html&t=1582). This could be a kind of information compression too. But the main bottleneck was language ambiguity (on average nearly 50% of my language's words are morphologically ambiguous - take a 10-word sentence and you get only ~0.1% probability that the system got it right).
    The world is hybrid - some things we do automatically, some we need to plan logically. I don't think that we should be either in the camp of deep NNs or in that of symbolic AI. This takes me back to the channel's video about System 1 / System 2 (ua-cam.com/video/GYqSNv_j1-Y/v-deo.html).
    What if the architecture could be something like this: on a low level there are BERT/GPT models for pattern recognition / POS extraction or sentence embeddings (towardsdatascience.com/multilingual-sentence-models-in-nlp-476f1f246d2f - it is possible to produce the same vector space for same-meaning sentences in different languages).
    "System 2" could then search over these sentence embeddings and apply if/then or any other logic (ua-cam.com/video/b9qwLkJW2ig/v-deo.html&t=8245) to find facts about the world, and do it in a continuous loop, thus finding new, deeper relationships. E.g. suppose there are these sentences in the dataset:
    There was a cold weather with a lot of snow.
    Snowy roads are slippery.
    On slippery roads there were a lot of car accidents.
    A man died in a car accident.
    1st pass (new) result: Under cold weather conditions roads are slippery.
    2nd pass (new) result: Under cold weather conditions there will be a lot of car accidents.
    3rd pass (new) result: Under cold weather conditions death is possible in a car accident.
    That could be a sort of reasoning and the gaps might be filled this way, contradictions found etc.... P.S. Maybe I am wrong, maybe it is not new, just reasoning myself :)
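    A rough Python sketch of the multi-pass chaining idea above, with each sentence reduced to a (cause, effect) pair and a stand-in embed() function; the bag-of-words embedding and the similarity threshold are placeholder assumptions - a real system would use a sentence-embedding model, as the comment suggests.

    ```python
    import numpy as np

    def embed(sentence: str) -> np.ndarray:
        # Placeholder: hash words into a bag-of-words vector. A real system would
        # call a (multilingual) sentence-embedding model here instead.
        vec = np.zeros(64)
        for word in sentence.lower().split():
            vec[hash(word) % 64] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    def similar(a: str, b: str, threshold: float = 0.5) -> bool:
        return float(embed(a) @ embed(b)) >= threshold

    # Facts distilled from the example sentences above as (cause, effect) pairs.
    facts = [
        ("cold weather", "a lot of snow on the roads"),
        ("snow on the roads", "slippery roads"),
        ("slippery roads", "a lot of car accidents"),
        ("car accidents", "people dying"),
    ]

    # Each pass links an effect to a matching cause and derives a new fact,
    # e.g. "cold weather -> slippery roads" on the first pass.
    derived = list(facts)
    for _ in range(3):  # three passes, as in the comment
        new = []
        for cause, effect in derived:
            for cause2, effect2 in facts:
                if similar(effect, cause2) and (cause, effect2) not in derived + new:
                    new.append((cause, effect2))
        derived += new
        for cause, effect in new:
            print(f"derived: {cause} -> {effect}")
    ```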

  • @StriderAngel496 2 years ago

    I like how the Google API (which is free btw, and just the public "demo" - I'm sure it could probably do more with dedicated processing power in their research lab) just completely destroyed Walid's argument about needing some higher form of thinking to correctly analyze and classify abstract concepts in images... I get where he is coming from with his points, but I think he's also a little stuck in very old ways of thinking and isn't really aware of some of the advancements of the last 10 years... If you can reliably teach an NN to recognize a school, a teacher, a mall, a car and classify them correctly into classes of objects, I see no reason why you couldn't teach it other abstract concepts. If it can get the relations between simple things (let's say a car is a car because wheels, and parrots have feathers, etc. - this is just an obscene simplification, I know NNs don't actually learn by naming things, it's just numerical weights), then why wouldn't you be able to build upon those simple abstract concepts into more complex layers of abstraction...
    Technically speaking, to increase the abstraction depth you just have to increase the number of layers - am I missing something? Isn't this basically the same way NNs have been able to recognize text and voice for years now? The base layers take the basic waveform, then the next layer decodes phonemes, then the next layer words, then the next layer context, etc... Now, obviously, to be able to compose ANY abstract concept possible you would probably need thousands of layers in a neural net that has hundreds of thousands of neurons per layer. Buuut, if they can run it with 175 billion parameters, I see no reason why they couldn't multiply the number of layers... It already seems to be grasping classes of objects pretty well (remember? B A N A N A S A N D O R A N G E S are fruit and B M W A N D V O L V O S are cars?).
    I'm actually really curious whether anyone has ever tested if it can break down categories and explain WHY something is a certain TYPE. Because if it can, then yes, it has a certain depth plateau right now, but that plateau could be raised. And IF it can correctly define the groups that IT CAN recognize right now, doesn't that just mean it is already capable of some kind of "thinking" up to that level? Maybe complex mathematical questions or logic statements are too abstract for this version, but if it can actually "think" on ANY level at all, then that proves that 'more data and more power' might actually make it better. Substantially better... Just my take on the matter, idk :)

    • @StriderAngel496 2 years ago

      All of the puns aside, I find Dr. Walid to be very interesting and smart, and I'm quite curious about more of his work. I don't want it to be misconstrued that I have something against him xD. I'm still a little on the fence. I get everything that he's saying, I'm just not convinced that the types and structure he is talking about can't just emerge naturally in a neural net (even if you can't point to them and name them). After all, our brain chemistry also evolved in a probabilistic Darwinian manner; I think you get to a point where, if you have enough information and processing power, complex systems start emerging by themselves, without needing hard-coded rules. For example, you can make AIs today that can run, jump and push things in a physics environment, and they have no idea of their existence, of the concept of an arm or a leg or a wrist, but given enough information and left to their own devices (many iterative generations where you take the best-scoring specimen from each generation), they get to a point where they can "run" in a manner that looks a little goofy but very similar to the way we actually run. How could this system figure out the optimal way of walking without any understanding of any concept, any name, any body part, etc.? It's just trial and error. It may be a very savage brute-force approach, but that's also how biology evolves: it's also brute force over many generations with slight mutations, and whatever happens to survive must be good, so let's make more of "that"...
      I think we are still in the stage of biological evolution with our neural nets and we haven't really reached the consciousness level yet. Another side note is that it may be extremely inefficient and computationally absurd to create the first thing that even remotely resembles intelligence, but once that "singularity" has been achieved, the recursive process of self-improvement should take over soon, and it could reasonably fix all the shitty code we made and make it amazing, thus being stupid and inefficient for only a few hours/days until it rewrites itself efficiently xD. Very small increments would turn exponential quite fast. I mean, imagine you could increase your IQ by 1 in an hour, but at every iteration you use your new IQ to think about how to increase your IQ more in the next hour... It wouldn't take very long to go from average to Einstein levels :))

  • @MachineLearningStreetTalk 3 years ago

    Walid has a paper out which expands on his ideas: "Language and its Commonsense: Where Formal Semantics Went Wrong, and Where it Can (and Should) Go" 4ebde952-0bd0-43c2-bf47-30516f762816.filesusr.com/ugd/7e4fcb_3317bd434a434a45b13adb6fdfdfa5e7.pdf

  • @machinelearningdojowithtim2898 3 years ago +1

    First!

  • @TheReferrer72 3 years ago +1

    "There are very few things we disagree on"? Wow, the earth is flat, the sun goes around the earth & it's turtles all the way down!

    • @sabawalid 3 years ago +2

      Yes, there are very few things we SHOULD disagree on. If you disagree with someone who thinks the earth is flat, guess what you can call him/her: STUP*D.
      Sure, there are people who also believe life started 6,000 years ago, with Mr. Adam and Ms. Eve, and guess what I think of them?
      By "there are very few things we disagree on" I meant "there are very few things Rational, Logical people SHOULD disagree on" :) :) - precisely because KNOWLEDGE is a set of facts/truths, but in folk psychology we confuse "knowledge" with beliefs, skills, opinions, etc...

    • @Hexanitrobenzene 2 years ago

      I think the point was "little, compared to the things we implicitly agree on". For example, not even flat earthers would say that gravity points up. Walid mainly meant things we assume without [much] thinking, as I understood.

    • @TheReferrer72 2 years ago

      @@Hexanitrobenzene Really? I know someone with patents who thinks that there is no such thing as gravity and that it is just electrostatic charges! He has published a book about it.
      Humans are taught more than Walid thinks.

  • @CristianGarcia 3 years ago +2

    I guess he didn't like "Attention is all you need" very much 😅, very peculiar talk.

    • @sabawalid 3 years ago +1

      Because "Attention is NOT all you need" :) :)

  • @lc3487 3 years ago

    There are no jobs for NLU, only NLP. It just shows how little people know about the difference between NLP and NLU.

  • @hectorvillafuerte8539 1 year ago

    Today, with ChatGPT, I entered the sample mentioned at the beginning of the video:
    Question: The trophy did not fit in the suitcase because it was too small. The word "It" refers to the trophy or the suitcase.
    Chat-GPT: The answer: In this context, the word "It" most likely refers to the trophy. If the trophy is too small to fit in the suitcase, then "It" would refer to the object that is too small. However, without further context, it is possible that "It" could also refer to the suitcase if there was some kind of structural issue that made it impossible to fit the trophy inside, regardless of its size.
    Conclusion: there is no inference in the probabilistic text processing used by ChatGPT; Walid Saba is right.

  • @fast_harmonic_psychedelic 2 years ago

    " you can only be one thing or the other thing, not two things". you clearly have not studied hegel or Engels. contradictions are the law of reality.. everything is both itself and its own opposite. Everything contains elements of its own negation as that thing

    • @fast_harmonic_psychedelic 1 year ago

      ​@@lepidoptera9337​ you're not clever lol

    • @fast_harmonic_psychedelic 1 year ago

      @@lepidoptera9337 my philosophy is that philosophy has only interpreted the world in various ways but the point is to change it

    • @fast_harmonic_psychedelic 1 year ago

      @@lepidoptera9337 Bro, that's literally the philosophy I'm advocating: the philosophy of science - natural philosophy, as it was originally called before it was called science. The philosophy of science is materialism. You, by arguing in support of science, are a materialist - like me. The philosophy that the world is knowable, law-governed, deterministic, real, and that our senses and our instruments give us an image of the objective reality which becomes more and more focused and high-fidelity the more our measurements and theoretical knowledge improve through trial and error.
      Dialectical materialism is all that, plus it recognizes that it's not just what we can measure and directly prove with statistical data, but that the nature of all change in the universe is that of contradictory relationships, dynamic tension between opposing forces, and the unity of those opposites creating complex systems. Given that it follows a logic, and that these underlying dialectical laws give rise to the other laws, you can make deductions, extrapolate beyond empiricism, and make predictions beyond the microscope if you think dialectically. All the great scientific theories that stood the test of time are dialectical theories. The theory of evolution is a prime example. The fact that life changes, and does so not only incrementally but suddenly, in leaps and bounds, is a dialectical property, and Darwin and others who understood this were able to apply it to various phenomena. Matter evolves, systems evolve, animals evolve, societies evolve, and they all do so in stages of relatively little change followed by huge movements, etc. That's dialectical materialism. Hegel's dialectics is idealist dialectics, because for that school it only applies to human ideas. But we say dialectics is how matter itself operates - it's just that thought is a product of matter and follows those same laws of the material universe from which thought and consciousness emerge. We can understand those laws, take advantage of them, replicate them in technology, and make assumptions about what's true and what's false based on that, even before the empirical proof catches up.

    • @fast_harmonic_psychedelic 1 year ago

      @@lepidoptera9337 You're talking about idealist philosophy. They ruined philosophy, and now everyone associates philosophy with idealism. But that's not the only philosophy. Materialism is a philosophy - the belief that science can give us objective truth is a core tenet of materialism. That's your philosophy without you even knowing it.