#80 AIDAN GOMEZ [CEO Cohere] - Language as Software

  • Published 20 May 2024
  • We had a conversation with Aidan Gomez, the CEO of language-based AI platform Cohere. Cohere is a startup that uses artificial intelligence to help users build the next generation of language-based applications. It's headquartered in Toronto and has raised $175 million in funding so far.
    Language may well become a key new substrate for software, both in how software is represented and in how it is built. It may democratise software building, letting more people build software and enabling new types of software altogether. Aidan and I discuss this in detail in this episode of MLST.
    Check out Cohere -- dashboard.cohere.ai/welcome/r...
    Support us!
    / mlst
    Pod version: anchor.fm/machinelearningstre...
    TOC:
    [00:00:00] Aidan Gomez intro
    [00:02:12] What's it like being a CEO?
    [00:02:52] Transformers
    [00:09:33] DeepMind Chomsky Hierarchy
    [00:14:58] Cohere roadmap
    [00:18:18] Friction using LLMs for startups
    [00:25:31] How different from OpenAI / GPT-3
    [00:29:31] Engineering questions on Cohere
    [00:35:13] Francois Chollet says that LLMs are like databases
    [00:38:34] Next frontier of language models
    [00:42:04] Different modes of understanding in LLMs
    [00:47:04] LLMs are the new extended mind
    [00:50:03] Is language the next interface, and why might that be bad?
    References:
    [Balestriero] Spline theory of NNs
    proceedings.mlr.press/v80/bal...
    [Delétang et al] Neural Networks and the Chomsky Hierarchy
    arxiv.org/abs/2207.02098
    [Fodor, Pylyshyn] Connectionism and Cognitive Architecture: A Critical Analysis
    ruccs.rutgers.edu/images/pers...
    [Clark, Chalmers] The Extended Mind
    icds.uoregon.edu/wp-content/u...
    [Melanie Mitchell et al] The Debate Over Understanding in AI's Large Language Models
    arxiv.org/abs/2210.13966
    [Jay Alammar]
    The Illustrated Stable Diffusion
    jalammar.github.io/illustrate...
    The Illustrated Transformer
    jalammar.github.io/illustrate...
    / @arp_ai
    [Sandra Kublik] (works at Cohere!)
    / @itssandrakublik

COMMENTS • 35

  • @bissbort · 1 year ago · +12

    "Relief" is the first thing that came to mind when I saw this video pop up. I was already afraid that MLST had gone on hiatus. It's still one of the most underrated YT channels out there given the production and content quality IMO! If nothing else, future generations will be thankful.

  • @Self-Duality · 1 year ago · +25

    This is a gem of a channel 💎

    • @TheReferrer72 · 1 year ago · +2

      Always has been, can't understand why it's not more popular.

    • @Alice8000 · 11 months ago

      You mean me

  • @vinca43 · 1 year ago · +3

    It was fascinating to hear a behind-the-scenes perspective on the development of the transformer architecture. Aidan is so humble and unassuming. Very impressive.

  • @vinca43 · 1 year ago · +4

    I get nervous when I hear (starting at 48:08) discussions around "outsourcing" mundane intellectual activity. There is no doubt that I was a better mathematician / math enthusiast when I had limited access to computing devices and unfettered access to the math stacks at a university library. I don't believe there is a shortcut to internalizing and understanding patterns from rote processes, without actually carrying out the rote processes. I would never have gotten through Algebra without 1000s of hours of elementary computation in grammar and high school that helped my mind to intuit patterns and primed me for theorems that formalized those patterns. And that served as a foundation to iterate between increasingly complex rote processes and increasingly abstract patterns, formalized by increasingly bonkers theories. All this to say, if we give up the mundane tasks, we leave the greatest part of learning to the imperfect neural net/model. IMHO.

    • @edz8659 · 1 year ago · +1

      I think the best learning of new knowledge (knowledge that is new to you) happens with teachers, though, in a conversational format. Becoming good at that skill comes, of course, through practice, but there's a reason people still pick uni over watching videos. Being able to query back and forth on a topic is immensely useful not only in learning but in understanding and applying.

    • @vinca43 · 1 year ago

      @edz8659 great observation. Videos < Uni. The best of my learning happened on a dusty grad-student chalkboard, scribbling thoughts with fellow grads after a mind-imploding lecture.
      Ultimately, key ingredients: great mentor/teacher, great peers, and dedicated time on task.

  • @billykotsos4642 · 1 year ago

    always worth the wait for these vids. Such interesting and high quality discussions

  • @simonstrandgaard5503 · 1 year ago · +2

    Excellent interview.

  • @snowballcapital · 1 year ago

    35 min in: needed for our future AI personal secretaries.

  • @earleyelisha · 1 year ago · +1

    Monday treats?!! MLST, you’re too kind!

  • @ariramkilowan8051 · 1 year ago · +2

    Aidan should record audiobooks :)

  • @MikeArg-hp9ho · 29 days ago

    Can anyone explain how a recent intern becomes an expert in anything in a matter of 1-2 years?

  • @MuhammadArrabi · 1 year ago

    Would be interesting to look again at the papers on the limitations of transformers, in light of the progress in the last 5 months.

  • @hobonickel840 · 1 year ago

    I wonder what the transformer could do with the information JWST collects; just thinking of this model blows my mind. Or what about a language model representing all the particle data something like CERN has collected over the last 20 years, down to the cookie crumbs... emerging is correct.

  • @Alice8000 · 11 months ago

    I'm jealous of his success. ANGRY EMOJI

  • @SimonJackson13 · 1 year ago

    Is not a stack a reference to the past for attention? Statistical time augmented transforming, with delay wrong augment feed-forward?

    • @SimonJackson13 · 1 year ago

      A feed-forward wrong time? A feed-forward chaos utility spreading stream? Feedback range acception?

    • @SimonJackson13 · 1 year ago

      Usage statistics follow availability? Sensorship happens on stack memory drop/nip?

    • @SimonJackson13 · 1 year ago

      Is it possible to invert users? Genetic algorithms would suggest the entropy difference from want to supplied is maximally variant?

    • @SimonJackson13 · 1 year ago

      Optimal cull of maximal mutation potential?

    • @SimonJackson13 · 1 year ago

      I found today that "enumerate" implies "estimate"; ChatGPT couldn't, but should maybe have used a reduction to provide one instead of a definite can't.

  • @m_ke · 1 year ago · +2

    Should have asked him how they will compete with Hugging Face, not "open"AI.

    • @MachineLearningStreetTalk · 1 year ago · +5

      HF is a different category entirely, i.e. deploying your own, much smaller models (in most cases supervising the deployment yourself; you need to know what you are doing). Because the models are smaller and less generalisable, they are fine-tuned for doing specific things, rather than the LLM/Cohere idea, which is that you can use a single planetary-scale LLM to do anything. OpenAI is the only other player in this game at this scale which I am aware of. It's certainly true that most things you need to do as an app developer are narrowly defined (e.g. NER), so simple pointillistic models would work... but then you have the ML DevOps / engineering problem, which blows up fast with many models in an app platform. Imagine if a lot of that complexity melted away because it was all just a single LLM?
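
A rough sketch of the "single LLM, many tasks" idea described in the reply above. The `complete` helper is a stand-in for whichever hosted-LLM SDK you use (Cohere, OpenAI, etc.); the prompt wording and the two example tasks are illustrative assumptions, not Cohere's actual API.

```python
# Illustrative only: `complete` is a placeholder for a call to a hosted LLM;
# wire it up to your provider's SDK of choice.

def complete(prompt: str) -> str:
    """Placeholder for a request to a hosted large language model."""
    raise NotImplementedError("connect this to your LLM provider's SDK")

def extract_entities(text: str) -> str:
    # A narrowly defined task (NER) expressed as a prompt against one
    # general-purpose LLM, instead of deploying a dedicated fine-tuned model.
    prompt = (
        "Extract the people, organisations and locations mentioned in the "
        "text below, one per line in the form TYPE: NAME.\n\n"
        f"Text: {text}\n\nEntities:"
    )
    return complete(prompt)

def summarise(text: str) -> str:
    # A second task served by the same model: no extra deployment, just a
    # different prompt. This is the DevOps simplification described above.
    return complete(f"Summarise the following in one sentence:\n\n{text}")
```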

    • @m_ke · 1 year ago · +5

      @MachineLearningStreetTalk open-source models are free and, just like Stable Diffusion, there will be plenty of large open-source language models soon (see BLOOM). For 99% of business use cases (not blogspam generation) a smaller fine-tuned model will perform much better than a huge general model that was trained on random web data.
      Cohere will have to learn the hard way that companies that have people who know how to use outputs from machine learning models usually also have people capable of training their own models on domain-specific data, and that due to the cost of hosting these large models it will make more sense for most of their large customers to develop things in house. Deep learning research being so pro open source also means that in 6-12 months new methods will be available to everyone that are as good as whatever they spent building internally.
      Also, like you mentioned in the interview, the latency of sending all of your data over HTTP to an external service is suboptimal, and you'll get much better throughput by doing batch inference in your own VPC, or save money by moving the model to customer devices (obviously not trillion-parameter language models).
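
For contrast, a minimal sketch of the "small fine-tuned model" route argued for above, using the Hugging Face transformers Trainer; the dataset (IMDB) and base model (DistilBERT) are placeholder choices standing in for domain-specific data, not anything the thread recommends.

```python
# Fine-tune a small classifier you can host yourself (e.g. in your own VPC),
# instead of calling a large hosted LLM for a narrow task.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")          # stand-in for your domain-specific data
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()   # a ~66M-parameter model, cheap to serve on your own hardware
```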

    • @MachineLearningStreetTalk · 1 year ago · +3

      Thanks for the thoughtful response! Will be interesting to see how this pans out. I agree that Stable Diffusion was a watershed moment i.e. the community suddenly got a very capable generative vision model to play with (although there has recently been a lot of drama, as you will know from watching Yannic's videos!). Currently public LLMs are nowhere near as good and you need to know what you are doing i.e. do the engineering yourself, be a good software engineer and create a pointillistic solution. You may well be right that a similar watershed moment might arrive soon for LLMs, but it's definitely not there yet. The interesting story here for me is that large scale and generalisable LLMs might remove some of the complexity of software engineering, as it becomes a matter of prompt engineering and language becomes the interface.

    • @m_ke · 1 year ago · +1

      @MachineLearningStreetTalk a big hurdle is getting business people and web/mobile devs to be comfortable dealing with functions that can output the wrong/stupid answer a decent % of the time.
      We went through the same thing with CV at Clarifai: CNNs were hard to train and deploy in 2014, but that didn't last long, and every summer there were new ImageNet results coming from teams all over the world, something no single team can stay ahead of. Back then it was also way easier because machine learning was a much smaller field and things like TensorFlow and PyTorch didn't exist.

    • @DrIndjic · 1 year ago

      @m_ke excellent point about doing as many things as possible on the client side - AI-at-the-edge - increasingly enabled by #5gservices.

  • @Adovid · 11 months ago

    Love, more, hate, less