Probably a trough of disillusionment in about 12 months. The industry has overhyped the capability too much at this point so it will need a correction. It’ll probably be a slow correction too. I don’t see any personnel reduction happening any time soon. More business process augmentation, and human augmentation. There’ll be more humans in the process I think, not fewer. Also, more use cases with LLM and classical ML hybrids for broader intelligent apps use cases. Enterprises will be looking to quantify ROI for their initial investments and will struggle to find it. There will be more marketing around it but I don’t think boards or shareholders will be satisfied with the current marketing numbers.
That was a really great interview. I like the guys at Cohere. They seem a bit more grounded than a lot of people working in AI. I certainly appreciate that someone is concerned with the problem of LLMs becoming boring and pedantic. That may be a greater danger than Superintelligence 😬
Seems like a nice fellow. Really interesting how hand wavy he gets when you put him on the spot. For example, he reaches for “synthetic data” a lot when you push him on future model improvements. Data augmentation with synthetic data will solve all their problems except where it won’t, like the short discussion about video and physics data. He doesn’t provide the detail or show the intellectual horsepower I’d expect of someone who is leading the way with their models. The answer to your question about what mistakes he had made and advice for others is another good example. His answer was ‘everything’. No insights, no detail. As a result he comes across as doing the same things as every other company in the domain. Thanks for the discussion!
I agree with your sentiment; however, even though Aidan undoubtedly has the intellectual horsepower (he is one of the OG writers of the Transformers paper), as a CEO he cannot keep up with all the technical details - pretty sure his day consists more of executive decisions, partnerships, tons of meetings & traveling. Cohere engineers seem solid & their models are quite capable :)
@@XShollaj To me, Cohere is one of those foundation model builders who have clearly defined their niche and are optimising towards it. Very developer-friendly. They allow you to test and pilot as much as you want and support you to go to production. Very competent engineers and great customer support. IMHO, they are too quiet.
@snarkyboojum @XShollaj I agree he doesn't give specifics on errors (but then this is a publicly recorded interview of someone who may yet be seeking investment.. as a CEO he probably hit the correct note of the investor interview for many there.. acknowledge fallibility alongside ability to both listen and learn; don't admit anything embarrassing to company..) @Enedee007 may be right on the too-quiet front for the company.. the lmsys arena scoring Cmrd-R+ alongside an old GPT4 at 20= is certainly not nothing given the resources poured into that model by OpenAI. As regards synthetic data - well, I think, given the surprisingly small size of scraped and deduplicated web text (less than 100TB - could probably fit on a home NAS!), enlarging and enriching training sets with crafted, curated synthetic material containing a higher concentration of 'reasoning' mixed through it probably is a good shout - as well as trying to get higher quality 'real' domain material.. though that is fraught with the challenge of finding and negotiating with the copyright holders.. so partnership with the enterprises who already hold it may be a way forwards.. better and more data seems useful - I do think it's interesting that Microsoft are happy to release Phi weights openly but not Phi training data… good data may be a canny bridge to cross part of the 'moat' of compute.. As for physics and video training data.. I agree that 'real' is surely generally best (though as a counter-argument there are some epic examples of video capture hardware artifacts leading to the appearance of false objects and moiré effects which simply aren't 'really' there..), AND at the same time I'm not certain that the game engines aren't good enough for significant further model capability improvements until some novel, live-adaptive dynamic architectures come along which allow tuning to the real world alongside inference.
I suspect the uncanny-valley appearance relates to video training virtual dataset viewpoint choices (virtual camera optics, height, simplified movement trajectory, colour grading etc) whose biases are then being translated en-bloc through the model into generated output, rather than the physics/image rendering of any underlying source engine being off.. Use synthetic data which better maps to end-user generation goals or human perception, and make camera training and prompting more explicit, and then I think that 'uncanniness' may begin to fade (in the same way early genAI naturalistic human portrait images have given way to harder-to-spot recent versions). I'm not certain it's a matter of fidelity failure for physics for the majority of issues.. In terms of the rest of the interview, for me the incidental X-risk related commentary was interesting to hear but in some ways a bit more disappointing in being pretty content-free.. it sounded very sector-CEO-ish more than reasoned-opinion-domain-expert-fellow-human-citizen.. I mean I have a lot of sympathy for his plague-on-both-houses critique of extremists, but I really want more substance and clarity with actual stated reasons *why* any position is wrong.. and clarity on the implicit metrics: are we talking relative or in some sense absolute 'extremity'?.. I don't want the assumption that positions are wrong simply because they are sited towards an end of some arm-waving scale... imho, regression-to-mean-ism should be a recognised reflexive fallacy just as much as reflexive extremism is.. not saying that his views were unreasoned (such as they were expressed), just hard to say without seeing the 'worked answer' - though in fairness it's not what he's paid to do, and his support for academics, who arguably should be the part of the ecosystem giving 'worked answers', was/is refreshing.
Overall I thought, imho, he came off measured and interesting and fairly (refreshingly) low-ego, but on the x-risk I think he came off sounding a bit like, in some ways, how I've seen some more rational fossil-fuel industry corporate leaders appear when talking about climate risk-response.. i.e. measured apparent pragmatism (because they are quite right that some fossil fuel products and benefits are genuinely hard to substitute industrially, e.g. when it comes to the chemicals industry), advocating for regulation which doesn't impede their ambition and, while taking a balanced position which appears to recognise the direction of travel, not really 'getting' and engaging with the nature and magnitude of the potential wider problem (leading me to suspect they really don't understand/grok the potential implications of cross-system interactions of eco-system timeline inflexibility, rapid climate feedback systems, food supply chains, why long-term mean temperatures in reality may matter far less than shorter extremes, just what's going on with ocean temperature/pH and what that might mean for plankton / the lipid layer or water vapour and model timelines.. etc etc etc), why those sorts of issues need to be better bottomed out and the interactions thought about, and why much of it isn't in fact formally estimated in many of the models policy people arm-wave towards to justify economically-comfortable slow adaptive pathways (which aren't even being met). Of course, on the climate/eco-x-risk, it's in fairness, imho, a mess trying to land on a rational position with a good grasp of all relevant domain realities and probabilities - tbh I'm not sure where I am on the relevant DunningK curve.. which is probably a high-domain-dimensional DK surface in fact..
There are many people labelled as 'extremists' presumably because their position doesn't fit with, or present, any happy, convenient or easily achieved social policy model (because for the most part those people's demands/statements come across as angst-ridden sound-bite slogans, not thought-through operational plans rationally explaining trade-offs.. let alone the potentially counterproductive politically and arguably outcome efficacy-inept actions undertaken within a challenging complex battleground including powerful yet mis/un-engaged stakeholders and bad/irrational actors)... BUT, although it's tempting to say wrong-with-one-element => wrong-on-all-counts, the reality is that's flawed; rubbish thinking on response execution isn't actually a reliable measure of whether anxiety around risk and motivation for response need is rationally (or empirically) robust. It's possible to be right about diagnosis of scale of a risk and wrong, stupid, or insufficiently clever, about some aspect of timeline or practical policy treatment plan - just as it's possible to be clever and right about the unfeasibility or social incompatibility/unacceptability of a policy plan and yet seriously, catastrophically, wrong when comfortably and confidently mis-inferring from the error of others (and ego-stroking self assurance of intelligence of self) to the absence of a much-worse-than-you-actually-understand trajectory on a faster-than-you-think-possible timeline. "A lot of doubt may be crippling, but no doubt is undoubtedly.. [ dangerous. (Claude3.5 Sonnet*) | blinding. (GPT4o*) | worse. (Me*) ]."
*(Pithily complete the following sentence:) I increasingly suspect that part of the problem with both of these (and other) complex multi-domain systemic problem spaces is that the ability of any one human to correctly imagine the risk of distant issues and, alternatively, to assess the practicality/impact of near-term policy options, requires very different thinking styles not commonly found in one brain (and that's even setting aside the need to have exposure to quite a lot of disparate arguments, data and systems intuitions over data interpretation and engineered responses in physical, biological, social/psychological/political, managerial domains).. and yet the confidence from skill in one domain (particularly identifying problems or gaps in execution or data) has the tendency to incorrectly yield a carry-over of confidence in dismissing 'wild' (and deeply uncomfortable) assertions/scenarios made in the 'imaginative' domain of trying to foresee systems interactions (identifying chain-of-thought threads assembling abstract risk scenarios for which there will typically be difficulty showing terribly good-fit, compelling, pre-emptive empirical evidence until, well, bad stuff happens.. - and for which 'doomer' scenarios we kind of need rather more sophisticated rational engagement and response than 'nah, that's a bit extreme'). Bringing things back to topic, I think the training knowledge reach of AI/LLMs perhaps provides a fascinating opportunity to build tools to do some cross-domain representation of existing and evolving knowledge, data and arguments which may help policy makers get a better grasp on high-complexity issues.. I'm not sure it's quite getting into govt yet, though I did listen to a conference speech by the head of a team trying to build AI tools for the UK gov civil service which gave me some hope.. there is some activity happening on that front, though more of it in global open space might be useful for many topics.
(Apologies for rambling/ranting somewhat - by way of context this is slightly UA-cam comment therapy written while whiling away some late-night hours stuck in a post-op hospital bed, contemplating how the NHS has changed since I worked in it and trying to ignore some discomfort while some meds kick in.. so my wetware (organic LLM) may be set to a different-than-usual 'temperature' and possibly ignoring stop tokens.. so if grammar.. maybe reason.. is off, it's not as a consequence of an LLM. Fortunately no actual temperature🤞.. Normal-level idiocy should hopefully resume in a few days… I shall just hit post now it's below 10k chars (text length, not word sequence choice, was reduced with some effort using Claude3.5 because the original hit nearly 12,000 chars 😱) in the hope there might have been more signal than noise in the above comment.. not betting on it).
I think delivering substantial progress on 'System 2' thinking and longer term planning in the next ~24 months will be required to justify the continued level of investments we are seeing in the space.
Not true. This has legs, even with little incremental improvement from now on! Just getting a computer to read in a natural voice is a huge achievement; it's those little improvements, which would have been huge a decade ago, that will improve productivity for years to come.
I think the repetitive behavior of LLMs has a different cause. The text becomes boring because the algorithm is required to predict the next word: thoughts that are mundane and have been told many times before are simply more likely to be predicted, so generating original and creative thoughts is discouraged by the algorithm.
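The mechanism this comment describes can be sketched in a few lines: sampling the next token from a softmax over logits, where a high-probability "mundane" token dominates the draw. This is a toy illustration, not any particular model's decoder, and the logit values are invented.

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Sample a token index from logits via softmax.
    Lower temperature sharpens toward the most likely (mundane) token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # shift for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(0)
# Toy 3-word vocabulary: index 0 is the common, "told many times" continuation.
logits = [3.0, 1.0, 0.2]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_next(logits, temperature=0.7)] += 1
# The mundane token wins the overwhelming majority of draws.
```

Raising `temperature` above 1.0 flattens the distribution and gives the "creative" tokens more of a chance, at the cost of coherence.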
18:20 "In the same way that hallucination used to be an existential threat to this technology ..." Past tense? Why? The models I've been playing around with still have that problem very much. Is he talking about stuff that hasn't been released to the public? Every couple of months or so, I try to use these models for software-related stuff and usually it doesn't take long until they start making up things, like telling me to set certain configuration parameters that are available in a different library than the one it suggested to use a moment ago. Maybe it's me being too stupid to prompt these things, but hallucinations don't seem to be a solved problem.
I think he means that while they do still hallucinate, it's a relatively small number of cases when the model knows a lot about the topic the prompt is querying. App builders still need to do a tonne of engineering to address this, e.g. RAG and other stuff, but it's not a deal breaker for many practical applications.
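A minimal sketch of the RAG idea mentioned above: retrieve a relevant snippet and ground the prompt in it, so the model has less room to invent facts. Real systems use embedding similarity; a toy word-overlap retriever stands in here, and the function names and document strings are hypothetical.

```python
def retrieve(query, documents):
    """Pick the document sharing the most words with the query
    (a stand-in for real embedding-based retrieval)."""
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, documents):
    """Ground the prompt in retrieved context before it reaches the LLM."""
    context = retrieve(query, documents)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Command R supports a 128k-token context window.",
    "The parrot is a tropical bird often kept as a pet.",
]
prompt = build_prompt("What context window does Command R support?", docs)
```

The engineering effort in production systems goes into chunking, indexing, and reranking, but the grounding principle is the same.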
I'm one of those fellows who have been working on programs that combine several different large language models with other machine learning tools to create something similar to this inner-monologue conversation, as well as modeling dynamic systems interaction. It's exciting to see that one of the packages we are using for this, with its own ecosystem, is looking into this for its own LLM.
The logits created by an LLM's weights are not log-odds. They are not the result of a statistical function; the function is far more complex - in fact, it's intractable. We do treat these as statistical results, however, via the softmax, and that's appropriate. These weights are modeling human reason in subnetworks, things like comparison functions. Yes, these models are reasoning; it's rudimentary reasoning nevertheless. BTW, most synthetic training data being created comes from chat logs where people work with models to create new material.
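One concrete way to see the point about logits not being log-odds on their own: the softmax is invariant to adding a constant to every logit, so only differences between logits carry meaning, and it is the softmax that turns them into a usable probability distribution. A small self-contained sketch (values invented):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution.
    Shifting by the max is numerically stable and changes nothing,
    because softmax is invariant to adding a constant to all logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

p1 = softmax([2.0, 0.5, -1.0])
p2 = softmax([102.0, 100.5, 99.0])  # all logits shifted by +100

assert abs(sum(p1) - 1.0) < 1e-9                         # a valid distribution
assert all(abs(a - b) < 1e-9 for a, b in zip(p1, p2))    # same distribution
```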
While this seems like an interesting project, I would categorize it as just another LLM/feed-forward multimodal model, especially since his answer for innovation boils down to "throw more training data at it". What I'm looking forward to, and what I think will be an actual innovation in the space, is when we can create a model that can be trained to produce useful behavior on minimal data/examples/instruction, similar to a human or other animal. In my opinion, the only time we'll see true machine reasoning is when we get something like an RNN that can run for an arbitrary number of iterations, or if the symbolic ML guys figure out something cool.
I like that the thought is to make new models smaller; anything bigger than 32B is overkill and only goes toward data center profit. Algorithms for condensing training datasets (similar to what Microsoft did for Phi-3) are the way, along with dimension reduction (PCA or some equivalent). You might lose some accuracy just prompting the LLM, but scaling an agentic workflow will be robust.
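As a hedged sketch of the dimension-reduction idea mentioned above: PCA via SVD over a toy matrix of embeddings. The shapes and data here are invented for illustration; real dataset-condensation pipelines are far more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))     # 200 toy samples, 16 feature dims

Xc = X - X.mean(axis=0)            # center the data (required for PCA)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 4                              # keep the top-k principal components
X_reduced = Xc @ Vt[:k].T          # project onto those components

# Singular values are sorted descending, so the kept components
# explain the largest share of variance.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
```

The trade-off the comment alludes to is exactly `explained < 1.0`: you give up some variance (accuracy) for a much smaller representation.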
He's on the right track seeing problem solving as DSL market ecosystems: LLMs that produce synthetic markets, leveraging divide-and-conquer techniques that transform prompts into generated RAG DSL pipelines.
Of course prompt engineering should exist. However, if you want a product to be mainstream, it would help end users more if they didn't have to know about prompts.
I believe that LLMs are modeling something like Platonic forms. They have a better error function than surprise. It's purer. They necessarily model the best systems with the fewest errors. They are free from the cognitive dissonance of chemical context injections.
9:41 The tagline on the front page of their investor pitch deck should read… At Cohere our goal is to prevent the human centipedification of the global B2B AI ecosystem.
Dr. Tim, thank you very much for this interview! Personally I use Cohere's Command-R model pretty often and it is very good, totally comparable to other SOTA models that we currently have. The fact they share openly including weights is something I really like about them.
What's this "EAK" or "YEAK" countermovement, or whatever the spelling is? I have heard of the EA cult (effective altruism) but I want to look into the other thing he mentioned. Where do all these "movements" happen and nobody tells me about it?
Such a breath of fresh air. Grounded, not hype or scare-mongering, dedicated to the craft, taking reasonable steps at a time and learning from the process.
@@minhuang8848 He said so much. Just listen. He's a whole ocean technically sounder and more competent than many a high-sounding bamboozler in this space.
Nothing new; other than Francois Chollet, all the other "AI Experts" are running in circles, spitting out the same thing... "It needs to have reasoning, we don't need general AI, we can have modular architecture with specialized domain experts, on-the-fly learning." Yeah, we will have those and still will not reach AGI; not a single expert is addressing the question of what we're going to do about representing abstractions. They are literally running in circles.
@@ronilevarez901 Yeah, that is correct, but the stupid thing is to make claims that those approaches will lead to AGI; until we reach an architecture that learns quickly like children do, with very few examples, we have a long way to go to AGI.
They used to teach reasoning in universities, so there are textbooks, like Logic by Isaac Watts. I wonder if such texts could provide a useful framework for AI reasoning 🤔
@@mikezooper It's because they are trying to derive logic from data, rather than let the model build its own logic little by little through experience.
@@hunterkudo9832 experience, as in living day by day, which takes too long. What entrepreneur has the time to wait 30 years to have a full grown AI trained on "experiences"? They want money NOW. And that's all this is about: money. Progress is a side effect.
See it from their perspective, there are those who are trying to kill their business with irrational fears of "AI existential risk". There *are* really important risks though, this is nuanced.
I think it is important that smaller entrepreneurs are speaking with regulators. You know that the largest players (who are also the largest companies in the world) have major lobbying efforts. All you need to do is look at Sam Altman playing up the risk of an "existential threat" before Congress to see their disinformation. They would like to see a regulatory scheme that would allow only the largest players to play (like we already have in many industries) while ignoring real risks like misinformation, economic disruption and wholesale misappropriation of IP.
Gomez is misguided here. Hilbert's decision problem, which sought a perfect problem-solving machine that could answer any yes-or-no question, failed because of Gödel's incompleteness theorems and Turing's halting problem.
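For readers who haven't seen it, the diagonal argument behind the halting problem can be sketched in a few lines of Python. The `paradox` function and the oracle stubs below are illustrative stand-ins, not real decision procedures: the point is that any claimed halting oracle can be fed a program built to contradict its own verdict.

```python
def paradox(halts):
    """Given a claimed halting oracle `halts(f)`, build the program g
    that does the opposite of whatever the oracle predicts for g."""
    def g():
        if halts(g):      # oracle says "g halts"...
            while True:   # ...so g loops forever, refuting the oracle
                pass
        # oracle says "g loops", so g halts immediately, refuting it
    return g

# Whatever fixed answer a fake "oracle" gives, g makes it wrong:
g_yes = paradox(lambda f: True)   # verdict "halts" -> g would loop forever
g_no = paradox(lambda f: False)   # verdict "loops" -> g halts immediately
g_no()                            # returns at once, contradicting "loops"
```

(We only *call* `g_no`; calling `g_yes` would genuinely loop forever, which is exactly the contradiction Turing's proof exploits.)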
Refreshing! Intelligent, mature professional.. no "ums", "ahs", artificially accelerated speech or ANY of the typical Bay Area speech pathologies, which are getting unbearable. We need more grown-up, deliberate professionals like this in the field. Thank you, Aidan.
I haven’t finished watching the video so I don’t know if this is addressed later on, but.. With respect to agents which employ Mixture of Experts approach, is there not a concern that by specializing models and reaching out to them by some general purpose orchestrator, that you might lose cross-disciplinary creativity ? Perhaps engineering borrowing some idea from nature, etc? Mightn’t it be the case that the really out of the box creativity would only come from all the knowledge together in one latent space so it can intermingle to identify patterns which might not be achieved by a specialized MoE?
@@Enedee007 Right but if they only have knowledge in their own domains, then how will there be true cross-discipline “inspiration”? The engineering expert might not think to borrow an idea from the animal kingdom, for example. Why? Because whatever is coordinating the agents and routing the questions to the appropriate expert will not itself have the breadth of understanding to connect the dots between two highly distinct disciplines. Like maybe a structural engineering expert might not think to look to honeycomb structures from bee hives. But if they were all in the same monolithic model then the associations could be made at some layer of latent representation.
@@andrewsilber that’s correct. I think our points are more similar than they differ. The coordinating agent should be the generalist in that architecture. That’s the agent that’s able to carry out cross-discipline activities. That’s also a specialisation, it affords the opportunity to deeply train a specific model for that exact purpose, it should be well grounded in its ability to learn from different areas of specialisation and make critical connections.
@@Enedee007 @andrewsilber There is not just one MOE implementation and I think as currently used, it is done to only activate certain weights (the experts) in some of the layers probably not so much to silo the information but to reduce parameters and increase performance without any loss of acuity. The difficulty with cross discipline inference is actually baked into the models themselves unless it is in the training data or in an expression like “honeycomb structure”. No amount of “temperature” will get the probabilities to point just anywhere in the model.
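The point above about MoE activating only certain weights per layer can be sketched with toy top-1 gating: a gate scores the experts and only the winner runs, so most parameters stay inactive for any given token. The experts and gate scores below are invented for illustration; real MoE layers use learned gating networks over neural sub-layers.

```python
# Three toy "experts"; in a real MoE layer each would be a feed-forward block.
experts = [
    lambda x: x * 2,    # expert 0
    lambda x: x + 10,   # expert 1
    lambda x: -x,       # expert 2
]

def route(x, gate_logits):
    """Top-1 routing: only the highest-scoring expert is applied,
    reducing active parameters without touching the others."""
    winner = max(range(len(experts)), key=lambda i: gate_logits[i])
    return experts[winner](x), winner

y, used = route(3.0, gate_logits=[0.1, 2.5, -1.0])
# Only expert 1 ran for this input; experts 0 and 2 were never evaluated.
```

This matches the comment's framing: the siloing is a performance trade-off, not a deliberate partition of knowledge.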
@@toadlguy I totally agree with you that there's not just one implementation of MoE. However, the underlying principles are basically the same, and MoE in this case is in the context of multi-agents.
I never saw anybody seriously challenge the concrete x-risk arguments. All the criticism against the doomsday scenario was limited to "they are cultists, they are ideologically brainwashed" instead of hearing and answering the concrete arguments they present.
@@joecunningham6939 The argument is that if the intelligence has a goal, it will use all its power to achieve that goal. If the intelligence is powerful enough, you know what that means. It's just a chain of events logically following each other.
@@joecunningham6939 It doesn't matter what the goal will be. To achieve any goal you want as much power and control as possible; you would prevent anybody from intervening. And if you think about it, no goal can be safe for humanity. Any goal this AI tries to achieve will end in disaster. It could be either the death of humankind or something worse.
The incremental progress is hard to define? Super cap. Can it generalize to different reasoning lengths? Not at all… these aren't small gaps, they're gaping holes in current capabilities. Solvable, yes, but this isn't reflective of the true reality… at all. It can't even tell me that given a=b, then b=a… again, solvable… but we have to stop the cap. No true signs of compositional generalization.. again, I do believe it is solvable. It isn't just a data issue, it's an architecture problem too.. RoPE is trash for modern LMs, all that uniformity in high-dimensional space; let's start there 😂.
"Is [the modal collapse of LM personalities...] because they're eating each other's poop?" "Yeah, it's some sort of human centipede effect... Everything collapsing into [GPT-4's] personality."
@@bobhanger3370 No. Consciousness is quantum. It is the zero point of calculation. Computers are actionable; you are not. You will not have free will otherwise.
This needs to be far more clearly labeled as a sponsored video. Viewers will lose trust if you are not transparent with them. You also harm the entire UA-cam ecosystem. People need to know when they are watching a paid advertisement.
Did you not see the clearly marked "paid promotion" label at the start of the video? How much clearer could it be? Also we added a disclaimer "This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview." - and if you look at our back catalog we have been interviewing Cohere and their researchers for years without any commercials. I honestly couldn't think of a better possible revenue model for the channel, unless you want us to spam you with ads about NordVPN? To demonstrate our genuine interest in Cohere look at back catalog with their team: ua-cam.com/video/i9VPPmQn9HQ/v-deo.html ua-cam.com/video/Dm5sfALoL1Y/v-deo.html ua-cam.com/video/7oJui4eSCoY/v-deo.html ua-cam.com/video/ooBt_di8DLs/v-deo.html ua-cam.com/video/sQFxbQ7ade0/v-deo.html
@@MachineLearningStreetTalk Thank you for taking the time to respond. I'm afraid I missed that, as I listen to interviews like this on my phone while I do chores, so I didn't see the visual cue. Ideally there would be an audio cue for people like me, but you're following UA-cam policies, so I'm all good. Apologies if I brought you down!
What is the future in the LLM enterprise space?
I think so too; a lot of hysteria up till now, but it'll steady out. Sounds like a great outfit, thank you kindly for the interview..
@@Peter-dd3br And all you have to worry about is your customers getting a little glue in their pizza 🤣
@@XShollaj Fair point.
tbf neither does elon musk when you hear him talk
@snarkyboojum @XShollaj I agree he doesn't give specifics on errors (but then this is a publicly recorded interview of someone who may yet be seeking investment.. as a CEO he probably hit the correct note of the investor interview for many there.. acknowledge fallibility alongside ability to both listen and learn; don't admit anything embarrassing to company..) @Enedee007 may be right on the too-quiet front for the company.. the lmsys arena scoring Cmrd-R+ alongside an old GPT4 at 20= is certainly not nothing given the resources poured into that model by OpenAI.
As regards synthetic data - well, I think, given the surprisingly small size of scraped and deduplicated web text (less than 100TB - could probably fit on a home NAS!), enlarging and enriching training sets with crafted curated synthetic material containing a higher concentration of 'reasoning' mixed through it probably is a good shout - as well as trying to get higher quality 'real' domain material.. though that is fraught with challenge of finding and negotiating with the copyright holders.. so partnership with the enterprises who already hold it may be a way forwards.. better and more data seems useful - I do think it's interesting that Microsoft are happy to release Phi weights openly but not Phi training data… good data may be a canny bridge to cross part of 'moat'of compute..
As for physics and video training data.. I agree that 'real' is surely generally best (though as a counter-argument there are some epic examples of video capture hardware artifact leading to appearance of false objects and moire effects which simply aren't 'really' there..), AND at the same time I'm not certain that the game engines aren't good-enough for significant further model capability improvements until some novel, live-adaptive dynamic architectures come along which allow tuning to real-world alongside inference. I suspect the uncanny-valley appearance relates to video training virtual dataset viewpoint choices (virtual camera optics, height, simplified movement trajectory, colour grading etc) which biases are then being translated en-bloc through model into generated output, rather than the physics/image rendering of any underlying source engine being off.. Use synthetic data which better maps to end-user generation goals or human perception, and make more explicit camera training and prompting, then I think that 'uncanniness' may begin to fade (in the same way early genAI naturalistic human portraits images have given way to harder to spot recent versions) I'm not certain it's a matter of fidelity failure for physics for majority of issues..
In terms of the rest of the interview, for me the incidental X-risk related commentary was interesting to hear but in some ways a bit more disappointing in being pretty content free.. it sounded very sector-CEO-ish more than reasoned-opinion-domain-expert-fellow-human-citizen.. I mean I have a lot of sympathy for his plague-on-both-houses critique of extremists, but I really want more substance and clarity with actual stated reasons *why* any position is wrong.. and clarity on the implicit metrics, are we talking relative or in some sense absolute 'extremity'.. I don't want assumption that positions are wrong simply because they are sited towards an end of some arm-waving scale... imho, regression-to-mean-ism should be a recognised reflexive fallacy just as much as reflexive extremism is.. not saying that his views were unreasoned (such as they were expressed), just hard to say without seeing the 'worked answer' - though in fairness it's not what he's paid to do and his support for academics, who arguably should be the part of the ecosystem giving 'worked answers' was/is refreshing.
Overall I thought, imho, he came off measured and interesting and fairly (refreshingly) low-ego, but on the x-risk I think he came off sounding a bit like, in some ways, how I've seen some of the more rational fossil-fuel industry corporate leaders appear when talking about climate risk-response.. i.e. measured apparent pragmatism (because they are quite right that some fossil fuel products and benefits are genuinely hard to substitute technically and industrially, e.g. when it comes to the chemicals industry), advocating for regulation which doesn't impede their ambition and, while talking a balanced position which appears to recognise the direction of travel, not really 'getting' and engaging with the nature and magnitude of the potential wider problem (leading me to suspect they really don't understand/grok the potential implications of cross-system interactions of eco-system timeline inflexibility, rapid climate feedback systems, food supply chains, why long term mean temperatures in reality may matter far less than shorter extremes, just what's going on with ocean temperature/pH and what that might mean for plankton / lipid layer or water vapour and model timelines.. etc etc etc), why those sorts of issues need to be better bottomed out and the interactions thought about, and why much of it isn't in fact formally estimated in many of the models policy people arm-wave towards to justify economically-comfortably slow adaptive pathways (which aren't even being met).
Of course, on the climate/eco-x-risk, it's in fairness, imho, a mess trying to land on a rational position with a good grasp of all relevant domain realities and probabilities - tbh I'm not sure where I am on the relevant DunningK curve.. which is probably a high-domain-dimensional DK surface in fact.. There are many people labelled as 'extremists', presumably because their position doesn't fit with, or present, any happy, convenient or easily achieved social policy model (because for the most part those peoples' demands/statements come across as angst-ridden sound-bite slogans, not thought-through operational plans rationally explaining trade-offs.. let alone the politically counterproductive and arguably efficacy-inept actions undertaken within a challenging complex battleground including powerful yet mis/un-engaged stakeholders and bad/irrational actors)... BUT, although it's tempting to say wrong-with-one-element => wrong-on-all-counts, the reality is that's flawed; rubbish thinking on response execution isn't actually a reliable measure of whether anxiety around risk and motivation for response need is rationally (or empirically) robust. It's possible to be right about the diagnosis of the scale of a risk and wrong, stupid, or insufficiently clever, about some aspect of timeline or practical policy treatment plan - just as it's possible to be clever and right about the unfeasibility or social incompatibility/unacceptability of a policy plan and yet seriously, catastrophically, wrong when comfortably and confidently mis-inferring from the error of others (and ego-stroking self-assurance of one's own intelligence) to the absence of a much-worse-than-you-actually-understand trajectory on a faster-than-you-think-possible timeline. "A lot of doubt may be crippling, but no doubt is undoubtedly.. [ dangerous. (Claude3.5 Sonnet*) | blinding. (GPT4o*) | worse. (Me*) ]. *(Pithily complete the following sentence:)
I increasingly suspect that part of the problem with both of these (and other) complex multi-domain systemic problem spaces is that the ability of any one human to correctly imagine the risk of distant issues and, alternatively, to assess the practicality/impact of near-term policy options, requires very different thinking styles not commonly found in one brain (and that's even setting aside the need to have exposure to quite a lot of disparate arguments, data and systems intuitions over data interpretation and engineered responses in the physical, biological, social/psychological/political, and managerial domains).. and yet the confidence from skill in one domain (particularly identifying problems or gaps in execution or data) has the tendency to incorrectly yield a carry-over of confidence in dismissing 'wild' (and deeply uncomfortable) assertions/scenarios made in the 'imaginative' domain of trying to foresee systems interactions (identifying chain-of-thought threads assembling abstract risk scenarios for which there will typically be difficulty showing compelling pre-emptive empirical evidence until, well, bad stuff happens.. - and for which 'doomer' scenarios we kind of need rather more sophisticated rational engagement and response than 'nah, that's a bit extreme').
Bringing things back to topic, I think the training knowledge reach of AI/LLMs perhaps provides a fascinating opportunity to build tools to do some cross-domain representation of existing and evolving knowledge, data and arguments, which may help policy makers get a better grasp on high-complexity issues.. I'm not sure it's quite getting into govt yet, though I did listen to a conference speech by the head of a team trying to build AI tools for the UK gov civil service which gave me some hope.. there is some activity happening on that front, though more of it in global open space might be useful for many topics.
(Apologies for rambling/ranting somewhat - by way of context this is slightly UA-cam comment therapy, written whiling away some late-night hours stuck in a post-op hospital bed, contemplating how the NHS has changed since I worked in it and trying to ignore some discomfort while some meds kick in.. so my wetware (organic LLM) may be set to a different-than-usual 'temperature' and possibly ignoring stop tokens.. and if grammar.. maybe reason.. is off, it's not as a consequence of an LLM. Fortunately no actual temperature🤞..
Normal level idiocy should hopefully resume in a few days…
I shall just hit post now that it's below 10k chars (text length, not word-sequence choice, was reduced with some effort using Claude3.5 because the original hit nearly 12,000 chars 😱) in the hope there might have been more signal than noise in the above comment.. not betting on it).
TOC:
00:00:00 Intro
00:01:48 Guiding principles of Cohere
00:02:31 Last mile / customer engineering
00:04:25 Prompt brittleness
00:06:14 Robustness and "delving"
00:10:12 Command R models and catch up
00:12:32 Are LLMs saturating / specialisation
00:16:11 Intelligence
00:21:28 Predictive architectures, data vs inductive priors
00:25:55 Agentic systems
00:28:11 Differentiation
00:33:35 X-Risk / Bostrom
00:39:30 Changing relationship with technology
00:45:08 Policy
00:49:01 Startup scene
00:52:44 Biggest mistake?
00:53:50 Management style
00:56:38 Culture in different Cohere offices?
So proud to be working at Cohere under Aidan's leadership. What a great interview, Tim, they just keep getting better! ❤🔥
Thank you Sandra!
The best part was gaining some acquaintance with Aidan as a person. He seems like a great guy.
I think delivering substantial progress on 'System 2' thinking and longer term planning in the next ~24 months will be required to justify the continued level of investments we are seeing in the space.
Yup. And I doubt it's that difficult. I think these AI companies are dragging their feet with it in order to get more investments from investors.
Not true.
This has legs, even with little incremental improvement from now on!
Just getting a computer to read in a natural voice is a huge achievement. It's those little improvements, that would have been huge a decade ago, that will improve productivity for years to come.
I think the repetitive behavior of LLMs has a different cause. It makes the text boring because the algorithm is required to predict the next word. Thoughts that are mundane and have been told many times before are more likely predictions, so generating original and creative thoughts is discouraged by the algorithm.
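For what it's worth, the mechanism this comment describes can be sketched in a few lines (the tokens and scores below are invented purely for illustration): under greedy decoding the most common, most mundane continuation always wins, and only a higher sampling temperature gives rarer continuations a realistic chance.

```python
import math

# Toy next-token distribution; the "mundane" continuation dominates.
# (All tokens and scores are invented, purely to illustrate the point.)
logits = {"the usual phrase": 5.0, "a fresh metaphor": 2.0, "a wild idea": 1.0}

def softmax(scores, temperature=1.0):
    exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Greedy decoding always emits the most likely (often blandest) token.
greedy = max(logits, key=logits.get)

# A higher sampling temperature flattens the distribution, giving rarer,
# more "creative" continuations a real chance of being sampled.
p_low = softmax(logits, temperature=0.5)
p_high = softmax(logits, temperature=2.0)
print(greedy, round(p_low["a wild idea"], 4), round(p_high["a wild idea"], 4))
```

Under low temperature the "wild" continuation is vanishingly unlikely; under high temperature it becomes plausible, which matches the boring-by-default behavior described above.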
18:20 "In the same way that hallucination used to be an existential threat to this technology ..."
Past tense? Why? The models I've been playing around with still have that problem very much. Is he talking about stuff that hasn't been released to the public? Every couple of months or so, I try to use these models for software-related stuff and usually it doesn't take long until they start making up things, like telling me to set certain configuration parameters that are available in a different library than the one it suggested to use a moment ago. Maybe it's me being too stupid to prompt these things, but hallucinations don't seem to be a solved problem.
I think he means that while they do still hallucinate, it's a relatively small number of cases when the model knows a lot about the topic the prompt is querying. App builders still need to do a tonne of engineering to address this, e.g. RAG and other stuff, but it's not a deal breaker for many practical applications.
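A minimal sketch of the RAG idea this reply mentions (the document store, retriever, and prompt format here are all made up; real systems use embedding-based search rather than keyword overlap): retrieve grounding text and prepend it to the prompt so the model answers from supplied facts instead of inventing them.

```python
# Toy retrieval-augmented generation: fetch the most relevant document
# and prepend it to the prompt. Keyword overlap stands in for a real
# embedding retriever; the documents below are invented examples.

DOCS = [
    "Command R is an LLM designed for retrieval-augmented generation.",
    "Hallucination means the model asserts facts not supported by data.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, DOCS))
    return f"Context:\n{context}\n\nAnswer only from the context.\nQ: {query}"

prompt = build_prompt("What does hallucination mean for the model?")
print(prompt)
```

The design point is simply that the generation step is constrained by retrieved text, which is why app builders lean on it to reduce (not eliminate) hallucination.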
I'm one of those fellows who has been working on programs that combine several different large language models with other machine learning tools to create something similar to this inner-monologue conversation, as well as modeling dynamic systems interaction. It's exciting to see that one of the packages we are using for this, with its own ecosystem, is looking into this for its own LLM.
The logits created by an LLM's weights are not log-odds. They are not the result of a statistical function; the function is far more complex. In fact, it's intractable. We do treat these as statistical results, however, via the softmax, and that's appropriate. These weights are modeling human reason in subnetworks, things like comparison functions. Yes, these models are reasoning. It's rudimentary reasoning nevertheless. BTW, most synthetic training data being created is coming from chat logs where people work with models to create content.
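The "treat logits statistically via the softmax" step looks like this (a generic numerical sketch, not any particular model's code). One concrete reason a single logit can't be read as a log-odds value on its own: the softmax is invariant to adding a constant to every logit, so only differences between logits matter.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

p = softmax([2.0, 1.0, 0.1])
p_shifted = softmax([7.0, 6.0, 5.1])      # same logits + 5: identical distribution
print([round(v, 4) for v in p])
```

Whatever the intractable function computing the logits is doing internally, the softmax turns its outputs into a proper probability distribution over the vocabulary, which is the only statistical claim being made.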
I appreciate Aidan's perspective and approach
While this seems like an interesting project, I would categorize it as just another LLM/feed-forward multimodal model, especially since his answer for innovation boils down to "throw more training data at it". What I'm looking forward to, and what I think will be an actual innovation in the space, is when we can create a model that can be trained to produce useful behavior on minimal data/examples/instruction, similar to a human or other animal. In my opinion, the only time we'll see true machine reasoning is when we get something like an RNN that can run for an arbitrary number of iterations, or if the symbolic ML guys figure out something cool.
What is the ticker symbol of Cohere ?
This dude helped invent the transformer, legend
Great conversation to listen to! Thanks for posting this.
You have the most amazing guests, Aiden seems very grounded. Thank you so much.
I like that the thought is to make new models smaller; anything bigger than 32B is overkill and only goes toward data center profit. Algorithms for condensing training datasets (similar to what Microsoft did for Phi-3) is the way, along with dimension reduction (PCA or some equivalent). You might lose some accuracy just prompting the LLM, but scaling an agentic workflow will be robust.
43:10 - What is this enfeeblement pie chart and why is enfeeblement a difficult topic to search for
Yes
Right track seeing problem solving as DSL market ecosystems: LLMs that produce synthetic markets leveraging divide-and-conquer techniques that transform prompts into generated RAG DSL pipelines.
My new favorite team of AI engineers! 🎉
Time and again the hard lesson of the Enlightenment is that we only make grudging progress through conjecture and criticism.
I've noticed the interviewer model often repeats the phrase "lots of people say". I wonder what dataset it was trained on. Maybe videos of MLST?
"How do you teach these devices common sense", I was thinking as you both started talking about it, thanks.
People saying prompt engineering shouldn't exist don't have a clue how language works
of course prompt engineering should exist. however if you want a product to be mainstream, it would help end users more if they didn't have to know about prompts
I believe that LLMs are modeling something like Platonic forms. They have a better error function than surprise. It's purer. They necessarily model the best systems with the fewest errors. They are free from the cognitive dissonance of chemical context injections.
9:41 The tagline on the front page of their investor pitch deck should read…. At Cohere our goal is to prevent the human centipedification of the global B2B AI ecosystem.
Dr. Tim, thank you very much for this interview! Personally I use Cohere's Command-R model pretty often and it is very good, totally comparable to other SOTA models that we currently have. The fact they share openly including weights is something I really like about them.
24:00 I think Gödel and Turing might disagree a little... Best of luck tho 😅
What's this "EAK" or "YEAK" countermovement, or whatever the spelling is? I have heard of the EA cult (effective altruism) but I want to look into the other thing he mentioned.
Where do all these "movements" happen and nobody tells me about it?
It's called e/acc. You'd probably find people talking about it on Twitter or reddit
love your channel. I really would like a interview with Ramin Hasani ! :)
Us too! Chatted with him on the phone recently, should be able to sort it out soon
Will there be a third episode with Ivan? 🤔
I certainly hope so! We haven't filmed with him yet
Such a breath of fresh air. Grounded, not hype or scare-mongering, dedicated to the craft, taking reasonable steps at a time and learning from the process.
also saying absolutely nothing
@@minhuang8848 he said so much. Just listen. He's a whole ocean technically sounder and more competent than many a high-sounding bamboozler in this space.
An amazing company with some amazing people! I am officially a fan! Thanks to MachineLearningStreetTalk for another great episode!
'Augment'
Nothing new; other than Francois Chollet, all the other "AI Experts" are running in circles, spitting out the same thing....
"It needs to have reasoning, we don't need general AI, we can have modular architecture with specialized domain experts, on-the-fly learning"
Yeah we will have those and still will not reach AGI; not a single expert is addressing the question of what we're gonna do about representing abstractions.
They are literally running in circles.
Maybe some problems already have solutions, but those solutions aren't quite ready yet.
Too much compute, too low profits for now.
@@ronilevarez901 Yeah that is correct, but the stupid thing is to make claims that those approaches will lead to AGI. Until we reach an architecture that learns quickly like children, with very few examples, we have a long way to AGI.
Nice no hype no BS business model aimed at solving real problems companies have
They used to teach reasoning in universities so there are textbooks, like Logic by Isaac Watts. I wonder if such texts could provide a useful framework for Ai reasoning🤔
They tried that for years and it failed.
@@mikezooper it's because they are trying to derive logic from data, rather than let the model build its own logic little by little through experience.
@@hunterkudo9832 experience, as in living day by day, which takes too long.
What entrepreneur has the time to wait 30 years to have a full grown AI trained on "experiences"? They want money NOW. And that's all this is about: money. Progress is a side effect.
It's concerning that people can say their company is trying to influence policy and no one is like "hey ehh maybe don't?"
See it from their perspective, there are those who are trying to kill their business with irrational fears of "AI existential risk". There *are* really important risks though, this is nuanced.
I think it is important that smaller entrepreneurs are speaking with regulators. You know that the largest players (who are also the largest companies in the world) have major lobbying efforts. All you need to do is look at Sam Altman playing up the risk of an "existential threat" before Congress to see their disinformation. They would like to see a regulatory scheme that would allow only the largest players to play (like we already have in many industries) while ignoring real risks like misinformation, economic disruption and wholesale misappropriation of IP.
Despite his calm voice, this guy doesn't know anything
Gomez is misguided here. Hilbert's decision problem sought a perfect problem-solving machine that could answer any yes-or-no question; it failed because of Gödel's incompleteness theorems and Turing's halting problem.
Refreshing! An intelligent, mature professional.. no "ums", "ahs", artificially accelerated speech or ANY of the typical Bay Area speech pathologies which are getting unbearable. We need more grown-up, deliberate professionals like this in the field. Thank you, Aidan.
I haven’t finished watching the video so I don’t know if this is addressed later on, but..
With respect to agents which employ a Mixture of Experts approach, is there not a concern that by specializing models and reaching out to them via some general-purpose orchestrator, you might lose cross-disciplinary creativity? Perhaps engineering borrowing some idea from nature, etc? Mightn't it be the case that really out-of-the-box creativity would only come from all the knowledge together in one latent space, so it can intermingle to identify patterns which might not be achieved by a specialized MoE?
A true MoE involves a combination (mixture) of specialised models with deep domain knowledge (experts), not of different generalists.
@@Enedee007 Right but if they only have knowledge in their own domains, then how will there be true cross-discipline “inspiration”? The engineering expert might not think to borrow an idea from the animal kingdom, for example. Why? Because whatever is coordinating the agents and routing the questions to the appropriate expert will not itself have the breadth of understanding to connect the dots between two highly distinct disciplines. Like maybe a structural engineering expert might not think to look to honeycomb structures from bee hives.
But if they were all in the same monolithic model then the associations could be made at some layer of latent representation.
@@andrewsilber that’s correct. I think our points are more similar than they differ. The coordinating agent should be the generalist in that architecture. That’s the agent that’s able to carry out cross-discipline activities. That’s also a specialisation, it affords the opportunity to deeply train a specific model for that exact purpose, it should be well grounded in its ability to learn from different areas of specialisation and make critical connections.
@@Enedee007 @andrewsilber There is not just one MOE implementation and I think as currently used, it is done to only activate certain weights (the experts) in some of the layers probably not so much to silo the information but to reduce parameters and increase performance without any loss of acuity. The difficulty with cross discipline inference is actually baked into the models themselves unless it is in the training data or in an expression like “honeycomb structure”. No amount of “temperature” will get the probabilities to point just anywhere in the model.
@@toadlguy I totally agree with you, there's not just one implementation of MoE. However, the underlying principles are basically the same. MoE in this case, however, is in the context of multi-agents.
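The "activate only certain experts per layer" routing discussed in this thread can be sketched as follows (the gate weights, expert functions, and dimensions are all invented for illustration; in a real MoE layer both the gate and the experts are learned networks): a gate scores every expert for each token, only the top-k experts actually run, and their outputs are mixed by the gate's softmax weights, so parameter count grows without growing per-token compute.

```python
import math

# Toy per-token top-k expert routing.
GATE_W = [[0.9, -0.2], [0.1, 0.8], [-0.5, 0.3], [0.4, 0.4]]  # 4 experts, 2-dim tokens
EXPERTS = [lambda x, s=s: [v * s for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
TOP_K = 2

def route(x):
    # Gate: score each expert for this token (a learned linear map in practice).
    scores = [sum(w * v for w, v in zip(row, x)) for row in GATE_W]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:TOP_K]
    # Softmax over the selected experts' scores gives mixing weights.
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [0.0] * len(x)
    for w, i in zip(weights, top):      # only the top-k experts execute
        y = EXPERTS[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return top, out

top, out = route([1.0, 0.5])
print(top, [round(v, 3) for v in out])
```

This also illustrates toadlguy's point: the routing is a performance mechanism, not an information silo, since every expert was trained on the same stream of tokens.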
They should have titled this "San Francisco sucks."
I trust and pray for Mr. Jesus.
I never saw anybody seriously challenge the concrete x-risk arguments. All the criticism against the doomsday scenario was limited to "they are cultists, they are ideologically brainwashed" instead of hearing and answering the concrete arguments they present.
That's because there are no concrete arguments from the x risk people to argue against.
@@joecunningham6939 The argument is if the intelligent has a goal it will use all it's power to achieve the goal. If the intelligence is powerful enough, you know what it means. It's just a chain of events logically following each other.
@@XOPOIIIO and what would that goal be? Why would the goal involve destroying humanity?
@@joecunningham6939 It doesn't matter what the goal will be. To achieve any goal you want as much power and control as possible. You would prevent anybody from intervening. And if you think about it, no goal can be safe for humanity. Any goal this AI tries to achieve will end in disaster. It can be either the death of humankind or something worse.
The incremental progress is hard to define? Super cap. Can it generalize to different reasoning lengths? Not at all… there aren't small gaps, there are gaping holes in current capabilities. Solvable, yes, but this isn't reflective of the true reality… at all. It can't even tell me that given a=b, b=a… again solvable… but we have to stop the cap. No true signs of compositional generalization… again I do believe it is solvable. It isn't just a data issue, it's an architecture problem too.. RoPE is trash for modern LMs, all that uniformity in high-dimensional space, let's start there 😂.
"Is [the modal collapse of LM personalities...] because they're eating each other's poop?"
"Yeah, it's some sort of human centipede effect... Everything collapsing into [GPT-'4's] personality."
Sounds as if he lacks sufficient background in psychology to understand the risks of automated, individualized persuasion.
And what if reasoning is linked to consciousness, which isn’t computational? In that case, these models may never achieve human-level intelligence.
Consciousness is still a computation
@@bobhanger3370 No. Consciousness is quantum. It is the zero point of calculation. Computers are actionable; you are not. You will not have free will otherwise.
seems very overconfident
"How Cohere SAYS IT will improve ...."
We don't know it will improve anything. All CEOs say things about their product(s) that never happen.
which part of the conversation deals with the "AI Reasoning this year"? I watched the first 10 minutes and I gave up watching.
The title 😂
This needs to be far more clearly labeled as a sponsored video. Viewers will lose trust if you are not transparent with them. You also harm the entire UA-cam ecosystem. People need to know when they are watching a paid advertisement.
Did you not see the clearly marked "paid promotion" label at the start of the video? How much clearer could it be? Also we added a disclaimer "This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview." - and if you look at our back catalog we have been interviewing Cohere and their researchers for years without any commercials. I honestly couldn't think of a better possible revenue model for the channel, unless you want us to spam you with ads about NordVPN?
To demonstrate our genuine interest in Cohere look at back catalog with their team:
ua-cam.com/video/i9VPPmQn9HQ/v-deo.html
ua-cam.com/video/Dm5sfALoL1Y/v-deo.html
ua-cam.com/video/7oJui4eSCoY/v-deo.html
ua-cam.com/video/ooBt_di8DLs/v-deo.html
ua-cam.com/video/sQFxbQ7ade0/v-deo.html
@@MachineLearningStreetTalk Thank you for taking the time to respond. I'm afraid I missed that, as I listen to interviews like this on my phone while I do chores, so I didn't see the visual cue. Ideally there would be an audio cue for people like me, but you're following UA-cam policies, so I'm all good. Apologies if I brought you down!
Another CEO hyping AI progress to boost share prices
tries to talk like Sam. A wannabe Sam
This guy has knowledge, and a sane worldview so no.