I do not fully understand the caching thing: 1) LLMs are generative models, so they are not necessarily supposed to give the exact same answer to the same prompt unless the seed used for generating the output is the same, are they? 2) if the model's answer depends on inputs that change over time, such as news scraped from the net or weather forecasts, I do not think you want the same question to give you the same output over and over. So this caching strategy can be interesting, but can come with degraded performance depending on your use case, no?
@@MichaelForbes-d4p there isn’t a day that goes by that I don’t feel excited and grateful to talk about AI. But some days I don’t feel like recording vids about it 😀
Will people finally give up including Sora when they talk about AI video generation? It seems that OpenAI have quietly shelved it. As for the API pricing - that's hideously expensive, not "cheap". I haven't got a problem with what they charge as I'm sure some companies with deep pockets will have no problem paying those kind of charges. But to claim that is cheap is really not correct - not for small developers. Those "pennies" will soon add up to large amounts of money.
I just tried Liquid on their playground with a small Bash utility script I've been working on and it immediately got into an endless reply loop of repeating itself (apparently they have to tweak their repeat penalty some more?). And that was literally the first thing I tried. Not looking great!
They have been using caching in OpenAI GPT-4 and 4o and Bing Chat for a long time. Have you guys not noticed the absolutely garbage first responses to common questions, where it ignores what you actually asked and just goes on a tangent from some keywords you used?
They say we are living in the age of the Anthropocene. I hope we are entering the age of the Metacivilization: an age where humanity is not only open, but forms a symbiosis with AI.
Your data and prompt, with their input-output transformations baked into weights and accessible to the AI as a secondary set of weights, is a very good idea. If that is fast, or is handled by a secondary NPU and made available to the main NPU, that is the right way. I think this is Q*. My noob opinion.
They started the company as a nonprofit, with a mission and vision to go with it. Going for-profit now betrays the original founders, mission, and vision.
If we assume the bot talks for a full 8-hour shift at a call center, that's $115 for the day. That's not cheaper than hiring someone in the Philippines for call centers...
We're doing another giveaway!
Subscribe to the Forward Future Newsletter for your chance to win a 24" Dell Monitor: gleam.io/q8wkK/dell-nvidia-monitor-2
(NA Only)
Mathew I love your content. I think your channel could grow more if it had musical shorts that summarized the news.
I could do this for you for all your videos for free for a test then for $100 a month.
AI out here pricing minutes like it's a 90's long distance call.
I agree. But the market will make it essentially "free" in a much shorter time frame than the 30+ years we suffered through.
@@mordantvistas4019 facts, probably faster than any of us expect
It all depends on what they say. A well timed “Buy NVidia” 😊(for instance) could be worth a lot of money 😂
There is a graph that shows the cost of AI over time... It shrank by 2000% I've read, and it's going even faster thanks to NVIDIA chips. What a time to be alive.
Hell I remember paying $1.50 a min for in state long distance.
I don't think you quite understood what prompt caching is. Imagine you send a prompt that contains: "your entire codebase + some request/question". If you send that prompt without prompt caching with 3 different requests/questions, GPT-4o will re-process the entire input prompt every time, including the massive codebase. However, with prompt caching, the model can remember all the compute required to process the codebase, and only actually process that final request/question at the end. Basically it allows the model to remember "oh yeah, I've seen this bit of this prompt before" and then only process the new bit of the prompt.
this means:
1) it isn't something you can do yourself, as you don't have access to the model weights or raw matrix outputs
2) It just so happens that storing the KV matrix results takes up a lot of storage, and so they charge for it.
3) the model will not respond the same way every time to the same prompt, as it will still randomly sample from the prediction distribution.
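For what it's worth, the mechanism described above can be sketched in a few lines. This is a toy illustration of the idea only, not OpenAI's actual implementation; the "KV state" string just stands in for the real attention tensors:

```python
import hashlib

# Toy sketch of prefix (KV) caching. The fake "KV state" string is
# illustrative only -- it stands in for real attention key/value tensors.
kv_cache = {}  # hash of the shared prefix -> stored KV state

def encode_prefix(prefix: str) -> str:
    # Stand-in for the expensive transformer forward pass over the prefix.
    return f"kv-state-for-{len(prefix)}-chars"

def process_prompt(codebase: str, question: str):
    key = hashlib.sha256(codebase.encode()).hexdigest()
    if key not in kv_cache:
        kv_cache[key] = encode_prefix(codebase)  # full price, paid once
    kv_state = kv_cache[key]                     # discounted on later calls
    # Only the new suffix (the question) still needs fresh compute.
    return kv_state, f"fresh compute for {question!r}"

big_repo = "def main(): ...\n" * 1000
process_prompt(big_repo, "add logging")
process_prompt(big_repo, "write tests")  # second call reuses the cached prefix
```

The point of the sketch: the expensive prefix work happens once per unique prefix, which is exactly why the discounted rate only applies to the repeated leading portion of the prompt.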
I honestly don't think he knows half of the technical stuff he talks about.
I call bull that he "dealt with this on his previous company"
@@megafoxatron3rd521 I'm sure he's dealt with caching in his previous company, as he seems to understand that concept fine. It's just that he assumed what prompt caching was without first looking it up to make sure.
I just recently unsubbed from this channel. Every video is the SAME. He just puts a tweet on his video and READS IT OUT LOUD. The BS in the thumbnails same hyperbole in EVERY VIDEO!!!
Can't this be achieved with embeddings? You create embeddings for the context and prompt and save the response for that; if a new prompt matches the embeddings, you send back the same result.
I don't know much about this, feel free to correct me.
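What this comment describes is usually called a "semantic cache", and it can indeed be built as a layer outside the model. Note it caches whole responses, which is different from OpenAI's prompt caching (that caches internal attention state, and answers still vary). A minimal sketch, with a toy `embed` function standing in for a real embedding model:

```python
import math

def embed(text: str):
    # Toy stand-in for a real embedding model: a character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

semantic_cache = []  # list of (embedding, cached_response)

def lookup(prompt: str, threshold: float = 0.95):
    e = embed(prompt)
    for cached_e, response in semantic_cache:
        if cosine(e, cached_e) >= threshold:
            return response  # close enough: reuse the stored answer
    return None

def store(prompt: str, response: str):
    semantic_cache.append((embed(prompt), response))

store("What is the capital of France?", "Paris")
lookup("What is the capital of France?")  # exact match reuses "Paris"
```

The trade-off is the one raised elsewhere in this thread: a response cache freezes the answer, which is wrong for time-varying queries like news or weather.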
I stopped right there. If he doesn't understand prompt caching, which has been a hot topic last few months, why am I getting AI news from him?
$15 an hour to use voice is pretty damn high! Looks like we won't be seeing this in video games for at least a few more months.
If it’s a full hour of audio-output-only then at $0.24/minute it’s $14.40/hr. If you assume a 50/50 split (half input at $0.06/minute and half output at $0.24/minute) then it’s $9/hr. Better than the Psychic Friends Network in the 90s that cost you $4.99/minute to talk to Miss Cleo!
@@mshonleAlthough a very apt comparison to Miss Cleo 😂.
open source voice models *WILL* be the future. uncensored (to whatever the user desires) private, locally run, secure, etc.
Closed source and centralized AI will also exist, but open source, private, decentralized, locally run, uncensored etc. is truly for the people.
@@mshonleyou showed your age on that one and i’m showing mine by laughing hysterically at that.
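For anyone double-checking the math in this thread (using the $0.06/minute audio-input and $0.24/minute audio-output rates quoted above):

```python
# Realtime audio API rates quoted in this thread (USD per minute).
INPUT_PER_MIN = 0.06   # audio input
OUTPUT_PER_MIN = 0.24  # audio output

def hourly_cost(output_fraction: float) -> float:
    """Cost of one hour of talk given the fraction of time the model is speaking."""
    out_minutes = 60 * output_fraction
    in_minutes = 60 - out_minutes
    return in_minutes * INPUT_PER_MIN + out_minutes * OUTPUT_PER_MIN

print(hourly_cost(1.0))  # all output: $14.40/hr
print(hourly_cost(0.5))  # 50/50 split: $9.00/hr
```

So both figures in the comment check out; an 8-hour shift of pure output lands around $115, which is where the call-center comparison elsewhere in the thread comes from.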
Matt, the caching they are talking about you cannot do yourself. They are caching the actual inference state in the model so it does not have to redo the calculations. They are trading memory for compute (though probably copying to less expensive storage). They are not just caching the text. This can be done on open source models as well, but the overhead is not always worth it, I suspect.
Understood now, thanks for clarifying.
Exactly. That also means, unless I misunderstood, that modifying any part of the cached data (for example, a specific function buried in a previously cached code repository) requires all the data to be fed in again?
@@rlfrYes. It is really for cases where you are loading a large code base or documents (text or pictures) and then querying it. Possibly using a very large system prompt.
Yes, testing the LFMs would be great
The price is... pricey. It is acceptable for US folks, but for people in countries with weaker currencies, it is expensive.
we have local llm.
Actually, if you scale it up to an app or service it could be hugely expensive.
@@lseder1 I don't know how it is where you live, but here in Brazil it's almost impossible to think about local models at scale. A rig to run something with good quality in Portuguese would be so expensive that I could buy a ridiculous amount of tokens from the OpenAI or Claude APIs instead.
Well, it's a starting point. I hope (I'm actually sure) that it will continue the trend of decreasing in price
I want the best and I want it free! Wah! 😢😭😂
For just pennies? that's really expensive, the bill will add up really quick. We need open source solutions. Enough of OPEN AI.
Well, the name OpenAI has been a misnomer for some time. I hope that when they become a for-profit they change the company name. ClosedAI would be a good start...
@@cbnewham5633 I don't think the name matters that much; people fixate too much on that. Their VISION is what should matter.
They aren't sharing much anyway, being called OpenAI, CloseAI or SamAI won't change that.
@@neociber24 I was being somewhat sarcastic. 🙂
I think you're misunderstanding how caching works in LLMs. *They aren't caching responses.* They are caching the token setup work for the attention in the model. This is why they ask you to put static content at the start of the request and dynamic content at the end: they can always append more after the cached prefix.
ty
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, I'm eagerly awaiting the release of real-time video for Plus users from OpenAI, as it was originally mentioned as part of the ChatGPT Omni update, which sadly never reached us. This feature will be revolutionary, enabling us to tackle a wide range of daily tasks more efficiently. Real-time video integration within ChatGPT would greatly enhance productivity by allowing for interactive, dynamic assistance and more streamlined workflows. It would be especially useful for tasks like desktop sharing: being able to visually assist and collaborate on real-time activities is just phenomenal. I hope this feature rolls out soon, as it could drastically improve how we approach everyday challenges.
Two Minute Papers is awesome. I've been watching his channel for years now. Thumbs up for your taste!
shouldn't we boycott OpenAI? Other companies are blacklisted for way less...
24 cents a minute output. Ridiculous. Open source is the way.
The board signed up to be a non profit. The company is taking a new direction so I can see why they are going.
I just asked ChatGPT to list all Roman emperors and sort by length of reign... it failed, then gave up. Not sure I'd want it controlling my car just yet. The DISCLAIMER and T&Cs for any app built using AI would be huge ;)
Awesome content lately Mathew, feels like you are in a good place with your content strategy. Much appreciated
Thank you
you don't understand prompt caching. you just can't do it locally.
Yeah, I was thinking more about response caching.
"Hey Matthew! Loved the video, thanks! Sending lots of love from India."
I think its totally fucked that OpenAI is "switching" their status....and the fact that Elon is getting fucked in the process... Shady AF.
Don't bother testing LFMs (at least for now); they suck, it's just hype. Check their benchmarks: everything is 4- or 5-shot to score, and some results are "reported by developers" (quoting their website).
Qwen2.5 is better
This prompt caching is a dodgy deal: a 50% price reduction when it would cost them next to nothing, and such a short expiry really limits its use.
For the average prompt that has something "cacheable", it could certainly be argued that only 50% of the prompt may be different, or that the re-prompt results in 50% of the OG prompt's neural activations. Bottom line though, is that they don't have to make anything cheaper for anyone..... but they are.
@@KCM25NJL If they didn't have to make it cheaper they wouldn't have. Making it cheaper indicates they feel pressured to compete on price to maintain market share.
@@KCM25NJL they didn't make it cheaper, they found a way to make extra money, they save more but they don't pass on all the saving. It's like they're selling ice cream for $2 that cost them $1 to buy, but now they are selling the ice cream for $1 but it only costs them 10c to buy.
12:30 counterpoint: the internet today fucking sucks. maybe we should have regulated before google amazon facebook became the monsters they are today
counterpoint: you just cannot prove that the lack of regulation was the cause of Google, Amazon and Facebook becoming what they became. If anything, it was all the help the state gave these companies that made them that giant.
I'm getting 403 on the real-time api.... 😢 hope I get access soon!
I don't know. I think at this point, if they had to regulate AI, the attention should be geared towards alignment and misalignment.
AI is a huge black hole. It's "oblivion" and "agnosticism" on the inside.
I would suggest that all companies participate on the alignment and misalignment issue and make their findings publicly available. So make only this part "open source". Just a thought.
Yes Matt, please do test LFM. Looking forward to your testing experience.
AI is on the cusp of being able to affect its own environment. The bill might not have been perfect, but we need something soon.
This is prefix caching, which means the tail end of the prompt varies. Imagine a really long and detailed prompt specifying the schema for output format, if you need to bring in the same 2-8k tokens each time you will see some savings. For general caching you’re right and that’s something more aligned with the business idea you had a video on (asking someone else to start it)… any service that would be a stable interface that could be plugged in to apps (so that the apps aren’t tied to one single vendor) then caching might be key.
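The "static content first, dynamic content last" advice that comes up in this thread can be made concrete with a small sketch (the schema and docs strings here are made-up placeholders):

```python
# Prefix caching only matches a *leading* prefix, so put the stable parts
# (system prompt, output schema, reference docs) first and the per-request
# part last. The constants below are illustrative placeholders.
STATIC_SCHEMA = 'Answer strictly as JSON: {"title": str, "tags": [str]}'
STATIC_DOCS = "...imagine 2-8k tokens of fixed reference material here..."

def build_prompt(user_question: str) -> str:
    cacheable_prefix = STATIC_SCHEMA + "\n" + STATIC_DOCS + "\n"
    return cacheable_prefix + "Question: " + user_question  # dynamic tail

p1 = build_prompt("Summarize section 2.")
p2 = build_prompt("List every author mentioned.")

# Length of the shared leading prefix between two requests:
shared = 0
for a, b in zip(p1, p2):
    if a != b:
        break
    shared += 1
print(shared)  # everything before the user's question is identical, hence cacheable
```

If the schema were appended after the question instead, the shared prefix would end at the first differing character of the question and almost nothing would be cacheable.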
11:03 2 options:
- LLM technology has reached its limit, and this limit is below AGI. Different approaches will require vast time and effort to reach even minor advancement. The AGI hype is going to diminish and OpenAI's stock price is going to fall.
- Someone is greedy and forcing people out of the company to keep more money to themselves.
I'm personally inclined to option 1.
Cache me ousside, how 'bout dat?
I wouldn't exactly call that realtime API cheap, isn't that about $24 per hour all in? You could hire someone in a 3rd world country like the UK for that 😄
Slipping in a little joke there 😂
That's actually true...I could hire two great English speakers from China/ India with some googling skills for the same price.
but wouldn't you miss the spicy hallucination factor though?
@@ticketforlife2103I would happily work 6 hours for that amount.
Let me know if you guys have some work for me.
The story about OpenAI would make for such a great Superhero Movie Antagonist!
"first order of business -- a cool animation" :) love it.
The Voice API pricing is unrealistic. $15+ an hour for audio output? Way too expensive to use it. It’s useful when it’s cheaper than a human, but more expensive than a human? No.
I can get similar for $0.05 a minute, like from Azure, etc.
Maybe for media production, but too expensive to use in conversational systems.
1:04 a scammer's dream
Amazing how quickly the Perplexity drama has been forgotten by the community.
in ten years this will all be free so, just wait.(this is sarcasm of course)
I think of it in the sense of unlimited usage at a fixed price, like cell service. Nothing's free, but it certainly is nicer to have unlimited GB usage at a fixed price than talking to someone long distance for 10 minutes and stressing that the call cost $8.80. Those days sucked.
@@mordantvistas4019 you bet
Such an honour to watch the coming dystopia happen live day by day.
I think it's over-simplistic to frame regulation as a stifler of growth. Regulation can take many forms, and in many cases is more just there to direct, and avoid worst case scenarios. Lack of regulation can end up being much worse than having it.
...except in pretty much every case ever
11:17 *"I live in California."* 🤯
Matt confirmed for AI character.
3:30 OpenAI’s prompt caching is completely free and happens automatically without you having to do anything so there’s no reason not to use it.
They left because they were afraid of the product that they saw being produced... without control and the implications of what that would mean.
Fear...
Fear of being soon associated with a company having conned investors out of $billions.
Board members leaving, the move to a for-profit benefit corporation model, and the Governor's veto... suspicious timing...
I'm not gonna pay expensive amounts to create a few-second videos; we need open-source products.
Your thumbnails are getting quite repetitive..
I was just thinking that. The content of this video, whilst good, was definitely not "Huge AI News".
I believe the most logical and beneficial way to fund AI development would be through taxes, making it freely accessible to everyone. With well-designed safeguards in place, this would encourage broader use of AI's capabilities, accelerating progress that benefits all of humanity. Additionally, decentralizing AI in this way would promote more equitable access and innovation.
Eeeew!
12:15 one little difference to this argument. The internet couldn't wipe out humanity.
I am not very happy with caching. I noticed it with Grok 2 in LLM Arena: on every refresh click, even with a different temperature, it returned the same answer in chat. I wanted to see different answers, but it simply ignored the temperature.
For us folks in Europe, actually integrating real time voice API in our apps will result in our users getting "advanced voice mode" ahead of the Chat GPT app for European users 😂
I'm not paying for anything. Either I run open source. or I simply don't run it. I still enjoy using my brain to solve problems.
Might internal caching allow the equivalent of false prompt injections? False caches could inject undesired operations into immediate "memory" that in turn could be incorporated into other users' output.
A new theory: why so many are leaving could actually be a smart and good thing. They created this algorithm to ensure AGI and, later, superintelligence. But since this requires so much compute and needs to be spread out to benefit the world, they must create hype and attract investors globally. And since they partnered with Microsoft early on, they put themselves in a bind: the agreement with Microsoft was based on the assumption that it would take decades before real, significant progress. With so many in the know on how to build this AI structure, they can branch out to other companies and create more investment worldwide, potentially spreading the mission of prosperity. They have the "commandments" in prompt form and now need to spread them before regulators have a chance to distort or change them. It could actually benefit humanity more than it looks.
Quality content!!!!
We need a "robot tax". If an agent is taking a human job, that agent needs to pay taxes. This is the AI regulation we need. And the tax should be used for funding UBI.
I'm curious how you think we should measure this. Say a new company is started that only uses AI agents for its work. How do you determine if it is replacing human jobs and if so how many?
The entire purpose was predicated on a doing-the-right-thing-for-humanity philosophy; now they are revamping the mission, vision, and strategic imperatives to be anchored on a profit imperative.
I don't get why strawberry is the goto.
Beekeeper, Deepfreeze, Assesses, Assassins, Divisibility, Invisibility.
By caching, OpenAI means that they keep your prompt in memory. It's not a response cache; it's an input prompt cache. So, if you have a prompt like "Do xyz on the text below", you can cache it and follow it up with a normal prompt in the same thread. I think there is a misunderstanding about this in the video and in the comments.
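To make the idea concrete, here's a toy Python sketch of prefix caching. This is purely illustrative: real prompt caching happens server-side inside the provider's inference stack, and the `process` function here is just a stand-in for the expensive attention computation over the prefix.

```python
import hashlib

# Toy sketch of prompt-prefix caching (illustrative only). The expensive
# "processing" of a long shared prefix (e.g. a system prompt + codebase)
# is computed once and reused; only the new suffix is processed each time.

prefix_cache = {}

def process(text):
    """Stand-in for the expensive computation over `text` (think KV cache)."""
    return sum(ord(c) for c in text)

def run_prompt(prefix, suffix):
    key = hashlib.sha256(prefix.encode()).hexdigest()
    if key not in prefix_cache:
        prefix_cache[key] = process(prefix)  # cache miss: pay full cost once
    state = prefix_cache[key]                # cache hit: prefix is "free"
    return state + process(suffix)           # only the suffix is new work

codebase = "def main(): ..." * 1000
a = run_prompt(codebase, "Question 1?")
b = run_prompt(codebase, "Question 2?")  # prefix reused from the cache
```

Note that the cached part is the shared input prefix, not the answer: the two calls above still do fresh work on their different questions.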
Instead of output caching, we need support for Git integration and git patches. I should be able to point the LLM at a Git repo, and instead of it generating the full content in its output, or parts that I have to hand-integrate, it would be better for it to just produce a git patch that I can apply to my local repo.
This will send shockwaves through call centres ... Many people will lose their jobs
I already built a translation app for Android that does near-real-time language translation with multi-speaker diarisation (multiple voices stored in a vector DB, each given a different output voice). So far I only have English to Japanese and Japanese to English.
I think anyone working at OpenAI deserves a break from it, I can’t imagine the immense pressure they’ve been under and if they have been making good money then a sabbatical is in order.
At this point, it's becoming obvious that OpenAI has officially sold out. I would suggest OpenAI rename itself to "Sold AI".
They should be doing all this for FREE -guy who works for money
Thank you 🎉
The sponsor's link in your description leads to "The site cannot be reached"
The newly released ChatGPT voice mode on phones is just the tip of the iceberg of what could be done with that technology.
That said, I just made a video and uploaded it to my YouTube channel that describes what might lie under the water of that iceberg.
The video is called "No More Mice, Hey, It's the 21st Century!"
It shows how one can use ChatGPT voice mode to rapidly and easily enter graphical data and such!
The cache time is better than Anthropic's 5-minute limit.
SignalWire has been offering a real-time voice API to their AI since Oct 2023, and their pricing is pretty close at $0.25/min. I wonder if this application (auto-attendants upgraded with AI) is the market OpenAI had in mind when arriving at their pricing? I'm not sure what AI SignalWire is using, but I will say the voice leaves something to be desired (compared to OpenAI).
The regulations part reminded me of Nintendo and its thousands of Patents haha
Yes please test liquid ai. Very interested in how it fares
I do not fully understand the caching thing: 1) LLMs are generative models, so they're not supposed to necessarily give the exact same answer to the same prompt, unless the seed used for generating the output is the same, aren't they? 2) If the model's answer depends on inputs that change over time, such as news scraped from the net or weather forecasts, I don't think you want the same question to give you the same output over and over. So this caching strategy can be interesting, but it can come with degraded performance depending on your use case, no?
It's not the caching of the result... it's the repeated portion/context that you feed in along with your query.
The W&B link for Weave is broken. The Wandb URL ends with "AI", not "ME". I'd add the URLs, but the comments are being taken down.
The section on all the founders leaving is interesting but I don't share your optimism about the future of this company
Do you ever have a bad day at work? Like, "Man, I just don't feel hyped about AI rn"
No
@@matthew_berman my man!
@@MichaelForbes-d4p there isn’t a day that goes by that I don’t feel excited and grateful to talk about AI. But some days I don’t feel like recording vids about it 😀
@@matthew_berman I hear ya. You gotta take those personal days. Although, it seems like you could be at the mercy of the news cycle.
Will people finally give up including Sora when they talk about AI video generation? It seems that OpenAI has quietly shelved it. As for the API pricing: that's hideously expensive, not "cheap". I don't have a problem with what they charge, as I'm sure some companies with deep pockets will have no problem paying those kinds of charges. But to claim that it's cheap is really not correct, not for small developers. Those "pennies" will soon add up to large amounts of money.
Great content
Please test the LFM 40B, I'm very curious to see how it performs.
I keep trying the liquid ai models.
They forced her to leave before the move to for-profit. She lost equity.
I just tried Liquid on their playground with a small Bash utility script I've been working on and it immediately got into an endless reply loop of repeating itself (apparently they have to tweak their repeat penalty some more?). And that was literally the first thing I tried. Not looking great!
They have been using caching in OpenAI's GPT-4 and 4o and in Bing Chat for a long time. Have you guys not noticed the absolute garbage first responses to common questions, where it ignores what you actually asked and just goes on a tangent from some keywords you used?
I don't know why my laptop currently doesn't have Siri-like AI capabilities.
It's so doable.
The real question is: what did Ilya see?
They say we are living in the age of the Anthropocene. I hope we are entering the age of the Metacivilization: an age where humanity is not only open, but forms a symbiosis with AI.
1:15 - Right, pennies per minute, YET thousands per page! Lol 🤪
they left because of ai safety
How can you cache the context yourself when using a cloud API?
Discrete caching is easy, semantic caching is difficult.
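A minimal sketch of that distinction, with all names made up for illustration: discrete caching hits only on a byte-identical query, while semantic caching hits on anything whose embedding is "close enough". The letter-frequency "embedding" and the 0.95 threshold here are toys; a real semantic cache would use a learned embedding model.

```python
import math

# Discrete cache: exact string match only.
discrete_cache = {}

def discrete_lookup(query):
    return discrete_cache.get(query)  # miss on any rephrasing or typo

def embed(text):
    # Toy "embedding": a letter-frequency vector over a-z.
    vec = [0.0] * 26
    for c in text.lower():
        if c.isalpha():
            vec[ord(c) - ord('a')] += 1
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Semantic cache: list of (embedding, cached response).
semantic_cache = []

def semantic_lookup(query, threshold=0.95):
    q = embed(query)
    for e, response in semantic_cache:
        if cosine(q, e) >= threshold:
            return response  # "close enough" counts as a hit
    return None

question = "what is the capital of France"
discrete_cache[question] = "Paris"
semantic_cache.append((embed(question), "Paris"))
```

The hard part the comment is pointing at: picking an embedding and threshold where "What's France's capital?" hits but "What is the capital of Francia, Kansas?" doesn't.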
That voice API is crazy expensive
They have already achieved AGI. They figured it out. That is why they are leaving.
I can confirm, in my wildest dreams.
AGI will send all employees home. OpenAI has AGI. Employees are going home. The math maths. 😅
Definitely test the Liquid models.
It's not AGI, but how human is it that, like most humans, AI can be bad at math and can suffer from the Dunning-Kruger effect?
They left because they see who Sam Altman really is, he’s a dangerous man with callousness toward humanity
And they also found out he screwed them on the money promises
Storing your data and prompt, with their in/out transformations, as weights accessible to the AI as a secondary set of weights is a very good idea. If that is fast, or handled by a secondary NPU and made available to the main NPU, that's the right way to do it. I think this is Q*. My noob opinion.
They started the company as a nonprofit, with a mission and vision to go with it. Now, going for-profit, it is betraying the original founders, mission, and vision.
Now waiting for someone's version of Jarvis
Thanks
If the results are true, LFM1B could be very useful for edge devices
As others have said, I think you are not understanding what prompt caching is; it's not something you can do yourself.
If we assume the bot is talking for a full 8-hour shift at a call center, it costs about $115 for the day. That's not cheaper than hiring someone in the Philippines for call centers.....
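For what it's worth, the arithmetic checks out under the $0.24/minute audio-output rate discussed elsewhere in the thread (ignoring input-audio costs and any silence during the shift):

```python
# Back-of-envelope cost of a bot producing audio output for a full
# 8-hour shift, assuming the $0.24/min audio-output rate.
rate_per_min = 0.24
minutes = 8 * 60              # 480 minutes in an 8-hour shift
cost = rate_per_min * minutes
print(f"${cost:.2f}/day")     # roughly the $115/day quoted
```

In practice the bot wouldn't be generating audio 100% of the time, so this is an upper bound for the output side.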
Those who left probably used GPT to write their Exit Emails!