OpenAI Discovers JSON (And Zod???)
- Published Oct 1, 2024
- OpenAI is using Zod to make using LLMs actually feasible for devs, so hyped to see AI companies taking DX seriously with structured outputs!
OpenAI Article: openai.com/ind...
Check out my Twitch, Twitter, Discord more at t3.gg
S/O Ph4se0n3 for the awesome edit 🙏
This is basically just an iteration on function calling, which has been around for a long time now. OpenAI has supported JSON outputs via function calling for ages, and it covers the same use cases.
Yeah, before it just verified that the JSON was valid; now it's a system to format the output. I just started watching, so I'm not sure if Theo talks about it, but the example they gave in the official statement was pretty cool: it basically let the model output JSON used to build an SQL statement. Pretty cool but not that impressive; this should have been done years ago. It'd take a decent developer a few weeks of work to add a check that the JSON output follows the exact schema. But I guess there's more to it than just validating it: you've got to make the model actually spit out what you want, which is basically as hard as making a potato talk, and they achieved that.
Nah man, you're getting it all wrong. Function calling is for parsing parameters; using it to "make a JSON" is a hack, it doesn't perform well, and nested structures aren't great.
Exactly what I was saying. Been doing this for a while now, although it has been enhanced for sure, overall.
Also, all VSCode AI extensions using OpenAI models already use JSON as a means of obtaining only code as part of a structured response.
@@creaky2436 YOU CANNOT, SINCE YOU DON'T HAVE ACCESS TO THE SAMPLER FFS
Actual purpose-built improvement to AI tools?? Never thought I'd see the day
Small correction: when including a list of reasoning steps in the output, it is not there to document how the LLM arrived at the answer, but to actually boost LLM performance by letting it "think" before generating a response. That is why it is important that the steps array comes before the answer field; otherwise you will see no improvement at all.
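As a sketch of what that looks like, a response schema along these lines (field names are just illustrative) puts the steps array ahead of the final answer, so the model generates its reasoning tokens first:

```json
{
  "type": "object",
  "properties": {
    "steps": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Intermediate reasoning, generated before the answer"
    },
    "answer": { "type": "string" }
  },
  "required": ["steps", "answer"],
  "additionalProperties": false
}
```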
Theo has no idea how LLMs work, like he literally has zero clue. It's sad to see that after sooooo many people told him this he still hasn't spent a few hours to learn about it.
@@themprsndev He doesn't know anything but bullshit. Influencers...
I think this is lazy and bad architecture design. Yes, it is a really good idea to add JSON Schema rules. But it makes no sense to me to take on an extra dependency called Zod, with no say over its architecture, and force devs to install an extra dependency in production. It could just be a simple JSON example, or a plain JSON format with a validator helper function, which would be much simpler to implement with a single documentation page. Adding any dependency, small or not, is really lazy in a product that will live inside a huge range of application types. It doesn't matter if you like Zod; being minimal needs to be a must in a product used at this scale.
After some research, even the Zod playground is not helpful. I found some free online tools that convert JSON to Zod, but none so far that convert a Zod schema to its JSON Schema output.
In conclusion, anyone who does not want to install Zod can ask an LLM what the Zod schema's JSON Schema output would be (most of the time the output is really simple, since it is already computed) and use that as input to the new OpenAI API. If any of you know of better tools to do this without installing it, please let me know.
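For what it's worth, a minimal sketch of what that request body could look like with no Zod anywhere: the schema is hand-written JSON Schema, roughly in the shape shown in the announcement (double-check the exact field names against the current docs):

```json
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "my_output",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": { "title": { "type": "string" } },
        "required": ["title"],
        "additionalProperties": false
      }
    }
  }
}
```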
We're using structured json for our project.
The OpenAI API can take a "description" field for each value, which means you can actually embed a subprompt inside each field, describing what the output of that specific field should be.
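A hypothetical example of that pattern (the field names here are made up): each description acts as a mini-prompt scoped to its own field:

```json
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "neutral", "negative"],
      "description": "Overall tone of the review; pick 'neutral' when mixed."
    },
    "summary": {
      "type": "string",
      "description": "One sentence, at most 20 words, no marketing fluff."
    }
  },
  "required": ["sentiment", "summary"],
  "additionalProperties": false
}
```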
19:20 Nobody ever claimed JSON, HTML, CSS, etc. are not languages; they're just not programming languages. HTML obviously considers itself a language: it's got language in its name (HyperText Markup *Language*).
He is a stoopid influencer
Someone tell him Google's Gemini already had this feature before?
Doesn’t Google also offer this with their Gemini models?
GPTs don't suck at counting; you just have to prompt them to account for tokenization. It's a PEBKAC/skill issue, same as the 9.9 vs 9.11 "problem".
If you say "Write out the letters in 'Javascript' and test whether each letter is 'a' or not to count how many a's are in it", it will get it right 100% of the time, even on old models.
I misheard him as saying 9.9 vs 9.91; sounds like an excuse to laugh at people who don't speak English as their first language 🤔
Vercel AI and Langchain supported this feature a decade ago 😂
Yes since gpt 0.1.0 😂
@@ahmedivy😂
And I thought function calling was the solution already.
Yeah and TGI has grammar sampling
Exactly
This sounds awesome and would be a step in the right direction for current artificial specific intelligence
Take a picture of some text and ask ChatGPT to format it as JSON. I did it with some tables of numbers in a book.
2:50 “Agentic” is the word used in the “systems” argot, as you’re not designing agents so much as you’re designing a system that uses an “agentic” framework
tl;dr JSON is now a language. Looking for a dev with 16 years of experience with the JSON language
Lmao
Hey Theo, you're a Mac user, why don't you use the ChatGPT desktop app?
This is bigger news than Theo seems to realize. That 40% accuracy figure has been a problem for a while. Not even models trained specifically for JSON output could score much higher. There are workarounds (retrying model calls), but nothing as good as this structured-outputs approach.
Also, why is he randomly plugging Vercel's AI "sdk"? You think curl is too hard for experienced developers or something? I would be shocked if anyone actually used that.
The support for Zod is great. However, it apparently messes with the "reasoning", so you need a two-step process: one call to generate the answer in unstructured form, then another call to actually format the answer. By the way, the flakiness of previous versions is much exaggerated: I created complex JSON outputs based on TypeScript interfaces provided in the system prompt, for non-trivial trees of objects, and found GPT-4 and 4o to be very consistent. What did happen was extraneous text such as "here is your answer:", but it was easy enough to parse out.
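A minimal sketch of that two-step flow, assuming `complete` stands in for whatever chat-completion call you use (hypothetical signature; the second call is where you would attach the schema):

```typescript
// Sketch of the two-step flow: reason freely first, format second.
// `complete` is a placeholder for an LLM call, not a real API.
function answerThenFormat(
  complete: (prompt: string) => string,
  question: string
): string {
  // Step 1: unconstrained call, so the model can reason freely
  const draft = complete(question);
  // Step 2: a second call whose only job is reformatting the draft;
  // this is the call you'd run with structured outputs enabled
  return complete(`Return this answer as JSON with an "answer" key:\n${draft}`);
}
```

In practice the formatting call is cheap relative to the reasoning call, since it mostly copies the draft into the schema.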
100%, I've been using GPT-4/4o to extract information following a TypeScript interface as well. It kinda felt like magic; then the JSON output feature was added and it was pretty good.
No man, it doesn't use Zod to make JSON. It uses Zod so the user can write Zod schemas and parse the result back: the schema is converted to a JSON Schema, a CFG is created from it, and that is used for constrained decoding, so tokens that wouldn't produce valid JSON are removed.
@@frazuppi4897 Does it get applied on the server, or is this just happening on the client in the API library? It seemed like the API has a field for it.
@@martinbechard It happens on the server, on their side. Imagine decoding JSON from a language model: you know the first token must be "{", so only that one is sampled; then you know (from the JSON schema) that the first key is, for example, "hello", so you can just fill it in; then a ":" must appear, and so on.
It is explained in the blog post
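A toy sketch of that idea, hard-coding a single schema shape instead of compiling a real grammar: everything the schema fixes is emitted directly, and the "model" is only sampled for the free parts:

```typescript
// Toy constrained decoding for the schema {"hello": string}.
// Real implementations compile the JSON Schema into a grammar; here the
// grammar is hard-coded, and `Sampler` stands in for the LLM's
// next-token choice.
type Sampler = () => string;

function constrainedDecode(sample: Sampler): string {
  let out = "";
  out += '{"hello": "';     // forced by the grammar: no sampling needed
  let tok = sample();
  while (tok !== "<end>") { // the model only runs freely inside the string
    out += tok;
    tok = sample();
  }
  out += '"}';              // closing tokens are forced too
  return out;
}

// usage: a fake sampler that emits two tokens then stops
const toks = ["wor", "ld", "<end>"];
const result = constrainedDecode(() => toks.shift()!);
// result parses as {"hello": "world"} - valid JSON by construction
```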
The fact that ChatGPT-4 does not put it in a code block is actually something I prefer. While experimenting with some LLMs using Ollama, I received JSON wrapped in markdown when asking for JSON with some models, even with customized Modelfiles. When you ask for JSON, you should get JSON!
Not putting the result in a code block is the right thing to do, tbh. You asked it not to include anything else, which is what it did by not using a code block. Putting it in a code block would require the model to respond with
```json
JSON DATA
```
which is not what you asked for.
lol, the comments in the live chat so confidently saying incorrect things about how LLMs work is so ironic... Reminds me of how LLMs can so confidently hallucinate wrong answers.
You could fine-tune an open-source language model to perform this task. Fine-tuning a base model typically takes minutes, the output can be consistently formatted according to your specifications, and it costs nothing.
5:30 It didn't forget to put it in a code block; it just followed your instructions and gave you raw text instead of wrapping it with ``` ```.
Can't edit, but chat got me
Aren't you supposed to use the API if you want to test out the new features of the model?
Of course mine wasn't this good; I'm not delusional, lol. However, using AI as an API with structured data is awesome. You can create any API you want.
Unstructured data to structured is what we all wanted.
Hahaha, I have been working for several weeks on a project whose core assumption was that ChatGPT could already do this. It's like St. Elizabeth and the Roses.
nah I actually really like the ai videos, like it or not most developers are going to be working on or with ai tools sooner or later
We had JSON mode.
But I built my own function-calling model to handle this issue, because it just did not stick to JSON, and I still get outputs that are so on the edge it's WILD!
Love the vid! (even though I watched it on stream. great topics as always btw)
The L in HTML stands for Language. It's a Markup Language, not a Programming Language.
Bro it’s been 1 minute and the bots are already digging in
hopefully reporting them actually does something
I can’t see them, I think he removed them
We've been able to get structured outputs with instructor for a year now?
It doesn't use Zod to make JSON. It uses Zod so the user can write Zod schemas and parse the result back: the schema is converted to a JSON Schema, a CFG is created from it, and that is used for constrained decoding, so tokens that wouldn't produce valid JSON are removed.
Hmmmm, I read an OpenAI article back in April 2023 describing that the "tool use" capability of ChatGPT (which came out then) was able to output valid JSON (I mean, only generating valid JSON). So this capability has existed for over a year...
The only actual improvement is that it adheres to schemas now.
Langchain has done this for years now for any inference endpoint
Even before this, getting back the exact format you wanted wasn't that hard. If your prompt is good, you can get the right format 13 or 14 times out of 15. You add retry logic to try again if you get a format error, and you use a simple regex to parse out anything that's not part of, say, your JSON object.
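A rough sketch of that retry-plus-regex approach (`callModel` is a placeholder for your own LLM call, and the greedy regex is naive about trailing braces):

```typescript
// Pull the outermost {...} out of a model response and parse it.
// Returns null on failure so the caller can retry.
function extractJson(text: string): unknown {
  const match = text.match(/\{[\s\S]*\}/); // greedy: grabs first "{" to last "}"
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}

// Retry wrapper: call the model until it yields parseable JSON.
async function getStructured(
  callModel: () => Promise<string>,
  retries = 3
): Promise<unknown> {
  for (let i = 0; i < retries; i++) {
    const parsed = extractJson(await callModel());
    if (parsed !== null) return parsed;
  }
  throw new Error("model never produced parseable JSON");
}
```

The regex happily strips prefixes like "here is your answer:", which was the main failure mode being described.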
I'm starting to understand why zod is so much more popular than something like typia, when the stuff you're validating is json anyway. Might as well use json to structure it instead of typescript.
not putting it in a code block is the correct response bruh
16:03 We actually use AI for that at work. There are already solutions that will hook into Google Meet, record it, transcribe it, and make notes. Our PO absolutely loves it. I've found it useful quite a few times too; it especially makes it easy to narrow down where something came from and re-listen to that part of the meeting if you're unsure or looking for the specific answer to a question you know was asked, etc. Very useful stuff.
I'd wager half the people getting mad at comparing GPT models to a souped-up autocomplete fundamentally don't understand how GPT models work (generating individual tokens based on the tokens that came before).
Have you ever done a video on why you use ORMs? I use Drizzle, but have been solidifying my sql knowledge recently and was wondering why not jump in two-footed? Might make a good video if you haven't done it?
ORMs fall apart when you do anything non-trivial. But most JS devs' queries are 'select * from xyz where a = b', so it's enough. Try writing nested loops (pretty standard in Postgres) in an ORM lol
@@boccobadz Sure, it was more a rhetorical question on my part, and a specific one on why he personally uses them. I mean you aren't wrong, but non-trivial sql queries are still non-trivial in sql, no?
I'd say the primary reasons for me are:
1. ORMs offer protection from SQL injection, which is pretty vital if your statement involves some amount of user input
2. ORMs can make it easier to dynamically construct statements programmatically - e.g. if you want to apply a bunch of filters/grouping/ordering determined by a user, I'd say it's a LOT simpler to add these filters together through an object interface in Python/JS, rather than attempting to concatenate strings yourself
There's probably plenty more reasons but I can't think of them off the top of my head. I would say if your SQL query is static, doesn't involve any user input values, and you observe significant performance benefits for raw SQL over the ORM, then perhaps raw SQL is the way to go. But never prematurely optimise :)
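As a sketch of point 2, here is a hypothetical helper (not from any real ORM) that composes a parameterized WHERE clause from a filter object; note that a real ORM would also whitelist the column names rather than interpolating them:

```typescript
// Build a parameterized WHERE clause from a user-supplied filter object.
// Values go into placeholders ($1, $2, ...) so they are never concatenated
// into the SQL string. CAUTION: column names are interpolated directly here;
// real code must check them against a whitelist of known columns.
function buildWhere(filters: Record<string, string | number>): {
  sql: string;
  params: (string | number)[];
} {
  const keys = Object.keys(filters);
  if (keys.length === 0) return { sql: "", params: [] };
  const clauses = keys.map((k, i) => `"${k}" = $${i + 1}`);
  return {
    sql: ` WHERE ${clauses.join(" AND ")}`,
    params: keys.map((k) => filters[k]),
  };
}

// usage
const q = buildWhere({ status: "active", age: 30 });
// q.sql    is ' WHERE "status" = $1 AND "age" = $2'
// q.params is ["active", 30]
```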
The timeline in which deterministic output is a newsworthy feature... lol
function calling has been around for a while, zod types are also implemented at the vercel AI SDK. I think you should cover it.
WELP, I can't use this! Why not make it a model parameter instead of just a function-calling caveat???
Theo is clickbait, talking shit about Postgres.
JSDOC support would be nice :)
we had this already with openai function calling, but it's cool that it's now "100% reliable"
Some models were wise enough not to put markdown markup in the response once you said you'd parse it directly; you saw that as a mistake.
AI sucks, it can't even answer "What is love" with "baby don't hurt me". 0/10
I gave it some documentation and asked it questions about it. It rejected my documentation and substituted its own. Let's see if it works now.
Watching all the wrong answers in the chat 🙈
Can someone please make tree shaking possible in Zod, or we'll all just start using Valibot, please.
I have been doing this since ai came out.
Damn. Every comment at this time is a bot.
Only you? I don’t see anyone else 👀
EDIT: seems Theo removed them. Yay!
Theo has had his head in the sand for so long on AI that he thinks this is groundbreaking
yeah like gemini had for months
i guess we might not want to piledrive theo for this information .....
So is this an LSLM? (Large Structured Language Model)
Gemini already have this
lol, imagine a website so dynamic that you always pull it from an LLM on the fly, and maybe use a vector database to store already-generated pages :D
Imagine the security issues with this
Now let it return htmx
oof, love your stuff, but openAI? don't care. it's that much worse than claude. i'd rather have claude structure my data using the medium of dance than use openAI with zod. and i love zod
superman
Theo- can you do a video on how you use Apple Notes? TY
My like is for AI SDK. 12:44
thats huge!💪
Report the bots
Second
I feel like these formatted output are a big release the possibilities are endless
omfg
zod in the box