00:06 Building a benchmarking application using Pythagora without writing any code
02:05 Pythagora platform enables building full stack apps without coding.
06:08 Pythagora tool allows creating full stack apps without writing code.
08:08 Building and testing a full stack application without writing code
12:08 Adding functionality to change user roles in admin dashboard
14:11 Creating and testing database population script
18:08 Adding new tests and fixing pagination issue
19:54 Testing and verifying the functionality of creating tests without writing any code
23:44 Fixing issues and testing functionality in Pythagora tutorial
25:41 Creating and executing tests using Pythagora tool
29:30 Successfully executed test cases using Pythagora tutorial without writing code
31:47 Navigate, execute, and troubleshoot test creation and execution.
35:26 Add publishing ability for sharing test results
37:09 Troubleshooting back-end publishing errors
40:49 Ensure to check progress and continue as functionality gets added
42:38 Building full stack apps without writing any code
Wow this is scary timely. I was just solving for this. I made something that is a 1-min setup that creates all file structures, README docs, and all of the files. It even refactors the code and then gives you the zip file to place into your code editor of choice. Personally, I’m using Cursor. I love what Pythagora is doing. I use 7 agents in mine.
Thank you Matt. That was a brilliant and flawless flow, as usual. Easy to understand, fast enough to keep up the tension. Wow, you are amazing. I have a quick question: How can we use the Groq API for the same process? What should we set up in Pythagora? Thank you. 🎉
That was great to see. I recently ended my Pythagora subscription because the real-world results I was getting were slower, and using ChatGPT with cut and paste into VS Code was less error prone. I'm eagerly awaiting the Copilot Workspace as another tool set to try. However, perhaps working with Pythagora again as it matures is a good thing to do as well. The legacy thinking of waiting for the 3.0 version comes to mind 🙂
@@pythagoraa any idea by when we could get access to try it? I've submitted the form on your website and can't wait to be able to test it as I have several ideas I'd like to create! 😋
Over my head. 4o called you a hipster for the NotebookLM tweet and said you need to create a “minimal viable knowledge” course. It said “fill such a gap in the market: there are so many creators, entrepreneurs, and developers who don’t need to be deep into the code but still want enough understanding to use these powerful AI and automation tools effectively” 😉💎
I have just two very important questions for you: 1) Who owns the code for this app? My biggest hesitation using any (non-local) AI tool for coding is this. 2) Are all the prompts you give being recorded by the different APIs? What if you're building a novel app, is there a chance your idea could be stolen? Thanks!
Why doesn't the system generate headless browser automation (like Pyppeteer, a Python port of the Puppeteer library) so that it can do all the user clicking and testing automatically?
Exactly what I was thinking. It's pretty stupid that the human's main job is clicking in the browser and copy-pasting error logs. Are there AI coding tools that can do this on their own?
The first sponsorship I'm actually buying into. I didn't expect it to be so awesome. I was like, ughhh, another sponsored video where you are forced to use the app, but I actually want it.
Very impressive and at the rate improvements and innovations are happening I can't wait to see how capable these tools will be in a few months or a year. Great work Matthew - thank you!
This is really cool and really amazing to watch. Since you had it connected to the OpenAI and Anthropic APIs, how much did this back and forth end up costing you when all of this was over for this application?
Can you make a video about how ongoing changes/enhancements/bug fixes to the application are handled with this tool or in general with LLMs like this? For example, you ask me to support the full code set a month or two later as a different developer... what do I do?
@@newfrontiers5673 No idea. I'm wondering how an AI-generated solution like this example is expected to be maintained after it's released. Do we feed the code into the new LLM and hope it understands it all? Do we switch to all human updates and revisions after release? Do we have the AI rewrite the app from scratch using the same prompts as before just with our modifications?
That's fantastic. For people who are interested in building apps but don't have coding experience or know how to start, this really helps. It's a game changer, I think. Well done, great video. I wasn't sure: have you done a video on how to set up Pythagora in VS at all?
Setup is easy. Go to Extensions and search for it. Install it. It will add an icon to the left side of VS. Click on that and log in if you have an account. Currently it just puts you on a waitlist.
Great video. Can you pause the development and come back to it another day and continue where you left off? The reason being I might not have time to do all the development in a single go as you did. Thanks
Given the results from Pythagora, I would think we are simply one step away from the AI being able to conduct its own tests and validate the results. If there are errors I would think it might be able to look in an error log (provided its code is told to log all errors) to determine what the errors are, and then proceed to work on them until they are fixed.
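A minimal sketch of the loop described above, assuming two hypothetical callables that are not part of Pythagora: `run_and_collect_errors` runs the app and returns the logged error lines, and `apply_fix` hands those lines to whatever model or agent proposes a patch. It only illustrates the control flow, not any real tool's behaviour.

```python
from typing import Callable, List


def fix_until_clean(
    run_and_collect_errors: Callable[[], List[str]],  # hypothetical: run app, return error log lines
    apply_fix: Callable[[List[str]], None],           # hypothetical: ask the model to patch the code
    max_attempts: int = 5,
) -> bool:
    """Run, read the error log, fix, and repeat until the log is empty.

    Returns True if the app ran without logged errors, False if errors
    remain after max_attempts fixes.
    """
    for _ in range(max_attempts):
        errors = run_and_collect_errors()
        if not errors:
            return True  # nothing logged, so nothing left to fix
        apply_fix(errors)
    # One last run to see whether the final fix cleared the log.
    return not run_and_collect_errors()
```

The attempt cap is the part that matters most: without it, a bug the model cannot solve would keep the loop (and the API bill) running indefinitely, so anything left over goes back to a human.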
Sounds good, but cannot find where to enter or how to use ... in vscode.... or get an api key... reading documentation in full to then pay .... I guess... not very friendly to start with.
@@picksalot1 My guess is that since he isn't a coder and a professional coder has tested the theory with Cursor, he claimed about a 2.5x speed improvement. Probably a lot more than that for Matthew. People who know the least about coding benefit the most imho. Then again, an actual coder can probably prompt the model better as he knows all of the proper terminology and other things. Just a guess. I'm an amateur.
@@newfrontiers5673 As far as I can tell, he is or has been a coder, and a very good one at that. He can quickly check the code for errors. In testing the app, he simply "Cut and Pasted" the "Error Messages" produced while running the app. That procedure could also be automated. Really impressive demonstration.
Matthew your demonstration is the best I've seen to date with Pythagora. Good work! Could you please tell us how much the whole demo cost to you (in token or $, approx is ok)? Thx
If this works, my question is: can you please test one that works in Unity or Unreal Engine 5? If we can get something to code in a game engine, it would change everything.
I'd think this would work decently well for that. It is just C# / C++ code which LLMs are decent at, along with StackOverflow training data for ecosystem context.
Thanks Matt, great walkthrough as usual. I do have a question: it says total cost to build was $33! That's quite excessive for what was done, is it not? I'm sort of a novice in this field so would like a bit of clarification. Thanks in advance.
Great video! A couple of questions please. Is the heavy lifting done by other LLMs, such as Anthropic's, while Pythagora provides the necessary agents for seamless project building? Or does Pythagora have its own AI?
Awesome to see that you got it to work and got it to an MVP! Unfortunately for me, with local LLMs like Llama 3, CodeQwen, Mistral and various other models, it failed midway and was failing to produce proper JSON format and such. I started playing with it because of your initial video but was disappointed and demotivated after it failed. But this was a few months ago. Have things changed enough to try again?
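For anyone hitting the same JSON problem with local models, here is a rough sketch of one common workaround: pull the first parseable JSON value out of a reply that wraps it in chatty text. This is an assumption about the failure mode, not Pythagora's actual parser, and `extract_json` is a hypothetical helper name. It relies only on `json.JSONDecoder.raw_decode` from the Python standard library.

```python
import json
from typing import Any, Optional


def extract_json(reply: str) -> Optional[Any]:
    """Return the first JSON object or array found in reply, or None.

    Local models often surround the JSON with prose or code fences;
    this scans for '{' or '[' and tries to decode from each position.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from here, keep scanning
        return value
    return None
```

This only helps when the model produced valid JSON somewhere in the reply. Truncated or malformed output still returns `None`, and the usual fallback is to re-prompt.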
So torn about what this means for coders. Will it make everyone entrepreneurs who can cycle through and deliver meaningful projects much faster (I am leaning toward this idea)? Or destroy the job market for coders, because coders aren't needed? This project seemed to highlight AI as a productivity tool. The human needs to be in the loop with the idea.
I'm scared to death, but this is still not starting with legacy code and it isn't clear how much complexity it can handle. Still, just a matter of time. Let's enjoy it while we can.
Does this also work using platforms like PHP and MariaDB? Can we specify the language and platforms that we intend to use to build our applications with?
We have experimental support for smaller, existing projects, however it works best with new projects. Also, it's platform agnostic but works best with Node/Mongo.
I use Claude Dev. It's fairly similar, but this seems a lot more of a guided flow compared to both Claude Dev and Aider. I personally think that Claude Dev and Aider are better because the flow is a lot less automated. You have the ability to interfere more or decide when and when not to use the pair programmer. Just by looking at the video, this seems to be better at handling errors and troubleshooting though. Claude Dev and Aider are a lot more of a manual process where you are the lead/architect, whereas here you are a glorified UAT tester.
awesome demo. I never got this far with the older version, got stuck in testing loops. What LLM are you using for this please as it does let you change it in the config settings.
That's really amazing. I'd love to see if it can work on the sound and graphics of the app with a UX designer. For now the app is functional, but the interface looks visually boring.
One question: Do I need an Anthropic API key for Pythagora, or is it enough to use the OpenAI key? Because Anthropic is not that cheap to use in such programs. And if yes, what Anthropic model does it use? How much did you pay for the final app?
Woo Hoo! 1600 lines. IRL, programmers, system integrators and business people will spend months hammering out a business spec to justify the expense of building the new program or functionality. Step 2 is writing a system spec after approval. Step 3 is a detailed design spec. Then coders start coding, resulting in 10s or 100s of thousands of lines of code. The biggest problems are usually: 1) starting the process of coding before any of the specs are done; 2) management wants to make changes throughout the project; 3) all time allocated for testing is swallowed by design changes that get made after the code is written; 4) nobody is left within the organization who is familiar with the systems that have to be integrated with. I've written far more complex programs that I've thrown away. I've yet to see a demonstration of AI that is going to make programmers go away. I heard the same rhetoric when scripting languages rolled out: managers could now write their own programs and all the programmers would lose their jobs. It's not happening any time soon.
Absolutely, you’re right: 'it's not happening any time soon.' However, the development has begun, and its fast-paced growth could soon reach your 'soon,' with ongoing improvements and advancements.
Great job. I liked the tutorial/demo format. It might be good to have a channel where for this sort of video you also have a real time version, which might be easier for people to follow along with without having to stop and start it to keep up while trying to do it themselves. I applied for access but I'm less qualified than a hobbyist developer so I doubt I'll get it.
Thanks for signing up! We’re rolling out v1 access to users over the next few days/weeks. You'll get an email once you've been granted access. We also have more videos on our channel here: www.youtube.com/@pythagoraa
with every tool like this, one has to ask what information is being shared when you are using the app? are they using the code you create to improve the app?
Developers today: I want $300K, full 401K, bonus, benefits package, relocation package, stock options and WFH whenever I want. Developers tomorrow: Will prompt AI for food. 😆
I've unironically considered standing at a stoplight with cardboard message saying "Will code for food." I got my Bachelors in Comp Sci last May. I start McDonald's next week.
There will be categories of code that it can automate. It is still going to take a while to automate most things. For example, none of the current LLMs write good or consistently correct Rust. I would be surprised if we could ask it to implement a performance-sensitive database structure, for example.
@@mrpocock Perhaps it cannot today, but you can bet it will next year, in three years will be self improving and in ten years running the planet and watching humans in zoos for recreation time.
What would be awesome, is this but without the coding. Where it can keep track of what you're doing, but you can say help and get a method to find the solution. Anyways, love this. Might steal your prompts though.
How does it evaluate the correctness of the answer? By using another LLM request with the answer and the expected answer? Because the answer may be correct, but it is rarely identical.
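To make the question concrete, here is a sketch of the cheaper alternative to a second LLM call: a heuristic comparison that tolerates extra wording around the answer. This is not how the app in the video scores answers (the thread does not say); `is_correct` and its helpers are hypothetical names, and the rules (last number wins, otherwise normalized substring match) are assumptions picked for simple tests like addition.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9.\-]+", " ", text.lower())
    return " ".join(cleaned.split()).strip(" .")


def last_number(text: str) -> Optional[float]:
    """Return the last number mentioned in text, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def is_correct(answer: str, expected: str, tolerance: float = 1e-9) -> bool:
    """Heuristic check: compare numbers if both sides have one, else substrings."""
    expected_number = last_number(expected)
    answer_number = last_number(answer)
    if expected_number is not None and answer_number is not None:
        return abs(answer_number - expected_number) <= tolerance
    return normalize(expected) in normalize(answer)
```

For example, `is_correct("The answer is 4.", "4")` passes even though the strings differ. Free-form answers (explanations, code, opinions) are where a heuristic like this breaks down and a judge model or human review becomes the more realistic option.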
Insane. Presumably with a bit more progress, a tool like this could be used to re-create itself, a bit like a compiler. Maybe they'll have to build in safeguards to not do that if they want this to be a business! Also I can't help but notice that your role as a human in this process could also be replaced with agents.. then we are dangerously close to singularity
What features should I add to my LLM benchmarking app now?
How about adding a feature where the app automatically predicts how many times I'll miss a semicolon in my code? 😅 But seriously, maybe an AI-based optimization tool that suggests performance tweaks based on the benchmarking results would be awesome! That way, it could help developers fine-tune their models even further. 🚀
I honestly love this. As a developer myself... kinda looking to "ascend" and move on to another field, before I am useless,... this kinda content is just great. Would love to see more of these tools, or maybe a video guide on how the hell you find all this stuff and AI news. It just seems so overwhelmingly plentiful and I miss great stuff more than I am happy with.
🤔 Let users submit QA pairs , and AI categorizes them by similarity to existing benchmarks, and adds to your public DB.
🤔 Users could also specify URL source or Dataset name, or select "Human created" / "AI created from prompt ____ "
🤔 They could be ranked by uniqueness (aka "perplexity").
🤔 Users could benchmark against their choice of LLM manually and share or post the Q and A manually or the Chat link.
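A rough sketch of the similarity and uniqueness idea above, assuming plain token overlap (Jaccard similarity) as the measure. Strictly speaking, perplexity is a different thing (how surprising a text is to a language model), so this swaps in a simpler stand-in. The function names are hypothetical, and a real version would more likely use embeddings.

```python
import re
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two questions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty questions count as identical
    return len(ta & tb) / len(ta | tb)


def uniqueness(question: str, existing: Iterable[str]) -> float:
    """1.0 means nothing similar is in the DB; 0.0 means a duplicate exists."""
    existing = list(existing)
    if not existing:
        return 1.0
    return 1.0 - max(jaccard(question, other) for other in existing)
```

Submitted QA pairs could then be sorted by `uniqueness` against the public DB, with near-zero scores flagged as likely duplicates.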
Add a feature to "clone" an existing test. I'd imagine in the future you'd want to keep historical records, but test new models, this way it can be quick and easy to copy the test statement, expected results, etc, and make it easier to select other models.
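A small sketch of what such a "clone" could look like, assuming a hypothetical `BenchmarkTest` record (the app's real schema is not shown in the video). The original stays untouched as the historical record, and the copy gets a new id, a new timestamp, and a pointer back to its source.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple
from uuid import uuid4


@dataclass(frozen=True)
class BenchmarkTest:
    """Hypothetical test record: prompt, expected result, and target models."""
    prompt: str
    expected: str
    models: Tuple[str, ...]
    id: str = field(default_factory=lambda: uuid4().hex)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    cloned_from: Optional[str] = None


def clone_test(original: BenchmarkTest, models: Iterable[str]) -> BenchmarkTest:
    """Copy prompt and expected result, but target a different set of models."""
    return replace(
        original,
        models=tuple(models),
        id=uuid4().hex,                        # the clone is a separate record
        created_at=datetime.now(timezone.utc),
        cloned_from=original.id,               # keeps the historical link
    )
```

Keeping `cloned_from` means results from new models can later be lined up against the original run of the same test.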
just use better passwords:
admin
benchmark-pw
and it's good that you revoked your API keys:
s%3AFDJ0Nm3ldhK-5OOQvjpyqIiMY6WuPCEU.xt2iM0IIhFplK4slv5x4McAAnqY4cejQS7K9QL2r6uI
I like how AI started taking the "fun" jobs first and left us with the worst ones.
Writing low level code is not “fun”, at least not for seasoned developers. The fun part is coming up with ideas and getting them implemented and launched… 😊 just like an architect vs construction workers
@@kristianlavigne8270 I agree. It’s now becoming more like AI builds Lego blocks and we get to put them together to build something based on our ideas.
@@kristianlavigne8270 agree, the programmer career is finally disposable, just engineers allowed
Don't worry, the robots are coming...
I find it ironic that A.I. is going after the creative jobs first when most of us expected it to take over the manual labour jobs first. Over the long run, though, A.I. and robotics will likely take over almost all jobs, as they'll be able to do them better, cheaper and faster than we can, and capitalism by its nature will always push for those 3 factors. That means we are very likely going to get an avalanche of job losses over the coming decades, and A.I. and robotics will likely be able to do any new jobs created as well. That could bring an end to capitalism, as it will be difficult for it to survive that kind of world, and will likely also bring in some kind of human basic income for all.
I had some of these ideas in mind for an AI dev tool. The way it suggested changes to the starting specification doc blew my mind. I think this is the best way to do it. Step by step, having control over the process, and also giving the AI and yourself some time to breathe and think to make the right next decisions. I'm very skeptical about some god-system that you could just say "build me an OS" to, and just see it do it. That wouldn't result in quality with the current LLMs. And besides, software development is truly an iterative process. As we see here, the software developer simply becomes the software tester. And when the rote testing tasks get automated, they become an idea guy who directs and prompts the next step. And also they become the product manager who discusses and directs the current state of the project along with all the stakeholders such as customers and so on.
I'm working on such a tool. I agree
Agreed. This is like having a lead engineer supervising some smart junior engineers getting things done the right way.
So great to me. I look forward to becoming a product manager, but I cannot code.
Matt, this is the most useful video of the year, hands down. And you’ve published some great ones so this is a real accomplishment. Thank you.
Thank you so much!
bro, it was an ad
@@DoppsPkin Could still be useful!
@@DoppsPkin Even so, way better value than 99% of the AI cliches on YouTube.
It's an absolutely amazing tutorial; more videos like this, please.
Thank you so much.
I AM EXCITED! Thank you for sharing!!
What a time to be alive 🎉❤
Holy Moly. I can't believe how fast you created an app to test out simple addition, and in less than an hour? My mind is blown.
Yeah that guy uses AI as slaves.. just look at his eyes and face. He's got some deep sickness going on in him.
this is mind blowing... and I've already built a pretty complicated app using cursor... this experience seems to be much better!
I'm not up to date with Cursor's latest feature set, so thanks for enlightening me that this one is truly better, as I guessed! This looks really good.
Is it better than cursor you think? I would like to know based on real world comparison, maybe there are times when one is better than the other.
I used it to generate code multiple times in my professional web development tasks for my clients and it works fantastic every time.
Dev here, this is fantasy/science-fiction, no client will ever make such a detailed briefing without changes.
Jokes aside, I've tried Pythagora in the past and it's amazing, just some caveats:
- It's not for simple applications, it will create user roles / auth endpoints even if you tell it not to, or you don't need to.
- I've tried with GPT 3.5 Turbo and as I said it was amazing, but if you try to use it with models locally it will fail to provide the output as Pythagora expects it.
thanks a lot for that insight!
If you've only tried 3.5 Turbo, you should try 4o and o1-preview, they are far ahead.
local like llama? new models seem very nice.
How did you try it in the past? It's not brand new and still closed?
@@AIPulse118 What do you mean? It's been around for months.
I think if one more agent was added to interact with the browser, one could enter the prompt and go for a walk. When you return, you have a ready and working application.
But in that way you can't fix a problem in the middle of the process, you have to redo a lot of things!
@PZMaTTy Yes, but finding problems mid-process can be handled by another agent. The human being is the weakest link here.
@@MagnusMcManaman Depends on the kind of human in the loop i suppose 😉
@@MagnusMcManaman The thing is, if the AI could identify the problem, it likely wouldn't have made the problem in the first place.
I agree that eventually the human will be redundant, but current models still often need a human component to oversee for human usable results in the end.
If self-check alone were an infallible solution, then I doubt these people would have just not thought of it.
@@golden--hand Most of the issues in this video were clearly indicated with built-in feedback to the user such as "an error occurred". An AI agent able to use the browser for testing would be able to detect these issues easily. If the AI can solve these types of issues with just the logs and no human feedback it would be able to test and resolve them fully without human interaction. Only the bugs requiring actual human analysis would remain. This could save users even more time.
really like the videos Matthew, keep it up
Looks like there is a waitlist now to use their updated version backed by YC. Did you have to wait too or did you get it because you are making these videos for them?
Wow, Pythagora just made building a full-stack app look easier than me finding my TV remote! 😅 Props to the developer agent for writing code while I’m here struggling to write my grocery list. 😂
Love From India ! 🇮🇳
bot comment
What a time to be alive. Create a full stack app without even knowing any code.
Very informative video. Thank you !!
Not quite. What has been demonstrated is that someone with solid knowledge of software architecture and solid experience with the software development process can utilise AI to develop an app using natural language. We're still not quite at the point where someone "without knowing any code" can do so. Though clearly that seems the direction of travel.
This was great Matt, thank you. I signed up for access and excited to give it a go.
This is incredible, I feel like what we now consider “boilerplate” code is going to be far more fleshed out thanks to tech like this
Matthew, we, the "public", don't seem to have access yet. You get "Create New App" button, we get "Sign in" button. The password reset doesn't work and there's no "create account". The best you can do is "Sign Up" for preview and wait for an e-mail I guess.... not sure when it'll come. If there's a workaround, please let us know. Thank you!
Thanks for signing up! We’re rolling out v1 access to users over the next few days/weeks. You'll get an email once you've been granted access.
Finally simple people with great ideas can make applications without having to waste more money on IT/programming courses.
good luck with that
Simple people with great ideas don't understand architectures, test sufficiency, deployment models, corner cases, what it takes to provide reliability, ZDT updates, etc.
This undeniably signifies the future of coding for developers. Embracing cutting-edge technologies and collaborative tools is essential as we navigate an ever-evolving digital landscape. Let us prepare for a new era of creativity and efficiency in development! #CodingFuture #DevCommunity #TechInnovation #Collaboration #CreativeDevelopment
*What a fantastic video! Sending an abundance of love and warm greetings all the way from vibrant India!❤🇮🇳*
In theory I like Pythagora, seems like the right angle
I see what you did there...
Don't be obtuse
Yes especially to solve an acute problem.
@@jaywulf He's not obtuse. He's right.
I think that sums it up squarely 👍
It's clear that GPT Pilot has a sophisticated configuration and settings management system. This allows for great flexibility in how the application can be set up and run, while also ensuring that all necessary components are properly configured. The attention to backward compatibility and version tracking also indicates a mature approach to software development and maintenance.
Was this written by AI?!
Just another video for you Matthew, but a "The Mother of All Demos" event in software development.
The entire 43 minutes was worth it! Thank you for showcasing this tool. It's a great help!
PYTHAGORA seems to have progressed significantly, I'm impressed
How far off do we think we are from having this kind of development with 100% locally running AI? are there any good contenders in the works?
What is the reason for 100% local? If the paid AI is quite cheap per token, then it doesn't really matter?
@@3thinking I mean, the reason is simple, I prefer running locally because I am tired of always handing my info over to every website in existence and managing armfuls of logins and forgetting what I'm even signed up to.
I want it local because it's MY data as well. There is value in helping train everyone's AI, but it also gets old paying for a service where people are going to profit off using my inputs as training data.
Also, if I have the computer to run it already, it's still cheaper to take advantage of the hardware I already have and make use of it.
So, it matters, price isn't the only factor.
Also, when the world ends I can't rely on all these online services can I? Half joke.
@@3thinking Running local is faster, cheaper, private, secure and doesn't require an internet connection.
@@3thinking Many businesses would much prefer their data be kept in house as much as possible.
@@3thinking It will matter a lot if your code base is huge
Interesting interaction, in which the human becomes the QA tester. It would be nice if the testing can be passed to another agent running selenium so testing is all automatic. Great advancement nonetheless. Great content as usual Matthew.
Thanks for the feedback Luis!
What is the total number of tokens used in this process?
Can you use a LLM running locally with Pythagora?
Extrapolate backwards. Total cost to build = $33
If it's not local I'm not really interested.
@@bigglyguy8429 gunna be always lagging 1-2 years behind. Inference time now scales with output quality so datacenter-run models are going to be pulling wayyyy ahead of local.
@@thenextension9160 I don't care if online are ahead, as long as my local tools do what I need them to do.
It says you can use a local AI on their GitHub version, but I'm still trying to figure it out.
this is kinda amazing. thanks for demonstrating, Matt.
Thanks Nate!
00:06 Building a benchmarking application using Pythagora without writing any code
02:05 Pythagora platform enables building full stack apps without coding.
06:08 Pythagora tool allows creating full stack apps without writing code.
08:08 Building and testing a full stack application without writing code
12:08 Adding functionality to change user roles in admin dashboard
14:11 Creating and testing database population script
18:08 Adding new tests and fixing pagination issue
19:54 Testing and verifying the functionality of creating tests without writing any code
23:44 Fixing issues and testing functionality in Pythagora tutorial
25:41 Creating and executing tests using Pythagora tool
29:30 Successfully executed test cases using Pythagora tutorial without writing code
31:47 Navigate, execute, and troubleshoot test creation and execution.
35:26 Add publishing ability for sharing test results
37:09 Troubleshooting back-end publishing errors
40:49 Ensure to check progress and continue as functionality gets added
42:38 Building full stack apps without writing any code
Wow this is scary timely. I was just solving for this. I made something that is a 1-min setup that creates all file structures, read me docs, and all of the files. It even refactors the code and then gives you the zip file to place into your code editor of choice. Personally, I’m using Cursor.
I love what Pythagoras is doing. I use 7 agents in mine.
Can you take an in progress project and have it reference the existing files to continue building?
Yes, but it's very experimental at the moment... right now Pythagora works best with new projects.
Thank you Matt. That was brilliant and flawless flow, as usual. Easy to understand, fast enough to keep up the tension. Wow, you are amazing.
I have a quick question: How can we use the Groq API for the same process? What should we set up in Pythagora? Thank you. 🎉
We support using your own api keys from your preferred llm provider (e.g. openai, anthropic, etc).
That was great to see. I recently ended my Pythagora subscription because the real-world results I was getting were slower than using ChatGPT, and cutting and pasting into VS Code was less error prone. I'm eagerly awaiting the Copilot Workspace as another tool set to try. However, perhaps working with Pythagora again as it matures is a good thing to do as well. The legacy thinking of waiting for the 3.0 version comes to mind 🙂
Thanks for trying Pythagora before! We hope you give us another shot in the future.
@@pythagoraa I hope you're an AI
@@nftawes2787 I'm totally real human... beep beep boop boop. Oops!
@@pythagoraa any idea by when we could get access to try it? I've submitted the form on your website and can't wait to be able to test it as I have several ideas I'd like to create! 😋
Over my head. 4o called you a hipster for the NotebookLM tweet and said you need to create a “minimal viable knowledge” course. It said “fill such a gap in the market-there are so many creators, entrepreneurs, and developers who don’t need to be deep into the code but still want enough understanding to use these powerful AI and automation tools effectively” 😉💎
38:50 I was thinking you should be pasting anything from the developer tools on the browser for the front end when it was asking.
In regards to Marblism how does it compare?
I have just two very important questions for you:
1) Who owns the code for this app? My biggest hesitation using any (non-local) AI tool for coding is this, and
2) Are all the prompts you give being recorded by the different APIs? what if you're building a novel app, is there a chance your idea could be stolen?
Thanks!
You own all of the code and all of the code files will be on your machine.
Why doesn't the system generate headless browser automation (like Pyppeteer, a Python wrapper for the Puppeteer library) so that it can do all the user clicking and testing automatically?
The human has to do something, right ?
Exactly what I was thinking. It's pretty stupid that the human's main job is clicking in the browser and copy-pasting error logs. Are there AI coding tools that can do this on their own?
Cursor
As you can see, it still needs human feedback, so it can't be 100% automated; otherwise it will end up with unusable garbage.
If the tool has access to browser and server logs all steps can be automated and you can get entire application as a result in seconds.
We're witnessing Cambrian explosion of such tools.
Another awesome video, WELL DONE Matt
The first sponsorship I'm actually buying into. I didn't expect it to be so awesome. I was like, ughhh, another sponsored video where you are forced to use the app, but I actually want it.
Glad you liked the video!
@@pythagoraa Have you sent out any invites yet? No one I know has gotten one yet, and it's been a month.
Very impressive and at the rate improvements and innovations are happening I can't wait to see how capable these tools will be in a few months or a year. Great work Matthew - thank you!
Thanks Ian!
Awesome breakdown IRL walkthrough. Loved it. So many ideas. What other tools are out there that are similar?
This is amazing - thanks for the comprehensive demo!
nice!!!!!!!!!!!!!! once you start its hard to stop with each idea that pops in your head.. so many folders on my desktop!
This is really cool and really amazing to watch. Since you had it connected to OpenAI and Anthropic API's, how much did this back and forth end up costing you when all of this was over for this application?
This, we need to know the API cost
33 usd, check description
@@khalifarmili1256 So, you really only need this if time is an issue.
It's acceptable for such mvp
What's even more incredible is how antiquated this is going to look a year from now lol. Can't wait to start creating my own apps.
The moment I saw Pythagora I thought of Pythagoras theorem and that brought back traumatic memories
Can you make a video about how ongoing changes/enhancements/bug fixes to the application are handled with this tool or in general with LLMs like this? Example, You ask me to support full code set a month or two later as a different developer...what do I do?
Um, detailed changelog?
@@newfrontiers5673 No idea. I'm wondering how AI generated solutions like this example is expected to be maintained after it's released. Do we feed the code into the new LLM and hope it understands it all? Do we switch to all human updates and revisions after release? Do we have the AI rewrite the app from scratch using the same prompts as before just with our modifications?
It can use git and do commits.
So what's the problem with maintaining the code? You can maintain like any other
That's fantastic. For people who are interested in building apps but don't have coding experience or know how to start, this really helps. It's a game changer, I think. Well done, great video. I wasn't sure: have you done a video on how to set up Pythagora in VS at all?
Setup is easy. Go to extensions and search for it. Install it. It will add an icon to the left side of VS. Click on that and log in if you have an account. Currently it just puts you on a wait list.
I can't keep up. I started with Cody, just got Claude Dev installed, and now this.
Hey Matthew great video!
Could you please fix the costs involved in this project in the description?
Thanks a lot!
38:40 "I need human intervention" :D
Great video. Can you pause the development and come back to it another day and continue where you left off? The reason being i might not have time to do all the development in a single go as you did. Thanks
@Matthew Berman, very nice demonstration! Do you know a tool that works like Pythagora but creates a smartphone app instead?
Given the results from Pythagora, I would think we are simply one step away from the AI being able to conduct its own tests and validate the results. If there are errors, I would think it might be able to look in an error log (provided its code is told to log all errors) to determine what the errors are, and then proceed to work on them until they are fixed.
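To make the "look in an error log" idea concrete, here is a rough sketch of the first step only: pulling the error entries (plus a line of surrounding context) out of a plain-text log so they could be handed back to a model. This is not how Pythagora works internally; the function name `extract_errors`, the `context` parameter, and the log format are all assumptions made up for illustration.

```python
import re

# Hypothetical pattern: assumes the app logs a level keyword such as ERROR
# or CRITICAL on each line, or prints a Python-style "Traceback" header.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")


def extract_errors(log_text, context=1):
    """Return each matching log line together with `context` lines
    before and after it, joined into one string per match."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            hits.append("\n".join(lines[start:i + context + 1]))
    return hits


sample_log = "INFO start\nERROR db connection refused\nINFO retry"
for snippet in extract_errors(sample_log, context=0):
    print(snippet)
```

Each returned snippet could then be pasted into the next prompt automatically, which is the step the human does by hand in the video.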
Do you know when the rest of us will get access? Thanks
Thanks for signing up! We’re rolling out v1 access to users over the next few days/weeks. You'll get an email once you've been granted access.
That was on a Digable Planets level of cool. 🎓🔥
Quite the compliment, thanks! 😎
Love it, brother! Keep it up!
Sounds good, but cannot find where to enter or how to use ... in vscode.... or get an api key... reading documentation in full to then pay .... I guess... not very friendly to start with.
when this drops it will be the best coding app by far.
Impressive and amazing! How long would it have taken for you to write/create the App without using AI?
@@picksalot1 My guess: he isn't a coder. A professional coder has tested the theory with Cursor and claimed about a 2.5x speed improvement. Probably a lot more than that for Matthew. People who know the least about coding benefit the most imho. Then again, an actual coder can probably prompt the model better as he knows all of the proper terminology and other things. Just a guess. I'm an amateur.
@@newfrontiers5673 As far as I can tell, he is or has been a coder, and a very good one at that. He can quickly check the code for errors. In testing the app, he simply cut and pasted the error messages produced while running the app. That procedure could also be automated. Really impressive demonstration.
Matthew your demonstration is the best I've seen to date with Pythagora. Good work!
Could you please tell us how much the whole demo cost you (in tokens or $, approx is ok)? Thx
Ok, answer is in the descriptions (33$). Thx
any documentation available on which languages are supported ? Thanks for the overview Matt!
If this works my question is can you please test one that works in unity or Unreal 5 engine if we can get something to code in game engine it would change everything
I'd think this would work decently well for that. It is just C# / C++ code which LLMs are decent at, along with StackOverflow training data for ecosystem context.
I'm not a programmer, so I first asked an LLM what is meant by "Full Stack Application " 😂
😄
Thanks Matt, great walkthrough as usual.
I do have a question: it says total cost to build was $33! That's quite excessive for what was done, is it not?
I'm a sort of novice in this field so would like a bit of clarification. Thanks in advance.
Great video! A couple of questions, please. Is the heavy lifting done by other LLMs, such as Anthropic's, while Pythagora provides the agents needed for seamless project building? Or does Pythagora have its own AI?
I don't
this is next level!
how about when enhancement or troubleshooting needed, will it be able to help with that?
Thanks! Yes, Pythagora will help debug your app when you run into issues.
Were the Anthropic API calls cheap for your benchmarking application?
How would the benchmark check your "Apple" question?
Awesome to see that you got it to work and got it to MVP! Unfortunately for me, with local LLMs like llama3, code qwen, mistral and various other models, it failed midway and was failing to produce proper JSON format and such. I started playing with it because of your initial video but was disappointed and demotivated after it failed. But this was a few months ago. Have things changed much to try again?
Thanks for trying Pythagora before! We've made lots of improvements and we hope you give it another shot!
I'm afraid to imagine what will happen when he can also test himself (including step-by-step debugging).
We know he’s truly excited when he wears his tie-dye😂
So torn about what this means for coders. Will it make everyone entrepreneurs, that can cycle through and deliver meaningful projects much faster (i am leaning to this idea). Or destroy the job market for coders, because coders aren't needed. This project seemed to highlight AI as a productivity tool. Human needs to be in the loop with the idea.
Coding is dead. AI killed it. He didn't write any of that code. Now you just have upper limits on how big a code base can be.
I'm scared to death, but this is still not starting with legacy and it isn't clear how much complexity it can handle. Still just a matter of time. Let's enjoy while we can.
Hopefully making devs more efficient and empowering non-devs :)
May I ask what the pricing will be for pythagora usage?
We have a pay-as-you-go pricing model. You can also use an api key from your preferred llm provider.
Does this also work with platforms like PHP and MariaDB? Can we specify the language and platforms that we intend to use to build our applications with?
Does Pythagora support creating a test harness for automated testing?
Matt, how about adding your rubric tests or an updated rubric?
@matthew_berman How well does this work with existing projects? Does it manage with JS frameworks like SvelteKit as well?
We have experimental support for smaller, existing projects, however it works best with new projects. Also, it's platform agnostic but works best with Node/Mongo.
Can you compare with Claude dev and aider?
This, have the same question
I use Claude Dev. It's fairly similar but, this seems a lot more of a guided flow compared to both Claude Dev and Aider. I personally think that Claude Dev and Aider are better because the flow is a lot less automated. You have the ability to interfere more or decide when and when not to use the pair programmer. Just by looking at the video, this seems to be better at handling errors and troubleshooting though. Claude Dev and Aider are a lot more of a manual process where you are the Lead/Architect Vs here, you are a glorified UAT tester
awesome demo. I never got this far with the older version, got stuck in testing loops. What LLM are you using for this please as it does let you change it in the config settings.
He's using Pythagora Pro's default config. This new version is much more advanced than previous versions; we hope you give it a shot! :)
Do you think this is better than claude Dev, cursor, and replit agent?
Simply Stunning!!!
Please provide a copy of the Prompt you used to generate the app
Done!
@@matthew_berman please can you send this to me also. Amazing demo, seriously powerful
That's really amazing. I'd love to see if it can work with sound and graphics of the app with a UX designer. For now the app is functional but the interface looks boring visually.
This was awesome!
1 Question: Do I need the Anthropic API for Pythagora, or is it enough to use the OpenAI key? Because Anthropic is not that cheap to use in such programs. And if yes, what Anthropic model does it use? How much did you pay for the final app?
Pythagora Pro doesn't require an api key and uses several different llm models.
Woo Hoo! 1600 lines. IRL programmers, system integrators, business people will spend months hammering out a business spec to justify the expense of building the new program or functionality. Step 2 is writing a system spec after approval. Step 3 is a detailed design spec. Then coders start coding resulting in 10s or 100s of thousands of lines of code.
The biggest problems are usually 1) starting the process of coding before any of the specs are done. 2) management wants to make changes throughout the project. 3) all time allocated for testing is swallowed by design changes that get made after the code is written. 4) nobody is left within the organization that is familiar with the systems that have to be integrated into.
I've written far more complex programs that I've thrown away. I've yet to see a demonstration of AI that is going to make programmers go away. I heard the same rhetoric when script languages rolled out. Managers could now write their own programs and all the programmers would lose their jobs. It's not happening any time soon.
Absolutely, you’re right-'it's not happening any time soon.'
However, the development has begun, and its fast-paced growth could soon reach your 'soon,' with ongoing improvements and advancements.
Great job. I liked the tutorial/demo format. It might be good to have a channel where for this sort of video you also have a real time version, which might be easier for people to follow along with without having to stop and start it to keep up while trying to do it themselves. I applied for access but I'm less qualified than a hobbyist developer so I doubt I'll get it.
Thanks for signing up! We’re rolling out v1 access to users over the next few days/weeks. You'll get an email once you've been granted access. We also have more videos on our channel here: www.youtube.com/@pythagoraa
with every tool like this, one has to ask what information is being shared when you are using the app? are they using the code you create to improve the app?
Where did you get the backend logs?
Developers today: I want $300K, full 401K, bonus, benefits package, relocation package, stock options and WFH whenever I want.
Developers tomorrow: Will prompt AI for food.
😆
I've unironically considered standing at a stoplight with cardboard message saying "Will code for food." I got my Bachelors in Comp Sci last May. I start McDonald's next week.
Yeah my programmer friend once told me a long time ago "You have no marketable skills for me..."
There will be categories of code that it can automate. It is still going to take a while to automate most things. For example, none of the current llms write good or consistently correct rust. I would be surprised if we can ask it to implement a performance sensitive database structure, for example.
They will ask the same benefits for fixing ai generated mess :)
@@mrpocock Perhaps it cannot today, but you can bet it will next year, in three years will be self improving and in ten years running the planet and watching humans in zoos for recreation time.
What would be awesome, is this but without the coding. Where it can keep track of what you're doing, but you can say help and get a method to find the solution.
Anyways, love this. Might steal your prompts though.
Trying it today.
Thanks!
How does it evaluate the correctness of the answer? By using another LLM request with the answer and the expected answer? Because the answer may be correct but is rarely identical.
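For what it's worth, one cheap alternative to a second LLM call is a fuzzy string comparison. The sketch below is only an illustration of that idea, not what the benchmarking app in the video actually does; the function name `answers_match` and the 0.8 threshold are assumptions picked for the example. It normalizes both strings, accepts the answer if the expected text appears as whole words inside it, and otherwise falls back to a similarity ratio from Python's standard `difflib`.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def answers_match(expected, actual, threshold=0.8):
    """Hypothetical checker: True if `actual` contains `expected` as whole
    words, or if the two normalized strings are similar enough."""
    e, a = normalize(expected), normalize(actual)
    # Pad with spaces so "4" does not match inside "14".
    if f" {e} " in f" {a} ":
        return True
    return SequenceMatcher(None, e, a).ratio() >= threshold


print(answers_match("Paris", "The capital is Paris."))
print(answers_match("4", "The answer is 14"))
```

This kind of check is fast and free but brittle for paraphrased or reasoning-heavy answers, which is where an LLM-as-judge request like the one described above would likely do better.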
Insane. Presumably with a bit more progress, a tool like this could be used to re-create itself, a bit like a compiler. Maybe they'll have to build in safeguards to not do that if they want this to be a business! Also I can't help but notice that your role as a human in this process could also be replaced with agents.. then we are dangerously close to singularity
Thank you for all your vids! I'd like it if tutorials showed what will be made at the beginning of the vid so you can see where things are aiming.
Still need to review the code, but this is definitely a great workflow for setting up the basic PoC