Thanks @dsherc, this is insane. Just a side note: this could be a bit misleading. I see you mentioned we can't check the file size of files being uploaded. But even if we can't dynamically check the file size while uploading, we can limit the max file size by adding a size cap to the pre-signed POST. Essentially this is what I did: on upload requests we ask for the file size, then return the presigned URL with the file size added to the signature as a cap.
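For anyone curious what that size cap looks like in practice, here's a minimal sketch of an S3-style POST policy with a `content-length-range` condition. The bucket, key, and secret are made up, and the single HMAC below stands in for the full SigV4 signing process (which derives a scoped key from date, region, and service); the point is just that the cap lives inside the signed policy, so the client can't change it.

```typescript
import { createHmac } from "node:crypto";

// Sketch: build an S3 POST policy that caps the upload size.
// The ["content-length-range", min, max] condition makes S3 reject
// any body outside those bounds before the object is stored.
// NOTE: real SigV4 signing derives a scoped key; the single HMAC
// below is a simplified stand-in for illustration only.
function buildSizeCappedPolicy(
  bucket: string,
  key: string,
  maxBytes: number,
  secret: string,
) {
  const policy = {
    expiration: new Date(Date.now() + 5 * 60_000).toISOString(),
    conditions: [
      { bucket },
      { key },
      ["content-length-range", 1, maxBytes], // hard cap enforced by S3 itself
    ],
  };
  const encoded = Buffer.from(JSON.stringify(policy)).toString("base64");
  const signature = createHmac("sha256", secret).update(encoded).digest("hex");
  return { encoded, signature };
}
```

Because the cap is part of what gets signed, tampering with it on the client invalidates the signature.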
This can be solved simply by the client notifying the server once the file upload is done. This is over-engineering at its finest. His reasoning was that there will be ghost files if the client doesn't notify the server. The solution to that is for the client to always upload to a temp location, and for the server to move the file to the actual location once the client confirms the upload finished. Then you set up an S3 lifecycle rule to delete stale temp files based on their last-modified date.
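A sketch of the temp-prefix approach described above, with illustrative `tmp/` and `uploads/` prefixes. The object shape mirrors what S3's lifecycle configuration API accepts; the helper shows the promote-on-confirm step.

```typescript
// Sketch: clients upload to a "tmp/" prefix; on confirmation the server
// copies the object to its final key. A lifecycle rule then garbage-collects
// anything still sitting in tmp/ after a day (the "ghost files").
// Shape mirrors S3's PutBucketLifecycleConfiguration; prefixes are illustrative.
const ghostFileCleanup = {
  Rules: [
    {
      ID: "expire-unconfirmed-uploads",
      Filter: { Prefix: "tmp/" },
      Status: "Enabled",
      Expiration: { Days: 1 }, // delete tmp objects that were never promoted
    },
  ],
};

// On confirmation, the final key is derived from the temp key.
function promotedKey(tempKey: string): string {
  if (!tempKey.startsWith("tmp/")) throw new Error("not a temp object");
  return "uploads/" + tempKey.slice("tmp/".length);
}
```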
does it allow for resume and improve the time for smaller uploads? Still, they made their changes, and going back to S3 isn't feasible for their marketing either. Plus they now support other types of "buckets", so I guess it isn't just that S3 was being used inefficiently; it also gives them marketing leverage to be more independent and agnostic
So the upload/forward from the UT ingest server to the "S3" is now not validated, which means that if the connection between those two fails at any point, you get invalid results. That is a huge cost. In theory, even with validation, the ingest server would need to store the files until the actual upload/forward completes, yes, even if you practically pipe the upload directly between two sockets.

Additionally, you keep connections alive (from the file upload to the ingest server) while waiting for the response of the external server. That's not good: if those servers take longer than expected to respond, your ingest server may stack up a bunch of inactive sockets that it keeps open for no reason other than waiting. You essentially have an external bottleneck for your hosted server, costing you resources.

Also, as you said yourself, the difference is much bigger with smaller files, which just means the per-request overhead got reduced: fewer requests means less added latency. The percentages are kind of misleading; you would really need a graph showing the difference as a function of file size.
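For what it's worth, the "pipe directly between two sockets" case doesn't require buffering the whole file: Node's `pipeline` propagates backpressure, so a proxy only holds a small window in memory (the open-socket-while-waiting concern still stands). A minimal sketch, with hypothetical stream names:

```typescript
import { pipeline } from "node:stream/promises";
import { Readable, Writable } from "node:stream";

// Sketch: an ingest server mostly just pipes the client's upload socket
// into the outbound S3 connection. pipeline() propagates backpressure, so
// the server buffers only a small window rather than the whole file,
// but the client connection does stay open until the far side finishes.
async function proxyUpload(clientBody: Readable, s3Request: Writable): Promise<number> {
  let bytes = 0;
  clientBody.on("data", (chunk: Buffer) => (bytes += chunk.length)); // observe throughput
  await pipeline(clientBody, s3Request); // resolves only when the S3 side is done
  return bytes;
}
```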
Did you account for filesystem caching before/after in your demo? I assume so, given the breadth of architecture changes you described. But caching recently used files in RAM (as modern operating systems tend to do) can make a very noticeable difference in responsiveness, especially if the files come from spinning disks, network storage, RAID with parity, SATA SSDs... pretty much anything but NVMe. Any kind of A/B performance testing where caching is a possibility requires either pre-caching all inputs (run it several times until the numbers look stable) or somehow guaranteeing that the inputs will never be cached. I'm sure you already knew that. But people often forget that it applies to their own demos.
I love it. I had implemented the same structure you had in the past, and I was planning on creating an ingest server to propagate files, super similar to your architecture. That's great validation of the concept. I would love to use your project, but I run everything over gRPC to move the data.
Impressive. But wouldn't it be even faster, if we remove some more requests and roll the "Your Server", "UT Ingest Server" and "S3" components into a single thing managing uploaded files? Something that kind of works as a common base for data?
uploading should really just be a single chunked-transfer HTTP request with a single response. the server can easily authenticate that and save the partial data to get resumability, and more
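A sketch of the offset tracking this would need, tus-style: a resuming client first asks the server how many bytes it already has, then continues from that offset. The in-memory Map is a stand-in for durable storage; names are illustrative.

```typescript
// Sketch of tus-style resumability: the server persists whatever bytes
// arrived, and a resuming client asks "how much do you have?" before
// continuing. The Map stands in for real storage.
class ResumableStore {
  private parts = new Map<string, Buffer>();

  // How many bytes the server already has for this upload.
  offset(id: string): number {
    return this.parts.get(id)?.length ?? 0;
  }

  // Append a chunk; reject if the client's claimed offset doesn't match,
  // which catches duplicated or dropped chunks after a reconnect.
  append(id: string, claimedOffset: number, chunk: Buffer): number {
    const have = this.parts.get(id) ?? Buffer.alloc(0);
    if (claimedOffset !== have.length) {
      throw new Error(`offset mismatch: server has ${have.length}`);
    }
    const next = Buffer.concat([have, chunk]);
    this.parts.set(id, next);
    return next.length;
  }
}
```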
I'd love to see Theo work on some Remix projects. Remix offers a great deal of built-in type safety, eliminating the need for extra implementation effort.
The bring-your-own-bucket is really important. We have contracts at work that specify we have to store customer data in Australia, so if we can't control where it's stored, we can't use the service.
Huge improvements! It's great that you feel ready to tackle enterprises, but I can assure you it's not easy, not at all. Data privacy standards are scrutinized more than ever, so I'd first go for SOC, HIPAA, and the EU variants of those, so you have certificates to shield yourself against quick-shot enterprise questions :)
@@t3dotgg But YT manages their object storage :) I'm genuinely surprised there's a market for what your company is offering; it's something an above-average developer could probably knock out in a day as part of their sprint. That said, it takes real business savvy to identify a need and turn it into a viable product with customers. No criticism of your product at all; it's more of an eye-opener for those of us in tech about how smart business moves can make all the difference.
@@hemanthaugust7217 True, it's really amazing how the JavaScript guys can complicate everything; any reasonable backend dev could finish that in a single day with a lib
Fwiw, Lambdas are not the only way to have serverless compute in AWS. ECS Fargate also offers the benefits of serverless (scale to zero, pay for what you use, etc.) without the limitations of Lambda.
Really cool stuff! Now that the infra is more flexible, something I'd love to see in UploadThing in the future is Cloudinary-like image transformations. UT would become a viable Cloudinary competitor with that! Could also be part of PicThing if you plan on doing more with it than just background removal :)
@@danhorus oh right, I got the impression it was clients own authentication and direct upload to S3. I obviously don't understand what this solution provides.
@@dancarter5595 An easier way to upload things? They also add some code to the process so you don't need to write it yourself. I mean, it's like using Vercel so you don't have to set up your infra.
Yeah, that's the thing here that sort of defeats using it for anything in production that touches sensitive user data. In the EU at least, because the US of course doesn't care about user data. You are going to be in breach of GDPR: since you are the administrator of the data, you cannot share it with third parties without consent.
Great success! It's also quite cute that, even after so many livestreams and videos, you end up sounding a bit like a school kid presenting their project in front of the class for the first time when you talk about something you're really proud of.
I think you are ignoring a huge security loophole in your logic. If the browser gets the presigned URL, then they can just use it directly without having to go through your ingest server, thus ending up with ghost files anyways
I see the upload from the client browser goes through the ingest server, which forwards it to the bucket hosting server. Here is an idea for custom file scanning/checking: could there be a future where a website can host its own "approval server" that receives a connection from the ingest server, "listens in" on the file as it is being uploaded to the bucket server, and gives a go/no-go back to the ingest server?

It doesn't seem like it would slow down the upload (the file is scanned as it is uploaded), it takes barely any time to get the green light, and if it gets rejected, the ingest server just tells the bucket server to discard the upload and returns an error to the client browser. With how fast "just forward the packet" seems to be, it is mostly up to the approval server to respond quickly enough. Headers are always at the start and are the most commonly scanned part, so by the time the file has uploaded, the headers have been processed and a green light has been given to the ingest server. Just an idea, let me know what you think.
Congrats on the launch, less complex and faster, net win 👍 I imagine you went with the previous architecture first because it let you bootstrap more quickly, without committing yet to the upfront cost of rolling & maintaining your own ingest server. Is that right?
@@kevboutin everything is ok in the right context. Personally I tend to shy away from anything that names itself something that it clearly isn't. There are always servers...
@@m12652 so the name of something is your problem? The name is not a problem for me if it solves problems and increases productivity for less money. Priorities always vary I suppose. 🤷♀
Congrats on v7!! Quick question. If the uploads go to your server and then from your "proxy" to s3, aren't you duplicating network usage at the same time? I imagine that for large videos/files it would get quite expensive compared to the previous approach
@@t3dotgg nice! Hope they keep not charging for that in the future 😂 I guess this would become more noticeable if you allow "bring your own bucket" as it will no longer be in your account. What about the cost of re-processing/proxying the video/file on your server? You will go from 0 to "something". Really curious about this as well!
If Theo makes this fully free (100% self-hosted for everyone) I will be very happy. It would no longer be a service though, but they could offer premium capabilities for companies
Nice, we have a similar architecture. We built a file upload service for our healthcare application to allow clinicians to upload patient documents, which we also used for other clients. We never touch serverless. The system is deployed in Kubernetes and uses MinIO for the object store. Seeing uploadthing have some commercial success, I wonder if I should compete with you guys? Haha, nah, too busy.
Well, that is nothing surprising; everyone should know that every serverless or cloud computation application always has an overhead. It is like saying the new file upload built in Rust is 10x faster than in JavaScript lol
I'm doing a beginner's web dev course that has a file storage project. I ran into the latency issue with this architecture on day one. Originally I tried: 1. Client sends upload request to my server. 2. Server requests signed URL from Supabase. 3. Supabase responds with URL. 4. Server sends URL to client. 5. Client uploads and notifies server when it's done. 6. Server updates db and sends success response. I can't center a div but I could tell this was horrifically slow! I noticed immediately and switched to streaming through my server to Supabase which was 2-3x faster for small files.
hang on, are you basically getting the files from the clients now? Will you have the same bandwidth as the direct S3? Will you pay for the ingress traffic?
I love the update. Arguably, before, you didn’t really have a meaningful product when you were serverless (the value-add above using S3 was small), but now you really do.
Hey, not really a question related to the video, but how is the blog system set up? Is it just MDX, or is there something more behind it? Great video 👍
Very nice explanation and product improvement!! I'm curious how this change would impact the infrastructure cost and product pricing. Could you explore this topic?
I know this will sound smug and I am sorry, but: 100% faster should be 0 seconds. So 377% faster and 509% faster mentioned at 3:10 makes no sense, what do those numbers mean? How did you calculate them?
I believe he meant something like: 100% = two times as performant -> final time x/2; 377% = 3.77 times as performant -> final time x/3.77. If it took 5 seconds and now it takes 1 second, I would say my thingy is doing 500% better, because I can do the thingy five times in the time the old thingy took to do it once
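The two readings side by side, with made-up times. Under the usual convention, "N% faster" means old/new - 1, so halving the time is 100% faster (not 0 seconds), while the ratio reading says 5x as fast = 500%:

```typescript
// Two common interpretations of "N% faster", worked with made-up times.
// Relative gain: 0% means no change; 100% faster = half the time.
function percentFaster(oldMs: number, newMs: number): number {
  return (oldMs / newMs - 1) * 100;
}

// Plain speedup ratio: 1.0 means no change; 5.0 means five uploads
// in the time one used to take.
function speedupRatio(oldMs: number, newMs: number): number {
  return oldMs / newMs;
}
```

So 5 s down to 1 s is a 5x speedup but "400% faster" under the relative-gain convention, which is exactly the off-by-100 confusion in the thread.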
As someone who works with AWS for 3rd-party security reviews, those enterprise features sound nice. Still, there are a LOT of config settings that AWS requires to meet the bar (not always cheap, and constantly changing). Still, this is a very cool infra design change and breakdown. I really appreciate this; folks who don't work with AWS/cloud don't understand.
I remember that serverless was designed for short-lived, lightweight, infrequent requests for particular functionalities of your application. The server doesn't need to run all the time, which saves cost, and you don't need to maintain the server. Lately it has been abused massively for all kinds of heavy tasks that should belong on your own server, and then people complain about serverless. The comment section is full of "devs" who say serverless is bad or hosting your own server is bad, joking about web dev without understanding those subtle details. The current generation has huge skill issues imo.
@@doc8527 Yeah, I get to witness some real nutball spaghetti lambda design. If you need to manage over 50+ lambdas for your backend, plus have one for every single API, troubleshooting & DevOps become a nightmare. Gotta watch every lambda metric, have so many CloudWatch logs, etc. That's where Vercel-like companies do serverless a little better; they're taking on more of that burden, but it's priiicy! I'd pay for it in a heartbeat to save me time though. Docker containers are where it's at. Fargate/ECS that thing. Even EC2 management has improved a lot with CDK + SSM scripts.
When we trigger S3 uploads/copies through various means, rather than having our API update the front-end state, we let our client hit a headObject presigned URL to assert that the object has successfully landed. Requires some ugly polling, but it's cheap polling
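A sketch of that cheap polling loop. The `check` callback is an assumption standing in for something like `fetch(presignedHeadUrl, { method: "HEAD" }).then(r => r.ok)`, which keeps the loop itself testable:

```typescript
// Sketch: keep HEADing the presigned URL until the object lands or we
// give up. HEAD transfers no body, which is why the polling stays cheap.
// `check` abstracts the network call; in production it would wrap fetch().
async function waitForObject(
  check: () => Promise<boolean>,
  attempts = 10,
  delayMs = 500,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true;
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return false; // caller can fall back to a server-side status endpoint
}
```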
This is just meant to be educational, to show what things can be slow and how to resolve them, for inexperienced developers who haven't reached or considered these steps on their journey.
@@macchiato_1881 That was my 2nd guess but I was seeing a lot of comments in bad faith so I really couldn't tell without any tone indicators lol. In a way my reply speaks to them too
@@sanjaux why do you need tone indicators? People like you need to handle negative comments better. I get not all criticism is good. But are you just going to whine at every valid negative criticism or joke you get?
@@macchiato_1881 Well the actual jokes no I'd ignore those, but criticism is best resolved through talking it out. Since this isn't criticism, more signs would have helped differentiate your joke from something actually worth discussing. Handle them better? I'm just trying to understand the thought process behind some comments (the serious ones)
From an infrastructure and architecture perspective, are you using managed services in a cloud provider or deploying and managing your own orchestration systems?
With the approach of responding on the persistent connection from the ingest server: how do you handle scaling beyond one process/VPS? Obviously the original upload request occurs on one server, and the subsequent onUploadComplete response could potentially happen on another process entirely, with no direct access to the original socket.
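One common answer, sketched below: route the completion event through a shared pub/sub channel keyed by upload id, so whichever process holds the open socket gets woken up. The in-process EventEmitter is a stand-in for a cross-process broker such as Redis pub/sub or NATS; all names are illustrative.

```typescript
import { EventEmitter } from "node:events";

// Sketch: the process holding the client's open socket subscribes on the
// uploadId; whichever process receives the completion callback publishes
// to that channel. The EventEmitter stands in for a shared broker
// (Redis pub/sub, NATS, ...) that actually crosses process boundaries.
const broker = new EventEmitter();

function awaitCompletion(uploadId: string, timeoutMs = 30_000): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), timeoutMs);
    broker.once(`done:${uploadId}`, (fileUrl: string) => {
      clearTimeout(timer);
      resolve(fileUrl); // respond on the socket this process still holds
    });
  });
}

function publishCompletion(uploadId: string, fileUrl: string): void {
  broker.emit(`done:${uploadId}`, fileUrl); // would run in another process in prod
}
```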
I get that rearranging the flow to have fewer steps should yield better performance. But I'm not sure what serverless vs serverful has to do with it. Couldn't the ingest server still be serverless? Also, do Fargate (and Cloud Run on GCP) qualify as serverless? They look like good places to deploy something like the ingest server without having to go full serverful.
This just in. Serverless proven to be a buzzword to keep you purchasing overpriced subscription model technology. In other news, paint is wet when applied.
This statement is probably coming from someone who has never built any applications professionally using serverless solutions. It's a paradigm shift and one many people haven't wrapped their heads around yet. People fear what they do not understand and despise things that require LOTS of real world work to become proficient in.
But what did you use to build your ingest server?!?! typescript? .NET? Go? Rust? Something else??? I wanna know the details about your serverFULL architecture!!! There's no details in your blog post either about what you used to build your ingest server in, how it's hosted, etc. I'm extremely interested in what you landed on for those tech choices.
It's kind of sad that resumable file transfer is a big feature now, because I remember it being a standard thing when I was a kid. It was lost somewhere along the way, and I'm glad to see someone is paying attention.
I wonder how pricing would work with "bring your own bucket". But we're very excited for it since our organisation has rules on what geolocation a bucket can exist in. And even just using local infrastructure.
@@Itsneil17 you know this is like saying "just make your own WordPress"? I guess UploadThing is simpler, but getting it right is really hard. That's why we use abstractions that hide the real complexity
Infra matters more than anything the frontend/client could ever achieve, because on the frontend you can only show a loader; the client has limited internet bandwidth anyway.
I'm filming a video tomorrow about all of the dumbest things people have said about UploadThing. Reply with some good ones here and you might get featured ;)
"Typical case of things developers care about, but the customers dont"
- some twitter user
@@martinlesko1521 that’s the one that inspired the video :’) The security one was too good as well
I've been using uploadthing for a long time now. I know how an S3 bucket works, but honestly I got tripped up on handling S3 permissions initially. Uploadthing is faster, smoother to configure & cleaner in its operations. I hope uploadthing becomes the norm for businesses. It's really good. Wishing good luck to Theo and Julius.
"Would rather use the much more stable and simpler Amazon S3, and does speed even matter? The user should be fine waiting a few more seconds." - Some guy in discord
"I mean, just self host. ¯\_(ツ)_/¯" - Another random discord guy
Rule No.2 when you make an App: Make it slow so that when you remove the slow logic in the code, you can brag about how fast it became.
What is rule No. 1?
I mean even if you didn't try to do that consciously, it would happen - you don't write everything perfectly the first time, especially when working on an MVP. It has to work before it can be optimized
😂
@@bastianventuraexactly, premature optimization is the death of projects. Make it work, then make it fast
@@bastianventura dude, it's not a nuclear fusion equation analysing app! It's a freakin' S3 uploader! You could have it up and running in 1 prompt! But I guarantee you 90% of its code is there to limit your ability to upload based on your tier... you should 1. PLAN 2. CODE 3. Optimize. I guess he missed 1.
the web dev world is slowly reverting. soon we will get "we used literally zero npm packages and just vanilla JS, and our product shipped 10x faster, and the average API response time is 0.0001ms"
bro i'm writing on paper.
and I am here moving from vanilla JS into npm land..
who wouldve known that less is more
Curiously, I recently discovered a way, with the vanilla Navigation API and view transitions, to make an app like Next.js, with all the features, faster, and with no build step needed
While the tooling has a few npm packages for sure, Astro is great for that: you can ship zero JS if you want, get a proper grid layout with a few lines of CSS (as much as I love Tailwind, it adds pages and pages of CSS), and the (optional) SSR features like Astro Actions are specifically designed to work without JS.
1) S3 does support resumability
2) File sizes can be checked using `content-length-range`
3) S3 can reject on file extension and mime types
4) You could have ditched Lambda and done a webhook back to the server
Truth! I was shaking my head during so much of this.
very very true
Lmao expecting proper knowledge from Theo is stupid. He's a YouTube influenza
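On point 1 of the numbered list above: multipart uploads are what make S3 resumable. The sketch below shows just the chunking and resume-point logic, with no AWS calls; the 5 MiB figure is S3's documented minimum part size for every part except the last.

```typescript
// Sketch of why multipart uploads are resumable: the file is cut into
// independent parts, each part upload is acknowledged with an ETag, and
// after a crash you list the completed parts and continue from the first
// one that's missing. Pure chunking logic only; no AWS SDK calls here.
const PART_SIZE = 5 * 1024 * 1024; // S3's minimum for all but the last part

function partBoundaries(totalBytes: number, partSize = PART_SIZE): Array<[number, number]> {
  const parts: Array<[number, number]> = [];
  for (let start = 0; start < totalBytes; start += partSize) {
    parts.push([start, Math.min(start + partSize, totalBytes) - 1]); // inclusive byte range
  }
  return parts;
}

// Given which part numbers already have ETags, the resume point is the
// first missing one (part numbers are 1-based, as in S3).
function firstMissingPart(completed: Set<number>, partCount: number): number | null {
  for (let n = 1; n <= partCount; n++) {
    if (!completed.has(n)) return n;
  }
  return null; // all parts done -> complete the multipart upload
}
```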
2024 is the year of serverlesslessness
Wouldn't it be a serverfulness?
bro left vercel and realised serverless is better
And serverless was never actually serverless
Or serverfulness
Ran out of VC money 😂
I can make it even simpler by removing your server and just by using free and open source Uppy to upload directly to S3/R2 or wherever, it has resumability and other plugins for free too
I think the difference is that Uppy's "ingest server" can be run by you (using Tus) or by Transloadit. You still need a server if you wish to have resumability and no ghost files, though
@@danhorus Interesting
Theo finally discovered servers. Massive win
More and more, I'm coming around to the idea that all these microservices, serverless, edge networks, etc. create way more complexity than is needed for the vast majority of use cases. We devs do love to complicate things.
but then deploying everything yourself isn't a great idea either
@@martinlesko1521 why not?
We are just learning. We want to make things better, so we try something new. Then the flaws show up and we adapt.
I've been saying that for years! Every major outage too: it's basically always either DNS or _microservices_
Resume Driven Development.
Doesn't help AWS (in particular) sell you their shit even if it's worse for you.
1 to 2 years and we will have completed a full cycle. "New" web devs are already "discovering" PHP again. Not long before people are uploading HTML files to an nginx/Apache server again and calling it "zero dependency websites". This will be the new big thing.
One day we will figure out how to cut out the middleman entirely and upload straight to our own servers, which can then transcode files, upload them to S3, etc. Oh wait, we actually had that figured out in 2005...
I used to use TUS in C# and it was a pain in the ass, I ended up writing my own upload client and server code and the code was 10x simpler...
why would we do something faster and more logical when we can do something easy and new? Logic left the room a long time ago
I had a debate with this bloke on his discord 2 years ago where he couldn’t fathom that I refused to use serverless for a next app due to serverless constraints and performance issues I was having. Good to see he’s finally coming around 🎉
S3 has resumability. You must tweak a bunch of config and code to do it. But it works.
the other day Theo was working on Laravel, now he's going back to servers, tech really is evolving backwards
The old ways are still best.
@brainiti I don't know about best, but it helps that the old ways were resource-constrained, so we know how to make things well while being lean
Wait until he figures out how quick and simple FTP is...
Wait until pfqniet realizes that this is built for people with actual users...
@@t3dotgg well, in many cases FTP was enough for enterprises, so... :D
@@d3stinYwOw it still is 😢 (sftp will NEVER die)
@@t3dotgg just steer clear of bank tech and you'll never have to find out
@@t3dotgg lol you sound so goofy when you reply to people like this
Let us know about the costs difference later down the line, because serverless tends to be very expensive, but your new infrastructure uses a lot more bandwidth on the application side
I know the joke is that "this meeting could be an email"
But i feel UT could just be a blog describing best practices for S3 and or a config file.
What is the product offered?
😂 savage
I thought the selling point was that with upload thing your data never passes through it.
The most astonishing thing is how this can be a product someone pays for :) 99.99999% of it is just S3.
Did you watch the video?
it's incredible how such nice things happen when Vercel turns off the taps 😉
next step would be: don't use a SaaS, and set up S3 on your own
Why do I need a service for this in the first place?
It turns out you really really don't. In fact it's probably dirtier and bad practice to use this.
Wow, this is a massive reduction in complexity! I hope though that one day we'll have technology advanced enough to use this thing called "Your Server" to store a file. Sure hope we'd be able to achieve even fewer arrows on the graph then...
the worst part is, Theo trash-talked DHH's blog post about leaving the cloud just a year ago, and now he is slowly moving towards it...
Bro has 7 major versions in a year
We follow semver :)
@@t3dotgg 7 breaking changes in a year? Still insane
@@alexeydmitrievich5970 it's a new product, of course they are gonna have a lot of breaking changes
Yikes 😬
Uhu, so your customers had to rewrite the entire integration 7 times in the same year? So sad for people with real projects
one year later.
"we made file reading 10x faster and lowered our cloud cost 10x by going bare metal server. "
it is always nice to see people get excited when they reinvent the wheel
How is it even possible to upload 4MB of images in 1.5 seconds, nooo, impossible, upload so fast. I mean what are we even watching...
So uploadthing is an abstraction on s3? S3 already has a dead simple API so what am I missing?
Legitimate question, but isn't 1.5s to upload 3.2 MB still really slow? I don't know what kind of internet you have, but a 50 Mbps upload would've sent the data in ~500ms; what is taking the extra second?
Groundwork, check the 5:58 mark.
@@ramonsouza9846 well, 11:15 is the *upgraded* version of the app... Sure it's faster, but I wouldn't call a turtle amazing when compared to a snail if it could have the speed of a rabbit...
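The arithmetic behind that ~500 ms figure, spelled out so the units are explicit (megabytes to megabits is a factor of 8; everything beyond the pure transfer time is overhead such as TLS/TCP handshakes, request round trips, and server work):

```typescript
// Back-of-envelope transfer time: size in megaBYTES, link speed in
// megaBITS per second. 3.2 MB over a 50 Mbps uplink is
// (3.2 * 8) / 50 = 0.512 s of pure wire time.
function transferSeconds(megabytes: number, mbps: number): number {
  return (megabytes * 8) / mbps;
}
```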
Hoped for more info about the new server ( why no serverless, what's the tech, etc. ), but this looks amazing and makes sense now 😊 Great video 😊
So, if I write my own upload logic, instead of using serverless upload services (like uploadthing), my apps will be much faster?
This puts a limit on the bandwidth available as you are proxying the file uploads to s3, if you have a ton of concurrent uploads you will also need to scale your own servers.
Not if the "Ingress Server" is on AWS EC2. Instead of paying S3 traffic coming from internet, they are paying S3 traffic from inside AWS (Which may be even cheaper).
Incoming traffic to the server from internet is free (Well, let's say included on the per-hour price)
@@framegrace1 I am not talking about pricing, but about bandwidth. S3 has distributed endpoints for content delivery, and you can have hundreds of people upload simultaneously at high Mbps. On the other hand, your one EC2 instance is limited to whatever Mbps Amazon gives it, and if you try to upload 4-5 big files at the same time (from different users with good bandwidth) it will bottleneck for everyone
@@halfsoft If they use a normal single EC2 instance on the free tier, of course. But I guess they have someone who knows what they are doing.
@@framegrace1 And what is your point exactly? What I said is that to handle more concurrent users they will need to scale the number of instances they run. Then they need load balancing to distribute the traffic across the EC2 instances. And what's more, you lose the advantages of the distributed infrastructure Amazon has built for S3.
we will be able to use uploadthing to upload to our own google bucket? mind blowing!
Theo realized that he would be homeless if he continued using serverless
Anyone need an S3 upload proxy? 😮
But why? U can just upload to s3 directly.
Did you do any load/performance tests for your UT Ingest Server? Would be really nice to have a video just on that :) Also scaling of this server is an interesting topic...
This might be a stupid question, but what is the advantage this service provides over a library integrated on my server or front end?
@@aaronevans7713 I suppose you'd only use the AWS S3 SDK in the back-end server anyway and send pre-signed URLs to the front-end, right? Otherwise you'd have to push some form of credentials to the front end. Honest question: What's the issue with a 3.2 MB (uncompressed JavaScript) client in the back-end?
Its the old convenience vs performance choice in software. A tale as old as time.
You've truly mastered the art of making things simple (or should I say, too simple) while monetizing the convenience. Well played. 👏
Sounds like serverless slop is circling back. Also Just uploading directly to S3 is theoretically still faster.
Interesting to see you share the thought process behind everything, helps to learn :)
Thanks @dsherc, this is insane.
Just a side note: this could be a bit misleading:
I see that you mentioned we can't check the file size of files being uploaded.
But even if we can't dynamically check the file size while uploading, we can limit the max file size by adding a size cap to the pre-signed POST.
Essentially this is what I did: on upload requests we ask for the file size, and we return the presigned URL with the file size added to the signature as a cap.
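For readers wanting specifics: the cap that comment describes lives in the S3 POST policy's `content-length-range` condition. Below is a rough stdlib sketch of that (unsigned) policy document — the bucket, key, and expiry are made-up values, and in practice boto3's `generate_presigned_post` builds and signs this for you:

```python
import base64
import json
from datetime import datetime, timedelta, timezone

def build_post_policy(bucket: str, key: str, max_bytes: int) -> str:
    """Build the (unsigned) S3 POST policy that caps the upload size.

    S3 rejects any POST whose body falls outside content-length-range,
    so the client cannot lie about the size it declared earlier.
    """
    expires = datetime.now(timezone.utc) + timedelta(minutes=15)
    policy = {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            {"key": key},
            # [min, max] in bytes -- the cap the commenter describes
            ["content-length-range", 0, max_bytes],
        ],
    }
    # The encoded policy is what gets signed (SigV4) server-side.
    return base64.b64encode(json.dumps(policy).encode()).decode()

encoded = build_post_policy("my-bucket", "uploads/photo.png", 4 * 1024 * 1024)
decoded = json.loads(base64.b64decode(encoded))
```

With boto3, `generate_presigned_post(..., Conditions=[["content-length-range", 0, max_bytes]])` produces the equivalent signed form.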
1.5s to upload 4 images and a total of under 4MB? That's the fast version that has chat asking how it's possible?
Yeah haha, people never deal with massive uploads, nowadays SaaS is the goat
This can be simply solved by the client notifying the server once the file upload is done. This is just over-engineering at its finest. His reasoning was that there will be ghost files if the client doesn't notify the server. The solution to that is: the client always uploads to a temp location, and the file is moved to its actual location once the client confirms the upload finished. Then you set up an S3 lifecycle rule to delete stale temp files based on their last-modified date.
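A sketch of the lifecycle rule that comment describes, in the shape boto3's `put_bucket_lifecycle_configuration` accepts — the `tmp/` prefix and the 1-day window are assumptions for illustration, not from the video:

```python
# Uploads land under tmp/, the server moves confirmed files out,
# and S3 expires whatever is left behind (the "ghost files").
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-unconfirmed-uploads",
            "Filter": {"Prefix": "tmp/"},
            "Status": "Enabled",
            # Objects still under tmp/ after 1 day were never confirmed.
            "Expiration": {"Days": 1},
        }
    ]
}
# Applied once with boto3:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```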
does it allow for resume and improve the time for smaller uploads? Still, they made their changes, and going back to S3 isn't feasible for their marketing either. Plus they now support other types of "buckets", so I guess it isn't just S3 being used inefficiently; instead it gives them marketing leverage to be more independent and agnostic
Doesn't moving files cost money with S3? Not sure
@@theairaccumulator7144 once in the region, you can transfer within the region for free. Going back out of the region will cost again
We always did that with Rails years ago using a free gem maintained by the community not a SaaS company
> This can be simply solved by client notifying the server once the file upload done.
Rule #1 of web security:
Never believe the client.
So the upload/forward from the UT ingest server to the "S3" is now not validated. Which means if the connection between those two fails for some reason at any point, you get invalid results. That is a huge cost.
In theory even if you had validation, the ingest server would need to store the files until the actual upload/forward completes. Yes, even if you practically pipe the upload directly between two sockets.
Additionally you keep connections alive (from the file upload to the ingest server) while waiting for the response of the external server. That's not good. If these servers take longer than expected to respond, your ingest server may stack a bunch of inactive sockets which it keeps open for no other reason than waiting. You essentially now have an external bottleneck for your hosted server, costing you resources.
Also, as you said yourself, the difference is much higher with smaller files, which just means your per-request overhead got reduced. Of course fewer requests means less added latency. The percentages are kind of misleading; you would actually need a graph that shows the difference depending on file size.
Did you account for filesystem caching before/after in your demo? I assume so, given the breadth of architecture changes you described. But caching recently used files in RAM (as modern operating systems tend to do) can make a very noticeable difference in responsiveness. Especially if the files come from spinning disks, network storage, RAID with parity, SATA SSDs... Pretty much anything but NVME.
Any kind of A/B performance testing where caching is a possibility requires either pre-caching all inputs (run it several times until the numbers look stable) or somehow guaranteeing that the inputs will never be cached. I'm sure you already knew that. But people often forget that it applies to their own demos.
I love it. I had implemented the same structure you had in the past and I was planning on creating an ingest to propagate, super similar to your architecture. That's a great validation of the concept.
I would love to use your project but I run all in GRPC to traffic the data.
Impressive. But wouldn't it be even faster, if we remove some more requests and roll the "Your Server", "UT Ingest Server" and "S3" components into a single thing managing uploaded files? Something that kind of works as a common base for data?
uploading should really just be a single chunked-transfer HTTP request with a single response. the server can easily authenticate that and save the partial data to get resumability, and more
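For the resumability half of that idea, here's a minimal sketch of server-side partial saves, loosely in the style of the tus protocol — the path handling and offset rules are illustrative assumptions, not a full implementation:

```python
import os

def append_resumable(path: str, offset: int, chunk: bytes) -> int:
    """Persist a chunk the client says starts at `offset`; return how many
    bytes the server now has. The client resumes by asking for that count
    first, so retries and overlapping resends stay idempotent."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    if offset > have:
        # A gap would corrupt the file -- the client must back up and resend.
        raise ValueError(f"gap: server has {have} bytes, got offset {offset}")
    if offset + len(chunk) <= have:
        return have  # everything in this chunk is already stored
    with open(path, "ab") as f:
        f.write(chunk[have - offset:])  # append only the unseen tail
    return os.path.getsize(path)
```

A real server would also authenticate the request and verify a checksum once the final byte lands, but the offset bookkeeping above is the core of resume support.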
I'd love to see Theo work on some Remix projects. Remix offers a great deal of built-in type safety, eliminating the need for extra implementation effort.
The bring-your-own-bucket is really important. We have contracts at work that specify we have to store customer data in Australia, so if we can't control where it's stored, we can't use the service.
Hey @t3dotgg, curious to know if/how you have mitigated against slowloris DoS attacks with the new architecture?
What I have learned, when it comes to IT.. the absurd amount of work is usually necessary due to initial incompetence...
That upgrade sounds like the logical path. Amazing optimization and simplification from the user perspective!
Huge improvements! It's great that you feel ready to tackle enterprises, but I can assure you - it's not easy, not at all. Data privacy standards are more looked at than ever, so I'd first go for SOC, HIPAA and EU variants of those to have certificates you can shield yourself against quick-shot enterprise questions :)
So the product is a S3 proxy server? Alright
Technically speaking, YouTube is also just a proxy server on top of object storage ;)
Technically speaking that's only a part of their API
@@t3dotgg But YT manages their object storage :) I’m genuinely surprised there’s a market for what your company is offering-it’s something an above-average developer could probably knock out in a day as part of their sprint. That said, it takes real business savvy to identify a need and turn it into a viable product with customers. No criticism of your product at all-it’s more of an eye-opener for those of us in tech about how smart business moves can make all the difference.
@@hemanthaugust7217 true, it's really amazing how the JavaScript guys can complicate everything; any reasonable back-end dev could finish that in a single day with a lib
Everything is an API over a storage
Fwiw, Lambdas are not the only way to have serverless compute in AWS. ECS Fargate also offers the benefits of serverless (scale to zero, pay for what you use, etc.) without the limitations of Lambda.
Really cool stuff! Now that the infra is more flexible, something I'd love to see in UploadThing in the future is Cloudinary-like image transformations. UT would become a viable Cloudinary competitor with that! Could also be part of PicThing if you plan on doing more with it than just background removal :)
You should take a look at how PicThing is handling images ;)
Oh cool, now my third party upload service has access to all the data I store. Neat.
They already had access before, no? It's their S3 bucket
@@danhorus oh right, I got the impression it was clients own authentication and direct upload to S3. I obviously don't understand what this solution provides.
@@dancarter5595 An easier way to upload things? They also add some code to the process so you don't need to do it yourself. I mean it's like using Vercel so you don't have to set your infra.
Yeah, that's the thing here that sort of defeats using it for anything in production that handles sensitive user data. In the EU at least, because the US of course doesn't care about user data. You would be in breach of GDPR: since you are the controller of the data, you cannot share it with third parties without consent.
Great success! It's also quite cute that, even after so many live-streams and videos that you have done, you end up sounding a bit like a school kid presenting their project the first-time in front of the class, when you are talking about something that you are really proud of.
It's pretty cool that you naturally use a sequence diagram to explain it without even thinking about it, or at least without mentioning it.
BYOB: Bring Your Own Bucket
Are bandwidth costs negligible now? If not this seems much more expensive for UT to scale.
Kudos Theo! And thank you for driving us away from serverless!
I think you are ignoring a huge security loophole in your logic. If the browser gets the presigned URL, then they can just use it directly without having to go through your ingest server, thus ending up with ghost files anyways
I see the upload to the bucket from the client browser goes through the ingest server and is forwarded to the bucket hosting server.
here is an idea for custom file scanning/checking:
can there be a future where a website can host its own "approval server" that receives a connection from the ingest server,
and "listens in" on the file as it is being uploaded to the bucket server and gives a go/no back to the ingest server?
it doesn't seem like it slows down the upload (as it is being scanned as it is uploaded), takes barely any time to get the green light,
and if it gets rejected the ingest server just tells the bucket server to discard the upload and returns an error to the client browser.
with how fast "just forward the packet" seems to be, it is mostly up to the approval server to respond quick enough.
headers are always at the start and are the most commonly scanned part,
so by the time the file has uploaded, the headers have been processed and a green light has been given to ingest.
Just an idea. let me know what you think.
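That "listen in while forwarding" idea can be sketched without any real network code: a forwarder that accumulates just enough leading bytes for a header check and aborts on rejection. The 16-byte header window and the PNG magic-number check in the test are assumptions for illustration:

```python
def forward_with_inspection(chunks, sink, approve_header):
    """Forward chunks to `sink` while letting an approval check look at
    the first bytes (where file headers / magic numbers live).

    Raises ValueError if the approval check rejects the header, which a
    real ingest server would translate into discarding the upload.
    """
    header = b""
    approved = None
    for chunk in chunks:
        if approved is None:
            header += chunk
            if len(header) >= 16:  # enough for most magic numbers (assumed)
                approved = bool(approve_header(header[:16]))
                if not approved:
                    raise ValueError("upload rejected by approval check")
        sink.append(chunk)  # forwarding never waits on the full scan
    if approved is None:  # tiny file: check whatever header bytes we got
        if not approve_header(header):
            raise ValueError("upload rejected by approval check")
    return True
```

The key property matching the comment: forwarding proceeds chunk by chunk, and only the decision (go/no-go) depends on the approval callback being fast.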
Congrats on the launch, less complex and faster, net win 👍
I imagine you went with the previous architecture first because it let you bootstrap more quickly, without committing yet to the upfront cost of rolling & maintaining your own ingest server. is that right?
Serverless has become a huge pain, I'll definitely not use it for a new project.
It was just another pointless sales pitch...
This is such a ridiculous comment. I can provide dozens of real examples where serverless has transformed team productivity.
@@kevboutin everything is ok in the right context. Personally I tend to shy away from anything that names itself something that it clearly isn't. There are always servers...
@@m12652 so the name of something is your problem? The name is not a problem for me if it solves problems and increases productivity for less money. Priorities always vary I suppose. 🤷♀
Congrats on v7!! Quick question. If the uploads go to your server and then from your "proxy" to s3, aren't you duplicating network usage at the same time? I imagine that for large videos/files it would get quite expensive compared to the previous approach
If the storage and server are in the same AWS region and account, AWS will not charge :)
@@t3dotgg nice! Hope they keep not charging for that in the future 😂 I guess this would become more noticeable if you allow "bring your own bucket" as it will no longer be in your account.
What about the cost of re-processing/proxying the video/file on your server? You will go from 0 to "something". Really curious about this as well!
If theo makes this fully free (100% self hosted for everyone) I will be very happy
It would be no longer a service tho
But could offer premium capabilities for companies
Nice, we have a similar architecture. We built a file upload service for our healthcare application to allow clinicians to upload patient documents, which we also used for other clients. We never touch serverless. The system is deployed in Kubernetes and uses MinIO for the object store. Seeing uploadthing have some commercial success, I wonder if I should compete with you guys? Haha, nah, too busy.
Well, that is nothing surprising; everyone should know that every serverless or cloud computation layer adds overhead. It is like saying "the newly built file uploader in Rust is 10x faster than in JavaScript" lol
This is exactly the kind of content I crave. I love seeing how people improve the operational side of services. Also, this is open source? Seriously?
With the number of times webdev goes full circle, I am surprised we never get dizzy.
I'm doing a beginner's web dev course that has a file storage project. I ran into the latency issue with this architecture on day one. Originally I tried:
1. Client sends upload request to my server.
2. Server requests signed URL from Supabase.
3. Supabase responds with URL.
4. Server sends URL to client.
5. Client uploads and notifies server when it's done.
6. Server updates db and sends success response.
I can't center a div but I could tell this was horrifically slow! I noticed immediately and switched to streaming through my server to Supabase which was 2-3x faster for small files.
You improved your product by eliminating network hops, as you should. But the main component (S3) is still serverless.
How do you host your ingestion server? Are you running your own k8s cluster?
hang on, are you basically getting the files from the clients now? Will you have the same bandwidth as the direct S3? Will you pay for the ingress traffic?
I love the update. Arguably, before, you didn’t really have a meaningful product when you were serverless (the value-add above using S3 was small), but now you really do.
Hey, not really a related question about the video, but how is the blog system set up? Is it just MDX, or is there something more behind it? Great video 👍
Time to remove all the sleeps in the code
So awesome. Curious where the service is hosted now and is it still all typescript or was go/rust needed to help with anything?
Very nice explanation and product improvement!!
I'm curious about how this change would impact in the infrastructure cost and product pricing.
Could you explore this topic?
I know this will sound smug and I am sorry, but: 100% faster should be 0 seconds. So 377% faster and 509% faster mentioned at 3:10 makes no sense, what do those numbers mean? How did you calculate them?
I believe he meant something like:
100% = two times as performant -> final time x/2
377% = 4.77 times as performant -> final time x/4.77
If it took 5 seconds and now it takes me 1 second, I would say my thingy is doing 500% better, because I can do one thingy five times in the time the old thingy took to do one
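For what it's worth, the usual convention treats "N% faster" as a throughput ratio. A tiny sketch of that reading, under which the thread's 5 s → 1 s example comes out as 400% faster (5x the throughput), not 500%:

```python
def percent_faster(old_s: float, new_s: float) -> float:
    """'N% faster' read as throughput: the new version does (1 + N/100)x
    the work of the old in the same time, so N = (old/new - 1) * 100."""
    return (old_s / new_s - 1) * 100

# percent_faster(5, 1) -> 400.0 (5x throughput), percent_faster(2, 1) -> 100.0
```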
As someone who works with AWS for 3rd party security reviews, those enterprise features sound nice. Still, there's a LOT of config settings that AWS requires (that are not always cheap and are constantly changing) to meet the bar.
Still, this is very cool infra design change and breakdown. I really appreciate this, folks who don’t work with AWS/cloud don’t understand.
I remember serverless being designed for short-lived, lightweight, infrequent requests for particular functionalities of your application. Hence the server doesn't need to run all the time, which saves you cost, and you don't need to maintain the server.
Lately, it was abused massively for all kinds of heavy tasks, which should belong on your own server. And then people complain about serverless.
The comment section is full of "devs" who say serverless is bad or host your own server is bad. Joke about the web dev, without understanding of those subtle details.
The current generation has huge skill issues imo.
@@doc8527 Yeah, I get to witness some real nutball spaghetti lambda design. If you need to manage over 50+ lambdas for your backend, plus have one for every single API, troubleshooting & DevOps becomes a nightmare. Gotta watch every lambda metric, have so many CloudWatch logs, etc. That's where Vercel-like companies do serverless a little better: they're taking on more of that burden, but it's priiicy! I'd pay for it in a heartbeat to save me time though.
Docker containers are where it's at. Fargate/ECS that thing. Even EC2 management has improved a lot with CDK + SSM scripts.
14:50 No offense but weird to round one up and the other down, when both are 733 ms.
3733 ms -> almost 4 seconds
733 ms -> almost half a second
If the amount is greater than 1, it's natural to round to a whole number :)
Yeah, the double-standard rounding is a bit cringe. But you can tell he's really happy, and with those numbers, I'd be happy too.
When we trigger S3 uploads/copies through various means, rather than having our API state update the front end we allow our client to hit a headObject presigned url to assert that the object has successfully landed. Requires some ugly polling but it’s cheap polling
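That pattern stays cheap by bounding attempts and delay. A minimal stdlib sketch of the polling loop — the `check` callable and the numbers here are placeholders, not the commenter's actual code:

```python
import time

def poll_until_landed(check, attempts: int = 10, delay_s: float = 0.5) -> bool:
    """Cheap polling loop: `check` is any callable returning True once the
    object exists (e.g. a HEAD request against a presigned headObject URL
    coming back 200). Returns False if the object never lands in time."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:  # don't sleep after the final attempt
            time.sleep(delay_s)
    return False

# In the commenter's setup, `check` would be roughly: an HTTP HEAD on the
# presigned headObject URL, treating status 200 as "landed".
```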
You mean to say, removing a thing which causes you thing to be slow makes your thing go fast? 🤯🤯🤯🤯🤯🤯
This is just meant to be educational, to show what things can be slow and how to resolve them, for inexperienced developers who haven't reached or considered these steps on their journey.
@@sanjaux it's a joke. Jesus karen
@@macchiato_1881 That was my 2nd guess but I was seeing a lot of comments in bad faith so I really couldn't tell without any tone indicators lol. In a way my reply speaks to them too
@@sanjaux why do you need tone indicators? People like you need to handle negative comments better. I get not all criticism is good. But are you just going to whine at every valid negative criticism or joke you get?
@@macchiato_1881 Well the actual jokes no I'd ignore those, but criticism is best resolved through talking it out. Since this isn't criticism, more signs would have helped differentiate your joke from something actually worth discussing. Handle them better? I'm just trying to understand the thought process behind some comments (the serious ones)
From an infrastructure and architecture perspective, are you using managed services in a cloud provider or deploying and managing your own orchestration systems?
With the approach of responding to the persistent connection on the ingest server - How do you handle scaling beyond one process/vps?
Obviously the original upload request occurs on one server, and the subsequent onUploadComplete response could potentially be on another process entirely, with no direct access to the original socket.
Up next: I went back to client side react
I get that rearranging the flow to have fewer steps should yield better performance. But I'm not sure what serverless vs serverful has to do with it.
Couldn't the ingest server still be on serverless? Also, do Fargate (and Cloud Run on GCP) qualify as serverless? Those look like good places to deploy something like the ingest server without having to go full serverful.
But now your server handles all traffic, which could be a problem if it doesn't hold up well.
"just forward the packet bro, don't process it" *makes app 5x faster*
This just in. Serverless proven to be a buzzword to keep you purchasing overpriced subscription model technology. In other news, paint is wet when applied.
This statement is probably coming from someone who has never built any applications professionally using serverless solutions. It's a paradigm shift and one many people haven't wrapped their heads around yet. People fear what they do not understand and despise things that require LOTS of real world work to become proficient in.
C'mon bro, next you are gonna try to tell me water is wet or something?
But what did you use to build your ingest server?!?! typescript? .NET? Go? Rust? Something else??? I wanna know the details about your serverFULL architecture!!! There's no details in your blog post either about what you used to build your ingest server in, how it's hosted, etc. I'm extremely interested in what you landed on for those tech choices.
It's kind of sad that resumable file transfer is a big feature now, because I remember it being a standard thing when I was a kid. It was lost somewhere along the way, and I'm glad to see someone is paying attention.
S3 doesn't support resuming!? Jesus Christ. This is exactly what I mean.
Great use case! Love this kind of video
Right after the Vercel sponsorship ended we get this..?
hmm... i dont get it. why not just request s3 upload permission from the client and upload directly to s3? bit confused...
What library should be used to create a blog in your T3 Stack app? Mainly for resources and forums
I wonder how pricing would work with "bring your own bucket". But we're very excited for it since our organisation has rules on what geolocation a bucket can exist in. And even just using local infrastructure.
Just make your own infra/software for this. Waste of money spending it on upload thing
@@Itsneil17 you know that this is like saying "just make your own WordPress"? I guess UploadThing is simpler, but getting it right is really hard. That's why we use abstractions that hide the real complexity
@@Itsneil17 making our own infra/software also costs money.
@@RedPsyched I've made my own infra for stuff like this. Yes it wasn't cheap at the start but now it costs less than using 3rd party
@@Qrzychu92 yes, in fact: a new discovery that not everyone uses WP. People use frameworks, not a drag-and-drop editor, for building websites.
Infra matters more than anything the frontend/client could ever achieve.
Because on the frontend you can only show a loader, nothing else, since the client has limited internet bandwidth.