► What should I test next?
► AWS is expensive - Infra Support Fund: buymeacoffee.com/antonputra
► Benchmarks: ua-cam.com/play/PLiMWaCMwGJXmcDLvMQeORJ-j_jayKaLVn.html&si=p-UOaVM_6_SFx52H
Thanks again for this test Anton! It would be really interesting to see how latency compares when both client and server are on the same machine (with communication over a Unix socket rather than localhost TCP). This would provide a more realistic scenario, since SQLite would be a drop-in replacement when everything is on the same machine.
When client and server are on different machines, SQLite usually can't step in without non-trivial architectural changes 🎉
Thanks for all of your tests. Mezzio vs Laravel next, please.
I would like to see your thoughts on Aurora versus RDS MySQL, and I appreciate you sharing the results of so many technologies already evaluated. Congrats!
It would be interesting to compare different embedded databases: Apache Derby, H2, and SQLite
Do you think any of your tests allow you to take full advantage of the concurrency of Go/Elixir and fault tolerance of BEAM? Or would you need to design new tests to properly check the real world performance of languages like Erlang, Elixir and Gleam? Or is it just too expensive to properly test concurrency?
I use SQLite as an intermediate log storage engine before logs get shipped out. We can do billion+ inserts per day. Every day. It never fails. It's a Swiss Army knife database
cool! thanks for providing a use case
where is it finally stored?
@@jitxhere probably at some S3 bucket in the cheapest TB/dollar area
@@jitxhere we use Loki/grafana and opensearch for long term storage. SQLite is only present as a temporary buffer and gets pushed out by an agent to long term storage
@@NerdyWasTaken S3 is expensive, better alternatives exist if storage expense is a concern.
Man you are truly a modern day hero. Thanks for listening to the requests
haha, thank you!
That request was a bit silly.
Just consider this test as another data point. Don't take it too seriously.
Too bad embedding the library client-side makes things less safe, and SQLite can't cover both the client and server sides. You're right on this particular benchmark though, why so serious
@@aliandy.jf.nababan Cause people are sensitive about the technology they work with; they hate it if it's inferior
@@AntonPutra upsss
@@aliandy.jf.nababan Could you elaborate on this? It seems a prejudice more than something based on facts. My experience is the contrary.
@@jesusruiz4073 You'll get lower latency and more client accesses processed at once if you embed SQLite in your clients' devices. But a library full of user datasets on those devices risks data theft by hackers who target client devices, because user data stored on user devices always connects back to the developer's server that provides the related services. Correct me if I'm wrong.
I run PostgreSQL more often than MySQL (or rather MariaDB), but mostly the client and server are on the same machine. That is a configuration you really shouldn't have ignored, because I expect a significant difference from the case where the database is on a different server.
for sure! i'll consider it as long as i get enough requests
@@AntonPutra well, then I have to second this request :D
+1
Yes, I hope those thinking SQLite is better remember this fact, lol. No one separates Postgres onto another machine and expects local performance.
@@edism damn, I was replacing my database at this exact moment
NOT just because of network latency. Postgres is made for concurrent access, not just sequential queries. If you try to write to SQLite with multiple threads, the locking will degrade its performance dramatically compared to Postgres.
Yep.
But you can use only one thread to write and multiple threads to read in SQLite
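A minimal sketch of that single-writer pattern in Go, assuming the mattn/go-sqlite3 driver in WAL mode (the products table and DSN flags are illustrative):

```go
package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	// WAL mode lets readers proceed while the single writer commits.
	db, err := sql.Open("sqlite3", "file:app.db?_journal_mode=WAL&_busy_timeout=5000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS products (name TEXT)`); err != nil {
		log.Fatal(err)
	}

	writes := make(chan string)
	var wg sync.WaitGroup

	// Single writer goroutine: every INSERT is funneled through one channel,
	// so writes never contend with each other for the write lock.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for name := range writes {
			if _, err := db.Exec(`INSERT INTO products (name) VALUES (?)`, name); err != nil {
				log.Println("write failed:", err)
			}
		}
	}()

	// Readers can run concurrently with the writer under WAL.
	wg.Add(1)
	go func() {
		defer wg.Done()
		var n int
		if err := db.QueryRow(`SELECT count(*) FROM products`).Scan(&n); err == nil {
			log.Println("rows so far:", n)
		}
	}()

	writes <- "widget"
	close(writes)
	wg.Wait()
}
```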
That is a prejudice not based on actual facts. It would be good if Anton made a benchmark to prove it. The current test is good, but I would have preferred to test a "typical" scenario for a Web application (e.g. in Go), without limiting the threads.
The bottleneck is the disk, not the number of writer threads.
You have to be at enormous scale to oversaturate a modern CPU and SSD with writes to SQLite. My laptop does 50k inserts/second. Reads can be parallelized.
@@habba5965 Fully agree, though in my applications I tend to observe an order of magnitude less (but that is still much more than needed in practice).
Glad to see that I am not alone in considering SQLite ready for many production use cases.
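Figures like 50k inserts/second usually assume batching many rows per transaction, so the commit fsync is paid once per batch rather than once per row. A sketch, with a hypothetical items table:

```go
package main

import "database/sql"

// insertBatch wraps many inserts in a single transaction: one fsync per
// batch instead of one per row, which is where numbers like 50k
// inserts/second on a laptop come from.
func insertBatch(db *sql.DB, names []string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare(`INSERT INTO items (name) VALUES (?)`)
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()
	for _, name := range names {
		if _, err := stmt.Exec(name); err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}
```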
Great video. Regarding the spikes: you can see them for both databases, just more or less pronounced. They're likely maintenance tasks such as vacuuming, rebalancing, or checkpointing (in the case of SQLite's WAL).
If you wanted a more apples-to-apples comparison you could bench SQLite servers such as PocketBase, TrailBase (author here), or libsqld (not to be confused with local libsql, which is just SQLite).
On a tangent: given how cheap SQLite operations are, the FFI overhead of Go is actually not insignificant; you would probably get quite a bit more performance from a C/C++/Rust client.
It would be interesting to see the costs incurred while running these tests in a separate video or appended to the end of each video
Yes!
sure, it’s between $10 and $50, depending on the time it takes to run
@@AntonPutra That range is too wide. They are saying that they would like a small video for each test you do. This allows them to compare prices and make decisions.
@@AntonPutra Maybe a separate section every end of video that
Test 1: PostgreSQL running for 2 hours costs 10 USD.
Test 1: MySQL running for 2 hours costs 11 USD.
Test 2: PostgreSQL running for 1 hour and 31 minutes costs 50 USD.
Test 2: MySQL running for 1 hour and 31 minutes costs 49 USD.
@@startappguy Yeah, I think knowing the cost of each technology would be really interesting, but it's not really fair if one is cheaper because its requests failed, so you'd really want to know the cost of the first 100K requests or something like that, for each technology.
Thank you for showing this! I have known about and recommended SQLite for small projects for a long time. Thanks for sharing! Really good comparison to prove this for someone who didn't trust it at all! I think SQLite is a really good solution as a database for small and even medium-sized projects because it simplifies infrastructure and provides awesome performance on modern SSDs.
True, if you consider running your project or a website on the same VM/server, SQLite can work well!
Yes, I agree. Except that SQLite supports up to 1000 tx/second, which is more than a "medium-sized" project needs. In reality, the limitations of SQLite for "normal" projects are not related to performance, but to how the project is structured. For example, you cannot have different applications (even with different business logic) writing to the same instance of the database. Well, you cannot with the "vanilla" version of SQLite. With things like rqlite that limitation disappears (together with most of the performance advantage, of course).
What about the same test, just that the PostgreSQL server and client are installed on the same local server, then compare it with SQLite. Love your content, keep them coming!
There is still going to be latency due to the network stack, even if it's local, I think.
@@kevikiru That is correct. Though it will for sure be faster, since it's now essentially running on localhost, you still have the translation layer of Ethernet frames/TCP/PG protocol. So that will always add time compared to just a function call and a file-open syscall.
@@kevikiru On the same server you can use unix sockets
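A sketch of the difference from a Go client, assuming the lib/pq driver (which treats a filesystem path in host= as a socket directory; the socket path varies by distro):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Loopback TCP: still pays for the TCP/IP stack plus the PG wire protocol.
	tcp, err := sql.Open("postgres", "host=127.0.0.1 port=5432 dbname=app sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer tcp.Close()

	// Unix domain socket: skips TCP entirely; the wire protocol remains.
	sock, err := sql.Open("postgres", "host=/var/run/postgresql dbname=app sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer sock.Close()
}
```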
Yes, that would be a more similar comparison. And also raise the limit on threads, to make the comparison more "real-world". However, I expect to see SQLite beating PostgreSQL (as per my experience). Up to 500 tx/second, and when I can have a single application server, I use SQLite.
Much simpler to operate and maintain.
Just the explanation of the differences between the two tools at the beginning is to the point. Keep up the good work!
Thank you so much for this! I’m one of the (many) who have asked for it. For the haters out there: yes they don’t totally compare, BUT, both can be used to store and manipulate data with SQL and in sooo many situations, especially on small/medium websites, the setup ends up being application and DB running on the same machine.
In these situations (be it MySQL [whatever the flavour: Percona, Maria…], Postgres, or MS) having a client/server configuration just adds complexity, latency, and configuration overhead, and we needed to know about the potential performance!
Without this enlightening test, people would still go choosing based on what’s trendy or heard of “best” in a completely different setup.
true, many personal projects would greatly benefit from using simple SQLite rather than spending time configuring and maintaining traditional databases
@@AntonPutra all the more when you get to see the application is often using the db’s root user…
I store my data in A4 size paper using a printer and to read I use scanner and ocr and that is faster than mongo.
.
.
.
.
.
Just joking I use sqlite ❤️ for my personal projects. btw great video 👍🏻.
thanks :)
I understand why you had to include the last line, some will think it's true 😅
@@AntonPutra Actually, It would be a great test for the 1st of April video 😆
I am always impressed when non-native English speakers give technical presentations. Very good.
thank you!
Use SQLite as often as possible unless you absolutely need the extra features of a full RDBMS.
It’s incredible how much SQLite can handle. It’s even more impressive if you have a use case where it can run in ramdisk :)
Really appreciate the introduction here, very well done
my pleasure!
Suggestion how to improve the test:
Use multiple clients in parallel (slowly increasing them over time), since SQLite will lock and bottleneck on multiple concurrent writes (as per the official SQLite docs).
Let's find the point, in RPS or concurrent client writes, at which a separate client-server DB like Postgres becomes better.
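A rough sketch of such a ramp in Go: each step doubles the number of concurrent writers and measures throughput over a fixed window (the items table and the 64-writer cap are arbitrary):

```go
package main

import (
	"database/sql"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// rampWriters doubles the number of concurrent writers each step and
// reports inserts/sec, to find where the write lock starts to bite.
func rampWriters(db *sql.DB) {
	const window = 10 * time.Second
	for writers := 1; writers <= 64; writers *= 2 {
		var done atomic.Int64
		deadline := time.Now().Add(window)
		var wg sync.WaitGroup
		for i := 0; i < writers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for time.Now().Before(deadline) {
					if _, err := db.Exec(`INSERT INTO items (name) VALUES ('x')`); err == nil {
						done.Add(1)
					}
				}
			}()
		}
		wg.Wait()
		fmt.Printf("%2d writers: %.0f inserts/sec\n", writers, float64(done.Load())/window.Seconds())
	}
}
```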
I enjoy these comparison videos. Thanks for making them.
FYI: I think the unexplained spikes are due to the data re-indexing and re-organisation that happens as more data is inserted.
awesome benchmark! would be interesting to see libsql by turso
I would love to see two more tests when comparing these two:
1. Running Postgres on the same machine as the client
2. Make the application/client multithreaded (maybe using more cores as well), so it can actually issue more read/write requests at the same time. I'd like to see a scenario where the advantage of Postgres being able to execute multiple write requests simultaneously could actually be observed (if that's possible).
I second this request. It will prove that SQLite is better than PostgreSQL in more realistic scenarios. This test was good, but to be more relevant it would be fantastic to prove that SQLite is better in those scenarios.
Don't try to parallelize writes to SQLite; it's not worth it. It will actually be slower than doing them single-threaded in nearly all cases.
@@habba5965 That is not true in the (admittedly limited) tests that I performed 1 or 2 years ago. And it makes sense when you use the WAL. Do you have factual proof of what you say? I think that Anton could just test it by not putting any limit on the threads.
I know that many people are "passing around" that fake fact, coming from remote times before the WAL.
@@jesusruiz4073 WAL helps here but it is still slower. The SQLite docs themselves say that it much prefers single-threaded access, to avoid lock contention.
It's a different scenario if you are using an async runtime in your program, though, since those are usually built to be pretty good at scheduling. In my testing I have found that for Rust, Tokio is actually faster than SQLite itself when doing "parallel" (concurrent) writes.
@@habba5965 Fair enough. I was thinking of using Go (as Anton does many times) as a backend server for a web application, and not putting any restrictions on threads. Normal application developers in Go just access the DB in each incoming HTTP request, letting Go handle the multiplexing of goroutines onto OS threads.
This would be a simple but realistic scenario that application developers can relate to.
And of course, in this scenario I would expect SQLite to win by a big margin, because it is an embedded database (with all the advantages and disadvantages that this may have ...)
Results are very interesting indeed. I may even try SQLite for some parts of projects where it might be faster. It would also be really interesting to compare SQLite with Postgres without network latency, because for small/medium projects it is more common to have the DB on the same server as the application. It might also be interesting to compare connecting to Postgres via the local network vs via a Unix socket. It is unclear to me how much faster Unix sockets are.
SQLite can be placed in RAM, drastically increasing its performance. Read the SQLite docs about mode=memory and preferably cache=shared
Yes, but then you do not have persistence of the data in case of a crash. But I agree with you, in many cases, SQLite is a fantastic in-memory database with a standard (mostly) SQL interface. I have used it like that in many projects for many years.
@jesusruiz4073 you will have persistence if you do periodic backups and then import the latest one on restart. Or you can do an I/O copy of the memory database itself, basically duplicating its contents to persistent storage.
@@tandemwarhead Ok, for some use cases you can trade durability for performance, for example if it is not a disaster to lose some data when the machine crashes between backups. But with the very high performance of SQLite, I tend to have only two extreme cases: 1) full durability (if the app received OK, then the data is on disk); 2) an in-memory SQL database.
What I mean is that for "business-type" applications, the performance of SQLite with full durability is more than enough.
I believe that in memory mode you can't have reads parallel with writes, so, surprisingly, WAL can still often be faster
@@YuriyNasretdinov Yes, memory mode serializes everything (because it is in memory in the same process), but it is still faster than writing to disk. Especially if you sync to disk on every write:
"Write transactions are very fast since they only involve writing the content once (versus twice for rollback-journal transactions) and because the writes are all sequential. Further, syncing the content to the disk is not required, as long as the application is willing to sacrifice durability following a power loss or hard reboot. (Writers sync the WAL on every transaction commit if PRAGMA synchronous is set to FULL but omit this sync if PRAGMA synchronous is set to NORMAL.)"
Insane production, amazing channel. Became a fan after only one video! Keep up the good work!
Can you do Postgres vs MongoDB, please? Gotta end this war.
i really love your videos!
Can you do a single video where you prepare all this infrastructure (Grafana, Prometheus, etc.) and AWS itself? I guess it would be great to see the preparation process.
You asked about the spikes. With WAL SQLite has checkpoints just like PostgreSQL.
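If the spikes really are WAL checkpoints, they can be moved off the hot path by disabling automatic checkpointing and running it on a timer. A sketch in Go (the PRAGMA names are from the SQLite docs; the 30-second interval is arbitrary, and errors are elided for brevity):

```go
package main

import (
	"database/sql"
	"time"
)

// scheduleCheckpoints turns off SQLite's automatic WAL checkpointing and
// runs it on a timer instead, so foreground commits don't absorb the cost.
func scheduleCheckpoints(db *sql.DB) {
	db.Exec(`PRAGMA wal_autocheckpoint = 0;`)
	go func() {
		for range time.Tick(30 * time.Second) {
			// TRUNCATE writes the WAL back into the main DB and resets the log.
			db.Exec(`PRAGMA wal_checkpoint(TRUNCATE);`)
		}
	}()
}
```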
I was just searching for Postgres vs SQLite benchmark 2 days ago. Nice breakdown, keep things up!
wow, that was sweet. Had to watch with popcorn on big screen. Thanks for the detailed test, buddy ❤
Thank you so much for taking up this comparison :)! Also a very good introduction and explanation of the differences.
Personally I'd be more interested in seeing this comparison with Postgres on only one single machine, but this was already enough for me to move from Postgres to SQLite for our next project. Previously I didn't trust SQLite enough, since a lot of information used to say that SQLite isn't for production, but I love working with it (we develop enterprise software, which runs on a single server. Currently we use Postgres, since "nobody gets fired for choosing Postgres", but SQLite would offer a lot of benefits for our use-cases, and the performance would apparently be more than enough). So thanks again, I found this highly valuable!
If you have less than 500 tx/second in your application, SQLite is better than PostgreSQL for many types of applications. And do not listen to the whiners complaining about the limitation of one writer: the bottleneck is the disk speed for most applications (like "normal" web applications).
However, the main limitation is architectural: with "vanilla" SQLite, the database belongs to the application. If the application exposes APIs, this is not a big deal, but you should look at these things before you make a decision.
Same here. People always tell me "do not use SQLite for prod". I did my own tests and started using it in all projects, including e-commerce sites with large datasets, with lots of writes and reads. It's incredible how well SQLite performs. So don't underestimate the potential of SQLite.
Please always mention which version of what you are using :)
Postgres v17 was just released in September 2024 and, according to some blogs, is an immense step forward in performance compared to PG 16.
Also which version of SQLite3 are you using?
Thanks for the Video ^^
It does not matter. The difference is in the network roundtrip when using PostgreSQL (or any network database). For SQLite, the bottleneck is the disk, which is typically much faster than the network.
JVM vs .NET vs Go! Battle of the titans
Vs. Rust...
@@kingo55
It's comparing languages with gc
Rust doesn't...
well its not gonna be jvm
These are the best and most useful videos nowadays... congrats, Anton.
Always so interesting to see your benchmarks!
Thank you for that!
I think it would be very interesting to see how DuckDB behaves compared to SQLite at larger DB sizes, since DuckDB is meant for analytics.
Cool test! Could you also try communicating with Postgres not over the network but via a Unix socket?
yes i will consider it
Again, thanks for the videos. Really helpful. I only use SQLite on local devices, especially mobile, and PostgreSQL most of the time on servers.
Suggestion:
Maybe you can add a separate section every end of video that talks about the pricing or total cost incurred during the test? Thank you!
Example:
Test 1: PostgreSQL running for 2 hours costs 10 USD.
Test 1: MySQL running for 2 hours costs 11 USD.
Test 2: PostgreSQL running for 1 hour and 31 minutes costs 50 USD.
Test 2: MySQL running for 1 hour and 31 minutes costs 49 USD.
This is a great video. Happy Diwali from New York !
Thank you for providing this community-requested benchmark.
Much appreciated.
my pleasure!
I love your analysis videos; they give so much insight into systems performance, and I think this kind of material should be taught at university.
A couple of comments:
1. Why is delete operation latency closer to insert than update? Deletes are usually combined with a WHERE clause, so I would expect a table scan of some type.
2. I think showing the measured metrics alongside the data size in the DB (or some related value) would be very useful for picturing how much "data" it takes to reach certain behavior.
3. Would using Docker containers for the Postgres DB and client on the same VM be more like running on the same server, or like different VMs as in your test? My intuition is that it would be faster, but then again, VMs are usually on the same physical server.
Thanks for the upload, I was one of the people asking!
The tests completely proved the points I made in the previous video's comments, which some users didn't understand.
Long live SqueelLite!
haha yes
I really would have liked it if this had included a third run with Postgres co-located with the client. Either way, great video as always.
What is the DB size at the end? Would anything change if we started with a prefilled DB of a few GB?
it's quite small, under a gig
The spikes on Postgres are checkpoint + autovacuum, which flush buffers to disk, reindex, and update table stats.
Yes, I agree. But you meant "SQLite", instead of "postgres", right?
makes sense, thanks
Thanks for doing these benchmark comparisons. I have a few comments for improvement
In the first test, you should have reads as well. You could, for example, select from the product table as if you wanted to check availability.
In the second test, you should fix the join types: do an INNER JOIN on the customer table, since there will always be a customer for an order. It probably won't change the results, because the customer table is small and you select a single order. Maybe you could select a range of orders instead? (See the sketch below.)
You should definitely start with a large database. Any database engine can be fast with very little data. You should have thousands of customers and hundreds of products, plus thousands or millions of orders. Ideally, it would be nice to test a database larger than memory. You can generate CSV files; some database engines can quickly load a CSV so you don't waste hours. Or, for example with SQLite, you could initialize the DB at home and upload the database file to Amazon.
Like others said, use multithreading. Let SQLite lock the DB for writes. The issue with your test is that some people will think SQLite is better in any use case, which I'm sure it is not.
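For the join-type point above, a sketch of the suggested query shape (table and column names are hypothetical stand-ins for the video's schema):

```go
package main

// Every order has a customer, so an INNER JOIN states the intent directly
// and frees the planner from preserving unmatched rows, unlike a LEFT JOIN.
// Selecting a range of orders also exercises the join harder than a
// single-row lookup.
const orderQuery = `
SELECT o.id, o.created_at, c.name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.id BETWEEN ? AND ?;`
```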
You could use Turso instead of raw SQLite
That could be interesting
Just saw in the SQLite documentation that WAL by default is not durable. From the documentation:
"Further, syncing the content to the disk is not required, as long as the application is willing to sacrifice durability following a power loss or hard reboot."
This means it is not fair to compare in this mode with other durable RDBMS.
You should set PRAGMA synchronous to FULL if you test SQLite again.
Yeah, I know there are some limitations with WAL, but in the first place, it's hard to compare them.
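A sketch of what flipping that durability knob looks like from Go (assuming the mattn/go-sqlite3 driver; the PRAGMA names are from the SQLite docs):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "file:app.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// WAL keeps readers concurrent with the writer; synchronous=FULL restores
	// an fsync on every commit, matching the durability a traditional RDBMS
	// provides by default. NORMAL skips that sync and can lose the most
	// recent commits on power loss, as the quoted docs describe.
	for _, pragma := range []string{
		`PRAGMA journal_mode=WAL;`,
		`PRAGMA synchronous=FULL;`,
	} {
		if _, err := db.Exec(pragma); err != nil {
			log.Fatal(err)
		}
	}
}
```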
I wonder how the performance compares with HDD latencies (HDDs are sometimes still used for DBs due to reliability, but obviously not on AWS). Also, I wonder if it would be fairer to Postgres to test using a UNIX socket instead of AF_INET. The spikes are probably AWS; AWS has a bad reputation with some SQL admins I've worked with.
Many people I've worked with are unaware of Unix sockets; thanks for mentioning them here.
yes, if enough people ask me to rerun this test and colocate Postgres and a client on the same VM, I'll use a Unix socket for the client
The spikes can also be caused by the vacuum process. It's Postgres's garbage-collection process, which uses quite a lot of resources and acquires locks in the process. But if you know any other reason that could cause the same spikes, please let me know, because we see them in our projects as well.
@@LukaszChelmicki it's time-sharing in the KVM hypervisor. Hyper-V does it too; I don't mean to single out AWS. I/O is the biggest loser when virtualizing.
Thanks so much for this. Is it possible to do Postgres vs MongoDB and Postgres vs Cassandra/ScyllaDB?
yes, i'll do mongo soon as well
@@AntonPutra That's great! Thank you
What a champion. Thanks for the amazing content.
Do you run these benchmarks locally or in the cloud? Is it possible for us to reproduce and possibly extend the benchmarks?
Also, what is your favorite tool for creating diagrams and animations in your videos? They are very nice!!!
I love the philosophy of SQLite. IMO this is the best DB for new projects.
Thank you for this test. These two are so vastly different but comparing apples and Elephants is valid if you are comparing their vitamin C content. I hadn't considered disk IO cost as a possible factor.
Great test. What charting library did you use for app?
Great video again, what do you use for the animated architecture diagrams?
Using docker compose to add a Postgres database to the same virtual machine that the web server is running on is common (like with FastAPI server.) You could compare a docker FastAPI server using SQLite vs a docker compose version that adds a postgres container. Many times people do not want to pay for a separate VM for a database.
This is the situation where SQLite vs Postgres is more common, because you're in maximum cost-savings mode and very concerned about the extra resources Postgres needs vs what it offers.
A test to determine how SQLite performance degrades with increasing file size and record count would be quite insightful.
I would still choose Postgres because it can store Arrays natively without doing a relation with other tables which has nothing to do with benchmarks but it is a DX win
When designing tests you should consider the number of “independent variables”, which is the number of “things” you’re changing between two tests.
In this video, you have two: Postgres vs SQLite, and local vs network access.
This means you can’t know how much of the dependent variable (outcome) is because of which independent variable.
IMO you should have done two separate tests: one with just Postgres, both remote and local,
And another with both Postgres and sqlite, both local.
Don't get me wrong, I loved this and all the other videos! Keep it up :)
6:43 It's possible by interleaving work among workers. SQLite doesn't do it for users, let alone automatically.
SQLite has huge potential to adapt to different scenarios, and can even be wrapped as different NewSQL DBs.
Hi! Thanks for the benchmarks. It would be interesting to compare performance at database sizes that don't fit in memory. I didn't catch from the video whether Postgres was tuned; its default settings are meant to let it run on a kettle =)
I think the Postgres spikes are related either to the checkpointer or to vacuum.
Hi! Do you plan on benchmarking database conn poolers such as pgCat, pgBouncer, and odyssey in the future?
This is what I needed. But it would be better to see a few more scenarios: both apps on a single instance, with a fine-tuned connection pool for Postgres.
Good video. DB comparisons should also include the DB size for the same data.
Should have included local Postgres listening on a UNIX socket. Also, please do a test with a few more CPU cores and more concurrency. SQLite should do extremely well on concurrent reads, a bit less so on concurrent writes, but on 2 CPU cores I don't believe Postgres would pull ahead, since SQLite does writes significantly faster than Postgres in general.
Lately, I start projects on SQLite and only upgrade to Postgres if I want to containerize it or use some fancy DB feature.
Thank you for this, very informative. Could you compare postgres with libsql?
SQLite is capable of handling writes concurrently with reads when WAL is enabled
2:42 We can use low-level tools to write files or modify disks in parallel... they should persist. It's not clear what you mean by "safe".
What about libsql instead of sqlite?
i think it's worth comparing. let's see what people think
I'd love to see SQL Server vs PostgreSQL
ok noted!
Awesome content. Thanks a lot.
Yugabyte is worth considering... it has the most scalability, postgres compatibility, and NO VACUUM!!
If the DB is not large, you can place the SQLite file in /dev/shm/ and get a 50x+ performance boost 😀
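A sketch of that trick, assuming Linux where /dev/shm is a RAM-backed tmpfs (everything there is lost on reboot, so copy the file to real disk if the data matters):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	// The database file lives in tmpfs (RAM), so reads and writes skip
	// the disk entirely; durability across reboots is gone.
	db, err := sql.Open("sqlite3", "file:/dev/shm/app.db?_journal_mode=WAL")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```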
SQLite with PocketBase is all you need for your side projects, believe me ❤
Could you create a benchmark video comparing the performance of ClickHouse with its competitors like PostgreSQL, MySQL, and others ?
It would be handy to have a graph of the data size, either in rows or GB, so we can know when a database starts to degrade too much.
I didn’t notice how you configured your Postgres cluster. Just the defaults?
Maybe Dqlite against PostgreSQL? It's just SQLite with Raft-based replication for clustered apps.
interesting, i'll take a look!
Hi! Can you please add Rust's SurrealDB to the comparison?
It also has a Go client
yes it's in my list
Really wish you hadn't brushed off the Postgres advantage with concurrent users; that's the question. It's not "if we use Postgres wrong so it fits the model of SQLite, how do they compare?" but "using both correctly, how do they compare?". You should really do a test with concurrent users emulating normal users of a website and see how far you can push them. I'm firmly in the camp that SQLite is more than good enough for most things, and yes, that includes concurrent users, because the filesystem and the lack of overhead are fast enough that the single writer really doesn't matter until you scale to ridiculous numbers. But it would be nice to see exactly where that point falls, and also the resource usage/costs of each.
That's cool, but back in the day when I was working on my first project, there was an old network engineer who said the following:
if your DB is on a public network, you must be fired.
Still and always the truth.
That is a big truth. But I don't know why it is relevant to the SQLite/PostgreSQL comparison. Unless that network engineer assumes that people using SQLite are brainless, of course. Also, it may be that the network engineer does not really know how to secure things, and is just guessing.
Thanks, this confirms my experience: for Web applications, up to 1000 write tx/second, SQLite blows PostgreSQL away. This is one of the many "sweet spots" for SQLite. By the way, I would like to know who in the comments has an application with more than 1000 writes/second in production.
When you reach the vertical scalability of Web applications with SQLite, then you need to scale horizontally. Until now, it was difficult to do with SQLite, but at this moment it is not a problem.
Of course, people will continue using PostgreSQL for those scenarios where SQLite is better, mainly because it is something they know. And for some things, "better the devil you know than the angel you do not know".
13:14 "Risky" and "unreliable" are bad words when talking about such things. What exactly do you mean by this?
Pretty sure SQLite allows concurrent reads. You could test that against Postgres.
Can you please tell us how to configure this kind of server, and how I can implement it for testing?
Needs Postgres on the same server to be fair.
No, what we need is a SQLite with network support hahaha
@navossoc lol
@@navossoc Yes, it is like breaking the leg of an athlete who is faster than you'd wish: it is the only way to beat him.
Thank you for doing these tests, however, this is not relevant.
You should test SQLite against plain file access for the single client and single core example.
Run one of the larger database test frameworks to test 100s or 1000s of clients on computers with 10s of cores and 10s of GB RAM. You will see that SQLite deteriorates once you need advanced functionality and concurrent data access.
Your scenario is not a good fit for Postgres.
SQLite vs RocksDB, both are embedded databases.
Those spikes could be related to WAL checkpointing.
i’d like to see duckdb
There's never been a reason to use anything other than sqlite 😎
Any chance of comparing PostgreSQL to SqlServer?
very high chance! 😊
10:02 The insert latency spikes for SQLite should be the periods when the engine reconciles the WAL (write-ahead log) into the DB. For Postgres, I dunno.
For projects with ~50-100 users, SQLite is the best option.
Thanks, Anton! Your videos help with difficult choices.
Python web frameworks
FastAPI vs Flask vs Django vs Robyn
Try to use a unix domain socket to connect to Postgres on the same server
What was the data size after which the requests per second dropped?
7:03 what have i seen right now?! TABS instead of spaces?!!!! UNFORGIVABLE!11