Notion hit 100 million users recently, so I wanted to do a quick overview of their database evolution. Hope you get to learn something from this. Thank you again for taking the time to watch this video and for your continued support!
What tools do you use to make your videos? I love them!
This video has made me realise how much of a nightmare it must be to scale up a database in production. But Notion is insanely fast now so it obviously paid off well.
hahah definitely!!
In which world is Notion fast?
It always takes a notable amount of time just to load my shopping list
Notion is many things, but not fast 🤣
I'm going through this nightmare at my company right now. In our case we're gonna migrate to TiDB instead of sharding
@@Ergydion I find the initial load can take a second or two, but making edits is basically instant
What do you want to shard?
Notion engineers: YES
Coming up with this solution is tough for sure, but the real challenge is orchestrating all the teams and people involved in this. That job is incredible, and I bet there were a few key people who managed all this and had to do a lot of overtime to achieve it, especially when critical errors & bugs popped up
Exactly what I was thinking... I was reminiscing about what my company went through when we converted from a monolith to a microservices architecture, but this... this is something you can't do without investor money, the literal best talent, and some of the best management in the world.
Whoever these engineers and project managers are should be incredibly proud.
Also, can you imagine being a new backend dev or database guy at Notion :D:D:D:D:D:D
We recently did a DB upscale with around 12 TB of data, which is just a fraction of what Notion did, and it was already a Herculean task for us. It took us weeks of planning and work to make it a success. Working with data is one of the most challenging things in IT
bro casually dropped 1mil+ youtuber level content
hahaha that's so nice of you to say!!
I was thinking "great comment", and it turns out the author is Russian too
insanely underrated channel, you're gonna be huge
hahah thank you!! Just want to make videos that are educational and fun to watch :)
Agreed. I wish there was more of this type of content. In-depth, real problem solving.
Your videos are so short and clean. Even though I am just a recent grad I get a lot of value from these vids. Also didn't realize you could scale so much with PostgreSQL
yay I'm so glad! As long as you can learn something new I'm happy!! Are you currently job hunting or already working?
@@kikisbytes
I am job hunting. 😂Help me get a job
Good god, I feel tired just going through this. I can't even imagine the stress on the DBAs and system architects at Notion
This is crazy good content dude! You will be at 1+ million views in no time
awhhh thank you, I appreciate that!! 😭
Engineering team at notion did a fantastic job !
for sure!
Yeah, they did an amazing job hiring young freelancers and underpaying them by a factor of 2/3.
@@Flocksta yeah that is very true !
i totally agree with you.
Talent is used to the maximum, but compensation is kept to a minimum to improve the profit margins. Sad reality!
Great video!! Loved this level of detail along with the animations. This is a differentiating factor from many other videos on such topics that don't go into detail but cover such topics at a very high level. You could link to explanations of some of the concepts mentioned for understanding but continue keeping this level of detail as that is what makes it great in the first place!
These videos are always so good, always happy to see when a new one is posted :)
Awhh thank you so much for your support! I truly appreciate that!
nice English subtitles, wow. you deserve a like!
Thank you!
This is great PR for Notion. I loved Notion when it arrived, went all in, then it slowed to a painful pace so I jumped to Obsidian.... This has got me buzzed to come back to Notion! Great video
Watched a couple vids and they're wicked! Love the newer videos you've been uploading!
Thank you!!!
I'm not smart enough to be here.
Bro for real I'm gonna shard myself in a minute
@@kratosgodofwar777 "Go shard yourself" might be the most CS insult ever
yeah same, i'm just nodding the entire time like i know what i'm watching
What an amazing video, production quality at its highest level. 😁
so educational and entertaining at the same time!! i know nothing about systems but the video was so well-paced and funny I kept watching
Thank you Tokuyuu I'm going to cry now😭 Awaiting your next release!
Great video! Just want to appreciate your videos as no one else does good summaries of engineering blogs or writeups, and I appreciate the lack of dilution of the concepts since there's just way too much content catered to beginners and not enough of more mid-level content like yours (digestible, consumable summaries of interesting solution architecture writeups) out there on YouTube.
Thank you for letting me know! It’s definitely a goal to make videos for people with experience. I was also worried that people wouldn’t be able to follow. But I’m glad that intermediate folks are okay with the pace
Amazing Video! I'll have to rewatch this over and over to understand it more.
Thank you for the Heavenly Path cameo!
Great video. Great topic. Adapting your infrastructure to your customer growth is one of the hardest things to do. Sooo many constraints. Great job, Notion!
Great animations! Don't stop this
ty ty glad you enjoyed this video!
A company I worked for faced similar issues during Covid. We were IOPS heavy and relied on SMB, plus nested Windows folders on top of that. It was fixed using a technique similar to what Notion did here.
Heh the profile Pic explains why it's so
I think Notion is still pretty slow for a majorly text-oriented application. I mean yes it does support non-text objects, but it's majorly text-based, and it's as slow as OneNote sometimes. Should text really take that long to load? Idk
Go, Obsidian, go!!!
@@wz3xn9os3s Obsidian is a local application that works with files, while Notion is a shared application that works with databases shared among millions of users.
It's slow not because there is a lot of text, but because they have a lot of abstractions and services that they have to query to get your data into a presentable format. Just like in any other big company app, making many requests to many things at once seems like a fine approach. This is probably so that large teams can work independently. I remember a DoorDash developer interview that said they have around 500 microservices, which is a bit too much for me. A good, performant alternative to Notion is MediaWiki. Its design is "old-school" and it runs very quickly.
if you value time you use notion else use obsidian
@@veryCreativeName0001-zv1ir lol that's the stupidest comparison between Notion and Obsidian. I have been using Obsidian heavily for more than a year, and I can't be moved to any other platform.
Wouldn't it be easier to use a No-SQL database like Cassandra?
Cassandra already manages all the logic to distribute the data in partitions.
It also distributes the data into the different nodes and by its nature it scales horizontally.
that's exactly what i suggested
or easier, they could use YugabyteDB or CockroachDB; they are almost 100% Postgres compatible and scale horizontally by automatically sharding the data
My guess is that in their core product they are relying heavily on some SQL features that they couldn’t afford to lose, and that’s why they chose extreme sharding over NoSQL
Their data is relational, why would they use non-relational database?
@@alexander_farkas you are right. Why would someone want to use a hammer to drive a nail if they already have a drill? 😂
Very interesting video with some cool networking and ideas related to breaking up problems relating to their data structures
as a newbie Sol Archi. my brain hurts lmao
Haha dw some day it’s gonna make sense 😉
Jokes aside how was your transition to solution architect?
Just continually sharding their DB across more and more machines seems like a linear solution to their exponential user growth. Isn't there something they can change in their architecture to avoid needing 96 separate DB instances? That is sort of ridiculous.
My thought too. I suspect they could make the application much smarter by putting in-progress work into a non-sql database to avoid frequent writes to postgres. Also, one row for each text block seems over normalized. End armchair analysis.
Their team is big (it says they have around 500 employees in total), with probably around 200 working on different parts of the app. Most of them probably fall into a "this is not my job" or "I don't have enough power to say" type of situation, and they keep patching.
@@KenSnyder1 seems like it would just shift the problem to another system. OK, your pgsql isn't getting hammered with writes, but now your redis, mongodb, etc. is and then it's still going to push all that data to pgsql anyway and also you have to pull down from both pgsql for committed data and then reconcile that with uncommitted data in your intermediate store in order to get consistency for the user.
Users also tend to notice read delays more than write delays, unless the write delay is substantial or the write fails catastrophically.
Besides which, this video is narrowly focused on how they fixed specifically a database problem. We don't know if they already had other performance solutions in place such as caching unchanged blocks or whole documents to avoid database reads.
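To make the reconciliation point above concrete, here is a minimal sketch of what a read would have to do if uncommitted edits lived in an intermediate store. Everything here is an assumption for illustration: the function name `read_document`, the block IDs, and the use of plain dictionaries in place of Postgres and a NoSQL buffer. It is not how Notion does it.

```python
from typing import Dict


def read_document(committed: Dict[str, str], pending: Dict[str, str]) -> Dict[str, str]:
    """Return what the user should see: committed rows overlaid with pending edits.

    `committed` stands in for rows already in Postgres, `pending` for edits
    still sitting in a hypothetical intermediate store.
    """
    merged = dict(committed)   # copy so the committed data is not mutated
    merged.update(pending)     # pending edits win over committed rows
    return merged


# Hypothetical data: one block edited and one block added but not yet flushed.
committed_blocks = {"b1": "Hello", "b2": "World"}
pending_edits = {"b2": "World!", "b3": "New block"}
view = read_document(committed_blocks, pending_edits)
```

Even in this toy form, every read touches two stores, which is the extra cost the comment is describing.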
Yeah, it kind of seems like they should've stuck with writing to a NoSQL database like Dynamo and streamlined everything to be stored in the Postgres database, maybe. Maybe they didn't do it because Notion needs immediate reads after writing data as events, but that would probably be faster using Kafka. But who am I to say this is the best solution. That's not easy at all lmao.
@@JoãoLinharesGomes One of their goals was also to reduce cost. Introducing Dynamo to such a large model would certainly not do that :D
Congrats for the content Kiki!
Thank you!
awesome information so in-depth, would be great if you could explain the research that went behind learning about how they did it and why they did it! insane video 💯
Thank you for the feedback!! Yeah I definitely cut down some details to try to fit within the time limit but will keep that in mind for the future
Noting to never interview for Notion XD.. But jokes aside, it's a huge effort collaborating with the team all the while maintaining the development of such a feature.. kudos to the team
That was an awesome explanation, I almost understood some of it!
Not your fault though, I'm not the brightest
Thank you for watching and please let me know how I can improve to make it even easier to understand!
@@kikisbytes I personally think this video was perfectly paced and is the right length of time for what it covered. You obviously need some background in the concepts to understand them, so making it easier to understand would be to actually teach the concepts / technologies as well which would be an entirely different video, in my opinion.
beautifully illustrated
A document-based database seems like the best data architecture here. Notion is very document-centric. Having 1 document as a doc in the DB makes so much sense. Sharding and clustering would be a lot easier because the relationships between documents would be minimized.
I guess they had the wrong architecture in the first place and it's too hard to change in the middle of exponential growth.
How did you make this video? Was it all AFX from scratch, or something like Prezi?
Awesome video!
How do you make such awesome animations?
The things that come to mind when I see this: replication and upgrades. Good luck Notion!
Awesome content! What did you use for that animation? Very smooth.
Great video. Very nicely explained. Which software do you use to create these kind of animated videos ?
Having a record for each block of the document is crazy, I wonder what was the reason behind this decision.
You deserve more subscribers.
Imagine being the new guy on the DB team at Notion...
thank you for this good explanation
How are these animations made, if you don't mind sharing? They are glorious! :) Is it Motion Canvas??
This video overwhelms me🤯
Awesome! Make more videos explaining this stuff
Thank you, will do for sure!!
Hi Kiki. I enjoyed this video. In the future try to slow down a little during presentation & graphics for a better learning experience.
This shows why it was better to use a distributed DB in the first place. Cassandra, DynamoDB...
Exactly, would be interesting to calculate the technical debt due to Postgres in that case vs using a distributed solution
"in the first place", oh wow we got a genius over here.
This is amazing!
Really nice! How do you edit your videos?
awesome video dude, thanks for this great video
Thank you for taking the time to watch this video!
Great video. To the point without any zigzag, but the audio does not feel natural.
Thank you for the feedback. I'm still trying to figure out the audio, so please bear with me while I get the right settings :)
tldw: sharding + better connection pooling + pub-sub based migration
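For anyone wondering what the "sharding" part of that summary looks like in code, here is a minimal sketch of hash-based shard routing. The 96 instances figure comes from the comments here. Everything else is an assumption for illustration: the key format, the count of 480 logical shards, and the function names `logical_shard` and `instance_for`.

```python
import hashlib

NUM_INSTANCES = 96        # number of DB instances mentioned in the comments
NUM_LOGICAL_SHARDS = 480  # hypothetical; chosen only because it divides evenly by 96


def logical_shard(key: str) -> int:
    """Map a partition key (hypothetically a workspace ID) to a logical shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def instance_for(key: str) -> int:
    """Map a partition key to the physical instance hosting its logical shard."""
    shards_per_instance = NUM_LOGICAL_SHARDS // NUM_INSTANCES
    return logical_shard(key) // shards_per_instance


# The same key always lands on the same instance.
target = instance_for("workspace-123")
```

The point of the extra logical-shard layer in this sketch is that a later reshard could move whole logical shards between machines without rehashing every key.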
Really enjoying your videos, keep them up!
Imagine going to the meeting with stakeholders and explaining to them why the bill jumped 400% in one month.
db migrations are always painful, great to see they had a solution
Thanks for the video
Amazing job
amazing engineering and a great video explaining it all, just wondering why you would be happy with ~20% cpu utilization during peak hours, sorry if it sounds like a noob question but i genuinely don't get it
CPU utilization is the amount of the CPU that the application is using up. So high CPU utilization is a bad thing. CPU utilization at 100% means your application is taking up all the computational power of the CPU, which is bad because now no other programs can run
@@sakamad4856 i assume notion would be running their dbs on dedicated servers? i get why 100% would be bad, but 20 seems too low lol
I think what they're saying is that what used to be 90-100%+ utilization is now 20%, not that 20% is some magical number they landed on
This is a good question! I used to work on a team where our Postgres instance was nearing and sometimes hitting full utilization. This was scary because we were running some critical services and our DB performance was so bad that our queries were slow to the point where requests were being dropped. So I can see why Notion was happy that it dropped to ~20% and that they no longer have to deal with these types of issues. On the plus side, it gives room for future growth that they won't have to worry about for a while.
They allowed the utilization to go down because of the optimization they did. Keeping your utilization high can be dangerous because peak usage can cause bottlenecks and even cascading failures from time constraints. I had a project that was using 10% for 22 hours, but for the other 2 hours it was taking 80% CPU. It is always better to have more space than you need. Plus, at the scale they are operating at, the cost and wastefulness do not really matter.
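A rough way to put a number on that headroom is sketched below. It assumes CPU use scales linearly with traffic, which real databases only approximate, and the 80% ceiling is a hypothetical comfort limit, not a figure from the video. The function name `growth_headroom` is made up for this example.

```python
def growth_headroom(current_peak: float, safe_ceiling: float) -> float:
    """Return how many times peak load could grow before reaching the ceiling.

    Both arguments are utilization fractions in (0, 1]. Assumes utilization
    grows linearly with load, which is a simplification.
    """
    if not 0 < current_peak <= 1 or not 0 < safe_ceiling <= 1:
        raise ValueError("utilization values must be in the range (0, 1]")
    return safe_ceiling / current_peak


# ~20% at peak (from the comments) against a hypothetical 80% ceiling.
factor = growth_headroom(0.20, 0.80)
```

Under those assumptions, peak traffic could grow by about the returned factor before the instances get uncomfortable again.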
What did I just listen to at 4 in the morning
Great channel 🎉
Another top level video
Thank you so much!
Hello, that's amazing content!!! Keep going and you will become a 10M channel soon!!!
what do you use for animations?
If their user base explodes again, they would need to do the same thing again, right? Is this the industry standard for scaling a database, or are they just stuck with this tech? I feel like this kind of scaling will hit a wall sometime soon
Next level of DB scalability is decentralized storage solutions.
Amazing video!
Ken!!! Omg thank you for taking the time to watch this video!!
Were those inconsistent size blocks within blocks within blocks stored out in the wild instead of belonging to a specific user?
Also, having an id for each and every action must be a nightmare especially since they didn't do ULIDs.
Great video and channel overall!
Just some feedback: I found the voice-over speed a bit too fast for educational content like this, which made it challenging to fully absorb all the information.
Slowing down the player to 0.75x speed makes it too slow and isn't a practical solution.
Perhaps a slight reduction in the speaking pace would enhance the learning experience.
Hope this helps with finding the right pacing.
Keep up the great work, you've just gained a new subscriber! 🤩
Edit: I would say the current speed feels like it's at 1.05x when it should be at 1.00x, just a touch too fast.
nah perfect for me
Thank you for the feedback!!! This is noted and I will try to make the pacing better for the next video.
I watched this at 2x like most content and I considered reducing the speed to 1.5x but ultimately wasn’t necessary
It was good enough speed
I could not disagree more. English is not even my native language and I had no trouble following all the content at 1x
Banger
Great video thanks
Notion is everything but fast. That's 4 sure. Amazing video anyways🎉 thanks 4 sharing
Thank you for watching!!
Next video, I wanna know how Kiki's Bytes channel scaled to 1M subscribers without exploding
hahaha that made me laugh so hard 😂 . One can dream 😜
0:40 Oh, the friend you mentioned, did he also make this video?🐶
maybe... 😛
Awesome in-depth video. As stated in some other feedback comment, it might be a bit overwhelming for beginners or people with a non-expert level of tech understanding (who are the majority of the target audience on YouTube).
You could maybe incorporate some short explanations of each concept (shard, PgBouncer, etc.). People who are interested in learning that concept can always go to a more detailed in-depth video (you can also route them to your topic-related videos if available)
More power to you and good luck! Subscribed
I disagree, it's nice to see a channel just tell an animated story like an engineering blog without watering everything down to a tutorial like every other channel
how do you do your animations?
It seems to me that they have overengineered their architecture and are solving problems the hard way, because they are smart enough to do it. KISS.
How do you edit your videos?
Resharded resharding :)
same thought when I was doing the research 🤣
Notion bought out my email service, shut it down (they only properly warned us a week before they shut it down!), and I will never forgive them
Except not forgiving them, what else are you going to do?
@@87hb775yggg he will hack your e-dildos in Notion HQ
Same, not sure why anyone would ever want to use their product. I swear half the Notion users spend more time taking notes than actually using them
@@hepticftw that’s the point
Currently building product with postgres this gives me nightmares 😢
Damn, it was a marathon ❤
How do you take backups and restore in the case of sharding?
The reason Postgres goes into read-only mode when transaction IDs are exhausted is that read-only operations do not consume a transaction ID; only writes are assigned one. Hence reads can still go on without needing new transaction IDs.
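For context on why transaction IDs can run out at all: PostgreSQL transaction IDs are 32-bit counters compared with modulo-2^32 arithmetic, so each ID sees roughly two billion IDs as "older" and two billion as "newer". Below is a minimal Python sketch of that circular comparison. The function name `xid_precedes` is made up for this example, and the sketch ignores the handful of reserved special IDs that Postgres treats differently.

```python
XID_BITS = 32
XID_MOD = 1 << XID_BITS  # IDs wrap around after 2**32 values


def xid_precedes(a: int, b: int) -> bool:
    """Return True if transaction ID `a` is logically older than `b`.

    The difference is computed modulo 2**32 and then read as a signed
    32-bit number, so "older" means "within ~2 billion IDs behind".
    """
    diff = (a - b) % XID_MOD
    if diff >= XID_MOD // 2:   # reinterpret as a negative signed 32-bit value
        diff -= XID_MOD
    return diff < 0


# An ID issued just before the counter wrapped is still older than a small ID.
older_after_wrap = xid_precedes(XID_MOD - 5, 10)
```

Once old rows fall more than about two billion IDs behind, this comparison would flip and make them look like they come from the future, which is why Postgres stops handing out new IDs before that point unless old rows have been frozen by vacuum.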
I’m curious if any in-memory caching was considered or also used on this expansion odyssey. Not every read needs to go to the database.
These are the kinds of problems I’d kill my business to have😂
Brah, they need to start looking into NoSQL databases like Scylla or Cassandra if their volume is this high.
exactly
I wonder why they did not use a document database from the get go
So what you're saying is that they should've started with Cassandra.
96 cpu still overwhelmed? w00t?
It's still horribly slow and if you have bigger tables with connections to others it's unusable. Notion is just a passing trend that maybe shouldn't have happened.
timeline and team size would be nice to know
They could have used a JSON-type database instead.
Why didn't they use a database layer like Redis for caching?😊
It's quite possible they wouldn't see enough benefit from caching to justify using it. There might not be enough people sharing the same documents to see much performance improvement, and every time someone made an edit to a document the cache would need to be updated.
I suppose it depends on how they check for updates, etc.
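The trade-off described above can be sketched with a tiny read-through cache that has to be invalidated on every edit. This is an illustration under stated assumptions only: `DocumentCache`, its methods, and the dictionary standing in for the database are all hypothetical, and nothing here reflects what Notion actually runs.

```python
from typing import Callable, Dict


class DocumentCache:
    """Read-through cache: serve from memory, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load = load_from_db
        self._cache: Dict[str, str] = {}
        self.db_reads = 0  # counts how often the database was actually hit

    def get(self, doc_id: str) -> str:
        if doc_id not in self._cache:
            self.db_reads += 1
            self._cache[doc_id] = self._load(doc_id)
        return self._cache[doc_id]

    def invalidate(self, doc_id: str) -> None:
        """Must be called on every edit, otherwise readers see stale content."""
        self._cache.pop(doc_id, None)


# A plain dict stands in for the database in this sketch.
store = {"doc1": "v1"}
cache = DocumentCache(store.__getitem__)
cache.get("doc1")         # miss: reads the "database"
cache.get("doc1")         # hit: served from memory
store["doc1"] = "v2"      # someone edits the document
cache.invalidate("doc1")  # the edit forces an invalidation
latest = cache.get("doc1")
```

If documents are edited about as often as they are read, most reads end up as misses and the cache saves very little, which is the scenario the comment is pointing at.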
Relatively easy to scale when your customers don't interact
why not just use cockroachdb instead of manually sharding
that haaaa my friend has got me bro :D
how do you know?
Can and do they do backups?
Is this the same as DB normalisation?