Beautiful explanation. I love the animations. As someone who's implemented a few persistent B-trees, allow me to point out a few more details: * This is the classic original B-tree. However, most databases use a B+tree, which is different in that the values are stored only in the leaves; keys in upper nodes just point to lower nodes. When a node splits, you don’t move the middle value up, it stays in one leaf or the other. * B-trees I’ve looked at, like SQLite, don’t have a fixed number of keys in a node. In real usage, keys and/or values are variable size, like strings, and the nodes are fixed-size disk pages (often 4kb.) The number of keys or values that fit in a node is highly variable. So instead you keep adding to a node until its size in *bytes* overflows a page, and then split. Some nodes might have a hundred keys, some might have only four. It doesn’t matter; the algorithms still work.
Yes, I was thinking about the latter point while I watched: "How do you decide on the number of keys per node? I guess you size it to just about fit into available memory, in order to minimize the number of expensive database queries." If the keys were of consistent size, you could just divide the memory you want to allocate by that size, but in most contexts I can see how it would be more practical to divide keys by size rather than by count. Do you get problems if e.g. you have adjacent large keys? Would that result in "wasted" space in a node?
@@SantiagoArizti no, but I found it confusing that he would undim parts of the tree as he said the sentence rather than just undimming the whole section together
@@thacium Some say that a special case of b-tree is equivalent to the red-black tree, another strategy which is, however, a binary search tree as well.
I have an exam on algorithms and data structures in 3 days and this video manages to break down hours of lectures into 12 minutes... Incredibly helpful
Wow! Great explanation! Basically the same as how we learned it 45 years ago, but with these animations it takes much less time. Back than, the professor was running around with chalk to change the diagrams on the chalkboard ...
So good, the easiest explanation for such a common data structure I've seen. It's crazy how we teach all these other trees in computer science classes but leave out the most used one. I especially appreciated the performance advantages against binary trees being explained.
Very elegant explanation and animation, you cleared up my misunderstandings from when I just read about b-trees. Your little blob dudes are great communicators!!
Man, I remember having balancing B-Trees as assignments in my CS exams and failing to get the right answers. This video does a much better job of explaining it than any class I took in university in a fraction of the time we spent learning it. Thanks for finally getting me to understand👍
This is a great video. Normally, databases do not store data in internal nodes, only leaf nodes and function as intermediate pointers. Leaf nodes on the other hand contain the actual data entries or pointers to those data entries. Additionally, most databases have pointers to neighboring leaf nodes. By the way, these pointers to neighboring leaf nodes help lot with solving concurrent operations on the B-Tree.
A different approach to concurrency is copy-on-write, where instead of changing a node in place you make a copy of it with the changes. This means changes don’t affect any concurrent readers since nodes are immutable. However, any nodes that point to the modified node also have to be updated to point to the new one, which means a change spreads up to the root node. (This may sound wasteful, but in practice you can modify new nodes in place because no one else can see them. So a node only has to be copied once during any transaction.) In this type of tree you can’t have links between siblings because you’d end up having to copy all the siblings too. Instead, when you’re going down the tree you keep a breadcrumb trail, your path, and to get to a sibling you look up, move one child, and go back down. This type of tree is used by CouchDB and LMDB, among others.
I more or less knew how B-trees work already, but this was just such a neat refresher that I now want to implement it (maybe together with a couple of formal verification proofs, to show that all these properties are always maintained)
Awesome explanation. I always had the feeling that I almost understand how B-trees work, but I wasn't quite there yet. This video showed me the things that I was missing. Thank you!
I've just started the CS50 AI course as background learning at work, and imagine my surprise to hear this voice again. I really enjoy your teaching style, and I'm so glad to have found more content from you!
There's an additional bonus -- nodes tend to be able to fill in some special size, like a disk block or a cache line, which allows further efficiency when doing comparisons for a search. That's why for some applications it's objectively better than even the red-black tree.
This is a REALLY good series and I wish Spanning Tree would post more content. They have a fantastic layout, information is conveyed well, and it's thorough but not so technical that people don't understand it. Please make more content. It's hard to find solid video series like this that explain important programming concepts. 10/10
Ah, I always struggled with B-trees during my Database Design class. If I had this video, I wouldn't have struggled at all! Fantastic work, very informative.
Absolutely Brilliant Explanation and Animation. I had never encountered B-Trees and became interested in them because they are not in A-level Computer Science Exam Boards. This video taught me B-trees in just 12 minutes and encouraged me to build my own in C++.Thanks!
I'm listening to "designing data intensive applications", and wasn't quite able to visualize a b-tree through audiobook only. This really cleared everything up. Thanks!
I've been a professional developer for 17 years now. A really good one. I always knew that BTrees are used to store database indexes and that they reduce search complexity to logarithm. I knew how binary trees and AVL trees work. I always wondered how BTrees work. I was too lazy to do the reading. Now I know. Thank you.
Something similar happened to me, I knew the tool and used it but i didn't fully grasp it. Now that I'm making my own SQLite in Rust is really helpful!
@@docjoesweeney I always wonder how people in 80s, used to write the code. I'm assuming there won't be any proper documentation or persistent internet connection for that matter.
Thank you a lot for the beautiful explanation of this topic! I always encounter the question about indexing algorithms and used data structures in databases for a data engineer position. So this video helps a lot in understanding b-trees. Thanks again!
is the first time i subscribe to a channel after i saw 5 seconds and skipped a couple of minutes after other 3 seconds. All your effort needs to be encouraged
This helped me a lot! The Animations are really well done and help explaining the thing you're talking about. This is how explanatory videos should be done!
I was looking to implement btrees a while back and all the literature on it were conflicting and varying. I like how you handled all the variances subtely. This is a great video, the definitive one on btrees for sure. Cheers.
It's really an awesome way of explanation ruling out drawing the trees and explaining...no words to appreciate...thank you for making such wonderful videos...great work
Important timestamps. **Insertion Scenarios:** 5:08 - Basic Insertion 6:20 - Another Basic Insertion with Root partially full 6:50 - Insertion with full root **Deletion Scenarios:** 7:48 - Basic Deletion 8:15 - Deletion of Key in Node w/Min Count (take sibling key - it becomes new separator - old separator is moved to original node where deletion occurred) 9:42 - Deletion of Key when Sibling Nodes are also at Minimum Count (merge sibling and separator together into single full node) 10:10 - Same as above, but causing Parent node to fall below Minimum Count as well 10:28 - Deletion of key from middle of tree (new separator needed - either take largest val in left subtree, or smallest val in right subtree - may need to take key from sibling or merge 2 nodes together if it causes child node to fall below min key count)
im not joking when i say, i was literally looking into making my own db 2 days ago, but then the optimization algorithms discouraged me because it looked so complex. however, this video was so easy to grasp that i might just end up committing to the project
Curious. What is your motivation to create your own DB? If you are creating DB for learning or teaching, then it is fine.There are already many open source databases, that are quite good. If you are creating your own DB, you will also need to consider caching, and a lot more features.
Thank you so much, i have an exam that includes b-trees, but ALL the explanations are plain text with a lot of nonsensical expressions and variables. This is so simple and straightforward compared to that c:
Awsome expl🙏🙏🙏 For me personaly at first was confusing that you dim not related part, I would intuitively expect get brighter the part where change was happening instead of dimming the rest.. Great work, thx for expaining 🙏
I bet, if this video was available while I was doing my bachelor's in CS back then in 2009, I would have got two Gold Medals instead of one. Life was so hard at that time, black & white text was so boring and everything had to be imagined, and I often fancied myself no less than Einstein himself performing some thought experiments on DS. I remember, when I took hold of basic DS, I started tutoring my friends in exchange for fast food in our favourite restaurants. Fondly enough, even my workstation PC today is named after my favourite Physicist, Einstein, just because of how I mastered DS and that paved my way to my role as a Backend Architect.
2:36 When designing a b-tree, how does the programmer determine what the maximum amount of keys in a node should be? For which applications is it appropriate to set the limit to 2, and which applications should have a limit of hundreds? There's a happy medium somewhere, but how is that happy medium calculated?
It depends on your application. For example for a database system I'd go with a node size that can fit in L1 cache of the processor. Or maybe use the filesystem block size? And then delete the size by the size of an element to figure out the number of elements in a block. And then test, modify and repeat.
In practice comparisons tend to be very cheap compared to main memory access, combine that with the fact that the processor loads memory in lines of cache (64 bytes) you want to pick a node size at least 64 bytes large. The sequential prefetcher makes 128 bytes a good option too
There are a variety of locality effects that can come into play. tactic00l mentioned cache line size which is a good minimum for in-memory b-trees. You might also size them to fit nicely into a block on disk, so you can make more comparisons for each disk read operation. There are subtler impacts from prefetch logic and alignment that might be worth measuring, depending on your microarchitecture or allocator respectively. some considerations might not be just about latency of accessing memory; depending on your concurrency model, likelihood of paging, data structures you are storing as values etc you might find reason to increase or decrease the number of keys stored in a node.
It's "B-tree." The point of a B-tree is to reduce accesses to secondary storage. If your B-tree resides on disk, then the node certainly should be as large as can fit in a sector, because you don't want to be reading disk space that you don't use. Whether it should be two sectors, or four, or something else, depands on the characteristics of the device and your judgment about what will work well. If your entire B-tree will reside in RAM, then you might as well use a 2-3 tree, which is the minimum B-tree.
I had to make a goddamn Red Black tree in C without any external library for a single almost worthless point in university. I didn't know a data structure could have that many bugs. That thing python b tree doesn't scare me.
Early in your development you must study machine code, assembly language, if you want to assess the actual processor cost of architecture choices, such as designing search trees. This is an excellent presentation, but in reality there are many essential layers below what is discussed here.
There is also the B-Tree with prefix compression, which is particularly relevant if you need to store long strings as keys in the B-Tree. Once a leaf page is overflowing after a key insert, simply split it to create two leaf pages. Promote a copy of the first key from the right new page up one level, but only take as many characters from that key as are necessary to differentiate it from the last key in the left page. Do this only when a leaf page needs to be split. This approach will help save space in the upper pages up to the root. Additionally, keep in mind that if you feed a B-Tree with already sorted keys, you will end up with half-filled pages, wasting space. In this situation, the point where you split the pages will give you a choice on how to balance the B-Tree effectively.
I consider it to stand for "Bayer," its inventor, but it could be something else. Bayer didn't say. It definitely is not "Binary," though. Euler did not call "e" Euler's number, but everyone else does.
I prefer an offset tree. Say you have a buffer or file full of nodes. Every node has a fixed offset from the head. As long as you know the head's pointer or file id you can get any node by it's offset. This causes the offset to work as a node ID which is useful for fast record access. Sticking to only offsets/indices can create an issue around adding/removing nodes at random places. This is fixed by building 2 pairs of "next" & "previous" members into the nodes. One pair is for tracking removed nodes, the other pair is for tracking assigned nodes. In code the node access would look something like this: next = head + node.assigned.nextOffset or prev = head + node.assigned.prevOffset You can attach the pairs directly to the node or via a separate list - in which case you use the determine the offset of the data by the offset of the node like this: index = (node - nodeList) / sizeof(node) data = dataList + (index * sizeof(data)) An example of where you'd want to do the latta is text. You'd want to perform common string operations on the text such as finding a sequence of characters. There're ready made functions for that sort of thing but they assume continuous data, not nodes. Identifying the node from the data can also be done easily by a similar process: index = (data - dataList) / sizeof(data) node = nodeList + (index * sizeof(node)) This gives the best of all worlds. Fast adding/removing of nodes via linked list format, fast node lookup via indices/offsets, easy to perform common operations via preexisting functions. Sorting can be fast too simply by making a list of ID & integer value pairs to be sorted by integer value. Then simply going through the actual list and 1 by 1 swapping the existing data with the data that should be there. For a file the swapping would look something like this: read( nodeList, offsetA, nodeA, sizeof(node) ) read( nodeList, offsetB, nodeB, sizeof(node) ) read( dataList, offsetC, dataA, sizeof(data) ) read( dataList, offsetD, dataB, sizeof(data) ) write( nodeList, offsetA, nodeB, sizeof(node) ) write( nodeList, offsetB, nodeA, sizeof(node) ) write( dataList, offsetC, dataB, sizeof(data) ) write( dataList, offsetD, dataA, sizeof(data) ) The integer values is a case by case thing that can be calculated upon creating the data so the sortable integer list can just be kept as a 3rd list along side the node list and the data list. I've never had a reason to use binary trees as of yet.
I really like how the little guys turn their heads to look at the thing you're talking about :)
They are very cute and innocent.
He does this in every video, Sherlock.
@@Scrolte6174not everyone watches what you watch, Sherlock
@@Scrolte6174 what does that have to do with anything? even if they do it in every video you can be appreciative of it?
Having completed data structures and algorithms at an ABET accredited institution, I nod my head knowingly at this video.
Beautiful explanation. I love the animations. As someone who's implemented a few persistent B-trees, allow me to point out a few more details:
* This is the classic original B-tree. However, most databases use a B+tree, which is different in that the values are stored only in the leaves; keys in upper nodes just point to lower nodes. When a node splits, you don’t move the middle value up, it stays in one leaf or the other.
* B-trees I’ve looked at, like SQLite, don’t have a fixed number of keys in a node. In real usage, keys and/or values are variable size, like strings, and the nodes are fixed-size disk pages (often 4kb.) The number of keys or values that fit in a node is highly variable. So instead you keep adding to a node until its size in *bytes* overflows a page, and then split. Some nodes might have a hundred keys, some might have only four. It doesn’t matter; the algorithms still work.
Thank you! Very interesting!
Learning something new everyday! Thanks for the info
Yes, I was thinking about the latter point while I watched: "How do you decide on the number of keys per node? I guess you size it to just about fit into available memory, in order to minimize the number of expensive database queries." If the keys were of consistent size, you could just divide the memory you want to allocate by that size, but in most contexts I can see how it would be more practical to divide keys by size rather than by count.
Do you get problems if e.g. you have adjacent large keys? Would that result in "wasted" space in a node?
That makes sense, thanks!
Quite useful comment to be honest. Real life scenarios are super useful in pair with the academic explanation!
Such a simple but beautiful animation! So many channels do such complex animations but they do not realise simple animations can be so beautiful.
Has a Wall-E kind of feeling 😊
I found it confusing that the nodes left undimmed were the ones we were supposed to pay attention to. Didnt you?
@@SantiagoArizti no, but I found it confusing that he would undim parts of the tree as he said the sentence rather than just undimming the whole section together
Lol that's actually my first time hearing about a b tree, looks awesome and elegant and simple
Elegant indeed but far more complex than a binary tree. In this case you're trading simplicity for faster search time.
@@thacium Some say that a special case of b-tree is equivalent to the red-black tree, another strategy which is, however, a binary search tree as well.
Knock on wood before your up at 2 AM on a Red Bull and adderall fueled binge trying to figure out how to shave off 1 ms of process time.
Ain't nowhere near simple but magnificent for sure
I have an exam on algorithms and data structures in 3 days and this video manages to break down hours of lectures into 12 minutes... Incredibly helpful
I didn't understand B-trees from university classes but now I do. 3 days before exam. Thank you!
how did it go?
@@vastabyss6496 I passed it but I only got a kettes/elégséges so I'm going to attend the repair exam
Fed up with my living room being a mess, I decided to watch this. Oddly think it's going to help
Keep the mess in a b-tree?
use c trees
Well you're gonna need to sort it in an ascended or descended order first
@@azizayari252 What for? That happens automatically when putting it in the tree.
Christmas?a@@Y2Kvids
Just saying thank you from behalf of the community for those amazing visualization teaching vids and for the quality you put in them
Wow! Great explanation!
Basically the same as how we learned it 45 years ago, but with these animations it takes much less time. Back than, the professor was running around with chalk to change the diagrams on the chalkboard ...
OG programmer
All respect
So good, the easiest explanation for such a common data structure I've seen. It's crazy how we teach all these other trees in computer science classes but leave out the most used one.
I especially appreciated the performance advantages against binary trees being explained.
I agree; it's a good example of how different measures of efficiency lead to different solutions.
You're awesome bro, I couldn't understand this concept from 1 week of lectures, but you just explained it in 10 minutes.
This channel has been a massive help for me in understanding the concepts in my algorithms class well enough to actually pass the assignments.
MIT opencourseware also do a great series on algorithms plus the professor is SUPER CUTE he's the dude into origami
Very elegant explanation and animation, you cleared up my misunderstandings from when I just read about b-trees. Your little blob dudes are great communicators!!
"B-trees" please. Capitalized.
Man, I remember having balancing B-Trees as assignments in my CS exams and failing to get the right answers. This video does a much better job of explaining it than any class I took in university in a fraction of the time we spent learning it. Thanks for finally getting me to understand👍
This is a great video. Normally, databases do not store data in internal nodes, only leaf nodes and function as intermediate pointers. Leaf nodes on the other hand contain the actual data entries or pointers to those data entries. Additionally, most databases have pointers to neighboring leaf nodes. By the way, these pointers to neighboring leaf nodes help lot with solving concurrent operations on the B-Tree.
yeah those are known as B+ Trees
A different approach to concurrency is copy-on-write, where instead of changing a node in place you make a copy of it with the changes. This means changes don’t affect any concurrent readers since nodes are immutable. However, any nodes that point to the modified node also have to be updated to point to the new one, which means a change spreads up to the root node.
(This may sound wasteful, but in practice you can modify new nodes in place because no one else can see them. So a node only has to be copied once during any transaction.)
In this type of tree you can’t have links between siblings because you’d end up having to copy all the siblings too. Instead, when you’re going down the tree you keep a breadcrumb trail, your path, and to get to a sibling you look up, move one child, and go back down.
This type of tree is used by CouchDB and LMDB, among others.
when do you get concurrent operations on the same DB? How is it not queued?
@@nikilragav There is a paper titled "Efficient locking for concurrent operations on B-trees" by Bing Yao & Philip Lehman that was the break through
@@esra_erimez Thanks. Is the idea that you have multiple threads accessing the same db? No queue?
I more or less knew how B-trees work already, but this was just such a neat refresher that I now want to implement it (maybe together with a couple of formal verification proofs, to show that all these properties are always maintained)
Awesome explanation. I always had the feeling that I almost understand how B-trees work, but I wasn't quite there yet. This video showed me the things that I was missing. Thank you!
I've just started the CS50 AI course as background learning at work, and imagine my surprise to hear this voice again. I really enjoy your teaching style, and I'm so glad to have found more content from you!
There's an additional bonus -- nodes tend to be able to fill in some special size, like a disk block or a cache line, which allows further efficiency when doing comparisons for a search. That's why for some applications it's objectively better than even the red-black tree.
This is a REALLY good series and I wish Spanning Tree would post more content.
They have a fantastic layout, information is conveyed well, and it's thorough but not so technical that people don't understand it.
Please make more content. It's hard to find solid video series like this that explain important programming concepts.
10/10
Ah, I always struggled with B-trees during my Database Design class. If I had this video, I wouldn't have struggled at all! Fantastic work, very informative.
Awesome video! Just one clarification: database systems typically use B+-trees rather than B-trees to allow for ISAM (range search on leave nodes).
This is literally the best data structure explanation video I’ve ever seen
Absolutely Brilliant Explanation and Animation. I had never encountered B-Trees and became interested in them because they are not in A-level Computer Science Exam Boards. This video taught me B-trees in just 12 minutes and encouraged me to build my own in C++.Thanks!
I'm listening to "designing data intensive applications", and wasn't quite able to visualize a b-tree through audiobook only. This really cleared everything up. Thanks!
Bravo for traversing the complexity into simplicity 🎉
This is the best visual learning channel for CS on UA-cam
Fantastic illustration for one of the most complex topic in computer science education. I wish I get to learn this video when I was at university .
I love your videos. You somehow always manage to hit the right amount of details. Great job!
you published this at the exact right time I have a final exam today and this will be on it and I needed to study up on it.
I've been a professional developer for 17 years now. A really good one. I always knew that BTrees are used to store database indexes and that they reduce search complexity to logarithm. I knew how binary trees and AVL trees work. I always wondered how BTrees work. I was too lazy to do the reading. Now I know. Thank you.
Something similar happened to me, I knew the tool and used it but i didn't fully grasp it. Now that I'm making my own SQLite in Rust is really helpful!
Unless you’re writing a DBMS, it’s super unlikely you need to know how b-trees work
So well animated, so simply and informatively explained. Thank you!
Ha! Thanks for this. I remember coding b trees waaaay back in the early 80s. Gawd I'm old.
Which language do you used to code in 80s?
_"early 80s"_ OK boomer! I'm approaching my "early 80's" and I remember coding stuff waaaay waaaay back in the "early 70's" on IBM punch cards.
@@crosswalker45 vb, ada , assembler (for 6809), vulcan then dbase and clipper, kman, pascal... likely a few others i am too to recall.
@@docjoesweeney I always wonder how people in 80s, used to write the code. I'm assuming there won't be any proper documentation or persistent internet connection for that matter.
@@crosswalker45 dig up old editions of Dr Dobbs magazine. I am sure someone would have scanned them for prosperity.
Proud to say that I am following this channel since when it has less than 100K subscribers.
Dude, I just found it yesterday! It's a great channel and deserves the growth.
Thank you a lot for the beautiful explanation of this topic! I always encounter the question about indexing algorithms and used data structures in databases for a data engineer position. So this video helps a lot in understanding b-trees. Thanks again!
is the first time i subscribe to a channel after i saw 5 seconds and skipped a couple of minutes after other 3 seconds. All your effort needs to be encouraged
The best video out in the internet for B Tree!! Thank you so much!!
Excellent video. Well done on articulating this in such an easy to understand way.
This is a great explainer video. Very easy to follow and I love the little robot guys.
You may have just saved my exam. Incredible video
i wish i could have a teacher that utilizes simple animations like these in class, the visualisation makes it so clear
This helped me a lot! The Animations are really well done and help explaining the thing you're talking about. This is how explanatory videos should be done!
Amazing video and explanation.
I was looking to implement btrees a while back and all the literature on it were conflicting and varying. I like how you handled all the variances subtely.
This is a great video, the definitive one on btrees for sure. Cheers.
The term is "B-tree."
It's really an awesome way of explanation ruling out drawing the trees and explaining...no words to appreciate...thank you for making such wonderful videos...great work
Thank you for directly going into the subject.
Best explanation of btrees I’ve ever seen
With your explanation, it looks really simple and elegant. Thanks a lot!
every other video explaining B trees must be deleted to save space and time. This is perfect explanation.
That video was so well made that even I could understand. Code this algorithm is another story, though.
awesome, never understood this tree until now. Thanks a lot :)
You're a legend bro... Hope u live 100 more years
Stumbled upon this video by accident and I knew I heard that voice before 😂Nice to know that you have your own UA-cam channel now, Brian!
Why didn't I find this great channel with unique presentation earlier?
Thank you man , it’s my first time to understand this
Very good explanation!! Congrats!! Keep it going!
Phenomenal teaching! please do more computer programming related concepts, love this!
Really well done. Wish I had access to content like this when I went through school
I have no words, just amazing!
thank you sir for most wonderful explanation for b trees i have ever seen i well share ur vid to friends it deserves to be seen
Beautiful animations with a very clean explanation! Thank you sir!
Important timestamps.
**Insertion Scenarios:**
5:08 - Basic Insertion
6:20 - Another Basic Insertion with Root partially full
6:50 - Insertion with full root
**Deletion Scenarios:**
7:48 - Basic Deletion
8:15 - Deletion of Key in Node w/Min Count (take sibling key - it becomes new separator - old separator is moved to original node where deletion occurred)
9:42 - Deletion of Key when Sibling Nodes are also at Minimum Count (merge sibling and separator together into single full node)
10:10 - Same as above, but causing Parent node to fall below Minimum Count as well
10:28 - Deletion of key from middle of tree (new separator needed - either take largest val in left subtree, or smallest val in right subtree - may need to take key from sibling or merge 2 nodes together if it causes child node to fall below min key count)
Brilliant video! I had no idea that it was so closely related to binary trees.
the best channel i've ever visited...all you need is some promotion..and then lets see where you will be!!All the best!!
im not joking when i say, i was literally looking into making my own db 2 days ago, but then the optimization algorithms discouraged me because it looked so complex. however, this video was so easy to grasp that i might just end up committing to the project
Curious. What is your motivation to create your own DB? If you are creating DB for learning or teaching, then it is fine.There are already many open source databases, that are quite good. If you are creating your own DB, you will also need to consider caching, and a lot more features.
@@MK-lh3xd just to learn stuff tbh. In my serious projects i use postgres
Best B-Tree explanation ever!
the expression was really simple and understandable thank you!
2:40 Searching
4:00 B-Tree Rules
4:36 Insertion
7:50 Deletion
Thank you so much, i have an exam that includes b-trees, but ALL the explanations are plain text with a lot of nonsensical expressions and variables. This is so simple and straightforward compared to that c:
Your videos are awesome. Thanks, Brian!
Great explanation. I really left feeling like I understood.
Andy is happy. Also "spanning tree" (protocol) is my friend in my role as systems administrator
Awsome expl🙏🙏🙏
For me personaly at first was confusing that you dim not related part, I would intuitively expect get brighter the part where change was happening instead of dimming the rest..
Great work, thx for expaining 🙏
Dude, these animations are sick.
Beautiful animations and explanation. Really really thanks for this
the way you teach is magnificent. keep it up buddy
Nice! It would be interesting to see an explanation on red-black trees
The "monitos" (cute animations) got me engaged to watching, among the interesting information presented/narrated.
Great video!🎉😊
Great video. I will probably implement B-tree in GO, as I begin to study this language.
Fantastic! I’d love a video on LSM trees and SSTables next!
Nice and easy explanation. Great job !!
Animation is amazing!
absolutely beautiful explanation
Very great video! Thank you for sharing, I love the style
I bet, if this video was available while I was doing my bachelor's in CS back then in 2009, I would have got two Gold Medals instead of one. Life was so hard at that time, black & white text was so boring and everything had to be imagined, and I often fancied myself no less than Einstein himself performing some thought experiments on DS. I remember, when I took hold of basic DS, I started tutoring my friends in exchange for fast food in our favourite restaurants. Fondly enough, even my workstation PC today is named after my favourite Physicist, Einstein, just because of how I mastered DS and that paved my way to my role as a Backend Architect.
2:36 When designing a b-tree, how does the programmer determine what the maximum amount of keys in a node should be? For which applications is it appropriate to set the limit to 2, and which applications should have a limit of hundreds? There's a happy medium somewhere, but how is that happy medium calculated?
It depends on your application. For example for a database system I'd go with a node size that can fit in L1 cache of the processor. Or maybe use the filesystem block size? And then delete the size by the size of an element to figure out the number of elements in a block. And then test, modify and repeat.
The best way is to simply try them and see what works best.
In practice comparisons tend to be very cheap compared to main memory access, combine that with the fact that the processor loads memory in lines of cache (64 bytes) you want to pick a node size at least 64 bytes large. The sequential prefetcher makes 128 bytes a good option too
There are a variety of locality effects that can come into play. tactic00l mentioned cache line size which is a good minimum for in-memory b-trees. You might also size them to fit nicely into a block on disk, so you can make more comparisons for each disk read operation. There are subtler impacts from prefetch logic and alignment that might be worth measuring, depending on your microarchitecture or allocator respectively. some considerations might not be just about latency of accessing memory; depending on your concurrency model, likelihood of paging, data structures you are storing as values etc you might find reason to increase or decrease the number of keys stored in a node.
It's "B-tree." The point of a B-tree is to reduce accesses to secondary storage. If your B-tree resides on disk, then the node certainly should be as large as can fit in a sector, because you don't want to be reading disk space that you don't use. Whether it should be two sectors, or four, or something else, depands on the characteristics of the device and your judgment about what will work well. If your entire B-tree will reside in RAM, then you might as well use a 2-3 tree, which is the minimum B-tree.
Surer clear and simple explanation! Thank you very much for your time and effort!
This is so simple it's almost magical!
Your homework assignment: write a b tree in python that handles inserts and deletions and unit tests that validate it works correctly
I had to make a goddamn Red Black tree in C without any external library for a single almost worthless point in university. I didn't know a data structure could have that many bugs.
That thing python b tree doesn't scare me.
@@lorenzosotroppofigo1641damn, these rotations are evil 😂
run redundant calc and compare?
Early in your development you must study machine code, assembly language, if you want to assess the actual processor cost of architecture choices, such as designing search trees. This is an excellent presentation, but in reality there are many essential layers below what is discussed here.
Thanks ! From Université Paris Dauphine students ! Luc, Silva, Faudel, Barbara, Diao, Adlene and Djena
That's the great explanation I have ever seen. Thank you so much!
Much better then any course I took on the subject 🎉
amazing video. wish we could have this for litterally everything.
There is also the B-Tree with prefix compression, which is particularly relevant if you need to store long strings as keys in the B-Tree. Once a leaf page is overflowing after a key insert, simply split it to create two leaf pages. Promote a copy of the first key from the right new page up one level, but only take as many characters from that key as are necessary to differentiate it from the last key in the left page. Do this only when a leaf page needs to be split. This approach will help save space in the upper pages up to the root.
Additionally, keep in mind that if you feed a B-Tree with already sorted keys, you will end up with half-filled pages, wasting space. In this situation, the point where you split the pages will give you a choice on how to balance the B-Tree effectively.
So the 'B' in 'B tree' DOES NOT stand for 'Binary'. Definately a chance for confusion there!
I name it Beatrice 😂
It stands for balanced tree, then there's also b+ trees that are just advanced version of it
I consider it to stand for "Bayer," its inventor, but it could be something else. Bayer didn't say. It definitely is not "Binary," though. Euler did not call "e" Euler's number, but everyone else does.
@@heyythereIt doesn’t stand for balance or binary but it is a type of binary
@@heyytherethere's actually no clear meaning
I prefer an offset tree. Say you have a buffer or file full of nodes. Every node has a fixed offset from the head. As long as you know the head's pointer or file id you can get any node by it's offset. This causes the offset to work as a node ID which is useful for fast record access.
Sticking to only offsets/indices can create an issue around adding/removing nodes at random places. This is fixed by building 2 pairs of "next" & "previous" members into the nodes. One pair is for tracking removed nodes, the other pair is for tracking assigned nodes. In code the node access would look something like this:
next = head + node.assigned.nextOffset
or
prev = head + node.assigned.prevOffset
You can attach the pairs directly to the node or via a separate list - in which case you use the determine the offset of the data by the offset of the node like this:
index = (node - nodeList) / sizeof(node)
data = dataList + (index * sizeof(data))
An example of where you'd want to do the latta is text. You'd want to perform common string operations on the text such as finding a sequence of characters. There're ready made functions for that sort of thing but they assume continuous data, not nodes. Identifying the node from the data can also be done easily by a similar process:
index = (data - dataList) / sizeof(data)
node = nodeList + (index * sizeof(node))
This gives the best of all worlds. Fast adding/removing of nodes via linked list format, fast node lookup via indices/offsets, easy to perform common operations via preexisting functions. Sorting can be fast too simply by making a list of ID & integer value pairs to be sorted by integer value. Then simply going through the actual list and 1 by 1 swapping the existing data with the data that should be there. For a file the swapping would look something like this:
read( nodeList, offsetA, nodeA, sizeof(node) )
read( nodeList, offsetB, nodeB, sizeof(node) )
read( dataList, offsetC, dataA, sizeof(data) )
read( dataList, offsetD, dataB, sizeof(data) )
write( nodeList, offsetA, nodeB, sizeof(node) )
write( nodeList, offsetB, nodeA, sizeof(node) )
write( dataList, offsetC, dataB, sizeof(data) )
write( dataList, offsetD, dataA, sizeof(data) )
The integer values is a case by case thing that can be calculated upon creating the data so the sortable integer list can just be kept as a 3rd list along side the node list and the data list. I've never had a reason to use binary trees as of yet.
Great video please continue making these
Great video! You made it very understadable.
Please keep uploading more content like this