For large types, the overhead is mitigated substantially, especially if they're cache-line aligned along with the list pointers. Also, serious uses of linked lists like in the Linux kernel back them with efficient allocators like free lists which tend to allocate the lion's share of nodes contiguously.
Also, if you care about order, you adapt the list to a tree so finding the insertion/deletion point becomes a binary search. If you don't care, you can usually use FIFO/LIFO. It is true, though, that modern processors are astonishingly fast at sequential memory accesses because of cache pre-fetching, and should be able to pipeline most array processing, so the initial implementation ought to be simple to save developer time.
Performance-wise, I can see how a vector can be better than a list. But I'm curious what the comparison is memory-wise. Does a vector require far more memory during resizing, or is there some crazy trick that the runtime library uses to shift data around and reallocate in place? The latter seems unfeasible to me. I would imagine that if I had a 2KB vector inside 4KB of RAM and then tried to add 1 more element to the vector, I would get an out-of-memory exception.
Apoorv Kumar memmove() is for shuffling elements over on insert & delete, but when the allocation for the vector needs to grow, there's no way around the fact that you have to make a new allocation & copy (or move) elements from the old allocation to the new one, and memory usage will spike up during that operation. With a list, you don't have to do this re-size operation, but on most machines the list is going to use an extra 16 bytes per stored element (in addition to the per-allocation overhead which comes from the memory allocator). Quite often the stored elements are 8 bytes in size, so the list is at least 3x the allocated memory on a continuous basis, whereas the vector occasionally spikes up to 2x the memory requirement. So vector wins on memory usage in the common case of machine-word-sized elements, even when we're talking about maximum memory use as opposed to continuous. Another thing to mention: when you use a vector, whenever possible you want to call reserve() when you begin an operation that will add a number of elements (even if you only have a rough upper bound on that). Calling reserve() actually helps code readability, in addition to helping performance by reducing the number of re-size operations.
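Here's a minimal sketch of the reserve() point above (the function name and element type are just for illustration): calling reserve() once up front means none of the subsequent push_back() calls has to reallocate and copy.

#include <vector>

std::vector<int> build(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                              // one allocation up front
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));      // never reallocates while size <= n
    return v;
}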
To those saying he chose a terrible example, remember his stated point of the example. Lots of people see this problem and think "this is a good use case for linked lists". That's it.
They have good amortized performance. There is a performance penalty when they regrow, and they usually use more space, since they regrow by a factor varying from 1.6 to 2; that is, when they get full they allocate a new region 1.6-2.0x bigger, copy all elements there, and free the old region. They do not have the "pointer chasing" problem because there is at most one pointer, and it points to a contiguous region. That is true of almost all implementations of queues/stacks, since they use an array/vector underneath. If you use a linked list as the backbone for a regrow-able stack/queue then it will have the same problems pointed out in the video (too much pointer chasing while iterating the container).
Why not have a mixture of array and linked list by preallocating an array and when the array is full, allocate a new array and have a pointer to it? Could you get the best of both worlds?
Matthew Mitchell regrow-able stacks/queues/arrays work more or less like this. They allocate an amount of heap memory and keep a pointer/reference to it. When it's full they allocate a new, bigger region of heap memory, copy the contents of the old region to the new one, and change the pointer/reference to the new region. The linked list's advantage is, theoretically, in mid-list insertion/removal, because you iterate to the position and then just change 2 pointers to remove/insert. With contiguous memory (array/vector), to insert/remove an element at a middle position you iterate to it, but then you have to shift several elements right/left: if you want to insert a new element at position 6 you have to shift right all elements from 6 until the end of the array/vector (that is, assuming you want to preserve order). The thing is, in practice the pointer-chasing penalty of linked lists destroys, in most cases, any advantage they have. There are several good data structures that perform better than both vectors and linked lists, depending on your use case. A heap, normally implemented using an array/vector, has O(log N) insertion/removal, and can be used if your elements have a logical ordering.
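To make the heap point above concrete, here is a minimal sketch (assuming the standard std::priority_queue, which is backed by a std::vector by default, so the heap lives in contiguous memory):

#include <queue>

int main() {
    std::priority_queue<int> pq;   // array-backed binary max-heap
    pq.push(42);                   // O(log N) insert
    pq.push(7);
    pq.push(19);
    int largest = pq.top();        // O(1) peek at the maximum
    pq.pop();                      // O(log N) removal of the maximum
    (void)largest;
}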
Linear search to insert on a linked list? Well there is your issue. I sped something up going from a vector to a linked list. Insertion sort (even using a binary search) works until the data becomes too large. Then you use merge sort.
Okay, I know I'm a little late here... but can someone tell me whether this holds true for plain arrays too? i.e. will an array implementation be faster than a list?
Yes, but raw arrays cannot grow dynamically, so the whole question of insertions and removals is moot. Everything you do with a raw array is constant time; well, searching aside, which can be done exactly the same way as with a vector. And a vector uses an array behind the scenes anyway; it's basically a resizable array.
+Arun Prasath Elements in a linked list are scattered through memory, and to access the nth element in a linked list you have to follow n pointers. Elements in an array (or vector) are contiguous in memory, and you do not need to keep following pointers; you can access them directly. Thanks to cache memory, an array (or vector) is a lot more efficient, since when you access one element the surrounding chunk of the array/vector is pulled into the cache, while with linked lists this can't happen, since the elements are not together in one contiguous memory region. Note that a linked list is different from the List abstract data type. A List is just an abstraction for a sequence of elements, which you can implement with a fixed-size array, an auto-growable array (vector), or by linking the elements with pointers (a linked list). Python's list, Java's ArrayList, and C++'s vector are all the same data structure: an auto-growable array, which is a great data structure. Linked lists, on the other hand, have very niche uses (but they do still have uses) and most of the time are better replaced by arrays/vectors.
+Arun Prasath You should use a tree. An (a, b)-tree (with a big b if you're mainly inserting) would be good here. But if you have to implement it yourself and don't want to put a lot of time into it, just use a binary tree. Trees have insertion ∈ O(log(n)), where n is the number of elements you already inserted. His approaches are both ∈ O(n) (much, much, much worse), with only a constant factor between them. Btw.: he claimed you needed a doubly linked list to do the insertion in an arbitrary place, which of course is plain wrong.
zerphase So you still have to look up the address of the next block? What happens when something is deleted? A better solution to your problem would be an unbounded array.
+Christoph Michelbach you shouldn't need to. It's a stack allocator. You can delete in the order things were placed on it, or roll back to the beginning of the stack. Personally, I'm used to doing game programming and keeping as close as possible to C for performance and compile times.
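A minimal sketch of the access-pattern difference described a few replies up (the function names are just for illustration): indexing a vector is a single contiguous-memory access, while reaching the nth element of a list means chasing n pointers.

#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

int nth_of_vector(const std::vector<int>& v, std::size_t n) {
    return v[n];              // one direct access into contiguous memory
}

int nth_of_list(const std::list<int>& l, std::size_t n) {
    auto it = l.begin();
    std::advance(it, n);      // follows n pointers, one node at a time
    return *it;
}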
Thought-provoking video and comments. Very edifying for me, still learning the ropes with data structures, but savvy enough with performance analysis & architecture to follow and judge the arguments (both in the video and in the comments) fairly & profitably. Especially worthwhile (for me at least) are a few sage comments on how stl gets stuff done.
Just when I thought I had found the Holy Grail by using linked lists and their child structures, this video popped up on the screen :) Most of the tutorial sites teaching beginners like me the basics of C++ apparently get this point wrong.
Basically, Stroustrup is saying: if you use Linked Lists with a Schlemiel the Painter algorithm then they suck. Well, he is right about that. But the argument is a strawman. You don't have to use Linked Lists like that and you shouldn't. He fails to prove that Linked Lists are bad, he only proves that certain ways of using them are bad.
But a cache miss is still a linear operation. The linked list algorithm should still not go exponential. Strictly speaking, a cache miss IS a linear memory access. The processor misses the cache so it accesses main memory. 50 to 100 times seems a little high unless you think it needs to access the page directory and page table every time. But disabling the cache outright should only cause a factor of four slowdown.
The linked list has to be the most overly hyped data structure in the history of computer science! Linked lists are WORTHLESS! You can do *everything* that you need to do by using a vector/array.
Sure (I hope you are being sarcastic)! Even if you are serious about it, there's nothing wrong with Visual Basic (6 or .NET). I have done Visual Basic 6 in combination with C/C++ (writing the DLL), with VB6 calling those C declarations at a high level. There's not much difference in speed, since the binary DLL does the hard yards. My point is you won't get C/C++-level performance from other languages except assembly, especially if there is a lot of floating-point data, e.g. some high-order differential equation.
If we store very large objects that frequently get writes in the middle and need frequent sorting, then a linked list will perform better than a vector, because in a vector we need to physically move/copy the elements, and in that case the size of the objects becomes a constraint. This is not a problem in a linked list: to swap two large resource-owning objects in a linked list we can simply relink the nodes :) unlike in a vector, where we need to keep them contiguous and hence need a costly copy/move operation.
This is a very interesting video, not because of his conclusion, but because this CS guru has chosen a misleading example to falsely make his point. For this example, both structures take Theta(n^2) time. No matter his claim, this is NOT an example where Linked Lists should thrive. It has indexed access, which kill LinkedLists. This example might show that the Vector has much better constants, and that for examples where they both have the same asymptotic runtime, Vectors will do better, but there are certainly simple examples where Vectors will take Theta(n^2) while LinkedLists will take Theta(n). For those, you probably don't need google sized lists to see LinkedLists do better. For more in-depth commentary, see: ua-cam.com/video/K8H4LGWB5vQ/v-deo.htmlm40s
+Algorithms with Attitude yes, you're right! Nice Algorithms videos in your channel by the way. Always cool to see more Algorithms & Data Structures material freely available.
+Algorithms with Attitude your video is more theory than actuality. In theory a LinkedList is better for insertions and deletions than an ArrayList, but that is not the case in reality. Here's why: it takes roughly 1 cycle to read a register, 4 cycles to reach the L1 cache, 10 cycles to reach the L2 cache, 75 cycles to reach the L3 cache, and hundreds of cycles to reach main memory. With that in mind, to avoid the cost of fetching data from main memory (i.e. to avoid cache misses) the hardware tries to hide the main-memory latency via a number of techniques. Basically, three major bets are taken on memory access patterns:
Temporal: memory accessed recently will likely be required again soon.
Spatial: adjacent memory is likely to be required soon.
Striding: memory access is likely to follow a predictable pattern.
An ArrayList maps one-to-one onto contiguous memory, so it benefits a lot from these access patterns and suffers fewer cache misses: on accessing the first element, the following elements are prefetched into the CPU's caches, and CPU cycles are spent on the insert/delete work itself rather than on fetching data from memory. A LinkedList, on the other hand, doesn't fare so well: it is implemented with pointers that don't exploit the access patterns above, so there is a much higher probability of cache misses, and a lot of CPU cycles are wasted retrieving data for the next operation.
+Josiah Ebhomenye I understand this, but it can really be summarized as "The constants hidden by asymptotic notation are much worse for linked lists than for arrayLists." I have no problems with that. My problem is that he implies (everything but outright states) that for his given example, linked lists are asymptotically faster, when for this example, they are not. Saying that maybe for examples big enough that google would care about them strongly implies that for big enough N, asymptotic behavior would matter, but even for the large N he shows, it doesn't. That is really misleading because, for this example, both structures take order N^2 time, and nobody will argue that N^2 with big constant will beat N^2 with a small constant, if only N were larger. He has a great point to make, and this is only one small part of a much longer talk. In the course of a talk, it is really hard not to slip up on a sentence or two. The example is nice, it shows the difference in performance, it is all good, except the part when he implies that this example is asymptotically better. That leaves the viewer with the impression that linked lists are always worse, unless you have an absurd number of items, when this example doesn't show that. (He knows this. It is poorly worded sentence or two in a talk. It happens.) With 100K+ views, many by new students who won't think it all the way through, I thought I would point out the problem with those sentences. Walking away with "ArrayLists have much better constants" is good. Walking away with "never use linked lists" is not justified by this example. That isn't a controversial statement, Stroustrup would agree with it, and in fact does: isocpp.org/blog/2014/06/stroustrup-lists shows his more organized thoughts on it. (I didn't see this until after I had made my video and post.)
Scias In no way did I intend to become "the great defender of linked lists", and I do understand that they are perhaps best used just as a teaching instrument during an introduction to data structures. But of course you can come up with uses when linked lists are better than balanced trees too, such as for a stack, queue, or deque. (I also understand that for those 3 cases, array based approaches can also do everything in O(1), with a much better constant than linked lists.) Given a choice of one, a balanced tree is more useful than a linked list, but it is easier to come up with cases when a linked list is better than a balanced tree than cases when it is better than an array based list. And of course, they are much simpler to code and understand than a balanced tree structure.
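A rough micro-benchmark sketch of the cache-latency point made in this thread (not a rigorous measurement; real benchmarks need warm-up, repetition, and optimization flags, and exact numbers vary by machine). The traversal work is identical; only the memory layout differs:

#include <chrono>
#include <cstdio>
#include <list>
#include <numeric>
#include <vector>

template <typename C>
long long time_sum_ns(const C& c) {
    auto t0 = std::chrono::steady_clock::now();
    long long sum = std::accumulate(c.begin(), c.end(), 0LL);   // linear traversal
    auto t1 = std::chrono::steady_clock::now();
    std::printf("sum=%lld\n", sum);                             // keep the work observable
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

int main() {
    const int n = 1000000;
    std::vector<int> v(n, 1);   // contiguous: prefetcher-friendly
    std::list<int>   l(n, 1);   // one heap node per element: pointer chasing
    std::printf("vector: %lld ns\nlist: %lld ns\n", time_sum_ns(v), time_sum_ns(l));
}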
If that was true, why is Google, Facebook, Microsoft, and other very large and successful companies investing so heavily in C++? (and in fact, sending their employees to GoingNative 2012 to listen to Bjarne Stroustrup?)
Pointers => cache misses, so you get bad performance traversing your objects anyway. And with large data structures you aren't number crunching anyway.
The problem here is that Stroustrup completely ignores the problem/cost of memory allocation. Sure, you can push items around in an array really fast, as long as you can pre-allocate a big enough linear chunk of memory in advance to hold that array. But when you can't predict the size in advance and you find yourself needing to expand the array, it can get hugely expensive. The other problem is the assumption that linear access is the norm. I seldom need to do that; I usually want a specific list item via a pointer.
Uhh, preallocation is a solved problem. std::vector simply doubles the allocation size every time it runs out. And, yeah, it recopies everything - but this is still faster than whatever list does because it's a linear access (=the prefetcher does a great job) and it can use SSE operations (=16 bytes per instruction) and read and write on the same cycle.
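A quick way to watch that growth policy in action (a sketch; the exact growth factor is implementation-defined, typically 1.5x-2x rather than always exactly doubling):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_cap = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_cap) {   // a reallocation just happened
            last_cap = v.capacity();
            std::printf("size=%zu capacity=%zu\n", v.size(), last_cap);
        }
    }
}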
This is a really bad example. (For large values of N) neither vectors nor linked-lists are a good choice for the scenario he proposes. Not to mention that the curve he describes as "exponential" is certainly only quadratic.
No, I got the point. My problem is that a large fraction of the programmers who watch this video (or went to this talk) will think "If I have a problem that requires random access including insertions and removals in a sequence then Bjarne told me that a vector is the best data structure to use, so I'll use that!"
Levi Burton His spatial locality argument fails. Say I have his linked list of 100,000 elements. I insert in the middle of it. As he says, I have to traverse to the middle, risking a lot of cache misses. Conversely, using his vector of 100,000 elements & a binary search, I'll hit less than 20 elements scattered over the list, getting a few cache misses. Then when I shove 50,000 elements over by 1, they'll *all* get cache misses. Either way, I'm looking at cache misses for (on average) 1/2 of the list for every insertion and deletion. And his vector requires read *and* write for all those elements. In any case, his whole argument is predicated on random access to the list/vector, which is not what linked lists are best for. They're great for a queue, and vectors are horrible for that. In short, use the right tool for the job.
jursamaj Actually, the number of cache misses for the vector is more like 50,000/L where L is the width of the cache line. Usually L is quite big, which is what makes the vector faster. But you're right; neither a linked lists nor a vector is the right tool for this job.
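For reference, the "binary search, then shove the tail over" approach discussed above looks roughly like this (a minimal sketch; the helper name is made up):

#include <algorithm>
#include <vector>

// Keep the vector sorted: O(log N) to find the spot, O(N) contiguous moves to make room.
void sorted_insert(std::vector<int>& v, int x) {
    auto pos = std::lower_bound(v.begin(), v.end(), x);
    v.insert(pos, x);   // shifts the tail right by one element
}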
Wait... like what? Which languages are both higher level (or more convenient) and faster than C++? Also, you know that Google uses C/C++ as their main programming language, right? Until people actually start to seriously invest time into run-time optimization (which is kinda harder to do for C/C++/Java/C# than for languages like Python/Ruby), C/C++ will continue to be the standard for speed.
The claim is deceptive. At first it sounds like he is explaining why you shouldn't use linked lists at all, but then he only really explains that you shouldn't use linked lists if you need to search for arbitrary elements in them. You know, there are situations where you only need to traverse the list linearly without searching for any particular element, because something has to be done to all of the elements, but along the way you might also want to remove some of the elements from the list smoothly.
Data locality and cache-friendliness also matter (a lot), which is the point of this video imho. When you access a position in an array, all values that sit in the same block are brought into the cache if they're not there already. If you're using a vector, these are exactly the next values you would be looking at in a linear search, so all you get are cache hits. When using a linked list they probably aren't, though, and that means cache misses = bad performance. So linear search in a vector will usually be much faster than linear search in a linked list, due to data compactness. In his slide he says he didn't have to use linear search for the vector example (of course he didn't have to; random access would have been much faster) but he did, just to be fair when comparing performance with the linked list.
graph expotential forms,Quadriatic forms in Linear Vector ,ot Tensor Algebraic Equations ,Can be deduced from basic Euler forms deriv from the Natural exponent e", and Pi !
Oh I'm sorry, I can't hear you over my computer running Ubuntu that's written in C++, with a game engine I wrote in C++ that is not a text RPG, and the numerous programs I wrote with wxWidgets that are written in C++.
Can someone please explain how the memory requirements @4:47 were deduced. I am off by a factor of 2 for both the vector and linked lists (I get 0.8MB and 3.2MB). Ex for list: (64x4)/8 * 100,000 = 3.2 MB.
This talk only explains why linked lists suck at handling random access when you have enough continuous free memory to allocate an array of the right size. Random access memory is designed for fast random access using pointers and the fastest general purpose structure in that environment is the array. So anything that can be done with an array should be done with an array. All other structures (lists and trees) are performance trading fallback solutions to get around situations where it's impossible to use an array. Mainly when lots of allocations and deallocations have happened and memory is so heavily fragmented that you can't find a single continuous free memory block large enough to hold your array.
Memory fragmentation is not really an issue anymore. The CPU/MMU can map the 4 KB pages in any order necessary (and the allocator can use fixed-size buckets for smaller allocations), so it can always build a contiguous block of virtual memory for your array (unless you're doing no-MMU embedded stuff, or you're doing 32-bit builds instead of 64-bit and you exhaust the address space).
As someone who uses Python a LOT: whenever he says list, I'm thinking about an array... so an array is just as good as another array. Trolling aside, this is quite a nice talk.
You haven't done any programming that required a combination of high- and low-level programming, right? E.g. device drivers, graphical scientific modelling, etc.?
Of course I do. Also, it sounds like I was asking a question; it could have been responding to someone's comment. I must say I forget what the comment was about. That comment may be gone by now, it was 7 years ago, more than 2000 days. I reckon it could have been one of those C and C++ comparison comments.
@@gerjaison Not being able to reply to a comment stopped in 2009 or something - not 2013. Sooo... this comment is most likely directed at the creator of C++ xDDD PS: "I reckon it could have been one of those C and C++ comparison comments" - dafuq? Stop doing drugs man
@@47Mortuus No, simple as that. I wasn't directing it at the video. I was asking a question, and it was directed at someone; whatever it was, I can't remember. But looking at my comment again, I am certain I was asking whether that particular person had high-level programming experience like graphics, and low-level programming experience like device driver development, etc. Which is nothing related to this video; it has a conversational tone. I have another comment here about using a DLL in VB6. That one was definitely directed at someone. Read below; that one wasn't a reply to anybody.
"Imagine this is a graph"
He handled that well! I felt badly for him. Good talk.
Nevertheless, it would have been a much better talk if he hadn't accidentally deleted his slide.
I'd argue that by making us visualize the graph he cemented the ideas further: instead of just looking at the graph for a moment, we had to store it in our heads based on his descriptions, and that increases retention.
Does anyone have the graph which he was trying to present? Just to confirm whether I got it right.
you can find it here : www.stroustrup.com/Software-for-infrastructure.pdf
lol, tbh though, most comp sci assignments aren't about real world applications. They're more there for you to learn how a computer works. The programming experience is a plus.
Love how I'm searching for videos trying to understand linked lists and here's a video of the creator himself saying not to use them.. motivating lol
:))
Vishal Nishad sorry **creator of the language itself
I fell upon this same video, and I was wondering if I should share it on my class forum... a slap in the face to the Professor :D.
noobenstein, you have the IQ of an average gerbil.
Linked lists were developed in 1955-1956 by Allen Newell, Cliff Shaw and Herbert A. Simon at RAND Corporation as the primary data structure for their Information Processing Language.
Optimizing cache accesses and memory traversal patterns is by far the most important optimization you can make (beyond, of course, choosing the lowest-complexity algorithm). And this is more true today than ever before. You should look into the latest research on "cache-oblivious data structures and algorithms". In my personal experience, reducing indirections, favoring memory locality, and making traversal patterns predictable will typically improve speed by 10x to 100x on a modern arch.
I have experienced this in real life while optimizing frame caches for video editing and tile sorting for tiled rendering. Switching to arrays made both operations immensely faster. For tile sorting, an operation that took half an hour was reduced to a few seconds.
Holy shit, that's something!
Vendicar Decarian What?
Linked lists are NOT good at linear access. Theoretically they should be fine, but because CPUs are orders of magnitude faster at processing memory that can reside in the cache and doesn't need to be fetched, linked lists fail to perform in a real-world environment. Unless you only ever insert into a linked list, it's probably never the right collection to use. And I can't remember the last time I only ever randomly inserted into a collection but never read or iterated its data.
@noobenstein Exactly. I use them as intended and they're great.
and that's why you use the right tool for the right job. all data structures have pros and cons
always allocate a 5 GB array, that way you don't have to use linked lists
What? Did you heard of vectors?
@@KP-md3oe ok
this is some big brain stuff right here
@GeklmintendontofAwesome what? How is it going to be used if it's not allocated?
Yes, preallocating all the memory that your program needs at once should be the default way of programming. There are exceptions, but they need to be proven first.
When my C++ professor asks me why i didn't complete the project on linked list.
i'll show him this video
You should still know how to create them though.
@@valizeth4073 stfu
@@valizeth4073 Only idoits try to know how to create a linked list or red black tree. Useless data structures and should be avoided like plague.
@@coshvjicujmlqef6047 You don't have a clue what you're talking about kid.
@@NeilRoy I have destroyed all your rants with rational arguments. You are a loser sorry.
I just love how he explained that imaginary graph. 😂
One time, for a hashmap implementation (that I needed to be really fast, faster than the STL hashmap) I created a vector of linked lists as the buckets. HOWEVER, that was fairly slow to both create and delete and the data was spread out, so instead I packed all the linked lists into a single vector, where each element of the vector is a node of the linked list (KeyHash, Key, Element, Next)
And another vector, which keeps the beginning element for each bucket (this one is resized manually when the buckets become too full (>75%) and the other one is resized whenever it decides)
This sped up the creation/deletion of the hashmap by a lot and (probably) reduced the cache misses due to all the elements being stored in 2 vectors
(the largest that the hashmap got was around 150 million keys, the task was "determinization of finite automata", mathematically solving that task is of the order of 2^n, but i needed a fairly quick solution for a university project)
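A minimal sketch of the layout described above (type and member names are made up, and the resizing/rehashing the comment mentions is omitted): every node of every bucket chain lives in one vector, and chains link by index instead of by pointer, so the whole table sits in two contiguous allocations.

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct Node {
    std::size_t  hash;
    std::string  key;
    int          value;
    std::int64_t next;   // index of the next node in this bucket, -1 = end of chain
};

struct PackedMap {
    std::vector<std::int64_t> heads;   // one entry per bucket, -1 = empty bucket
    std::vector<Node>         nodes;   // all nodes of all chains, packed together

    explicit PackedMap(std::size_t buckets) : heads(buckets, -1) {}

    void insert(const std::string& key, int value) {
        std::size_t h = std::hash<std::string>{}(key);
        std::size_t b = h % heads.size();
        nodes.push_back({h, key, value, heads[b]});              // new node points at the old head
        heads[b] = static_cast<std::int64_t>(nodes.size()) - 1;  // and becomes the new head
    }

    const int* find(const std::string& key) const {
        std::size_t h = std::hash<std::string>{}(key);
        for (std::int64_t i = heads[h % heads.size()]; i != -1; ) {
            const Node& n = nodes[static_cast<std::size_t>(i)];
            if (n.hash == h && n.key == key) return &n.value;
            i = n.next;
        }
        return nullptr;
    }
};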
Word graphs - I prefer them - I can choose my own colours
6:43 "True OO" = ArrayList in Java : )
This is from "GoingNative 2012 - Day 1 - C++11 style" at 0:45:00 if you want the full lecture. ua-cam.com/video/m0H5bUPfwn8/v-deo.html
Did he ever post that graph somewhere?
It's not a bad example for illustrating a largely fictitious dilemma. But it's pitting two solutions, out of about a dozen, against one another as if they were the only things going.
But it does somewhat show that a lot of optimisation comes down to 'know your hardware'.
What are the things to avoid with linked lists?
-Apparently, having a system with a cache. (this is near universal today, but it was actually the exception, not the norm in the 80's)
-Having to search the list for specific entries
- Having a trivial amount of data per node, such that the link pointers themselves dominate the storage requirements.
Obviously, searching is a very big issue in general.
Of course, in terms of teaching things, linked lists have several important relatives, most notably the tree.
A tree has a lot in common with a linked list on some level, but because the organisation is different, the result is also quite different.
Again though, know your problem space, know your hardware.
Do *not* pay attention to this advice if you are developing for CPU(s) without cache. The linked list was invented in 1955-1956, when no CPU had cache. Without CPU cache, a vector shoving each of its elements after the insertion/deletion point over by one location is very costly, and the random memory access itself of the traversal of the linked list incurs no performance hit versus incremental access.
I'm not even sure that CPUs without cache exist. But even if they did 99% of CPUs have cache so not sure what your comment is supposed to mean.
@@ayaanqui Most CPU models have no cache. Almost all the CPU models I develop for have no cache.
@@RetroDawn false. The majority have cache.
@@RetroDawn I will have to disagree with you on that, most CPUs (at least modern ones not sure about older ones) have many cache layers. Maybe the ones you work on don't but the majority of them certainly have cache
@@ayaanqui Smaller microcontrollers are cacheless, as are many but not all embedded controllers. I use everything from 9S08 (8 bit), 9S12 (16 bit), PIC16, PIC18, ARM, PowerPC. Only the ARM and two of the three PowerPCs I use have caches. There are actually three timing constraints: one is the presence or absence of an instruction cache, another is the presence or absence of a data cache, and the third depends on the memory type. Smaller systems like embedded controllers that run in place (code is in FLASH) often have static RAM, so there is only a small latency for page changes. Larger RAM has a precharge/rewrite cycle, and crossing a boundary incurs a very large delay, often 70 ns or more.
And the lessons learned:
1. linked lists make shitty associative arrays.
2. if you need ordered associative array, just use the fucking thing.
Bjarne looks like my uncle lol. Interesting and informative video!
is your uncle also smart :) ?
in the full video he basically says Linked list should not be your default data structure.
When you get to the point where you are starting to worry about insertion, deletion and search time, it's time to use a B+ tree, which by its very nature is indexed.
Of course use B+ tree. Linked lists and RB tree are harmful
That missing chart was too much of a tease. I had to repeat his experiment. See the charts and the source code here. marketmovers.blogspot.com/2016/04/how-to-make-c-code-run-90x-faster.html
Sorry, I know what you mean. That popup comes from a 3rd party library, and sometimes it goes crazy. I'll ask our web people if they can tweak the settings on that window.
I'm glad someone on this thread has a functional brain and takes performance seriously. Also good work on providing the missing slide.
Thanks for providing this. I want to point out two things. Firstly, the formatting on your page is messing up the code. It's inserted angled quotes where there should just be normal quotes. I'm not complaining, I just figured you'd want to know. Secondly, after seeing this code, I'm kinda dumbfounded. This one's not on you, so please don't feel insulted. Randomly iterating through a linked list and deleting the node you land on? Of course performance is going to be exponentially worse than an array! That's not what people use lists for! lol... Lists (hopefully) aren't used in contexts where you need to randomly access data. They're used when you have a sequence of data that you want to iterate through, applying some function on all entries of the list, one after the other, in order, not randomly. Stroustrup should have known better. Maybe he's just an old man, and with one of his slides missing, he was flustered and forgot to mention this detail. I just hope he doesn't believe this is in any way proof that vectors are always better than lists. Certainly, they often are. On certain CPUs, maybe they even always are. Computerphile made a decent video comparing the two datastructures: ua-cam.com/video/DyG9S9nAlUM/v-deo.html
Title is incorrect. It should be "WHEN you should avoid linked lists".
then why not store pointers in a vector?
Linked lists are great! Pointers are great too! When I got my first c++ compiler in the 1980s I said what should I write? My conclusion was to rewrite my c linked list code as a c++ linked list object. I have used that object uncountable times, but never once have I needed to keep a list of integers. Real life applications often involve managing data records of 1000 bytes or larger. Mr. Stroustrup was also correct in pointing out that a lot of linked lists aren't large. For example, a patient's chart does not typically exceed 1000 records. While I still try to program efficiently as if I only had 640K of memory, I don't worry about the overhead of pointers and their speed because today's computers are so fast and c++ is so efficient that everything done in memory takes an instant. It is displaying stuff, accessing files, and doing communications that slows things down.
>I don't worry about the overhead of pointers and their speed because today's computers are so fast and c++ is so efficient that everything done in memory takes an instant.
5 years later: software has become even slower and buggier than ever before.
@@youtubeenjoyer1743 Microsoft's original DOS was like 100K or less if you deleted utilities. Now Windows is hundreds of megabytes maybe even gigabytes. You have to wonder the level of inefficiency and kludge they used to produce so much code to do so little, with a lot of bugs. A lot of the code is just alternate proprietary approaches to lock you into Windows. The bugs are really a management and not software problem. Today's computers and the Internet are so fast that the only way you can be slow is to build in slow technology.
So I'd just like to know if I understand this right, and I would appreciate it if anyone would let me know if I got it wrong, but: arrays should be used mainly for structuring numbers that will soon be used in calculations, and a linked list should hold a list of elements, such as a list of arrays of numbers, or arrays of strings, or some data in general, and iterate through them to do said calculations. Thanks in advance for the clarifications.
The linked list's advantage is, theoretically, in mid-list insertion/removal: you iterate to the position, and to remove/insert you just change 2 pointers.
With contiguous memory (array/vector), to insert/remove an element at a middle position you iterate to it, but then you have to shift several elements right/left: if you want to insert a new element at position 6 you have to shift right all elements from 6 until the end of the array/vector. (That is, assuming you want to preserve order.)
The thing is, in practice the pointer-chasing penalty of linked lists destroys, in most cases, any advantage they have.
On a contiguous-memory data structure (vector/array) the processor will iterate very fast, doing: "get value, get value, get value", while on a pointer-heavy structure it will iterate a lot slower, doing: "get pointer, follow pointer, get value, get pointer, follow pointer, get value, get pointer, follow pointer, get value".
The gap gets even worse because the processor has a small "super ultra fast memory" (the cache), where in the contiguous-memory case it will go: "get value, get value, HMM - I SEE! HE NEEDS THIS REGION OF MEMORY - LET ME MOVE IT TO THE CACHE!".
With linked lists, each value is scattered across memory, with pointers pointing to distant parts of memory, and the processor will go "OMG - I CAN'T MAKE SENSE OF THIS" and will not be able to make use of the cache.
There's a great write-up about this, applied to games, here: gameprogrammingpatterns.com/data-locality.html
Both this video and that write-up are saying: be careful when using pointer-heavy data structures (like linked lists), because they might have very bad performance due to pointer chasing.
Thank you!
@@Aleh_Loup Wonderful summary, thanks Alessandro 🙏
Great summary
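A minimal sketch of the mid-insertion difference described in the comment above (function names are just for illustration): the list walks to the position and rewires two pointers, while the vector jumps straight there but then shifts everything after the insertion point.

#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

void insert_mid_list(std::list<int>& l, std::size_t pos, int value) {
    auto it = l.begin();
    std::advance(it, pos);                                  // O(pos) pointer chasing
    l.insert(it, value);                                    // O(1) relinking
}

void insert_mid_vector(std::vector<int>& v, std::size_t pos, int value) {
    v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos),  // jump straight to the spot
             value);                                        // O(size - pos) contiguous moves
}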
nice video. link for the full talk, anyone?
This is the keynote for Going Native 2012, the full video can be found here: ua-cam.com/video/OB-bdWKwXsU/v-deo.html (the above part can be found at the 47 minute mark).
How about skip lists? Those seem like a reasonable compromise for fairly large data sets.
@@josephp.3341 It's a trade-off, of course, depending on data size etc. I myself prefer hash approaches: really don't like B-trees for two reasons. One, stuff has to get moved from place to place all the time, which is unnecessary data (& even IO) traffic. And two, the code is always a lot more complex, which leads to a much higher probability of bugs. I HATE complicated code...
People don't seem to understand that O(1) is not FASTER than O(n), it just SCALES better; there's a breaking point. Imagine a "hashcode as a service" rest call vs doing it in memory: it'd still scale better, but it'd take an awfully large 'n' to get there.
I see people missing this in PRs: streaming/looping over an array is often faster for small arrays than something like a `HashSet`. And in practice, it's really unlikely either makes a tangible difference in most use cases.
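The same trade-off in miniature (a sketch; which one wins for small n is machine-dependent, so measure before deciding):

#include <algorithm>
#include <unordered_set>
#include <vector>

// Membership test two ways. For a handful of elements the O(n) linear scan over
// contiguous memory is often faster; the hash set only pulls ahead as n grows.
bool contains_vec(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();
}

bool contains_set(const std::unordered_set<int>& s, int x) {
    return s.count(x) != 0;   // O(1) average, but hashing plus an indirection
}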
Yeah, and I assume all Call of Duty games are written in the resource-hogging Java too? Hell no. Minecraft would be better had it been written in C++.
Video title should be: Why you should avoid **Doubly** Linked Lists
Or sub-titled: Stroustrup learns how L1 Cache usage is critical for performance sensitive code.
"What every programmer should know about memory"
* www.akkadia.org/drepper/cpumemory.pdf
There was no need for the veiled ad-hominems.
Nice link tho thx for that.
Julien Cossette I was simply stating the **facts:** Stroustrup was not aware of how modern cache usage affects performance. He is not the only one who lacks _pragmatic_ knowledge about Data-Oriented Design:
* research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
This is OOP's biggest weakness -- it does not scale to high performance, because programmers think about the uncommon case (one) instead of the common case (many).
There was no ad hominem insinuated or implied.
Sorry, I just perceived it that way. But you're right, programmers are getting worse and worse at understanding the real impact on the hardware of whatever they are coding.
But then again, sometimes you need only a queue data structure, and what can be more efficient than a linked list for that particular purpose?
If we can make assumptions about the maximum size, then a ring buffer (array-based) would be faster. Especially if it is power-of-two sized: you only need a single AND operation to wrap the indices.
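Something like this (a sketch; a fixed power-of-two capacity is assumed, and the indices are free-running so unsigned wraparound keeps the arithmetic correct):

#include <array>
#include <cstddef>

template <typename T, std::size_t CapacityPow2>
class RingBuffer {
    static_assert(CapacityPow2 != 0 && (CapacityPow2 & (CapacityPow2 - 1)) == 0,
                  "capacity must be a power of two");
    std::array<T, CapacityPow2> buf{};
    std::size_t head = 0, tail = 0;   // head = next read, tail = next write
public:
    bool push(const T& v) {
        if (tail - head == CapacityPow2) return false;   // full
        buf[tail++ & (CapacityPow2 - 1)] = v;            // single AND wraps the index
        return true;
    }
    bool pop(T& out) {
        if (head == tail) return false;                  // empty
        out = buf[head++ & (CapacityPow2 - 1)];
        return true;
    }
};

// Usage: RingBuffer<int, 8> q; q.push(1); int x; q.pop(x);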
Why are caches good at shifting elements? I couldn't find a good explanation for this via Google.
It has to do with temporal and spatial locality--if you are traversing sequential memory in a predictable pattern as in a vector's memory layout, the cache can prefetch large chunks of this memory and access it with little overhead. Meanwhile, a linked list operation involves following pointers to random locations with no locality. So Stroustrup is arguing that even if the forward shift for a middle-of-vector insertion or deletion may look bad from a time complexity standpoint, it's not that bad thanks to hardware optimizations that linked lists are less able to exploit.
I am just starting out in C, C++, and Python, and I hope to be this educated about programming in a few years.
Well, how are you now?
In fact, how are you by now?
Shame the graph didn't show up. The slides must have been stored as a linked list.
I wanna know who gave advice to the creator of cpp to write more oop style...
The inventors of Simula; Specifically, Kristian Nygaard. He was also part of the inspiration for the original C With Classes that was Bjarne’s project preceding C++
So when exactly should I use Linked list ?
This makes sense after seeing the video; all the branch predictors in a modern CPU will fail with random access. Good talk!
If performance is your biggest concern, you say "hello, assembler." As a practical matter, I would expect both graphs to be linear (for inserting/deleting 1 element in a structure that has N) or quadratic (for inserting N elements starting from 0.) The algorithm on an array is dominated by moving over the elements (which comes to a linear operation.) And the algorithm on a linked list is dominated by the linear search (also a linear operation.)
Thank you Mr. Stroustrup! I learned a lot from your lecture!
@jqbtube Ooh, stop crying.
Agree with you. But I think Linus Torvalds wrote the Linux kernel in C.
The graph was just at the end of a dangling reference
i got you fam, have a like
What if you have complex heap allocated objects stored in the containers? How does it compare then?
Same, but both will be slower and the difference less significant. The vector would load all your pointers to the heap objects in fewer cache lines, though, so you get to your indirections faster.
Is the rest of this talk available somewhere?
The Linux kernel is written in C and assembly. Applications can be written in whatever as long as they produce an executable binary.
From this video I learned: you get a lot of salty comments from confused people when you misrepresent the content of your videos.
what do you mean?
See my other reply, if you are interested in my response. And "my way of thinking" about hardware limitations seems to be the general trend of thought for the last 50 years - else why all the effort to make bigger, faster machines?
It's great to avoid indirection, sure. This is as true of my AMD64 as it was on my 8080A. But there are many tasks for which the indirection itself avoids many more cycles, such as inserting into a list as opposed to shifting elements in an array. Bjarne is ignoring the overall task.
I didn't fully watch the vid, but from stack overflow: In general, use vector when you don't care what type of sequential container that you're using, but if you're doing many insertions or erasures to and from anywhere in the container other than the end, you're going to want to use list. Or if you need random access, then you're going to want vector, not list. Other than that, there are naturally instances where you're going to need one or the other based on your application, but in general, those are good guidelines.
In practice 99.9% of the time you want a vector. Random access is a way more common case than random insertion. And even the cases that need random insertion usually end up needing std::map anyways.
For small collections (up to 5-10 elements) lists can do better since their structure (to be initialized) is very simple. If your language of choice makes use of that often then linked lists are a way to go. Besides that, it all depends on application. If random access is important use vectors. If sequential access is a fav then use lists.
@@randomseed I would never use a dynamic data structure like a list for such a small amount of data.
"My graph disappeared!" Must be a really fast implementation of a data structure whatever it was. ;-)
they must have not been using linked lists
Is there the entire talk somewhere?
Oh, these artificial examples with linked lists that only contain one integer. If you have a more typical data structure, the pointers are a much smaller percentage of the allocated memory. Also, the memory blocks in the heap are usually in large blocks, and are close together, so the caching is usually not that big of a factor. From a programming standpoint, having to shuffle things around in order to insert something in the middle is much more likely to have overruns because people won't test all the edge conditions. So, i am unconvinced by the arguments in this video.
For large types, the overhead is mitigated substantially, especially if they're cache-line aligned along with the list pointers. Also, serious uses of linked lists like in the Linux kernel back them with efficient allocators like free lists which tend to allocate the lion's share of nodes contiguously.
Also if you care about order, you adapt the list to a tree so finding the insertion/deletion point becomes a binary search. If you don't care you can usually use FIFO/LIFO.
It is true though that modern processors are astonishingly fast at sequential memory accesses because of cache pre-fetching, and should be able to pipeline most array processing, so the initial implementation ought to be simple to save developer time.
Performance-wise, I can see how a vector be better than a list. But I'm curious about what the comparison is memory-wise. Does a vector require far more memory during resizing, or is there some crazy trick that the runtime library uses to shift around data and reallocate in place? The latter seems unfeasible to me. I would imagine if I had a 2KB vector inside 4KB of ram, and then I tried to add 1 more element to vector, I would get an out of memory exception.
There is a neat trick for that. See: man7.org/linux/man-pages/man3/memmove.3.html . Gives you the illusion that you have sufficient temp store.
Apoorv Kumar memmove() is for shuffling elements over on insert & delete, but when the allocation for the vector needs to grow, there's no way around the fact that you have to make a new allocation & copy (or move) elements from the old allocation to the new one, and memory usage will spike up during that operation. With a list, you don't have to do this re-size operation, but on most machines the list is going to use an extra 16 bytes per stored element (in addition to the per-allocation overhead which comes from the memory allocator). Quite often the stored elements are 8 bytes in size, so the list is at least 3x the allocated memory on a continuous basis, whereas the vector occasionally spikes up to 2x the memory requirement. So vector wins on memory usage in the common case of machine-word-sized elements, even when we're talking about maximum memory use as opposed to continuous. Another thing to mention: when you use a vector, whenever possible you want to call reserve() when you begin an operation that will add a number of elements (even if you only have a rough upper bound on that). Calling reserve() actually helps code readability, in addition to helping performance by reducing the number of re-size operations.
vonkruel - Right. I answered out of the context of a vector. Thanks for the correction :)
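To illustrate the reserve() point above, a minimal sketch (the function and counts are made up):

```cpp
#include <vector>

// If you have even a rough upper bound on how many elements an operation will
// add, reserving up front avoids repeated re-allocations (and the accompanying
// copies/moves) while the vector grows.
void collect(std::vector<int>& out, int expected_count) {
    out.reserve(out.size() + expected_count);   // one allocation instead of several
    for (int i = 0; i < expected_count; ++i)
        out.push_back(i);                       // no re-allocation along the way
}
```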
To those saying he chose a terrible example, remember his stated point of the example. Lots of people see this problem and think "this is a good use case for linked lists". That's it.
Did he ever post the missing slide?
Linked Lists. Why you should avoid Bjarne Stroustrup.
No linked list. Why we should avoid garbage programmers (who do not understand computer architecture) like you.
Is this the guy behind C++... Man, I remember cheating in my exam just to write down the founder of C++.
wait what
@@Lastrevio there was a question on his exam about who created C++. He didn't know it so he cheated.
What about arbitrarily scalable queues and stacks?
They have good amortized performance. They take a performance penalty when they regrow, and usually use more space. They regrow by a factor varying from 1.6 to 2; that is, when they get full they allocate a new region 1.6-2.0x bigger, copy all elements over, and free the old region.
They do not have the "pointer chasing" problem because it's at-most one pointer, that points to a contiguous region.
That is, on almost all implementations of queues/stacks - since they use an array/vector implementation. If you use a linked list as the backbone for a regrow-able stack/queue then it will have the same problems pointed on the video (too much pointer chasing while iterating the container).
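You can watch the growth policy yourself with a tiny experiment like this (the exact factor is an implementation detail; commonly about 2x for libstdc++ and about 1.5x for MSVC; the standard only requires amortized O(1) push_back):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Push elements and print the capacity whenever it changes, to observe the
// growth factor of this implementation's std::vector.
int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
}
```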
Why not have a mixture of array and linked list by preallocating an array and when the array is full, allocate a new array and have a pointer to it? Could you get the best of both worlds?
Matthew Mitchell regrow-able stacks/queues/arrays work more or less like this. They allocate an amount of heap memory, and have a pointer/reference to it. When it's full they allocate a new, bigger, amount of heap memory, copy the contents of the old region to the new one, and change the pointer/reference to this new region.
The linked list advantage is, theoretically, in mid-position inserting/removing: you iterate to the position, and to remove/insert you just change 2 pointers. With contiguous memory (array/vector), to insert/remove an element at a mid position you iterate to it but then have to shift several elements right/left: if you want to insert a new element at position 6 you have to shift right all elements from 6 until the end of the array/vector (that is, assuming you want to preserve order). The thing is, in practice the pointer-chasing penalty of linked lists destroys, in most cases, any advantage they have.
There are several good data structures that have better performance than both vectors and linked lists, depending on your use case. A heap, normally implemented using an array/vector, has O(logN) for inserting/removing, and can be used if you have logically ordered elements.
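As a sketch of that heap option: std::priority_queue is an array-backed binary heap, so the storage stays contiguous and cache-friendly while inserts and removals of the smallest element are O(log N). The values here are arbitrary:

```cpp
#include <functional>
#include <initializer_list>
#include <iostream>
#include <queue>
#include <vector>

int main() {
    // Min-heap backed by a std::vector.
    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;
    for (int x : {5, 1, 9, 3})
        min_heap.push(x);                     // O(log N) each
    while (!min_heap.empty()) {
        std::cout << min_heap.top() << ' ';   // prints 1 3 5 9
        min_heap.pop();                       // O(log N) each
    }
}
```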
If I want fast search, insertions and deletions I use a tree structure.
Matthew Mitchell While I agree that searching a tree can be very fast (especially B trees), inserting and deleting is not fast...at all
Linear search to insert on a linked list? Well there is your issue. I sped something up going from a vector to a linked list. Insertion sort (even using a binary search) works until the data becomes too large. Then you use merge sort.
But then how can I even reverse one?
Okay, I know I'm a little late here... but can someone tell me whether this holds true for arrays too in reality? I.e., will an array implementation be faster than a list?
Yes, but raw arrays cannot grow dynamically, so the whole point about insertions and removals is moot. Everything you do with a raw array is in constant time, searching aside, which can be done exactly the same as with a vector; a vector uses an array behind the scenes anyway. It's basically a resizable array.
@@casperes0912 Of course a LL has that advantage. But at least now I can use arrays in competitive coding more often!? Thank you for replying though!
Sir, instead of a linked list, what should I use? May I use arrays? Anyone please explain.
+Arun Prasath Elements in a linked list are scattered through memory, and to access the nth element in a linked list you have to follow n pointers.
Elements in an array (or vector), by contrast, are contiguous in memory, and you do not need to keep following pointers. You can access them directly.
Due to cache memory, an array (or vector) is a lot more efficient, since when you access one element its neighbours (a whole cache line, plus prefetched lines beyond it) are pulled into the cache as well. With linked lists this can't happen, since the elements are not together in a contiguous memory region.
Note that a linked list is different from the List abstract data type. A List is just an abstraction for a sequence of elements, which you can implement with a fixed-size array, with an auto-growable array (vector), or by linking the elements with pointers (a linked list).
Python List, Java ArrayList, and C++ Vector are all the same data structure: an auto-growable array. Which is a great data structure.
Linked lists, on the other hand, have very niche uses (but still have their uses) and most of the time are better replaced by arrays/vectors.
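A small sketch of the access difference (container contents are arbitrary): the vector access is a single address computation, while the list access walks n pointers, each of which can be a cache miss:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

int nth_of_vector(const std::vector<int>& v, std::size_t n) {
    return v[n];              // O(1): base address + n * sizeof(int)
}

int nth_of_list(const std::list<int>& l, std::size_t n) {
    auto it = l.begin();
    std::advance(it, n);      // O(n): follows n "next" pointers
    return *it;
}
```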
+Arun Prasath You should use a tree. An (a, b)-tree (with a big b if you're mainly inserting) would be good, here. But if you have to implement it yourself and don't want to put a lot of time into it, just use a binary tree. Trees have insertion ∈ O(log(n)) where n is the number of elements you already inserted. His stupid approaches both are ∈ O(n) (much, much, much worse) with a linear factor between them. Btw.: He claimed you needed a doubly linked list to do the insertion in an arbitrary place, which of course is plain wrong.
+Christoph Michelbach just write a custom allocator that allocates contiguously with the linked list. A char * would work.
zerphase So you still have to look up the address of the next block? What happens when something is deleted. A better solution to your problem would be an unbounded array.
+Christoph Michelbach you shouldn't need to. It's a stack allocator. You can delete in the order they were placed on, or roll back to the beginning of the stack.
Personally, I'm used to doing game programming and keeping as close as possible to C for performance and compile times.
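Roughly the kind of arena/stack allocator being described, stripped down to the core mechanism (this is a sketch, not a drop-in std::allocator, and it assumes power-of-two alignment): node-sized chunks come out of one contiguous buffer, so the list's nodes end up packed together, and everything is freed at once by rolling the offset back.

```cpp
#include <cstddef>
#include <vector>

class Arena {
    std::vector<std::byte> buf_;
    std::size_t offset_ = 0;
public:
    explicit Arena(std::size_t bytes) : buf_(bytes) {}

    // align must be a power of two.
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t p = (offset_ + align - 1) & ~(align - 1);   // align up
        if (p + size > buf_.size()) return nullptr;             // out of space
        offset_ = p + size;
        return buf_.data() + p;
    }
    std::size_t mark() const { return offset_; }                // remember a point
    void rollback_to(std::size_t mark) { offset_ = mark; }      // "pop" in LIFO order
};
```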
Thought-provoking video and comments. Very edifying for me, still learning the ropes with data structures, but savvy enough with performance analysis & architecture to follow and judge the arguments (both in the video and in the comments) fairly & profitably. Especially worthwhile (for me at least) are a few sage comments on how stl gets stuff done.
Just when I thought I had found the Holy Grail by using linked lists and their child structures, this video popped up on the screen :) Most of the tutorial sites teaching beginners like me the basics of C++ apparently get this point wrong.
This dude seems to know what he is talking about. He should create his own programming language or give courses on udemy
Fuck!!! Bro he created the c++ programming language 😂😂😅
@@samarthtandale9121 u can't get jokes can u?
@@Person-hb3dv Oh ! sometimes I guess ... lol 😅
Basically, Stroustrup is saying:
If you use Linked Lists with a Schlemiel the Painter algorithm then they suck.
Well, he is right about that. But the argument is a strawman. You don't have to use Linked Lists like that, and you shouldn't.
He fails to prove that Linked Lists are bad, he only proves that certain ways of using them are bad.
But a cache miss is still a linear operation. The linked list algorithm should still not go exponential.
Strictly speaking, a cache miss IS a linear memory access. The processor misses the cache so it accesses main memory. 50 to 100 times seems a little high unless you think it needs to access the page directory and page table every time. But disabling the cache outright should only cause a factor of four slowdown.
That's why you would use a vector of pointers
I wish people would avoid linked lists, namely, Facebook.
Surely, linked lists are better when you have big elements that need to be inserted and deleted often, right?
The linked list has to be the most overly hyped data structure in the history of computer science! Linked lists are WORTHLESS! You can do *everything* that you need to do by using a vector/array.
Cool. Now implement a cactus stack using vectors/arrays.
Why didn't he use Paint?
Sure (I hope you are being sarcastic)! Even if you are serious about it, there's nothing wrong with Visual Basic (6 or .NET).
I have done Visual Basic 6 in combination with C/C++ (making the DLL), with VB6 calling those C declarations at a high level. Not much difference in speed, since the binary DLL does the hard yards.
My point is you won't get C/C++-like performance with other languages except assembly, especially if there is huge floating-point data, e.g. some high-order differential equation.
If we have very large objects stored, with frequent writes in the middle and frequent sorting, then a linked list will perform better than a vector, because in a vector we need to physically move/copy the elements, and the size of the objects becomes a constraint. This is not a problem in a linked list.
To swap two large resource-owning objects in a linked list we can simply swap them by relinking the nodes :) unlike in a vector, where they need to be stored contiguously and a swap needs a costly copy/move operation.
What about a vector of unique pointers that point to structs? The vector itself holds only a small amount of data even if it has 1000 elements, so re-allocations are fast.
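Something like this sketch, presumably (BigObject is a made-up stand-in for a large type): inserting in the middle shuffles 8-byte pointers rather than the big payloads, at the cost of an extra indirection on every access.

```cpp
#include <memory>
#include <vector>

struct BigObject {
    char payload[4096];   // hypothetical large, expensive-to-copy element
};

int main() {
    std::vector<std::unique_ptr<BigObject>> objects;
    objects.push_back(std::make_unique<BigObject>());
    objects.push_back(std::make_unique<BigObject>());
    // Inserting in the middle moves pointers, not 4 KB payloads:
    objects.insert(objects.begin() + 1, std::make_unique<BigObject>());
}
```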
I guess the slides of that presenation were implemented as linked lists :-)
This is a very interesting video, not because of his conclusion, but because this CS guru has chosen a misleading example to falsely make his point. For this example, both structures take Theta(n^2) time. No matter his claim, this is NOT an example where Linked Lists should thrive. It has indexed access, which kills LinkedLists. This example might show that the Vector has much better constants, and that for examples where they both have the same asymptotic runtime, Vectors will do better, but there are certainly simple examples where Vectors will take Theta(n^2) while LinkedLists will take Theta(n). For those, you probably don't need google sized lists to see LinkedLists do better.
For more in-depth commentary, see: ua-cam.com/video/K8H4LGWB5vQ/v-deo.htmlm40s
+Algorithms with Attitude yes, you're right! Nice Algorithms videos in your channel by the way. Always cool to see more Algorithms & Data Structures material freely available.
+Algorithms with Attitude your video is more theory than actuality. In theory it's true that a LinkedList is better for insertions and deletions than an ArrayList, but this is not the case in reality. Here's why:
1 cycle to read a register
4 cycles to reach L1 cache
10 cycles to reach L2 cache
75 cycles to reach L3 cache
and hundreds of cycles to reach main memory
With that in mind, let's continue. In order to avoid the cost of fetching data from main memory, i.e. to avoid cache misses, the hardware tries to hide the main-memory latency via a number of techniques. Basically three major bets are taken on memory access patterns:
Temporal: Memory accessed recently will likely be required again soon.
Spatial: Adjacent memory is likely to be required soon.
Striding: Memory access is likely to follow a predictable pattern.
An ArrayList's data structure has a one-to-one mapping with hardware memory, which means it benefits a lot from the memory access patterns above, hence fewer cache misses. On accessing the first element of the ArrayList, the following elements will be prefetched into the CPU's caches, so CPU cycles are spent on performing the operations required for inserting and deleting data rather than on fetching data from memory. A LinkedList, on the other hand, doesn't fare so well: it's implemented using pointers, which doesn't take advantage of the memory access patterns stated above. As a result there is a higher probability of cache misses with a LinkedList, so a lot of CPU cycles are wasted retrieving data for the next operation.
+Josiah Ebhomenye I understand this, but it can really be summarized as "The constants hidden by asymptotic notation are much worse for linked lists than for arrayLists." I have no problems with that. My problem is that he implies (everything but outright states) that for his given example, linked lists are asymptotically faster, when for this example, they are not. Saying that maybe for examples big enough that google would care about them strongly implies that for big enough N, asymptotic behavior would matter, but even for the large N he shows, it doesn't. That is really misleading because, for this example, both structures take order N^2 time, and nobody will argue that N^2 with big constant will beat N^2 with a small constant, if only N were larger.
He has a great point to make, and this is only one small part of a much longer talk. In the course of a talk, it is really hard not to slip up on a sentence or two. The example is nice, it shows the difference in performance, it is all good, except the part when he implies that this example is asymptotically better. That leaves the viewer with the impression that linked lists are always worse, unless you have an absurd number of items, when this example doesn't show that. (He knows this. It is poorly worded sentence or two in a talk. It happens.)
With 100K+ views, many by new students who won't think it all the way through, I thought I would point out the problem with those sentences. Walking away with "ArrayLists have much better constants" is good. Walking away with "never use linked lists" is not justified by this example. That isn't a controversial statement, Stroustrup would agree with it, and in fact does: isocpp.org/blog/2014/06/stroustrup-lists shows his more organized thoughts on it. (I didn't see this until after I had made my video and post.)
+Algorithms with Attitude Sure, but that's no reason to use linked lists. In those cases a binary tree would win every time.
Scias In no way did I intend to become "the great defender of linked lists", and I do understand that they are perhaps best used just as a teaching instrument during an introduction to data structures. But of course you can come up with uses when linked lists are better than balanced trees too, such as for a stack, queue, or deque. (I also understand that for those 3 cases, array based approaches can also do everything in O(1), with a much better constant than linked lists.) Given a choice of one, a balanced tree is more useful than a linked list, but it is easier to come up with cases when a linked list is better than a balanced tree than cases when it is better than an array based list. And of course, they are much simpler to code and understand than a balanced tree structure.
If that was true, why is Google, Facebook, Microsoft, and other very large and successful companies investing so heavily in C++? (and in fact, sending their employees to GoingNative 2012 to listen to Bjarne Stroustrup?)
Pointers => cache misses, so you get bad performance traversing your objects anyway. And if you have large data structures, then you aren't doing number crunching anyway.
This is why you should avoid 360p videos
The problem here is that Stroustrup completely ignores the problem/cost of memory allocation.
Sure, you can push items around in an array really fast, as long as you can pre-allocate a big enough linear chunk of memory in advance to hold that array.
But when you can't predict the size in advance and you find yourself needing to expand the array, it can get hugely expensive.
The other problem is the assumption that linear access is the norm. I seldom need to do that; usually I want a specific list item via a pointer.
Uhh, preallocation is a solved problem. std::vector simply doubles the allocation size every time it runs out. And, yeah, it recopies everything - but this is still faster than whatever list does because it's a linear access (=the prefetcher does a great job) and it can use SSE operations (=16 bytes per instruction) and read and write on the same cycle.
Google chrome, Firefox and Safari are mainly written in C++. I guess those web browsers are, as you like to call it "noobs".
This is a really bad example. (For large values of N) neither vectors nor linked-lists are a good choice for the scenario he proposes. Not to mention that the curve he describes as "exponential" is certainly only quadratic.
I think you missed the point. This is about spatial locality.
No, I got the point. My problem is that a large fraction of the programmers who watch this video (or went to this talk) will think "If I have a problem that requires random access including insertions and removals in a sequence then Bjarne told me that a vector is the best data structure to use, so I'll use that!"
Pat Morin It would appear I missed YOUR point. =)
Levi Burton His spatial locality argument fails. Say I have his linked list of 100,000 elements. I insert in the middle of it. As he says, I have to traverse to the middle, risking a lot of cache misses.
Conversely, using his vector of 100,000 elements & a binary search, I'll hit less than 20 elements scattered over the list, getting a few cache misses. Then when I shove 50,000 elements over by 1, they'll *all* get cache misses.
Either way, i'm looking at cache misses for (on average) 1/2 of the list for every insertion and deletion. And his vector requires read *and* write for all those elements.
In any case, his whole argument is predicated on random access to the list/vector, which is not what linked lists are best for. They're great for a queue, and vectors are horrible for that. In short, use the right tool for the job.
jursamaj Actually, the number of cache misses for the vector is more like 50,000/L where L is the width of the cache line. Usually L is quite big, which is what makes the vector faster.
But you're right; neither a linked lists nor a vector is the right tool for this job.
Wait... like what? Which languages are both higher level(or more convenient) and faster than C++? also, you know that Google uses C/C++ as their main programming language right? Until people actually start seriously to invest time into run time optimization(which is kinda harder to implement on C/C++/Java/C# than languages like Python/Ruby), C/C++ will continue to be the standard for speed.
unrolled linked lists :)
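For anyone unfamiliar, a rough sketch of an unrolled list node (the chunk size of 15 is arbitrary): each pointer you chase brings a whole block of contiguous elements with it, so you get far fewer dereferences per element than with one element per node.

```cpp
#include <array>
#include <cstddef>

struct UnrolledNode {
    std::array<int, 15> elems;   // elements stored inline in the node
    std::size_t count = 0;       // how many slots of `elems` are in use
    UnrolledNode* next = nullptr;
};

// Traversal touches 15 contiguous ints per pointer dereference instead of 1.
long long sum(const UnrolledNode* head) {
    long long s = 0;
    for (const UnrolledNode* n = head; n != nullptr; n = n->next)
        for (std::size_t i = 0; i < n->count; ++i)
            s += n->elems[i];
    return s;
}
```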
Nice to hear him explain that 'true OO style' isn't a good thing, with all the extra pointer chasing killing modern computers.
He used a linked list for inserting that graph, that's why it disappeared!
Nice for queues though…
MFW the binary search, for such cheap comparisons, is actually a pessimization compared to a linear search of the vector.
The claim is deceptive. First it sounds like he is telling why you shouldn't use linked lists at all, but then he only really explains that you shouldn't use linked lists if you need to search random elements out of it. You know, there are situations where you only need to traverse through the list linearly without searching any particular element, because something has to be done to all of the elements, but along the way you might also want to remove some of the elements out of list smoothly.
Data locality and cache-friendliness also matters (a lot), which is the point of this video imho. When you access a position in an array, all values that sit in the same block are brought to the cache if they're not there already.
If using a vector, these are the next values you would be looking at in a linear search, so all you get are cache hits. When using a linked list they probably aren't, though, and this means cache misses = bad performance. So a linear search in a vector will usually be much faster than a linear search in a linked list, due to data compactness.
In his slide he says he didn't have to use linear search for the vector example (of course he didn't have to, random access would have been much faster) but he did, just to be fair when comparing performance with the linked list.
Graph exponential forms, quadratic forms in linear vector or tensor algebraic equations, can be deduced from basic Euler forms derived from the natural exponent e, and pi!
So... imagine this to be a graph...
this video terminates before Bjarne finishes.
And what data structures are used in implementing the Dictionary? Your hash leads to an index... which is an address offset... in your vector.
I believe he's the creator of C++
Are you kidding me? Linux is coded in C. Linus himself said that he hates C++.
Oh I'm sorry, I can't hear you over my computer running Ubuntu, the game engine I wrote in C++ that is not a text RPG, and the numerous programs I wrote with wxWidgets that are written in C++.
Can someone please explain how the memory requirements @4:47 were deduced. I am off by a factor of 2 for both the vector and linked lists (I get 0.8MB and 3.2MB). Ex for list: (64x4)/8 * 100,000 = 3.2 MB.
This talk only explains why linked lists suck at handling random access when you have enough contiguous free memory to allocate an array of the right size.
Random access memory is designed for fast random access using pointers and the fastest general purpose structure in that environment is the array.
So anything that can be done with an array should be done with an array.
All other structures (lists and trees) are performance trading fallback solutions to get around situations where it's impossible to use an array.
Mainly when lots of allocations and deallocations have happened and memory is so heavily fragmented that you can't find a single contiguous free memory block large enough to hold your array.
Memory fragmentation is not really an issue anymore. The MMU can map the 4 KB pages in any order necessary (and allocators use fixed-size buckets for smaller allocations), so you can always build a contiguous block of virtual address space for your array (unless you're doing no-MMU embedded stuff, or you're doing 32-bit builds instead of 64-bit and you exhaust the address space).
As someone who uses Python a LOT, whenever he says list I am thinking about an array..... so an array is just as good as another array. Trolling aside, this is quite a nice talk.
List in Python is not an array, it's literally a list. Unless you use numpy.
i love all 3 pixels in this video
You haven't done any programming that required a combination of high- and low-level programming, right? E.g. device drivers, graphical scientific modelling, etc.?
Do you know who this is?
Not sure if you're trolling or just an absolute idiot
Of course i do.
Also it sounds like i was asking a question, it could be responding to someone's comment. I must say i forget what the comment is about.
That comment may have been gone, it is 7 years ago. It is more than 2000 days+.
I reckon it could be one of those C and C++ comparison comments.
If you look down, I could be responding to this Alex Reidy; he looks more like he's trolling than I do.
@@gerjaison Not being able reply to a comment stopped in 2009 or something - not 2013.
Sooo... this comment is most likely directed at the creator of C++ xDDD
PS: "I reckon it could be one of these C and C++ comparison comment." - dafuq? Stop doing drugs man
@@47Mortuus No, simple as that. I wasn't directing at the video. I was asking a question, and that was directing to someone. Whatever it was, I can't remember. But looking at my comment again, i am certain i was asking on whether that particular person has high level programming experience like graphics, and low level programming experience like device driver development etc. Which is nothing related to this video. It's got conversational tone.
I have another comment here about using DLL in VB6. That was definitely directing to someone. Read below, and that wasn't linking to anybody.
Linux is written in C.
If your array holds 50 items and you need to add 1 item... you're screwed. So what, have a linked list of arrays?
but... but i love my linked lists =(
I also love ray casting but math is about 1.000.000 times faster, even if it's as dry and boring as arrays :*(
Shifting elements of the array after an INSERT is one thing. Memory re-allocation is another. He didn't mention that.
+Jacek Migacz to be fair, this isn't the full lecture.
It won't re-allocate memory if the vector has spare capacity after the last element. It will just shift the elements to the right and do no memory allocation.
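A quick way to check that claim (assuming there is spare capacity, which reserve() guarantees here): the data pointer doesn't change across a middle insert, so no re-allocation happened, only shifting.

```cpp
#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(16);                      // capacity 16, size 0
    for (int i = 0; i < 8; ++i) v.push_back(i);

    const int* before = v.data();
    v.insert(v.begin() + 4, 42);        // shifts elements, no re-allocation
    assert(v.data() == before);         // same buffer as before the insert
    assert(v.capacity() >= 16);
}
```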