The Simple and Elegant Idea behind Efficient Dynamic Arrays

  • Published Dec 1, 2024

COMMENTS •

  • @sainagachaitanyaporanki3589
    @sainagachaitanyaporanki3589 4 years ago +198

    You are the 3b1b of programming, keep going 👍.

    • @Kevin-hp5uo
      @Kevin-hp5uo 3 years ago +9

      @@NStripleseven He's using 3b1b's manim: www.3blue1brown.com/faq#manim

    • @hamitdes7865
      @hamitdes7865 3 years ago +5

      Nope bro, he is the Reducible of programming 😇👍🏻

  • @gabriellasso8808
    @gabriellasso8808 4 years ago +158

    Only a little improvement in your implementation:
    You said that when you remove an element and the array gets to less than half its max size, you shrink it.
    But imagine the case where the array has 4 elements with a max size of 4, and then you add an element, remove that element, add another element, remove it, etc. (a common use case that acts like this is a stack).
    In this case you resize your array on every operation.
    It is better to shrink your array in half only when it drops to 1/4 of its max size (see the sketch after this thread).
    Btw, awesome channel :)

    • @Reducible
      @Reducible  4 years ago +70

      Yeah there are tons of optimizations to the scheme I present here and that's one good one!

    • @6754bettkitty
      @6754bettkitty 3 years ago +28

      That is what my friend told me. He said that there is a proof that growing by 2x and shrinking only once occupancy drops to 1/4 is the most efficient.

    • @moiquiregardevideo
      @moiquiregardevideo 3 years ago +15

      @@6754bettkitty If you know in advance the typical size of your arrays, you can save valuable memory. For example, if the size is 260 items most of the time, instead of jumping from 256 to 512 one could tentatively increase to 260. For the rare outliers that exceed 260, the plain old doubling algorithm then applies.

    • @j3r3miasmg
      @j3r3miasmg 3 years ago +6

      When you finish the video and decide to comment something nice about it, but the comment already exists. The main idea is to consider inserts and deletes together to see these edge cases, where the worst case appears if there is a cyclic pattern of operations that degrades each one to O(n). That's why the ideal is to double the size when it reaches N and shrink to half only when it reaches one quarter of occupancy.

    • @flam1ngicecream
      @flam1ngicecream 2 years ago

      Came here to say this! I worked on this problem at one point in university and this was my solution.
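
For concreteness, a minimal C++ sketch of the policy discussed in this thread: grow by 2x when full, shrink by half only once occupancy drops to a quarter of capacity. The class name, the int element type, and the constants are illustrative assumptions, not code from the video:

    #include <algorithm>
    #include <cstddef>

    // Double when full; halve only when occupancy drops to a quarter of capacity,
    // so an alternating add/remove pattern cannot force a resize on every operation.
    class DynamicArray {
        int*        data_     = nullptr;
        std::size_t size_     = 0;
        std::size_t capacity_ = 0;

        void reallocate(std::size_t new_capacity) {
            int* fresh = new int[new_capacity];
            std::copy(data_, data_ + size_, fresh);   // the O(n) copy that gets amortized away
            delete[] data_;
            data_     = fresh;
            capacity_ = new_capacity;
        }

    public:
        ~DynamicArray() { delete[] data_; }

        void push_back(int value) {
            if (size_ == capacity_)                              // full: grow by 2x
                reallocate(capacity_ == 0 ? 1 : capacity_ * 2);
            data_[size_++] = value;
        }

        void pop_back() {                                        // assumes size_ > 0
            --size_;
            if (capacity_ > 1 && size_ <= capacity_ / 4)         // quarter full: shrink by half
                reallocate(capacity_ / 2);
        }

        std::size_t size() const { return size_; }
        int&        operator[](std::size_t i) { return data_[i]; }
    };

(Copy/move special members are omitted for brevity; a real implementation would follow the rule of five.)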

  • @adnanemezrag3809
    @adnanemezrag3809 2 months ago

    Wow, I subscribed right after watching the previous one. Once I watched it I moved directly on to this one. I never understood arrays better than today. Thank you mate.

  • @phlaxyr
    @phlaxyr 3 years ago +39

    4:12 another way to think about it is to convert to base 2.
    1+2+4+8+16+32+64+128+256+512+1024... is nothing but
    0b11111111111, and if we add one, it's trivial to see that all of the digits roll over and we get 0b100000000000. Therefore, the sum must be 0b100000000000 minus 1.

    • @paulstelian97
      @paulstelian97 3 years ago +6

      If we expand this you can figure out how geometric series work in general.
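
The binary observation above is the identity 1 + 2 + 4 + ... + 2^k = 2^(k+1) - 1: a run of k+1 one-bits is one less than the next power of two. A tiny C++ self-check of that identity (illustrative only):

    #include <cassert>
    #include <cstdint>

    int main() {
        std::uint64_t sum = 0;
        for (int k = 0; k <= 10; ++k) {                           // 1 + 2 + ... + 1024
            sum += std::uint64_t{1} << k;
            assert(sum == (std::uint64_t{1} << (k + 1)) - 1);     // 0b11...1 == 0b100...0 - 1
        }
        return 0;
    }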

  • @arithex
    @arithex 3 years ago +46

    I've often daydreamed of making a 3blue1brown-style youtube channel but more focused on software and computer science topics .. very glad to discover someone beat me to it! lol Subscribed!
    It might be nice to add a third chapter on this topic, to cover what real runtimes and libraries do in pragmatic practice. Especially modern, garbage-collected runtimes... but even when I was writing C code 20 years ago, I remember almost never wanting to deal with copy-on-resize expanding arrays. E.g. knowing the OS virtual memory page size of 4kb, we just allocated linked lists of (4kb/sizeof(T)) slabs, and encapsulated all access to the "array" behind an interface, which could walk the list to find the right memory address quickly enough.
    Sure that means lookups are technically O(n) instead of O(1) but in practice it just doesn't matter much for vast majority of cases where n < 1000 and you're not iterating over random-accesses repeatedly.
    And if you do an expanding-array of slab-pointers instead of a linked list, you get better RAM locality (fewer page faults) walking through the list.. in practice these things matter! And it also lends itself well to a specialized implementation of merge-sort, and especially so, if each slab fits within the L1 cache line of your CPU.
    (And if you continue to generalize that array of slab-pointers .. you end up with a thing called a Skip List, which is probably a data structure worth doing a video on.)

    • @strictnonconformist7369
      @strictnonconformist7369 2 years ago +5

      That’s something that irks me in so many software interviews: so much talk on Big O thinking, which in so many cases breaks horribly in real hardware environments.
      A prime example: a doubly-linked list may work great in theory with n access time for any given element, and the common iteration pattern of sequential access, with no concerns for random access. Well, how big is each element in that list? Where are they allocated? How big of a chunk of memory for each item is used? Are those linked items used in sequence? What’s the lifetime relationship of the list items? The fun part about linked lists is that each item in the list could exist on unique memory pages for each, possibly more than one page per list item, and now iterating through the list swaps pages in and out, or (at least) in, one at a time. 100 items? 100 possible calls into the OS to swap. Nobody wants to compute the Big O cost on THAT!
      At times, the most efficient storage size and access time WILL be with segmented arrays, even if they’re not full, even accounting for moving chunks around to insert/delete elements. There are patterns and data structures that can be used to effectively make a dynamically-resized array that, while not theoretically-ideal for access times, certainly has less overhead than a mutation of a linked list (internal or external) and even makes it easier to implement undo/redo.
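
A rough sketch of the "expanding array of slab pointers" idea described in this thread, with fixed-size chunks so that a lookup is one divide and one modulo. The chunk size, the int element type, and the names are placeholder assumptions:

    #include <cstddef>
    #include <vector>

    // Elements live in fixed-size chunks; growing only appends a chunk pointer,
    // so existing elements are never copied when the structure grows.
    constexpr std::size_t kChunkElems = 1024;   // pick to match a page or cache budget

    class ChunkedArray {
        std::vector<int*> chunks_;
        std::size_t size_ = 0;

    public:
        ~ChunkedArray() { for (int* c : chunks_) delete[] c; }

        void push_back(int value) {
            if (size_ == chunks_.size() * kChunkElems)
                chunks_.push_back(new int[kChunkElems]);          // grow: one new slab
            chunks_[size_ / kChunkElems][size_ % kChunkElems] = value;
            ++size_;
        }

        int& operator[](std::size_t i) {                          // one div + one mod per access
            return chunks_[i / kChunkElems][i % kChunkElems];
        }

        std::size_t size() const { return size_; }
    };

Random access stays effectively constant time, and only the small pointer table is ever reallocated.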

  • @alan2here
    @alan2here 4 years ago +121

    That's interesting, I was expecting the answer to be a linked list. This seems especially good for storing 1- or 2-byte items with respect to the space required.

    • @kaishang6406
      @kaishang6406 4 years ago +30

      I was expecting something similar. But I don't think a linked list has O(1) lookup for the element at the m-th index.

    • @alan2here
      @alan2here 4 years ago +11

      @@kaishang6406 That's true, it takes worst-case linear time.

    • @samu6982
      @samu6982 4 years ago +37

      Linked lists are also very inefficient: the access time is linear and they don't play nicely with the cache.

    • @electra_
      @electra_ 4 years ago +11

      Linked lists can also solve this problem but they have their own disadvantages, in that they can't be indexed efficiently (plus they can't be optimized using a cache.)
      So I think this went with array lists because they're more general.

    • @iamtrash288
      @iamtrash288 4 years ago +1

      A linked list has O(n) element access time, which is not ideal for our needs.

  • @jillianonthehudson1739
    @jillianonthehudson1739 4 years ago +5

    My new fave channel

  • @dmnd2806
    @dmnd2806 2 years ago

    Your visual geometric summations are gold

  • @goldenalt3166
    @goldenalt3166 4 years ago +9

    2:06 When you said 1 million here, my first thought was that virtual addressing dramatically alters this at scales above the page size. Once you exceed the page boundary, items never need to be copied when the array is resized. This means you can freely add pages with O(1) overhead and the 2x is unnecessary (though it is also unimportant, as the allocated memory is not wasted either).

    • @Jack-hd3ov
      @Jack-hd3ov 4 years ago +2

      I thought of this too, I guess we're in theoretical CS land here though where we care more about worst case.

    • @paulstelian97
      @paulstelian97 3 years ago +1

      Virtual memory though. Resizing may be unable to just allocate new pages as they are allocated to something else in the same process. Issue still stands. Sure, with clever implementation of mmap the issue can be reduced by quite a lot but doesn't mean that it's gone.
      Now to be fair, a reallocate that will move the pages could be nice especially if large pages are used or if you move page tables themselves outright.

    • @marcoottina654
      @marcoottina654 2 years ago

      In order to manage pages, you need a new layer of software that manages those pages.
      The O(1) access is no longer provided, because you need to reach the right page first - for example by storing the pages in a linked list, or in a self-balancing tree where the pages are ordered by their range of virtual indexes:
                        root
                      /      \
                  node        node
                 /    \      /     \
              1..4   5..12  13..14  15..20
      The last layer consists of nodes with no children (null pointers) BUT non-null array pointers.
      Note: each node has to store its "left size" and "right size" in order to compute the overall size of that subtree and therefore discard __that__ subtree (preferring the right-successor ones) upon a random access, i.e.:
          getAt(thisNode, index):
              if index < thisNode.left.upperVirtualIndex():
                  return getAt(thisNode.left, index - thisNode.left.upperVirtualIndex())
              if index > thisNode.upperVirtualIndex():
                  return getAt(thisNode.right, index - thisNode.upperVirtualIndex())
              index -= thisNode.lowerVirtualIndex()
              return thisNode.array[index]
      Something like that. The "virtual index" should take the left and right sizes into account:
          lower = left.upperVirtualIndex() + 1
          upper = lower + array.length
      At each insertion, increment the left and right sizes of the nodes you are "leaving behind" as you descend the tree, and remember to adjust the sizes when rotating subtrees - and you are done :D, more or less.

    • @StefanNoack
      @StefanNoack 2 years ago +1

      @@marcoottina654 I think modern processors have a page table to do this in hardware :)

  • @rabiaaghafoor
    @rabiaaghafoor 10 months ago

    please keep posting consistently on data structures and algorithms

  • @TheOneMaddin
    @TheOneMaddin 3 years ago +18

    To avoid memory crawling one should not grow by a factor of 2 every time, but more like 1.3.

    • @Asdayasman
      @Asdayasman 2 years ago +2

      Googled "memory crawling" and didn't find much. What do you mean?

    • @TheOneMaddin
      @TheOneMaddin 2 years ago +14

      @@Asdayasman You are right, googling that term leads to nothing.
      Let me explain briefly: suppose your list has size 10 and grows to size 20, and suppose that the new memory chunk is allocated immediately after the previous memory chunk. Later you grow to size 40, again allocating a memory chunk immediately after the one of size 20. So the memory chunks "crawl forward" to higher and higher addresses, in theory using up all available addresses. You might find that at some point it is a good idea to instead of always allocating after the previous chunk, you could place your new chunk back at the start where the size 10 chunk has been initially. After all, you know that there is a lot of free memory behind that one now, namely the previous memory used for the chunks of size 20, 40, 80, ... .
      However, by the nature of powers of 2 you will find that the cumulative memory used by all previous chunks is never enough to store the new chunk. One can show that this can be avoided by growing by a smaller factor each time, namely at most x, where x is the root of x^3=1+x.

    • @Asdayasman
      @Asdayasman 2 years ago +2

      @@TheOneMaddin Hmm, interesting. Well-explained, too.

    • @NejakejTomas
      @NejakejTomas 2 years ago +1

      @@TheOneMaddin You're not going to run out of address space on 64-bit systems. And because we probably won't ever have computers with 16 exbibytes of physical memory (because of the physical properties of materials and, well, the universe), there will always be enough address space to map to physical memory. This problem only arises when you have more than half of the "virtual" address space available as physical memory (or memory swapped to disk, but then you have other problems), which is still too much.

  • @HamStar_
    @HamStar_ 2 years ago

    Got me thinking, and I came up with a solution that is always O(1) for appending new elements, but results in 3N writes, and takes 3N

  • @khengari77
    @khengari77 3 years ago

    Really it was the first idea that came to my mind when I watched the last video

  • @jelleverest
    @jelleverest 4 years ago +74

    Imagine having to resize an array when inserting new elements.
    This comment was made by the linked list gang.

    • @nukelet
      @nukelet 4 years ago +46

      Imagine not taking advantage of cache locality
      This comment was written by the dynamic array gang

    • @jelleverest
      @jelleverest 4 years ago +21

      @@nukelet imagine wanting fast read and write speeds
      Made by the sad coder gang.

    • @igorswies5913
      @igorswies5913 4 years ago +1

      @@jelleverest so why do you even use linked lists

    • @alexray4969
      @alexray4969 3 years ago +3

      @@igorswies5913 They are used for algorithmic interview questions. They have no real use case.

    • @avashurov
      @avashurov 3 years ago +2

      Imagine searching for things in an array, especially in reverse order.
      This comment was made by the binary search gang.

  • @GregoryTheGr8ster
    @GregoryTheGr8ster 4 years ago +12

    contiguous storage FTW!

  • @jlewwis1995
    @jlewwis1995 3 years ago +9

    I had a different idea for dynamic arrays. My idea is that you have a "segmented" system where, instead of reallocating and copying the array every time you want to resize it, you just take the pointer to it and add it to an auxiliary data structure that contains some more information about the array at large, like how many "segments" there are, how large each one is (based on what its size was when you allocated it), etc.
    For example, say that when you create the array originally you want it to have 4 elements. When you create it you could allocate n*2 elements like in the video (in this case 8), then take the pointer to that array and add it to the array segment table (or whatever you would call it), along with the size and the current index into that segment (for adding/removing elements), which at the start would be 0 of course. Every time you want to increase the size of the array you just get a new chunk/segment and repeat the process over and over until the array of segment information is full, in which case THEN you would reallocate and collapse the array fully, then start the process over again.
    The pros of this idea are that resizes would be pretty fast because most of the time you don't have to copy anything, and insertions/deletions would be pretty fast because you don't need to move around the values in the entire array, only the ones in the segment of the value you want to remove/insert. In fact you could accelerate insertion and deletion even further by adding an array of bitmasks for each segment in the information part that tells whether each slot is open or not, so you can just do in-place insertion and deletion and not have to move anything unless you explicitly need to, in which case you could just make another function for that.
    Of course the main downside to this idea is that it would take a lot of memory and disproportionately favors large allocations while disadvantaging small ones, which doesn't exactly help. And resizing the array down would be a bit more finicky than sizing up, because you would be tied to the size of the chunks you allocated originally, so your best option would probably be to take the slow route and collapse it if you actually need to free up space and downsize, and not just move stuff over while keeping the chunks.
    Maybe this is a really bad idea but it's what I came up with 😂 It's probably not even original either. I don't think it's a linked list, because I'm pretty sure in a linked list each entry contains information about the next, whereas in my idea the data and information are separate, but idk much about the different types of dynamic arrays so maybe it is.

    • @jlewwis1995
      @jlewwis1995 3 years ago +4

      Yep, just like I thought; after doing some research it looks like someone else has already come up with this idea lol docs.huihoo.com/symbian/s60-3rd-edition-cpp-developers-library-v1.1/GUID-35228542-8C95-4849-A73F-2B4F082F0C44/html/SDL_93/doc_source/guide/Base-subsystem-guide/e32/ArraysAndLists/DynamicArrays/DynamicArraysGuide1/FixedSegmented.guide.html

    • @jlewwis1995
      @jlewwis1995 3 years ago

      @Mustache Merlin well not every system uses virtual memory ;) really old game consoles like the SNES/NES/genesis etc use hardware paging/bank swapping instead of virtual memory, usually for ROM but sometimes for RAM as well like on the c128

    • @arithex
      @arithex 3 years ago +2

      Yes in practice almost everything works that way .. I just wrote a separate comment attempting to explain why and how. In practice it's pretty rare that you need super-strict O(1) lookup-by-index performance, for a use case with large enough N for O(N) to matter... and esp. when it's really O(N/100) because you're storing chunks of 100 at a time!

  • @elmomertens
    @elmomertens 2 years ago

    It is also worth noting that some programming languages support manual designation of sizes for dynamic arrays. In Swift, for example, you can call the reserveCapacity(_:) method on arrays to manually set the minimum capacity of a dynamic array. This is favourable when you know the required size of the array in advance, because after the reserveCapacity call, inserting elements is always an O(1) operation until you run out of the reserved capacity.
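
For comparison, the C++ analogue of Swift's reserveCapacity(_:) is std::vector::reserve; a minimal usage sketch (the element count is an arbitrary example):

    #include <vector>

    int main() {
        std::vector<int> values;
        values.reserve(1'000'000);           // one up-front allocation
        for (int i = 0; i < 1'000'000; ++i)
            values.push_back(i);             // no reallocation until the reserved capacity is exceeded
        return 0;
    }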

  • @WhiteDragon103
    @WhiteDragon103 2 years ago +1

    As far as I know, (in C# anyway,) the List class doesn't reduce the size when elements are removed. I think this is done under the assumption that if the list became a certain size, it should be expected that it could reach that size again.

  • @davidshawver5243
    @davidshawver5243 3 years ago

    This video is excellent.

  • @rabiaaghafoor
    @rabiaaghafoor 10 months ago

    Would like to see videos on BSTs, Topological Sort, and Minimum Spanning Trees!

  • @pedrovelazquez138
    @pedrovelazquez138 3 years ago

    This method is brilliant. But I did not know that the runtime of allocating memory is independent of the size of the allocation. Great!!

  • @masondaub9201
    @masondaub9201 2 years ago +1

    Looks like my half assed attempt at dynamic arrays when I needed them in C wasn't so bad after all. I always assumed the real implementations did something more clever

  • @fredericmazoit1441
    @fredericmazoit1441 3 years ago +15

    I like what you do, and you might find me needlessly picky, but "on average" and "amortized time" are not the same thing. Being O(f(n)) amortized is stronger than being O(f(n)) on average.
    For example, if you start from an empty binary search tree and insert elements one by one, and the elements are chosen "at random", then each insertion will take, on average, O(log(n)) time. But this does not rule out the case in which you are really unlucky and each insertion takes O(n) time, so the total time is O(n²). That case is just very unlikely.
    If your binary search tree had O(log(n)) amortized time, then the total running time would be O(n·log(n)) regardless of the data.
    As a matter of fact, in the scheme you describe for both insertion and deletion, every operation is O(1) on average but not in amortized time. Indeed, suppose you have an array of size 1024 which contains 1023 elements.
    - You then insert 2 elements. You thus have to create an array of size 2048, which takes O(n) time.
    - You then remove 2 elements. You end up using only 1023 elements in an array of size 2048. You thus create a new array of size 1024.
    - Then you insert 2 elements, then you remove 2 elements...
    Of course, this is very unlikely to happen, but it cannot be ruled out. The operations are O(1) on average. (See the sketch after this comment.)
    Let N be the size of the array and n the number of elements used.
    If you create a bigger array of size 2N only when the array is full and you create a smaller array of size N/2 only when n
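
A rough way to see the difference numerically: the counter below replays the alternating insert-2/remove-2 pattern from the comment above, starting at 1023 elements in a capacity-1024 array, and counts element copies under two shrink thresholds ("below half", as in the video, versus "below a quarter"). The model is an illustrative assumption, not the video's code:

    #include <cstdio>

    // shrink_div = 2 models "shrink when below half"; 4 models "shrink below a quarter".
    long long copies_for(int shrink_div, int rounds) {
        long long size = 1023, capacity = 1024, copied = 0;
        auto resize = [&](long long new_cap) { copied += size; capacity = new_cap; };
        for (int r = 0; r < rounds; ++r) {
            for (int i = 0; i < 2; ++i) {                    // insert 2
                if (size == capacity) resize(capacity * 2);
                ++size;
            }
            for (int i = 0; i < 2; ++i) {                    // remove 2
                --size;
                if (size < capacity / shrink_div) resize(capacity / 2);
            }
        }
        return copied;
    }

    int main() {
        std::printf("shrink below 1/2: %lld copies\n", copies_for(2, 1000));
        std::printf("shrink below 1/4: %lld copies\n", copies_for(4, 1000));
        return 0;
    }

In this model the half threshold pays roughly 2N copies every round forever, while the quarter threshold pays for one growth and then settles down.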

  • @pragalbhawasthi1618
    @pragalbhawasthi1618 3 years ago

    An excellent one!

  • @kaltwarraith5172
    @kaltwarraith5172 2 years ago

    In my experience, it's not so much the number of insertions that takes the most time, but rather waiting on the memory allocation function to return. Even so, I like the general idea here, and I ran into a similar problem when implementing my own dynamic arrays years ago. My solution was, rather than doubling, to simply make the array capacity equal to the nearest power of 2 that can contain every element in the array, i.e. an array containing 0 or 1 elements has a capacity of 1, an array containing 2, 3, or 4 elements has a capacity of 4, an array containing 5, 6, 7, or 8 elements has a capacity of 8, etc. This has the advantage that for large array sizes the array always takes up a whole number of pages, since pages are of a size that is a power of 2. (A small helper along these lines is sketched after this thread.)

    • @waldtrautwald8499
      @waldtrautwald8499 2 years ago

      Yeah, there is an implicit assumption that memory allocation takes constant time (which is not quite but sort of true in practice). As long as the array is large enough, the time needed for copying will dominate the time for the allocations. Big O notation only considers very large values for n.

    • @kaltwarraith5172
      @kaltwarraith5172 2 years ago +1

      @@waldtrautwald8499 Yes, I understand how big O works and I have no contention with that aspect of the algorithm as presented. However, in practice, if you are only concerned about Big O complexities, you will still end up with poor code. I don't think it is fair to say that even in practice allocations occur in constant time. The time an allocation takes will vary from call to call independent of the input (unless you count the internal state as input), though with some contribution from the input size. This is especially true when you do large or even medium-sized allocations in a memory space that is sparsely populated.
      As an experiment, pre-allocate a large buffer and perform as many insertions on it as he did in the previous video with the brute force approach (500 billion insertions), but without allocating any new memory; just do those insertions on a single buffer. You will find that the time is negligible compared to the time he spent doing brute force resizing. From this you can conclude that the vast majority of the time his brute force approach took was not from copying elements but from waiting on the allocator. And it makes sense: in the video it took him 12 hours to do those insertions, but your processor runs at several gigahertz and your RAM is usually a bit slower, which means the RAM is your bottleneck. Think about how slow your RAM or processor would have to be for 500 billion insertions to take 12 hours.
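
A small helper in the spirit of the power-of-two capacity scheme at the top of this thread (capacity = smallest power of two that holds n elements; the comment's exact bucketing differs slightly for tiny n, so treat this as an approximation):

    #include <cstddef>

    // Smallest power of two that is >= n (0 and 1 both map to 1).
    std::size_t pow2_capacity(std::size_t n) {
        std::size_t cap = 1;
        while (cap < n) cap *= 2;
        return cap;
    }

    // e.g. pow2_capacity(5) == 8, pow2_capacity(1024) == 1024, pow2_capacity(1025) == 2048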

  • @mychannel-te5ke
    @mychannel-te5ke 4 years ago +14

    6:30 It doesn't quite work. If we have an array of size n = 2^k and then delete an element and then insert an element again, we'll perform O(n) operations. We can continue this delete-insert pattern O(n) times, which will result in O(n^2) time complexity.
    The better approach is to shrink the array to half its size not when there are fewer than capacity/2 elements, but fewer than capacity/4.

    • @riccardoorlando2262
      @riccardoorlando2262 4 years ago +1

      Indeed! There's also the simpler option of never shrinking the array.

    • @mychannel-te5ke
      @mychannel-te5ke 4 years ago +1

      @@riccardoorlando2262 Yes. Of course! But in this case we use a lot of memory we don't need.

    • @keonix506
      @keonix506 4 years ago

      @@mychannel-te5ke If the array is big enough (bigger than the page size), we could just hand the free pages back to the operating system without doing any shrinking, thus saving space without incurring run-time penalties. Win-win.

    • @khatharrmalkavian3306
      @khatharrmalkavian3306 4 years ago

      @@mychannel-te5ke In practice it's better to allow the user to specify the cases where shrinking is advantageous. In most use cases the actual amount of "wasted" allocation is quite low, and when a container is being emptied there's a very good chance that it will soon be either destroyed or refilled.

    • @mychannel-te5ke
      @mychannel-te5ke 4 years ago

      @@khatharrmalkavian3306 Yeah, but we're talking here more about computer science, not practice.

  • @oliver99999-e
    @oliver99999-e 1 year ago

    great channel

  • @elidoz9522
    @elidoz9522 4 years ago

    I had something else in mind to make a dynamic array: we have a chosen size that is the minimum size of the array, and another chosen size that is the size of every space added if the space isn't enough.
    It starts by creating the space using the formula (M+KA): M is the minimum size of the list, A is the size of the space added every time, and K is an integer required to make the length of the array at least the number of variables it contains.
    K can also be calculated with this formula ((L+1-M)/A): L is the number of variables to store.
    Note: if the value of K is not an integer it has to be rounded up (the +1 will be explained later).
    This can be used instead of trying every value of K, since the M and A variables are chosen earlier after doing some testing.
    This doesn't work by copying the whole array into a bigger one and adding something; instead:
    Option 1: it has an extra space (that explains the +1 from before) that is a terminator.
    Option 2: the first value stores K or the length of the array, but another extra space at the end becomes the terminator, so needing a +2 instead of the +1 mentioned before (this is to ensure that a variable stored in there is not mistaken by the program for the terminator).
    When the list still doesn't need more space the terminator is just a value marking the end of the array, but if the array is full and you try adding another value, the terminator becomes a pointer and points to another part of the memory, where an array of length A is saved, ready to save more variables in it.
    The new array has the same properties as the first one, but now you access the data through the first one.
    The way to delete a value and add a value in the array is pretty much the same you said in the video.
    The big O notation would be this:
    O(1) to add a variable.
    O(N) to access a variable.
    O(1) or O(N) to delete a variable.
    O(N) to add a variable at a specific index.
    O(N) or O(2N) to delete a variable at a specific index.
    All the parts I didn't specify are because I thought of them as they are explained in the video.
    edit: A (the size of the space added every time) does not have to be constant; like in the video, it could be powers of 2.

    • @Robert-zc8hr
      @Robert-zc8hr 4 years ago +2

      O(N) for accessing a variable is way too much. If A is a power of 2 you can actually do it in O(log N), which is almost O(1) for all practical purposes.

    • @elidoz9522
      @elidoz9522 4 years ago

      @@Robert-zc8hr the O(N) for accessing a variable is where N is the number of added arrays

    • @matthewmiles1230
      @matthewmiles1230 2 years ago

      Oftentimes accessing an index of an array is the most common operation, and oftentimes you access those indices in order (looking through a list of objects to render, or a list of grocery items and adding them up). Because of that, when people choose to use an array they are signaling to the system: I want lookups to be fast.
      That is why you see arrays built as contiguous memory in most languages. There are a number of advantages to this approach: it's very simple to understand first of all, and the index operation itself is a very small amount of code without any branching, but probably the biggest contributor to performance when using a contiguous-memory array is that you tend to load all the data you need to look through the list into memory at once.
      When you ask the system for a specific address in memory, it will actually grab a decent-sized chunk from memory (as it's more efficient that way) and load it into a special, super fast cache memory on the processor. When you attempt to find another piece of data, it will check the cache first; if it does not find it in the cache, the system will need to go back to main memory and get the data, which is slow.
      When the array is contiguous in memory you reduce the chance of a cache miss and can see giant increases in performance.
      That said: most things you program don't need blazing speed and your idea is perfectly workable!

  • @parasjain9416
    @parasjain9416 4 years ago +2

    Dhanyawad (thank you!)

  • @koustuvganguly
    @koustuvganguly 2 years ago +1

    In Java an ArrayList grows by half its current capacity (a 1.5x factor) by default. Nice explanation though. ❤️

  • @hunarahmad
    @hunarahmad 4 years ago +2

    thanks

  • @Grand_Priest_Goku
    @Grand_Priest_Goku 3 years ago +8

    or just be an omega galaxy brain chad and simply create an array big enough that you won't ever have to resize it :P

    • @matthewmiles1230
      @matthewmiles1230 2 years ago +2

      "I have created a machine that can travel through time and find the answer to any question!"
      "what did you ask it?"
      "How big should I make this array?"
      Science.

  • @Tumbolisu
    @Tumbolisu 3 years ago +4

    Here is an interesting problem with doubling the array size: You can never re-use the memory that was used previously. No matter how many times you move your data forward, the new array is always larger than all the previous space you ever used. If your dynamic array is the only used memory on the whole computer, you will therefore at most reach 50% efficiency.
    The answer to this is to increase the array by a factor that is smaller than 2. The amortized runtime is still O(n), but the array grows slowly enough that whatever memory you used previously will eventually leave a gap large enough that your new array can fit in it.
    Of course, this is not a problem for most software. If you are dealing with a truly massive amount of data and still require a dynamic array, using a factor such as 1.5 should be worth trying out. (1.5 can be easily achieved by multiplying the size by 3 and then dividing by 2. Make sure the multiplication by 3 doesn't accidentally create an overflow.)

    • @paulstelian97
      @paulstelian97 3 years ago

      The golden ratio would be optimal for this issue.

    • @Tumbolisu
      @Tumbolisu 3 years ago +6

      @@paulstelian97 I just wrote a little program to simulate this phenomenon. The stats might surprise you:
      Factors between 1.01 and 1.32 allow memory re-use about 33% of the time.
      Factors between 1.33 and 1.46 allow memory re-use about 25% of the time.
      Factors between 1.47 and 1.53 allow memory re-use about 20% of the time.
      Factors between 1.54 and 1.57 allow memory re-use about 16% of the time.
      Factors 1.58 and 1.59 allow memory re-use about 14% of the time.
      Factor 1.60 allows memory re-use about 13% of the time.
      Factor 1.61 allows memory re-use about 10% of the time.
      Factor from 1.62 and above NEVER allow memory re-use.
      The golden ratio is therefore the largest possible factor that still just barely allows the old memory addresses to be re-used. However, if re-use is a serious concern, you should use a smaller factor instead.
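
A sketch of the kind of simulation described above, under a deliberately simple model: the live block moves back to address 0 only if everything freed in front of it can hold the new size; otherwise it "crawls" forward past itself. The model, constants, and reuse criterion are assumptions for illustration; real allocators will give different percentages:

    #include <cstdio>
    #include <initializer_list>

    void simulate(double factor, int steps) {
        double start = 0.0, size = 1.0;       // live block occupies [start, start + size)
        int reused = 0;
        for (int i = 0; i < steps; ++i) {
            double new_size = size * factor;
            if (new_size <= start) {          // freed prefix [0, start) is big enough
                start = 0.0;
                ++reused;
            } else {
                start += size;                // crawl forward past the old block
            }
            size = new_size;
        }
        std::printf("factor %.2f: reused the freed prefix on %d of %d resizes\n",
                    factor, reused, steps);
    }

    int main() {
        for (double f : {1.3, 1.5, 1.6, 1.62, 2.0}) simulate(f, 200);
        return 0;
    }

Even this toy model reproduces the golden-ratio cutoff: factors above roughly 1.618 never fit back into the freed prefix.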

  • @davidmurphy563
    @davidmurphy563 2 years ago +2

    Wonderful video, brilliantly explained, but the answer is a bit anticlimactic. After the first vid it was obvious that copying the array with every insertion was expensive, so I just shrugged and figured the obvious answer was to double, but wondered what brilliant solution was actually in place. Doubling. Ok.
    Don't get me wrong, this was a super valuable insight into performance analysis and big O notation. It's just that the solution is what I think every programmer's first instinct would have been if they had to put it together on a tight deadline.

    • @esquilax5563
      @esquilax5563 2 years ago +2

      True! I was thinking "hmm, doubling is the obvious answer, maybe since linear growth of the array causes quadratic growth in the number of insertions, some clever analysis will show quadratic growth of the array is the most efficient"

  • @TheOneMaddin
    @TheOneMaddin 3 years ago +10

    I don't think that most implementations of dynamic arrays shrink automatically. You usually have a trim operation for this.

    • @LegendLength
      @LegendLength 2 years ago

      i just leave them in cold weather without a towel

  • @sathishraj1
    @sathishraj1 4 years ago

    Pls share more videos...Thanks !!!!

  • @nilswicktor9145
    @nilswicktor9145 4 years ago +2

    Is increasing size by a factor of 2 the most efficient? Would another factor, like 4, make a difference?

    • @Reducible
      @Reducible  4 years ago +5

      From a theoretical standpoint, no, it makes no difference. Inserts are still amortized O(1). Practically, we actually want to avoid doing larger resizes because there's no guarantee that the extra unused space is going to be occupied. So that's why a factor of 2 is ideal.

    • @Rattlehead15
      @Rattlehead15 4 years ago +3

      @@Reducible But if that's the case, why not resize it by a factor of 1.5, or 1.25? And also, isn't there a point where you're just adding one block every time, but since your formula depends on the original N its complexity stays at O(N)?

    • @Reducible
      @Reducible  4 years ago +5

      @@Rattlehead15 I forgot to mention the other extreme where if we resize by too small of a factor, more resize operations may be necessary. Empirically, a factor of 2 is the standard since it seems to strike the balance between the two extremes and we have a nice invariant that after the first resize, at least half the array is occupied with elements.
      The key thing here is with a multiplicative factor, the amount of new memory that you are creating will change dynamically based on the size of the array. So you will never reach the case where you just add one block every time.

    • @Rattlehead15
      @Rattlehead15 4 years ago +2

      @@Reducible Yeah I figured 2 was probably the standard out of convenience more than anything. Anyways thanks for the detailed response! I'm really loving your videos so far btw

    • @user-sl6gn1ss8p
      @user-sl6gn1ss8p 4 years ago

      @@Rattlehead15 The "Big O" may be the same, but the closer you get to 1, the faster the multiplicative constant on N rises. 1.5 is thus considerably worse than 2 in this sense. Also, this is not taking into account that requesting memory may itself be slow.
      Going over 2 has the disadvantage of growing too fast, but also diminishing returns on this multiplicative constant: from 2 in the case of doubling, it only approaches 1 at "multiply by infinity".
      2 really is kind of a breakpoint in this sense (as going over or below 1 can be for exponents, but less drastically), so I'm not sure I'd say it's simply out of convenience. I'm sure there's a more formal way to state this :p
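
A quick way to see the constant-factor tradeoff discussed in this thread: count how many element copies N appends cost for different growth factors (a toy count of copy operations only, ignoring allocation cost and wasted space):

    #include <cstdio>
    #include <initializer_list>

    // Total element copies performed while appending n elements one by one,
    // growing capacity by `factor` (rounded down, but always by at least 1) when full.
    long long copies_for_appends(long long n, double factor) {
        long long capacity = 1, size = 0, copies = 0;
        for (long long i = 0; i < n; ++i) {
            if (size == capacity) {
                copies += size;                                        // move everything over
                long long grown = static_cast<long long>(capacity * factor);
                capacity = (grown > capacity) ? grown : capacity + 1;  // ensure progress
            }
            ++size;
        }
        return copies;
    }

    int main() {
        const long long n = 1'000'000;
        for (double f : {1.25, 1.5, 2.0, 4.0})
            std::printf("factor %.2f: %lld copies for %lld appends\n",
                        f, copies_for_appends(n, f), n);
        return 0;
    }

The worst-case per-append copy overhead is about 1/(1 - 1/k) for growth factor k (the exact printed numbers depend on where N lands relative to the capacity ladder), which is the time side of the tradeoff against the up-to-k*N reserved memory mentioned elsewhere in the comments.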

  • @nahasn7679
    @nahasn7679 4 years ago +2

    We know the formulas for a geometric progression: the n-th term is ar^(n-1) and the sum of the first n terms is a(r^n - 1)/(r - 1). 4:25

  • @LUMEN_science
    @LUMEN_science 2 years ago

    One of the best on YouTube

  • @znefas
    @znefas 2 years ago +1

    I thought the solution would've been a mix of arrays, linked lists, and modular arithmetic, but it was much simpler than I had imagined :D

  • @zzador
    @zzador 2 years ago

    I have seen many professional implementations of a dynamic array that use a resize factor of 1.5 instead of 2. Are you really sure 2 is the best resize factor? What about the wasted memory?

  • @TheHasanJr
    @TheHasanJr 2 years ago

    Does this all happen in the stack memory?

    • @decb.7959
      @decb.7959 1 year ago

      Since you are allocating things whose size is unknown at compile time, the array will have to be stored on the heap.

  • @alexsere3061
    @alexsere3061 4 years ago +4

    I am wondering what would happen if you use a number different than 2, but that's a question for tomorrow when I have had some sleep

    • @Robert-zc8hr
      @Robert-zc8hr 4 years ago

      You'll still get the same complexity, O(N), so long as you use numbers >1. For numbers between 1 and 2 you'll reduce space but increase time, for numbers greater than 2 you'll reduce time (only by a factor, so doesn't change the complexity) but increase space.

    • @khatharrmalkavian3306
      @khatharrmalkavian3306 4 years ago

      I think there was some study that determined that the golden ratio was the most efficient, but std::vector is usually implemented with 1.5 or 2 because that can be done with simple integer math:
      size += size >> 1
      or just
      size <<= 1

  • @jeroenodb
    @jeroenodb 4 years ago

    Isn't the maximum space the table uses 3*N units? Because when you are copying you have allocated both the old and new array simultaneously?

    • @aragonnetje
      @aragonnetje 4 years ago +2

      Technically that would be the space complexity of the insertion algorithm, as opposed to the steady state space complexity of the datastructure (i think). But you are absolutely right that that would be the peak memory usage

  • @canaDavid1
    @canaDavid1 4 years ago +3

    6:13 can't you just deallocate the unused half of the array?

    • @samu6982
      @samu6982 4 years ago +5

      This can't really be done; maybe on some systems, but it will probably throw an error or deallocate the whole block.
      In C and C++, calling free() on a pointer into the middle of a memory block is undefined behaviour:
      stackoverflow.com/questions/4589033/does-the-pointer-passed-to-free-have-to-point-to-beginning-of-the-memory-block

    • @khatharrmalkavian3306
      @khatharrmalkavian3306 4 years ago

      If you're talking C then realloc() can be used. In C++ you can write a custom allocator that uses malloc()/free() and then sneak some realloc() in there in a standard-compatible sort of way, but that would actually require rewriting member functions of std::vector or just rolling your own vector class.
      Here's why you shouldn't:
      Algorithmically reducing the allocation size is actually not a great idea from a heuristic standpoint. In most cases a vector that has contained a large number of elements and is now being emptied of many of them will soon be either destroyed or refilled. In either case it's best to avoid reallocating and just hold on to the existing allocation. The user can indicate if there's a special case and the allocation should shrink:
      myvec.shrink_to_fit();
      Even with that said, you should think about whether or not it's worth the effort. Unless you're working with a tightly constrained micro-controller or something then you don't need to worry about a few MB here or there. You really only want to bother with something like this if you're developing something that consumes a lot of resources on a complex system, like allocations in threads on a cloud server or something. If you have 100 threads each hanging on to half a GB that they don't need at the moment then yeah, definitely worth the trouble. Those cases are pretty confined, though, and there will likely be memory management policies in place in those kinds of environments.
      Far more often you can get an easy but good optimization by specifying a decently sized capacity before filling a vector, since it can skip some fraction of reallocations. For instance, if you're going to be loading some subset of values from an arbitrarily-sized X by Y grid, you can do something like:
      myvec.reserve(std::max(x, y));
      Which is super-cheap and will probably skip a decent number of reallocations, depending on the grid size and how dense your sampling is. (Obviously you can get more precise with it, but you generally want to aim for something that's really simple to calculate and probably won't overestimate.)

  • @christianschneider4926
    @christianschneider4926 2 years ago

    Why not use Fibonacci growth instead of x2?

  • @strangelpeaceful
    @strangelpeaceful 2 years ago

    Why not just use fixed arrays of, say, 16, and when an array becomes bigger than 16, create a new empty set of 16? Wouldn't this greatly improve mutation speed?

    • @APaleDot
      @APaleDot 2 years ago

      Typically arrays are contiguous blocks of memory. What you are describing is essentially a linked list of arrays, which _is_ a way to have a dynamically growing collection, but you lose the O(1) runtime to read an arbitrary element from the collection.

  • @JyrkiKoivisto
    @JyrkiKoivisto 3 years ago +1

    Actually, any memory pointed to by an array pointer isn't "real memory"; it's all virtual. The smallest page size is nominally 4k nowadays.
    Array pointers are virtual addresses, and the CPU's MMU takes care of converting them to real memory addresses.
    No copying is needed when the page limit (say 4k) is exceeded: the kernel just allocates a new page from real memory and appends it to the array's virtual memory; the MMU takes care of all the rest and it appears contiguous to the program.

  • @rhysperry111
    @rhysperry111 4 years ago

    My idea for dynamic arrays would be:
    Every item in the array takes up two pieces of memory, one to store the item, and then one that stores the address of the next item in the array (a nullptr if it is the last element)
    This should mean that it uses 2N memory, but wouldn't have to do much moving around. To insert a new element, it can just go to the end, change the pointer to a free space in memory, and then put the new element there. For inserting and deleting elements, it would be a bit more complicated, but not too complicated.
    Sorry if I didn't explain this well.

    • @prostomaxym
      @prostomaxym 4 years ago

      I think there is a problem in your approach. Arrays store data in sequential memory locations, and that memory is allocated before use. So you can't just increase the size by one in the way you describe, because the program would then try to access unallocated memory. You need to reallocate memory when you try to increase the size of an array: for reallocation you allocate a new, bigger array, copy the elements from the old one, and delete the old one. In other words, this and the previous video are exactly about how to reallocate memory smarter. Structures like lists could allocate memory in the way you said, though: adding a new element to a list can allocate a memory location that is not adjacent to the previous element's.

    • @luminolic_black8286
      @luminolic_black8286 4 years ago +5

      This exists as a data structure: a linked list. It can do O(1) insert and delete, but the problem is accessing elements takes O(N) time. To find an element, you have to go to the first element and follow the next pointers until you find it, which takes O(N) worst case.

    • @khatharrmalkavian3306
      @khatharrmalkavian3306 4 years ago

      @@luminolic_black8286 It's worth pointing out that unless you pre-allocate for the list (leading us back to the original problem) then insertions always require allocation, which is painfully slow.

  • @nontth5355
    @nontth5355 3 years ago

    But what about an array with a different data type for each element?

    • @decb.7959
      @decb.7959 1 year ago

      In that case you will have to have an array of pointers to the elements, since a pointer is always the same size. This is what languages such as python, c#, etc do under the hood. You could try storing a size flag at the beginning of each element instead, but then the access time would become O(n) instead of O(1), since you would have to step through the entire array one element at a time rather than just doing a multiplication.

  • @minhthonglai
    @minhthonglai 2 years ago

    Why the factor of 2? What about factor of 3 or you know any number 1.5, 1.2,… Will it still work or work better?

    • @tiarkrezar
      @tiarkrezar 2 years ago

      It's a tradeoff between wasting more memory than you need and wasting extra time because you need to copy the arrays more often. No matter what factor you choose, the time and space complexity will still be O(N), only the constants will change. With a factor of 2, you'll use 2N memory in the worst case scenario, and the worst case number of insertions will also be 2N. With a factor of 3, you're using up to 3N memory, and max 1.5N time. Or in general, for a factor of k, you get k*N worst case memory usage, and 1/(1-1/k) * N time.
      So, in the end, the best choice depends on what you're doing, and the constraints of the hardware. 2 seems like a reasonable middle-ground default, but I'm sure some implementations pick other default factors too.

    • @minhthonglai
      @minhthonglai 2 years ago

      @@tiarkrezar thanks dude!

  • @JivanPal
    @JivanPal 2 years ago

    I am sad that nobody said "page tables"...😢

  • @MattWyndham
    @MattWyndham 2 years ago

    In my data structures class (or was it discrete math?) my teacher off-hand said "oh yeah, for a dynamic array you want to always double the space if you reach the limit." and that stuck with me because he said it so casually but it represents so much math

  • @heinrichdertote149
    @heinrichdertote149 4 years ago

    But why not Arraylist?

    • @6754bettkitty
      @6754bettkitty 3 years ago

      Technically, an array list is a dynamic array under-the-hood.

    • @heinrichdertote149
      @heinrichdertote149 3 years ago

      @@6754bettkitty Not if you create it this way.

  • @DavidNitzscheBell
    @DavidNitzscheBell 4 years ago

    1:28 Insertions 12? Huh? You don't show that or explain the insertions count.

    • @supremedevice1311
      @supremedevice1311 3 years ago

      The original array started with a size of four, with all four spaces being filled. Here, we have 4 insertions.
      Then, we try adding more terms to the array. Using the method in this video, we double the array size, so we go from four to eight spaces. First, we copy the original four values from our first array, and then we add four more as we fill out the remaining spaces. Filling this array of size eight took 8 insertions.
      Adding up the two insertion counts (4 + 8) gives us the '12'. Hope that helped!

  • @secret6365
    @secret6365 2 years ago

    What about *pointers

  • @Yutaro-Yoshii
    @Yutaro-Yoshii 4 years ago

    Can we do O(1) on insert in the middle?

    • @carlosarcia5714
      @carlosarcia5714 4 years ago +3

      Not with any type of array. But it would be possible with a more complex data structure, for example a linked list. But then you lose constant-time access to arbitrary elements.

    • @Yutaro-Yoshii
      @Yutaro-Yoshii 4 years ago

      @@carlosarcia5714 What about using hash map in conjunction with linked list?

    • @carlosarcia5714
      @carlosarcia5714 4 years ago

      @@Yutaro-Yoshii Uhmm, can you be a bit more specific about what you mean? But I don't think you can achieve both things. The best is probably a tree, which would give you logarithmic search and constant insertion anywhere, I think.

    • @carlosarcia5714
      @carlosarcia5714 4 years ago

      Actually, thinking about it a tree would be a combination of hashmaps and linked lists.

    • @Yutaro-Yoshii
      @Yutaro-Yoshii 4 years ago

      @@carlosarcia5714 Guess you are right, I was thinking about tree for a while, but it seems that the search would be logarithmic. Maybe we could do log(logn) if we do some trick (like multi root tree or increasing the connectivity etc) idk still thinking about it.

  • @Zex-4729
    @Zex-4729 4 years ago +3

    Ha, jokes on you, I learned all this from C++, since you need to learn how a vector works.

    • @Zex-4729
      @Zex-4729 4 years ago +2

      but still a great educational video!

    • @johnsherfey3675
      @johnsherfey3675 3 years ago

      I learned it from C, as you have to make your own dynamic list.

  • @anonymous-in5fp
    @anonymous-in5fp 3 years ago +1

    Can someone explain the difference between a dynamically allocated array in C using realloc and the one demonstrated in the video?

    • @maxguichard4337
      @maxguichard4337 3 years ago +1

      realloc doesn't deal with the logistics of expanding the array and such. If you were to implement this scheme in C, you would certainly use realloc in its implementation. For example, you would have an expand function taking a pointer to the array, or perhaps some array struct (containing the current length, datatype and such), which would use the array's current length to realloc a new memory segment with the appropriate size (e.g. 2*current_size). If you'd like an amateur implementation in C, I can send a link to my own code. (A small sketch along these lines follows below.)
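
To make the contrast concrete, a hedged sketch of the same doubling policy built directly on realloc from C++ (std::realloc only resizes or moves one block; the doubling policy is what makes appends amortized O(1), and this is only valid for trivially copyable element types). Names and the int payload are placeholders:

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    struct IntBuffer {
        int*        data     = nullptr;
        std::size_t size     = 0;
        std::size_t capacity = 0;
    };

    bool push_back(IntBuffer& buf, int value) {
        if (buf.size == buf.capacity) {                    // full: double the capacity
            std::size_t new_cap = buf.capacity ? buf.capacity * 2 : 8;
            int* grown = static_cast<int*>(std::realloc(buf.data, new_cap * sizeof(int)));
            if (!grown) return false;                      // keep the old buffer on failure
            buf.data     = grown;
            buf.capacity = new_cap;
        }
        buf.data[buf.size++] = value;
        return true;
    }

    int main() {
        IntBuffer buf;
        for (int i = 0; i < 100; ++i) push_back(buf, i);
        std::printf("%zu elements, capacity %zu\n", buf.size, buf.capacity);
        std::free(buf.data);
        return 0;
    }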

  • @JMPDev
    @JMPDev 3 years ago +2

    While technically correct, it really bothers me that you use the verb ‘insert’ for the operation of appending a new value at the end.
    At least in my mind the verb ‘insert’ is reserved for inserting at a given index, requiring the relocation of later elements.

    • @mina86
      @mina86 3 years ago +1

      Appending is just inserting at the index equal to the length of the array and moving all elements past that index (which just happens to be no elements).

  • @7th_CAV_Trooper
    @7th_CAV_Trooper 3 years ago

    How about an array of arrays. When you need to grow beyond the boundary, you only have to grow and perform inserts on the parent array, plus a memory allocation for a new page. Find the page index by dividing the index by the page size, and the offset within the page by taking the remainder of that division. Performance is dependent on the page size. Adding is faster, reading almost as fast. Inserting and removing an element is still problematic, but you only have to shallow clone the outer array before the point of insert/removal and then perform inserts on pages beyond the insert/removal point.

  • @Celastrous
    @Celastrous 4 years ago +3

    Am I not seeing something restricting an obvious answer here?
    How about: You start with K elements. You want to add N elements to it. You just make a new array that is K+N long, and insert your K+N elements into it, resulting in K+N insertions.
    This results in a space requirement of N+K, which is O(N), and a time requirement, based on insertions, of N+K, which is also O(N), and performs better than the one in this video that uses 2N time. Also, only one numerical calculation is required for the entire process, computing N+K. No multiplications are done.
    How is that not the real final answer to this?

    • @Reducible
      @Reducible  4 years ago +10

      Good question: what you're missing here is we won't actually know how many elements we need to add ahead of time. If we knew we will eventually have N elements to add, then yes we might as well make the array have size N and call it a day. The question with dynamic arrays is what is a general scheme to handle any number of elements, where we don't know how many elements we could be adding, but we do know elements are going to be added to end of the list one at a time.

    • @Celastrous
      @Celastrous 4 years ago +2

      @@Reducible Ah ok, I must have missed that nuance. Thanks

    • @julian_handpan
      @julian_handpan 4 years ago

      Owned!

  • @oamioxmocliox8082
    @oamioxmocliox8082 2 years ago

    ;)

  • @therealsuper5828
    @therealsuper5828 2 years ago

    What if instead of when you make an array and you clone everything over, you don't clone everything over?
    You just make a new array, and start over but from index (amount of arrays + 1) not including the new array.
    I guess the new array size can be the current full array size to double it, as described how this is efficient in the video.
    Then you memorize where all the arrays are stored.

    • @MrMutebe
      @MrMutebe 2 years ago +1

      That is a thing called a linked list: basically each array also stores the location of the next one.

  • @ferociousfeind8538
    @ferociousfeind8538 2 years ago

    Wait... it is that simple?

  • @frostburnspirit9065
    @frostburnspirit9065 4 years ago

    I have absolutely no clue what this is about

  • @miteshkumar5557
    @miteshkumar5557 3 years ago

    just do realloc lmao.