Binary Search Algorithm - Computerphile

Поділитися
Вставка
  • Опубліковано 21 лис 2024

КОМЕНТАРІ • 269

  • @craftmechanics6483
    @craftmechanics6483 Рік тому +386

    I love teaching this to my students as a guessing game.
    I will think of a number from 1 to 100 and give them 8 guesses, but after each guess tell them if their guess was correct or lower/higher.
    After a few rounds of playing, most people figure out that the optimal strategy is guessing somewhere in the middle, and essentially discover the algorithm on their own.
    After that, we implement the gussing game on a computer.
    I find this a very fun way to learn about binary search.

    • @DavidLindes
      @DavidLindes Рік тому +21

      i'm imagining doing that in a classroom, and... yeah, sounds fun! Cool!

    • @Armageddon2k
      @Armageddon2k Рік тому +19

      I'm totally stealing this idea!

    • @mariolol8333
      @mariolol8333 Рік тому +16

      imagine your teacher naming himself craftmechanics6483 with a minecraft pfp on youtube XD

    • @DarrenYung
      @DarrenYung Рік тому +1

      I've been doing this exact same lesson for years too! Once you've introduced variables, conditionals and looping then you have the basic building blocks of algorithms. Binary Searching with the guessing game is a nice easy way to introduce this fundamental algorithm.

    • @nashsok
      @nashsok Рік тому +2

      I audited MIT's 6.0001 (Introduction to Computer Science and Programming in Python) course through OCW and the professor did the exact same demo to introduce binary/bisection search!

  • @user-zl4kk7wi5u
    @user-zl4kk7wi5u Рік тому +38

    Dr. Pound is excellent and entertaining as always. I found myself smiling and giggling throughout the whole video due to the way he handles his small mistakes and jokes around. Yet he also imparts wisdom!

  • @KurtisRader
    @KurtisRader Рік тому +13

    The point made at 14:30 is arguably the most important part of this video. Specifically, it is less important to know how to write, from scratch, a particular algorithm than it is to know that different algorithms have different tradeoffs. Knowing how to pick the best algorithm (and data structures) for a particular situation is more important than being able to implement an algorithm on a whiteboard. One of my interview questions explores whether an applicant understands that there are many sorting and searching algorithms and the nature of the data being manipulated, and other constraints, is important in deciding which algorithm to use.

  • @emrekantar5003
    @emrekantar5003 Рік тому +398

    These men make me feel as if I am in a coffee house with my best friends talking

    • @queueeeee9000
      @queueeeee9000 Рік тому +10

      Dr. Pound and Dr. Grimes from the numberphile videos are my absolute favorite!

    • @justawatchin2
      @justawatchin2 Рік тому +8

      Dr Pound sounds and acts identically to someone who went to my secondary school in London, cheeky lad in IT class who aced all his assignments while getting his computer to run games we weren't supposed to have access to. So, yes.

    • @njd9828
      @njd9828 Рік тому +2

      You spelt pub wrong.

    • @aaabucus3104
      @aaabucus3104 Рік тому

      Anybody who sits around and talks like this has no friends in the first place and that's just science!

    • @queueeeee9000
      @queueeeee9000 Рік тому +3

      @@njd9828 you spelt pubic wrong

  • @genelee7869
    @genelee7869 Рік тому +82

    I'd love to see a video explaining the concepts behind how the different sorting algorithms work (Heap, Merge, Shell, etc)

    • @JanB1605
      @JanB1605 Рік тому +3

      I'd love a video on hash maps!

    • @Loki-
      @Loki- Рік тому +1

      ​@@JanB1605I loved my prof I had for computer science 2 which was lots of this. He was like this video, plus comedy, and simplified PowerPoint.

    • @jimwinchester339
      @jimwinchester339 11 місяців тому

      They exist here on UA-cam. I've seen them.

  • @konstantinrebrov675
    @konstantinrebrov675 Рік тому +103

    We need more videos explaining in low level details how various common algorithms work, such as file I/O, B trees, image compression, files representation (PDF, audio, video, archive, ELF), Huffman coding, hashing algorithms, md5sum, dynamic programming. I mean walk through the pseudocode line by line explaining what it does, and draw the memory diagrams, like what is happening to the data, how the bits or bytes are stored, loaded, moved around.

    • @ross951
      @ross951 Рік тому +7

      Textbooks are your best bet for finding and understanding (easy to quickly reread sentences) that kind of material.

    • @konstantinrebrov675
      @konstantinrebrov675 Рік тому +3

      @@ross951 Lectures are better because then we would be able to see exactly how all the bits and bytes are moved around in the memory.

    • @Aidiakapi
      @Aidiakapi Рік тому +4

      They've done videos on many of the topics you've mentioned.

    • @chessthecat
      @chessthecat Рік тому

      That's not what Computerphile is. Sounds like you should start your own channel.

    • @konstantinrebrov675
      @konstantinrebrov675 Рік тому

      @@chessthecat Who are you to say?

  • @Gudsfralsare
    @Gudsfralsare Рік тому +146

    In these times when AI is all the hype, these are the videos we need. The fundamental algos from the 60's are here to stay and run the world.

    • @thatchessguy7072
      @thatchessguy7072 Рік тому +3

      AI was used to help find an improvement on binary search recently. It’s covered in keynote talk of the 2023 CPP con.

    • @theblinkingbrownie4654
      @theblinkingbrownie4654 11 місяців тому

      ​@@thatchessguy7072wHAT?

    • @philroo1
      @philroo1 11 місяців тому

      ​@@thatchessguy7072 🤞 all watched over by machines of loving grace

  • @daveingerslev
    @daveingerslev Рік тому +3

    Finding theft (or damage) on CCTV footage is a fantastic real world use for binary search. As long as something visually changes (i.e. goes missing or gets broken), then it turns the task of watching the cause into a 5 min task instead of hours/unjustifiable.

  • @vatbub
    @vatbub Рік тому +16

    One thing to add: Once the data is sorted, and you need to add new data to it, binary search can actually tell you where you need to insert the new data in order to keep the data sorted. Therefore, the real issue is getting the data sorted in the first place (as mentioned in the video), but once it is sorted, binary search helps you find stuff fast and helps you keep your data sorted even when you modify it.

    • @NihongoWakannai
      @NihongoWakannai Рік тому +8

      Adding a new piece of data to the middle of a giant array would require changing a lot of memory though. Which is why we might use something like a binary tree instead.

    • @pacoalsal
      @pacoalsal Рік тому

      Depending on the problem the performance benefit of keeping the data adjacent in memory might offset the overhead of shifting it around.

  • @Richardincancale
    @Richardincancale Рік тому +15

    One of my favourite books from almost 50 years ago - Sorting and Searching by Donald E Knuth - tells you all you need to know!

  • @doomanime61
    @doomanime61 Рік тому +6

    I can sit and hear Dr Mike Pound talking all day

    • @michaelwilson5742
      @michaelwilson5742 11 місяців тому +1

      Yeah, everyone on the corridor can sit and hear him talking all day

  • @SparkysWidgets
    @SparkysWidgets Рік тому +5

    I changed a linear search on a calibration step for a product on the production line to binary searching. This small change netted millions of minutes of production line weekly(test used to take minutes and we produced over 1million units a week) back to the CM. The main point being is it doesn't have to be a large data set it could be a small known dataset but that each takes time to search or in this case "test".

  • @traywor
    @traywor Рік тому +93

    10:25 just wanna note here for anyone trying to implement this algorithm for very large arrays, be aware of the integer overflow. (l + r) // 2 will first calculate l + r and then try to divide it by two. however if your list is lets say 2/3rds of the size of your integer and if you then add l + r and l and r are quite large, then they will overflow. Then your m will be messed up and your binary search will be incorrect. And this is not a contrived scenario, there was this exact kind of bug in the java SDK, for 9 years!
    I was actually surprised that Mike didn't mentioned this :O

    • @jlappala
      @jlappala Рік тому +23

      We try to teach the l+(r-l)//2 -method to avoid this.

    • @solqr6521
      @solqr6521 Рік тому +22

      It was in Python, where there wouldn’t be overflow

    • @johnvriezen4696
      @johnvriezen4696 Рік тому

      @@solqr6521He should still write the code the l + (r-l)//2 way because code will inevitably be found and copied to a platform that has 32 bit integers (e.g. C) and then break.

    • @raphaelfranzen9623
      @raphaelfranzen9623 Рік тому +1

      @@solqr6521 Why wouldn't there be overflow in Python ? An integer has only a finite size , so if you add more than it can hold , it should wrap around (overflow).

    • @krajsyboys
      @krajsyboys Рік тому

      @@raphaelfranzen9623 Integer overflows don't happen if you are using just python because they allocate arbitrary amount of data to the variables, so if a number should have overflowed, python allocates more momory to store the variable in

  • @olafurfo89
    @olafurfo89 Рік тому +7

    Currently doing my second year in computer science and you made me realize when I'd want to use BST in projects.
    Sort in morning, many lookups throughout the day.
    Makes so much sense but I hadn't realized it, Well done !!!

  • @robblake8999
    @robblake8999 Рік тому +6

    It's an excellent point to remember that sorting is O(n log n), wheras a linear search is O(n). Linear search could definitely prevail for a small number of lookups.

    • @IceMetalPunk
      @IceMetalPunk Рік тому

      That's *if* your data isn't already sorted, of course :)

    • @turdwarbler
      @turdwarbler 11 місяців тому

      @@IceMetalPunkNo, for a small number of items, the exact number is hardware dependant a linear search will beat everything apart from a direct index access. linear searches on arrays are very cache friendly, as you walk the array, the CPU is already pre fetching 64 bytes ahead of you. We do this all the time for fast lookups of small tables.

    • @IceMetalPunk
      @IceMetalPunk 11 місяців тому

      @@turdwarbler Cache lookups aren't considered in algorithmic runtime analysis.

    • @turdwarbler
      @turdwarbler 11 місяців тому

      @@IceMetalPunk Well thats where theory and reality differ. In HF LL trading we use linear lookups all the time because they are faster period, for small arrays. Even without a cache they would still be faster, no set up time, no complicated logic, simples.

  • @coolwarfare
    @coolwarfare Рік тому +2

    Thank you this channel is going to save me before GCSEs

  • @agoatmannameddesire8856
    @agoatmannameddesire8856 Рік тому +5

    More Dr Pound videos on search / sorting!

  • @GordonjSmith1
    @GordonjSmith1 Рік тому +4

    This was great! It really highlighted the need for the development of 'logical' learning, and not just 'proceedual' or 'fact' learning. Actually a very 'key vlog', in all the excellent vlogs that have been made.

  • @Lion_McLionhead
    @Lion_McLionhead Рік тому

    Pounder can make even this basic algorithm sound complicated.

  • @-parrrate
    @-parrrate Рік тому

    for most problems, I prefer a strict L

  • @PowerShellWizard
    @PowerShellWizard Місяць тому

    I wish I had this guy as an instructor when I was going to college. I may have stood a chance with some of the data structures and algorithms classes I was taking at the time :)

  • @logosimian
    @logosimian Рік тому +1

    I already knew everything in this video, but I still watched it.

  • @IceMetalPunk
    @IceMetalPunk Рік тому +2

    Wow, I can't believe there's never been a binary search Computerphile video before now! 🤯

  • @atkelar
    @atkelar Рік тому +3

    I love to show these examples in my programming courses (day job); one of the arguments that sometimes is held against more complicated algorithms is that "yes, less steps but each step is significantly more complicated"; but then you do the math and compare 100M steps vs. 23 and see that they would have to be quite a bit more complicated to make the 23 step version perform worse than the linear search one. And I also encounter people who would solve a "find in 10 items" problem with such an algorithm... Cost/Benefit analysis gets quite complicated if you don't have an idea about the number of items involved, or the number of executions required.

    • @NihongoWakannai
      @NihongoWakannai Рік тому +1

      I think it's always important to learn that there is more to performance than just big O. With webdev being so popular, people focus a lot on scalability. But there are lots of situations with very predictable dataset sizes thus optimizing functions themselves becomes more important

  • @erichobbs4042
    @erichobbs4042 Рік тому +2

    The point about knowing when to use an algorithm is the most important bit of knowledge to take away from this video. I've forgotten how to exactly write plenty of algorithms, but those things can always be looked up.

  • @thuokagiri5550
    @thuokagiri5550 Рік тому +14

    prof Pound is the Richard Feynman of Computer science

  • @Loki-
    @Loki- Рік тому +1

    14:15 My analogy for going through computer science is that we add a bunch of tools to a toolkit and learn when to use which tools.

  • @wybren
    @wybren Рік тому +5

    I love binary search. If look up times are really low binary search is as good as unbeatable because of computational simplicity.

  • @GilesBathgate
    @GilesBathgate Рік тому +12

    Don't forget you can also use binary search to implement a container that is sorted on insert. (although there are other approaches)

    • @tiarkrezar
      @tiarkrezar Рік тому +1

      Constructing a list that way is really inefficient. Even if the search is O(log(n)), each insertion will be O(n). At that point, just use some kind of a tree data structure instead.

    • @rmsgrey
      @rmsgrey Рік тому +1

      @@tiarkrezar It depends how your data is stored. For example, if you normally only store data in even-numbered positions, but insert into odd-numbered positions, then redo the entire list when you hit a collision, then insertion is only O(n^0.5).
      But, yeah, a naive array implementation of a sorted list will have problems with insertion speed.

    • @GilesBathgate
      @GilesBathgate Рік тому +2

      @@tiarkrezar I did say there are other approaches. Insertion sort, Self-balancing BSTs like AVL trees and Red-Black trees, etc...

  • @sid6645
    @sid6645 Рік тому +1

    Binary searches over some state space with a non trivial checking condition (like some monotonic equation) is also fascinatingly useful. Also if you rectify the state by changing basis or something. One of my favourite algorithms!

  • @chrisogonas
    @chrisogonas 9 місяців тому

    Simply and elegantly illustrated! Thanks, folks 👍

  • @Salara2130
    @Salara2130 11 місяців тому +1

    Well that was a nice refresher of the basics ^^

  • @kevinramadhan9871
    @kevinramadhan9871 Рік тому

    I used to watch you all the time when i was in uni around 3-4 years ago. Just stumbled upon this video and glad you're still going strong! keep up the videos! it's always a treat to watch, even if i'm already familiar with the topic

  • @andrewharrison8436
    @andrewharrison8436 Рік тому +1

    Built into COBOL as the SEARCH ALL command - often overlooked but fast if you know about it.

  • @conorstewart2214
    @conorstewart2214 11 місяців тому

    This is also how a type of analogue to digital converter (ADC) works, it is successive approximation (SAR). Interestingly when used for ADCs it corresponds directly to binary. The first step where you have a comparator with half the max voltage and the input voltage, the result from the comparison is the most significant bit (MSB) of the value, then when you perform the next comparison it becomes the next bit and that continues on until it reaches the closest that the ADC can get. This method relies on a way to create all the intermediate voltages for the comparison so SAR ADCs contain digital to analogue converters (DAC) too.
    Also due to the nature of analogue voltages, similar to real numbers, the search will only very rarely be able to end early, since a comparison between two analogue voltages or between two real numbers is very unlikely to result in them being equal and may be impossible if the comparator only output less than or greater than. This does mean that these ADCs and real numbers will pretty much always take the same number of steps to complete.

  • @gerokatseros
    @gerokatseros Рік тому

    I really like the way you explain everything ... making it so simple

  • @LiarsPendulum1
    @LiarsPendulum1 Рік тому +1

    I am teaching myself programming and was literally practicing writing this kind of code yesterday, working on data sorting algorithms. So glad this is being written in Python. I started with C and this is a great window into how C maps on to Python.

  • @patientson
    @patientson Рік тому

    First of all, thank you for agreeing that the mere fact one is enhancing their skills makes it a career move. Most employers are notorious for despising early beginnings.

  • @martijn3151
    @martijn3151 Рік тому +9

    In theory, yes binary search is faster. In practice a linear search might actually be much much faster. The one thing that makes a big difference is cache. Computers read ahead memory. So it greatly helps if all entries are stored sequentially. If the binary search is implemented in a tree, the entries probably are all over the place in memory, not sequentially, causing cache miss after cache miss. This will severely impact the performance.
    Obviously when searching billions of entries, binary is most likely to win. But in a lot of real life situations, the lists are much smaller. Small enough to fit in a cache line. Making the search almost instant.
    So never just assume what you learn in computer theory is fact, use different setups for your problem and measure(!) to determine what solution works best.

    • @Quarky_
      @Quarky_ Рік тому +6

      Well, if someone thinks binary search is always faster, then they didn't really learn anything did they? The algorithm clearly gives you a speedup by halving the search space at every step, then obviously it is faster only for large numbers. One might even try to quantify this with: for n = 2^m, if m

    • @Pyrazahn
      @Pyrazahn Рік тому +1

      Another reason why linear search can be faster for small lists is that modern cpus heavily rely on pipelining and branch prediction. Basically for every if-else, the cpu guesses which of the two paths will be executed next and already executes the first instructions of that path in the background. If the guess was correct, then the results are already there for free. But if the guess was wrong, it is much more expensive to take the other path.
      In a linear search, if the cpu predicts "element not found, move to next item in list", then that will be correct in all but one case. But in a binary search, the expected results are 50% left and 50% right - so branch prediction fails 50% of the time instead of just once.

    • @martijn3151
      @martijn3151 Рік тому

      @Quarkly_ No need to be pedantic here. Log n is always smaller than n. Also for small numbers. Using binary search on a list of 10 items, worst case is 3 steps. Worst case linear search is 10 steps. Yet, as I pointed out, linear might be much faster in that case even though theory says binary search is at least 3 times faster.
      So, at least to me, this isn’t trivial.

    • @AAjax
      @AAjax Рік тому

      Interesting! So what would the break-even size, between binary search and linear search? I had a look and the x64 cache line is typically 64 bytes, which wouldn't seem to allow for a very useful amount, but this is all new to me.

    • @krzychxyz8358
      @krzychxyz8358 Рік тому

      Well, the break-even size depends on a lot of things. From your code efficiency, the programming language, the compiler you use all the way to the specifics of the data set (some data types use more computation power to compare than others). But i've never seen a case where that is above 60, so I guess it won't ever get big.

  • @JuliusUnique
    @JuliusUnique Рік тому

    simple yet efficient way of searching, doing it myself that way since I basically can think

  • @Lampe2020
    @Lampe2020 Рік тому +1

    If the list is in order you can even make it simpler to determine whether you search further by opening the first and last box and if the target number isn't between those in value it's not going to be inbetween those boxes either.

  • @CoreyOgburn
    @CoreyOgburn Рік тому

    Fun fact: if you're searching through a list of unknown size (maybe a stream of data) that's ordered then you can do a binary search by checking powers of 2 in a similar way. If the value is greater than m, m *= 2.

  • @jakoon_the_manic
    @jakoon_the_manic Рік тому

    Always happy to see a Mike video! Great refresher.
    😀

  • @LucasundAaron
    @LucasundAaron Рік тому +1

    in python i prefer using l=0 r=len(arr)
    now right is exclusive and points on invalid greater parts.
    the only +1 indexing will happen in the main loop now.
    you only check the end of valid array is 1 pos further with:
    „while l+1

  • @allenkwan8310
    @allenkwan8310 Рік тому +3

    This is the CS equivalent of asking a band to play the classics. Nice.

  • @nosuchthing8
    @nosuchthing8 Рік тому +2

    The phone book analogy is SO powerful

  • @Chocoholic1337
    @Chocoholic1337 Рік тому +1

    Love the algorithm explanation videos! Mooooore!!

  • @heliospace
    @heliospace Рік тому +1

    This is the first time an algorithm seemed useful

  • @astropgn
    @astropgn Рік тому

    If you try it, don't convert to list and then sort it, like it was done in this video. Instead, sort the array first using numpy (np.sort()). It is way faster because numpy has less overhead. It always knows the type of every element so it doesn't make type comparisons like python does in the background.

    • @pettere8429
      @pettere8429 Рік тому +1

      Or just made the random numbers smaller and run a cumulative sum.

  • @artofcomputing-art
    @artofcomputing-art Рік тому

    Hell yes! Here comes another great Computerphile video 🎉

  • @cannaroe1213
    @cannaroe1213 11 місяців тому

    If you use the values of l and r (call them L and R), then you can do a weighted binary search which usually performs a little bit better than one that defines m = (l+r)//2
    So at 3:13 a weighted binary would say a = 17 , l = 4 (0-based), L = 16, r = 7, R = 9000 (unchecked in video so i made it up)
    Because a is very close to L and very far from R, you find m as usual but then bias it more in favour of L than R to the point it shakes out as "5" and so finds a immediately, and you don't have to make L = 10 :P

  • @jeromethiel4323
    @jeromethiel4323 Рік тому

    You don't need to sort all of the time. Before you put new data into the list, you search first, then insert the data where it belongs in the list. So you are using a binary search to insert an item into an already sorted list. Which is much more efficient than adding data, then resorting.
    Heck, i implemented this in basic on an Apple ][ back in the day, because sorting routines were just slow. Searching and copying to make a hole for the new data was faster.

    • @neilmehra_
      @neilmehra_ Рік тому

      You don't really want to be using a list if you have a large number of insertions, since what you're describing has a time complexity of O(q(n + logn)). A better approach is using a Red-Black Tree or Balanced BST, for O(log(n)) insertion and query.

  • @BrankoDimitrijevic021
    @BrankoDimitrijevic021 Рік тому +1

    Interestingly enough, linear search is likely to be faster than binary search for a small array of integers, due to prefetching and branch prediction mechanisms of modern CPUs.

  • @gsestream
    @gsestream Рік тому

    make it faster by giving ranges for each of the boxes, thats then only one step to get to the right bix, which can be sub-divided to another range boxes, its better than log(n) of binary search, having ranges is better than just having the numbers in order, if you have different boxes for all different numbers, then the search is O(1), its bucket search, essential is that you pre-calculate the ranges and not during the search. works more like heap, which does the processing on the background and not during the searches/updates. if you have unique box for each different number, at a known index, which you can jump to, directly, then it works like radix sort for sorting. radix search for buckets might be super duper fast.

  • @mikefochtman7164
    @mikefochtman7164 8 місяців тому

    Somewhat related, "How do you keep the list sorted?" Why, use a binary tree of course! Somewhat like picking up a hand of playing cards one at a time. Find the spot where it belongs using binary search, and insert the new item where it belongs.
    Yes, I know there are more sophisticated trees, (i.e. red-black, avt, etc...), but the crux of the matter is, if you insert items in order in the first place, then you never have to sort the list.

  • @felipeabraga
    @felipeabraga Рік тому

    Nice, was studying about it last monday... Thanks

  • @diagorasofmel0s
    @diagorasofmel0s Рік тому +3

    This was so informative and capturing, surprisingly my attention span gets a bit longer when prof Mike Pound speaks lol.

  • @bishboria
    @bishboria Рік тому +2

    I taught this to my 1st year students less than a week ago.

  • @georgelza
    @georgelza Рік тому

    love these discussions.

  • @hashtagPoundsign
    @hashtagPoundsign Рік тому +8

    There are 10 types of people, those who watch Binary Search videos, and those who don't.

    • @housellama
      @housellama Рік тому +5

      There are 10 types of people in the world: people who understand binary, those who don't, and those who know this joke is trinary.

    • @nosuchthing8
      @nosuchthing8 Рік тому +1

      Clever

  • @willhendrix86
    @willhendrix86 Рік тому +2

    As a business intelligence professional I can conclusively state how important by-section of data is to resolving problems. You discard half the data from the issue each iteration.

  • @BeesAndSunshine
    @BeesAndSunshine Рік тому +4

    One way you could make a presorted list of random numbers is to use randint to increase an int at the index in an array equal to the generated number. Then once you have generated the quantity of integers you want you go through that array to set the values of a new array inserting x 0s where x is the value at index 0, x 1s where x is the value at index 1, etc. This would be slower than generating an unsorted list but is significantly faster than generating an unsorted list and then sorting it, because technically it's still an O(n) algorithm, and no sorting algorithm is O(n).

    • @Yewtewba
      @Yewtewba Рік тому

      Assuming int size 4 bytes, generating ints up to 2 billion, it looks like you're trying to create about a 7.5 gigabyte array. It's infeasible for large numbers. It would run on O(m+n), where m is the size of the largest element, whereas generating and sorting would most likely run on nlog(n), where n is the number of elements.
      It's important to recognize you're still essentially just sorting a random list, just with your own custom sorting algorithm.
      Edit: I didn't know it had a name, it's called "counting sort", and it's apparently commonly used as part of Radix Sort. Both of those might provide you with an interesting read.

    • @idonno87
      @idonno87 Рік тому +1

      My take would be to create a list starting with a random number between 0 and, say 100, then add a number by adding a random number from 0 to 100 to the previous list. Runs in O(n), not great, but hey.

    • @Yewtewba
      @Yewtewba Рік тому

      @@idonno87 add numbers one at a time in the correct order? Should be O(nlogn), binary search to find the insertion point over n elements.

    • @Tumbolisu
      @Tumbolisu Рік тому

      No *comparison-based* sorting algorithm is O(n). Radix-Sort does not use comparisons and achieves O(n). It can't be used for every type of data, but it works perfectly on integers.

  • @sandervdrie9182
    @sandervdrie9182 Рік тому

    thanks Dr Mike and thanks the Computerphile team, for and good video would, have loved to have had a lesson from Dr Mike

  • @Rayer24
    @Rayer24 Рік тому

    Before he showed the code I went and tried it out myself to see if I could do it and my code became roughly 100,000 times faster searching the number 888,456,123 in a list of 1 billion numbers ranging from 1 to 1 billion.

  • @LupinoArts
    @LupinoArts Рік тому +7

    The box next to "16" could contain 16 again...

  • @TwoTeaTee
    @TwoTeaTee Рік тому +1

    Calculating M can cause overflow. So, a better version could be -> M= FLOOR(L+ (R - L) / 2)

  • @FalcoGer
    @FalcoGer Рік тому

    @7:00 l+r might be an integer overflow. Depending on your language that might be undefined.

  • @Mode_na
    @Mode_na Рік тому +4

    Shouldn't you check the first and last item in a sorted list to make sure the number you're searching for is in range? It should be more efficient by potentially circumventing the need of running a search algorithm all together.

    • @f.f.s.d.o.a.7294
      @f.f.s.d.o.a.7294 Рік тому +2

      Only if that is frequently the case.

    • @ligmoes
      @ligmoes Рік тому

      Indeed. when its out of range it shrinks in a linear fashion until the ranges swap and returns false but also works for when is in range and is not on the list. However, that is a matter of implementation.
      Here's same function with your solution:
      def binary_search(lst, a):
      l = 0
      r = len(lst) - 1
      if a > lst[r] or a < lst[l]: return print("Out of Range")
      while l lst[m]:
      l = m + 1
      elif a < lst[m]:
      r = m - 1
      else:
      return True
      return False

    • @neilmehra_
      @neilmehra_ Рік тому +1

      It doesn't really impact performance as log time complexity is minuscule.

  • @LuigiElettrico
    @LuigiElettrico 11 місяців тому

    Nice explanation, thanks.

  • @bolonabolona
    @bolonabolona 26 днів тому

    One the use case of binary search is for finding a squre root or cube root.

  • @KeyboardSourceError
    @KeyboardSourceError Рік тому

    Thank you Dr. Pound!

  • @gwaptiva
    @gwaptiva Рік тому

    As a programmer of 15 years, I've never once needed to know how sorts work internally. That's what libraries are for.

    • @mgord9518
      @mgord9518 Рік тому +2

      Libraries which are created by programmers...
      It's useful to know even if you have no interest in implementing them yourself

    • @xybersurfer
      @xybersurfer Рік тому

      there is definitely some truth to that. but sometimes you can't bring your data into a format that the libraries can interact with. like if you have a huge file, then you may not be able to fit the relevant parts into memory so that you can hand it off to the library for example to do a search. i have encountered this myself. it's those kinds of situations where it's useful, but i admit is it's not something most people have to deal with

  • @garcipat
    @garcipat 11 місяців тому

    If you use a random function it probably will have numbers evenly distributed of order them by the numbers. that means you could also just number them from 1-1000000. it would take the same amount of time more or less i think.

  • @emilemil1
    @emilemil1 Рік тому

    Could have made a fake list that just returns a value y(x) when accessing index x. That way you're not memory bound since the values can be calculated as they are accessed.

  • @justovision
    @justovision Рік тому

    I think you need to follow up with bubble sort. Recursion is maybe the most important thing to learn.

  • @mjbirdClavdivs
    @mjbirdClavdivs Рік тому +1

    I think you get the color on your hands from touching the paper before the ink has dried. How humid are the rooms you're in?

  • @laser-sj
    @laser-sj Рік тому +1

    Love a good Pound...ing 😁

  • @beepbop6697
    @beepbop6697 Рік тому

    Anyone who has ever looked up a name in a phone book, or used a dictionary, intuitively knows what a binary search is. Keeping driving by half until you find the page you are looking for.

  • @bryan69087
    @bryan69087 Рік тому

    MORE MIKE POUND

  • @EtzEchad
    @EtzEchad Рік тому

    I worked for a company where they had a binary search subroutine that they used for everything. The worst case was when they copied a column of values to a local array, SORTED it, and then called the binary search routine.
    So, thats O(N) for the copy, O(NlogN) for the sort, and O(logN) for the seach. A linear search would've been O(N) - the same as the copy!
    (The worst code I ever saw.)

  • @goguhu
    @goguhu Рік тому

    And obvious if you have a type of distribution (random, bell curve, ...) you could estimate the location based on the first/last value in the range to check ...

  • @cscscscss
    @cscscscss Рік тому +4

    Great video! I understood it very well.

  • @MuhammadRaza-yd6sg
    @MuhammadRaza-yd6sg Рік тому

    Been just thinking about this lately

  • @davidgillies620
    @davidgillies620 Рік тому +2

    Binary search is a notoriously hard algorithm to get right on the first attempt. The key insight is that one of l or r must change at every iteration, but there are subtleties to do with things like integer overflow, caching and so on that make it an extremely bad idea to write your own version. You should almost always use a library version of the code (these days that's true for most well-known algorithms).

    • @rmsgrey
      @rmsgrey Рік тому

      Yeah, you shouldn't write your own code to do something the CPU, OS, compiler, or an available library will do for you (unless you know what you're doing and have a compelling reason). At best, you create something marginally better optimised for your specific use case; at worst, you create a buggy mess that ends up fighting the system as both it and your code attempt to achieve the same things in different ways.

  • @erske
    @erske Рік тому

    100% of time spent is figuring out the best way to sort the damn sample. Searching through it is the easy part!

  • @GTCrais
    @GTCrais Рік тому +1

    Not only did you swap X for A, but you also called your list variable "lst", which in that font looks like "1st", which makes it even more confusing :D

  • @peterboccia
    @peterboccia Рік тому

    Great video, but I sincerely appreciate when I studied it in C, more pure to me. But anyway it's the same. Great job

  • @Atrix256
    @Atrix256 Рік тому

    Do interpolation search next!

  • @kuro4841
    @kuro4841 Рік тому +1

    would it be an improvement to first check the first and last index to see if the given number is even in range before looking through the boxes?
    also would it be advisable to place the first lookup point based on where between highest and lowest it is?
    so if its closer to the lower bounds, then maybe starts dividing a little sooner like at 40% of the dataset?
    would this add too much performance overhead or would it possibly an improvement?
    i know that division is a comparatively costly operation. I would be super interested on someone elses input on this :)

  • @dimitrys.6563
    @dimitrys.6563 Рік тому

    What about « ponderating » the « middle » by adding two extra steps at the beginning, looking for boundaries values. With that known, instead of taking the middle for next « check » we could check closer from the left or right if the needed number is closest to the given boundary.
    Do you think it would gain probability to go faster ? If yes, how much ?
    Second idea : to build the random order list, could be as idea to build it sequentially ba pushing new item in list which would be « new pushed item is last one plus a random number between 1 and … let say… 3 ». You would get an ordered list without doubles and spaced from 1 to 3.

    • @TheKilledDeath
      @TheKilledDeath Рік тому +1

      The first idea would work as long as the numbers are uniformly distributed among the list. Which probably happens with how this list was produced (lots of random = linear distribution).
      But in any case where the list is not uniformly distributed it would get a lot slower. And lots of lists are not uniformly distributed. Take a look at Benford's Law for an example.

  • @JeffKaylin-ft5cx
    @JeffKaylin-ft5cx Рік тому

    If the search gets REALLY big, I assume the data will be read in a record of fixed length. Would checking the Last data in the record be useful? Like if you're in the library and you may have to switch rooms or floors, should you check the last book so you won't have to come back to that room later when you notice you were SO close?

  • @RixtronixLAB
    @RixtronixLAB 10 місяців тому

    Nice info, thanks :) 👍

  • @poordelir
    @poordelir 11 місяців тому

    This is how database index works.

  • @DavidAspden
    @DavidAspden Рік тому

    He knows this stuff and made mistakes and didn't prep and still made a good video. It was so funny the 16 changing to 10....

  • @ConradPino
    @ConradPino Рік тому

    Green hands arise from hurriedly drawing and moving hands across the sheet while ink is still wet.

  • @ANoBaka
    @ANoBaka Рік тому

    If you for some reason have knowledge of the distribution of the values in the collection, perhaps by calculating it while sorting it, would it be worth it to use the distribution to find the number faster by making more educated guesses?

  • @Hobbitstomper
    @Hobbitstomper Рік тому

    Phone books... something my 2 year old son will only hear about, from what he soon will refer to as, "older people".

  • @mog068
    @mog068 11 місяців тому

    Binary search goes with insertion sort, if you can afford it.

  • @mightytitan1719
    @mightytitan1719 Рік тому

    I watched your video of steganography can you make that in detail to see how algorithm works

  • @NunyaBidness-v1g
    @NunyaBidness-v1g Рік тому

    So I built this with the explanation before I found out he had actual python code in there, and I did it slightly differently. Why use floor division when you can just do a binary shift on "m"? Wouldn't a binary shift be exponentially faster considering that computers aren't really optimized for division?