Deleted scene: five words with twenty-five letters

  • Published 28 Dec 2024

COMMENTS • 103

  • @ysakhno
    @ysakhno 2 years ago +120

    Actually I would be interested to hear you explaining _every_ non-trivial line of your solution in this much detail.

    • @mnm1273
      @mnm1273 2 years ago +4

      And it would most likely be entertaining if he had to explain in that level of detail the trivial ones too. Would be nice to see him pad the time.

  • @motmahp
    @motmahp 2 years ago +80

    I'm surprised to find I wasn't subscribed to the main channel, but one consequence is that I watched the second video first. 😀

    • @goldenwarrior1186
      @goldenwarrior1186 2 years ago

      Never expected to be the one to make the likes the funny number

  • @ilRosewood
    @ilRosewood 2 years ago +33

    WAIT! Bec was in the room?! I'm stunned.

    • @bsharpmajorscale
      @bsharpmajorscale 2 years ago +3

      False. Clearly it was advanced photography and 3D technology. Can't fool MY eyes!

  • @NigelMelanisticSmith
    @NigelMelanisticSmith 2 years ago +10

    I'm glad to see someone names their variables in the same way I do

  • @olivier2553
    @olivier2553 2 years ago +26

    My instinctive approach would be to use sets: start with one word and try to find a second, compatible one. Then, when looking for the third word, only continue from the point in the word list that we have already reached, because we know that the earlier words in the list were already deemed incompatible (having duplicate letters), and so on. Never go back to the beginning of the list of words. So the list should be scanned only once for each first word.

    • @olivier2553
      @olivier2553 2 years ago +2

      On second thought, that would not give all the words: suppose that words A, D, G, L and P are a set of words; the algorithm would ignore a set made of A, F, K, M and Q or A, D, G, N and R...

    • @Azirale
      @Azirale 2 years ago +4

      I had the same approach, to use sets for each word that have all words that are still 'valid' with them. Then picking a seed word and recursively restricting the sets based on what's available. Doing it this way automatically sets out the scope for each search, and you can apply the forward-only search within that scope.
      Even better you can sort the search words at the outset by how many words would still be valid for each. Doing the forward-only search this way means the biggest set operations are left for the end, but by then you've already eliminated as many words as possible so the operations are still kept small.
      I combined that sort with the forward-only search from the graph method, and got the time down to 450s from 900s.
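A minimal sketch of the set-restricting, forward-only search described in this comment. It is an illustration under assumptions, not the commenter's actual code: the function name `find_disjoint_sets` and its structure are hypothetical, and the sort by remaining valid words is left out for brevity.

```python
def find_disjoint_sets(words, size=5):
    """Return every group of `size` words that share no letters."""
    # Keep only words without repeated letters, one word per letter set
    # (so anagrams collapse to the first spelling seen).
    by_letters = {}
    for w in words:
        if len(set(w)) == len(w):
            by_letters.setdefault(frozenset(w), w)
    items = list(by_letters.items())
    n = len(items)

    # later_ok[i]: indices j > i whose word shares no letter with word i.
    later_ok = [
        {j for j in range(i + 1, n) if items[i][0].isdisjoint(items[j][0])}
        for i in range(n)
    ]

    results = []

    def extend(chosen, candidates):
        if len(chosen) == size:
            results.append([items[i][1] for i in chosen])
            return
        for i in sorted(candidates):
            # Restrict the scope to later words compatible with every pick so far.
            extend(chosen + [i], candidates & later_ok[i])

    extend([], set(range(n)))
    return results
```

Because each candidate set only ever contains later indices, every group is reported once, and unlike a single greedy pass the recursion backtracks, so no valid group is skipped.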

    • @michael1
      @michael1 2 years ago +2

      You'd have thought the naive brute-force approach would be (after throwing out all the words that have duplicate letters, etc.) to sort the letters in each word into alphabetical order, e.g. GRAPE = AEGPR. At that point, the fact that anagrams are all the same word and should be discarded as duplicates comes straight out of your new representation.
      From there a trie is a good data structure to find words in your set that do or don't have particular letters. Obviously your first word would just be going through each word in turn. Let's say your first word was 'ABHOR', which happens to already be in alphabetical order. You then search your trie starting at C (because A and B are used), and the second letter of the C word isn't going to be R, H, O or A. (Not that CA would be in the trie in that order anyway: any word containing A would begin with A, because you've sorted the letters. A is most definitely going to be the first letter of the first sorted word of any solution. Similarly B, unless it's in the first word (in which case it would be the second letter), will be the first letter of the second sorted word, and so on.) The point is, you don't traverse letters that you've already used, and so you won't need to create sets of 20 letters to compare. You should bail faster once you start to run out of letters. I'm not convinced that would speed it up sufficiently, though. I think you'd beat 31 days, but the graph theory is likely the best approach because you're putting the information you want to find (words that don't share letters) directly into the data structure and then presumably searching that.
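The sorted-letter representation from the first half of this comment can be sketched as below. The helper names `canonical` and `drop_anagrams` are hypothetical, and the trie search is not attempted here.

```python
def canonical(word):
    """Sort a word's letters alphabetically, e.g. GRAPE -> AEGPR."""
    return "".join(sorted(word.upper()))


def drop_anagrams(words):
    """Map each sorted-letter key to the first word seen with those letters.

    Words with a repeated letter are thrown out first, as the comment suggests.
    """
    seen = {}
    for w in words:
        key = canonical(w)
        if len(set(key)) == len(key):  # no duplicate letters
            seen.setdefault(key, w)
    return seen
```

Any two anagrams produce the same key, so the dictionary keeps one representative per letter combination.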

    • @olivier2553
      @olivier2553 2 years ago +1

      @@michael1 My idea, even if I did not mention it, was to have the letters sorted in a word, treating the words as sets of letters instead of strings. In fact I understand that is how Matt did it.
      And the list of words would be ordered alphabetically by their set of letters (all words containing an A in any position would be at the top of the list, then all words containing a B but no A, etc.).
      I have often been accused that people can read inside my head. :)
      In fact, my starting position is just a readout of the graph, I think.

  • @i_Hally
    @i_Hally 2 years ago +1

    Enjoyed the main video immensely and very much appreciate every extra morsel

  • @ror3D
    @ror3D 2 years ago +70

    Using the sets is very intuitive, but sets are slow data structures, which is probably slowing the algorithm biiig time. A very good way to do this same check that can potentially be a lot faster (I don't know in python but in C/C++ it would certainly be) is by encoding the letters in a word as a bit field. It fits all in a normal integer (32 bits) and you can just bit-wise-and the value for each pair of words, if it's 0 then they don't share letters, if different than 0 then they share one or more letters.
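A short Python sketch of the bit-field idea in this comment, assuming words contain only the letters a to z. The function names are hypothetical, and no claim is made here about how much faster this runs than sets in any particular interpreter.

```python
def letter_mask(word):
    """Encode a word as a 26-bit integer: bit 0 for 'a', bit 25 for 'z'."""
    mask = 0
    for ch in word.lower():
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def share_letters(a, b):
    """True if the two words have at least one letter in common."""
    return (letter_mask(a) & letter_mask(b)) != 0


def common_letters(a, b):
    """List the shared letters by reading the set bits of the AND result."""
    both = letter_mask(a) & letter_mask(b)
    return [chr(ord("a") + i) for i in range(26) if (both >> i) & 1]
```

In a real search the masks would be computed once per word and stored, so that each pair test is a single AND.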

    • @SwordQuake2
      @SwordQuake2 2 years ago +2

      Oooh I didn't think of that. That's a cool idea.

    • @petergibson1003
      @petergibson1003 2 years ago +8

      I used that same method and it runs fast. Takes about 23 s on my laptop (using PyPy and multiprocessing) for the full search.

    • @HebaruSan
      @HebaruSan 2 years ago +2

      bitwise-and, right? Is arithmetic-and actually a term?
      But yeah, that's a brilliant way of representing this problem. The non-0 result even tells you what letter(s) they share in common. For those not impressed, bitwise-and is something your CPU can do in the shortest time of anything it can do, so you can stack a lot of these up in a second.

    • @pvic6959
      @pvic6959 2 years ago +3

      i mean sets have O(1) lookup I thought? even in python. it's a hash lookup iirc. my bigger issue is SPACE. imagine how much memory you have on the stack/ram because of that lol

    • @aedeatia
      @aedeatia 2 years ago +4

      @@pvic6959 Don't forget the constant term that is ignored in O(1). Bitwise operators are implemented in silicon in the ALU. I don't see how hashing can beat that in speed.

  • @NoNameAtAll2
    @NoNameAtAll2 2 years ago +9

    instead of calculating len of union you could have tested if intersection (&) is empty
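To make the suggestion concrete, here is a hedged comparison of the two tests on Python sets. The union-length version is only a guess at the kind of check being discussed, not a copy of the code from the video, and both function names are hypothetical.

```python
def disjoint_by_union(a, b):
    """No shared letters if the union is as large as both sets combined."""
    return len(set(a) | set(b)) == len(set(a)) + len(set(b))


def disjoint_by_intersection(a, b):
    """No shared letters if the intersection is empty."""
    return not (set(a) & set(b))
```

Python sets also provide `isdisjoint`, which answers the same question without building a new set: `set("fjord").isdisjoint("vibex")`.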

  • @lio1234234
    @lio1234234 2 years ago +1

    Didn't get the notification for the first one, did for this one. Saw this one first, great explanation and super useful, thanks!

  • @NotHPotter
    @NotHPotter 2 years ago +1

    Shorter video generally gets watched first, so here I am. Almost got me to pause and go back to get context from the first video, but I stuck through it.

  • @error13660
    @error13660 2 years ago +4

    I think I have a better idea for filtering out anagrams:
    What if you associated every letter with a unique prime number (like: a-> 2 b-> 3 c->5 ...) and for every word you multiplied together the associated numbers of its letters. This way you would get a unique number for each letter combination without considering the order. So you would put this created number in a list (or a hash map for better performance) and every time you have a new word, you would just have to convert it into a number this way and check if the number exists in the list. If it does, the word is an anagram, if not it is unique.
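The prime-product idea might look like this in Python, assuming a 26-letter a to z alphabet. The names `PRIMES`, `prime_signature` and `split_anagrams` are illustrative only.

```python
# The first 26 primes, one per letter: a -> 2, b -> 3, c -> 5, ...
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_signature(word):
    """Multiply the primes of a word's letters; letter order does not matter."""
    product = 1
    for ch in word.lower():
        product *= PRIMES[ord(ch) - ord("a")]
    return product


def split_anagrams(words):
    """Return (unique words, words that are anagrams of an earlier word)."""
    seen = {}
    repeats = []
    for w in words:
        sig = prime_signature(w)
        if sig in seen:
            repeats.append(w)
        else:
            seen[sig] = w
    return list(seen.values()), repeats
```

By unique factorisation, two words get the same product exactly when they use the same letters the same number of times.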

    • @belg4mit
      @belg4mit 2 years ago +2

      Prime multiplication covers the general case where repeats are allowed, but is overkill given that they are not. It's cheaper to just use a bit per character, which only requires 26 bits, less than a 32-bit "word" of data, whereas the product of the 22nd to 26th primes is about 5.7 billion, or 33 bits. Bit twiddling is also much faster than multiplication.
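The bit counts in this comparison can be checked directly. The snippet below is just that arithmetic, taking the 22nd to 26th primes as 79, 83, 89, 97 and 101; the variable names are made up for the illustration.

```python
import math

# Worst case for the prime scheme: a five-letter word using v, w, x, y, z.
largest_product = math.prod([79, 83, 89, 97, 101])
print(largest_product)               # 5717264681, about 5.7 billion
print(largest_product.bit_length())  # 33

# Bitmask scheme: even all 26 letters at once need only 26 bits.
full_mask = (1 << 26) - 1
print(full_mask.bit_length())        # 26
```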

    • @Zambozoo
      @Zambozoo 2 years ago +1

      An easier way is to use a 32bit number and give each letter a bit. Cheaper than multiplying!

    • @error13660
      @error13660 2 years ago +1

      @@belg4mit On a modern system integer multiplication is fast enough, and it also covers larger alphabets like the Hungarian one. Also, using bitmasks isn't necessarily easier from the developer's perspective.
      But I like the idea.

    • @error13660
      @error13660 2 years ago +1

      But I see the huge problem with numbers larger than the Long limit. So possible solutions: a Long[] as a large bitmask, with some function to set its bits that abstracts the array. Or just use a longer boolean array; that shouldn't be so bad for performance. But in both cases we have the problem of the equality checks.

  • @grapetoad6595
    @grapetoad6595 2 years ago +2

    First watched this, now to get some context. For some reason YT only recommended this one and not the main

  • @TheCollapsedPsi
    @TheCollapsedPsi 2 years ago

    Oh no. I started watching this one before the first channel, but I realized it was for a video I hadn't watched so I left to go watch the first channel.
    I can't believe I missed the chance to be called a maverick.

  • @Raptremrum
    @Raptremrum 2 years ago +16

    I watched this first, was very confused what was going on, so I watched the main channel, then went back after I understood the problem and noticed the modified ending. Do you always do that or is it new?

    • @mythicgamer6291
      @mythicgamer6291 2 years ago +2

      I think it's new, or he's been doing it for a while

  • @GeorgeFoot
    @GeorgeFoot 2 years ago +1

    I would have used a 32-bit integer as a bitfield to represent a set of letters. Then bitwise OR to combine them, AND to test intersections, etc. It should be very fast even in python.

    • @Corrup7ioN
      @Corrup7ioN 2 years ago +1

      Based on the latest main channel video, it looks like you win

  • @Sparky5869
    @Sparky5869 2 years ago

    I watched second video first because I went through my subscriptions, adding everything to my "watch later" list from top to bottom, since I was on mobile, and that is reverse order :P
    It's usually fine bc most channels only upload one video a day and they're rarely related to their previous video

  • @svenwouters9547
    @svenwouters9547 2 years ago

    I like you too! This popped up in my recommended without the original so now I'm gonna watch the other video.

  • @traywor
    @traywor 2 years ago +1

    Whaat? I was soo sure that they were actual pictures and I was just imagining the little movement. Also, now the perfectly timed giggles no longer look like magic acting skills. So S.A.D.

  • @juneguts
    @juneguts 2 years ago +3

    But have you found any words with 25 unique letters? What is the longest string of unique letters that comprises a valid word?

  • @yahccs1
    @yahccs1 2 years ago

    I wonder if the YouTube algorithm is clever enough to not suggest videos that should be watched after something we have not already watched. Fortunately your 2nd channel one popped up after I'd seen the other, but for those who see the videos out of order... perhaps YouTube suggested them out of order, or perhaps the viewers deliberately chose to watch part 2 first.
    If a video pops up that has a part or episode number on it, I usually hunt for previous videos in the series so I can get the whole story - unless they are stand-alone episodes, when it doesn't matter.
    My Mum admitted she liked to read the back pages of a book first - perhaps lots of people do! I did have a look at the back as well, but after reading a few chapters and then having a look at the contents list and noticing something interesting was at the back!
    Don't you mean "And if you've not watched this..." in the description?

  • @ZipplyZane
    @ZipplyZane 2 years ago +2

    The filter you put on when you pause the video looks exactly like the chromatic aberration I see in my glasses if I look off center to the left. So if I look off center to the right instead, I can compensate for it.
    For a bit I thought the chromatic aberration was getting worse and was concerned.

    • @malbacato91
      @malbacato91 2 years ago

      oh wow! that's very cool! thx for making me try that!

  • @Corrup7ioN
    @Corrup7ioN 2 years ago

    "don't send me ways I could've improved it"

  • @Vincent-kl9jy
    @Vincent-kl9jy 2 years ago

    No wonder it took a month to run lol, set operations are typically inefficient because they have to do that uniqueness test on every op

  • @thetranberries2855
    @thetranberries2855 2 years ago

    I got the notification for this video and not the main channel so started this one and went wait a minute 😅

  • @graemewells3716
    @graemewells3716 2 years ago

    This is deffo something that would bog down the story. But it's great to hear details of how the code works for those who are interested.

  • @D33r_Hunt3r_
    @D33r_Hunt3r_ 2 years ago +5

    I don't regret asking, but I am still nonetheless confused about everything you just said.

  • @JudithOpdebeeck
    @JudithOpdebeeck 2 years ago +2

    i'm just amazed you can explain code in under three minutes

  • @Hinyousha
    @Hinyousha 2 years ago

    I got the notification for the second channel video first, realised that I hadn't watched what you were talking about and went to the first channel the

  • @AndyLundell
    @AndyLundell 2 years ago +5

    Bec Hill was in the studio? You should have brought her on-stage instead of using the goofy photographs!

    • @KernelLeak
      @KernelLeak 2 years ago

      Ah yes, Matt's "Bec To The Studio Tour"...
      No, wait - that was Jonathan Pie.

  • @matthewellisor5835
    @matthewellisor5835 2 years ago

    Alright then, on to the main channel.

  • @DanielDugovic
    @DanielDugovic 2 years ago

    Interesting, I hadn't considered it's possible to determine the intersection count by measuring the union count.

    • @Wouter10123
      @Wouter10123 2 years ago +1

      It's not very efficient though, the union operator does a lot of work in the background, which includes checking for duplicates.

  • @SuperYoonHo
    @SuperYoonHo 2 years ago +1

    awesome!

  • @KerryHallPhD
    @KerryHallPhD 2 years ago +1

    loved bec in the video. all excellent. keep it up 😋

  • @LeeSmith-cf1vo
    @LeeSmith-cf1vo 2 years ago

    Are those freeze-frames supposed to be anaglyphs? I couldn't make it work with my red/cyan glasses

  • @AdriLeemput
    @AdriLeemput 2 years ago

    Guess I'm not the only one watching the second channel first.
    Happens when you go alphabetically

  • @werefenrir1
    @werefenrir1 2 years ago +2

    AS AN ENGINEER I WOULDA PROBABLY USED THE UNION METHOD TOO. ( I AM ON A CHROMEBOX AND CANNOT FIGURE OUT HOW TO GET RID OF CAPSLOCK SINCE I REMOVED MY KEYBOARD'S CAPSLOCK BUTTON AND DONT HAVE ANYTHING WITHIN REACH TO STICK IN THE HOLE)

    • @DanielSDiggs
      @DanielSDiggs 2 years ago

      there is a virtual keyboard you can use built into windows

    • @werefenrir1
      @werefenrir1 2 years ago

      @@DanielSDiggs my chromebox runs chrome OS

  • @AA-il9pc
    @AA-il9pc 2 years ago

    Give_it_a_try is a rough variable name.

  • @hurktang
    @hurktang 2 years ago

    "don't send me ways i could have improved it"
    Too bad he edited that bit out of the video XD.

  • @pyglik2296
    @pyglik2296 2 years ago +1

    I know you said not to tell you how to improve it (even though someone already improved it), but I can't help but wonder: why not just keep looking for the next words in the same way?
    First you take a word from scanA and check whether it has any letter in common with a word from scanB. If not, you take a word from scanC, which starts after scanB, and so on, until you get to scanE.
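One way to read this suggestion is as a recursive forward scan, where each level plays the role of scanA, scanB and so on. The sketch below assumes each word has already been reduced to an integer letter mask (one bit per letter); `search` and its parameters are hypothetical names, not taken from the video's code.

```python
def search(masks, start=0, used=0, picked=(), size=5):
    """Yield index tuples of `size` masks that share no bits.

    Each recursion level scans forward from just after the previous pick,
    so every combination is visited at most once.
    """
    if len(picked) == size:
        yield picked
        return
    for i in range(start, len(masks)):
        if masks[i] & used == 0:  # no letter in common with earlier picks
            # OR the new word's letters into the running "used" mask.
            yield from search(masks, i + 1, used | masks[i], picked + (i,), size)
```

For example, `list(search([0b0011, 0b1100, 0b0110, 0b110000], size=3))` walks the four masks and keeps only index combinations whose bits never overlap.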

  • @patrick.gilmore
    @patrick.gilmore 2 years ago

    I watched this one first, because I do "FILO" on my YT subscription page. 🙂

  • @eamonnsiocain6454
    @eamonnsiocain6454 2 years ago

    This is fascinating and, no, I’m not made of cloth! LOL!

  • @krzysztofmazurkiewicz5270
    @krzysztofmazurkiewicz5270 2 years ago +1

    So the code is posted on GitHub you say... Interesting...

  • @muriloporfirio7853
    @muriloporfirio7853 2 years ago

    You said it could be optimised, but I believe it's quite good already (at least better than what I would have thought of).

  • @mattcoulter7
    @mattcoulter7 2 years ago

    I am a loyal second channel first enjoyer

  • @garrettsmith9788
    @garrettsmith9788 2 years ago

    I regret watching this. ;-)

  • @linkVIII
    @linkVIII 2 years ago

    But last time I did 2nd first you said it was wrong!

  • @johnchessant3012
    @johnchessant3012 2 years ago +1

    Hi

  • @Axman6
    @Axman6 2 years ago +2

    Imagine if there was a way to talk about the sorts of things you have in a program, more than just objects, but… the type of the things, and how one type of thing is different from another type of thing… some kind of “type system” perhaps.
    Seriously Matt, I will personally teach you Haskell so you can maths while you program and program while you maths. I’ll even do it in person next time you’re in 🇦🇺

  • @veggiet2009
    @veggiet2009 2 years ago

    I like code 👍😁

  • @user-nj1qc7uc9c
    @user-nj1qc7uc9c 2 years ago

    matt, i think you couldve improved it!

  • @carnsoaks1
    @carnsoaks1 2 years ago

    Son of Maverick. ie Imagineverick.

  • @danieltaber4924
    @danieltaber4924 2 years ago

    I did watch this one first :)

  • @techiehelper1114
    @techiehelper1114 2 years ago

    C programmers be crying

  • @michaelsommers2356
    @michaelsommers2356 2 years ago +1

    Where are the five twenty-five-letter words the title promised?

    • @LordJazzly
      @LordJazzly 2 years ago

      Well - if you re-arrange it to 'Words: five, with twenty-five letters', then everything after the colon is self-descriptive. Otherwise, no clue.

  • @ianlarm1588
    @ianlarm1588 2 years ago +2

    wow im a maverick lol

  • @wobblysauce
    @wobblysauce 2 years ago

    2nd channel first

  • @Tsnoeijs
    @Tsnoeijs 2 years ago +1

    21st

    • @Axman6
      @Axman6 2 years ago

      YouTube thinks this comment needs to be translated into English 🤔

  • @jessetrevena4338
    @jessetrevena4338 2 years ago +1

    You are good at coding Matt.