Advanced Data Structures: Inverting the BWT

Поділитися
Вставка
  • Опубліковано 11 жов 2024

КОМЕНТАРІ • 17

  • @arpitmathur2933
    @arpitmathur2933 3 роки тому +6

    superb explanation. Just started my masters in Computational biology. Thanks

  • @Cat-ct9hn
    @Cat-ct9hn Рік тому +1

    Thank you very much for your video! I'm taking a bioinformatics course and this is the clearest explanation I've come across so far, excellent material.

    • @niemasd
      @niemasd  Рік тому +1

      Thank you for the kind words! Glad you're enjoying it 😄

  • @YaBoiMoustafa
    @YaBoiMoustafa Рік тому +2

    I’m studying for a bioinformatics exam and reversing the BWT is almost guaranteed to be on the final. I was scared of the question but this video made it super simple. I’m hoping this actually works with any string because if it does then that’s amazing!!!! Thank you

    • @YaBoiMoustafa
      @YaBoiMoustafa Рік тому +2

      Just tested it with another string, it worked! I’m surprised it’s so easy. Thank you!

    • @niemasd
      @niemasd  Рік тому +1

      @@YaBoiMoustafa No problem, glad the video helped! Good luck on the exam 😄

  • @niemasd
    @niemasd  Рік тому +3

    Someone had asked how we can possibly sort the Last column in linear time to get the First column, but I can't seem to find the comment anymore. Here's the explanation:
    All you need to do is keep track of the counts of each letter in your alphabet, which you can do in linear time. Here's a Python implementation, which would be O(n + length of alphabet), and given that n >> "length of alphabet" (not in this simple example, but in reality), it reduces to O(n):
    pastebin.com/DMrL3fKU
    Note that ord(c) converts a character to its ASCII integer, and chr(i) converts an ASCII integer to the corresponding character. Here's a link that lets you step through this code in the browser:
    pythontutor.com/render.html#code=bwt%20%3D%20%22ANNB%24AA%22%0Acount%20%3D%20%5B0%20for%20_%20in%20range%28256%29%5D%0Afor%20c%20in%20bwt%3A%0A%20%20%20%20count%5Bord%28c%29%5D%20%2B%3D%201%0Afor%20i%20in%20range%28256%29%3A%0A%20%20%20%20for%20_%20in%20range%28count%5Bi%5D%29%3A%0A%20%20%20%20%20%20%20%20print%28chr%28i%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false

  • @RLLLx
    @RLLLx 6 місяців тому +2

    This is impressive! How do people come up with this kind of algorithm?!

    • @onescYT
      @onescYT 3 місяці тому

      right? we are standing on the shoulders of giants

  • @Xn_Fdez
    @Xn_Fdez 4 місяці тому

    Gracias por el video máquina! Muy buena explicación.

  • @mlemImlem
    @mlemImlem Місяць тому

    very good explanation thank you for teaching this

  • @julianbiju1761
    @julianbiju1761 10 місяців тому +1

    Question:
    Thank you for making this video, it really has helped!
    A quick question: As you have subscripted the numbers, we can identify that B_1 is followed by A_3. However, programmatically, the burrows-wheeler encoding given to us will not have these subscripts(I.e. perhaps we receive an encoding ABBBKAAAAAA where we can't distinguish between any of the duplicate characters). Given your example, assuming we observe the first column corresponding to B_1 we see A_3, we will in reality have no idea whether that is A_1, A_2, or A_3. How do we reconcile this in our code?

    • @niemasd
      @niemasd  10 місяців тому +1

      No problem; I'm glad you enjoyed the video! Great question! You just go down the first and the last column and subscript each element like I do at time 1:52. I didn't go into it in this video, but the i-th occurrence of symbol x in the first column *is exactly* the i-th occurrence of symbol x in the last column. So even though the BWT you're given doesn't have the subscripted numbers, you just fill them in yourself (as I do in the last column in the video, which *is* the BWT)

  • @nottofind
    @nottofind 6 місяців тому

    What is the DOI of the original Burrows Wheeler Transformation Paper? I can't find it :/
    Great explanation though!

    • @niemasd
      @niemasd  6 місяців тому

      Great question! BWT was originally not intended for this task (it was originally intended for data compression), and the original BWT paper can be found here: www.eecs.harvard.edu/~michaelm/CS222/burrows-wheeler.pdf
      *To my knowledge*, this is the first paper that applies BWT to the "match a bunch of short strings to a single long string" problem in Bioinformatics: doi.org/10.1186%2Fgb-2009-10-3-r25
      Also, *to my knowledge*, this is the first paper that applies BWT to genomic data: doi.org/10.1089/cmb.2005.12.943

  • @kaushalrao101
    @kaushalrao101 3 роки тому +1

    very helpful, thanks!

  • @gedgod111
    @gedgod111 3 роки тому

    Thank you very much