5. Library Complexity and Short Read Alignment (Mapping)

  • Published 19 Jan 2015
  • MIT 7.91J Foundations of Computational and Systems Biology, Spring 2014
    View the complete course: ocw.mit.edu/7-91JS14
    Instructor: David Gifford
    Prof. Gifford talks about library complexity as it relates to genome sequencing. He explains how to create a full-text minute-size (FM) index, which involves a Burrows-Wheeler transform (BWT). He ends with how to deal with the problem of mismatching.
    License: Creative Commons BY-NC-SA
    More information at ocw.mit.edu/terms
    More courses at ocw.mit.edu

COMMENTS • 15

  • @sfmambero 4 years ago +5

    17:00 FM index BWT transform

  • @abail7010 4 years ago +2

    Thank you very much for this course!
    I wish my professors at university were even a bit as good as this explanation!

  • @mrm259 6 years ago +1

    Thank you so much for your useful course :)

  • @mrm259 6 years ago +1

    Thanks for this useful video. I have a question: in a real DNA reference, will Occ(a) always be 1?

  • @user-bo1bb2cg5g 6 years ago

    In the part on estimating library complexity with a Poisson model, each observed molecule can give us a lambda, so can anyone please tell me how we determine which lambda is the best one? It seems that we can use maximum likelihood, but how can we carry it out in this case?
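One common way to carry out the maximum-likelihood fit asked about here is sketched below. It assumes each unique molecule's observation count follows a Poisson distribution truncated at zero (molecules seen zero times are never observed), so a single lambda is fitted to all counts at once rather than one lambda per molecule. Whether this matches the exact procedure in the lecture is an assumption; the function names `fit_truncated_poisson` and `estimate_library_size` and the example counts are hypothetical.

```python
import math

def fit_truncated_poisson(counts):
    """Maximum-likelihood lambda for a zero-truncated Poisson.

    counts: number of times each *observed* unique molecule was sequenced
    (every entry is >= 1). Setting the derivative of the log-likelihood to
    zero gives  mean(counts) = lam / (1 - exp(-lam)),  solved here by bisection.
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        # Every molecule was seen exactly once: the likelihood is maximised
        # as lam -> 0, and the library size cannot be bounded from the data.
        return 0.0
    lo, hi = 1e-9, mean  # the root lies below the sample mean
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # -expm1(-x) computes 1 - exp(-x) accurately for small x
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def estimate_library_size(counts):
    """Estimated number of unique molecules in the library, seen or unseen."""
    lam = fit_truncated_poisson(counts)
    if lam == 0.0:
        return math.inf
    observed_fraction = -math.expm1(-lam)  # P(a molecule is seen at least once)
    return len(counts) / observed_fraction

# Hypothetical per-molecule observation counts, for illustration only
example_counts = [1, 1, 1, 2, 2, 3, 1, 4]
lam_hat = fit_truncated_poisson(example_counts)
library_size = estimate_library_size(example_counts)
```

The estimated library size is always at least the number of unique molecules observed, and it grows as more molecules are seen only once.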

  • @user-bo1bb2cg5g 6 years ago

    My other naive question: why do we need a probability conditioned on lambda, given that we already assume the process follows Poisson sampling?

  • @wimthiels638 7 years ago +1

    For people who had trouble following the BW-transform algorithm (like myself), this is another take on it (and clearer, for me at least):
    ua-cam.com/video/kvVGj5V65io/v-deo.html

  • @omriadini3692 9 years ago

    Shouldn't C1 follow a3 in the transform string?

    • @TommyCarstensen 7 years ago

      No, a3 is followed by c2 in the last row of the matrix.

  • @helgegrelck2394 7 years ago +2

    This is a great course! ... except that the dear Professor Gifford says he only needs the last column and not the first column of the BWT matrix, although he uses the first column the whole time.
    What he should do is use only the last column and count from the top how many letters come before the letter in question. He is confusing the whole class! And this is MIT??? Sad!

    • @lucasmiranda3665 6 years ago

      But can't the first column be reconstructed just by ordering the characters alphabetically?

    • @yifanzhang6895 6 years ago

      Thanks a lot! I was initially confused until I read your comment.
      His original slides are actually correct, but his demonstration on the board is not.
      This is what LF(6,c) should be:
      There are 4 characters in BWT(T) that come before 'c' alphabetically, namely three a's and one $, so Occ(c) is 4.
      There is 1 c before position 6 in BWT(T), namely the c in the 2nd position, so Count(6,c) is 1.
      Therefore LF(6,c) = 4+1 = 5.
      This means the c at position 6 in BWT(T) is at position 5 in the first column.
      Do this for every character in BWT(T) and you can reconstruct the first column with only BWT(T).
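The LF calculation described in this comment can be checked with a short sketch. The reference string `acaacg$` is an assumption, chosen because its BWT is consistent with the numbers quoted above (three a's and one $ sort before c, and one c sits above row 6 when rows are numbered from 0). The helper names `bwt` and `lf` are illustrative, not taken from the lecture.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    text must end with the sentinel '$', which sorts before the letters.
    This quadratic construction is only meant for tiny examples.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def lf(bwt_str, i):
    """Map 0-based row i of the last column to its row in the first column."""
    ch = bwt_str[i]
    occ = sum(1 for x in bwt_str if x < ch)  # characters that sort before ch
    count = bwt_str[:i].count(ch)            # copies of ch strictly above row i
    return occ + count

b = bwt("acaacg$")  # assumed example text

# Rebuild the first column using nothing but the last column (the BWT)
first_column = [None] * len(b)
for row, ch in enumerate(b):
    first_column[lf(b, row)] = ch
first_column = "".join(first_column)
```

Under these assumptions `lf(b, 6)` evaluates to 4 + 1 = 5, and the rebuilt first column is simply the characters of the BWT in sorted order.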

    • @joshfry9237 3 years ago

      When he said "we don't need to use the first column" he was technically correct but misleading. He's technically correct because if you are given the BWT(T) string, *and the Occ(*) and Count(*,*) functions*, then you can do the rank matching. In pre-processing the reference genome, the BWT matrix along with the Occ(*) and Count(*,*) functions are computed and supplied, so all that *we* need as input to the algorithm is the BWT(T) string and the two functions, not the first column. He's misleading because in explaining *how to do the rank matching*, he's computing Occ(*) on the board, which does require knowledge of the first column. Our algorithm does not compute Occ(*), however.
      That's why the algorithm isn't conjuring something from nothing (which is what I think people were confused about). All the compression happens in the sorting of the BWT plus the computing of Occ(*) and Count(*,*).
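A minimal sketch of the split this comment describes: a pre-processing step that builds the Occ and Count tables from the BWT, and a query step (backward search) that uses only those tables and never looks at the first column. The names `build_index` and `count_matches`, and the example BWT string `gc$aaac` (the transform of the assumed text `acaacg$`), are illustrative assumptions. A real aligner would store sampled checkpoints rather than a full Count table.

```python
def build_index(bwt_str):
    """Pre-processing: build the Occ and Count tables from the BWT string.

    occ[ch]      = number of characters in the text that sort before ch
    count[ch][i] = number of copies of ch in bwt_str[:i]
    """
    alphabet = sorted(set(bwt_str))
    occ, total = {}, 0
    for ch in alphabet:
        occ[ch] = total
        total += bwt_str.count(ch)
    count = {ch: [0] * (len(bwt_str) + 1) for ch in alphabet}
    for i, x in enumerate(bwt_str):
        for ch in alphabet:
            count[ch][i + 1] = count[ch][i] + (1 if x == ch else 0)
    return occ, count

def count_matches(pattern, n_rows, occ, count):
    """Backward search: number of exact occurrences of pattern in the text.

    Only the precomputed tables are consulted at query time.
    """
    lo, hi = 0, n_rows  # half-open range of matrix rows still matching
    for ch in reversed(pattern):
        if ch not in occ:
            return 0
        lo = occ[ch] + count[ch][lo]
        hi = occ[ch] + count[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

bwt_str = "gc$aaac"  # BWT of the assumed example text "acaacg$"
occ, count = build_index(bwt_str)
hits_ac = count_matches("ac", len(bwt_str), occ, count)
```

The query processes the pattern from its last character to its first, narrowing the row range at each step; handling mismatches would require branching over alternative characters, which this sketch does not attempt.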

    • @joshfry9237 3 years ago

      True, but unnecessary, since the relevant information in the first column is precomputed for you in the Occ(*) function, which is supplied along with BWT(T).

  • @yqh1994 6 years ago +3

    eric needs to shush up