2. Local Alignment (BLAST) and Statistics

Поділитися
Вставка
  • Опубліковано 19 січ 2015
  • MIT 7.91J Foundations of Computational and Systems Biology, Spring 2014
    View the complete course: ocw.mit.edu/7-91JS14
    Instructor: Christopher Burge
    In this lecture, Professor Burge reviews classical and next-generation sequencing. He then introduces local alignment (BLAST) and some of the associated statistics.
    License: Creative Commons BY-NC-SA
    More information at ocw.mit.edu/terms
    More courses at ocw.mit.edu

КОМЕНТАРІ • 45

  • @sfmambero
    @sfmambero 4 роки тому +9

    56:45 statistics of alignment

  • @NirvanSengupta
    @NirvanSengupta 9 років тому +1

    Good overview of traditional Sanger sequencing and next generation sequencing technologies like roche 454, illumina, ...

  • @coconutade111
    @coconutade111 8 років тому +5

    You guys are awesome! thank you for providing this class!

  • @EDUARDO12348
    @EDUARDO12348 8 років тому

    Break the query into 2-3 letters and find where the best matches lie. Then evaluate based on best fits

  • @DeanJordan-qp8kw
    @DeanJordan-qp8kw Місяць тому

    Timestamp: 33:01
    Goal by End of Session: 33:01
    edit: complete! :) coming back in one hour to finish the lecture

  • @hgz11
    @hgz11 Рік тому +2

    good question by the young lady...helped me clarify things in my head.

  • @phartatmisassa5035
    @phartatmisassa5035 9 років тому +4

    Thank you!

  • @sfmambero
    @sfmambero 4 роки тому +3

    31:30 Local alignment

  • @MrAntihumanism
    @MrAntihumanism 9 років тому +2

    Very good to listen to.

  • @michaelchoi5880
    @michaelchoi5880 6 років тому +1

    Thank you very much

  • @ProfBeckmann
    @ProfBeckmann 3 роки тому +3

    Thanks! Awesome lecture. Nice to see the dideoxy ntp structures shown - super helpful!
    I would like to hear more about the errors prone to each methodology. would also be nice to see a read length table for each technology.
    Can you make a video explaining how to setup and sort the barcoding reactions referenced? How long are the barcodes?

  • @soheilh
    @soheilh 3 роки тому +2

    Thank you

  • @luisheribertogarciaislas1785
    @luisheribertogarciaislas1785 6 років тому +3

    Thank you for sharing! I'm new on bioinformatics and I find it really interesting!

    • @JRush374
      @JRush374 6 років тому +1

      Luis Heriberto Garcia Islas I just got done with this bioinformatics lecture series from UC Davis and it was really good.
      Fundamental Algorithms in Bioinformatics: ua-cam.com/play/PL_w_qWAQZtAbh8bHfzXYpdnVKCGCDmmoL.html

    • @luisheribertogarciaislas1785
      @luisheribertogarciaislas1785 6 років тому

      Thank you Josh!
      I'll start to take a look at it right away!
      Regards!

  • @rabiafidan1502
    @rabiafidan1502 4 роки тому +1

    In llumina sequencing, we add four dNTPs and one of them is attached then we reverse termination and add four dNTPs again. Think about a situation where there are C repeats for example CCCCACCCTCCCCCGCCCCCG now, in this situation concentration of dGTPs will be lower than other dNTPs. Doesn't this increase the risk of error on Cs and misincorporation? each time you need dGTP, its concentration is lower.

  • @Yzhang250
    @Yzhang250 5 років тому +1

    When the lady asked can we do something better than m times n. I was thinking about KMP. I know KMP is for exact match, but can we make some modification and make it work?

  • @sumitkumar-el3kc
    @sumitkumar-el3kc 4 роки тому

    1:01:39 does Pi and Rj being 1/4 means we're assuming there's the equal distribution of all four bases in query as well as subject?

  • @phartatmisassa5035
    @phartatmisassa5035 9 років тому +23

    I was thinking about ways to optimize the matching algorithm in assembly, and this is what I came up with.
    Let Mx, Nx, My, and Ny be CPU registers of 64bits each.
    Recall base pairs A-T, G-C, and (A,G) are purines, and (T,C) are pyrimidines.
    Let registers Mx and Nx represent whether the base at bit location i is an A-T pair as 0, or a G-C pair as 1
    Let registers My and Ny represent whether the base at bit location i is a purine (A or G) as 0, or a pyrimidine (T or G) as 1
    We can now represent a base at bits i of registers Mx and My, (ie both regs Mx and My are required to represent a sequence, the N registers represent another sequence we are comparing to):
    Let sequence M be: ACTCCGAT... (just to keep things short)
    Reg Mx would equal: 01011100...
    Reg My would equal: 01111001...
    I hope that makes sense, if not just ask and I will try to clarify
    Let sequence N be: GCTAAGAT...
    Reg Nx would equal: 11000100...
    Reg Ny would equal: 01100001...
    Now the fun part, you can compare a sequence (or sub-sequence) of 64 bases at once with these instructions (MMX might make things faster if you can do bitwise intructions with them, haven't got there yet in my studies):
    This is the formula I worked out to get the matches in a sequence of 1's for match, 0 for mismatch at that base/bit position: (~ is not, ^ is xor, | is or)
    ~ ( (Mx ^ Nx) | ( My ^ Ny) )
    that would be...
    -----------
    xor r8, r9 ;; 1 at bit i if miss-match, r8 = Sx = Mx ^ Nx
    xor r10, r11;; 1 at bit i if miss-match, r10 = Sy = My ^ Ny
    or r8, r10 ;; 1 at bit i if miss-match in either Sx, or Sy, or both
    not rax, r8 ;; rax now has 1 in bit i if there was a match at bit/base i
    ------------
    ...in pseudo x64 assembly
    if SSE4.2 is available on the CPU, then the popcnt instruction is available, it returns the number of 1's in a register, something like:
    ------------
    popcnt rbx, rax ;; would put the number of matches in Seq M and N
    -------------
    So you can then divide the number of matches (in rbx) divided by 64 times 100 to get the percentage of matching bases in Seq M and Seq N, you can also do bit scanning or masking to evaluate exactly which bases matched, that is represented in rax already.
    You can "clap" together 64 bases at once using this method, though it would take some preprocessing of a sequence represented as chars (bytes in ASCII). You can also speed it up by parallelizing over many sub-sequences on different cores, or maybe use GPGPU with OpenGL/CL.
    The same thing can be done in C, just use unsigned long integers, or uint64_t, don't use signed, it get confusing and the two's compliment will most probably screw things up.

    • @5050view
      @5050view 7 років тому

      @Rohan Patgiri: What's your problem dude? People are here to learn and discuss the lecture. You are not in the gossip section!

    • @ProfBeckmann
      @ProfBeckmann 3 роки тому

      Is it much faster, coding it in assembly?

    • @Mahi-nw5vh
      @Mahi-nw5vh Рік тому

      Be my sensei

  • @kishorkumarsarker8270
    @kishorkumarsarker8270 4 роки тому +3

    Could you please help to find out previous lecture of it. Means the first lecture.

    • @Ilya_Sss
      @Ilya_Sss 3 роки тому

      Here are the first lecture ua-cam.com/video/lJzybEXmIj0/v-deo.html

  • @AhmedIbrahim-co9rw
    @AhmedIbrahim-co9rw 5 років тому

    what book did the professor use in his lectures ?
    are there any textbook or slides available to facilitate studying ? thanks

    • @mitocw
      @mitocw  5 років тому +1

      Visit the course on MIT OpenCourseWare to see all the materials (readings, lectures notes, assignments, projects) available for the course at: ocw.mit.edu/7-91JS14. Best wishes on your studies!

  • @coltonhenry3636
    @coltonhenry3636 8 місяців тому

    The girl from 43:14 with her hand up is so f☠️ing relatable 😂😂

  • @maheribrahim685
    @maheribrahim685 2 роки тому

    i'm a biology major with no cs background and very little stat and math knowledge, is it okay if i get nothing at all, and how to change that?!

  • @sumitkumar-el3kc
    @sumitkumar-el3kc 4 роки тому +1

    How is it O(mn) when the last bases of the query are never touching the first bases of the subject? Am I missing something??

    • @sumitkumar-el3kc
      @sumitkumar-el3kc 4 роки тому +1

      Or do we also go right to left in the alignment after left to right? Please help anyone??

    • @turnerburchard5272
      @turnerburchard5272 4 роки тому +3

      @@sumitkumar-el3kc O notation isn't that precise. You could say it is something like O(mn-n) but at that point the difference is inconsequential so we estimate it simply to O(mn). In practice, the algorithm won't take exactly mn computations, but it will at least be on the same scale so estimation is easy.

    • @sumitkumar-el3kc
      @sumitkumar-el3kc 4 роки тому

      @@turnerburchard5272 understood, thank you, I'm new to these concepts.

  • @shubhrabhattacharya223
    @shubhrabhattacharya223 6 років тому

    Where can I get the pdf of the book he referred to?

    • @mitocw
      @mitocw  6 років тому

      Visit the course on MIT OpenCourseWare for the materials at: ocw.mit.edu/7-91JS14.

  • @marwanmohamed6575
    @marwanmohamed6575 5 років тому +3

    what kind of statistics i should learn to understand this ... i know nothing about statistics

    • @mitocw
      @mitocw  5 років тому +5

      One of these two prerequisites for this course might help: 18.440 Probability and Random Variables, 6.041 Probabilistic Systems Analysis and Applied Probability. See the course on MIT OpenCourseWare for more details: ocw.mit.edu/7-91JS14. Best wishes on your studies!

  • @bikidas6690
    @bikidas6690 3 роки тому +2

    Any book for bioinformatics?

    • @Joeythegoats
      @Joeythegoats 3 роки тому

      "bioinformatics and functional genomics" is best so far

  • @gurpchirp
    @gurpchirp 2 роки тому

    yo i am fucking lost. can someone list prereqs?

  • @fahlihamza7937
    @fahlihamza7937 8 років тому

    address address searched sddcdse the same time as the

  • @ZortLF2
    @ZortLF2 6 років тому +6

    What a dismally bad lecture. Blasted through the biology stuff dropping jargon in the first lecture of the semester, wasted time splitting hairs for the algorithm stuff. Why spend 10 minutes talking about lambda just to conclude with saying it's not important???

  • @solomongrundy2211
    @solomongrundy2211 4 роки тому +1

    im a nigga

  • @tartanhandbag
    @tartanhandbag 7 років тому +17

    while i'm sure this presenter has great knowledge, his presentation style is so dull and dry that even though the material is mind blowing, i have once again failed to make it all the way through the video on my 3rd attempt.

    • @ProfBeckmann
      @ProfBeckmann 3 роки тому

      just up the speed of playback bruh. 😎