STAT115 Chapter 3.6 SAM and BAM files

  • Published 27 Aug 2024

COMMENTS • 11

  • @elliottkillian 4 years ago +1

    Great info!

  • @c.p.8689 1 year ago

    Thank you! You are great!

  • @bhayj 3 years ago

    Thanks a lot! Very informative and fast explanation.

  • @tinacole1450 2 years ago

    I like your explanation. Do you know how to annotate a SAM file? Can you send a link?

  • @rezomgeladze5750 2 years ago

    It would be great if you added links to the relevant information in the comments. Thanks!

  • @haroonzeb7087 3 years ago

    Hi, how do I create .bam and .bai files?

    • @loganchen7889 3 years ago +1

      Some aligners can write the '.bam' file for you, but many, BWA for example, only generate a '.sam' file. You can use SAMtools to convert it to a '.bam' file and then to create the '.bai' file. 'samtools index' needs a coordinate-sorted BAM, so sort while converting:
      bwa xxxxxx (map command) | samtools sort -o xxxx.bam -
      then,
      samtools index xxxx.bam
      Hope this helps.
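
The steps in the reply above can be sketched as a small script. This is only a minimal sketch, assuming `samtools` 1.x and a `bwa` build that supports `bwa mem -o` are installed, and that the reference was already indexed with `bwa index`. The file names `ref.fa`, `reads.fq`, and `sample`, and the helper names `run` and `make_bam`, are hypothetical. The BAM is coordinate-sorted before indexing because `samtools index` requires that. With `DRY_RUN=1` (the default here) the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: FASTQ -> SAM -> coordinate-sorted BAM -> BAI index.
# All file names below are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

make_bam() {
  ref="$1"; reads="$2"; prefix="$3"
  # Align the reads; bwa mem produces SAM (assumes `bwa index "$ref"` was run earlier).
  run bwa mem -o "$prefix.sam" "$ref" "$reads"
  # Convert SAM to coordinate-sorted BAM (output format inferred from the .bam suffix).
  run samtools sort -o "$prefix.bam" "$prefix.sam"
  # Build the index; this writes "$prefix.bam.bai" next to the BAM.
  run samtools index "$prefix.bam"
}

make_bam ref.fa reads.fq sample
```

Set `DRY_RUN=0` to actually run the three commands once the tools and input files are in place.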

    • @haroonzeb7087 3 years ago

      @loganchen7889 Thanks. My second question is how to create a VCF file from FASTA or FASTQ files. Is it necessary to go through bcftools first, or is there a direct way to create a VCF file? Is it mandatory to have a BCF file first and then the VCF? Please elaborate on the answer with syntax and an example.

    • @loganchen7889 3 years ago +1

      @haroonzeb7087 I think what you are describing is variant calling. The VCF format is used to store variant information, including the contig (chromosome), position, reference allele, alternative allele, and other related information. The simplest way to get a VCF file from a raw FASTQ/FASTA file involves two steps. 1. Mapping: align the sequences in the FASTA/FASTQ file to the genome. 2. Variant calling: use a variant caller, such as DeepVariant (mentioned by Shirley) or GATK (widely used), to call the variants. The default output format is VCF. What you mentioned, BCF, is a binary format of VCF, if I remember correctly. Examples of this process may be presented in later videos; I am not sure, as I am also just a viewer of the course. Hope this message helps.
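
Step 2 above (variant calling) can be sketched from the command line. This is a minimal sketch that swaps in `bcftools mpileup` and `bcftools call` as the caller, because that is the tool the question asks about; the reply itself names DeepVariant and GATK, which need more setup. It assumes `bcftools` 1.x is installed and that step 1 (mapping) already produced a coordinate-sorted BAM. The file names `ref.fa`, `sample.bam`, and `sample`, and the helper names `run` and `call_variants`, are hypothetical. With `DRY_RUN=1` (the default here) the script only prints the commands.

```shell
#!/bin/sh
# Sketch: sorted BAM -> VCF with bcftools (one possible caller, not the only one).
# All file names below are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

call_variants() {
  ref="$1"; bam="$2"; prefix="$3"
  # Compute genotype likelihoods; -Ou writes uncompressed BCF as an intermediate file.
  run bcftools mpileup -Ou -f "$ref" -o "$prefix.pileup.bcf" "$bam"
  # Call variants: -m multiallelic caller, -v variant sites only, -Ov plain-text VCF output.
  run bcftools call -mv -Ov -o "$prefix.vcf" "$prefix.pileup.bcf"
}

call_variants ref.fa sample.bam sample
```

The intermediate BCF here is a convenience for passing data between the two commands; the final output format is chosen with `-O` (`-Ov` for VCF, `-Ob` for BCF), so a BCF file is not a prerequisite for having a VCF.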

    • @haroonzeb7087 3 years ago

      @loganchen7889 Yes, I know what VCF is. But how do I create a VCF file? Is bcftools the only way to create one, or is BCF syntax mandatory? If the VCF is created with bcftools, could you shed some light on that, or recommend any other way to create a VCF file?
      Thanks in advance.

    • @loganchen7889 3 years ago +1

      @haroonzeb7087 I am not sure I understand correctly. I think many tools other than bcftools, which you mentioned, can create VCF/BCF, for example GATK and DeepVariant, which I mentioned before. "The relationship between BCF and VCF is similar to that between BAM and SAM." (evomics.org/vcf-and-bcf/). I am not familiar with bcftools, and I don't know if anybody still uses it to call variants. There are best-practice pipelines for GATK for both somatic and germline variant calling; you can refer to (gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-) and (gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-).
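
The quoted analogy between BCF/VCF and BAM/SAM can be illustrated with the conversion commands for each pair. This is a minimal sketch, assuming `samtools` 1.x and `bcftools` 1.x are installed; the prefix `sample`, the `.roundtrip` output names, and the helper names `run`, `to_binary`, and `to_text` are hypothetical. With `DRY_RUN=1` (the default here) the script only prints the commands.

```shell
#!/bin/sh
# Sketch: text <-> binary conversion for SAM/BAM and VCF/BCF.
# All file names below are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Text formats to their binary counterparts.
to_binary() {
  prefix="$1"
  run samtools view -b -o "$prefix.bam" "$prefix.sam"     # SAM -> BAM
  run bcftools view -Ob -o "$prefix.bcf" "$prefix.vcf"    # VCF -> BCF
}

# Binary formats back to text.
to_text() {
  prefix="$1"
  run samtools view -h -o "$prefix.roundtrip.sam" "$prefix.bam"   # BAM -> SAM, keeping the header
  run bcftools view -Ov -o "$prefix.roundtrip.vcf" "$prefix.bcf"  # BCF -> VCF
}

to_binary sample
to_text sample
```

In both pairs the binary file holds the same records as the text file in a compressed form, which is why either direction is a single `view` command.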