Merging genotype data with PLINK

  • Published 27 Aug 2024
  • The video will show you how to merge #SNP #genomic data using the #PLINK software.
    The script from the video is available at: pastebin.com/x...
    #genetics #genomics

COMMENTS • 30

  • @georgewanjala4605 2 years ago +2

    So, Professor, if merging produces multiple warnings, e.g. "Warning: Multiple chromosomes seen for variant 'OAR13_201555.1'.", and the files are finally merged without an error message, does that mean the merge was successful regardless of the warnings?

    • @GenomicsBootCamp 2 years ago +1

      The merge was successful, but you need to check whether the result matches your expectations.
      This usually happens when the two map files are different, e.g. from different chip versions.
      Look up a few of the SNP names that appear in the warning messages in both of the initial map files and in the map file produced by the merge, and see if the result is what you expect. Normally you want the newer map file positions in the merged files.
      If I remember correctly, the map positions from the files given in the --merge option are overwritten by the map positions from the data specified in the --file or --bfile options. So if you find the older map positions in the merge result, you can swap the two inputs. You still get the warnings, but now with the desired outcome.
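
A minimal sketch of the check described above: list the SNPs whose chromosome or position differs between two map files. It assumes the 4-column .map layout (chromosome, SNP ID, genetic distance, base-pair position); the helper names and the toy positions are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[str, int]  # (chromosome, base-pair position)


def read_coords(lines: Iterable[str]) -> Dict[str, Coord]:
    """Parse .map (or .bim) lines into {snp_id: (chromosome, bp)}.

    Assumes columns 1, 2 and 4 hold chromosome, SNP ID and position,
    which is the case for 4-column .map files and for .bim files.
    """
    coords = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        coords[fields[1]] = (fields[0], int(fields[3]))
    return coords


def coord_conflicts(first: Dict[str, Coord], second: Dict[str, Coord]):
    """Return {snp_id: (coord_in_first, coord_in_second)} for shared SNPs
    whose chromosome or position is not identical in the two files."""
    shared = first.keys() & second.keys()
    return {snp: (first[snp], second[snp])
            for snp in shared if first[snp] != second[snp]}


# Toy data only; real input would be the lines of the two map files.
old_map = ["13 OAR13_201555.1 0 201555", "1 snp_a 0 1000"]
new_map = ["0 OAR13_201555.1 0 0", "1 snp_a 0 1000", "2 snp_b 0 500"]

conflicts = coord_conflicts(read_coords(old_map), read_coords(new_map))
for snp, (old, new) in sorted(conflicts.items()):
    print(snp, "first file:", old, "second file:", new)
```

Running the same comparison between each input map file and the merged map file shows which set of coordinates ended up in the result.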

  • @rasakilaide6904 3 years ago +2

    Thank you, sir

  • @georgewanjala4605 3 years ago +2

    Your tutorials have really opened my eyes, and I am yearning for more and more.
    Could you please also share how to install and use the Admixture and Eigensoft software on Windows interfaced with Ubuntu?

    • @GenomicsBootCamp 3 years ago

      I have no experience with Eigensoft, so I don't think it will appear in the near future. I have some experience with Admixture, so I might put that up at some point.

    • @georgewanjala4605 3 years ago

      @GenomicsBootCamp Thanks a lot, sir. I am eagerly waiting.

  • @DarkFuneralD 1 year ago +1

    Hello Prof. Gabor, thank you very much for this PLINK series. I am trying to merge multiple datasets (multiple human populations) genotyped with different genotyping arrays. I get multiple warnings and errors saying "Multiple chromosomes/positions seen for variant XYZ". I tried the --flip command as suggested, but it doesn't solve the problem, so I excluded all the SNPs causing the error and was able to merge the files. Yet I ended up with a genotyping rate of 0.2, which is very low. Do you have any suggestions on how to merge all the datasets while keeping the maximum number of SNPs and a high genotyping rate? Thank you!!

    • @GenomicsBootCamp 1 year ago

      Hi,
      I assume the SNP names are the same in all map files, and that this is the basis of the merge.
      There could be two sources of warnings/errors:
      1) If the data sets were genotyped over a longer time interval, it is probable that different reference genomes were used. Thus the SNP coordinates (chromosome, position) differ between these chips. The solution here is to update the chromosomes and positions of all SNP sets to the most recent version, and then perform the merge. You can take these coordinates from the map file of the most recent SNP set you have.
      2) The second possible source is that on some genotyping platforms (Illumina chips, I believe) the same SNPs can be reported in the so-called TOP or FORWARD coding. In this case, even with the same SNP name, there can be 3+ alleles when merging data sets. This typically shows up as a PLINK message mentioning 3+ alleles, with the least frequent set to missing. Here the --flip option does not solve the issue. The solution is to find the conversion table from the SNP manufacturer and use the --update-alleles option to harmonize the data sets.
      www.cog-genomics.org/plink/1.9/data#update_map
      Did any of these solve your issue?
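
For point 1, a rough sketch of how the update tables could be prepared from the most recent map file. It assumes two-column "SNP ID, new value" tables, which is the default layout the PLINK 1.9 documentation linked above describes for --update-chr and --update-map; the function names and toy positions are hypothetical, so check the documentation before using the output.

```python
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[str, int]  # (chromosome, base-pair position)


def read_coords(lines: Iterable[str]) -> Dict[str, Coord]:
    """Parse .map or .bim lines into {snp_id: (chromosome, bp)}."""
    coords = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            coords[fields[1]] = (fields[0], int(fields[3]))
    return coords


def update_rows(reference: Dict[str, Coord],
                target: Dict[str, Coord]) -> Tuple[List[str], List[str], List[str]]:
    """Build update tables for the SNPs of `target` found in `reference`.

    Returns (chr_rows, pos_rows, unmatched):
      chr_rows  - "snp<TAB>chromosome" lines, taken from the reference
      pos_rows  - "snp<TAB>position" lines, taken from the reference
      unmatched - target SNPs absent from the reference (left untouched)
    """
    shared = sorted(reference.keys() & target.keys())
    chr_rows = [f"{snp}\t{reference[snp][0]}" for snp in shared]
    pos_rows = [f"{snp}\t{reference[snp][1]}" for snp in shared]
    unmatched = sorted(target.keys() - reference.keys())
    return chr_rows, pos_rows, unmatched


# Toy data only: "newest" stands for the most recent map file.
newest = read_coords(["1 snp_a 0 1500", "2 snp_b 0 500"])
older = read_coords(["1 snp_a 0 1000", "2 snp_b 0 500", "3 snp_c 0 42"])

chr_rows, pos_rows, unmatched = update_rows(newest, older)
print("\n".join(pos_rows))
print("not in the newest map:", unmatched)
```

The two lists can be written to text files and passed to the update options; SNPs reported as unmatched keep their old coordinates and may still trigger warnings.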

  • @georgewanjala4605 1 year ago +1

    Dear Professor, Happy New Year. Is it possible to merge two SNP data sets of different density, e.g. 50K with 100K?

    • @GenomicsBootCamp 1 year ago +1

      Yes, it is possible. SNPs with the same name will merge; there might be warnings if their chromosomes or positions differ. SNPs that do not appear on the other chip will be set to missing for the animals genotyped on that chip.
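
A small sketch of what this means for the merged file: count the shared and chip-specific SNP names, and estimate the genotyping rate after the merge. The estimate assumes every genotype was called in the original files, which is a simplification; all names and numbers are toy values.

```python
from typing import Dict, Iterable, Set


def snp_overlap(ids_a: Iterable[str], ids_b: Iterable[str]) -> Dict[str, Set[str]]:
    """Split SNP names into shared and chip-specific sets."""
    a, b = set(ids_a), set(ids_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def expected_call_rate(ids_a: Iterable[str], n_samples_a: int,
                       ids_b: Iterable[str], n_samples_b: int) -> float:
    """Rough genotyping rate of the merged data set.

    Assumption: no missing calls in either input, so the only missing
    genotypes are those for SNPs absent from a sample's own chip.
    """
    a, b = set(ids_a), set(ids_b)
    called = n_samples_a * len(a) + n_samples_b * len(b)
    total = (n_samples_a + n_samples_b) * len(a | b)
    return called / total


# Toy chips: 4 SNPs on chip A, 6 on chip B, 2 in common.
chip_a = ["s1", "s2", "s3", "s4"]
chip_b = ["s3", "s4", "s5", "s6", "s7", "s8"]

overlap = snp_overlap(chip_a, chip_b)
rate = expected_call_rate(chip_a, 10, chip_b, 30)
print(sorted(overlap["shared"]), round(rate, 4))
```

When the overlap is small, the estimated rate drops quickly, so restricting the merge to the shared SNPs (for example with a list passed to --extract) may be preferable.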

  • @prabhuyogi1786 5 months ago

    Hi sir, I need to merge 22 PLINK binary files into one. Kindly advise.
    Thanks in advance

  • @lucasf.c.y.dossoukpongan4684 2 years ago +1

    Sorry, Professor, I tried to do as in the video, but I got these messages: "Ambiguous sex IDs written to breed 1.nosex" and "Error: failed to open breed1_SNP.txt". Could you please help me understand what I did wrong?

    • @GenomicsBootCamp 1 year ago

      Hi,
      You probably do not have this file in your working directory. It is not created automatically; I just selected some random SNPs for the purposes of this example and saved them in this file.
      For you it could be different SNPs or even a different file name. The only thing that matters for this example is the principle of how you do this.

  • @usfbge 1 year ago

    Hello Dr. Genomics, I have obtained SNP data from three biological replications of two different plant varieties, parent1 and parent2, which are involved in a genetic mapping cross. Each parent was replicated three times. Now, I am looking for a suitable software or approach to generate consensus SNP data for parent1 and parent2 based on their respective biological replications. This consensus SNP data will be crucial for downstream linkage mapping analysis. Any suggestions or recommendations on software or methodologies for this purpose would be highly appreciated.

  • @mervek7560 2 years ago +1

    These videos are very much appreciated Gábor, thank you!
    As I had 2 DNA batches from different time points, I had to merge my SNP data using PLINK. When performing a PCA, I found that my control samples were positioned far from each other, which indicates a batch effect. How can I remove the batch effect? Do you know any R packages/PLINK commands to solve this issue?
    Cheers,
    Merve

    • @GenomicsBootCamp 2 years ago

      Hi, I am not sure I understand the question. I do not think tweaking the data to get a more cohesive group on a PCA is a good idea. The samples are different for a reason, which could be some kind of population structure, or simply that e.g. one group has the TOP and the other the FORWARD nucleotide coding. In the PLINK merging step, did you get any strange messages mentioning 3 alleles?
      I don't know if keeping only one of the groups is an option. If not, you might also consider dealing with this in the follow-up analysis, e.g. by correcting for this effect in your model, if the method you use allows for it.
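
One way to look for such coding differences before merging is to compare the alleles listed for each shared SNP in the two .bim files. The sketch below is only a diagnostic under stated assumptions: alleles are read from columns 5 and 6, "0" is treated as missing, and the category names are made up. A/T and C/G SNPs look the same on both strands, so this check cannot judge them.

```python
from typing import Dict, Iterable, Set

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_alleles(bim_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse .bim lines into {snp_id: set of non-missing alleles}."""
    alleles = {}
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 6:
            alleles[fields[1]] = {a for a in fields[4:6] if a != "0"}
    return alleles


def classify(first: Set[str], second: Set[str]) -> str:
    """Label one SNP by how its allele sets relate across two batches."""
    if len(first | second) <= 2:
        return "compatible"  # merging yields at most two alleles
    complemented = {COMPLEMENT.get(a, a) for a in second}
    if len(first | complemented) <= 2:
        return "complement_match"  # agrees only after complementing
    return "mismatch"  # 3+ alleles either way


# Toy .bim lines for two batches; alleles are invented.
batch1 = read_alleles(["1 snp_a 0 1000 A G",
                       "1 snp_b 0 2000 C T",
                       "1 snp_c 0 3000 A C"])
batch2 = read_alleles(["1 snp_a 0 1000 A G",
                       "1 snp_b 0 2000 G A",
                       "1 snp_c 0 3000 A T"])

report = {snp: classify(batch1[snp], batch2[snp])
          for snp in sorted(batch1.keys() & batch2.keys())}
print(report)
```

A large share of SNPs outside the "compatible" group suggests a coding difference between the batches rather than a biological one.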

  • @phillipinesithole1773 2 years ago +1

    Hi Prof Gabor, I am starting the analysis for my master's project. I have the bim, bed and fam files. Do I use the same method to merge the files?

    • @GenomicsBootCamp 2 years ago

      Hi,
      The logic is similar regardless of the file type. If you have binary ped files (bed/bim/fam), as you describe, use --bmerge; for ped+map files, use --merge.
      www.cog-genomics.org/plink/1.9/data#merge
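
As an illustration of the two cases, here is a small helper that assembles the PLINK 1.9 command line for either file type. The prefixes breed1, breed2 and merged are hypothetical, and the helper only builds the argument list; running it requires PLINK to be installed.

```python
from typing import List


def merge_command(first: str, second: str, out: str, binary: bool = True) -> List[str]:
    """Build a PLINK 1.9 merge command from file-set prefixes.

    binary=True  -> bed/bim/fam input: --bfile plus --bmerge
    binary=False -> ped/map input:     --file plus --merge
    """
    if binary:
        load, merge = "--bfile", "--bmerge"
    else:
        load, merge = "--file", "--merge"
    # --make-bed writes the merged result as a binary file set
    return ["plink", load, first, merge, second, "--make-bed", "--out", out]


cmd_binary = merge_command("breed1", "breed2", "merged")
cmd_text = merge_command("breed1", "breed2", "merged", binary=False)
print(" ".join(cmd_binary))
print(" ".join(cmd_text))
```

The list can be passed to subprocess.run(cmd_binary, check=True) once the input files exist; for non-human data, the appropriate species or chromosome-set option still has to be added.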

  • @georgewanjala4605 2 years ago +1

    Prof, please advise me here: is it recommended to merge different final reports before converting them, or after?

    • @GenomicsBootCamp 2 years ago

      Technically you can also join the two final reports before conversion, but this has a number of uncertain points. So it is much safer to convert both separately and then merge the (binary) ped files in PLINK.

  • @indologyandindianhistory673 2 years ago +1

    Hello Gabor! Thank you very much. I am learning a lot from your videos. I've recently run into a practical problem while trying to merge two datasets I'm interested in analysing. One of the datasets is in geno/snp/ind format, and the data I want to merge into it is in PLINK bed/bim/fam format. How should I go about merging here? Any help or guidance would be highly appreciated.

    • @GenomicsBootCamp 2 years ago +1

      Hi, I am not familiar with the geno/snp/ind format. Are these "Ancestrymap" files?
      In any case, the process is similar. Convert the geno/snp/ind files to ped+map, or bed/bim/fam and then use the usual merging process. There must be conversion tools around, so you do not necessarily need to write your own code.
      I found this one for conversion, not sure if it is your case: reich.hms.harvard.edu/software/InputFileFormats
      Let me know if this worked, could be a nice topic for a future video.

    • @indologyandindianhistory673 2 years ago

      @GenomicsBootCamp Hi Gabor! I think you are right, and I've installed Linux on my machine to use the Eigensoft tools. I'll try over the weekend and let you know if I succeed in merging.
      It would be great if you covered this in a future video, as you seem to be thinking as well :)

    • @indologyandindianhistory673 2 years ago

      @GenomicsBootCamp Hi Gabor! I managed to merge the datasets I was interested in using the Eigensoft tools convertf and mergeit. Thanks for the guidance and for finding the Reich lab page.
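
For readers facing the same task, a hedged sketch of the conversion step with convertf. It writes a parameter file that asks convertf to turn a geno/snp/ind file set into bed/bim/fam. The keys and the PACKEDPED output format are given as I recall them from the EIGENSOFT convertf documentation, and the prefixes are hypothetical, so verify against the README shipped with your version.

```python
def convertf_parfile(prefix_in: str, prefix_out: str) -> str:
    """Return the text of a convertf parameter file.

    Assumed input:  <prefix_in>.geno / .snp / .ind
    Assumed output: <prefix_out>.bed / .bim / .fam (PACKEDPED format)
    """
    lines = [
        f"genotypename:    {prefix_in}.geno",
        f"snpname:         {prefix_in}.snp",
        f"indivname:       {prefix_in}.ind",
        "outputformat:    PACKEDPED",
        f"genotypeoutname: {prefix_out}.bed",
        f"snpoutname:      {prefix_out}.bim",
        f"indivoutname:    {prefix_out}.fam",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical prefixes; write the text to a file for convertf to read.
parfile_text = convertf_parfile("dataset1", "dataset1_plink")
print(parfile_text)
```

After saving the text to a file such as par.convert, the conversion would be run as convertf -p par.convert, and the resulting bed/bim/fam files can then be merged in PLINK as usual.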

  • @drchinmoymishra 1 year ago +1

    Please share the data files used, for easier understanding.

    • @GenomicsBootCamp 1 year ago

      I do not have the original small files any more, but I will create something similar and share it.

    • @drchinmoymishra 1 year ago

      @GenomicsBootCamp That will be a great help.

  • @zahrakhamis8554 1 year ago +1

    Great explanation, it helped me a lot. How can we merge dosage files saved as txt files using PLINK and R?

    • @GenomicsBootCamp 1 year ago

      Hi,
      I think PLINK currently cannot deal with dosage. Maybe PLINK2 will be able to, but as I found out yesterday, the merge functionality there is still not complete, so we will have to wait for it...

    • @GenomicsBootCamp 1 year ago

      Correction, of sorts:
      It seems PLINK has some capacity for dosage data, see:
      www.cog-genomics.org/plink/1.9/assoc#dosage
      On the other hand, judging by how the description is phrased in the linked section and elsewhere on the PLINK website, the PLINK authors themselves do not seem too convinced, so I would be careful.