Genomics in practice - Genotype data format change with PLINK

Genomics Boot Camp

Published 16 Jul 2024
The accompanying text for this video, including any code, is in my "Genomics Boot Camp" book, available online at:
genomicsbootcamp.github.io/bo...
Support the channel
*******************************************
To support the Genomics Boot Camp channel, check out the associated site for genotyping services covering livestock, companion animals and more: azdatasolutions.eu/

COMMENTS • 49

@Fasilgetachew 3 years ago ⁺⁴
Thanks, Professor. Your channel is an excellent support in analyses of my data. Keep up the nice work.
@seyedhashemi9636 3 years ago ⁺²
Thanks for such an informative video!
@moslemmoghbeli4325 5 months ago ⁺²
Thank you for all of the videos.
@mdrasheduzzaman7613 3 years ago ⁺⁷
Thanks a lot, Professor. It helps a lot. I faced a problem: I was using the GWAS PLINK type, and plink was not finding the options "--file", "--freq", "--recode", "--out", etc. Using "./plink" instead of "plink" solved the issue. Maybe this piece of info will help others and save them a lot of time. :)
Thanks a lot, again.
@GenomicsBootCamp 3 years ago ⁺³
Thanks! Indeed, "./plink" should be used instead of "plink" on Linux, and probably also on Mac. The test runs and the video were done on Windows 10.
@baharehbehrooziasl9517 2 years ago
Hi, I am running R on a Linux system. When I called PLINK, it ran normally. However, when I tried the first line of code to change the format, I received an error message. I tried ".\plink" but it gave me an error. Do you have any suggestions?
@mdrasheduzzaman7613 2 years ago
@@baharehbehrooziasl9517, Hello. I think it is just a typo. It should be a forward slash, not a backslash (the backslash is used in Windows file paths). So, try "./plink" with a forward slash. Let us know if it solves the issue.
@baharehbehrooziasl9517 2 years ago
@@mdrasheduzzaman7613 Thanks for your prompt response. Sorry for the typo, I actually used "./plink" in the code, but it did not work. It gave me the error "error in running command". However, when I used "plink", it ran normally, but when I tried to recode the dataset it gave me the error "unknown option --bfile" (and the same for --recode and --out).
@mdrasheduzzaman7613 2 years ago
@@baharehbehrooziasl9517, I think the probable reason is that your plink executable is not in the working directory you set for your R session. So, try resetting the working directory to the folder where your plink file is.
Note: Put all your data files and the plink file in the same directory and run the code to see if it works correctly.
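The advice in this thread (keep the executable next to the data, call it as "./plink") can be checked before running anything. Below is a minimal Python sketch, not part of the video; the function name find_plink is hypothetical. It looks for a file named plink or plink.exe in the working directory first and only then falls back to whatever "plink" resolves to on the PATH. One possible explanation for the "unknown option --bfile" message is that the "plink" found on the PATH is an unrelated program of the same name, which is why the local copy is preferred here.

```python
import os
import shutil


def find_plink(workdir="."):
    """Return a path to a PLINK executable, or None if none is found.

    Assumption: the executable is named 'plink' (Linux/Mac) or
    'plink.exe' (Windows), as in the thread above.
    """
    for name in ("plink", "plink.exe"):
        candidate = os.path.join(workdir, name)
        if os.path.isfile(candidate):
            # Equivalent to calling "./plink" from inside workdir.
            return os.path.abspath(candidate)
    # Fall back to the PATH; this may be a different program called plink.
    return shutil.which("plink")
```

If the function returns None, the executable is neither in the working directory nor on the PATH, which matches the "error in running command" situation described above.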
@yawpr3ko837 11 months ago ⁺¹
How do I change a CSV file into a PED or MAP file?
@adamramses9722 3 years ago ⁺¹
Thanks for your videos, Professor, they are a great help indeed. I would love a video about changing the bed/bim/fam format into a raw data format like the 23andMe file format, so it can be used for analysis on websites such as GEDmatch.
@GenomicsBootCamp 3 years ago
Hi! At some point, I want to make a larger video on all #PLINK input/output files...
For now, for your question you may try the line:
plink --bfile [binary ped file] --recode 23 --out outputName
You need to extend it with --chr-set or similar if you have non-human data. This approach handles one individual at a time.
Thinking about it now, this info with some more details might be a decent video... Thanks for the suggestion!
@adamramses9722 3 years ago
@@GenomicsBootCamp Thanks so much for your answer, Professor. I would love a video about that, it would be really helpful. I tried to follow your instructions, but I guess I am missing something. I am trying to convert a bed file, but it has lots of samples and I keep getting the error "Error: --recode 23 can only be used on a file with exactly one sample."
@GenomicsBootCamp 3 years ago
@@adamramses9722 Yes, --recode 23 seems to work that way. So you need to combine it with a --keep option in the same line, keeping only a single individual at a time. The --keep option is explained in the video "How to select and remove individuals in PLINK" on this channel. So you can go ahead and try this.
One additional issue is that if you have many individuals, the manual approach could be tedious, so you need to implement a looped solution that runs things automatically. I will also try to provide such a solution in a video.
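The looped solution mentioned above could look roughly like the following Python sketch. It is not the solution from the later video; the function name recode23_commands and the output naming scheme are hypothetical. It reads the family ID and individual ID from the first two columns of the .fam file, writes a one-line --keep file per sample, and builds (but does not run) one PLINK command per sample.

```python
import os


def recode23_commands(bfile_prefix, out_dir, plink="plink"):
    """Build one 'plink --recode 23' command per sample in <bfile_prefix>.fam.

    Assumptions: sample IDs contain no path separators, and 'plink' is
    how the executable is called on your system (e.g. './plink' on Linux).
    """
    os.makedirs(out_dir, exist_ok=True)
    commands = []
    with open(bfile_prefix + ".fam") as fam:
        for line in fam:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            fid, iid = fields[0], fields[1]
            # --keep expects a text file with family ID and individual ID.
            keep_path = os.path.join(out_dir, f"{fid}_{iid}.keep")
            with open(keep_path, "w") as keep:
                keep.write(f"{fid} {iid}\n")
            commands.append([
                plink, "--bfile", bfile_prefix,
                "--keep", keep_path,
                "--recode", "23",
                "--out", os.path.join(out_dir, f"{fid}_{iid}"),
            ])
    return commands
```

Each returned command can then be executed with subprocess.run(command, check=True). For non-human data, --chr-set or similar would have to be appended to every command, as noted in the earlier reply.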
@adamramses9722 3 years ago
@@GenomicsBootCamp Thanks so much for your guidance, Professor. I tried to search for this but sadly didn't find much information, so a video about that would be really helpful.
@GenomicsBootCamp 3 years ago
The video on changing PLINK to 23andMe comes tomorrow. Thanks for the idea!
@georgewanjala4605 3 years ago ⁺¹
Professor, I would like to know how to use subfolders (sub-directories) in the main directory, i.e. if I have some clustered datasets saved in subfolders and I want plink and R to read from them and save output there directly, to avoid cluttering the main folder with data.
Otherwise, I enjoy following your tutorials repeatedly; they are detailed and very helpful.
@GenomicsBootCamp 3 years ago
George!!! Thank you for the comment and the question! Somehow I did not think about this before, but it is a great improvement to data management (I will make a video about this and credit you, if you agree...)
As for the answer:
1) You create a subdirectory, e.g. output - could be anything you want
2) You specify the --out statement as: --out output/outFileName where "outFileName" is the name you would normally state in the --out option
3) The whole thing works also as input, e.g. --file data/inputData where data is the subfolder where inputData.ped and inputData.map are stored
4) I tested it on a Windows 10 system. Linux and Mac use the forward slash (/) as the path separator, so the same paths should work there unchanged; the backslash (\) is the Windows-style separator.
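The steps above can be sketched in Python as follows. This is an illustration only; the function name subdir_command is hypothetical, and --make-bed is used merely as an example action. The output subdirectory is created first (step 1), and both prefixes are written with forward slashes, which are the path separator on Linux and Mac and are also what the Windows test above used.

```python
from pathlib import Path


def subdir_command(data_dir, input_name, out_dir, out_name, plink="plink"):
    """Build a PLINK command that reads from and writes to subdirectories.

    Assumption: <data_dir>/<input_name>.ped and .map exist; the output
    directory is created here if it is missing.
    """
    # Step 1: create the output subdirectory.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Steps 2 and 3: prefix the file names with their subdirectory.
    in_prefix = (Path(data_dir) / input_name).as_posix()
    out_prefix = (Path(out_dir) / out_name).as_posix()
    return [plink, "--file", in_prefix, "--make-bed", "--out", out_prefix]
```

For example, subdir_command("data", "inputData", "output", "outFileName") gives a command whose input prefix is data/inputData and whose output prefix is output/outFileName.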
@georgewanjala4605 3 years ago
@@GenomicsBootCamp Thank you so much, Professor.
@georgewanjala4605 3 years ago ⁺¹
Dear Professor,
I have encountered some errors. Please check your email for screenshots; I tried to share them here but was unable to. Thanks.
@georgewanjala4605 2 years ago ⁺¹
@Genomic Boot Camp, would you please advise on how to convert PLINK files to the FASTA or Arlequin file format?
@GenomicsBootCamp 2 years ago ⁺¹
Here seems to be an easy way, but you need Perl:
github.com/gungorbudak/ped2fasta
@georgewanjala4605 2 years ago
@@GenomicsBootCamp, Thank you professor
@kanatyermekbayev9 3 years ago ⁺¹
Hello, if one has .map and .ped files (instead of the three you indicated), how should one load them into PLINK using R? Thanks
@GenomicsBootCamp 3 years ago ⁺¹
Hi, with .ped and .map files you need to use the "--file" option instead of the "--bfile" that is in the video. So you only need to delete the "b" and update the file name to yours.
Also, you don't need to upload anything; just have the .ped and .map files in your working directory, as in the video.
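The choice between "--file" and "--bfile" can also be made by checking which files are actually present. The following is a minimal Python sketch under that assumption; the function name input_option is hypothetical. It returns "--bfile" when the .bed, .bim and .fam files exist, "--file" when the .ped and .map files exist, and raises an error otherwise, which is the same situation PLINK reports as "Failed to open ...".

```python
import os


def input_option(prefix):
    """Return the PLINK input arguments matching the files found for prefix."""
    def all_exist(extensions):
        return all(os.path.isfile(prefix + ext) for ext in extensions)

    # Binary fileset: .bed + .bim + .fam -> --bfile
    if all_exist((".bed", ".bim", ".fam")):
        return ["--bfile", prefix]
    # Text fileset: .ped + .map -> --file
    if all_exist((".ped", ".map")):
        return ["--file", prefix]
    raise FileNotFoundError(
        f"No complete .bed/.bim/.fam or .ped/.map fileset found for '{prefix}'"
    )
```

The prefix is the file name without extension, relative to the working directory, exactly as it would be typed after --file or --bfile.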
@kanatyermekbayev9 3 years ago ⁺¹
@@GenomicsBootCamp Thanks for the response. The problem was with PLINK version 2. I downloaded 1.9 and it is working well.
@GenomicsBootCamp 2 years ago ⁺¹
@@kanatyermekbayev9 Thanks for the clarification!
@minakshi3645 2 years ago ⁺²
Can you help me change a txt format to ped? Do I need to remove the extra information present in my file? (I downloaded SNP data from the UCSC browser and the GWAS Catalog.)
@GenomicsBootCamp 2 years ago
Hi, without an exact format it is hard to suggest a solution. Could you give an example of which columns are present and how the file is formatted? E.g. is it one line per SNP or one line per individual?
@minakshi3645 2 years ago
@@GenomicsBootCamp Thank you for replying.
If I download my data from the GWAS Catalog, it comes in TSV format, with the following columns: author, date, journal, link, study, disease, sample size, region, chromosome ID, mapped gene, SNP ID, strongest SNP risk allele, p-value, p-value mlog, CNV
If I fetch my data in CSV format from the UCSC browser for the APOE gene, it has the following columns:
name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exon count, exon starts, exon ends, protein ID, align ID...
These are my formats, but the PLINK format is different. Can you please tell me where to fetch SNP data for human disease, or how to change it into the format PLINK requires?
@ademolaaina6059 1 year ago ⁺¹
Hi Prof. Did you find a way to convert VCF to ped/map?
@GenomicsBootCamp 1 year ago
Hi! Yes, there is a video on it on the channel:
Convert between PLINK to VCF file formats (Remake)
ua-cam.com/video/EJDknrHAkXs/v-deo.html
@ashvinkumarkatral1978 1 year ago ⁺¹
Thank you very much for a handy topic.
I am trying to convert a VCF file to PLINK binary files, but I am facing a problem while running it. I end up with "Error: Invalid alternate allele on line 23 of --vcf file". I checked the data file and could not find any error in the data format.
Please suggest the best way out.
Thank you very much, Sir
@GenomicsBootCamp 1 year ago
Hi,
I don't know what the problem could be, but my first thought would be to compare that line with e.g. line 22. That one did not raise an error, so it should be OK. Then look for anything that is different, especially in the alternate allele column. Maybe it is missing, or there is a weird character there? But maybe the problem is elsewhere on the line.
Also, as a trial I would manually delete that line from the VCF file and see if the problem remains. If line 23 is still indicated, then it might be something around it.
If a different line is indicated for the same error (e.g. line 100), you can look at the faulty lines 23 and 100 and see what they have in common.
Not a very scientific approach, but worth a try.
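The comparison of line 22 and line 23 suggested above can be done with a few lines of Python. This is a sketch under two assumptions: the VCF is an uncompressed, tab-separated text file, and the line number in the error message counts physical lines from the top of the file, header lines included. The function name vcf_ref_alt is hypothetical.

```python
def vcf_ref_alt(vcf_path, line_numbers):
    """Return {line number: (REF, ALT)} for the requested 1-based lines.

    Header lines (starting with '#') and lines with fewer than five
    tab-separated columns are reported as None.
    """
    wanted = set(line_numbers)
    found = {}
    with open(vcf_path) as vcf:
        for number, line in enumerate(vcf, start=1):
            if number not in wanted:
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(fields) < 5:
                found[number] = None
            else:
                # VCF columns: CHROM, POS, ID, REF, ALT, ...
                found[number] = (fields[3], fields[4])
    return found
```

Printing the result with repr(), for instance print(repr(vcf_ref_alt("myfile.vcf", [22, 23]))), makes stray spaces or unusual characters in the ALT column visible; "myfile.vcf" is a placeholder name.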
@liutrvcyrsui 2 years ago ⁺¹
Thanks for the video. Can PLINK generate the Mean Genotype File Format?
@GenomicsBootCamp 2 years ago
Hi, to my knowledge not, but you could check the --recode option and prepare the files for another program which does. www.cog-genomics.org/plink/1.9/data#recode
In particular, the BimBam program seems promising. From Appendix 1 of its manual:
"Imputation without panel.
./bimbam -g input/cohort.txt -p input/pheno.txt -e 10 -s 20 -c 15 -o pref -wmg
This command line asks bimbam to run EM 10 times, each EM run 20 steps. After imputation, output mean genotypes."
www.haplotype.org/download/bimbam-manual.pdf
@AmitabhBiswas 1 year ago ⁺¹
I had an error while running the first --bfile command:
Error: --export requires at least one output format. (Did you forget 'ped' or 'vcf'?)
@GenomicsBootCamp 1 year ago
The error message seems to point towards the output file, so:
1) check if you have the .bim, .bed, and .fam files in the same directory where you run PLINK, just to be sure
2) perhaps you did not specify what type of output you want, so if you do not have e.g. --recode or --make-bed in the PLINK line, just add it there
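The wording of that error mentions --export, which is a PLINK 2 flag, so one possible cause is that the command was run with PLINK 2 rather than the PLINK 1.9 used in the video (the commenter's version is not stated, so this is an assumption). In PLINK 1.9 a bare --recode writes .ped and .map files, while the error text itself indicates that PLINK 2 wants an explicit format such as 'ped' or 'vcf'. A minimal Python sketch of that difference, with the hypothetical function name ped_export_args:

```python
def ped_export_args(plink_major_version):
    """Return the arguments that request .ped/.map output.

    Assumption: version 1 means PLINK 1.9 and version 2 means PLINK 2.
    """
    if plink_major_version >= 2:
        # PLINK 2 requires an explicit output format after --export.
        return ["--export", "ped"]
    # PLINK 1.9: --recode without a modifier produces .ped and .map.
    return ["--recode"]
```

The returned arguments would be placed between the input option and --out in the PLINK line.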
@vinaymore8210 3 years ago ⁺¹
How do I convert a VCF file to ped and map, and what is a .tbi file?
@GenomicsBootCamp 3 years ago ⁺¹
Hi, the file conversion will be discussed in the video tomorrow (9 June).
I do not have much experience with .tbi files, but they seem to be some kind of index file for VCF.
@kashifkhan-xr8fj 1 year ago ⁺¹
Hello sir. Could you please help me convert SNP genotype data in txt format into ped and map files?
@GenomicsBootCamp 1 year ago
Is it close to any of the PLINK input file formats? See: ua-cam.com/video/ZRyfpe1zqVg/v-deo.html
If yes, adapt to it and use --recode to get ped+map
@kashifkhan-xr8fj 1 year ago
@@GenomicsBootCamp Thank you for your reply, sir. My data doesn't match any of these formats. It is Affymetrix 50k genotype data with columns like Probeset ID, animal IDs (90 samples) with AA, AB, BB genotypes, Affy SNP ID, chr ID, start, strand, dbSNP RS ID, etc.
@reemalsaidi8664 5 months ago
I receive this message:
15761 MB RAM detected; reserving 7880 MB for main workspace.
Error: Failed to open ADAPTmap_genotypeTOP_20160222_full.map.
Also, can I convert to CSV format?
@GenomicsBootCamp 4 months ago
The "Error: Failed to open..." message usually refers to a missing file. Do you have that map and ped file, with that exact name, in your working directory?
