If only there were a word that could accurately express how much gratitude and love I feel for you every time I watch one of your videos and it solves a doubt... You are awesome, Professor!
Thank you for creating this video, Professor. Very helpful!
You are welcome!
Thank you so much for this wonderful content, professor!! Your materials have been invaluable for my research project. 🙏
Thank you, Professor, for your hard work.
Thank you, Prof. Gabor, for another amazing video. I wonder how you prepared the phenotype file (the height data .txt) that you used for --pheno.
For your information, I have a ped and map file.
This comment seems to be disconnected. Could you clarify?
The "PhenotypeMeasures.txt" file used for --pheno should be in the same directory as PLINK. It has a header line, so its columns can be selected as needed, as explained in the second half of the video.
Was this your question, or did I miss something?
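A phenotype file of the kind described above can be sketched as follows. This is a minimal illustration, assuming a tab-delimited layout with FID and IID as the first two columns followed by one column per trait. The trait names ("Height", "Weight"), the genotype prefix "mydata" and the --out name are hypothetical. --pheno and --pheno-name are real PLINK options. The second function only builds the command line and does not run PLINK.

```python
import csv
import io


def write_pheno_file(handle, rows, trait_names):
    """Write a PLINK-style phenotype file with a header line.

    rows: iterable of (FID, IID, value_1, value_2, ...) tuples,
    one value per trait, in the same order as trait_names.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["FID", "IID", *trait_names])  # the header enables --pheno-name
    for row in rows:
        writer.writerow(row)


def plink_pheno_command(prefix, pheno_file, trait):
    """Build (not run) a PLINK call that picks `trait` by its header name."""
    return ["./plink", "--file", prefix,
            "--pheno", pheno_file, "--pheno-name", trait,
            "--make-bed", "--out", f"{prefix}_{trait}"]  # hypothetical output name


if __name__ == "__main__":
    buffer = io.StringIO()
    write_pheno_file(buffer, [("F1", "I1", 170.2, 65.0)], ["Height", "Weight"])
    print(buffer.getvalue(), end="")
    print(" ".join(plink_pheno_command("mydata", "PhenotypeMeasures.txt", "Height")))
```

Because the header names the columns, switching traits only means changing the --pheno-name argument. The file itself stays the same.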
Hello Professor
Thank you so much for the work you are doing with these tutorials. I have found them very helpful. I have some work that involves close to 200 phenotypes. I have been using the one-by-one approach, but would certainly like to create a set of loops that could handle the whole workflow.
After creating the phenotype-specific bed file, I use GEMMA to run the association. Then, with a combination of R scripts and PLINK scripts (both 1.07 and 1.9), I create region plots where the p-value is plotted on the left y-axis and the estimated recombination rate on the right y-axis, with the SNPs color-coded by LD. Do you have any further thoughts on how looping could be employed without creating a massive number of files?
Hi, my approach with GEMMA is the following (you probably do something similar, but I list it here to see whether we deviate at any point):
1) Create a file which contains all individuals (rows) and phenotypes (columns). This file has a header, to make use of PLINK's --pheno and --pheno-name options.
2) Create a vector in R with the phenotype names, the same names as in the header of the file from 1).
Loop starts here: for each element of the vector, substitute it as the PHENOTYPE name in the PLINK call.
3) Prepare the PLINK files for the analysis:
system(paste0("./plink --file ...... --pheno pheno_file_from_point1.txt --pheno-name ", PHENOTYPE, " --make-bed --out phenoForGemma"))
4) Run GEMMA. I run both the relationship matrix preparation and the actual GWAS each time. Technically, the relationship matrix could be computed just once. For me time was not a factor, but with so many phenotypes you would probably compute it once, outside of the loop.
5) Plot the results in R. The GEMMA output file name is always the same, so a standardized script can be used. Then save the GEMMA output files (with the actual results) and the Manhattan plot by renaming them to include the PHENOTYPE name from the loop.
Loop ends here.
This still creates a number of files, depending on how many phenotypes you are after, but at least you get rid of the temporary ones and can concentrate on the plots/results files.
Is this what you were after?
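Steps 3 to 5 of the loop above can be sketched in Python. The thread uses R's system(), and Python is swapped in here only for illustration. This is not necessarily the exact setup described above. It assumes the binaries are at ./plink and ./gemma and that the genotype prefix and file names are the hypothetical ones in the code. It uses GEMMA's -gk 1 (relatedness matrix) and -lmm 4 options, which write into an output/ folder. By default it only prints the commands. Plotting is left to the R script.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical names; adjust to your own setup.
PLINK = "./plink"
GEMMA = "./gemma"
GENO_PREFIX = "mydata"                     # mydata.ped / mydata.map
PHENO_FILE = "pheno_file_from_point1.txt"  # header: FID IID trait1 trait2 ...


def commands_for(phenotype):
    """Return the PLINK and GEMMA calls for one phenotype (steps 3 and 4)."""
    return [
        # Step 3: write the chosen phenotype into the .fam of a temporary binary set.
        [PLINK, "--file", GENO_PREFIX, "--pheno", PHENO_FILE,
         "--pheno-name", phenotype, "--make-bed", "--out", "phenoForGemma"],
        # Step 4a: centered relatedness matrix -> output/kinship.cXX.txt
        [GEMMA, "-bfile", "phenoForGemma", "-gk", "1", "-o", "kinship"],
        # Step 4b: linear mixed model association -> output/result.assoc.txt
        [GEMMA, "-bfile", "phenoForGemma", "-k", "output/kinship.cXX.txt",
         "-lmm", "4", "-o", "result"],
    ]


def run_all(phenotypes, dry_run=True):
    """Loop over phenotypes; with dry_run=True only print the commands."""
    for phenotype in phenotypes:
        for cmd in commands_for(phenotype):
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)
        if not dry_run:
            # Step 5: keep the results under a phenotype-specific name;
            # the temporary phenoForGemma files are overwritten next iteration.
            Path("output/result.assoc.txt").rename(f"output/{phenotype}.assoc.txt")


if __name__ == "__main__":
    run_all(["Height", "Weight"])
```

Moving step 4a before the loop, as suggested above, would save time. Whether a single matrix is appropriate when phenotypes have different missing individuals is worth checking in the GEMMA manual.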
Really helpful! Thanks! Just curious: when using the --assoc option to analyze data, does it calculate p-values based on the phenotype of interest (present in the .fam file), or do we need to specify the phenotype somehow after the command?
To be honest, I prefer GEMMA for GWAS analyses, so this would need a more detailed check. But from the information on the PLINK website, I think it is the phenotype column of the .fam file that is used, unless an alternate phenotype file is supplied with --pheno.
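To make the default concrete: a .fam file has six columns, and the sixth holds the phenotype that --assoc reads unless --pheno points elsewhere. The sketch below shows that column and builds the two variants of the command. The prefix "mydata", the trait name and the --out naming are hypothetical.

```python
FAM_COLUMNS = ("FID", "IID", "father", "mother", "sex", "phenotype")


def fam_phenotypes(lines):
    """Map (FID, IID) to the sixth .fam column, the default --assoc phenotype."""
    phenotypes = {}
    for line in lines:
        fields = line.split()
        if len(fields) != len(FAM_COLUMNS):
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        phenotypes[(fields[0], fields[1])] = fields[5]
    return phenotypes


def assoc_command(bfile, pheno_file=None, pheno_name=None):
    """Build a PLINK --assoc call; --pheno replaces the .fam phenotype."""
    cmd = ["./plink", "--bfile", bfile, "--assoc"]
    if pheno_file is not None:
        cmd += ["--pheno", pheno_file]
        if pheno_name is not None:
            cmd += ["--pheno-name", pheno_name]
    cmd += ["--out", f"{bfile}_assoc"]  # hypothetical output prefix
    return cmd


if __name__ == "__main__":
    print(fam_phenotypes(["F1 I1 0 0 1 2", "F2 I2 0 0 2 1"]))
    print(" ".join(assoc_command("mydata")))
    print(" ".join(assoc_command("mydata", "PhenotypeMeasures.txt", "Height")))
```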
Thank you so much, Professor. I really appreciate it. I have another question, about something that often comes in handy when explaining the genetics of phenotypes: narrow-sense and broad-sense heritability. I wonder whether PLINK can estimate them? I read the documentation, but somehow could not figure it out in the acgt64 software.
Hi, this is a bit more complex and comes down to modeling. Broad-sense heritability considers the entire genetic variance (including dominance and other effects), while narrow-sense heritability covers only the additive variance. Maybe there will be a video on this at some point, but not now, as it is further away from the current focus and quite deep in quantitative genetics.
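For reference, the textbook variance decomposition behind the two terms is shown below. $V_P$ is the phenotypic variance and $V_E$ the environmental variance. $V_G$ is the genetic variance, split into additive ($V_A$), dominance ($V_D$) and interaction/epistatic ($V_I$) parts. This is the standard simplified model, ignoring gene-environment covariance and interaction. It does not describe how any particular software estimates these quantities.

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I \\
H^2 &= \frac{V_G}{V_P} \quad \text{(broad-sense heritability)} \\
h^2 &= \frac{V_A}{V_P} \quad \text{(narrow-sense heritability)}
\end{aligned}
```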
Dear Professor, this is very helpful, thank you. I have two questions: 1. What if I want to update the phenotype column with an outcome coded as 0 and 1 (No, Yes)? Wouldn't the zeros be considered missing? 2. What happens if the number of individuals in the phenotype file does not match the number in the genotype files?
Hi,
Question nr. 1: I am not sure about the behaviour. Best to try it out on a small sample. But this is relevant only if you want to use the file further with PLINK. In the worst case, recode it to PLINK's standard case/control coding: 1 = No (control), 2 = Yes (case).
Question nr. 2: It does not matter. The phenotype file can have any number of entries; only the individuals with a matching FID+IID will be updated. If the pheno file has extra entries, they are ignored; if it has fewer, the remaining individuals will not be updated in your ped file.
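Both answers can be illustrated with a small sketch. The recoding follows PLINK's standard case/control convention (1 = control, 2 = case, -9 = missing). Mapping any other input value to -9 is an assumption. PLINK 1.9 also documents a --1 flag that reads 0/1 directly as control/case. The update function mimics the FID+IID matching described in the answer and leaves unmatched individuals unchanged. PLINK's own handling of individuals absent from a --pheno file may differ between versions, so check the documentation for the one you use. Function names and data are hypothetical.

```python
PLINK_CODING = {"0": "1", "1": "2"}  # No -> 1 (control), Yes -> 2 (case)
MISSING = "-9"


def recode_binary(value):
    """Map a 0/1 (No/Yes) outcome to PLINK's 1/2 (control/case) coding."""
    return PLINK_CODING.get(value.strip(), MISSING)


def update_phenotypes(fam_rows, new_values):
    """Replace column 6 for individuals whose (FID, IID) is in new_values.

    fam_rows: list of 6-field lists (FID, IID, father, mother, sex, phenotype).
    new_values: dict {(FID, IID): phenotype}. Extra keys are ignored, and
    individuals without an entry keep their current phenotype.
    """
    updated = []
    for row in fam_rows:
        key = (row[0], row[1])
        updated.append(row[:5] + [new_values.get(key, row[5])])
    return updated


if __name__ == "__main__":
    raw = {("F1", "I1"): "1", ("F9", "I9"): "0"}  # F9/I9 is not in the genotypes
    recoded = {key: recode_binary(v) for key, v in raw.items()}
    fam = [["F1", "I1", "0", "0", "1", "-9"], ["F2", "I2", "0", "0", "2", "-9"]]
    for row in update_phenotypes(fam, recoded):
        print(" ".join(row))  # F1 gets 2 (Yes -> case); F2 stays -9
```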
@GenomicsBootCamp Thank you so much for the feedback.
Would this work with the binary files, just changing --file to --bfile?
Yes, it should work the same way across all PLINK file formats. If binary files (.bed/.bim/.fam) are used, --bfile has to be specified instead of --file, as you mentioned.
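Since only the input flag changes, a scripted loop could pick it automatically from the files that exist. This is a minimal helper, assuming the fileset prefix is known. The prefix "mydata" and the choice to prefer the binary set when both formats are present are assumptions.

```python
import tempfile
from pathlib import Path


def plink_input_flag(prefix):
    """Return ["--bfile", prefix] for a .bed/.bim/.fam set, ["--file", prefix] for .ped/.map."""
    if all(Path(prefix + ext).exists() for ext in (".bed", ".bim", ".fam")):
        return ["--bfile", prefix]
    if all(Path(prefix + ext).exists() for ext in (".ped", ".map")):
        return ["--file", prefix]
    raise FileNotFoundError(f"no PLINK fileset found for prefix {prefix!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "mydata")
        for ext in (".ped", ".map"):
            Path(prefix + ext).touch()
        print(plink_input_flag(prefix)[0])  # --file
        for ext in (".bed", ".bim", ".fam"):
            Path(prefix + ext).touch()
        print(plink_input_flag(prefix)[0])  # --bfile (binary set preferred)
```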