I have zero experience and failed so many times following other YouTubers. Your script works and I can easily follow along, even with the different methods. Thank you so much!
Another great video! Thank you Biostatsquid, see ya in the next one!
Great concise tutorial!
Much appreciated!👍
Do you have more information on the DEG data? Is it disease vs control? Which disease and which tissue?
Excellent tutorial, thank you!
Thanks. At the end I got confused with all the ups and downs.
Teacher, which one is suitable for bg_gene: df or significant_df? Thanks so much.
Teacher, I need your help. Since the dataset is the df gene set, why are many p-values above 0.05? How do I get the df gene set for my scRNA data?
Many thanks for this video. It was extremely helpful! Just a quick question, do you have a link to any papers that use the same method for ranking genes? I've gone for the same approach, but will need to defend it in my viva and I am struggling to find publications using this method.
Secondly, I just want to confirm that you use regular p-values rather than adjusted p-values for the ranking calculation?
Perhaps you didn't know, but there's now a software platform called RNAlysis that is highly modular and has a graphical user interface, so you can interrogate any RNA-based research question without ever writing a single line of code.
Prebuilt software will never have the flexibility of coding. That said, it may be just fine at entry level, but in real research you may need to explore different settings to get the best results.
Can you please share all your code?
Hi! Yes of course, you can find it here: biostatsquid.com/fgsea-tutorial-gsea/
Thank you for the nice tutorial, I have two questions:
1/ On the GitHub page of fgsea (issue #131, "Which genes should I use"), the developer seems to say that fgsea automatically extracts background genes from the sorted input vector. You use a custom function to filter the pathways yourself. I ran both options (i.e. once without filtering and once with filtering) and the results are very comparable, but the NES values and p-values are not exactly the same. Would you still recommend using the function?
2/ To rank genes, you use (df$log2FC)*(-log10(df$PValue)). Do you have any references to the use of this formula? One of the developers advises to use the test-statistic to rank (i.e. t column from limma and stat column from DESeq2). I have also seen ranking be done based on log2FC.
Why do you advise this formula?
Thank you.
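For readers following question 1, the custom filtering step amounts to intersecting each gene set with the genes present in the ranked vector. Here is a minimal base-R sketch on toy data; the names `pathways` and `ranked_stats`, and all gene and pathway labels, are hypothetical and not taken from the tutorial's script.

```r
# Hypothetical gene sets (in practice these come from a GMT file)
pathways <- list(
  PATHWAY_A = c("GENE1", "GENE2", "GENE9"),
  PATHWAY_B = c("GENE3", "GENE8"),
  PATHWAY_C = c("GENE7", "GENE9")
)

# Hypothetical ranked vector: names are genes, values are the ranking metric
ranked_stats <- c(GENE1 = 2.1, GENE2 = 0.4, GENE3 = -1.3, GENE4 = -2.0)

# Background = every gene that was actually measured and ranked
background_genes <- names(ranked_stats)

# Keep only the pathway members that appear in the background
filtered <- lapply(pathways, function(genes) intersect(genes, background_genes))

# Drop gene sets that ended up empty after filtering
filtered <- filtered[lengths(filtered) > 0]

print(filtered)
```

Running fgsea once on `pathways` and once on `filtered`, with the same seed (`set.seed()`) and the same `minSize`/`maxSize`, is one way to check whether any remaining differences come from the filtering itself or from the random sampling fgsea uses. Which of the two explains the result described above is not established here.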
Hi, thanks so much for your comment, two very good questions.
1. I wasn't aware that fgsea already filters out genes from the gene sets. I'm not sure the comment you referred to means that; I would need to check the methods or the fgsea function itself to confirm. But thanks so much for asking and letting me know! Once I check it I'll post it :)
In case you want to read about this filtering, this recent article covers the topic. genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0761-7
2. You're totally on point, I've also seen different methods. I'd rather go for a combination of both because I want to rank genes by significance as well, not only by fold change. But again, it depends on your project and what you're looking for. I think I already shared a paper on my blog post which recommends this ranking method - www.ncbi.nlm.nih.gov/pmc/articles/PMC3957066/
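The ranking options discussed in this thread can be compared side by side. This is a minimal base-R sketch, assuming a results table with the `log2FC` and `PValue` columns from the formula above; the `gene_symbol` column and the toy values are hypothetical.

```r
# Hypothetical differential expression results
df <- data.frame(
  gene_symbol = c("GENE1", "GENE2", "GENE3", "GENE4"),
  log2FC      = c(2, -1, 0.5, -3),
  PValue      = c(0.001, 0.01, 0.5, 0.0001)
)

# Option A: the formula from the thread, fold change weighted by significance.
# A p-value of exactly 0 would give an infinite score, so such values
# need handling before this step.
df$rank_combined <- df$log2FC * (-log10(df$PValue))

# Option B: fold change only
df$rank_lfc <- df$log2FC

# Build the named, sorted vectors that a ranked-list method expects
rank_combined <- sort(setNames(df$rank_combined, df$gene_symbol), decreasing = TRUE)
rank_lfc      <- sort(setNames(df$rank_lfc, df$gene_symbol), decreasing = TRUE)

print(rank_combined)
print(rank_lfc)
```

A third option raised in the question, ranking by the test statistic (the `t` column from limma or the `stat` column from DESeq2), follows the same pattern: take that column in place of `rank_combined`. Comparing the gene order across the options on your own data shows how much the choice matters for a given project.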