Hi Mike, may God bless you for making this video (and your other videos). Your channel gives the best explanation I've seen so far.
Prolly the best RNA Seq analysis I've ever seen. Thank you!!!
I am doing an RNA-seq analysis on sugarcane, I'd never touched R before, and today I made my first graph with DESeq2 thanks to you!!!!
Our lab always had money to pay bioinformaticians to do all of our RNAseq analysis. Now we missed out on a grant and don’t have money to pay them, so I have to learn how to do it myself. Thanks a lot, this was very helpful.
Wonderful Brother!! Simply Wonderful!! Love from Nepal
Amazing tutorial. I have been really trying to start learning R and this is the first tutorial I have found that starts with read counts and walks through DESeq2 in an accessible way. I really appreciate the education
Amazing tutorial!!!
For those who have a dataset with gene IDs such as Entrez IDs as the row names and also want to annotate them before plotting, run this code after doing res <- res[order(res$padj),] and the write.csv function (using the same variables as in the tutorial, but pretending that the gene names aren't provided):
# Read the .csv file again so RStudio displays it as a table (make sure to enable headers)
res.annotated
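The sorting and CSV round trip described in that comment can be sketched in base R. This is a minimal sketch under assumptions: a small made-up table (hypothetical gene names and values) stands in for as.data.frame(res), and a temporary file stands in for a real output path. It covers only the sort, write and read steps, not the gene-ID annotation.

```r
# Hypothetical stand-in for as.data.frame(res); values are made up for illustration
res_df <- data.frame(
  log2FoldChange = c(1.5, -0.3, 2.1),
  padj = c(0.04, NA, 0.001),
  row.names = c("geneA", "geneB", "geneC")
)

# Sort by adjusted p-value; order() puts NA values last by default
res_df <- res_df[order(res_df$padj), ]

# Write to a temporary CSV, then read it back with the first column as row names
out_file <- tempfile(fileext = ".csv")
write.csv(res_df, out_file)
res_back <- read.csv(out_file, header = TRUE, row.names = 1)
res_back
```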
Hello Mike, excellent tutorial. Really, your video is a great contribution to students, faculty, and researchers who want to work in this field. Thanks, I have learned a lot from you. GOD bless you
Thanks Mike, this is the best explanation I have come across for a beginner like me
Hi Mike, your video saved my life as a beginner bioinformatician who will be giving a presentation in two weeks Lolol
Glad to help :)
Hi Mike! This video helped me a lot! Thank you!
The best best tutorial ever!!!!! Thank you sooooo much!
Thank you, please keep sharing your knowledge. The most elaborate tutorial I've ever watched.
I cannot tell you how great this was!!!
I’m a Medical student currently doing research and this tutorial just made it so easy. Thanks a lot Mike for putting it on YouTube! I feel guilty having access to this for free lol
Thanks very much, really good explanation there Mike.
Please make a video on heatmaps using the ComplexHeatmap package for the differentially expressed genes. Thanks
Very useful, thank you. Also, could you please give us a link to the dataset so we can practice on it?
Thanks for watching, but I don't have access to that dataset. We mapped public reads to transcripts to create it. This was 2 or 3 years ago for a class that I'm no longer teaching. Sorry about that.
How do you set up a Notepad file to get two variables into the coldata file? Please help me with this.
How does DESeq2 calculate p-values? I know how the mean and LFC are calculated, but how is the p-value calculated? Thanks
Hi Mike, thank you so much for your great explanation. How can I download RNA-seq data from GEO to analyze with your method? Thanks
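One way to do this, assuming a plain tab-separated text file: one row per sample, the first column holding sample names that match the count-matrix column names, and one extra column per variable. The sample names and temperature values below are hypothetical, and the file content is inlined as text so the sketch runs on its own.

```r
# Hypothetical coldata as typed in a plain-text editor:
# tab-separated, one row per sample, first column = sample names
coldata_text <- "sample\tcondition\ttemperature
water_1\twater\t20C
water_2\twater\t30C
psi_1\t15psi\t20C
psi_2\t15psi\t30C"

# row.names = 1 turns the sample column into row names;
# stringsAsFactors = TRUE makes both variables factors
coldata <- read.delim(text = coldata_text, row.names = 1, stringsAsFactors = TRUE)
str(coldata)
```

With both columns in the coldata, a design such as ~ temperature + condition can refer to both variables. For a real file, replace text = coldata_text with the file path.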
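A worked sketch, based on my understanding of DESeq2's defaults: results() uses a Wald test, in which the log2 fold change from the negative binomial model is divided by its standard error, and the resulting statistic is compared with a standard normal distribution. padj then comes from Benjamini-Hochberg adjustment across genes (DESeq2's independent filtering can also set some padj values to NA, which this sketch ignores). All numbers are hypothetical.

```r
# Hypothetical estimates for a single gene (not from the tutorial data)
log2_fold_change <- 1.2
lfc_se <- 0.4                                # standard error of the log2 fold change

# Wald statistic: estimate divided by its standard error (the "stat" column)
wald_stat <- log2_fold_change / lfc_se

# Two-sided p-value from the standard normal distribution (the "pvalue" column)
p_value <- 2 * pnorm(-abs(wald_stat))

# Across genes, p-values are adjusted with Benjamini-Hochberg (the "padj" column)
p_values <- c(p_value, 0.01, 0.04, 0.2)      # hypothetical p-values for four genes
padj <- p.adjust(p_values, method = "BH")
padj
```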
Hi Mike, please advise. I have 4 sample groups (Control, Treatment 1, 2, and 3) under two temperature conditions, with 3 replicates each. How should I put replicates 1, 2, 3 in columns? Should I use R1, R2, R3 of the control under one condition and R1, R2, R3 of the control under the other condition? Thanks
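Each biological sample gets its own column in the count matrix and its own row in the coldata, so R1 to R3 of the control at one temperature and R1 to R3 of the control at the other are six separate samples. A minimal sketch of such a coldata, assuming the placeholder temperature labels tempA and tempB and a hypothetical treatment_temperature_replicate naming scheme:

```r
# Hypothetical layout: 4 treatment groups x 2 temperatures x 3 biological replicates
coldata <- expand.grid(
  replicate   = 1:3,
  treatment   = c("control", "treatment1", "treatment2", "treatment3"),
  temperature = c("tempA", "tempB")   # placeholder names for the two temperatures
)

# One unique sample name per row, e.g. "control_tempA_R1";
# the count matrix columns would use the same names in the same order
rownames(coldata) <- with(coldata,
  paste(treatment, temperature, paste0("R", replicate), sep = "_"))

nrow(coldata)                                  # one row per sample
table(coldata$treatment, coldata$temperature)  # replicates per group and temperature
```

The replicate number is only a label and usually stays out of the design; something like ~ temperature + treatment is one possible starting point, with an interaction term if the treatment effect is expected to differ between temperatures.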
Awesome, thank you eternally
YOU are Fantastic. Thanks a lot.
Hi, thank you so much .
I have a question: how did you make the first column the row names? I get a "duplicate 'row.names' are not allowed" error every time!!
Yeah, I know that error. It's because you have the same gene name more than once, and R won't allow duplicate row names. The best idea may be to identify the duplicates and rename them, e.g. gene.1 or gene.2.
Or use a more specific gene identifier.
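A minimal sketch of the renaming approach in base R, assuming a small hypothetical table whose gene column contains a repeated name. make.unique() appends .1, .2, ... to repeats, which matches the gene.1 / gene.2 idea above.

```r
# Hypothetical count table whose gene column contains a repeated name
raw <- data.frame(gene = c("Actb", "Gapdh", "Actb"),
                  s1 = c(10L, 5L, 7L),
                  s2 = c(12L, 3L, 9L),
                  stringsAsFactors = FALSE)

# Position of the first repeated name (0 would mean no duplicates);
# a non-zero value is why row.names = 1 fails
anyDuplicated(raw$gene)

# Rename repeats (Actb, Actb.1, ...) so every row name is unique
rownames(raw) <- make.unique(raw$gene, sep = ".")
cts <- as.matrix(raw[, -1])   # drop the gene column, keep counts as a matrix
cts
```

Summing duplicated rows instead (base R's rowsum()) is another option, but whether that makes sense depends on why the name repeats, so switching to a more specific identifier is often the cleaner fix.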
Hi! Where can I find this read count file?
Thank you so much. It helped me a lot.
Hi Mike, I have a problem in the command dds
I think you can round() your dataset.
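Assuming the problem with dds was that the counts are not whole numbers (DESeqDataSetFromMatrix() expects integer counts), here is a base-R sketch of the rounding step with made-up values. Note that round() rounds to the nearest integer; it does not always round up.

```r
# Hypothetical non-integer counts (e.g. estimated counts from a quantification tool)
est_counts <- matrix(c(10.4, 0.6, 3.4, 7.2, 0, 12.9), nrow = 3,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# round() goes to the nearest integer
cts <- round(est_counts)
storage.mode(cts) <- "integer"   # store as integers, as a count matrix should be
cts
```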
@mikevandewege3007 Thanks for your valuable suggestion. I will try the rounding.
Very good explanation. Would you please make another video on edgeR?
I don't know how to use edgeR 😉
My dataset has SARS-2 and control samples, like your water and 15psi. In that case, which one corresponds to your condition? I am confused about how to interpret the MA plot. Kindly help me with this.
Do you know of any good tutorials that show how to use DESeq2 for 4 groups at once instead of just 2? For instance, if I had A, B, C, D and I wanted to compare A vs B, A vs C, and A vs D. Is that pretty tricky?
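This should not be too tricky in DESeq2: one model with a single factor holding all four groups, then one results() call per comparison through the contrast argument (factor name, numerator level, denominator level). A minimal sketch on simulated data from makeExampleDESeqDataSet(); the 3-samples-per-group layout and the group labels A to D are assumptions for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated data standing in for a real count matrix (12 samples)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
# Hypothetical 4-group layout with 3 samples each; A is the first level (reference)
dds$condition <- factor(rep(c("A", "B", "C", "D"), each = 3))
dds <- DESeq(dds, quiet = TRUE)

# One fitted model, several comparisons: contrast = c(factor, numerator, denominator)
others <- c("B", "C", "D")
res_list <- lapply(others, function(g) results(dds, contrast = c("condition", g, "A")))
names(res_list) <- paste0(others, "_vs_A")

# Number of genes with padj < 0.05 in each comparison
sapply(res_list, function(r) sum(r$padj < 0.05, na.rm = TRUE))
```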
Thank you for your video, sir. In your video, when the write.csv command creates the file, it just prints the numbers without "e+number" or NA. Is it correct without these?
Thank you!😊
What is the GSE ID of your data?
We made the data as part of a class. It's not public.
Beautifully explained. Thank you so much. For someone who doesn't know R: why is row.names set to 1 in the first command when we have so many genes (13K)?
In R you can label column names and row names. I think I'm setting the names of the rows equal to what's in the first column (row.names = 1). So if I want to extract info for a specific gene or label genes in a graph, I can use the row names. Default row names are 1 to n.
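In other words, row.names = 1 does not mean "one row"; it tells read.csv() to take the row names from column 1 (the gene names), so every gene keeps its own name. A small sketch with a hypothetical two-gene file inlined as text (the sample names are made up):

```r
# Hypothetical count file content: the first column holds gene names
csv_text <- "gene,water_1,water_2,psi_1,psi_2
Actb,120,98,110,105
Gapdh,300,280,310,295"

cts_default <- read.csv(text = csv_text)                # gene stays an ordinary column
cts         <- read.csv(text = csv_text, row.names = 1) # gene names become row names

rownames(cts_default)   # default row names 1 to n
rownames(cts)           # gene names
cts["Gapdh", ]          # look up one gene by name
```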
thanks for making this video man!
Some datasets are not like that. How can I analyze that kind of data?
For example?
Like the GEO dataset GSE138252. Its supplementary files include 3 files in .txt format. I want to analyze the differentially expressed genes from them, but I cannot tell which are the control and infection samples. Moreover, how can I generate the phenodata for that dataset?
That's probably when I'd ask the authors for help, if they're available. I tend not to query GEO, for no good reason though.
@mikevandewege3007 So how can I analyze the DEGs easily with DESeq2 in R without any obstacles?
Hi Mike. Thank you very much for this tutorial. I have a question: should RNA-seq data normalized by RSEM go through the same DESeq2 normalization? Thank you
I would refer to RSEM's manual about diff exp and normalization.
@mikevandewege3007 Thank you. I am having some trouble applying DESeq2 to my RNA-seq data.
Thank you for this!
Great video, great mohawk! Thanks
Hi, in summary(res) what do outliers mean?
I think it means: for these genes, a sample had an expression value that was an outlier of the expected distribution, so a difference may or may not be trusted. E.g., say I have 6 samples and for one gene I see expression values like 0, 0, 0, 0, 0, 100. An outlier may be driving an observed difference. Could be wrong though.
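That matches how DESeq2 describes it, as far as I know: the outliers line in summary(res) counts genes whose p-values were set to NA because a count in one sample had a large Cook's distance (a measure of how strongly a single sample influences the fit). A sketch on simulated data showing where the Cook's distances are stored and how the filter can be switched off with cooksCutoff = FALSE for comparison:

```r
library(DESeq2)

set.seed(1)
# Simulated data: 2 conditions x 3 samples (too few replicates for outlier replacement)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- DESeq(dds, quiet = TRUE)

# DESeq() stores one Cook's distance per gene and sample
cooks <- assays(dds)[["cooks"]]

res_default  <- results(dds)                       # Cook's filtering on (default)
res_nofilter <- results(dds, cooksCutoff = FALSE)  # Cook's filtering off

# Genes whose p-value is NA only because of the outlier filter
flagged <- is.na(res_default$pvalue) & !is.na(res_nofilter$pvalue)
sum(flagged)
summary(res_default)
```

With 7 or more replicates per group, DESeq() by default replaces outlier counts and refits rather than only flagging them (minReplicatesForReplace = 7).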
It was very useful
Hi Mike, thank you so much for making such a great video!! I have one question about using apeglm. In the beginning you loaded the apeglm package, but I didn't see where you actually used it in this example. Could you please explain how to use the package: on what basis I should choose apeglm, the code to run it, and any other settings needed to run it (e.g. beta prior, fit type, and test type)? (Sorry for the lengthy question btw)
It's probably just a formality; I may not have used apeglm in this example. It's used by a couple of DESeq2 commands, like lfcShrink(), which shrinks the log fold changes of lowly expressed genes. Can't tell you any more about it than that lol
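A minimal sketch of using apeglm through lfcShrink(), on simulated data (makeExampleDESeqDataSet(), conditions A and B). As I understand current DESeq2, DESeq() is run with its defaults (no beta prior) and shrinkage is applied afterwards; type = "apeglm" needs the coefficient name from resultsNames() rather than a contrast vector, and the apeglm package must be installed. Shrunken fold changes are mainly useful for ranking genes by effect size and for plots such as MA plots.

```r
library(DESeq2)   # lfcShrink(type = "apeglm") also needs the apeglm package installed

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)   # simulated, conditions A and B
dds <- DESeq(dds, quiet = TRUE)                   # default fit, no beta prior
resultsNames(dds)                                 # coefficient names to pass as coef

res_mle <- results(dds, name = "condition_B_vs_A")
# apeglm takes the coefficient name (coef), not a contrast vector
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm", quiet = TRUE)

# Shrunken LFCs are pulled toward zero, most strongly for noisy or low-count genes
summary(abs(res_mle$log2FoldChange))
summary(abs(res_shrunk$log2FoldChange))
```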
@mikevandewege3007 Thanks!
Hi Mike, thanks a lot for the tutorial. Can I please ask you a question about choosing factor levels for DESeq2? I have 3 sets of samples: (1) infected 1, (2) infected 2, (3) control.
I would like to compare
(1) control against infected 1,
(2) control against infected 2
(3) infected 1 against infected 2 and
(4) control against infected 1 and infected 2 combined.
I am not sure how to set the factors and factor levels for this. Could you please give me a suggestion? Many thanks, GG
All of that would be described in your coldata file. I believe DESeq2 will do pairwise comparisons of your factor levels, so you'd have 1 column with the levels control, infected1, infected2, and it'll do all the pairwise comparisons (I think; never tried). You can also have another column with just control and infected as the levels and run a separate analysis comparing only control and infected samples. Hope that makes sense.
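A sketch of that setup on simulated data, using DESeq2's results() with the contrast argument. One caveat on the reply: results() returns one comparison per call (by default the last level against the reference), so each pairwise comparison is requested explicitly. Comparison (4) follows the reply's suggestion of a second coldata column that pools the infected groups in its own model; the sample counts and the column name status are hypothetical.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 9)   # simulated stand-in, 9 samples
dds$condition <- factor(rep(c("control", "infected1", "infected2"), each = 3))

# Comparison (4): a hypothetical second column pooling both infected groups,
# analysed as its own model
cd <- colData(dds)
cd$status <- factor(ifelse(cd$condition == "control", "control", "infected"))
dds_status <- DESeqDataSetFromMatrix(countData = counts(dds), colData = cd,
                                     design = ~ status)

# Comparisons (1)-(3): one model with a single 3-level factor
dds <- DESeq(dds, quiet = TRUE)
res_inf1_vs_ctrl <- results(dds, contrast = c("condition", "infected1", "control"))
res_inf2_vs_ctrl <- results(dds, contrast = c("condition", "infected2", "control"))
res_inf2_vs_inf1 <- results(dds, contrast = c("condition", "infected2", "infected1"))

dds_status <- DESeq(dds_status, quiet = TRUE)
res_inf_vs_ctrl <- results(dds_status, contrast = c("status", "infected", "control"))
```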
@mikevandewege3007 Thanks a lot for your quick reply!!!
very useful. Thanks!!
THANK YOU!
Hi,
Thank you so much for this, finally a good explanation that works:)
I have one question, please: I am more interested in the differentially expressed genes in the test group, not the control, i.e. I am trying to get "condition_15psi_vs_water". I tried to change the sample order in the sample sheet, but it did not work.
The sample order has to match exactly between the dataset and the column data sheet. In reality, though, the direction doesn't matter: I just talk about control vs test, but the interpretation is the same if it's test vs control; the log2 fold changes just flip sign.
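If you do want fold changes reported as 15psi vs water, reordering samples will not do it: DESeq2 uses the first level of the factor as the reference, and factor levels are sorted alphabetically by default, so "15psi" ends up as the reference. A minimal sketch, with simulated data relabelled as water and 15psi (an assumption standing in for the tutorial's counts), that sets the reference level explicitly:

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)   # simulated stand-in
# Relabel the two simulated groups like the tutorial's conditions
dds$condition <- factor(ifelse(dds$condition == "A", "water", "15psi"))
levels(dds$condition)   # "15psi" sorts first, so it would be the default reference

# Make water the reference level before running DESeq()
dds$condition <- relevel(dds$condition, ref = "water")
dds <- DESeq(dds, quiet = TRUE)

# Explicit form: numerator first, reference (denominator) second
res <- results(dds, contrast = c("condition", "15psi", "water"))
mcols(res)$description[2]   # describes the fold change as 15psi vs water
```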
> dds_new
The number of samples in your info (coldata) and cts files is not the same; it has to be equal. Maybe you forgot to use your first data frame column as row names. He set the gene column as row names, so his sample count became 6 in both files.
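A quick base-R check before calling DESeqDataSetFromMatrix(): the count matrix needs one column per coldata row, and the columns must be in the same order as the rows (DESeq2 does not reorder them for you). The sample names and counts below are hypothetical.

```r
# Hypothetical 2-gene count matrix whose sample columns are out of order
cts <- matrix(c(5, 10, 3, 12, 8, 9, 2, 11), nrow = 2,
              dimnames = list(c("g1", "g2"), c("s2", "s1", "s4", "s3")))
coldata <- data.frame(condition = c("water", "water", "15psi", "15psi"),
                      row.names = c("s1", "s2", "s3", "s4"))

ncol(cts) == nrow(coldata)                 # same number of samples?
all(colnames(cts) %in% rownames(coldata))  # same sample names?
all(colnames(cts) == rownames(coldata))    # same order?

# Reorder the count columns to follow the coldata rows
cts <- cts[, rownames(coldata)]
stopifnot(identical(colnames(cts), rownames(coldata)))
```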