DESeq2 workflow tutorial | Differential Gene Expression Analysis | Bioinformatics 101

  • Published 28 Aug 2024

COMMENTS • 228

  • @animatedbiologywitharpan · 2 years ago · +37

    For an Indian PhD student like me (who is not familiar with bioinformatics), this channel is a blessing. I will share it with my batchmates. Very nice YouTube channel. Keep it up.

    • @Bioinformagician · 2 years ago

      Thank you Arpan, I am glad you find my videos helpful! :)

    • @AbdullahSharabati · 1 year ago · +3

      So, even Indians need tutorials?
      Sorry, it's just a bad joke; please don't mind me. All respect. :)

    • @animatedbiologywitharpan · 1 year ago · +3

      @@AbdullahSharabati Yes, why not? We all need help. I actually learn from her channel quite frequently. As Indians, we have developed a culture of peer learning.

    • @AbdullahSharabati · 1 year ago

      @@animatedbiologywitharpan I totally understand you and know what you mean. I was just kidding, really. Sorry.

  • @AyrodsGamgam · 1 year ago · +7

    Wow, you made it feel like a promenade in the park on a nice spring day. Thanks. Please never stop making these videos; you are a true prophet!

  • @andreaseriksson4578 · 1 year ago · +6

    Thank you for an excellent and pedagogical video on how to use DESeq2!
    I had some initial issues, as I had to replace the row numbers with my gene symbols (which were in their own column). But once I had that figured out for both files (counts and colData), everything worked smoothly.
    If someone has the same issue, use this script (for colData; similar process for counts):
    DF

  • @user-tl4yl7hq1d · 3 months ago

    I very much appreciate that you gave a very clear and concise explanation of the DESeq2 workflow. I've learned a lot from it.

  • @devinjones7271 · 1 year ago · +2

    SO HELPFUL!!!! I wish I had known about this channel during my PhD...

  • @asshimul1168 · 2 years ago · +7

    That's excellent magic indeed. You did it perfectly. Would you please create a follow-up series on the same data showing how to compare up- and down-regulated genes between the treated and untreated groups, using box plots or something similar? It would be helpful for a newbie like me.

    • @Bioinformagician · 2 years ago · +4

      I will surely consider making a video covering downstream processing and visualization of these DEGs :)
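A minimal base-R sketch of splitting results into up- and down-regulated genes. It assumes the DESeq2 results were converted with `as.data.frame()`, so they carry the `log2FoldChange` and `padj` columns; the toy values and the 0.05 and 1 cutoffs are illustrative assumptions, not thresholds from the video. If `untreated` is the reference level, a positive log2 fold change means higher expression in the treated group.

```r
# Toy stand-in for as.data.frame(res); gene names and values are made up
res_df <- data.frame(
  log2FoldChange = c(2.1, -1.8, 0.3, 1.5, -0.2),
  padj           = c(0.001, 0.01, 0.20, NA, 0.04),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

padj_cutoff <- 0.05  # assumed significance threshold
lfc_cutoff  <- 1     # assumed |log2 fold change| threshold

# padj can be NA (e.g. genes removed by independent filtering), so drop NAs first
sig <- res_df[!is.na(res_df$padj) & res_df$padj < padj_cutoff, ]

up   <- sig[sig$log2FoldChange >=  lfc_cutoff, ]
down <- sig[sig$log2FoldChange <= -lfc_cutoff, ]

rownames(up)    # "geneA"
rownames(down)  # "geneB"
```

geneE is significant but its fold change is too small, so it lands in neither list; tightening or loosening `lfc_cutoff` moves genes like it in or out.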

  • @catherinewaaaang · 4 months ago

    This is my first time using DESeq2, and your explanations and demonstrations in this video were amazing! tysm

  • @Spirrie2002 · 3 months ago

    Your tutorials are some of the best on YouTube for sure! Well done and thank you so much!

  • @mayconmarcao4554 · 2 years ago · +1

    Beyond your excellent content, you nailed your channel name choice (bioinformagician) 🤣😂😂. Thank you!

  • @user-vk8bd1re8c · 3 months ago

    I'm currently working on a project, writing an application in C++ that reads dysfunctional expression from genotype data for cell repair. This video is very professional and is helping me understand the dataset from an IT perspective. Thank you.

  • @MGRVE · 5 months ago

    Great tutorial. One comment: reducing the size of the input (pre-filtering low-count genes) is done not primarily to reduce the computational burden, but to lessen the impact of the multiple testing correction.
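To see why fewer tests help, here is a small base-R illustration using `p.adjust()` with the Benjamini-Hochberg method (the adjustment DESeq2's `results()` applies by default). The p-values are invented: the same raw p-value of 0.001 is adjusted once among 10 tests and once among 100 tests, where the extra tests are uninformative. This shows the general effect only; it is not DESeq2's own independent-filtering procedure.

```r
# One "real" signal plus uninformative tests (p = 0.5); values are invented
p_small <- c(0.001, rep(0.5, 9))    # 10 tests in total
p_large <- c(0.001, rep(0.5, 99))   # 100 tests in total

padj_small <- p.adjust(p_small, method = "BH")
padj_large <- p.adjust(p_large, method = "BH")

padj_small[1]  # 0.01: significant at 0.05
padj_large[1]  # 0.1:  no longer significant at 0.05
```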

  • @user-mg4vj7yo6v · 5 months ago

    Your videos make every step so easy to understand!

  • @momilan · 5 months ago

    Thank you!!!! -From Las Vegas, Nevada

  • @niharikasingh7677 · 1 year ago

    Your channel is extremely helpful to me and has been a real saviour for gaining a fundamental understanding of my projects. I am working with gene knockout vs control conditions and will be using your pipeline for the further analysis. Thanks again, and keep up the amazing work!! 💛💛

    • @niharikasingh7677 · 1 year ago

      Hi again! I tried to use this method, but I'm facing a small error on my end. The gene IDs are in a separate column, so the number of columns in my counts does not equal the number of rows in my sample table. How did you ensure that the gene IDs don't get counted as a separate column?

    • @aymsagagi · 1 year ago

      I am having the same problem!
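One way to keep gene IDs out of the sample columns, sketched in base R: tell `read.csv()` that the first column holds the row names (`row.names = 1`), or move an existing ID column into the row names after reading. The inline CSV text, the column name `gene_id` and the sample names are hypothetical stand-ins for a real counts file.

```r
# Hypothetical counts file: first column = gene IDs, remaining columns = samples
csv_text <- "gene_id,s1,s2,s3\nENSG0001,10,12,8\nENSG0002,0,3,1"

# Option 1: use the first column as row names while reading
counts1 <- read.csv(text = csv_text, row.names = 1)

# Option 2: read normally, then move the ID column into the row names
counts2 <- read.csv(text = csv_text)
rownames(counts2) <- counts2$gene_id
counts2$gene_id <- NULL

dim(counts1)  # 2 3: the ID column is no longer counted as a sample
```

With a real file, the same `row.names = 1` argument goes into `read.csv("your_file.csv", row.names = 1)`.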

  • @CynthiaFrancis-sv4rc · 1 month ago

    This was great! Thank you.

  • @mamosangcala6499 · 1 year ago

    This was super helpful and easy to follow, thank you sooo much 🙌🙏💓. You are a star

  • @andrenicolau3824 · 2 years ago

    Congratulations on your channel. I'm subscribing because of this video and your clear explanation...

  • @umarsheikh1992 · 2 years ago · +1

    Thank you for the tutorial; it was highly helpful and informative.

  • @samuelyeo5450 · 2 years ago

    Thanks for your tutorial! It was clear, concise and extremely helpful.

  • @RaquelAjalik · 1 year ago

    Absolutely amazing! Thank you so much! You are so gifted.

  • @audebenigneikuzwe4531 · 1 year ago

    Thank youuuuu, you just saved my life, literally.

  • @RicardoRodriguez-yu8ss · 1 year ago

    I have learned a lot from your videos! you are the best :)

  • @user-db2os6sr8s · 2 years ago · +1

    Medical student who was struggling with this! You're so kind and helpful, thx!!
    I'm also curious about how to export the DESeq2 results to a CSV or Excel file to check which genes are in the upper/lower right quadrant of the MA plot.

    • @Bioinformagician · 2 years ago · +7

      This is how you can export your results:
      write.csv(as.data.frame(res),   # res is the object returned by results(dds)
                file = "results.csv")
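A hedged extension of the export above: sort by adjusted p-value and write only significant genes, so the exported table is easier to browse. A toy data frame stands in for `as.data.frame(res)`, the `padj < 0.05` cutoff is an assumption, and `tempfile()` is used only so the example does not write into the working directory.

```r
# Toy stand-in for as.data.frame(res); values are made up
res_df <- data.frame(
  baseMean       = c(500, 20, 1500, 80),
  log2FoldChange = c(1.2, -2.0, 0.1, 3.0),
  padj           = c(0.03, 0.001, 0.9, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Order by padj (NAs go last), then keep rows with padj < 0.05
res_ordered <- res_df[order(res_df$padj), ]
sig <- res_ordered[!is.na(res_ordered$padj) & res_ordered$padj < 0.05, ]

out_file <- tempfile(fileext = ".csv")
write.csv(sig, file = out_file)   # row names (gene IDs) become the first column

back <- read.csv(out_file, row.names = 1)
rownames(back)  # "geneB" "geneA"
```

In DESeq2's MA plot the x-axis is the mean of normalized counts (`baseMean`) and the y-axis is `log2FoldChange`, so genes in the upper or lower right can be picked out by filtering on those two columns of the same table.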

  • @sachithrak.yaddehige6251 · 1 year ago

    It was very helpful and clear. Thank You

  • @jgitau001 · 1 year ago

    Very well explained; thank you for preparing this video...

  • @Sadin15 · 1 year ago

    Thank you so much! This was incredibly helpful.

  • @abhisheksawalkar1018 · 1 year ago

    Very well explained. Thanks

  • @naveedkhan-fi6ux · 1 year ago

    It was very easy to follow and informative... but I really wish you could also work on the rice genome.

  • @adeyemioluwaseun334 · 10 months ago

    Easy-to-understand video. Thank you.

  • @kobrarahimi9164 · 2 years ago

    Well done!
    Waiting for more videos.

  • @bobyang8491 · 2 years ago

    Thanks a lot for making this video! This is very, very helpful!!!

  • @deepshikharathore4182 · 5 months ago

    Kindly share a video on how to perform differential expression analysis of transcriptome data from the TCGA database.

  • @soniabachamp347 · 1 year ago

    Thank you so much for preparing this video for us. It was extremely useful! I will definitely subscribe to your channel!

  • @marioperez8302 · 1 year ago

    Your videos are wonderful! Would you consider expanding on the use of contrasts, perhaps with a demonstration on a dataset with three or more conditions, exploring the results? Thank you for considering it, and keep up the great work you are doing!

    • @Bioinformagician · 1 year ago · +1

      I will surely consider making a video on contrasts. Thanks for the suggestion.

  • @awesomemiso · 2 years ago

    Thank you, awesome explanation, I am now a subscriber :)

  • @JulioSSierraCamarena · 2 years ago

    This is so nicely explained, thanks for your videos :3

  • @stretch8390 · 2 years ago · +3

    Would you be willing to do a video on more complicated design setups for DESeq?

    • @Bioinformagician · 2 years ago · +3

      Thank you for the suggestion. I will surely consider making a video covering this topic :)

  • @furkankurtoglu_sys_bio · 2 years ago

    Thank you very much! Such a great video!

  • @Sadin15 · 10 days ago

    Question: in the DESeq2 summary, what constitutes "low counts", and what does "mean count < 6" mean?

  • @johnbaker3296 · 1 year ago · +3

    Hey, great stuff! I was wondering: if you wanted to compare treated vs untreated but per cell line, would you put something in the design when creating your DESeq object (like design = ~ cell_line + condition), or is this extracted using contrasts, or both?

  • @maisie2735 · 3 months ago

    Thank you so much

  • @ethanvouzas6255 · 5 months ago

    Supremely useful!!!

  • @freezingtolerance7493 · 1 year ago · +1

    Hello. I have a quick question about normalization. Since DESeq2 has its own normalization algorithm, do I not need any further normalization such as FPKM? Or should I first normalize my read count data before running DESeq2?

  • @ahmadnajem9762 · 5 months ago

    Thank you so much for your videos. I would like to learn the pipeline for RNA-seq data analysis, and I am wondering if there is a recommended order in which to watch the videos. Thank you so much

  • @shamimashrafiyan8591 · 21 days ago

    Thanks for the video. If I want to skip the "estimate size factors" step, how can I do it? My data are already DESeq-normalized expected counts.

  • @user-bn6hv7hq9i · 10 months ago

    So wonderful!!!! Thanks a lot!

  • @kjeyaprakash2638 · 3 months ago

    Sorry if my question is wrong. You ran DESeq2 on raw gene counts. Isn't it necessary to convert these IDs to gene names and normalize to FPKM or TPM for further analysis?

  • @jahanshanzida7697 · 2 years ago

    You did a great job

  • @alexyang274 · 2 years ago

    Absolutely great videos

  • @amus21455 · 1 year ago

    Really helpful! Thank you so much! But I would love an explanation of each command, like what "~" and "," do in the commands. Thankssssssss
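A short base-R illustration of the two symbols asked about: `~` builds a formula object (DESeq2 reads the design from it), and the comma inside `[ , ]` separates the row selection from the column selection. The small matrix and the sample names are made up for the example.

```r
design <- ~ dexamethasone
class(design)     # "formula"
all.vars(design)  # "dexamethasone": the colData column the model uses

counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))

counts["gene1", ]        # row gene1, all columns (empty slot after the comma)
counts[, "s2"]           # all rows, column s2 (empty slot before the comma)
counts[, c("s3", "s1")]  # all rows, columns picked and reordered by name
```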

  • @shilpisehgal5613 · 2 years ago

    Keep up the good work.

  • @tushardhyani3931 · 2 years ago

    Thank you for this !!

  • @fizzahzulfiqar2884 · 11 months ago

    @Bioinformagician Very helpful. Can you please make videos on functional annotation of RNA-seq data? It would be very helpful.

  • @coachjohnhaynes5366 · 1 year ago

    Thanks for the videos

  • @rikstoyandactivityzone5669 · 1 year ago

    Thank you for the video

  • @manavgandhi2503 · 2 years ago · +2

    Hello,
    I really loved the walkthrough of DESeq2. Definitely going to follow your channel. I have a question. I have a counts matrix with the following groups: control, treatment 1, treatment 2 and treatment 3. First, I need to compare each treatment against control as the reference, and then compare the treatments with each other. When setting the factor levels, if I use control as the reference, will it draw the comparisons control vs treatment 1, control vs treatment 2 and control vs treatment 3? And I believe I can use contrasts to get the comparisons between treatments? Thank you!

    • @Bioinformagician · 2 years ago · +1

      I am glad my video has been helpful! Regarding your question, yes you can use contrast to make a comparison between treatments.

    • @manavgandhi2503 · 2 years ago

      @@Bioinformagician Thank you. Could you also confirm if setting the reference as control for factors would draw the same comparisons that I wrote?

    • @Bioinformagician · 2 years ago · +1

      @@manavgandhi2503 Yes, setting control as the reference will allow you to make comparisons between control and the treatment levels. The only difference is that the order appears reversed, i.e. "condition_treatment1_vs_control", which gives you the genes up/down-regulated in treatment 1 compared to control.

    • @manavgandhi2503 · 2 years ago

      @@Bioinformagician Got it. Thank you!
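A base-R sketch of what setting the reference level does, using a hypothetical `condition` vector with the four groups from the question. Converting to a factor yourself also avoids DESeq2's "characters, converting to factors" message. `model.matrix()` lists the coefficients a model with this design would estimate: an intercept for control plus one treatment-vs-control term per treatment. The DESeq2 call in the last comment is not run here.

```r
condition <- c("treatment1", "control", "treatment2", "treatment3",
               "control", "treatment1", "treatment2", "treatment3")

# Convert to a factor explicitly and make "control" the reference level
condition <- relevel(factor(condition), ref = "control")
levels(condition)  # "control" "treatment1" "treatment2" "treatment3"

# Intercept (= control) plus one treatment-vs-control coefficient per treatment
colnames(model.matrix(~ condition))
# "(Intercept)" "conditiontreatment1" "conditiontreatment2" "conditiontreatment3"

# With DESeq2 (not run here), a treatment-vs-treatment comparison would be
# results(dds, contrast = c("condition", "treatment2", "treatment1"))
```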

  • @stanyang4321 · 1 year ago

    Very well explained. Can you show how to plot a heatmap for the data you analyzed in this video?

  • @Ice84letters · 3 months ago

    Excellent videos. What does the following message mean when creating the DESeq2 dataset? "In DESeqDataSet(se, design = design, ignoreRank) :
    some variables in design formula are characters, converting to factors" Thank you very much

  • @jenniferalexandrasolano-go1997 · 7 months ago

    Thanks!

  • @sharincarin5977 · 2 years ago · +1

    I tried this protocol on a series matrix file with log2-normalized values (all decimals, downloaded from GEO).
    I received the error:
    _Error in DESeqDataSet(se, design = design, ignoreRank) :_
    _some values in assay are not integers_
    Does this mean this package cannot be used with normalized decimal values?
    If so, which package is more suitable?
    A link to any relevant protocol would be appreciated.
    Thanks

    • @Bioinformagician · 2 years ago

      You cannot run DESeq2 on log2 normalized counts. You need raw counts to run this analysis.
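Before building the dataset, a quick base-R check can tell whether a matrix looks like raw counts. The helper name `looks_like_raw_counts()` is hypothetical; it flags non-integer values (as in log2-normalized data, which produces the "not integers" error) and negative values (which produce the "some values in assay are negative" error). Rounding normalized or log-scale values to pass this check would not turn them into valid counts; the raw counts are still needed.

```r
# Hypothetical helper: TRUE only if every value is a finite, non-negative whole number
looks_like_raw_counts <- function(m) {
  m <- as.matrix(m)
  is.numeric(m) && all(is.finite(m)) && all(m >= 0) && all(m == round(m))
}

raw   <- matrix(c(0, 5, 12, 3), nrow = 2)
log2n <- log2(raw + 1)                    # e.g. log2-normalized values from GEO
neg   <- matrix(c(1, -2, 3, 4), nrow = 2)

looks_like_raw_counts(raw)    # TRUE
looks_like_raw_counts(log2n)  # FALSE: decimals
looks_like_raw_counts(neg)    # FALSE: negative value
```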

  • @ahmedal-mammari9639 · 2 years ago · +1

    Thank you so much for these very helpful videos. Can you please explain why you didn't compute CPM, TPM, RPK or RPKM before running DESeq2?

    • @Bioinformagician · 2 years ago

      DESeq2 requires raw un-normalized read counts as it performs its own set of normalization steps. CPM, TPM, RPKM are all normalization methods that DESeq2 does not use.
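To show what DESeq2's own normalization does with raw counts, here is a minimal base-R sketch of the median-of-ratios size factors that `estimateSizeFactors()` uses by default: take each gene's geometric mean across samples as a pseudo-reference, divide each sample's counts by it, and use the median ratio per sample as that sample's size factor. It is a simplified illustration (genes with a zero in any sample are simply dropped), not a replacement for the DESeq2 function, and the toy counts are invented.

```r
counts <- matrix(c(10, 20, 30,
                   20, 40, 60,
                    5, 10, 15,
                    0,  3,  1),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

median_of_ratios <- function(counts) {
  log_counts    <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # log geometric mean per gene
  usable        <- is.finite(log_geo_means)  # drops genes with any zero count
  # per sample: median log-ratio to the pseudo-reference, back-transformed
  apply(log_counts[usable, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[usable])))
}

sf <- median_of_ratios(counts)
normalized <- sweep(counts, 2, sf, "/")  # divide each column by its size factor
```

Here s2 and s3 were sequenced two and three times as deeply as s1, so their size factors are two and three times larger, and after division each gene has the same normalized value in every sample. Because DESeq2 estimates these factors from raw counts itself, pre-normalized values such as CPM, TPM or RPKM do not fit its count model.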

  • @sanjaisrao484 · 1 year ago · +1

    Ma'am, help.
    Which is better, ma'am: Ballgown or DESeq2?

  • @julioavazquezm6294 · 1 year ago

    Hello there, thank you so much for all the amazing tutorials. Quick question: if I am trying to analyze a normalized counts matrix (DESeq2 median of ratios), do I need to run DEGA? Or apply log2 to that counts matrix? Thank you so much for your help; I'm a naive bioinformatician.

  • @aaaa5nfgfghf · 2 years ago · +1

    How can I start learning bioinformatics from scratch? What are the major skills required to become an expert in bioinformatics?

    • @Bioinformagician · 2 years ago

      I think the best way to introduce yourself to bioinformatics is to take an online course. There are a lot of starter courses offered on platforms like Udemy or Coursera. There are various online bioinformatics workshops available as well - www.ecseq.com/workshops/workshop_2022-03-A-Practical-Introduction-to-NGS-Data-Analysis-Online-Course.html
      Skills required to be an expert? I don't know either, I am figuring it out too. lol

  • @grsbiosciences · 2 years ago · +1

    What are technical replicates and biological replicates, madam?

    • @Bioinformagician · 2 years ago

      I have explained this in one of my previous videos - ua-cam.com/video/S1PcT5rp8c4/v-deo.html

  • @liviagozzellino7266 · 2 years ago · +1

    Hi, thank you for your explanation!!! Very useful video :) I only have one question: once I got the results, how do I select the most differentially expressed genes? Let's say I wanna view only the top 40 genes.

    • @Bioinformagician · 2 years ago

      You can sort your results by largest log fold changes and lowest adjusted p-values. The first 40 genes on your list are the ones which are most differentially expressed. NOTE: log fold changes can be positive or negative, if you sort log fold changes descending, you will only get top genes with largest positive fold changes. If you wish to get top differentially expressed genes irrespective of the direction, you can sort by taking absolute log fold change values.
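The sorting described above can be sketched in base R on a toy stand-in for `as.data.frame(res)`. `n_top` is 2 here instead of 40 only to keep the example small, and the `padj < 0.05` filter is an assumed cutoff. Ordering on `abs(log2FoldChange)` ranks genes regardless of direction, and filtering on `padj` first keeps large but non-significant fold changes out of the list.

```r
res_df <- data.frame(
  log2FoldChange = c(3.2, -4.1, 0.5, -1.0, 5.0),
  padj           = c(0.001, 0.0001, 0.02, 0.01, 0.30),
  row.names      = paste0("gene", 1:5)
)
n_top <- 2  # e.g. 40 for a top-40 list

sig <- res_df[!is.na(res_df$padj) & res_df$padj < 0.05, ]  # assumed cutoff
by_abs_lfc <- sig[order(abs(sig$log2FoldChange), decreasing = TRUE), ]
top_genes <- head(rownames(by_abs_lfc), n_top)
top_genes  # "gene2" "gene1"
```

gene5 has the largest fold change but is excluded because its adjusted p-value is 0.30.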

  • @manuelsokolov · 1 year ago

    Hi! I have one question: if the data are given in TPM format, can you still apply DESeq2? Does it only work with raw data? Thank you!

  • @humerainayat2858 · 2 years ago · +1

    I am getting this error when I am trying to create dds
    Error: unexpected '=' in:
    "dds

  • @smritimohanty3483 · 1 year ago

    Hey. Thanks for the video.
    Can you just let me know whether you used one dataset or two different types of data?

  • @toanphanvan9739 · 2 years ago · +1

    For the "set factor level" step, what should I do if I have 3 levels and want to compare gene expression among all three?

    • @Bioinformagician · 2 years ago

      You could do pairwise comparisons first and then take an intersection between them.

  • @ayaqz3144 · 2 months ago

    thanks

  • @jkim9931 · 2 years ago · +1

    I enjoy your videos! I have one question about your design. It seems like there are two categorical variables - cellLine and dexamethasone in the colData table. Is there some reason you didn't include the cellLine variable in the design matrix?

    • @Bioinformagician · 2 years ago

      You are right, I could have used both. However, I wanted to keep this tutorial simple and explain how DESeq2 works with one design factor (i.e. dexamethasone), so my goal for this analysis was to study the effect of treatment on gene expression. I could have used a more complex design like ~ cellLine + dexamethasone if I wanted to test for the effect of dexamethasone while controlling for the effect of cellLine.
      But in this case, to keep it intuitive, I chose to demonstrate with one factor. Hope that answers your question. Thank you! :)

    • @jkim9931 · 2 years ago

      @@Bioinformagician Thanks for the explanation. I was thinking about that. This is a tutorial video so it doesn't have to be more difficult. I think forming a design matrix is involved in linear models which is another topic to explain. Thanks!

    • @Bioinformagician · 2 years ago · +1

      @@jkim9931 That is true, I have tried to explain linear models in my previous video (ua-cam.com/video/0b24mpzM_5M/v-deo.html). But yes, it can be a whole separate video in itself!
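To make the two-factor design concrete, this base-R sketch builds the design matrix for `~ cellLine + dexamethasone` with `model.matrix()`. The eight-sample table is a made-up layout (four hypothetical cell lines A to D, each treated and untreated) that reuses the column names from the discussion; it is not the actual sample sheet from the video. Putting `dexamethasone` last follows the DESeq2 convention that the variable of interest comes last in the formula.

```r
colData <- data.frame(
  cellLine      = rep(c("A", "B", "C", "D"), each = 2),
  dexamethasone = rep(c("untreated", "treated"), times = 4)
)
colData$dexamethasone <- relevel(factor(colData$dexamethasone), ref = "untreated")

mm <- model.matrix(~ cellLine + dexamethasone, data = colData)
colnames(mm)
# "(Intercept)" "cellLineB" "cellLineC" "cellLineD" "dexamethasonetreated"
dim(mm)  # 8 samples x 5 coefficients
```

Each cell line gets its own baseline offset (relative to cell line A), and a single `dexamethasonetreated` coefficient captures the treatment effect after accounting for those baselines.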

  • @bzaruk · 2 years ago · +1

    How would you do a differential expression analysis across multiple cell lines? Do them in pairs and then find the shared, highly differentially expressed genes? Or is there a way of doing it in one analysis?

    • @aliciagarciaalonso6930 · 10 months ago

      Same issue!! I guess one may be able to do these comparisons in one go using the 'contrast' parameter of the results function? But I haven't checked it out...

  • @vetlove4056 · 1 month ago

    How did you turn those gene IDs into the row names?? Please guide me.

  • @fabioseiva-uenp9155 · 2 years ago · +1

    Congrats on your videos! They are really, really useful and well explained. Just one question, maybe you can help me: do you know how I can find, using R, the gene names or symbols from the ENSG IDs?

    • @Bioinformagician · 2 years ago

      I am in the process of making a video on it. Please stay tuned! :)

    • @fabioseiva-uenp9155 · 2 years ago

      @@Bioinformagician Thank you for your prompt reply. You can bet I'll stay tuned. Also, could you tell me if it is possible to get the gene names from the ASHG numbers?

    • @Bioinformagician · 2 years ago

      @@fabioseiva-uenp9155 I haven't dealt with ASHG numbers, can you tell me what database are they associated with?

    • @fabioseiva-uenp9155 · 2 years ago

      @@Bioinformagician OK, I am learning to use datasets extracted from GEO, so maybe I am asking the wrong question. The dataset I am referring to is GSE55191. After extracting the data, the IDs I have are based on ASHG. Sorry for not being more specific, but if you could take a look and answer me, I would be extremely grateful.

    • @Bioinformagician · 2 years ago

      @@fabioseiva-uenp9155 Found the mappings - www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL24530

  • @user-ub9qm5ip3c · 1 year ago

    Hi. Thanks for the video. How do I view all the genes (expressed and unexpressed), please?

  • @joseantonioduarteconde8743 · 2 years ago · +1

    Thanks for your useful video! I am a beginner and I have some problems. When I load the airway package (it loads fine), I don't get the .csv files in my folder. It looks like nothing happens. Is there another way to get them?

    • @Bioinformagician · 2 years ago

      Can you give me the exact commands you ran to get the data?

  • @munibabashir6951 · 2 years ago · +1

    Informative video. Thanks.
    I have a query regarding data analysis, if you could please help me with it. I have a dataset for two tumors that I downloaded from a cancer data portal, so now I have gene expression data and clinical data for both tumors. I want to compare the gene expression of both tumors, but I don't know where I should start. How can I compare these tumors using DESeq2? Please guide me. Thank you

    • @Bioinformagician · 2 years ago

      A couple of questions -
      1. What data have you downloaded - RNA-Seq reads or quantified expression values?
      2. What is the format of the data - are these raw counts or normalized expression values?

  • @ashasanu1988 · 2 years ago · +1

    Hello Madam, thanks for your video. I have 12 samples: 3 controls, 3 with one treatment, 3 with another treatment and 3 with a third treatment. When there are 4 conditions like this, how do I run DESeq2 on them?

    • @Bioinformagician · 2 years ago

      Did you try following my video or the DESeq2 vignette? Could you tell me what you tried and what didn't work?

  • @alvaroruiztabas5627 · 2 years ago · +1

    Congratulations on the video. Very helpful. I am having problems installing DESeq2: R tells me that the path is not writable. Any help?
    Thanks

    • @Bioinformagician · 2 years ago

      Can you paste the exact error?

    • @alvaroruiztabas5627 · 2 years ago

      @@Bioinformagician Of course. When I run BiocManager::install("DESeq2"), at the end of the run the console shows "There were 16 warnings (use warnings() to see them)", and when I run warnings(), R shows:
      1: In install.packages(...) :
      installation of package ‘png’ had non-zero exit status
      2: In install.packages(...) :
      installation of package ‘curl’ had non-zero exit status
      3: In install.packages(...) :
      installation of package ‘openssl’ had non-zero exit status
      4: In install.packages(...) :
      installation of package ‘RCurl’ had non-zero exit status
      5: In install.packages(...) :
      installation of package ‘RcppArmadillo’ had non-zero exit status
      6: In install.packages(...) :
      installation of package ‘httr’ had non-zero exit status
      7: In install.packages(...) :
      installation of package ‘GenomeInfoDb’ had non-zero exit status
      8: In install.packages(...) :
      installation of package ‘Biostrings’ had non-zero exit status
      9: In install.packages(...) :
      installation of package ‘GenomicRanges’ had non-zero exit status
      10: In install.packages(...) :
      installation of package ‘KEGGREST’ had non-zero exit status
      11: In install.packages(...) :
      installation of package ‘SummarizedExperiment’ had non-zero exit status
      12: In install.packages(...) :
      installation of package ‘AnnotationDbi’ had non-zero exit status
      13: In install.packages(...) :
      installation of package ‘annotate’ had non-zero exit status
      14: In install.packages(...) :
      installation of package ‘genefilter’ had non-zero exit status
      15: In install.packages(...) :
      installation of package ‘geneplotter’ had non-zero exit status
      16: In install.packages(...) :
      installation of package ‘DESeq2’ had non-zero exit status
      I don't really know how to fix it. Thanks!!

    • @anamikapandey4769 · 2 years ago · +1

      @@alvaroruiztabas5627 Install all the dependencies one by one, and your problem will be resolved.

  • @ayeshatariq8774 · 10 months ago

    Hi, thank you for this amazing video. I am currently doing a gene expression analysis. Even though I have the same row and column names in my counts and colData, I am still getting FALSE for
    all(colnames(Counts) %in% rownames(Coldata))
    Can you please help with that?

  • @saswatsatapathy658 · 1 year ago

    I get stuck at 7:44 when we put "design = ~ dexamethasone": it shows the error "some values in assay are negative"!! Can someone help here?

  • @prabirbarman877 · 1 year ago

    Error in `[.data.frame`(countData, , rownames(colData)) :
    undefined columns selected

  • @mithunrock5427 · 2 years ago · +1

    Great video! Can I know what your qualifications are and what you do?

    • @Bioinformagician · 2 years ago

      I have talked about my qualifications and what I do in one of my videos - ua-cam.com/video/yd8L7cPjI1Y/v-deo.html

    • @mithunrock5427 · 2 years ago

      @@Bioinformagician Hi, I'm Mithun. I did my undergraduate degree in bioinformatics and am currently pursuing my postgraduate degree in bioinformatics at Reva University. I am interested in doing my PhD in the US, so I think you could guide me. Can I have your email ID or Instagram ID so I can contact you about this?

    • @Bioinformagician · 2 years ago

      @@mithunrock5427 Hi Mithun, you can reach out to me with your questions on LinkedIn/email, you can find the details to contact me in the description of every video.

  • @juliangrandvallet5359 · 1 year ago

    Thank you! How can I now plot the heatmap? Something like > heatmap(as.matrix(res))??

  • @mattmarino3460 · 2 years ago · +1

    What can you do if the column names in the counts data do not match the row names of the colData? I had to create my own sample info file; the names are identical, and yet it says they do not match. I have even created an entirely new data frame using the
    for_matching_df

    • @KiwiAteMonkey · 2 years ago · +1

      Hello, I have the same problem. Were you able to solve it for your data?

    • @mattmarino3460 · 2 years ago

      @@KiwiAteMonkey Use the colnames() function on your counts data. Copy those names one by one into an Excel file, add a column next to them with each sample's group (aka "treated" vs "untreated"), and save it as .txt or .csv. Then read it into R using read.delim(data, header = TRUE, sep = ",", row.names = 1, stringsAsFactors = FALSE); that should fix it. Then you can check that the row names of the sample info match the column names of the data using all(colnames(data) %in% rownames(sampleinfo))

    • @Bioinformagician · 2 years ago

      Convert column col1 in for_matching_df into row names, and check whether all(rownames(for_matching_df) == colnames(counts)) is TRUE.
      If it is not TRUE, then run this:
      counts

    • @Bioinformagician · 2 years ago

      In general, there are two things you will check:
      1. Are all row names in colData present as column names in counts?
      all(rownames(colData) %in% colnames(counts)) # should be TRUE
      2. Are the row names in colData in the same order as the column names in counts?
      all(rownames(colData) == colnames(counts)) # should also be TRUE
      If check 2 is FALSE, the column order can be changed by running this:
      counts

    • @hongyu9455 · 2 years ago

      @@Bioinformagician Thanks for your tutorial. I have a similar issue with matching column and row names. I copied the command from your tutorial and got the following error message: all(colnames(counts_data) %in% rownames(colData))
      Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'x' in selecting a method for function '%in%': error in evaluating the argument 'x' in selecting a method for function 'colnames': object 'counts_data' not found
      Many thanks for your help!
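The two checks described above, plus a reorder step, can be sketched in base R with made-up sample names. Indexing the count columns by `rownames(colData)` is one way (an assumption here, not a reconstruction of the truncated command above) to put them in the same order as the colData rows. The last lines show one possible cause of "identical" names that still do not match: `read.csv()` makes column names syntactically valid by default (`check.names = TRUE`), so a name like `sample-1` becomes `sample.1`.

```r
counts  <- matrix(1:6, nrow = 2,
                  dimnames = list(c("gene1", "gene2"), c("s2", "s3", "s1")))
colData <- data.frame(condition = c("untreated", "treated", "treated"),
                      row.names = c("s1", "s2", "s3"))

all(rownames(colData) %in% colnames(counts))  # TRUE: same sample names
all(rownames(colData) == colnames(counts))    # FALSE: different order

# Reorder the count columns to follow the row order of colData
counts <- counts[, rownames(colData)]
all(rownames(colData) == colnames(counts))    # TRUE
ncol(counts) == nrow(colData)                 # TRUE: same number of samples

# read.csv() renames "sample-1" to "sample.1" unless check.names = FALSE
hdr <- "gene,sample-1\ng1,5"
colnames(read.csv(text = hdr, row.names = 1))                       # "sample.1"
colnames(read.csv(text = hdr, row.names = 1, check.names = FALSE))  # "sample-1"
```

An "object not found" error simply means no object with that name exists in the current R session yet, so the name used in the check has to match the name the counts were loaded into.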

  • @poulamigoswami8008 · 1 year ago

    When we are working de novo, how do we make the counts and sample info files? Can you suggest any tools for making those files?

  • @justsoil15 · 1 year ago

    When I use my data, I get the error "more columns than column names". I checked your files, and they are in the same format as mine. Why can you read the file without an error?

  • @excelobiageli9446 · 2 years ago · +1

    Nice video! It really helped, but I have 3 sample groups or levels. I have done pairwise comparisons between the levels, but I don't know how to get the final results.

    • @Bioinformagician · 2 years ago

      You can just intersect the DE genes from those comparisons.

    • @excelobiageli9446 · 2 years ago

      Okay, but what will my final result look like? Like the table containing the DEGs: will it still contain the original number of samples? And what values would it contain? That is where I am confused.

    • @Bioinformagician · 2 years ago

      @@excelobiageli9446 For each pairwise comparison, you will have a data.frame with the genes, log2 fold change values, p-values and adjusted p-values (padj). You will have one such data frame from each comparison. You can filter for differentially expressed genes based on p-values and/or adjusted p-values and log fold changes, and intersect the gene columns of the data frames. What you end up with is a vector of genes differentially expressed in both comparisons.

    • @excelobiageli9446 · 2 years ago

      @@Bioinformagician okay!! Thank you very much
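The filter-then-intersect approach described above can be sketched in base R. The two toy data frames stand in for `as.data.frame()` results of two pairwise comparisons, and the helper `sig_genes()` and its cutoffs are hypothetical. `Reduce(intersect, ...)` extends the same idea to any number of comparisons.

```r
res_AvsC <- data.frame(log2FoldChange = c(2.0, -1.5, 0.2, 1.8),
                       padj = c(0.01, 0.02, 0.50, 0.03),
                       row.names = c("g1", "g2", "g3", "g4"))
res_BvsC <- data.frame(log2FoldChange = c(1.2, 0.1, -2.2, 2.5),
                       padj = c(0.04, 0.60, 0.001, 0.01),
                       row.names = c("g1", "g2", "g3", "g4"))

# Hypothetical helper: genes passing both an adjusted p-value and a |log2FC| cutoff
sig_genes <- function(res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  keep <- !is.na(res$padj) & res$padj < padj_cutoff &
          abs(res$log2FoldChange) >= lfc_cutoff
  rownames(res)[keep]
}

shared <- Reduce(intersect, list(sig_genes(res_AvsC), sig_genes(res_BvsC)))
shared  # "g1" "g4"
```

This only requires a gene to be significant in every comparison; it does not check that the direction of change agrees between them.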

  • @OuanhPhomvisith · 8 months ago

    Hello, thank you so much for sharing this very helpful video. When you load the read counts data, the head of your read counts table shows only 8 sample columns, excluding the gene_id column, but when I do it with my data, it still shows 9 columns (9 variables), including the gene_id column. How can I do what you did? @Bioinformagician

  • @aymsagagi · 2 years ago · +1

    That is very helpful. I have a question: I always have problems with my metadata. I tried saving it as CSV, text and so on, but it always gives me the error message "Error in DESeqDataSetFromMatrix(countData = cts, colData = metaData, design = ~condition) :
    ncol(countData) == nrow(colData) is not TRUE". Could you please help me sort it out?
    Thank you.

    • @Bioinformagician · 2 years ago · +1

      The error is telling you that the number of columns in your countData is not equal to the number of rows in your colData. They should be equal. Check the columns and rows in your countData and colData to make sure you don't have any extra columns (such as a gene ID column in the counts).

    • @aymsagagi
      @aymsagagi 2 years ago

      @@Bioinformagician I solved that, thank you.
      But I have one more problem: I always get an error saying that I have duplicate rows. Any suggestions on how to resolve that, especially how to easily check all the rows in a big dataset?

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      @@aymsagagi
      To check for duplicated rows in a data frame:
      df[duplicated(df), ]
      To remove duplicated rows (using dplyr):
      library(dplyr)
      df <- df %>% distinct()
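If the error is about duplicate row names (for example when reading a counts file with `row.names = 1`), `duplicated(df)` on whole rows may miss it, because two rows can share a gene ID while having different counts. A base-R sketch that checks the ID column instead; the column name `gene_id` and the toy data are hypothetical.

```r
# Hypothetical table where one gene ID appears twice with different counts
df <- data.frame(gene_id = c("g1", "g2", "g2", "g3"),
                 s1      = c(5, 7, 1, 0),
                 s2      = c(4, 9, 2, 3))

nrow(df[duplicated(df), ])          # 0: no fully identical rows

# IDs that occur more than once, and every row carrying them
dup_ids  <- unique(df$gene_id[duplicated(df$gene_id)])
dup_rows <- df[df$gene_id %in% dup_ids, ]
print(dup_ids)
print(dup_rows)
```

This only finds the duplicated IDs; deciding which row to keep is a separate choice.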

    • @aymsagagi
      @aymsagagi 2 years ago

      Thank you so much.

  • @PharmaAI-LearningCenter
    @PharmaAI-LearningCenter 2 years ago +1

    Hi, I am new to this domain, so could you please tell me how you got the expression data for GSE52778 and how to combine the data from all 8 samples into one CSV file?

    • @Bioinformagician
      @Bioinformagician  2 years ago

      The data was already merged and provided by the authors.
      However, if you are interested in learning how to merge datasets, I have covered it previously. Here's where you can find it: ua-cam.com/video/HrbeaEJqKcY/v-deo.html
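For readers with one count table per sample, a minimal base-R sketch of combining them into a single matrix by gene ID. It assumes each sample is a data.frame with columns `gene_id` and `count` (hypothetical names) and folds `merge()` over the list with `Reduce()`; this is one generic approach, not necessarily the one shown in the linked video.

```r
# Hypothetical per-sample tables (e.g. one file per sample)
samples <- list(
  s1 = data.frame(gene_id = c("g1", "g2", "g3"), count = c(10, 0, 5)),
  s2 = data.frame(gene_id = c("g2", "g1", "g3"), count = c(4, 12, 6)),
  s3 = data.frame(gene_id = c("g1", "g3", "g2"), count = c(8, 7, 1))
)

# Rename each count column to its sample name so the columns stay distinct
renamed <- Map(function(tab, name) {
  names(tab)[names(tab) == "count"] <- name
  tab
}, samples, names(samples))

# Inner-join all tables on gene_id, then turn gene IDs into row names
merged <- Reduce(function(a, b) merge(a, b, by = "gene_id"), renamed)
counts_mat <- as.matrix(merged[, -1])
rownames(counts_mat) <- merged$gene_id
print(counts_mat)

# write.csv(counts_mat, "merged_counts.csv")  # save as one CSV file
```

`merge()` matches rows by gene ID, so the per-sample files do not need to list genes in the same order; genes missing from any sample are dropped by the inner join.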

  • @prabirbarman877
    @prabirbarman877 1 year ago

    Error in DESeqDataSetFromMatrix(countData = countData, colData = colData, : ncol(countData) == nrow(colData) is not TRUE

  • @kobrarahimi9164
    @kobrarahimi9164 2 years ago +1

    I have a question: you chose 10 for filtering genes. Is there a method to determine this number? I used 10 for my data analysis and got a lot of reads that were differentially expressed, and I think that is wrong. Please help me if there is a method for finding the filtering threshold.

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      I followed the threshold provided in the vignette. There is no method to find a threshold; it depends on how relaxed or stringent you want to be while filtering genes. However, you need to be careful, as a higher threshold might filter out genes that are differentially expressed but not highly expressed.
      You said you found a lot of reads differentially expressed; can you explain why you think it is wrong?
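For reference, the pre-filtering step discussed here keeps genes whose total count across samples is at least 10. A base-R sketch on a plain matrix (in the DESeq2 workflow the matrix would be `counts(dds)`); the toy counts are hypothetical. Raising the cutoff, or requiring it in a minimum number of samples, makes the filter more stringent. This step only removes low-count genes; which genes are called differentially expressed is decided later by the adjusted p-value and fold-change cutoffs applied to the results.

```r
# Hypothetical counts: 4 genes x 4 samples
mat <- matrix(c(0, 1, 2, 3,
                20, 25, 30, 18,
                3, 2, 4, 2,
                0, 0, 15, 12),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

# Total count across all samples >= 10
keep_total <- rowSums(mat) >= 10
# Stricter variant: count >= 10 in at least 2 samples (e.g. the smallest group size)
keep_group <- rowSums(mat >= 10) >= 2

print(rownames(mat)[keep_total])
print(rownames(mat)[keep_group])
filtered <- mat[keep_total, ]
```

gene3 passes the total-count rule (11 reads spread thinly) but fails the per-sample rule, which shows how the two variants differ in stringency.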

    • @kobrarahimi9164
      @kobrarahimi9164 2 years ago

      @@Bioinformagician I got the data from NCBI and am working on it. In the summary provided with the data, it is written that they compared a knockdown strain with the wild-type strain under both irradiated and unirradiated conditions. They found 31 genes that were differentially expressed. Unfortunately, they did not include the protocol in the supplementary data. When I run this command with a threshold of 10, I get more than 3000 genes. I don't know where my problem is.

    • @kobrarahimi9164
      @kobrarahimi9164 2 years ago

      And when I set the pvalue

    • @Bioinformagician
      @Bioinformagician  2 years ago

      If you have retrieved the data from NCBI GEO, you will find the associated publication. Read through the paper to find out what thresholds they used and why. Without those, it is really difficult to recreate what they found. You should also be interested in "why" those thresholds were used; it is important to be able to justify your choices.

  • @acramulhaquekabir5852
    @acramulhaquekabir5852 1 year ago

    Are you planning to do a video on gene enrichment analysis?

  • @shilpisehgal5613
    @shilpisehgal5613 2 years ago +1

    Could you please make a video on how to collapse technical replicates? Thanks in advance.

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Collapsing technical replicates can be done with the collapseReplicates() function in DESeq2. I will definitely plan a short video explaining it. Thanks :)
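DESeq2's collapseReplicates() sums the counts of technical-replicate columns that belong to the same sample. The same summing idea on a plain count matrix, as a base-R sketch using `rowsum()` rather than the DESeq2 function itself; the run names and the `sample_of_run` grouping are hypothetical.

```r
# Hypothetical counts: 2 genes, 4 sequencing runs from 2 biological samples
runs <- matrix(c(5, 7, 10, 12,
                 1, 0, 3, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("g1", "g2"),
                               c("run1", "run2", "run3", "run4")))
sample_of_run <- c("sampleA", "sampleA", "sampleB", "sampleB")

# rowsum() sums rows by group, so transpose to sum columns, then transpose back
collapsed <- t(rowsum(t(runs), group = sample_of_run))
print(collapsed)
```

With a DESeqDataSet you would call collapseReplicates() with a `groupby` factor instead; check its help page for the exact arguments. Summing only makes sense for technical replicates, not biological ones.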

    • @shilpisehgal5613
      @shilpisehgal5613 2 years ago +1

      @@Bioinformagician Thank you so much. I am looking forward to it.

  • @aritahalder9397
    @aritahalder9397 1 year ago

    I am getting this error (column 1 contains gene names):
    counts_data

    • @aritahalder9397
      @aritahalder9397 1 year ago

      What needs to be done when there are duplicated gene names (with different expression values) in the data? Should we keep just one of the duplicated rows or average the values?
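There is no single correct answer; it depends on why the names are duplicated. For raw counts going into DESeq2, averaging is awkward because it produces non-integer values, which DESeq2 does not accept as counts. One simple option, sketched below in base R, keeps the row with the highest total count for each gene name; the column name `gene` and the data are hypothetical, and summing the duplicated rows with `rowsum()` is another common choice.

```r
# Hypothetical counts with a duplicated gene symbol
counts_data <- data.frame(gene = c("TP53", "GAPDH", "TP53", "ACTB"),
                          s1   = c(10, 500, 2, 300),
                          s2   = c(12, 480, 1, 310))

# Order rows by total count (highest first), then keep the first row per gene
totals  <- rowSums(counts_data[, c("s1", "s2")])
ordered <- counts_data[order(totals, decreasing = TRUE), ]
dedup   <- ordered[!duplicated(ordered$gene), ]

# Gene names are now unique, so they can become row names
rownames(dedup) <- dedup$gene
dedup$gene <- NULL
print(dedup)
```

Once the names are unique, `read.csv(..., row.names = 1)` style row names no longer fail on duplicates.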

  • @prachimishra5517
    @prachimishra5517 2 years ago +1

    While creating dds, I am getting the error "count matrix should be numeric, currently it had mode: Character". Can you please tell me how to resolve it?

    • @Bioinformagician
      @Bioinformagician  2 years ago

      Your counts data might have some values stored as character, so you need to convert all values to numeric.
      With the gene IDs as row names (not as a column), you can do that by running:
      counts_data <- as.matrix(counts_data)
      storage.mode(counts_data) <- "numeric"

    • @prachimishra5517
      @prachimishra5517 2 years ago +1

      Now it is showing "NAs introduced by coercion".
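That warning means some entries could not be read as numbers, for example gene IDs still sitting in a data column, or values containing commas or text. A base-R sketch for locating them before converting; the toy `counts_data` and its malformed values are hypothetical.

```r
# Hypothetical counts read as text, with two malformed values
counts_data <- data.frame(s1 = c("10", "0", "1,200"),
                          s2 = c("5", "n/a", "7"),
                          row.names = c("g1", "g2", "g3"))

# TRUE where a value is present but cannot be parsed as a number
not_numeric <- sapply(counts_data, function(x) {
  x <- as.character(x)
  is.na(suppressWarnings(as.numeric(x))) & !is.na(x)
})
rownames(not_numeric) <- rownames(counts_data)

# Show each offending entry with its gene and sample
bad <- which(not_numeric, arr.ind = TRUE)
print(data.frame(gene   = rownames(counts_data)[bad[, "row"]],
                 sample = colnames(counts_data)[bad[, "col"]],
                 value  = counts_data[bad]))
```

Fixing or removing those entries first means the numeric conversion no longer silently turns them into NA.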

    • @Bioinformagician
      @Bioinformagician  2 years ago

      @@prachimishra5517 Could you email me a screenshot of your counts matrix and the commands you are running? You will find my email address in the description of the video. Thanks!

  • @paradoxoftrips
    @paradoxoftrips 11 months ago

    Hi, I am a PhD student struggling to use RStudio. Would you please help me perform differential abundance analysis on my data? Please let me know and I will email you the data. I would be very grateful to you.

  • @sanjaisrao484
    @sanjaisrao484 2 years ago +1

    Please tell me how to get the sample data from a GEO dataset. I used GEOquery for GSE99816, but it didn't contain sample names matching the dataset. Please help, ma'am.

    • @Bioinformagician
      @Bioinformagician  2 years ago +1

      You will have to create one if it does not have sample metadata.

    • @sanjaisrao484
      @sanjaisrao484 2 years ago +1

      @@Bioinformagician Could you please explain how to do that? Btw, thanks for answering every one of my queries; you don't know how much you are helping. THANK YOU VERY MUCH ❤️

    • @sanjaisrao484
      @sanjaisrao484 2 years ago

      @@Bioinformagician Ma'am, please reply.

    • @Bioinformagician
      @Bioinformagician  2 years ago

      @@sanjaisrao484 To give you an example, let's say your counts data has 5 columns, then you can create sample metadata like this:
      colData
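As a hypothetical illustration of that idea (not the author's own code), a sample metadata table for a count matrix with 5 sample columns could be built like this. The sample names and condition labels are placeholders to replace with your own; the row names should match the count matrix's column names, in the same order.

```r
set.seed(1)
# Hypothetical count matrix with 5 sample columns
counts_data <- matrix(rpois(15, lambda = 20), nrow = 3,
                      dimnames = list(paste0("gene", 1:3),
                                      paste0("GSM", 1:5)))

# One row per sample; row names must equal the count matrix's column names
colData <- data.frame(condition = factor(c("control", "control",
                                           "treated", "treated", "treated")),
                      row.names = colnames(counts_data))
print(colData)
stopifnot(all(rownames(colData) == colnames(counts_data)))
```

Any extra sample attributes (batch, time point) can be added as further columns of the same data.frame.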

    • @sanjaisrao484
      @sanjaisrao484 2 years ago

      @@Bioinformagician Thanks, ma'am