Pathway enrichment analysis tutorial in R with clusterProfiler()

  • Published 31 Dec 2024

COMMENTS • 29

  • @singh_nimisha
    @singh_nimisha 1 year ago +3

    Laura, you teach us like we are a bunch of kids. I find it awesome! You are so sweet! This helped me so much, Ma'am! Thank you.

  • @miguelreis6249
    @miguelreis6249 2 months ago

    Laura, thank you so much for doing these. Even hard heads like me can follow your tutorials, amazing stuff! The world bows in amazement.

  • @pulcinella96
    @pulcinella96 1 year ago +1

    Laura, thank you so much for all of your amazing content. It's helped me so much during my MSc course. Just wanted you to know that all of your hard work is much appreciated!

    • @biostatsquid
      @biostatsquid  1 year ago

      Thank you so much for your comment, this means a lot! Glad it helped:)

  • @xlxeat
    @xlxeat 1 year ago

    The tutorial is very helpful, even though I had run the enrichment pipeline many times before. Your code gave me useful tips!

  • @joeyoviedo5202
    @joeyoviedo5202 1 year ago

    Wow, so glad I found your channel, very high quality content. I would love to see more workflows using other clusterProfiler functions. Also, it would be cool to have workflow options for generating data visualizations that are good for comparing exposure groups and exposure windows using overlapping significant DEGs. Thank you! Have a squidtastic🦑day!

    • @biostatsquid
      @biostatsquid  1 year ago

      Thank you so much for your comment! Glad you like the videos. Great suggestions - will definitely add them to my list;) Quick question - what do you mean by 'exposure groups' and 'exposure windows'?

    • @joeyoviedo5202
      @joeyoviedo5202 1 year ago

      @biostatsquid Hi, I just mean, for example, when there are, let's say, 3 exposure windows (i.e. 24 hours, 3 days, 7 days) and 3 exposure groups (i.e. different concentrations of treatment, or possibly different tissue/cell types, etc.). Hopefully that explains what I mean, lol. And it's so nice to chat with you! Cheers!

    • @biostatsquid
      @biostatsquid  1 year ago +1

      @joeyoviedo5202 Oh I see, so like ways to visualise comparisons of DEGs at different time points and possibly groups? That's a really good idea, will definitely add that to my todo list;) Thanks for the suggestion!

  • @xiaofeili7379
    @xiaofeili7379 10 months ago

    This is a great tutorial. I have a question: what if I want to analyze mouse data and GSEA doesn't have a murine KEGG gene set?

  • @苏秋羊
    @苏秋羊 6 days ago

    I need help, teacher. Since df includes all the genes, not just the differential ones (I saw that some p-values are above 0.05), how do I get that whole gene list for my scRNA analysis? I appreciate it.
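
If the question is how to derive both lists from a single results table, a minimal base R sketch could look like the one below. The column names (gene_symbol, log2fc, padj) and the toy values are assumptions for illustration, not the tutorial's actual data; padj_cutoff mirrors the variable name quoted elsewhere in this thread.

```r
# Toy differential expression table; column names and values are assumptions
df <- data.frame(
  gene_symbol = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  log2fc      = c(2.1, -1.8, 0.2, 1.2, -0.1),
  padj        = c(0.001, 0.01, 0.60, 0.04, 0.90),
  stringsAsFactors = FALSE
)

padj_cutoff <- 0.05

# Background: every gene that was tested, significant or not
background_genes <- unique(df$gene_symbol)

# Genes of interest: only those passing the adjusted p-value cutoff
sig <- df[!is.na(df$padj) & df$padj < padj_cutoff, ]
up_genes   <- sig$gene_symbol[sig$log2fc > 0]
down_genes <- sig$gene_symbol[sig$log2fc < 0]
```

The full table is kept as the background precisely because it contains the non-significant genes; only the filtered subsets are tested for enrichment.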

  • @markrenton6981
    @markrenton6981 1 year ago +1

    Has anyone tried converting all of the mouse .gmt files to .RDS? I can convert all of them except the GO CC set. Has anyone else run into this problem?
    It reads the .gmt file, but when I execute the saveRDS() function, the output just doesn't appear in the folder like it did for all of the other .gmt files.
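
One way to sanity-check a .gmt-to-.RDS conversion like this is to parse the file with base R and confirm that the output really exists after saveRDS(). This is only a sketch: the helper name gmt_to_term2gene and the toy file are made up for illustration, and it assumes the usual tab-separated .gmt layout (set name, description, then genes).

```r
# Hypothetical helper: parse a .gmt file into a two-column data frame
# (term, gene), the long format clusterProfiler's TERM2GENE argument expects.
gmt_to_term2gene <- function(gmt_path) {
  lines <- readLines(gmt_path, warn = FALSE)
  lines <- lines[nzchar(lines)]              # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  do.call(rbind, lapply(fields, function(f) {
    genes <- f[-(1:2)]                       # field 1 = set name, field 2 = description
    data.frame(term = rep(f[1], length(genes)), gene = genes,
               stringsAsFactors = FALSE)
  }))
}

# Toy .gmt written to a temporary folder so the example is self-contained
tmp_dir <- tempdir()
gmt_path <- file.path(tmp_dir, "toy_sets.gmt")
writeLines(c("SET_A\tdescription\tGene1\tGene2\tGene3",
             "SET_B\tdescription\tGene2\tGene4"), gmt_path)

term2gene <- gmt_to_term2gene(gmt_path)

# Use a full output path and verify that the file was really written
rds_path <- file.path(tmp_dir, "toy_sets.RDS")
saveRDS(term2gene, rds_path)
file.exists(rds_path)
```

If file.exists() returns FALSE, a likely suspect is a relative path resolving to a different working directory than expected (getwd() shows the current one).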

  • @aidaht1
    @aidaht1 1 year ago +1

    Your channel and videos are great, and I liked your website as well! Thanks so much for your help.
    I have a question. I have conducted differential expression analysis on TCGA-PRAD and a microarray dataset (GPL570) to get differentially expressed genes between normal and cancer tissues.
    After that I drew a Venn diagram to get the common DEGs between these two datasets. However, my common DEGs are just gene symbols; I don't have logFC or p-values for them (I have these for each of the datasets, but not after drawing the Venn diagram).
    How can I do PEA with clusterProfiler for the common DEGs obtained from the Venn diagram? Thanks in advance.

    • @biostatsquid
      @biostatsquid  1 year ago

      Hi! Thanks so much for your feedback, I'm glad you found them useful!
      I think the best option is to perform PEA independently for each of the two datasets (careful: remember to subset the background genes to the genes present in each dataset separately). Then you can use a similar approach and see which pathways overlap.
      Otherwise, you might consider doing GSEA (video coming up soon!) on your selected gene list, ranking the genes by a consensus metric, e.g., some kind of average (but be careful if you are considering log2FC, as the sign is also important). This paper on concordant integrative gene set enrichment analysis might help: pubmed.ncbi.nlm.nih.gov/24564564/
      Hope this helped! :)
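
The "run PEA per dataset, then see which pathways overlap" suggestion could be sketched as follows. The two result tables are toy data; only the ID and p.adjust column names follow the layout of a clusterProfiler result table, and the object names res_tcga and res_array are made up.

```r
# Toy enrichment results for two datasets (values are invented)
res_tcga <- data.frame(ID = c("PW_A", "PW_B", "PW_C", "PW_D"),
                       p.adjust = c(0.001, 0.03, 0.20, 0.04),
                       stringsAsFactors = FALSE)
res_array <- data.frame(ID = c("PW_A", "PW_C", "PW_D", "PW_E"),
                        p.adjust = c(0.02, 0.01, 0.049, 0.003),
                        stringsAsFactors = FALSE)

padj_cutoff <- 0.05

# Significant pathways in each dataset, determined independently
sig_tcga  <- res_tcga$ID[res_tcga$p.adjust < padj_cutoff]
sig_array <- res_array$ID[res_array$p.adjust < padj_cutoff]

# Pathways enriched in both datasets
shared_pws <- intersect(sig_tcga, sig_array)
```

This works at the pathway level, so it does not need logFC or p-values for the common DEGs themselves.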

  • @KeshavSharma-lh7zf
    @KeshavSharma-lh7zf 7 months ago

    Can I follow the same steps for proteomics data?

  • @shawsheryl5092
    @shawsheryl5092 1 year ago

    Aaaaaaaaawesome!!!!! I've finished watching all your videos about pathway analysis and they really help a lot!! I'm really grateful for your excellent explanation!!!! But I wonder if I could apply GSEA to proteomic analysis? I've got the expression matrix of the proteins, but I don't know if I could match the protein IDs with the gene sets... Could you please give me some suggestions? I'd appreciate it a lot!!

    • @biostatsquid
      @biostatsquid  1 year ago

      Thanks for your comment! Glad you liked the videos:)
      Unfortunately, I have never applied GSEA to proteomics (which I believe is called PSEA;) so I cannot give you a sure answer, but here are some suggestions to try out:
      - Follow the same steps as for GSEA, but before running GSEA, convert gene symbols to protein IDs. There are a few tools to do this within R, or you could also use the UniProt Retrieve/ID Mapping tool. I think this should work if the IDs match and you use gene sets based on protein-coding genes.
      - You might want to check out this publication, presenting PSEA-Quant: www.ncbi.nlm.nih.gov/pmc/articles/PMC5352860/
      It allows you to perform PSEA (it's a web-based tool as far as I know) - but most importantly, if you check the methods you might figure out how to download protein sets from the tool itself.
      Hope you find a solution! Let me know! Good luck!

    • @shawsheryl5092
      @shawsheryl5092 1 year ago

      @biostatsquid Thanks for your suggestions! I'm sorry to reply so late; I wasn't confident in my results.
      First I checked the PSEA-Quant article, but I failed to reach the URL they provided.🥲
      Then I tried to find protein datasets that match UniProt IDs directly, so that I would lose as little information as possible. But when I tried to run the analysis in clusterProfiler with UniProt IDs, it threw an error. I even tried to make my own gmt file (using UniProt IDs directly) to use in GSEA, but that failed too. And I'm not experienced enough to build my own package... (keep learning💪)
      Finally, I chose to convert the UniProt IDs into Entrez IDs, and got my results. But I doubt the reliability of this method, because some proteins come from the same gene, and some of them go up while others go down, which may cancel each other out. Fortunately, in my protein set there are only 2 proteins from the same gene, and I removed them, so to some degree the result still has some value as a reference.
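
As an alternative to dropping proteins that map to the same gene, one option is to collapse them to a single value per gene before ranking. The sketch below keeps the protein with the largest absolute log2 fold change; that rule, the made-up protein IDs and the column names are all assumptions for illustration.

```r
# Toy protein-level statistics with an invented protein-to-gene mapping
prot <- data.frame(
  protein_id = c("prot_1", "prot_2", "prot_3", "prot_4"),
  gene       = c("GeneA", "GeneA", "GeneB", "GeneC"),
  log2fc     = c(1.5, -0.4, -2.0, 0.7),
  stringsAsFactors = FALSE
)

# For each gene keep the protein with the largest absolute change.
# This is one possible rule, not the only defensible one.
prot <- prot[order(abs(prot$log2fc), decreasing = TRUE), ]
gene_level <- prot[!duplicated(prot$gene), ]

# Named vector sorted in decreasing order: the ranked-list shape
# that GSEA functions such as clusterProfiler::GSEA() take as input
ranked <- sort(setNames(gene_level$log2fc, gene_level$gene), decreasing = TRUE)
```

Whether keeping the strongest signal is appropriate depends on the biology; when isoforms move in opposite directions, averaging would hide exactly the conflict described above.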

  • @LeviRafal
    @LeviRafal 1 year ago

    Should the code in this chunk:
    # Subset to those pathways that have p adj < cutoff and gene count > cutoff (you can also do this in the enricher function)
    target_pws genecount_cutoff]) # select only target pathways have p adjusted < 0.05 and at least 6 genes
    res_df genecount_cutoff)
    I ask because in some cases one of the two directions (up or down) of pathways with the same name does not pass the padj_cutoff, so wouldn't directly filtering on the values themselves be more accurate?
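
The difference between the two filtering strategies raised here can be illustrated on a toy table. The code in the comment above is truncated, so both versions below are reconstructions under assumptions, not the tutorial's exact code: the direction column, the toy values and the cutoff of 5 (for "at least 6 genes") are invented for illustration.

```r
# Toy results: the same pathway appears once per direction (up / down)
res_df <- data.frame(
  ID        = c("PW_A", "PW_A", "PW_B", "PW_B"),
  direction = c("UP", "DOWN", "UP", "DOWN"),
  p.adjust  = c(0.01, 0.30, 0.20, 0.04),
  Count     = c(12, 8, 9, 3),
  stringsAsFactors = FALSE
)
padj_cutoff <- 0.05
genecount_cutoff <- 5

passes <- res_df$p.adjust < padj_cutoff & res_df$Count > genecount_cutoff

# Strategy 1: collect passing pathway IDs, then keep every row with such an ID.
# PW_A DOWN (p.adjust = 0.30) is kept because PW_A UP passed.
target_pws <- unique(res_df$ID[passes])
by_id <- res_df[res_df$ID %in% target_pws, ]

# Strategy 2: keep only the rows that pass the cutoffs themselves.
by_row <- res_df[passes, ]
```

Filtering by ID keeps both directions of a pathway visible side by side, which can be useful for plotting; filtering by row guarantees that every retained row is itself significant.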

  • @Myri912
    @Myri912 7 months ago

    Hello! Thank you very much for the video, it has helped me a lot. However, I have a question: I ran the whole script on my computer with my own SDR data, and everything seems to be correct except when I run the last step "target_pws

    • @Myri912
      @Myri912 7 months ago

      I have another question: I tried another data set and I get this result directly when running clusterProfiler: --> No gene can be mapped....
      --> Expected input gene ID: HSD11B2,PTPN11,ABCG1,GALE,WASL,PLA2G12A
      --> return NULL...
      --> No gene can be mapped....
      --> Expected input gene ID: APBB1,BID,GALT,NDUFA1,ABCB4,RUNX1
      --> return NULL...
      It's like my genes don't match... how can that happen?
      Thanks in advance!
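
A message like this generally means that none of the input IDs appear in the gene sets. A quick base R diagnostic, sketched under the assumption that the gene sets use upper-case gene symbols as in the "Expected input gene ID" lines above, is to check the overlap directly. The input genes below are invented, including the Ensembl-style IDs.

```r
# Toy inputs: genes from the user's data and genes from the gene set collection
my_genes  <- c("ENSG00000000001", "ENSG00000000002", "hsd11b2 ", "Ptpn11")
set_genes <- c("HSD11B2", "PTPN11", "ABCG1", "GALE", "WASL", "PLA2G12A")

# 1. Direct overlap: empty here, the situation behind "No gene can be mapped"
direct_overlap <- intersect(my_genes, set_genes)

# 2. Common culprits: stray whitespace and letter case
cleaned <- toupper(trimws(my_genes))
cleaned_overlap <- intersect(cleaned, set_genes)

# 3. A different ID type altogether (here: Ensembl-style IDs instead of symbols)
looks_like_ensembl <- grepl("^ENS[A-Z]*G[0-9]+$", cleaned)
```

If most IDs turn out to be of another type, they need converting before enrichment; clusterProfiler's bitr() is one function for that, given a suitable OrgDb package. Upper-casing is only a diagnostic here and is not a valid way to map genes between species.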

  • @MinuMathews-dc3oy
    @MinuMathews-dc3oy 9 months ago

    When I put in df

    • @biostatsquid
      @biostatsquid  9 months ago

      That's probably because your file is in a different folder, or not there at all. Make sure to download the file, put it in a folder and then set in_path to the full path of that folder. You can check if the file is there with list.files(in_path), for example. Hope this helps!
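
The check described above could look like this in practice. in_path is the variable named in the reply; the file name and the temporary folder are placeholders for illustration.

```r
# Placeholder folder and file name; replace with your own
in_path  <- tempdir()
filename <- "my_de_results.csv"   # made-up name for illustration

# Create a toy file so the example runs end to end
write.csv(data.frame(gene = "Gene1", padj = 0.01),
          file.path(in_path, filename), row.names = FALSE)

list.files(in_path)               # what is actually in the folder?

# Build the full path and fail early with a clear message if it is missing
full_path <- file.path(in_path, filename)
if (!file.exists(full_path)) {
  stop("File not found: ", full_path, " (working directory: ", getwd(), ")")
}
df <- read.csv(full_path)
```

Using file.path() instead of pasting strings avoids missing or doubled slashes between the folder and the file name.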

  • @praveenkhatri4084
    @praveenkhatri4084 1 year ago

    Very informative. I was wondering: if I want to run GSEA for a plant, e.g. soybean, how do I do that, given that no org.db package is available for it? Can you please help me with that?

    • @biostatsquid
      @biostatsquid  1 year ago

      Hi Praveen, thank you for your comment! Actually, I have no experience working with non-model organisms, but I think perhaps another tool might be of more use?
      I saw a few people recommend agriGO enrichment tool for plant species -
      www.biostars.org/p/112022/
      www.biostars.org/p/261449/
      but if you want to stick with clusterProfiler, you can always create a custom gene set, as long as you keep the format clusterProfiler needs:)
      Good luck!
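
A custom gene set in the long two-column layout (term, gene) might look like the sketch below, together with a hand-rolled over-representation test for a single term so that the example runs without any packages. The gene names, term names and the helper ora_pvalue are invented for illustration.

```r
# Custom gene sets in long format: one row per (term, gene) pair
term2gene <- data.frame(
  term = c(rep("photosynthesis", 4), rep("nodulation", 3)),
  gene = c("g1", "g2", "g3", "g4", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)
universe <- paste0("g", 1:20)          # all genes measured in the experiment
degs     <- c("g1", "g2", "g3", "g9")  # genes of interest

# Hypothetical helper: over-representation p-value for one term
ora_pvalue <- function(term, degs, term2gene, universe) {
  set_genes <- intersect(term2gene$gene[term2gene$term == term], universe)
  degs      <- intersect(degs, universe)
  k <- length(intersect(degs, set_genes))   # genes of interest inside the set
  # Hypergeometric tail: P(X >= k) when drawing length(degs) genes from the universe
  phyper(k - 1, length(set_genes), length(universe) - length(set_genes),
         length(degs), lower.tail = FALSE)
}

p_photo <- ora_pvalue("photosynthesis", degs, term2gene, universe)
```

With clusterProfiler installed, the same table can be passed as enricher(gene = degs, TERM2GENE = term2gene, universe = universe); note that its default minGSSize of 10 would drop sets as small as these toy ones.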

  • @mihacerne7313
    @mihacerne7313 1 year ago +2

    SQUUUUUUUUUUUUIDTAAAAAAASTICCCCCC

  • @tanmoychatterjee7922
    @tanmoychatterjee7922 3 months ago

    Please, ma'am, don't use pre-written code; it is not helpful. We need to learn how to write the R script.

  • @nidhilangelomariyelil6091
    @nidhilangelomariyelil6091 26 days ago

    Lady, you are good, but you only tell half of the facts.
    We easily get confused at certain points, like the gmt files halfway through.
    If you have decided to teach, just do it properly.
    For beginners in bioinformatics, it feels like you are mocking us.
    A half-truth is worse than a lie.

  • @shrivastava3892
    @shrivastava3892 6 months ago

    The differential data that you loaded in the R script initially, which has approximately 30 thousand genes and four variables: is it pre-processed data, e.g. with duplicates removed and p-values and logFC adjusted? Or is it the raw tT data saved from the R script?