DoubletFinder: Detect doublets in single-cell RNA-Seq data in R | Detailed workflow tutorial

  • Published 31 Jan 2025

COMMENTS • 60

  • @ravimore5786 2 years ago

    Thank you very much for this workflow. It's really helpful for understanding the process and steps involved in DoubletFinder. I appreciate your efforts to educate researchers through this activity.

  • @kendy17 3 years ago +1

    You're awesome, keep up the amazing work!

  • @kitdordkhar4964 3 years ago +4

    This was very useful. It was different from our analyst's strategy. Small request: instead of using bash in the terminal, it would be helpful if you could navigate to the folders and files from within R [setwd> ]. Thanks!

    • @Bioinformagician 3 years ago +1

      Thank you for the suggestion, I am more comfortable in maneuvering through the folders via terminal. However, I shall try to do it via R next time :)

  • @sinanouraei7335 A month ago

    Very nice tutorial 👍

  • @quamtumone 3 months ago

    Really great tutorial!

  • @kanahia7460 3 years ago

    I really do enjoy your channel 🤠 I am doing the same analysis, and it is very kind of you to share your approach and code! Many thanks 👍

    • @Bioinformagician 3 years ago +1

      I am glad to hear my videos have been helpful! Thank you for your kind words :)

  • @parmenideskim9739 2 years ago

    A really great video!!! Thank you very much !!!

  • @hyebinhan6473 2 years ago +2

    THANK YOU!!! This was a life saver. Quick question: I plan to use Tabula Muris Senis, the mega mouse single-cell dataset, and I was able to maneuver through selecting the ages/organs I wanted to use. BUT I believe they have datasets per mouse and per organ... if that's the case, do I still have to run doubletFinder on each mouse, or do you think I can use the selected age/organ as a whole, on the assumption that the preparation process was similar enough that batch effects would likely be minimal? I have 15 mice from Tabula Muris that I plan to use and an additional 15 mice that I have to filter 🥲

    • @Bioinformagician 2 years ago +2

      I suggest you first process your data with all 15 mice at once, as a merged object, and visualize it. Look for batch effects. If you don't find any, you can run doubletFinder on the merged object. If you do find batch effects in your data, you will have to run doubletFinder on each mouse individually.

  • @tushardhyani3931 2 years ago

    Thank you for this video !!

  • @juliwang3751 5 months ago

    I think it's important to explain why we assume 7.5% doublets in our data. I know it has something to do with the number of droplets captured. But how do we determine the number of droplets captured (in order to infer the estimated % of real doublets)? Thank you!

  • @SavannahVictoria-d8i A year ago

    Thank you for your tutorial. Could you please tell me whether the paper explains how to mark doublets in the raw data?

  • @熊飞-b5k 7 months ago

    Hello Khusbu,
    when I run "> sweep.res.list

  • @pin-juikung5794 3 months ago

    So at the end, if I would like to filter out those doublets and continue with the rest of my analysis, what should I do?

  • @xiaosajackxu4242 2 years ago +1

    Amazing job! Can you paste the code you used to subset and recluster singlets after finishing DoubletFinder? Or can you confirm whether you did exactly the same as the following steps? Thanks!
    singlet

    • @Bioinformagician 2 years ago

      Yes, I would run the steps you ran to recluster my cells after removing doublets from my data.
      Thank you for the suggestions for video topics, I have them in my pipeline :)

  • @kalpanidesilva3062 A year ago

    Thank you very much. Can you please do a tutorial on how to use the DropletUtils library?

  • @giovaniclaresta2356 A year ago

    Hi, thank you for the very detailed tutorial!! May I know how I can get the cell identity from demuxlet data after I get all the singlets? Thank you.

  • @熊飞-b5k 2 years ago +1

    Thank you for this video, but my question is whether the search for and removal of doublets should be carried out before data merging and QC. In your previous video on data integration, you merged 7 samples. Does that mean we need to clean the data 7 times before merging? Hoping for your reply.

    • @熊飞-b5k 2 years ago +1

      What I mean is: when we need to integrate several datasets, before which step should we perform doublet detection? Before merging the datasets? If doublet detection should be done before the merge() function, is it necessary to perform QC and the standard pre-processing workflow for each dataset separately?

    • @Bioinformagician 2 years ago +2

      Yes, it is recommended to perform doublet removal and QC for each dataset individually before integrating datasets. It can, however, be run on merged data. The standard workflow steps just help identify and remove clusters of cells with low UMI counts or high mitochondrial %. These low-quality cells must be filtered out before running a doublet prediction algorithm and before integrating and moving ahead with further downstream analysis.

  • @kimiaslk9348 11 months ago

    you are amazing
    thank you so much

  • @pariaalipour61 2 years ago

    Thank you so much for this helpful video. I have a question. After the last step, where we detect doublets and remove them, how can we go back to the first step to do integration? I'm not sure how to transfer the needed assay to the data.

    • @Bioinformagician 2 years ago +1

      You should use the "integrated" assay (if you used the CCA method to integrate) and move forward with the steps just as you would process data in the 'RNA' slot of a Seurat object.

    • @pariaalipour61 2 years ago +1

      @Bioinformagician When I run DoubletFinder, the integration still needs to be done. I mean, after removing doublets from every individual sample, what approach do I need to take? Should I move forward with the subsetted samples and integrate them? Thanks

  • @뚱카-y2t A year ago

    22:25 I want to remove the rows labelled as doublets in the DF.classification column of the metadata table. What command can I use to do that?
    I need this in order to remove the doublets and then integrate all samples.

  • @anaarsenijevic3207 A year ago

    Hello, Thanks for the great tutorial! I have one question, maybe I missed it, but - why do you use the nsclc data when calculating the pK value (starting from line 47) rather than pbmc that you used in the steps before that? Thank you!

    • @RupakDeySarkar A year ago

      @anaarsenijevic3207, she used the pbmc seurat object only in line 47. Only the name of the list she created has the nsclc name, you can name it anything you want.

  • @Carolina_pt A year ago

    Thank you so much for this tutorial, it's very informative. I was wondering if you knew how to find the expected number of doublets for ICELL8 sequencing data? Thank you in advance.

  • @mostafaismail4253 2 years ago +1

    Please, we need a video on applying NMF (non-negative matrix factorization) to scRNA-seq data for finding expression programs.

    • @Bioinformagician 2 years ago +1

      I'll consider making a video on this soon :) Thanks for the suggestion.

  • @Ob-xt4ej 2 years ago

    Thank you for the tutorial. I ran the pK identification code and got pK = 0.2. The number of doublets is the same, but the shape of the graph is different. I wonder if I can move on to the next step or if I need to fix this issue. Thank you!

    • @Bioinformagician 2 years ago

      Did you use the strategies for pK optimization? Did you find your optimum pK to be 0.2?
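
For readers checking their own pK: the usual approach is to take the pK with the highest BCmetric in the table returned by find.pK(). The sketch below uses a small mock table with made-up values, not real output. It assumes, as far as I know is the case, that the pK column is stored as a factor, which is why the conversion goes through as.character(). Converting the factor directly returns the level index instead of the pK value, which may explain odd values such as a "pK" of 20.

```r
# Mock stand-in for the data frame returned by find.pK().
# Assumption: pK is a factor; the BCmetric values here are invented.
bcmvn <- data.frame(
  pK       = factor(c("0.005", "0.01", "0.1", "0.2", "0.3")),
  BCmetric = c(120.4, 310.9, 655.2, 1480.7, 902.3)
)

# Keep the row with the highest BCmetric.
best <- bcmvn[which.max(bcmvn$BCmetric), ]

# Correct: factor -> character -> numeric gives the actual pK value.
pK_opt <- as.numeric(as.character(best$pK))

# Pitfall: factor -> numeric gives the position of the level, not the value.
pK_wrong <- as.numeric(best$pK)

print(pK_opt)
print(pK_wrong)
```

If the plot looks different from the video but the maximum sits clearly at one pK, that pK is the one to carry forward.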

  • @sonaaritra A year ago

    Hello Khusbu, I'm working with a publicly available dataset, GSE193688, where they have provided individual .h5 files for every sample. I'm trying to run DoubletFinder on it, but since you mentioned that running it on merged samples is not preferable, should I run it on each one separately? I have a total of 18 files for individual biopsy samples. Is there any faster method?

  • @veerachon2281 2 years ago +1

    Could you please explain how one can assume that this value is commonly expected? -> Assuming 7.5% doublet formation rate

    • @Bioinformagician 2 years ago

      10X user guides provide expected multiplet rate for different protocols.
      Here I have used the table on page 18 from the Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 user guide (support.10xgenomics.com/single-cell-gene-expression/library-prep/doc/user-guide-chromium-single-cell-3-reagent-kits-user-guide-v31-chemistry) to get the doublet formation rate.

    • @youvikasingh7955 A year ago

      @Bioinformagician But what if I had 10000 cells as input and approx. 1100 recovered cells? 🤔 Thanks, really helpful channel 😍

    • @jessicacastillo8535 A year ago

      @youvikasingh7955 How did you solve that issue? Thanks!
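
One way to tailor the rate to a dataset, rather than reusing 7.5%, is to scale it with the number of recovered cells. The sketch below assumes a linear rule of thumb of roughly 0.8% multiplets per 1,000 recovered cells, which approximates the table in the 10x user guide mentioned above; the function name expected_doublets and the default rate are illustrative assumptions, so check the guide for your own chemistry.

```r
# Hypothetical helper: estimate the doublet rate and expected doublet count.
# Assumption: the rate grows by about 0.8% per 1,000 recovered cells.
expected_doublets <- function(n_recovered, rate_per_1000 = 0.008) {
  rate <- rate_per_1000 * n_recovered / 1000
  list(rate = rate, n_expected = round(rate * n_recovered))
}

# Example from the thread: about 1,100 recovered cells.
est <- expected_doublets(1100)
print(est$rate)        # estimated doublet formation rate
print(est$n_expected)  # candidate value for the expected number of doublets
```

This keys the estimate on recovered cells. If far more cells were loaded than recovered, the true rate may be higher than this estimate, so treat the result as a lower-end starting point.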

  • @blackmatti86 2 years ago

    Can I still run DoubletFinder on 'SCTransform normalised' sample?
    If yes, is it as simple as setting 'sct = TRUE' in 'sweep.res.list_pbmc

    • @Bioinformagician 2 years ago

      DoubletFinder can be used on a Seurat object that was normalized with SCTransform during the pre-processing steps. And yes, it is as simple as setting sct = TRUE.

  • @chadhighfill4578 2 years ago +2

    How would you filter out the doublets?

    • @tomasmontserrat704 2 years ago +5

      I think you can use subset():
      pbmc.seurat.filtered

    • @Bioinformagician 2 years ago +1

      That's right! You can use subset() to filter out doublets.

    • @chadhighfill4578 2 years ago

      @Bioinformagician How do you do this when the DF.classification_SOME VALUE column name is always changing? I.e., how do you filter out the doublets in a dynamic way?
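
A possible way to handle the changing column name is to look it up by its prefix instead of typing it out. The sketch below works on a mock metadata table standing in for the Seurat object's meta.data; the object name seu in the closing comment is an assumption, and the lookup assumes DoubletFinder was run only once, so that exactly one matching column exists.

```r
# Mock metadata standing in for seu@meta.data; the suffix of the
# DF.classifications column depends on pN, pK and the expected doublet count.
meta <- data.frame(
  nCount_RNA = c(5200, 8100, 4300, 15900),
  DF.classifications_0.25_0.21_691 = c("Singlet", "Singlet", "Singlet", "Doublet"),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# Find the classification column by prefix rather than by its full name.
df_col <- grep("^DF\\.classifications", colnames(meta), value = TRUE)
stopifnot(length(df_col) == 1)  # repeated runs add more columns; pick one explicitly

# Barcodes of the cells called singlets.
singlet_cells <- rownames(meta)[meta[[df_col]] == "Singlet"]
print(singlet_cells)

# With a real Seurat object (assumed to be called seu), the barcodes can be
# passed to subset(), for example: seu_singlets <- subset(seu, cells = singlet_cells)
```

Printing colnames() of the metadata first is also a quick check when a plot or subset call reports that the requested DF.classifications column was not found.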

  • @EdDone-q6g A year ago

    Thanks for this workflow and for sharing the code. I have one issue when I run your code at the second-to-last step.
    > DimPlot(pbmc.seurat.filtered, reduction = 'umap', group.by = "DF.classifications_0.25_0.21_691")
    Error in `[.data.frame`(data, , group) : undefined columns selected
    In addition: Warning message:
    The following requested variables were not found: DF.classifications_0.25_0.21_691
    Could you please help to check it?
    Thanks.

  • @tulikabhardwaj484 3 years ago

    Thanks thanks thanks a lot

  • @marionaisern6420 2 years ago

    I don't understand why, in a dataset of 15000 real cells, a pN of 0.25 would represent the integration of 5000 artificial doublets... If anyone can answer my question...
    Thank you!!!
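
The arithmetic behind this: pN is the proportion of artificial doublets in the merged data (real cells plus artificial doublets), not a fraction of the real cells alone. Solving n_artificial / (n_real + n_artificial) = pN gives n_artificial = n_real / (1 - pN) - n_real. A minimal check with the numbers from the question:

```r
n_real <- 15000  # real cells in the dataset
pN     <- 0.25   # proportion of artificial doublets in the merged data

# Number of artificial doublets needed so that they make up pN of the total.
n_artificial <- round(n_real / (1 - pN) - n_real)
print(n_artificial)

# Sanity check: share of artificial doublets in the merged data.
share <- n_artificial / (n_real + n_artificial)
print(share)
```

So 5000 artificial doublets out of 20000 total cells is 25%, whereas 25% of the 15000 real cells would be 3750.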

  • @tulikabhardwaj484 3 years ago

    Waiting for your metagenomics and metatranscriptomics one.

    • @Bioinformagician 3 years ago

      I will surely consider making a video on this in the near future :)

  • @Surajcxscsingh A year ago

    So we are only setting aside heterotypic doublets, not homotypic ones.
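
That matches how the expected doublet count is usually adjusted: because homotypic doublets (two cells of the same type) are hard to detect, the expected number is reduced by an estimated homotypic proportion. To my understanding, DoubletFinder's modelHomotypic() estimates this as the sum of squared cluster frequencies; the sketch below reproduces that idea in base R with made-up cluster labels and the 7.5% rate from the video, so treat it as an illustration rather than the package's code.

```r
# Made-up cluster annotations for 2,000 cells.
annotations <- c(rep("T cell", 1000), rep("B cell", 600), rep("Monocyte", 400))

# Chance that two randomly paired cells share a cluster: sum of squared frequencies.
freq <- table(annotations) / length(annotations)
homotypic_prop <- sum(freq^2)

# Expected doublets at an assumed 7.5% rate, then adjusted for homotypic doublets.
n_exp     <- round(0.075 * length(annotations))
n_exp_adj <- round(n_exp * (1 - homotypic_prop))

print(homotypic_prop)
print(n_exp)
print(n_exp_adj)
```

The adjusted number is the count of heterotypic doublets the method can reasonably be expected to find.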

  • @blackmatti86 2 years ago

    What do you do when running 'bcmvn_pbmc

    • @Bioinformagician 2 years ago

      I am unable to answer why you get NULL at find.pK step as I cannot recreate this error.

    • @rahmaqadeer9178 2 years ago

      Did you sort this out? I also get the same 'null' when I run this, although my data is stored in this variable when I print it.

    • @blackmatti86 2 years ago

      @rahmaqadeer9178 No, didn’t manage to fix this

    • @beatriceplougastel-douglas1861 A year ago

      I am also getting ' bcmvn_nsclc % select(pK)' my numeric value for the pK is 20

    • @NBAasDOGG A year ago

      @rahmaqadeer9178
      The problem is that ParamSweep cannot find your normalized RNA counts.
      Here’s how to fix it:
      Instead of using "NormalizeData(sobj, normalization.method = "LogNormalize", scale.factor = 10000)"
      Do the following:
      "sobj