Single cell analysis in python with Scanpy

  • Published 19 Jan 2025

COMMENTS • 60

  • @tousifazmain6205
    @tousifazmain6205 1 year ago +4

    These are the tutorials that deserve millions of likes. A well-detailed tutorial is so rare on the internet although much needed. Godspeed my friend!

    • @sanbomics
      @sanbomics  1 year ago +2

      Thanks! If you liked this, I would check out my "comprehensive" hour-long video, which goes into much more depth.

    • @jorge1869
      @jorge1869 1 year ago

      I wonder what the steps before this are, i.e. how the .mtx files are generated.

    • @tousifazmain6205
      @tousifazmain6205 1 year ago

      @jorge1869 The .mtx files are the output of Cell Ranger count/aggr.

    • @jorge1869
      @jorge1869 1 year ago +1

      @sanbomics I watched that video and it starts from the .mtx files. I would like to see the complete workflow, from the FASTQ files through to the final analysis with Scanpy. That would be great.

    • @tousifazmain6205
      @tousifazmain6205 1 year ago

      @jorge1869 Yes, I agree. @sanbomics

  • @RaS-h4r
    @RaS-h4r 2 months ago +1

    So how do we decide on n_neighbors=10 during clustering? Do you think I should change it if I am analyzing tissue cells?

  • @abdelrahmanabdelhadi4195
    @abdelrahmanabdelhadi4195 1 year ago +1

    Extremely helpful, thank you!!

  • @blackmatti86
    @blackmatti86 2 years ago +1

    Is there a way to display value counts, i.e. the number of cells, on the UMAP? I know you can display annotations on the UMAP by using legend_loc='on data', but I can't seem to find a way to display the number of cells next to or under the cluster names 🤔

    • @sanbomics
      @sanbomics  2 years ago +1

      Yup! I haven't done exactly this, but you can probably access the matplotlib text objects from the plot directly and update their text to include the number of cells in each cluster. Alternatively, and potentially easier if you are new to matplotlib, you can add text directly to the plot with matplotlib.pyplot.text() and adjust each label manually to your desired position. The former is more automated and elegant, but the latter works too if you only plan to do it a few times and don't mind the manual step.
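
      A simpler variant of the idea above, and not the approach described in the reply: build the counts into the cluster labels before plotting, so the existing on-data legend shows them. This is a minimal sketch; the helper name `labels_with_counts` and the toy label list are assumptions for illustration.

```python
from collections import Counter


def labels_with_counts(cluster_labels):
    """Map each cluster name to a label of the form 'name (n=count)'."""
    counts = Counter(cluster_labels)
    return {name: f"{name} (n={n})" for name, n in counts.items()}


# Toy labels standing in for a per-cell cluster column (hypothetical data).
toy = ["0", "0", "1", "2", "2", "2"]
mapping = labels_with_counts(toy)
relabeled = [mapping[c] for c in toy]
print(mapping)
```

      With an AnnData object whose cluster labels live in a column such as adata.obs['leiden'] (column name assumed), assigning adata.obs['leiden_n'] = adata.obs['leiden'].map(labels_with_counts(adata.obs['leiden'])) and then calling sc.pl.umap(adata, color='leiden_n', legend_loc='on data') should place the counts next to the cluster names.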

  • @bnb7462
    @bnb7462 1 year ago

    Thank you so much for the videos. What dataset are you using for this one? I am having a hard time finding the dataset used in this clip.

    • @sanbomics
      @sanbomics  1 year ago +1

      This was my own data. This video is getting pretty outdated. I would recommend my complete single-cell tutorial instead. That one also uses public data, which you can download to follow along.

    • @bnb7462
      @bnb7462 1 year ago

      @sanbomics Got it. I wanted to replicate your graph. Thanks though. Thanks to all your clips, I have learnt a lot. Happy Christmas!

  • @Phoenixpapagei
    @Phoenixpapagei 2 years ago

    Thank you so much! Your videos are really helpful to navigate the world of scRNA

    • @sanbomics
      @sanbomics  2 years ago

      I am glad the videos are helpful for you! Thanks!

  • @amiel954
    @amiel954 1 year ago

    Hello, to access the dataset do I have to run: tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
    or is it another file?

    • @sanbomics
      @sanbomics  1 year ago

      That looks like one that should work. What is the output after tar?

  • @daniel98carvalho
    @daniel98carvalho 1 year ago

    Do you know where I would be able to find scRNA-seq FASTQs for healthy human lung tissue other than Tabula Sapiens?

    • @sanbomics
      @sanbomics  1 year ago +1

      Most studies that do scRNA-seq will have a control condition, so you can look for disease datasets and just take the control samples. For example, there are a lot of COVID datasets. You can check out the one I use in my 1+ hour SC video.

    • @daniel98carvalho
      @daniel98carvalho 1 year ago

      @sanbomics Perfect. Thanks so much for the input, I really appreciate it!

  • @sledgedoon3656
    @sledgedoon3656 3 months ago

    Was this video based on the "Preprocessing and clustering 3k PBMCs (legacy workflow)" tutorial? I am not sure whether to use the legacy workflow or the newest one. I hope you can clear up my doubts, thanks a lot!

    • @sanbomics
      @sanbomics  1 month ago

      This video is very outdated at this point. I would check out some of my newer videos. Things change fast!

    • @sledgedoon3656
      @sledgedoon3656 1 month ago

      @sanbomics OK, thanks a lot!

  • @DipakKumar-de7gz
    @DipakKumar-de7gz 1 year ago

    Hey. I am getting an AttributeError in my code while finding the marker genes.
    I am at the last step, where I am making a data frame for the marker genes. The error occurs while finding the Zika index as per your code. The lines are:
    mark_i = np.where(adata.raw.var_names == 'FAM83D')[0][0]
    mark_i
    adata.raw.X.toarray()[:, mark_i]
    The AttributeError is raised in the last step because my adata.raw.X is not a sparse matrix. The dimensions are already there somehow. Can you tell me how to proceed?

    • @sanbomics
      @sanbomics  1 year ago

      Hi, sorry for the late reply. You can just remove the .toarray() part, because that is what converts the sparse matrix to a dense array.
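
      A minimal sketch of the fix above that works whether the matrix is sparse or dense. The helper name `column_as_dense` and the toy matrix are assumptions for illustration; adata.raw.X and mark_i come from the comment.

```python
import numpy as np
from scipy import sparse


def column_as_dense(X, col):
    """Return column `col` of X as a 1-D NumPy array, for sparse or dense X."""
    if sparse.issparse(X):
        # Slice first, then densify only the one column.
        return X[:, col].toarray().ravel()
    return np.asarray(X)[:, col].ravel()


# Toy matrix (hypothetical data) in both representations.
dense = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
as_sparse = sparse.csr_matrix(dense)
print(column_as_dense(dense, 1), column_as_dense(as_sparse, 1))
```

      In the commenter's code this would be called as column_as_dense(adata.raw.X, mark_i). Slicing before calling .toarray() also avoids densifying the whole matrix when it is sparse.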

  • @antoniogiuseppefaietalasar6849

    I am wondering if your videos can be applied to the more common bulk RNA databases such as TCGA

    • @sanbomics
      @sanbomics  1 year ago

      Which videos/parts in particular?

  • @SrinithiRanganathan
    @SrinithiRanganathan 1 year ago

    Great explanation

  • @ianik
    @ianik 2 years ago +1

    Quality content!

  • @DeeptiMittalArora
    @DeeptiMittalArora 4 months ago

    Which data did you use for the analysis? Could you please provide the link?

    • @sanbomics
      @sanbomics  1 month ago

      This was in-house data, but there are plenty of tutorial datasets easily available, like PBMC 3K. I wouldn't recommend following this tutorial anymore though, because it's outdated. I have newer ones.

  • @acorndaydreams3706
    @acorndaydreams3706 2 years ago +1

    Thank you for this video

  • @tenzintseten8928
    @tenzintseten8928 1 year ago

    Thanks a lot for the tutorial video. Just wondering whether there are any ways in Python or R to analyze bacterial genome sequences? I would highly appreciate any help.

    • @sanbomics
      @sanbomics  1 year ago

      There are many ways, but it depends on what you are trying to do.

  • @wholu8497
    @wholu8497 2 years ago

    Interesting to see that XIST shows downregulation in Zika group... 16:16

    • @sanbomics
      @sanbomics  2 years ago

      Nice catch! Very interesting... These data aren't from one of my ongoing projects or I would have liked to explore that more!

  • @vjsanchezarevalo
    @vjsanchezarevalo 2 years ago

    I tried to load the three files with sc and got an error: KeyError: 2. Any idea?

    • @sanbomics
      @sanbomics  2 years ago +1

      Can you put the line of code you used here? Is it 10x Cell Ranger output?

    • @vjsanchezarevalo
      @vjsanchezarevalo 2 years ago

      @sanbomics These are the files that I have:
      GSM3577886_late_KPC_barcodes.tsv.gz
      GSM3577886_late_KPC_features.tsv.gz
      GSM3577886_late_KPC_matrix.mtx.gz
      This is my code: adata = sc.read_10x_mtx('./', prefix='GSM3577886_late_KPC_', var_names='gene_symbols', cache=True)
      Thanks!

    • @vjsanchezarevalo
      @vjsanchezarevalo 2 years ago

      Any suggestions?

    • @sanbomics
      @sanbomics  2 years ago

      Hmm... it says I have 3 replies but I only see 2. Did you post the code in a comment? Can you open an issue on my GitHub instead?
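
      The thread never identifies the cause of the KeyError. One possibility, which is an assumption here and not something confirmed above, is that the features file has fewer than three tab-separated columns; a 10x v3 features file normally carries a gene ID, a gene symbol and a feature type. A minimal sketch for checking the column count, using only the standard library (the helper name and the demo file are hypothetical):

```python
import csv
import gzip
import os
import tempfile


def count_feature_columns(path):
    """Return the number of tab-separated columns in the first row of a gzipped TSV."""
    with gzip.open(path, "rt", newline="") as handle:
        first_row = next(csv.reader(handle, delimiter="\t"))
    return len(first_row)


# Demo on a throwaway two-column file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo_features.tsv.gz")
    with gzip.open(demo, "wt", newline="") as handle:
        handle.write("ENSG00000000001\tGENE1\n")
    n_cols = count_feature_columns(demo)
print(n_cols)
```

      Running count_feature_columns('GSM3577886_late_KPC_features.tsv.gz') on the commenter's file would show whether it has the usual three columns. If it has fewer, that may explain the error, though this remains a guess.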

  • @ohohjournal5828
    @ohohjournal5828 2 years ago

    thanks, great video!

  • @saxtoncruz6128
    @saxtoncruz6128 2 years ago

    Great vid, I'm new to the channel but you have helped me a ton already. Quick question: when I load my AnnData object from the tutorial file, I have:
    AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'
    instead of:
    AnnData object with n_obs × n_vars = 8845 × 36602
    var: 'gene_ids', 'feature_types'
    Any advice? Thanks in advance!

    • @sanbomics
      @sanbomics  2 years ago +2

      Is 8845 × 36602 what I had? That is fine, it just means your sample has 2700 cells and 32738 genes. Don't worry!

    • @saxtoncruz6128
      @saxtoncruz6128 2 years ago

      @sanbomics Thanks for the update! Love the channel. Could you give some advice on how to subcluster a particular cluster? I would like to analyze the differences within, say, cluster 0 itself, not against all the other clusters. Thanks for the help, you're awesome!

  • @musedmoments
    @musedmoments 2 years ago +1

    Please do a Scanpy + scVI tutorial!

    • @sanbomics
      @sanbomics  2 years ago +1

      I shall keep that in mind for a future video!

  • @hypergamer1078
    @hypergamer1078 1 year ago

    Very nice

  • @chrisdoan3210
    @chrisdoan3210 2 years ago

    Thank you for this video! Would you please make a video that compares scRNA-seq data between non-diseased and diseased samples using Scanpy? I would appreciate that!

    • @sanbomics
      @sanbomics  2 years ago

      Hi. In the most recent scRNA video (the long one) I have lethal COVID and healthy samples, although I don't do much comparison between the two.

    • @chrisdoan3210
      @chrisdoan3210 2 years ago

      @sanbomics All the videos you have made are super helpful and concise.

  • @jxyeee6525
    @jxyeee6525 2 years ago

    Would Scanpy be good for analyzing copy number variant data (presented in a similar matrix format: cell by gene)?

    • @sanbomics
      @sanbomics  2 years ago

      Hmm, what exactly are the data?

    • @jxyeee6525
      @jxyeee6525 2 years ago

      @sanbomics I don't understand why my comment disappears whenever I refresh the page, but the CNV data I am thinking about using is 10x Genomics' aggregated TNBC CNV dataset (on their website).