How to rarefy community data in R with vegan and the tidyverse (CC200)

  • Published 5 Sep 2024

COMMENTS • 29

  • @silviagonzalezcolino858 4 months ago +1

    Thank you so much for your tutorials!! You make complex things look easier, which is very helpful (especially in analysing data).

  • @deborahdupont-valcy496 2 years ago +1

    Thank you so much for this detailed explanation! It was really useful for my own data.

    • @Riffomonas 2 years ago

      Wonderful - I'm glad it was useful! Thanks for watching

  • @charleslehnen9636 1 year ago

    Check out the parameter of vegan's function:
    "Instead of drawing a plot, return a “tidy” data frame than can be used in ggplot2 graphics. The data frame has variables Site (factor), Sample and Species."

    • @Riffomonas 1 year ago +1

      Thanks - yeah, I think that's new since I made the video.
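
The description quoted above matches the `tidy` argument of vegan's `rarecurve()`. A minimal sketch of how it can feed ggplot2 directly, assuming a vegan release recent enough to have that argument and using the `BCI` example data bundled with vegan in place of the video's data:

```r
library(vegan)
library(ggplot2)

data(BCI)  # example community matrix shipped with vegan (assumption: stands in for your data)

# tidy = TRUE skips the base-graphics plot and returns a long data frame
# with the columns Site, Sample and Species
curves <- rarecurve(BCI[1:5, ], step = 20, tidy = TRUE)

# one rarefaction curve per site
ggplot(curves, aes(x = Sample, y = Species, group = Site, color = Site)) +
  geom_line()
```

The first five rows and `step = 20` are arbitrary choices to keep the example small.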

  • @sven9r 2 years ago +1

    We rarely see this face @20:44! Pat thinking longer than a nanosecond about one of his 2198321673213 variables.

    • @Riffomonas 2 years ago +1

      Lol. Plus I think it was the end of a long day at the end of a long week 😂🤓

  • @meseretmuche6984 2 years ago +1

    Thank you so much for your unlimited help.
    Dr., if you have a lecture on Hill numbers (q=0, q=1, q=2) for diversity analysis of vegetation, please share it with me.

    • @Riffomonas 2 years ago

      My pleasure! Unfortunately, I don't have anything about Hill numbers.

  • @oluwafemioyedele 2 years ago +1

    @Pat, I think there is a function in the tibble package to deal with either row names or column names.

    • @Riffomonas 2 years ago +1

      Ah - you're right! Thanks :) tibble.tidyverse.org/reference/rownames.html

  • @brantainman 2 years ago +1

    At 5:00 it is suggested that tibbles do not allow row names. I think this is incorrect, and the following code is the tidy way to do it:
    shared %>%
    pivot_wider(names_from = name, values_from = values, values_fill = 0) %>%
    column_to_rownames('Group')

    • @brantainman 2 years ago +1

      Also, vegan has a great new feature that avoids all the data manipulation for getting tidy data: my_curves

    • @Riffomonas 2 years ago

      That’s great to see!

    • @Riffomonas 2 years ago

      This actually creates a data frame rather than a tibble. A tibble is a special kind of data frame.
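
A small sketch of the distinction discussed in this thread, assuming a toy `shared` tibble (hypothetical data, not the video's file): `column_to_rownames()` from the tibble package moves a column into the row names and returns a plain data frame, which can then be converted to a matrix for vegan.

```r
library(tibble)

# hypothetical toy data standing in for the video's `shared` object
shared <- tibble(
  Group = c("sample_a", "sample_b"),
  Otu001 = c(3, 0),
  Otu002 = c(1, 5)
)

# move the Group column into the row names
shared_df <- column_to_rownames(shared, "Group")

is_tibble(shared_df)    # FALSE: the result is a base data frame
rownames(shared_df)     # "sample_a" "sample_b"

# convert to a numeric matrix with samples as rows, as used with vegan
shared_mat <- as.matrix(shared_df)
```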

  • @cristianvillenaalemany7972 2 years ago +3

    Thank you very much for all the material, very useful!
    I have different library sizes in my microbiome data and I would like to normalize them by rarefying to min_n_seqs, since the smallest sample contains more than 12000 reads, as you explained so well. If I use vegan::rrarefy, I obtain the specified subset of reads from my original OTU table. A single random subset might not be representative enough for each sample, since diversity is high. Is there a way you would recommend to carry out a multiple-iteration rarefaction and produce a final OTU table in which the values are the average of the multiple subsets?
    Thanks for your attention.

    • @Riffomonas 2 years ago +2

      Thanks for watching! Running rrarefy a bunch of times and then averaging the counts is effectively the same as using the relative abundance, which I showed in an earlier episode causes problems. I would suggest running whatever test you're doing on a single subsampling and then repeating it a few times to see if the results change at all. In my experience, the low relative abundance taxa are what change the most, and for most OTU-based analyses they don't come up as significant. If they do, I generally discount them because they're so rare.

    • @cristianvillenaalemany7972 2 years ago

      @Riffomonas I will try as you suggest, it makes a lot of sense. Again, thank you very much for your help! Your videos are awesome!

  • @bugslutt 1 year ago

    If I want to loop the rrarefy command on my data matrix 1000 times and save all the output (to ultimately calculate an average), what code would I use? I've been trying to figure it out and am struggling!
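
One possible way to do this, sketched under assumptions rather than taken from the video: `replicate()` can call vegan's `rrarefy()` repeatedly and stack the results into a three-dimensional array, which is then averaged across iterations. The `BCI` example data from vegan stands in for the data matrix, and `n_iter`, `subsamples`, `mean_counts` and `scaled_rel_abund` are illustrative names. As the reply earlier in this thread warns, the averaged counts approach the relative abundances scaled to the rarefaction depth; the last two lines measure how close they are.

```r
library(vegan)

data(BCI)            # example data from vegan, standing in for your matrix
set.seed(19760620)   # arbitrary seed so the subsampling is repeatable

min_n_seqs <- min(rowSums(BCI))  # depth of the smallest sample
n_iter <- 100                    # raise to 1000 for the case in the question

# replicate() stacks the rarefied matrices into a
# samples x taxa x iterations array
subsamples <- replicate(n_iter, rrarefy(BCI, sample = min_n_seqs),
                        simplify = "array")

# mean count for each sample/taxon pair across all iterations
mean_counts <- rowMeans(subsamples, dims = 2)

# relative abundances scaled to min_n_seqs, for comparison with the averages
scaled_rel_abund <- as.matrix(BCI) / rowSums(BCI) * min_n_seqs
max(abs(mean_counts - scaled_rel_abund))
```

Keeping every iteration in `subsamples` also makes it possible to compute a statistic on each subsample and average the statistic rather than the counts.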

  • @nendinosaurus 11 months ago

    OK, nice, but what do you use for bar plots, for example, when you need a single data frame? Do you use a single subsampling for things like that?

    • @Riffomonas 5 months ago

      I don't usually use barplots 😂 I would take the average value for each sample and plot that as a jittered plot.

  • @lisakelly4921 2 years ago +1

    If you rarefy to min_n_seqs, is there a risk of removing significance between two groups of samples when you do downstream statistical analysis?

    • @Riffomonas 2 years ago +1

      Hi Lisa, thanks for watching! I think there's a trade-off. If you increase the min_n_seqs value, you will have a better limit of detection but fewer samples. With fewer samples, you'll have less statistical power to detect differences. It might be worth running the analysis at multiple levels to see what happens.

  • @edwinimfumu3221 1 year ago

    Hi Sir, thanks for your videos. I rarefied my data using iNEXT. Now I am having problems plotting the data. Can you show how to resize the plot, legend, etc. when using iNEXT?

  • @aabidhussain2138 1 year ago

    Using this code on my data, the shared file produced is empty, with only column names in it. What could be the problem?

  • @user-sh7tz1dq2i 1 year ago

    Thanks Pat. In QIIME2, the number of taxa reaches a plateau once the sampling depth increases to a certain level. Is there a similar approach to find that sampling depth when plotting the rarefaction curves in vegan?

    • @Riffomonas 5 months ago

      Sorry, I don't use QIIME and am not really familiar with why you see that. Perhaps because they're using closed-reference clustering and it is saturating all of the available taxa in the reference?

  • @gimanibe 1 year ago

    Thanks Pat. How would you plot the output of drarefy?