Thank you so much for your tutorials!! You make complex things look easier which is very helpful (specially in analysing data)
Thanks!
Check out the parameter of vegan's function:
"Instead of drawing a plot, return a “tidy” data frame than can be used in ggplot2 graphics. The data frame has variables Site (factor), Sample and Species."
Thanks - yeah, I think that's new since I made the video.
Thank you so much for this detailed explanation! It was really useful for my own data.
Wonderful - I'm glad it was useful! Thanks for watching
@ 5:00 it is suggested that tibbles do not allow row names. I think this is incorrect and the following code is the tidy way to do it:
shared %>%
  pivot_wider(names_from = name, values_from = values, values_fill = 0) %>%
  column_to_rownames('Group')
Also, vegan has a great new feature that avoids all the data manipulation for getting tidy data: my_curves
That’s great to see!
This actually creates a data frame rather than a tibble. A tibble is a special kind of data frame.
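A minimal sketch of the point made in this exchange, assuming a long-format tibble with the columns named in the comment above (Group, name, values). The toy data is hypothetical and is not the shared file from the video. It suggests that the result of column_to_rownames() is a plain data frame with row names, not a tibble.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format counts; column names mirror the comment above
shared <- tibble(
  Group  = c("s1", "s1", "s2"),
  name   = c("otu1", "otu2", "otu1"),
  values = c(5, 3, 7)
)

wide <- shared %>%
  pivot_wider(names_from = name, values_from = values, values_fill = 0) %>%
  column_to_rownames("Group")

class(wide)      # inspect the class of the result
is_tibble(wide)  # check whether it is still a tibble
rownames(wide)   # the former Group column
```

A matrix-like object with row names is what vegan functions generally expect, which is why the conversion away from a tibble is useful here.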
Thank you so much for your unlimited help, Dr.
If you have a lecture on Hill numbers (q=0, q=1, q=2) for diversity analysis of vegetation, please share it.
My pleasure! Unfortunately, I don’t have anything about Hill numbers.
Thank you very much for all the material, very useful!
I have different library sizes in my microbiome data and I would like to normalize them using rarefaction to min_n_seqs, since the smallest sample contains more than 12000 reads, as you explained so well. If I use vegan::rrarefy, I obtain the specified subset of reads from my original OTU table. A single random subset might not be representative enough for each sample since there is high diversity. Is there a way you would recommend to carry out rarefaction over multiple iterations and produce a final OTU table in which the values are the average of the multiple subsets?
Thanks for your attention.
Thanks for watching! Running rrarefy a bunch of times and then averaging the counts is effectively the same as using the relative abundance, which I showed in an earlier episode causes problems. I would suggest running whatever test you're doing on a single subsampling and then repeating it a few times to see if the results change at all. In my experience, the low relative abundance taxa are what change the most, and for most OTU-based analyses they don't come up as significant. If they do, I generally discount them because they're so rare.
@Riffomonas I will try as you suggest, it makes a lot of sense. Again, thank you very much for your help! Your videos are awesome!
@Pat, I think there is a function to deal with either row names or column names in the tibble package.
Ah - you're right! Thanks :) tibble.tidyverse.org/reference/rownames.html
We rarely see this face @ 20:44! Pat thinking longer than a nanosecond about one of his 2198321673213 variables.
Lol. Plus I think it was the end of a long day at the end of a long week 😂🤓
Hi Sir, thanks for your videos. I rarefied my data using iNEXT. Now I am having a problem plotting the data. Can you show how to resize the plot, legend, etc. when using iNEXT?
If I want to loop the rrarefy command on my data matrix 1000 times and save all the output (to ultimately calculate an average), what code would I use? I've been trying to figure it out and am struggling!
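One possible way to do what this question asks, as a rough sketch only: the toy count matrix, the seed, and the names counts, draws, and mean_counts are hypothetical. Note the caveat raised elsewhere in this thread that averaging many rrarefy draws ends up very close to relative abundance scaled to the sampling depth.

```r
library(vegan)

set.seed(1)  # arbitrary seed so the draws are reproducible

# Hypothetical count matrix: rows are samples, columns are OTUs
counts <- matrix(
  c(50, 30, 20,  0,
    10, 60, 25,  5,
    40, 40, 40, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), paste0("otu", 1:4))
)

min_n_seqs <- min(rowSums(counts))  # size of the smallest sample
n_iter <- 1000

# Each call to rrarefy() returns one randomly subsampled matrix;
# simplify = "array" stacks them as samples x OTUs x iterations
draws <- replicate(n_iter,
                   rrarefy(counts, sample = min_n_seqs),
                   simplify = "array")

# Average every sample/OTU cell across the iterations
mean_counts <- apply(draws, c(1, 2), mean)
mean_counts
```

Keeping the full draws array, rather than only the average, also makes it possible to rerun a downstream test on individual subsamplings and compare the results.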
Using this code on my data, the shared file produced is empty, with only column names in it. What could be the problem?
OK, nice, but what do you use for bar plots, for example, when you need a single data frame? Do you use a single subsampling for things like that?
I don't usually use barplots 😂 I would take the average value for each sample and plot that as a jittered plot
If you rarefy to min_n_seqs, is there a risk of losing significance between two groups of samples in your downstream statistical analysis?
Hi Lisa, thanks for watching! I think there’s a trade-off. If you increase the min_n_seqs value you will have a better limit of detection but fewer samples. With fewer samples you’ll have less statistical power to detect differences. It might be worth running an analysis at multiple levels to see what happens.
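A small sketch of the trade-off described in this reply, using base R only. The per-sample read totals and the candidate depths are made-up numbers for illustration. It counts how many samples would survive at each candidate min_n_seqs threshold.

```r
# Hypothetical total reads per sample
n_seqs <- c(s1 = 1200, s2 = 5400, s3 = 12000, s4 = 15000, s5 = 30000)

# Candidate rarefaction depths to compare
depths <- c(1000, 5000, 12000)

# For each depth, count the samples with at least that many reads
retained <- sapply(depths, function(d) sum(n_seqs >= d))
names(retained) <- depths
retained
```

Running the downstream analysis at each of these depths would then show whether the conclusions depend on where the threshold is set.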
Thanks Pat. How would you plot the output of drarefy?
Thanks Pat. In QIIME2, the number of taxa reaches a plateau once the sampling depth increases to a certain level. Is there a similar approach to find that sampling depth when plotting rarefaction curves in vegan?
Sorry, I don't use QIIME and am not really familiar with why you see that. Perhaps because they're using closed reference clustering and it is saturating all of the available taxa in the reference?