Click here to download our eBook "The Book of Lipidomics" 👉 bit.ly/43fXLxM
Hi, I think at 31:27 (slide 58) the numbers for K and k are misplaced.
Hi, yes. Good spot! You are right, we mixed up the numbers for K and k on this slide; they should be the other way around. It should have read "Basketball Team K=8, k=7 (large)". Thank you for catching this mistake in the webinar slides.
Thank you for your video, it was very helpful! I have a question regarding your suggestion to use a t-test for small sample sizes (n ≤ 3). Could you explain why you recommend a t-test over a non-parametric test in these cases? If you could provide any references or further reading on this topic, it would be greatly appreciated.
Hello @JuneMongeLorenzo. We noted your question and have forwarded it to Mathias Gerl, Head of Data Analysis at Lipotype. Mathias is currently out of the office and will be back in a few days. We will post his answer here once he has returned. See you then!
Hello @JuneMongeLorenzo. Mathias is back and provided this answer to your question. Does this answer your question?
I’m glad you found the video helpful.
Non-parametric tests, such as the Mann-Whitney U test or the Wilcoxon signed-rank test, rely on rank-based methods to assess differences between groups. These tests typically require a larger sample size to achieve sufficient power and reliability because the number of possible rank permutations is limited with very small samples. Consequently, the results may not be meaningful or statistically significant when the sample size is extremely small.
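To make the "limited number of rank permutations" argument concrete, here is a minimal stdlib-only Python sketch (not code from the video) that computes the exact two-sided p-value of the Wilcoxon rank-sum / Mann-Whitney U test by enumerating every possible way of assigning the pooled ranks to the first group. The function name `exact_rank_sum_p` is hypothetical, and the sketch assumes there are no tied values. With three samples per group there are only 20 possible splits, so even perfectly separated groups cannot reach a p-value below 0.1.

```python
from itertools import combinations
from math import comb


def exact_rank_sum_p(x, y):
    """Exact two-sided p-value of the Wilcoxon rank-sum (Mann-Whitney U) test.

    Enumerates every possible assignment of the pooled ranks to group x.
    Assumes there are no tied values.
    """
    pooled = sorted(x + y)
    ranks = {value: i + 1 for i, value in enumerate(pooled)}  # rank 1 = smallest
    n_x, n_total = len(x), len(pooled)
    observed = sum(ranks[v] for v in x)
    expected = n_x * (n_total + 1) / 2  # mean rank sum of x under H0
    splits = list(combinations(range(1, n_total + 1), n_x))
    # Count splits whose rank sum is at least as far from H0 as the observed one
    extreme = sum(1 for s in splits
                  if abs(sum(s) - expected) >= abs(observed - expected))
    return extreme / len(splits)


# Even perfectly separated groups of three cannot reach p < 0.05:
print(comb(6, 3))                                               # 20 possible splits
print(exact_rank_sum_p([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))       # 0.1
```

With four samples per group the floor drops to 2/70 (about 0.029), which is why rank-based tests start to become usable only as replicates are added.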
The t-test, on the other hand, was developed specifically for small samples: it was introduced for situations where the sample size is limited and the population standard deviation is unknown and has to be estimated from the data.
The t-test can be used as a pragmatic approach in the case of very small sample sizes. However, it should be used with caution as the assumptions of the t-test (e.g., normality) cannot be verified with such small sample sizes. Additionally, it will only return significant results for large effect sizes between the samples.
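As a minimal illustration on hypothetical toy data (stdlib-only Python, not code from the video), the classic pooled-variance Student's t statistic can be computed by hand and compared with the two-sided 5% critical value for 4 degrees of freedom (about 2.776, taken from a standard t table). Two groups of three whose means differ by three standard deviations clear that bar, while a difference of one standard deviation does not, which is what "significant only for large effect sizes" looks like in practice. The names `student_t` and `T_CRIT_DF4` are illustrative; in real analyses `scipy.stats.ttest_ind` returns the statistic together with its p-value.

```python
import statistics
from math import sqrt

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (n1 + n2 - 2 for two groups of three), rounded from a standard t table.
T_CRIT_DF4 = 2.776


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance (assumes equal variances)."""
    n_x, n_y = len(x), len(y)
    pooled_var = ((n_x - 1) * statistics.variance(x)
                  + (n_y - 1) * statistics.variance(y)) / (n_x + n_y - 2)
    se = sqrt(pooled_var * (1 / n_x + 1 / n_y))  # standard error of the mean difference
    return (statistics.mean(x) - statistics.mean(y)) / se


large = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # means differ by 3 SD
small = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])  # means differ by 1 SD
print(round(large, 3), abs(large) > T_CRIT_DF4)      # -3.674 True
print(round(small, 3), abs(small) > T_CRIT_DF4)      # -1.225 False
```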
For further reading, I highly recommend “An Introduction to Medical Statistics” by Martin Bland. This book provides an excellent overview of statistical methods, including the use of the t-test in medical research, and discusses the assumptions and limitations of various statistical tests in greater detail. Relevant chapters include:
- Chapter 10: Comparing the Means of Small Samples
- Chapter 12: Methods Based on Rank Order
Bland, Martin. *An Introduction to Medical Statistics*. Fourth edition. Oxford Medical Publications. Oxford: Oxford University Press, 2015.
What statistical test should be applied to molar fractions (mol%), which represent proportions?
Hi Edoardo,
As mol% data are not normally distributed, we suggest using a non-parametric test, e.g. a Wilcoxon rank-sum test (unpaired) or a Wilcoxon signed-rank test (paired), depending on the experimental design.
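For the paired design, a minimal stdlib-only Python sketch shows how few outcomes the Wilcoxon signed-rank test has to work with at small n. It computes the exact two-sided p-value by enumerating all 2^n ways of giving the ranks a positive or negative sign. The function name `exact_signed_rank_p` and the toy values are hypothetical, and the sketch assumes there are no zero differences and no tied absolute differences. With three pairs the smallest achievable p-value is 0.25, and even five consistently positive pairs only reach 0.0625.

```python
from itertools import product


def exact_signed_rank_p(before, after):
    """Exact two-sided Wilcoxon signed-rank p-value by enumerating all sign assignments.

    Assumes no zero differences and no tied absolute differences.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Rank the absolute differences (rank 1 = smallest |difference|)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)  # W+ statistic
    expected = n * (n + 1) / 4                                # mean of W+ under H0
    extreme = 0
    # Under H0 every rank is equally likely to carry a positive or negative sign
    for signs in product((False, True), repeat=n):
        w_plus = sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        if abs(w_plus - expected) >= abs(observed - expected):
            extreme += 1
    return extreme / 2 ** n


print(exact_signed_rank_p([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 0.25 (3 pairs, all increased)
print(exact_signed_rank_p([1.0] * 5, [2.0, 3.0, 4.0, 5.0, 6.0]))  # 0.0625 (5 pairs)
```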
Hope this helps!
@@lipotype_global Thanks for the helpful reply; that is indeed what I was planning to use as well. However, power takes a big hit when the sample size is small, and I only have the mol% data. What transformation, normalization, and parametric test or regression family would you recommend for this kind of lipidomic data?
@@EdoardoMarcora
Hi Edoardo,
Rank-based tests can only produce significant p-values when you have enough replicates. Also, there is no simple A/B test based on the beta distribution, which is the distribution that can model mol% data. If you only have a limited number of replicates, your only real option is the t-test, even though the data may violate some of its assumptions. One option is to run a normality test such as the Shapiro-Wilk test first, which checks whether the normality assumption is violated. If it does not return a significant p-value, there are at least no serious reasons against the normality assumption, and it is then reasonable to apply a t-test. Log-transforming the data usually improves its normality.
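A minimal stdlib-only Python sketch of why logging helps: mol% values of a single lipid often spread multiplicatively, which produces a long right tail, and a log transform pulls that tail in. The values below are hypothetical, the function name `sample_skewness` is illustrative, and skewness is only a rough symmetry check, not a substitute for a formal normality test such as the Shapiro-Wilk test (available, for example, as `scipy.stats.shapiro`). The log transform requires strictly positive values, so any zero mol% entries would need separate handling.

```python
import math
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a perfectly symmetric sample)."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


# Hypothetical, right-skewed mol% values of one lipid species across five samples
mol_percent = [1.0, 2.0, 4.0, 8.0, 16.0]
log_values = [math.log2(v) for v in mol_percent]  # log2 needs values > 0

print(round(sample_skewness(mol_percent), 2))  # 0.89: long right tail on the raw scale
print(sample_skewness(log_values))             # 0.0: symmetric after the log transform
```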
Please reach out to us through our contact form if you would like to receive an offer for further lipidomics data analysis consultation:
www.lipotype.com/contact/
Thank you, Mathias, very informative. Could you describe how the circular chart (at 4:38 in the video), with so many lipids connected to their main lipid classes, was generated? Is there an R package that can do the same?
Hi KN!
The graph was done with the ggraph package:
ggraph.data-imaginist.com
You can find a similar graph in the 4th figure from the top on this page:
ggraph.data-imaginist.com/articles/Nodes.html
I also used it in this figure:
journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000443#pbio-3000443-g003
Does this help you? :)
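The reply does not show the code behind the chart, so the following is only a hedged starting point. A circular chart of this kind can be drawn from a simple hierarchy of edges running from a root node to each lipid class and from each class to its lipid species. The Python sketch below builds such a from/to edge table and writes it as CSV. The function names, the root label "lipidome", the example species, and the rule that the class is the text before the first space (as in "PC 34:1") are all assumptions. In R, such a table could be turned into a graph (for example with igraph or tidygraph) and drawn with ggraph's dendrogram layout with `circular = TRUE`; whether the chart in the video was built exactly this way is not stated.

```python
import csv
import io


def lipid_hierarchy_edges(species, root="lipidome"):
    """Build (from, to) edges root -> lipid class -> lipid species.

    Assumes the class is the token before the first space, e.g. 'PC 34:1' -> 'PC'.
    """
    edges = []
    seen_classes = set()
    for name in species:
        lipid_class = name.split(" ", 1)[0]
        if lipid_class not in seen_classes:  # add each class node only once
            seen_classes.add(lipid_class)
            edges.append((root, lipid_class))
        edges.append((lipid_class, name))
    return edges


def edges_to_csv(edges):
    """Serialise the edge list as a two-column CSV with a from,to header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["from", "to"])
    writer.writerows(edges)
    return buf.getvalue()


# Hypothetical lipid species in shorthand notation
species = ["PC 34:1", "PC 36:2", "PE 38:4", "TAG 52:2"]
edges = lipid_hierarchy_edges(species)
csv_text = edges_to_csv(edges)
print(csv_text)
```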
@@lipotype_global Many thanks, Mathias. Let me give it a try on my dataset.
Does it make sense to apply a model with the dependent and independent variables reversed? For example, fitting a regularized logistic regression model to predict group membership from lipid concentrations?
Dear researcher,
It of course depends on your use case, but it can make a lot of sense to build machine learning models that predict group membership from lipid concentrations. Applying such models to predict group membership from lipid concentrations is a widespread approach.
For example, in one of our studies, we utilized machine learning to estimate a risk score from lipid concentrations. This is a regression analysis, but it would work similarly for classification. You can access this study here: doi.org/10.1371/journal.pbio.3001561
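For the classification case asked about, here is a minimal stdlib-only Python sketch of L2-regularized (ridge) logistic regression fitted by batch gradient descent. It is not the model used in the study mentioned above. The function names, the toy data, and the settings `lam`, `lr` and `epochs` are hypothetical; the features are assumed to be already transformed and scaled lipid concentrations. In practice one would typically use a library implementation such as scikit-learn's `LogisticRegression` (which applies an L2 penalty by default) and judge performance on held-out data or with cross-validation rather than on the training set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Fit logistic regression minimising mean log-loss + (lam / 2) * ||w||^2.

    X: list of feature rows (e.g. scaled lipid concentrations), y: 0/1 group labels.
    The intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Averaged log-loss gradient plus the ridge penalty gradient lam * w
        for j in range(p):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of belonging to group 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical, centred and scaled concentrations of two lipids
X = [[-1.0, 0.1], [-0.8, -0.2], [-1.2, 0.1],   # group 0 (e.g. controls)
     [1.0, -0.1], [0.8, 0.2], [1.2, -0.1]]     # group 1 (e.g. cases)
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_l2(X, y, lam=0.1)
print([round(predict_proba(w, b, xi), 2) for xi in X])
```

The penalty strength `lam` trades fit against coefficient shrinkage; with many lipids and few samples it is usually chosen by cross-validation.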
We hope this helps!
Lauber, Chris, Mathias J. Gerl, Christian Klose, Filip Ottosson, Olle Melander, and Kai Simons. “Lipidomic Risk Scores Are Independent of Polygenic Risk Scores and Can Predict Incidence of Diabetes and Cardiovascular Disease in a Large Population Cohort.” PLOS Biology 20, no. 3 (March 3, 2022): e3001561