Which values are used to generate PC1, PC2, and so on? Please use one example with gene expression data.
The calculation video is very clear.
Thanks
I can explain how PCA (Principal Component Analysis) works and demonstrate the calculation of the first principal component (PC1) using a simple example with gene expression data.
Step-by-Step PCA Calculation
Standardize the Data:
The first step in PCA is to standardize the data so that each gene expression level has a mean of 0 and a standard deviation of 1.
Covariance Matrix Calculation:
Compute the covariance matrix to understand how the genes vary with respect to each other.
Eigenvalues and Eigenvectors:
Calculate the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors represent the directions of maximum variance (principal components), and the eigenvalues represent the magnitude of these variances.
Principal Components:
Project the standardized data onto the eigenvectors to get the principal component scores.
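The four steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the method used by any particular tool: the function name `pca_scores` and the random toy matrix are hypothetical, and samples are assumed to be in rows with genes in columns.

```python
import numpy as np

def pca_scores(X):
    """PCA on X (samples in rows, genes in columns).

    Returns (scores, eigenvalues, eigenvectors), sorted by decreasing eigenvalue.
    """
    # Step 1: standardize each gene to mean 0 and sample standard deviation 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized genes
    C = np.cov(Z, rowvar=False)
    # Step 3: eigen-decomposition (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: project the standardized data onto the eigenvectors
    scores = Z @ eigvecs
    return scores, eigvals, eigvecs

# Hypothetical toy data: 6 samples x 4 genes
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(6, 4))
scores, eigvals, eigvecs = pca_scores(X_toy)
print(scores[:, 0])  # PC1 score of each sample
```

The first column of `scores` holds the PC1 value of each sample, the second column the PC2 value, and so on.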
Example with Gene Expression Data
Assume we have a simple dataset with gene expression levels for 3 genes (Gene1, Gene2, Gene3) across 3 samples.
Sample Gene1 Gene2 Gene3
S1 2.5 2.4 1.0
S2 0.5 0.7 2.0
S3 2.2 2.9 0.5
Step 1: Standardize the Data
First, calculate the mean and standard deviation for each gene.
Mean of Gene1 = (2.5 + 0.5 + 2.2) / 3 = 1.733
Mean of Gene2 = (2.4 + 0.7 + 2.9) / 3 = 2.0
Mean of Gene3 = (1.0 + 2.0 + 0.5) / 3 = 1.167
Standardize each gene expression value:
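For Gene1 (subtract the mean, then divide by the standard deviation, which is the square root of the variance):
$$\text{S1} = \frac{2.5 - 1.733}{\sqrt{\mathrm{Var}(\text{Gene1})}}, \qquad \text{S2} = \frac{0.5 - 1.733}{\sqrt{\mathrm{Var}(\text{Gene1})}}, \qquad \text{S3} = \frac{2.2 - 1.733}{\sqrt{\mathrm{Var}(\text{Gene1})}}$$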
The sample variances (dividing by n - 1 = 2) are calculated as follows; the standard deviation of each gene is the square root of its variance:
$$\mathrm{Var}(\text{Gene1}) = \frac{(2.5 - 1.733)^2 + (0.5 - 1.733)^2 + (2.2 - 1.733)^2}{3 - 1} \approx 1.163$$
$$\mathrm{Var}(\text{Gene2}) = \frac{(2.4 - 2.0)^2 + (0.7 - 2.0)^2 + (2.9 - 2.0)^2}{3 - 1} = 1.33$$
$$\mathrm{Var}(\text{Gene3}) = \frac{(1.0 - 1.167)^2 + (2.0 - 1.167)^2 + (0.5 - 1.167)^2}{3 - 1} \approx 0.583$$
Mean-centered data (the rest of this example assumes standard deviations of 1 for simplicity, so these centered values are used as the standardized data):
Sample Gene1 Gene2 Gene3
S1 0.767 0.4 -0.167
S2 -1.233 -1.3 0.833
S3 0.467 0.9 -0.667
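The centering in the table above, and the full standardization that the example skips, can be checked with a short NumPy sketch. The variable names are illustrative, and samples are placed in rows with genes in columns.

```python
import numpy as np

# Rows: samples S1..S3; columns: Gene1..Gene3 (values from the example table)
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 2.0],
              [2.2, 2.9, 0.5]])

means = X.mean(axis=0)        # per-gene mean
centered = X - means          # mean-centered values (the table above)
stds = X.std(axis=0, ddof=1)  # sample standard deviation (n - 1 denominator)
Z = centered / stds           # fully standardized values (z-scores)

print(np.round(means, 3))
print(np.round(centered, 3))
print(np.round(Z, 3))
```

Dividing by `stds` rescales each gene to unit variance, so `Z` differs from the centered table because the real standard deviations are not 1.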
Step 2: Covariance Matrix Calculation
Calculate the covariance matrix for the standardized data.
$$\mathrm{Cov} = \begin{pmatrix} \mathrm{Var}(\text{Gene1}) & \mathrm{Cov}(\text{Gene1}, \text{Gene2}) & \mathrm{Cov}(\text{Gene1}, \text{Gene3}) \\ \mathrm{Cov}(\text{Gene2}, \text{Gene1}) & \mathrm{Var}(\text{Gene2}) & \mathrm{Cov}(\text{Gene2}, \text{Gene3}) \\ \mathrm{Cov}(\text{Gene3}, \text{Gene1}) & \mathrm{Cov}(\text{Gene3}, \text{Gene2}) & \mathrm{Var}(\text{Gene3}) \end{pmatrix}$$
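As a rough sketch of this step, `np.cov` builds the matrix directly from the example values. Samples are in rows, so `rowvar=False` tells NumPy that each column is a gene. The variable names are illustrative.

```python
import numpy as np

# Rows: samples S1..S3; columns: Gene1..Gene3
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 2.0],
              [2.2, 2.9, 0.5]])

# Covariance of the raw values: the diagonal holds each gene's sample variance
cov_raw = np.cov(X, rowvar=False)

# Covariance of the standardized values: this equals the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_std = np.cov(Z, rowvar=False)

print(np.round(cov_raw, 3))
print(np.round(cov_std, 3))
```

The matrix is symmetric because Cov(Gene1, Gene2) and Cov(Gene2, Gene1) are the same quantity.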
Step 3: Eigenvalues and Eigenvectors
Compute the eigenvalues and eigenvectors of the covariance matrix.
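A minimal sketch of this step with `np.linalg.eigh`, applied to the example values after full standardization. The variable names are illustrative. Each eigenvalue divided by the sum of all eigenvalues gives the share of total variance explained by that component, which is the percentage shown on PCA axes and scree plots.

```python
import numpy as np

# Rows: samples S1..S3; columns: Gene1..Gene3
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 2.0],
              [2.2, 2.9, 0.5]])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(Z, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]  # reorder so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of the total variance explained by each principal component
explained = eigvals / eigvals.sum()

print(np.round(eigvals, 3))
print(np.round(explained, 3))
print(np.round(eigvecs[:, 0], 3))  # loadings of PC1 (sign is arbitrary)
```

With only 3 samples, the centered data span at most 2 dimensions, so the last eigenvalue is numerically zero and only PC1 and PC2 carry variance.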
Step 4: Principal Components
Project the standardized data onto the eigenvectors.
For simplicity, let's assume the eigenvectors (principal components) are:
$$\text{PC1} = \begin{pmatrix} 0.5 \\ 0.5 \\ -0.7 \end{pmatrix}$$
Calculation of PC1
Calculate the projection of each sample on PC1:
$$\text{PC1}(\text{S1}) = 0.5 \times 0.767 + 0.5 \times 0.4 - 0.7 \times (-0.167) = 0.3835 + 0.2 + 0.1169 = 0.7004$$
$$\text{PC1}(\text{S2}) = 0.5 \times (-1.233) + 0.5 \times (-1.3) - 0.7 \times 0.833 = -0.6165 - 0.65 - 0.5831 = -1.8496$$
$$\text{PC1}(\text{S3}) = 0.5 \times 0.467 + 0.5 \times 0.9 - 0.7 \times (-0.667) = 0.2335 + 0.45 + 0.4669 = 1.1504$$
So, the values of PC1 for the samples S1, S2, and S3 are approximately 0.7004, -1.8496, and 1.1504, respectively.
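The same projection can be written as a single matrix-vector product. In this sketch the loading vector (0.5, 0.5, -0.7) is the assumed one from the example, not an eigenvector computed from the data, and the input values are the rounded centered values from the table.

```python
import numpy as np

# Centered values from the example table (rows: S1..S3; columns: Gene1..Gene3)
centered = np.array([[ 0.767,  0.4, -0.167],
                     [-1.233, -1.3,  0.833],
                     [ 0.467,  0.9, -0.667]])

# Assumed PC1 loading vector from the example (one weight per gene)
pc1 = np.array([0.5, 0.5, -0.7])

# Each PC1 score is a weighted sum of that sample's gene values
pc1_scores = centered @ pc1
print(np.round(pc1_scores, 4))
```

Every gene contributes to every score through its weight, which is why a principal component is a linear combination of all genes rather than a single gene.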
Summary
The process involves standardizing the data, calculating the covariance matrix, finding eigenvalues and eigenvectors, and projecting the data onto the principal components. This projection yields the principal component scores (e.g., PC1 values) for each sample.
I have log fold change values, with genes as rows and samples as columns, along with other parameters such as fold change, adjusted p-value, etc. How can I analyze this type of data?
Amazing video
Glad you like it
Thanks for all those valuable data.
Glad it was helpful!
Great video, really helped out a lot. Please keep uploading more transcriptome analysis videos.
Glad if its helping
Thanks, keep uploading
Sure, Glad you like it
Nice lecture, keep it up
Glad you like it
Thanks a lot for good work
Glad you like it
Which analysis is best Correlation or PCA??
Both have different purpose and own importance
@@asifmolbio i have 13 different treatments with 19 variables
13 different treatments are actually different extract from different plants
Please upload more like this, please
Sure stay tuned
Dear Dr.ASIF, please make video tutorials on , how to do proteomic and metabolomics data analysis. Followed you on twitter, love your style of teaching
Sure dear mushtaq , thanks for following and your like , i will record a video on metabolome analysis soon
Thank you Dr. Asif for explaining about PCA. I have rna seq data which showing PC1 is >95% while PC2
Glad you like it. Results and treatments you made are good.
As salam alikum Dr. Asif, thank you for the explanation of the PCA.I have two questions-
1. I have three treatments in my RNA seq data (heat, ABA and Cold) , and I have raw data from the company, how can I do the PCA analysis ? Do you recommend any software? is it possible to do it using Shiny GO or iDEP?
(I followed your tutorial for shiny GO and presented the data to my professor yesterday, and he was happy to see the interesting results it revealed, Jazakillahi Khairan. )
2. In the video at 6:23, in the second scree plot, I see a descending order of PCA values from PC1 to PC10, like you explained it should be, however, I see the circular dots from the top of PC1 in a rising manner , what does that indicate?
Wslam, Glad if it’s helping scientific community.
1. Yes iDEP can be used for PCA for details please see videos related to iDEP.
2. The circular dots only show the trend of the data values; they are not of great value for PCA interpretation. The most important are PC1 and PC2: the higher these two are, the more of the variation in the data they capture, and the more of it can be attributed to the treatments applied in the study.
what does inversely related treatment mean according to the pca ?
It means that applying (or increasing) the treatment decreases the expression of that set of genes, and vice versa.
Thanks for your video, but your understanding of PCA is very different from what I learned, if pc1+pc2 is 47%, it means that the first two pc can only explain 47% of the total variation, not that treatment accounts for 47% of the variation. And how come the number of pc are related to the number of treatments? they are supposed to be created with linear combination of the expression of all genes?
Thanks for your message. My intended meaning was the same as what you said: in the given dataset (the example quoted), the first two PCs explain 47 percent of the variation for the given RNA-seq treatments.
Yes, each PC is a linear combination of the gene expression values, and the original variables are used to generate the axes.
Sir pca data input in software
Sure will upload soon
Thanks, Dr. I did PCA to see the genetic relatedness of breeds, and I found that PC1 is 21.6% and PC2 3.94%; I wonder about the interpretation.
PC1 explains 21.6% of the total variation in the data, while PC2 explains only 3.94%.
One other question, how could I consider PC3 in the analysis?
Your PC2 is already low (3.94%), so there is no need to consider PC3, as it would be even lower.
@@asifmolbio Thank you so much🙏
@@asifmolbio Is there any consideration that we could know the reason for such a lower contribution?
Aoa sir please share endnote latest version. I need it for my thesis... Reply to my email please
I have endnote 7 if you need
@@asifmolbio yes sir please share
R/Sir, is it compatible for apa7
You can try i will share by tomorrow
@@asifmolbio thank you so much sir. You are 👍👍👍👍👍 great