df2genind adegenet - finding optimal number of clusters using iterative K means in R
- Published 5 Feb 2025
### Population Structure Analysis ###
#################################################################################
######## Find optimal number of clusters using iterative K means approach #######
#################################################################################
library(adegenet)
library(dplyr)
library(ggplot2)
#Read in the tab-delimited genotype data
setwd('C:/Users/falk/Google Drive/PhD/PhD Projects/Blue Steel/2017 Data - Growth Chamber/Genotypic Data stuff')
GWAS_GD = read.table("GWAS_GD.txt", sep = '\t',header = T)
GD = GWAS_GD[1:292,5:ncol(GWAS_GD)]
metadata = read.csv("C:/Users/falk/Google Drive/PhD/PhD Projects/Blue Steel/2017 Data - Growth Chamber/Randomizations Origin Data GWAS Names/Meta_data.csv")
#1. Make a genind object for further analysis (sep is the allele separator within each genotype string)
obj = df2genind(GD, ploidy=2,sep = '/t')
#2. Run k-means for K = 1 to 20 on 200 retained PCs; the BIC curve is plotted and K is chosen interactively
grp = find.clusters(obj, max.n.clust=20, n.pca=200, scale=FALSE)
#Rule of thumb: increase K until a further increase no longer appreciably improves the fit (i.e., until BIC stops decreasing noticeably)
#Number of accessions per group
table(grp$grp)
#Pair each genotype name with its assigned cluster (rows 1:292, matching GD)
grouping = data.frame(name = GWAS_GD$name[1:292], subpop = grp$grp)
#Write out the cluster assignment of each genotype
write.csv(grouping, "Population_Clustering_6groups.csv",row.names = F)
grouping = read.csv("Population_Clustering_6groups.csv")
metadata = read.csv("C:/Users/falk/Google Drive/PhD/PhD Projects/Blue Steel/2017 Data - Growth Chamber/Randomizations Origin Data GWAS Names/Meta_data.csv")
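The rule of thumb in the script's comment (increase K until BIC stops improving appreciably) can also be tried without the interactive prompt. The following is a minimal base-R sketch on simulated data, not the genotype data from the video. It assumes a BIC of the form n*log(WSS/n) + k*log(n), and the helper `pick_k_elbow` with its 10% threshold is a hypothetical choice made only for illustration.

```r
# Illustrative sketch on simulated data; none of this comes from the video's script.
set.seed(42)

# Simulate 3 well-separated groups of 50 "individuals" in 10 dimensions
n_per_group <- 50
n_dims <- 10
group_means <- c(0, 6, 12)
x <- do.call(rbind, lapply(group_means, function(m) {
  matrix(rnorm(n_per_group * n_dims, mean = m, sd = 1), ncol = n_dims)
}))
n <- nrow(x)

# Within-cluster sum of squares (WSS) for K = 1..max_k
max_k <- 8
wss <- numeric(max_k)
wss[1] <- sum(scale(x, center = TRUE, scale = FALSE)^2)  # K = 1: spread around the overall mean
for (k in 2:max_k) {
  wss[k] <- kmeans(x, centers = k, nstart = 50)$tot.withinss
}

# Assumed BIC form: n * log(WSS / n) + k * log(n)
bic <- n * log(wss / n) + (1:max_k) * log(n)

# Hypothetical helper: smallest K whose next step improves BIC by less than
# `frac` of the largest single-step improvement
pick_k_elbow <- function(bic, frac = 0.10) {
  drops <- -diff(bic)                  # drops[k] = BIC(k) - BIC(k + 1)
  which(drops < frac * max(drops))[1]  # first K after which the gain is small
}

best_k <- pick_k_elbow(bic)
print(data.frame(K = 1:max_k, BIC = round(bic, 1)))
best_k
```

On real data, the per-K statistics that `find.clusters` stores in `grp$Kstat` could in principle be passed to the same helper, but the threshold is arbitrary and looking at the BIC curve by eye remains the safer check.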
Do you have this code stored on GitHub?
You can find this code on my GitHub; try the link below:
github.com/mighster/Data_Visualization_Graphs/blob/master/Dendrogram_tutorial.R