
Classify samples by their distance to reference class centroids

Introduction

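The following code classifies samples by their distance to the centroid of each reference class: every sample is assigned to the class whose centroid is nearest. The example uses the PAM50 centroids and Euclidean distance (the default metric of R's dist()).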
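As a minimal sketch of the nearest-centroid rule on made-up numbers (the matrices toy_centroids and toy_samples are hypothetical and exist only for illustration), each sample gets the label of the centroid with the smallest Euclidean distance:

```r
# Hypothetical toy data: 2 genes (rows) x 2 reference classes (columns)
toy_centroids <- matrix(c(0, 0, 5, 5), nrow = 2,
                        dimnames = list(c("geneA", "geneB"), c("ClassX", "ClassY")))

# Hypothetical toy samples: the same 2 genes x 3 samples
toy_samples <- matrix(c(1, 0, 4, 6, 2, 2), nrow = 2,
                      dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# Euclidean distance from every sample to every centroid (result: samples x classes).
# toy_samples - ctr subtracts ctr from each column because R recycles column-wise.
toy_dist <- apply(toy_centroids, 2, function(ctr) {
  sqrt(colSums((toy_samples - ctr)^2))
})

# The nearest centroid wins
toy_pred <- colnames(toy_dist)[apply(toy_dist, 1, which.min)]
names(toy_pred) <- rownames(toy_dist)
toy_pred
# expected: s1 = "ClassX", s2 = "ClassY", s3 = "ClassX"
```

Sample s1 = (1, 0) lies at distance 1 from ClassX but sqrt(41) from ClassY, so it is labelled ClassX; the real code applies the same rule gene by gene over the shared PAM50 genes.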

Code example

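library(data.table)

# Reference centroids: one row per gene, one column per reference class
centroids <- read.table("data/other/GSE10886/pam50_centroids.txt")
gene_name <- fread("data/other/GSE10886/pam_gene_name.txt", header = TRUE)
# Replace the pcrID row names with gene names (GeneName), keeping the row order of centroids
rownames(centroids) <- gene_name[data.table(pcrID = rownames(centroids)), GeneName, on = "pcrID"]

# `mt` is your expression matrix: genes as row names, samples as columns,
# assumed to be on a scale comparable to the centroids.
# Keep only the genes present in both, in the same order
inter_gene <- intersect(rownames(centroids), rownames(mt))
dat <- as.matrix(mt[inter_gene, , drop = FALSE])
centroids <- centroids[inter_gene, , drop = FALSE]

# Euclidean distance from each centroid to every sample.
# Row 1 of t(cbind(i, dat)) is the centroid, so the first ncol(dat)
# values returned by dist() are the centroid-to-sample distances.
d1 <- apply(centroids, 2, function(i) {
  dist(t(cbind(i, dat)))[1:ncol(dat)]
})
rownames(d1) <- colnames(dat)

# Assign each sample to the class with the smallest distance
p1 <- apply(d1, 1, function(x) {
  colnames(d1)[which.min(x)]
})

p1  # Predicted class for each sample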
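The distance step relies on how dist() stores its result: the lower triangle is returned column by column, so for the rows of t(cbind(i, dat)) the first ncol(dat) entries are the distances from row 1 (the centroid) to each sample. The sketch below checks this on random, hypothetical data (toy_dat and toy_ctr are illustrative stand-ins, named so they do not overwrite dat) and compares it with a direct Euclidean computation that skips the sample-to-sample distances dist() also computes.

```r
set.seed(1)
# Hypothetical stand-ins: 50 genes x 4 samples, and one centroid vector
toy_dat <- matrix(rnorm(200), nrow = 50)
toy_ctr <- rnorm(50)

# Approach used in the code above: first ncol() entries of the dist() vector
via_dist <- dist(t(cbind(toy_ctr, toy_dat)))[1:ncol(toy_dat)]

# Direct Euclidean distance from the centroid to each sample column
direct <- sqrt(colSums((toy_dat - toy_ctr)^2))

stopifnot(isTRUE(all.equal(via_dist, direct)))
```

If the number of samples is large, the same substitution in the classification code would read d1 <- apply(centroids, 2, function(i) sqrt(colSums((dat - i)^2))), which avoids building the full sample-by-sample distance matrix for every class.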

