Discriminant gene analysis
I want to obtain the top 5 discriminant genes in both the positive and the negative direction after a feature selection process. Is this the proper way to obtain them?
```r
# New data.frame with genes that have passed both the Fold and rawp tests
true.genes <- subset(gene.info, Fold.Test & rawp.Test)
cat(sprintf("Total number of genes that pass both (rawp and Fold) tests: %d\n", nrow(true.genes))) # 5241
# Write these genes with their corresponding values to an output .csv file
write.table(true.genes, file="TrueGenes.csv", sep=",", col.names=NA, qmethod="double")
```
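The filtering step can be illustrated on a small toy table (the `toy` data frame and its values are made up). It shows that `subset()` needs no `== TRUE` because the combined test is already logical, that `subset()` drops rows where the test is `NA` (plain `[` indexing does not), and that `sprintf()` needs `\n`, with the backslash, to print a newline.

```r
# Toy gene table; the two logical columns mirror Fold.Test / rawp.Test
toy <- data.frame(
  Fold.Test = c(TRUE, TRUE, FALSE, NA),
  rawp.Test = c(TRUE, FALSE, TRUE, TRUE),
  row.names = c("g1", "g2", "g3", "g4")
)

# subset() keeps rows where the condition is TRUE and treats NA as FALSE
passed <- subset(toy, Fold.Test & rawp.Test)
cat(sprintf("Genes passing both tests: %d\n", nrow(passed)))  # Genes passing both tests: 1

# Bracket indexing with the same condition keeps an all-NA row for g4
bracket <- toy[toy$Fold.Test & toy$rawp.Test, ]
nrow(bracket)  # 2
```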
```r
# Order the genes by ascending raw p-value (most significant first)
# Note: dat.filtered is still in log2 scale
# Note: rawp.pass must be in the same row order as dat.filtered for this indexing to be valid
best.genes <- order(rawp.pass)
best.genes.df <- data.frame(index=best.genes, rawp=rawp.pass[best.genes])
top.genes.matrix <- dat.filtered[best.genes, ]
```
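Because `order()` returns positions, `dat.filtered[best.genes, ]` selects the intended genes only if `rawp.pass` lists them in the same order as the rows of `dat.filtered`. The sketch below uses made-up p-values and assumes the p-value vector can carry gene IDs as names; it shows how positional and name-based indexing disagree when the two objects are ordered differently.

```r
# Toy p-values named by (hypothetical) gene IDs
rawp <- c(geneA = 0.04, geneB = 0.001, geneC = 0.2, geneD = 0.0005)

# Toy expression matrix whose rows are in a different order than 'rawp'
mat <- matrix(1:8, nrow = 4,
              dimnames = list(c("geneC", "geneA", "geneD", "geneB"), c("s1", "s2")))

# order() gives positions in ascending p-value order (most significant first)
idx <- order(rawp)  # 4 2 1 3

# Positional indexing picks rows 4, 2, 1, 3 of 'mat', i.e. the wrong genes
by_position <- mat[idx, , drop = FALSE]
rownames(by_position)  # "geneB" "geneA" "geneC" "geneD"

# Indexing by name selects the genes that actually have the smallest p-values
by_name <- mat[names(rawp)[idx], , drop = FALSE]
rownames(by_name)      # "geneD" "geneB" "geneA" "geneC"
```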
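```r
# Feature selection via svmRFE, which uses the e1071 library
# svmRFE() itself is not part of e1071 and must be sourced before this step
library(e1071)
t.dat <- t(top.genes.matrix)
# svmRFE expects the class labels in the first column
svm.df <- data.frame(label, t.dat)
ranked.list <- svmRFE(svm.df, k=10, halve.above=100)
```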
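For readers who want to see what SVM recursive feature elimination does internally, here is a minimal sketch built on e1071's linear-kernel `svm()`. It is not the `svmRFE()` function called above: the helper name `svm_rfe_sketch` is hypothetical, it removes one feature per iteration, and it has no equivalent of `k` (averaging over resampled folds) or `halve.above` (removing half the features per round while many remain), which is how those arguments behave in the commonly circulated svmRFE script. The data are simulated.

```r
library(e1071)

# Hypothetical helper: basic SVM-RFE that removes one feature per iteration.
# x: samples x features matrix; y: factor with two classes.
# Returns feature (column) indices ordered from most to least important.
svm_rfe_sketch <- function(x, y) {
  x <- scale(x)                        # scale once up front
  surviving <- seq_len(ncol(x))
  ranked <- integer(0)
  while (length(surviving) > 0) {
    fit <- svm(x[, surviving, drop = FALSE], y, type = "C-classification",
               kernel = "linear", cost = 10, scale = FALSE)
    # For a two-class linear model, coefs holds alpha_i * y_i per support
    # vector, so the weight vector is t(coefs) %*% SV (1 x features)
    w <- t(fit$coefs) %*% fit$SV
    worst <- which.min(w^2)            # least useful feature in the current set
    ranked <- c(surviving[worst], ranked)  # features removed last end up first
    surviving <- surviving[-worst]
  }
  ranked
}

# Toy data: 40 samples, 5 genes, only g1 differs between the classes
set.seed(1)
y <- factor(rep(c("A", "B"), each = 20))
x <- matrix(rnorm(40 * 5), nrow = 40, dimnames = list(NULL, paste0("g", 1:5)))
x[y == "B", "g1"] <- x[y == "B", "g1"] + 3
ranking <- svm_rfe_sketch(x, y)
colnames(x)[ranking]  # g1 is expected near the top
```

Removing one feature per pass is simple but slow for thousands of genes, which is why practical implementations eliminate features in larger chunks while many remain.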
```r
# Write the rankings to an output .txt file so that they can be read in later if needed
output <- data.frame(RankedOrder = ranked.list)
write.table(output, file = "RankedList.txt")
# Reorder the genes by their svmRFE rank; row subsetting keeps the row names,
# so they do not need to be reassigned
top.ranked.genes <- top.genes.matrix[ranked.list, ]
```
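`RankedList.txt` stores only row positions, which are meaningful only relative to the row order of `top.genes.matrix`. One alternative, sketched here with toy values and hypothetical gene IDs, is to save the gene ID next to each rank so the file can be interpreted on its own when it is read back.

```r
# Toy ranked feature indices and the matching (hypothetical) gene IDs
gene.ids <- c("geneD", "geneB", "geneA", "geneC")
ranked.list <- c(3L, 1L, 4L, 2L)

# Save the rank, the original index and the gene ID together
out <- data.frame(Rank = seq_along(ranked.list),
                  Index = ranked.list,
                  Gene = gene.ids[ranked.list])
f <- tempfile(fileext = ".txt")
write.table(out, file = f, sep = "\t", quote = FALSE, row.names = FALSE)

# Reading the file back recovers the ranking without the original matrix
back <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
back$Gene  # "geneA" "geneD" "geneC" "geneB"
```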
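```r
# Create a new gene.info data.frame for the ranked genes
top.genes.info <- gene.info[rownames(top.ranked.genes), ]
# Note: tg is not used below, so this threshold does not filter top.genes.info
tg <- top.genes.info$pvalue[top.genes.info$pvalue < thresh]
# Keep only genes that are present in the annotation
top.genes.info <- top.genes.info[rownames(top.genes.info) %in% rownames(ann), ]
# Note: re-sorting by p-value discards the svmRFE ranking order
top.genes.info <- top.genes.info[order(top.genes.info$pvalue), ]
# head()/tail() have no na.omit argument, so drop NA p-values explicitly.
# After sorting by p-value, head() returns the 5 most significant genes and
# tail() the 5 least significant; neither separates up- from down-regulated genes.
top.genes.info <- top.genes.info[!is.na(top.genes.info$pvalue), ]
top5 <- head(top.genes.info, n=5L)
bottom5 <- tail(top.genes.info, n=5L)
```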
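If "positive and negative direction" means up- versus down-regulated genes, one option is to split the genes by the sign of the log2 fold change and take the most significant genes within each group. The sketch below assumes a log2 fold-change column named `logFC` (hypothetical; the columns of the question's `gene.info` are not shown), uses a hypothetical helper `top_by_direction`, and runs on toy data. Note that `head()` has no `na.omit` argument (an extra argument like that is swallowed by `...`), so NA rows are removed explicitly.

```r
# Hypothetical helper: top n genes per direction, ordered by p-value.
# Column names 'logFC' and 'pvalue' are assumptions.
top_by_direction <- function(info, n = 5L, fc_col = "logFC", p_col = "pvalue") {
  # Drop rows with a missing fold change or p-value
  info <- info[!is.na(info[[fc_col]]) & !is.na(info[[p_col]]), , drop = FALSE]
  up   <- info[info[[fc_col]] > 0, , drop = FALSE]
  down <- info[info[[fc_col]] < 0, , drop = FALSE]
  list(
    up   = head(up[order(up[[p_col]]), , drop = FALSE], n),
    down = head(down[order(down[[p_col]]), , drop = FALSE], n)
  )
}

# Toy gene table (values are made up)
info <- data.frame(
  logFC  = c( 2.1, -1.5,  0.8, -3.0,  1.2, -0.4,  2.5, -2.2,  0.3, -1.1,  1.9,  NA),
  pvalue = c(1e-6, 1e-5, 0.02, 1e-8, 1e-3, 0.30, 1e-7, 1e-4, 0.60, 0.01, 1e-2, 1e-9),
  row.names = paste0("gene", 1:12)
)
res <- top_by_direction(info, n = 3L)
rownames(res$up)    # "gene7" "gene1" "gene5"
rownames(res$down)  # "gene4" "gene2" "gene8"
```

To keep the svmRFE ordering instead of the p-value ordering, each direction group could be sorted by its svmRFE rank rather than by `pvalue`.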