My data is a file of about 19000 genes from 100 patients. I tried to use these data to build a network with igraph.
First, I converted all gene names to ENTREZID, which kept around 14000 of the 19000 genes. Then I discarded all genes with zero variance, leaving around 9000 genes.
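For reference, the zero-variance filter can be sketched in base R as below. This is only an illustration on a toy matrix: the object names (`expr`, `gene_var`, `expr_filtered`) and the genes-in-rows layout are assumptions, not the actual data.

```r
# Toy expression matrix: genes in rows, patients in columns (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(5 * 6), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("patient", 1:6)))
expr["gene3", ] <- 2.5   # a constant gene, so its variance is zero

# Per-gene variance across patients.
gene_var <- apply(expr, 1, var)

# Keep genes with non-zero variance; genes whose variance is NA are dropped too.
keep <- !is.na(gene_var) & gene_var > 0
expr_filtered <- expr[keep, , drop = FALSE]

print(dim(expr_filtered))
```

Using `drop = FALSE` keeps the result a matrix even if only one gene survives the filter.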
It's a lot of code to include in detail, but the initial steps I performed are roughly the following:
1) M2 <- graph_from_data_frame(d = mydf[, c("Node_A", "Node_B", "weight")], vertices = sort(unique(unlist(mydf[, c("Node_A", "Node_B")]))))
2) G2 <- igraph::delete.vertices(M2, igraph::graph.strength(M2) == 0)
3) M2 <- induced_subgraph(G2, V(G2)[components(G2)$membership == which.max(components(G2)$csize)])
4) M2.subgraph <- mst(M2, algorithm = "prim")
5) M2.subgraph.communities <- cluster_louvain(as.undirected(M2.subgraph), weights = E(M2.subgraph)$weight)
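A quick way to see what each of these steps does to the graph is to run them on a small toy edge list and print the sizes along the way. The sketch below assumes an undirected graph and a made-up data frame `toy` with the same column names as `mydf`; it is not the real data, and the community counts it prints will differ from those on the full network.

```r
library(igraph)

# Hypothetical toy edge list with the same column names as mydf.
toy <- data.frame(
  Node_A = c("g1", "g1", "g2", "g3", "g4", "g6"),
  Node_B = c("g2", "g3", "g3", "g4", "g5", "g7"),
  weight = c(0.9, 0.8, 0.7, 0.2, 0.6, 0.5)
)

# Extra columns (here: weight) become edge attributes.
g <- graph_from_data_frame(toy, directed = FALSE)

# Largest connected component.
comp <- components(g)
lcc <- induced_subgraph(g, V(g)[comp$membership == which.max(comp$csize)])

# Minimum spanning tree: a tree on n vertices always has n - 1 edges.
tree <- mst(lcc, algorithm = "prim")

cat("vertices in LCC:", vcount(lcc), "\n")
cat("edges in LCC:   ", ecount(lcc), "\n")
cat("edges in MST:   ", ecount(tree), "\n")

# Louvain on the full component versus on the spanning tree.
set.seed(42)
comm_full <- cluster_louvain(lcc, weights = E(lcc)$weight)
comm_tree <- cluster_louvain(tree, weights = E(tree)$weight)
cat("communities (LCC):", length(comm_full), "\n")
cat("communities (MST):", length(comm_tree), "\n")
```

Two things are worth keeping in mind when reading the output. A spanning tree keeps only n - 1 edges, so community detection on it sees a much sparser graph than the original. Also, `mst()` minimises the total edge weight, so if `weight` is a similarity measure, the tree keeps the weakest links unless the weights are transformed first.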
After community detection I used Python's SelectKBest() function (scikit-learn) to correlate genes to traits. I then found the communities that contain the largest number of the most correlated genes, and in the top three communities I used Kleinberg's score to detect the top genes. I used these genes for GO and KEGG enrichment.
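The community-ranking and top-gene step can be sketched in R as follows. Everything here is illustrative: the toy graph, the membership vector `memb`, and the vector `selected_genes` (standing in for the genes returned by SelectKBest) are hypothetical, and the sketch assumes that "Kleinberg's score" means the hub score computed by igraph's `hub_score()`. Newer igraph releases deprecate that function in favour of `hits_scores()`, so a warning may appear.

```r
library(igraph)

# Hypothetical toy graph and community membership (named by vertex).
g <- graph_from_literal(a-b, a-c, a-d, b-c, d-e, e-f, e-g, f-g)
memb <- c(a = 1, b = 1, c = 1, d = 1, e = 2, f = 2, g = 2)

# Hypothetical genes selected upstream (e.g. by SelectKBest).
selected_genes <- c("a", "b", "e")

# Count selected genes per community and pick the community with the most hits.
hits_per_comm <- table(memb[selected_genes])
top_comm <- as.integer(names(hits_per_comm)[which.max(hits_per_comm)])

# Subgraph induced by the members of that community.
members <- names(memb)[memb == top_comm]
sub <- induced_subgraph(g, members)

# Kleinberg hub scores within the community (assumed interpretation).
hubs <- hub_score(sub)$vector
names(hubs) <- V(sub)$name

# Two highest-scoring genes in the community.
top_genes <- names(sort(hubs, decreasing = TRUE))[1:2]
print(top_genes)
```

For the top three communities, the same steps would be repeated over the three largest entries of `hits_per_comm`.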
But I notice that something is wrong: Louvain returns around 200 communities and walktrap more than 1500! Worst of all, KEGG and GO enrichment always return zero enriched terms, no matter which thresholds I use.
I haven't tried the WGCNA library yet, but I was wondering what I might have missed in the steps above.
Below is a link to a sample of my data for anyone who is interested:
drive.google.com/file/d/1MDwcu0Xk-A3uWW8MR_YLCvTKAy8uGFSr/view?usp=sharing
Thanks guys,
Kostas