I’m trying to make a plot from a data frame of over 300k rows in which the peaks and valleys are annotated with values from another column (Gene) rather than with the x and y values. How can I do that?
Data frame:
chr start end.x StoZ.x Hscore end.y StoZ.y Tier Gene
1 chr1 1 10000 0.0000000 0 NA NA <NA> <NA>
2 chr1 10001 20000 3.6488202 2 NA NA <NA> <NA>
3 chr1 20001 30000 -0.8475483 0 NA NA <NA> <NA>
4 chr1 30001 40000 2.1279359 2 NA NA <NA> <NA>
5 chr1 40001 50000 1.4119515 0 NA NA <NA> <NA>
................
................
................
256 chr1 2550001 2560000 -2.363378 1 2560000 -2.363378 T1 TNFRSF14
257 chr1 2560001 2570000 -0.796173 0 2570000 -0.796173 T1 TNFRSF14
................
................
................
305 chr1 3040001 3050000 0.0564608 0 NA NA NA NA
306 chr1 3050001 3060000 1.4822029 0 NA NA NA NA
307 chr1 3060001 3070000 1.7718186 0 3070000 1.7718186 T1 PRDM16
308 chr1 3070001 3080000 1.5650776 0 3080000 1.5650776 T1 PRDM16
................
................
................
What I have done so far:
p2 <- ggplot(BreastCanT1T2mer, aes(start, StoZ.x)) +
  geom_jitter(aes(color = Hscore), pch = 23, size = 0.2) +
  scale_colour_manual(
    values = c("gray70", "#525480", "#9F5370", "blue", "red"),
    labels = c("NEUT", "LOSS", "GAIN", "DEL", "AMP")
  ) +
  theme(legend.position = c(0.5, .06), legend.direction = "horizontal") +
  theme(legend.title = element_blank()) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  guides(colour = guide_legend(override.aes = list(size = 1.5))) +
  xlab("Chromosomes") + ylab("Stoffers-Z Score") +
  facet_wrap(
    ~factor(chr, c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8",
                   "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
                   "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", "chrY")),
    nrow = 1, strip.position = "bottom"
  ) +
  theme(panel.spacing.x = unit(0.1, "lines")) +
  theme(axis.text.x = element_blank()) +
  theme(legend.key = element_rect(fill = "transparent", colour = NA))
p2
p2 +
  stat_peaks(span = 7, size = 1, ignore_threshold = 0.95, color = "brown") +
  stat_peaks(geom = "text", size = 1.5, span = 7, ignore_threshold = 0.95, color = "red", angle = 90, hjust = -0.1, check_overlap = TRUE) +
  stat_valleys(span = 7, size = 1, ignore_threshold = 0.95, color = "skyblue") +
  stat_valleys(geom = "text", size = 1.5, span = 7, ignore_threshold = 0.95, color = "black", angle = 90, hjust = 1.1, check_overlap = TRUE)
My expectation is that each gene name appears only once, at the point where that gene's StoZ value is most extreme (highest or lowest), and only where the StoZ value is less than -2 or greater than +2.
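One way to get a single label per gene might be to build a small label table first and give that to the text layer, instead of labelling every row. The sketch below is only an assumption about how that could look, not a tested solution for the full data: the helper name pick_gene_labels and the toy values are made up for illustration, and only the column names chr, start, StoZ.x and Gene come from the data frame above.

```r
# Toy data standing in for BreastCanT1T2mer (values are illustrative only)
toy <- data.frame(
  chr    = "chr1",
  start  = c(1, 10001, 2550001, 2560001, 3060001, 3070001),
  StoZ.x = c(0.0, 3.65, -2.36, -0.80, 1.77, 1.57),
  Gene   = c(NA, NA, "TNFRSF14", "TNFRSF14", "PRDM16", "PRDM16")
)

# Hypothetical helper: keep one row per gene, at its most extreme |StoZ.x|,
# and only if that extreme lies beyond +/- cutoff.
pick_gene_labels <- function(df, cutoff = 2) {
  # drop rows without a gene name or without a score
  df <- df[!is.na(df$Gene) & !is.na(df$StoZ.x), , drop = FALSE]
  # sort by descending |StoZ.x| so the first row of each gene is its extreme
  df <- df[order(-abs(df$StoZ.x)), , drop = FALSE]
  # keep only the first (most extreme) row per gene
  df <- df[!duplicated(df$Gene), , drop = FALSE]
  # apply the +/- cutoff to the retained extremes
  df[abs(df$StoZ.x) > cutoff, , drop = FALSE]
}

labels_df <- pick_gene_labels(toy)
print(labels_df)
```

The result could then be passed to a text layer, for example p2 + geom_text(data = labels_df, aes(label = Gene), size = 2). That layer inherits the x and y mapping from p2, and the chr column is what places each label in the right facet.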
Improvement after @Elin’s suggestion:
p2+geom_text(aes(label=Gene))
Here, each gene name should appear only once, and only where StoZ.x is < -2 or > +2.
But when I added a condition on StoZ.x, nothing changed:
p2+geom_text(aes(label=ifelse((StoZ.x < 2), Gene, ifelse(StoZ.x > -2, Gene, ""))), size=2)
Then I tried:
p2 + geom_text(aes(label = ifelse(StoZ.x < -2, Gene, "")), size = 2) +
  geom_text(aes(label = ifelse(StoZ.x > 2, Gene, "")), size = 2)
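A likely reason the nested condition changed nothing: any value that fails StoZ.x < 2 is at least 2 and therefore passes StoZ.x > -2, so the empty-string branch is never reached and every row keeps its label. A minimal base-R sketch with made-up values (z and gene are illustrative stand-ins for StoZ.x and Gene) compares that condition with a single combined test:

```r
# Illustrative stand-ins for StoZ.x and Gene (not real data)
z    <- c(-3, 0, 3)
gene <- c("A", "B", "C")

# Nested condition as in the first attempt: every value satisfies
# either z < 2 or z > -2, so all rows keep their label.
original <- ifelse(z < 2, gene, ifelse(z > -2, gene, ""))

# Combined condition: label only values below -2 or above +2.
combined <- ifelse(z < -2 | z > 2, gene, "")

print(original)
print(combined)
```

In the plot, the combined form would be a single layer such as geom_text(aes(label = ifelse(StoZ.x < -2 | StoZ.x > 2, Gene, "")), size = 2). It still labels every qualifying row, so a gene spanning several bins would be repeated.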