Interpreting TFBS enrichment from random genomic regions

Hi all,

To make a long story short, my lab has developed a computational method for analyzing WGBS data that outputs genomic regions of interest. These regions are defined by sequencing reads with mixed methylation states (i.e. consecutive CpGs with non-matching methylation). My boss wants me to use a computational tool to evaluate whether these regions contain transcription factor binding site (TFBS) motifs, the idea being to link these epigenetic states to a particular transcription factor or DNA-binding protein.

I’ve looked at the data produced by our lab’s method longer than anyone, even the program’s creator, and I’m convinced that the output is basically meaningless from a biological perspective (at least in non-cancerous tissue).

Regions generated by this approach are always 100bp in length, and usually number in the mid thousands across the genome. When I feed these regions into programs like HOMER, tools in the MEME Suite (e.g. SEA, MEME-ChIP, AME) or even oPOSSUM, I get a variety of enriched TFBS motifs within my regions. Some of these methods allow me to upload background sequences (chosen as equally sized regions with equivalent GC% and CpG density), and this makes no real difference to the number of enriched motifs I find.

This would be exciting, except for the fact that I get these kinds of results even if I choose an equal number of totally random genomic regions (with equivalent properties), essentially comparing one background set to another.
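To put a number on that control, here is a minimal sketch of how the comparison could be summarized. It assumes the same enrichment tool has been run once on the real regions and many times on random region sets against the same background, and that the number of motifs called significant was recorded for each run. The function name and all counts below are made up for illustration; they are not results from my data.

```python
def empirical_pvalue(observed, null_counts):
    """Fraction of random-vs-background runs that yield at least as many
    significant motifs as the real regions did.

    Adding 1 to numerator and denominator keeps the estimate above zero
    when only a finite number of random sets has been run.
    """
    at_least_as_many = sum(1 for c in null_counts if c >= observed)
    return (at_least_as_many + 1) / (len(null_counts) + 1)


# Hypothetical numbers, for illustration only.
observed = 14  # significant motifs reported for the real regions
null_counts = [12, 15, 9, 14, 17, 11, 13, 16, 10, 14,
               15, 12, 13, 18, 11, 14, 12, 16, 13, 15]  # 20 random region sets

p = empirical_pvalue(observed, null_counts)
print(f"empirical p-value: {p:.3f}")
```

If the real regions do not produce clearly more enriched motifs than most of the random sets, the enrichment tool's own p-values are not calibrated for this kind of input, and the motif list by itself says little about biology.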

I guess you can probably tell that I’m looking for help in winning an argument, but I don’t have anyone in my lab or circle of collaborators who can help me with this, and I’m pretty sure my time is being wasted in this pursuit.

So I’ll ask a simple question: can these TFBS motif-finding tools distinguish between biologically meaningful data (e.g. ChIP-seq peaks) and random genomic regions with equivalent properties? Is it inevitable that any set of, say, 5000 100bp regions with >= 2 CpGs will have some TFBS motifs enriched relative to a randomly chosen set of regions with equivalent properties?
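Part of the "inevitable" worry is plain multiple testing: a motif library contains hundreds of motifs, so some will pass a nominal p < 0.05 by chance alone. Below is a minimal sketch of that arithmetic, plus a Benjamini-Hochberg adjustment for a list of per-motif p-values. The library size of 800 and the p-values are assumptions for illustration, and several of the tools above already report adjusted values (q-values or E-values) that should be used instead of raw p-values where available.

```python
def expected_false_positives(n_motifs, alpha=0.05):
    """Expected number of motifs passing a nominal threshold when no motif
    is truly enriched and the p-values are well calibrated."""
    return n_motifs * alpha


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical library of 800 motifs tested at alpha = 0.05.
n_false = expected_false_positives(800, 0.05)

# Hypothetical raw p-values for four motifs.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
```

Note that correction only helps if the raw p-values are calibrated for the input in the first place, which is exactly what a random-versus-background control checks.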

Thanks for reading to the end, and thanks in advance for any input on this!
