CTCF: an R/bioconductor data package of human and mouse CTCF binding sites


Mikhail G Dozmorov et al. Bioinform Adv. 2022 Dec 16;2(1):vbac097. doi: 10.1093/bioadv/vbac097. eCollection 2022.

Abstract


Summary:

CTCF (CCCTC-binding factor) is an 11-zinc-finger DNA-binding protein that regulates much of the eukaryotic genome’s 3D structure and function. The diversity of CTCF binding motifs has led to a fragmented landscape of CTCF binding data. We collected position weight matrices of CTCF binding motifs and defined strand-oriented CTCF binding sites in the human and mouse genomes, including the recent Telomere-to-Telomere and mm39 assemblies. We included selected experimentally determined and predicted CTCF binding sites, such as CTCF-bound cis-regulatory elements from SCREEN ENCODE. We recommend filtering strategies for CTCF binding motifs and demonstrate that liftOver is a viable alternative for converting CTCF coordinates between assemblies. Our comprehensive data resource and usage recommendations can help harmonize genomic studies that use CTCF binding data and strengthen their reproducibility.
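The filtering recommendation can be illustrated with a minimal sketch. The caption of Fig. 1B marks a 1E−6 P-value cutoff for FIMO-detected motifs, so the example keeps only hits below that value. The column layout, the toy rows, the helper name `filter_hits`, and the choice of a strict `<` comparison are all assumptions made for illustration; this is not the package's own code.

```python
import csv
import io

# Toy FIMO-style hits (hypothetical values, assumed column layout).
FIMO_TSV = (
    "motif_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\n"
    "MA0139.1\tchr1\t1000\t1018\t+\t22.4\t3.2e-09\n"
    "MA0139.1\tchr1\t5000\t5018\t-\t12.1\t4.5e-06\n"
    "MA0139.1\tchr2\t2000\t2018\t+\t17.8\t8.0e-07\n"
    "MA0139.1\tchr2\t9000\t9018\t-\t9.3\t6.1e-05\n"
)


def filter_hits(tsv_text, max_p=1e-6):
    """Return the rows whose 'p-value' column is strictly below max_p."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row["p-value"]) < max_p]


kept = filter_hits(FIMO_TSV)
for row in kept:
    # Print strand-oriented sites in a BED-like order.
    print(row["sequence_name"], row["start"], row["stop"], row["strand"], row["p-value"])
```

Keeping the strand column in the output matters here because the abstract describes the binding sites as strand-oriented.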


Availability and implementation:

bioconductor.org/packages/CTCF. Companion website: dozmorovlab.github.io/CTCF/. Code to reproduce the analyses: github.com/dozmorovlab/CTCF.dev.


Supplementary information:

Supplementary data are available at Bioinformatics Advances online.

Figures


Fig. 1.




Properties of CTCF motifs detected by FIMO. (A) Jaccard overlaps among CTCF binding sites detected in the original and liftOver human genome assemblies. CTCF sites were detected using the JASPAR 2022 MA0139.1 PWM. The correlogram was clustered using Euclidean distance and Ward.D clustering. The white-to-red gradient indicates low-to-high Jaccard overlaps. Jaccard values are shown in the corresponding cells. (B) Density plot of the number of motifs depending on the FIMO P-value threshold. Dashed line: 1E−6 P-value cutoff. (C) The proportion of true/false positive CTCF binding motifs depending on the FIMO P-value threshold. ENCODE SCREEN data were used as ground truth.
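To make the Jaccard overlap in panel (A) concrete, here is a minimal sketch of one common definition for two sets of genomic intervals: shared base pairs divided by base pairs covered by either set. The half-open coordinates, the toy intervals, and the helper names are assumptions for illustration; the authors' exact computation may differ.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_bp(intervals):
    """Total length covered by a list of merged intervals."""
    return sum(end - start for start, end in intervals)


def intersect_bp(a, b):
    """Base pairs shared by two sorted, merged interval lists."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def jaccard(sites_a, sites_b):
    """Base-pair Jaccard index of two lists of (chrom, start, end) sites."""
    by_a, by_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in sites_a:
        by_a[chrom].append((start, end))
    for chrom, start, end in sites_b:
        by_b[chrom].append((start, end))
    inter = union = 0
    for chrom in set(by_a) | set(by_b):
        a, b = merge(by_a.get(chrom, [])), merge(by_b.get(chrom, []))
        shared = intersect_bp(a, b)
        inter += shared
        union += total_bp(a) + total_bp(b) - shared
    return inter / union if union else 0.0


# Hypothetical sites: one set called directly, one converted with liftOver.
native = [("chr1", 100, 119), ("chr1", 500, 519), ("chr2", 50, 69)]
lifted = [("chr1", 100, 119), ("chr1", 510, 529), ("chr2", 300, 319)]
score = jaccard(native, lifted)
print(round(score, 3))
```

A value of 1 means the two site sets cover exactly the same bases, and 0 means they share none, which is the range the white-to-red gradient in panel (A) encodes.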

