Topologically associating domain boundaries are required for normal genome function

ENCODE ChIP-seq data analysis and prioritization of TAD boundary deletion loci

Previously published chromosome conformation capture data was used for combined analyses and selection of TAD boundary deletion loci in this study, wherein TAD boundary calls were based on the maximum enrichment of CTCF at the TAD borders and their consistency across cell-types10. Over 3300 genome-wide annotated TAD boundaries5,10 were then ranked on a weighted score (see Supplementary Fig. 1), encompassing strength of insulation based on CCCTC-binding factor (CTCF) occupancy, co-occupancy of subunits of the cohesin complex and the transcription factor Znf14331 and CTCF-binding conservation at orthologous regions in four different mammalian species32 (Fig. 1, Supplementary Fig. 1). We analyzed ~100 individual ChIP-seq datasets to this effect (listed in Supplementary Data 1). CTCF-bound sites genome-wide were spatially clustered and individually scored based on the criteria highlighted above within regions 40 kb upstream and downstream of each TAD boundary. An overall score for each boundary was devised based on the intervening bound CTCFs, excluding boundaries overlapping protein-coding genes, thus enabling unambiguous interpretation of the functional necessity of TAD boundaries in mammalian development. Furthermore, boundaries where flanking TADs harbored genes encoding transcription factors important for development and preferentially (to the extent possible) showing a divergent pattern of tissue-specific expression were prioritized for in vivo deletion. Our selection criteria did not factor in the directionality of CTCF motifs when selecting TAD boundary loci for deletion (Supplementary Fig. 1).

Experimental design of mouse studies

All animal work was reviewed and approved by the Lawrence Berkeley National Laboratory Animal Welfare Committee. Mice were monitored daily for food and water intake, and animals were inspected weekly by the Chair of the Animal Welfare and Research Committee and the head of the animal facility in consultation with the veterinary staff. The LBNL ACF is accredited by the American Association for the Accreditation of Laboratory Animal Care (AAALAC). TAD boundary knockouts were performed in Mus musculus FVB strain mice with an exception for TAD boundary B1 where heterozygous-null mice in a mixed strain background (Cast/Eij and FVB) were necessary to be generated specifically for conducting the Hi-C assays. Mice across developmental stages from embryonic day 10.5 through P0, as well as mice between weeks 4–16 were used in this study. Animals of both sexes were used in the analysis. Sample size selection and randomization strategies are included in individual method sections. Unless otherwise stated, all phenotyped mice described in the paper resulted from crossing heterozygous TAD boundary deletion mice together to allow for the comparison of matched littermates of different genotypes. Samples were dissected and processed blind to genotype where applicable.

Generation of TAD boundary deletion mice, and specific regulatory sequence deletions for TAD boundary B2

Transgenic mice were generated using the Mus musculus FVB strain and a standard CRISPR/Cas9 microinjection protocol50. Briefly, Cas9 protein (Integrated DNA Technologies catalog no. 1081058) at a final concentration of 20 ng/μl was mixed with sgRNA targeting the intended locus (50 ng/μl, for all sgRNAs combined), in microinjection buffer (10 mM Tris, pH 7.5; 0.1 mM EDTA). The mix was injected into the pronuclei of single cell stage fertilized FVB embryos obtained from the oviducts of super-ovulated 7–8 weeks old FVB females mated to FVB males (See Supplementary Data 8 for sgRNA sequences). The injected embryos were then cultured in M16 medium supplemented with amino acids at 37 °C under 5% CO2 for ~2 h. The embryos were subsequently transferred into the uteri of pseudo-pregnant CD-1 surrogate mothers. Founder (F0) mice were genotyped using PCR with High Fidelity Platinum Taq Polymerase (Thermo Fisher) to identify those with the desired NHEJ-generated deletion breakpoints. Sanger sequencing was used to identify and confirm deletion breakpoints in F0 and F1 mice (Supplementary Fig. 11, Supplementary Data 8 for CRISPR sgRNA templates and Supplementary Data 9 for primer sequences and PCR amplicons). Between one and four F0 founders were obtained for each of the TAD boundary deletion loci, each of which were simultaneously assayed for possible inversions by PCR. Only those F0 founders that harbored clean deletion alleles were backcrossed to wild-type mice and bred to procure F1 heterozygous mice. Given that each of the deletions across founders were consistent in the NHEJ-mediated deletion span, only one founder line for each locus was eventually selected to expand breeding for experiments in this paper. Additional confirmation and visualization of the deleted TAD boundaries is evident in Hi-C contact matrices resulting from Hi-C experiments on tissue from homozygous TAD boundary mutants compared to wild-type mice (Supplementary Fig. 11).

Generation of deletion mice for selected regulatory sequences within TAD boundary B2 encompassed individual deletions of a lung enhancer (enhancer Δ), an adjacent CTCF site (CTCF Δ) and deletion of the enhancer along with the CTCF site (enhancer+CTCF Δ) as control. These mice were generated identical to the methods described above and details are provided accordingly (Supplementary Fig. 12, Supplementary Data 8 for CRISPR sgRNA templates and Supplementary Data 9 for primer sequences and PCR amplicons).

In addition, mice heterozygous for TAD boundary B1 were used for conducting Hi-C assays for this TAD boundary knockout line. However, to circumvent the problem of distinguishing the alleles bioinformatically, mice in a mixed stain background were necessary to assess the changes in Hi-C contacts upon boundary deletion. To this end, we obtained wild-type Cast/Eij mice from the Jackson Laboratory and crossed these with our existing FVB mice that were heterozygous for the TAD boundary deletion B1. The heterozygous mice resulting from these crosses resulted in TAD boundary B1 heterozygous animals that harbored Cast/Eij background for the wild-type allele and concomitant FVB background for the deletion allele.

The described mouse lines are made available through the Mutant Mouse Resource and Research Center, www.mmrrc.org, and can be found in the MMRRC catalog using the regulatory region symbol, or the Research Resource Identifiers (RRID) (Supplementary Data 10).

Assessment of Mendelian segregation and viability

Sample sizes were selected empirically based on our previous studies44,51. Mendelian segregation was initially assessed postnatally on animals resulting from heterozygous crosses, thus allowing for comparison of matched littermates of different genotypes. Where applicable, Mendelian ratios were assessed in embryological time points as necessitated by the phenotype on a case-by-case basis. For TAD boundary B1, although we have rarely obtained homozygous-null mutants at E13.5, no viable homozygous-null embryos were observed by E10.5 in a systematic assessment of Mendelian segregation in  213 embryonic samples harvested between embryonic days 8.5-15.5 for this knockout line (Supplementary Data 4).

In situ Hi-C library generation

Hi-C experiments were performed on ex vivo liver tissue from male mice at post-natal day 56. Upon euthanasia, liver samples were harvested, flash frozen in liquid nitrogen and pulverized before 1% formaldehyde cross-linking for 15 min. Thawed crosslinked tissue was dissociated by a gentleMACS Tissue Dissociator using the factory-set program and filtered through a 40 µm BD-cellstrainer. Cell pellets were centrifuged at 1000 × g for 6 min at 4 °C and overlaid with 3 ml 1 M sucrose. The suspension was centrifuged at 2500 × g for 6 min at 4 °C. Pelleted nuclei were resuspended in 50 μL 0.5% SDS and incubated for 10 min at 62 °C. SDS was quenched by adding Triton X-100 and incubation for 15 min at 37 °C. Chromatin was digested using MboI (100U; NEB) at 37 °C overnight with shaking (1000 rpm). The enzyme was inactivated by heating 20 min at 62 °C. Fragmented ends were labeled with biotin-14-dATP (Life Technologies) using Klenow DNA polymerase (0.8 U μl−1; NEB) for 60 min at 37 °C with rotation (900 rpm). Ends were subsequently ligated for 4 h at room temperature using T4 DNA Ligase (4000 units; NEB). Reverse crosslinking was performed using Proteinase K (1 mg, NEB) and incubation at 55 °C overnight. The digestion efficiency and ligation efficiency were checked by gel electrophoresis. Next, DNA was purified by using ethanol precipitation and sheared using a Covaris Focused-ultrasonicator (M220; duty cycle: 10%; Power: 50, Cycles/burst: 200, Time: 70 s). After size selection and purification using SPRI beads (Beckman Coulter), DNA was biotin pulled-down using Dynabeads MyOne Streptavidin T1 beads (Life Technologies). Finally, sequencing libraries were prepared on T1 magnetic beads, and final PCR amplification was performed for seven cycles based on qPCR analysis. Bead-purified libraries were quantified with a Qubit and then diluted for size distribution assessment using High Sensitivity D1000 ScreenTape on a TapeStation (Agilent).

Hi-C was performed on two biological replicates each for both homozygous-null and control samples for each of the TAD boundary deletion loci B2–8. For TAD boundary B1, Hi-C was performed on two biological replicates that were heterozygous-null for the boundary deletion and one wild-type sample, and all these samples were generated from mixed-strain (Cast/FVB) background mice for the following reasons (i) the early embryonic lethality of homozygous mutants for this mouse line, (ii) the requirement of large amount of tissue for performing Hi-C experiments, (iii) limitations of using heterozygous mutants in isogenic FVB background for Hi-C experiments, as these would make allele-specific downstream analyses problematic. Details on generating these mice are described in “Methods”.

Hi-C data analysis

The Hi-C data processing pipeline is available at github.com/ren-lab/hic-pipeline. Briefly, with respect to TAD boundary loci B2–8, Hi-C reads were aligned to the mouse mm10 reference genome using BWA-MEM52 for each read separately, and then paired. For TAD boundary locus B1, where tissue from animals in a mixed strain background were used, we constructed both Cast and FVB genome sequences using the SNP information, then used BWA-MEM46 to align the raw reads to both Cast and FVB genome sequence; next, for each read, we compared the two mapping qualities and the mapped length, we chose the mapping results with higher scores. Beyond this analyses step, downstream analyses for all TAD boundary loci followed the same standards. For chimeric reads, only 5′ end-mapped locations were kept. Duplicated read pairs mapped to the same location were removed to leave only one unique read pair (MarkDuplicates in Picard package). The output bam files were transformed into juicer file format for visualization in Juicebox53,54. Contact matrices were presented as observed/expected contacts at 25-kb resolution and normalized using the Knight–Ruiz matrix balancing method55. Directionality index for each sample was also generated at 25-kb resolution5. Insulation score for each sample was generated at 25-kb resolution with 500-kb square56. For data in Fig. 3, Directionality Index (DI) scores of five bins on the right and five bins on the left were averaged, prior to calculating the difference. A higher DI delta score indicates a stronger boundary. A Wilcoxon rank-sum test was performed between KO and WT samples using DI delta scores, and a p-value ≤ 0.05 considered significant. As a negative control, the same statistical test was performed on ~2900 TAD boundaries that do not overlap with deletions, and differences between WT and KO samples were assessed by the same statistical test, using a significance threshold of p ≤ 0.05.

We did not observe any other boundaries genome-wide that showed a significant difference in DI delta score that exceeded that of deleted boundaries B2, B3, B4, B6, and B8, for which we had observed significant changes in DI delta score. Twelve boundaries (0.4%) genome-wide showed changes that were significant (p ≤ 0.05) and were quantitatively the same or exceeded those observed at deleted TAD boundaries B5 and B7, for which we did not observe significant changes in DI delta score (p > 0.05; Fig. 3 and Supplementary Fig. 4. Interaction frequencies between genomic loci were additionally visualized on Juicebox (Supplementary Fig. 4)53,54.

RNA-seq and quantitative real time PCR

A panel of tissues including forebrain, midbrain, hindbrain, face, heart, upper and lower limbs and neural tube was collected in a standardized manner at E11.5 from homozygous mutants as well as littermate wild-type embryos for each of the TAD boundary deletion loci34. Samples were suspended in 100 μl of commercially available (Qiagen) RLT buffer. Total RNAs were isolated by using the Qiagen RNeasy Mini Kit (catalog no. 74104). A set of two relevant tissue types was further selected for each of the TAD Boundary deletion loci B2-B8 and processed for RNA-seq libraries in a standardized manner. Sequencing libraries were prepared by Novogene, and sequenced on an Illumina NovaSeq6000 (150 bp, paired-end). RNA-seq data was analyzed using the ENCODE Uniform Processing Pipelines (www.encodeproject.org/pipelines/) implemented at DNAnexus (www.dnanexus.com). Using the ENCODE RNA-seq (Long) Pipeline – 1 (single-end) replicate pipeline (code available from github.com/ENCODE-DCC/rna-seq-pipeline), reads were mapped to the mouse genome (mm10) using STAR align (V2.12). Genome-wide coverage plots were generated using bam to signals (v2.2.1). Gene expression counts were generated for gencode M4 gene annotations using RSEM (v1.4.1). Differential expression analyses were performed by using the DESeq program in the R Statistical Package bioconductor.org/packages/3.3/bioc/vignettes/DESeq/inst/doc/DESeq.pdf57,58. Statistically significant differentially expressed genes for relevant tissues for each TAD boundary deletion locus are listed in Supplementary Fig. 5, and Supplementary Data 5. RNA-seq experiments were performed on two biological replicates each for homozygous mutants, as well as wild-type controls.

Quantitative PCR analysis of key developmental genes in the vicinity of each TAD boundary in a larger panel of E11.5 tissues did not identify any additional significant changes in gene expression (Supplementary Fig. 6, and Supplementary Data 6). For the comprehensive panel of tissues collected, RNA was isolated as described above and cDNA was synthesized using Omniscript RT (Qiagen catalog no. 205111) per standard methods. qPCR assays were performed for at least two genes, each in TADs immediately flanking the deleted boundary. Taqman Assay reagents (Life Technologies) were used for all targets including genes that were used to normalize expression levels. Taqman assays (Roche Applied Science) with gene-specific primer sequences were generated using the manufacturer’s online algorithm and are listed in Supplementary Data 11. All amplicons span exon-exon junctions to prevent amplification of genomic DNA. 30 μl assays dispensed in TaqMan Universal PCR 2X master mix (Applied Biosystems) were performed on LightCycler 480 (Roche) according to manufacturer’s instructions. All Ct values were manually checked. Relative gene expression levels were calculated using the 2−ΔΔCT method59 normalized to the Actb housekeeping gene, and the mean of wild-type control samples was set to 1. At least three independent mutant samples and littermate and stage matched controls were assessed for each genotype/condition. We did not observe significant expression changes near deleted boundaries B3, B5 and B7. Considering RNA-seq analysis was performed on bulk tissue from a single developmental timepoint, we cannot exclude that expression changes are restricted to subsets of cells present in these tissues or may be more pronounced at other developmental timepoints.

In situ hybridization assay

To assess the expression of Tbx5 in developing lungs for control and TAD-boundary deletion mutants for locus B2, whole-mount in situ hybridization using digoxigenin-labeled anti-sense RNA probes was carried out on E11.5 (48 somites-stage) mouse embryos following established protocols60,61. Samples were treated with Proteinase K for 15 min; embryos were dissected to remove the hearts to expose the lungs and subsequently imaged with a Leica 125 C stereomicroscope with a Flexacam C1 camera (Supplementary Fig. 7).

Comprehensive mouse phenotyping

Directly relevant results are summarized in Fig. 4. Mutants and wild-type controls for four out of eight TAD boundary deletion loci (B3, B5, B6 and B7) underwent comprehensive phenotyping using a standardized pipeline at the Mouse Biology Program (MBP), University of California, Davis. The pipeline is part of the NIH-funded Knockout Mouse Phenotyping Project (KOMP2), a participant in the International Mouse Phenotyping Consortium (IMPC). Phenotyping tests are derived from the International Mouse Phenotyping Resource of Standardized Screens (IMPReSS), www.mousephenotype.org/impress36, all protocols and metadata are accessible at www.mousephenotype.org/impress/PipelineInfo. The KOMP2 phenotyping (Supplementary Figure 10) and statistical analysis methods are standardized37,38. Supplementary Data 7 summarizes other statistically significant (p < 0.005) results reported. Mutants and wild-type controls for the other three TAD boundary deletion loci (B2, B4 and B8) were phenotyped using standard methods for gross necropsy, organ weights and histopathology for all major organ systems at the Comparative Pathology Laboratory, University of California, Davis.

In vivo transgenic enhancer-reporter assay

Transgenic enhancer-reporter assays for the predicted lung enhancer (852 bp) were performed in a site-directed transgenic mouse assay using a minimal Shh promoter and lacZ reporter gene (Supplementary Fig. 8) at a non-endogenous, safe harbor locus50. The predicted enhancer region was PCR amplified from mouse genomic DNA; chr5:120101603–120102454 (mm10), CTGGGCTACAGGAAGTTGGA (forward primer), CAGAGGGCATGAGAGAGACC (reverse primer), 852 bp PCR amplicon. The PCR amplicon was cloned into a lacZ reporter vector (Addgene #139098) using Gibson assembly (New England Biolabs)62. The final transgenic vector consists of the predicted enhancer–promoter–reporter sequence flanked by homology arms intended for site-specific integration into the H11 locus in the mouse genome50. Sequence of the cloned constructs was confirmed with Sanger sequencing as well as MiSeq. Transgenic mice were generated using pronuclear injection, as described above for generating the TAD boundary deletion mice. F0 embryos were collected for staining at E11.5, E14.5 and E16.5.

β-galactosidase staining was performed in a standardized manner50. Briefly, embryos were washed in cold 1× phosphate-buffered saline (PBS) and fixed with 4% paraformaldehyde (PFA) for 30 min for E11.5 and 60 min for E14.5 and E16.5 embryos, respectively, while rolling at room temperature. The embryos were washed in embryo wash buffer (2 mM magnesium chloride [Ambion, catalog no. AM9530], 0.02% NP-40 substitute [Fluka, catalog no. 74385], 0.01% deoxycholate [Sigma-Aldrich, catalog no. D6750] diluted in 0.1 M phosphate buffer, pH 7.3) three times for 30 min each at room temperature and transferred into freshly made X-gal staining solution (4 mM potassium ferricyanide [Sigma-Aldrich, catalog no. P3667], 4 mM potassium ferrocyanide [Sigma-Aldrich, catalog no. P9387], 20 mM Tris, pH 7.5 [Invitrogen, catalog no. 15567027], 1 mg ml−1 of X-gal [Sigma-Aldrich, catalog no. B4252]). Embryos were incubated in the staining solution overnight while rolling at room temperature and protected from light. Embryos were then washed with 1× PBS three times for 30–60 min per wash and subsequently stored in 4% PFA at 4 °C. The embryos were genotyped for presence of the transgenic construct50. Only those embryos positive for transgene integration into the H11 locus and at the correct developmental stage were considered for comparative reporter gene activity across the three constructs tested. The exact number of embryos are reported in Supplementary Fig. 8.

Statistics and reproducibility

Statistical analyses are described in detail in the Methods. For Mendelian ratios, between 18–43 embryonic samples (TAD boundary locus B1) and between 101–343 (B1–8) mice at weaning stage were systematically collected, and Chi-squared test was used to determine significant deviations. At least two independent biological samples per condition were analyzed for RNA-seq, and at least three independent biological samples per condition were analyzed for qPCR at e11.5; a Likelihood-ratio test (DESeq) and t-test were used to determine significant differences for RNA-seq, and qPCR results respectively. At least two independent biological samples per condition were analyzed for Hi-C experiments, and a Wilcoxon rank-sum test was used to calculate p-values. Between 7 and 12 independent post-natal mice per condition were assessed in the standardized IMPC pipeline, and either a Wilcoxon rank-sum test, Fisher’s exact test or PhenStat Mixed Model or Reference Range tests were used as applicable. For TAD boundary B2, 20 stage- and litter-matched homozygous mutant-control pairs were assessed for gross lung morphology. Between 4–9 independent biological samples were assessed for the in vivo transgenic reporter assays showing enhancer activity in embryonic lungs. For the enhancer-, CTCF-, and enhancer+CTCF-specific deletions in the context of TAD boundary B2, between 3–8 independent mice per genotype were assessed for gross lung morphology. All statistics were estimated, and plots were generated using the statistical computing environment R Version 2022.12.0 + 353 (www.r-project.org).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Read more here: Source link