Retinoic acid affects the transcriptome of zebrafish endodermal cells
To identify genes regulated by RA in zebrafish endodermal cells, we used the transgenic Tg(sox17:GFP) line which drives GFP expression in endodermal cells and allows their selection by fluorescence activated cell sorting (FACS). Tg(sox17:GFP) embryos were treated either with RA, BMS493 (pan-RAR inverse-agonist) or DMSO (control) from 1.25 to 11 hpf, a time window that covers the whole blastula and gastrula periods. Endodermal GFP+ cells were next selected by FACS from embryos at 3-somites (3-S) and 8-somites (8-S) stages (11 hpf and 13 hpf, respectively). Non-endodermal (GFP−) cells were also selected from the DMSO-treated control embryos in order to compare with GFP+ endodermal cells and identify genes displaying endodermal enriched expression. RNA-seq was performed on all these FACS-isolated cells prepared in triplicate (24 samples in total) and transcriptomes were analysed using the bioinformatic pipeline as described in “Materials and methods”. Principal component analysis of all RNA-seq data (Fig. 1A) shows (i) a tight clustering of all triplicate samples confirming a high reproducibility in the experiment, (ii) a strong difference between the transcriptome of endodermal and non-endodermal (NE) cells (discriminated along the first axis of the PCA plot), (iii) relatively similar transcriptomes of cells isolated at 3-S and 8-S stages, and (iv) a clustering of BMS493 samples near DMSO samples indicating a much weaker effect of BMS493 treatments compared to the RA treatments. These conclusions were further confirmed by the differential gene expression analyses between the different conditions. Indeed, more differentially expressed genes were identified between endodermal and non-endodermal cells (1370 and 1410 differentially expressed genes at 3-S and 8-S stages, respectively with a FDR < 0.01) than between endodermal cells treated with RA versus DMSO (756 and 514 RA-regulated genes at 3-S and 8-S stages, respectively with FDR < 0.01) or with BMS493 versus DMSO (32 and 71 BMS493-regulated genes at 3-S and 8-S stages, respectively). We found a large overlap among the sets of genes having an endodermal-enriched expression at 3-S and at 8-S stages (Fig. S1A; list of genes given in Table S1) and these sets include all known endodermal markers including sox17, gata5/6 and foxa1/2/3, validating the accurate sorting of endodermal cells. RA-regulated genes consist of a large set of up- and down-regulated genes (Table S2), many of them being regulated at both 3-S and 8-S stages (Fig. S1B). These RA-regulated genes include known RAR-direct targets such as cyp26b1/a1, dhrs3a, nr2f2 and several hox genes, validating our protocol and the of RA treatments. Interestingly, BMS493 treatment led mostly to down-regulation of gene expression. Indeed at 3-S stage, the 32 BMS493-regulated genes were all repressed, and, at 8-S stage, 68 genes were repressed while only 3 genes were up-regulated by BMS493 treatment (Tables S3). A large overlap was also observed between the genes down-regulated by BMS493 at 3-S and at 8-S stage (Fig. S1C). As expected, a large proportion of genes down-regulated by BMS493 were up-regulated by RA treatment, this observation being evident mostly at 3-S stage where 72% of genes repressed by the RAR inverse-agonist were induced by RA, while this proportion decreased to 40% at 8-S stage (Fig. 1B,C). Tables 1 and 2 show the genes significantly up-regulated by RA and down-regulated by BMS493 as well as those enriched in the endoderm at 3- and 8-somites stage, respectively (shown in bold). gata6, insm1a and ascl1b are the only known pancreatic regulatory genes which were regulated by both RA and BMS493 (Table 1). Other pancreatic transcription factors, like mnx1, insm1b, hnf1ba, nr5a2 or neuroD1, were induced by the RA-treatment but not significantly repressed by BMS493. Inversely, other pancreatic regulators were inhibited by BMS493 but not significantly induced by RA like pdx1, rfx6 and myt1b (Tables S3 and S6). We can assume that the induction of pancreatic fate by RA (and the absence of pancreas upon BMS439 treatment) is mediated, at least in part, by the direct or indirect regulation of these pancreatic regulatory factors. In conclusion, these RNA-seq data highlight all the genes with an enriched expression in zebrafish endodermal cells and which are regulated by RA signalling. This gene set notably includes regulators involved in the AP patterning of the endoderm, such as hox genes, and known factors involved in the specification of pancreatic progenitors.
Identification of RAR binding sites in the zebrafish genome
To further identify the genes directly regulated by RAR and determine if some pancreatic regulatory genes are direct targets of RA signalling, we performed ChIP-seq experiments at the end of gastrulation. In absence of validated commercialized ChIP-grade antibody recognizing zebrafish RAR, we chose to express a tagged RARaa in zebrafish gastrulae by injecting zebrafish fertilized eggs with the mRNA coding for the zebrafish RARaa protein fused to a Myc-tag at its C-terminal end. RARaa was chosen as the RNA-seq data indicated that it is the most highly expressed RA receptor in zebrafish endodermal cells. We injected very small amount of this mRNA (about 30 pg) in order to avoid high non-physiological levels of RARa protein within embryos; the development of embryos were not affected by these injections. Chromatin was prepared at 11.5 hpf (3 somites stage) from about 2000 injected zebrafish embryos and immunoprecipitation was performed with a ChIP-grade Myc antibody. Comparison of reads obtained with the Myc-RAR ChIP and the input negative control led to the identification of 4858 RAR peaks. In order to identify bona fide RAR binding sites showing strong affinity, we selected all peaks with a height score above 50 (Table S4). By choosing such criteria, 2848 robust RAR binding sites were identified in the zebrafish genome. As shown in Fig. 2A, a majority of these sites are located near or within genes: 8% were found in promoters (i.e. 1 kb upstream of the gene TSS, Transcription Start Site), 30% in upstream sequences (from 1 to 10 kb), 22% in introns, 4% in exons, and only 33% in intergenic regions. Sequence analysis of all RAR peaks revealed that the highest represented motif corresponds to the Direct Repeat of the RGKTCA motif separated by 5 bases (reported as DR5) and being the RAR/RXR consensus binding sequence (present in 39% of identified RAR peaks) (Fig. 2B)1. The next most abundant motifs are also repetitions of the RGKTCA motif with different spacings (TR4 being a Direct Repeat separated by 1 and 2 bases : DR1/DR2) and orientations (Rxra being an Inverted Repeat with no base separation : IR0). Furthermore, many RAR peaks were found near genes reported to be RAR-direct target genes in other species, such as cyp26a1, dhrs3, nr2f2 or the hoxb1a-hoxb4a genomic region (Fig. 2C and data not shown). All these observations confirm the accuracy of the ChIP-seq data. Interestingly, many identified zebrafish RAR sites are located in evolutionary conserved genomic sequences as shown by the fish PhastCons track (Fig. 2C and see below).
The RAR ChIP-seq peaks located within 250 kb from the TSS of a gene were assigned to this gene and when several genes were lying in the vicinity of a RAR site, the closest gene was considered as the putative RAR-regulated gene. Using this strategy, amongst the 2848 RAR sites, 2144 were linked to a gene. Correlation analysis between RA gene expression regulated by RA and the number of RAR sites near the gene showed that RAR acts mainly as a transcriptional activator (Fig. 3A), supporting the classical model where RAR/RXR heterodimers mainly recruit co-activators upon RA ligand binding1. However, this correlation is not very high and genes down-regulated by RA can harbour nearby RAR sites. Indeed, from the gene set having RAR sites, 61 were down-regulated and 94 genes were up-regulated by RA at 3-S stage (Fig. 3B, Tables S5). Among the genes up-regulated by RA and harbouring a RAR site, we identified the pancreatic regulatory genes hnf1ba/b, gata6, insm1b, jag2b and mnx1 indicating that these genes are direct targets of RA. As for the genes down-regulated by BMS493, we found that a large number of them (i.e. 18 genes out of 32) contain RAR sites and amongst them 13 are also upregulated by RA (Fig. 3B, see legends for gene names). In conclusion, the ChIP-seq data allowed us to define the zebrafish RAR cistrome at the end of gastrulation and identify putative RAR direct target genes.
Conservation of RAR binding sites from fish to mammals
Functionally important regulatory regions are expected to be conserved during evolution. To determine which RAR binding sites are conserved in vertebrates, we compared our zebrafish ChIP-seq data with those of Chatagon and colleagues19 who identified RAR binding sites in the murine genome using the F9 embryonal carcinoma cells whose differentiation into primitive endodermal cell is induced by RA treatment. This comparison revealed that, among the 2144 zebrafish genes harbouring a RAR site, 722 have also a RAR site near the murine orthologous genes. This list of conserved RAR-bound genes comprises notably cyp26a1/b1, dhrs3a, nr2f2, many hox genes, raraa/b as well as pancreatic genes gata6, hnf1ba, insm1 and mnx1. We next determined which of these RAR binding sites are located in conserved regulatory sequences. To that end, we retrieved the list of conserved non-coding elements (CNEs) identified in zebrafish by comparing multiple fish and tetrapod genomes34. Amongst the 722 conserved RAR-bound genes, 116 RAR binding sites were located in CNEs, supporting a regulatory function (Table S6). Amongst them, 24 CNEs were even conserved from fish to mouse and are called here HCNE for highly conserved non-coding elements. They are found for example near the meis1/2, srsf6, qki, nrip1 and ncoa3 genes (Table S6). As already reported, several RAR sites controlling the expression of hox genes are located in these HCNEs17. Interestingly, we identified here a novel RAR site in a HCNE located 25 kb downstream of the zebrafish hoxb cluster in the fourth intron of the skap1 gene. This intron contains 4 strong RAR sites in the zebrafish and murine sequences (green boxes in Fig. 4A,B) and the sequence of the second RAR site has been maintained throughout vertebrate evolution and shows a motif similar to a DR5 RARE (Fig. 4C). The transcriptional regulatory function of this HCNE was tested by transgenesis by inserting one copy of this element upstream of a minimal cfos promoter driving GFP. As shown in Fig. 5A, this reporter transgene is expressed in the gut and in the neural tube. Highest GFP levels were detected in the posterior hindbrain, in a pattern highly reminiscent of hoxb1b gene expression (shown on Fig. 5D). Furthermore, when transgenic embryos were treated with exogenous RA, GFP expression was drastically increased and detected in the whole morphologically affected embryos in a similar manner to hoxb1b (Fig. 5B,E). Conversely, treatments with the RA inverse-agonist BMS493 turned off the expression of this DR5-RAR-skap1:GFP transgene (Fig. 5C) as well as of hoxb1b, except in the tailbud region (Fig. 5F). This confirms the RARE function of this element and indicates a role in hoxbb cluster regulation.
We also verified the regulatory role of some other RAR sites identified near zebrafish pancreatic regulatory genes. For example, we tested the activity of a RAR site located near a CNE downstream of the zebrafish mnx1 gene. A RAR binding site is also found at a similar location downstream of the murine Mnx1 gene (Fig. S2A). This RAR element was able to target GFP expression to the posterior endoderm at low level and in the dorsal pancreatic bud at higher levels (Fig. S2B–D). When the RAR-mnx1:GFP transgenic embryos were treated with RA, GFP expression was slightly increased in whole endoderm; conversely, treatment with BMS493 abolished GFP expression (Fig. S2D–F).
In conclusion, these analyses show that a large fraction of RAR sites has been conserved during evolution and transgenic assays confirm that some elements are sufficient to drive expression in endoderm and confer a RA-response.
RA affects chromatin accessibility in endodermal cells
RAR–RXR complexes control gene expression through the recruitment of corepressors (e.g. NCoR/Smart) and coactivators (NCoA) which control chromatin compaction via HDACs and HATs, respectively2, 3. Thus, one possible strategy to identify functional RAREs is to perform a genome-wide analysis of chromatin accessibility modifications following RA or RA inverse-agonist treatments by ATAC-seq assays33 allowing the identification of open chromatin and nucleosome-free regions induced or repressed by RA signaling. As most nucleosome-free regions correspond to regulatory sequences, ATAC-seq can also highlight the enhancers or promoters whose accessibility is modified by RA signalling indirectly, i.e. through the binding of transcription factors whose expression is induced by RA. Thus, sequence analysis of all RA-induced ATAC-seq peaks can give clues on the identity of transcription factors acting in the subsequent steps of the RA-induced regulatory cascade.
As for the RNA-seq assays, zebrafish embryos were treated with RA, BMS493 and DMSO during blastula and gastrulation and about 10,000 endodermal cells were selected by FACS at 3-somites stage (11 hpf). Non-endodermal cells from control DMSO-treated embryos were also analysed in parallel. Cell preparations and ATAC-seq experiments were done in triplicate for each condition and analysed as described in “Materials and methods”. We first verified the accuracy of the data by several quality control analyses. First, for all samples, the analysis of the ATAC-seq fragment size distribution revealed the expected pattern with abundant short (< 150 bp) fragments corresponding to nucleosome-free regions and larger fragments of about 200 and 400 bp corresponding to mono- and bi-nucleosome regions, respectively (Fig. S3A). Secondly, as reported previously33, 35, genome mapping of the nucleosome-free fragments showed a clear enrichment in promoter regions immediately upstream of transcriptional start sites (TSSs), while mono-nucleosomes were depleted from TSSs and rather mapped just downstream of the TSSs in a periodic manner (Fig. S3B). Thirdly, we verified that the ATAC-seq fragments correspond to many zebrafish regulatory regions by comparing them with regions harbouring the histone modifications H3K4me3 and H3K27ac marks identified in zebrafish embryos at 10 hpf36. Heatmap plots of ATAC-seq reads from all samples showed an obvious enrichment at loci harbouring these two histone modifications (Fig. 6A and Fig. S4). As regulatory regions often display sequence conservation, we also compared our ATAC-seq reads to the collection of zebrafish evolutionary-conserved non-coding elements (zCNEs)34. Heat-maps of ATAC-seq reads from each sample also showed a strong correlation with zCNEs (Fig. 6A and Fig. S4). These observations confirm that regions identified by ATAC-seq exhibited hallmarks of active regulatory elements. The reproducibility of ATAC-seq data was also analysed by PCA (Fig. 6B). This PCA showed that (i) triplicate samples are tightly clustered, and (ii) endodermal and non-endodermal (NE) cell clusters are separated along the PC2 axis, while the RA-treated cluster is separated from the DMSO- and BMS-cell cluster along the PC3 axis. Thus, as observed for the RNA-seq data (Fig. 1A), stronger differences are observed between GFP+ and GFP− cells compared to the differences between RA-treated and DMSO-treated endodermal cells. The samples corresponding to the BMS493-treated cells and DMSO-treated cells were not clearly segregated and did not reveal significant effects of BMS493 on the chromatin accessibility.
From all 12 ATAC-seq samples, a total of 156,604 nucleosome-free regions were identified within the zebrafish genome. Differential peak intensity analyses revealed that 9722 and 12,974 regions are more accessible in endodermal and non-endodermal cells, respectively (with FDR < 0.05, Tables S7). Interestingly, sequence analysis of all endodermal-specific ATAC-seq regions highlight a significant enrichment of the binding site motifs for the Gata, Fox and Sox protein families (Fig. 6C), in accordance with the well-known function of Gata4/5/6, Foxa1/2/3 and Sox32/17 in endodermal cell differentiation6. Also, these endoderm-enriched ATAC-seq peaks are often identified near endodermal and pancreatic regulatory genes, such as in the foxa2, nkx6.1, hnf4g, sox17 or mnx1 loci (Fig. S5 and data not shown), suggesting the presence of endoderm-specific enhancers at these locations.
As expected, the RA treatments had less influence on ATAC-seq peaks compared to the cell type identity (i.e. endoderm versus non-endoderm). Still, 1240 genomic regions were found to be more accessible and 749 regions were less accessible in the RA-treated endoderm compared to the DMSO-treated controls (with FDR < 0.05). If the RA-treated samples were directly compared with the BMS493-treated samples, more peaks were detected as treatment-dependent: 3277 regions were more accessible in RA-treated while 1762 were more accessible in the BMS493-treated cells (Tables S8). Annotation of the RA-induced ATAC-seq peaks to the closest gene revealed that they were often located near RA-upregulated genes identified above by RNA-seq. Moreover, we found a significant correlation between the level of RA-induced gene expression and the number of RA-induced ATAC-seq elements (Fig. 6E). Interestingly, 11% of RA-induced ATAC-seq peaks corresponded to RAR binding sites identified by ChIP-seq and possess the DR5 motif recognized by the RAR/RXR complex (Fig. 6D). This was the case for RA-induced peaks in the dhrs3a, cyp26a1/b1 and insm1b genes (see black boxes in Fig. S6 and data not shown). However, many identified RAR sites did not display a significant increase in accessibility, such as those located in the hoxba locus (see green boxes in Fig. S6). Furthermore, the majority of RA-induced ATAC-seq peaks did not harbour RAR binding sites although they were usually found near RA-upregulated genes; instead, such peaks often harbour Gata or Hnf1b binding motifs (38% and 13% of all RA-induced elements, respectively; Fig. 6D). Thus, it can reasonably be assumed that these genes are indirectly stimulated by RA signalling. For example, the pancreatic regulatory genes pdx1, insm1a, rfx6 or neurod1, all upregulated by RA-treatment but devoid of RAR binding sites, had RA-induced ATAC-seq peaks located in their genomic neighbourhood and contained either Hnf1b or Gata binding motifs in such peaks (Fig. S7 and data not shown). To identify the GATA and Hnf1b family members involved in these indirect RA-regulations, we searched in the list of RA-stimulated genes (Table S2) for these 2 types of transcription factors and we also determined if RAR sites are detected in their genomic loci (Tables S4 and S5). hnf1ba and hnf1bb were both induced by RA but the level of expression of hnf1ba was about 200-fold higher than hnf1bb in endodermal cells. Furthermore, hnf1ba contains three RAR binding sites located in evolutionary-conserved and nucleosome-free regions (Fig. 7A), while hnf1bb has only one weak RAR site (data not shown). Out of the 10 members of the GATA family, only gata4 and gata6 were significantly induced by RA but gata6 was expressed in the endoderm at about 40-fold higher level compared to gata4. Furthermore, gata6 has a high affinity RAR binding site in a nucleosome-free region located about 30 kb upstream from its TSS (Fig. 7B) while no such RAR site was present around the gata4 gene locus (data not shown). Thus, all these analyses strongly suggest that, upon RA induction, RAR–RXR complexes directly activate expression of Gata6 and Hnf1ba, which in turn will open regulatory chromatin regions of many genes, including those coding for several pancreatic regulatory factors.
Read more here: Source link