PhaeoEpiView: an epigenome browser of the newly assembled genome of the model diatom Phaeodactylum tricornutum

Despite being an established model, the genome and annotation of P. tricornutum are not concordant and the existing epigenetic resources generated so far lack a well-defined framework for accurate and user friendly utilization. To address this limitation, we sought to establish a coherent resource rendering the multiple genomic and epigenomic sequencing data sets exploitable from a single platform which we named PhaeoEpiView.

Because P. tricornutum recent genome assembly and annotation exhibit inconsistencies, PhaeoEpiView was built using two steps (i) Phatr3 gene annotation lifting onto the new 25 chromosomes assembly (GCA_914521175.1) and (ii) mapping of the previously published epigenetic marks and transcripts on the new assembly. Next, we proceeded with Phatr3 gene annotation lifting onto the new 25 chromosomes assembly (Phatr4). In the first step, instead of whole genome-based comparison, we adapted a gene-based sequence alignment for lifting the annotation from Phatr3 to Phatr4 assembly. Features such as mRNA, CDS and exons from the reference Phatr3 were used to infer genes and transcripts in target assembly. Unplaced genes and genes with extra copy number are tagged and separated (Table S2). Any remapped duplicated genes are also included in Table S2 for thoroughness. Out of the 12,178 genes from Phatr3 annotation, 11,739 were lifted successfully (Supplementary File 1). The new assembly is missing around 439 genes that were not mapped and lifted from the reference Phatr3 genome. But remapping of these 439 unaligned genes resulted in recovery of 295 genes. These are duplicated genes in the reference annotation that are assembled into one copy in the newly assembled genome. This 1.8% (144 genes) that are not found in the telomere-to-telomere (T2T) assembly are assembled into a unique set of genes. These non-transferable genes might be part of retrotransposons or repetitive element regions. We could recover these genes by allowing overlapped gene model remapping. This suggests that the telomere-to-telomere genome assembly represents a comprehensive and thorough assembly, and that our liftover gene annotation successfully captures all important functional genes.

In order to validate the lifted annotation, we aligned RNAseq reads to both the previous and the new genome version, then compared every gene quantification. Out of 8754 genes with a non-zero transcripts count, 5795 had a difference of quantification (Fig. S1) and 5836, a difference of length lower than ± 0.1% between Phatr3 and Phatr3_lift (Phatr4). Moreover, 7687 of these genes had a difference of quantification and 8609 of length below ± 5%. Missing genes were then examined: 178 out of 439 (40%) were found to be located on short regions that are no longer present in Phatr4 assembly according to whole-genome alignment provided in Giguere et al.17, half of them clustered on previous chr_5 and chr_21 (Table S2). Most of the remaining 261 missing genes showed similarity to already lifted genes suggesting that they are either duplicated or allelic. Both scenarios suggest the presence of haplotigs in the new assembly that are now collapsed leading to previously falsely duplicated genes competing for the same genomic location on the T2T assembly. Consequently, this led to incorrect interpretation of duplicated genes or genomic regions as two distinct copies generating artificial duplications in downstream analysis.

In the second step, transposable elements annotation available from Giguere et al.17 was added to PhaeoEpiView as Phatr4 TEs track. Finally, previously published expression data at two different time points, DNA methylation and PTMs tracks were implemented in the browser, and systematic comparison was made with the previous assembly mapping using unchanged regions as anchors (Supplementary File 1). Our DNA methylation data obtained through bisulfite conversion and MCrBC digest was compared to DNA methylation data from Nanopore sequencing. The results demonstrated an overall good coefficient correlation indicating consistency between the three data sets (Fig. S2).

For more accuracy and homogeneity of the used data, chromatin immunoprecipitation with deep sequencing was carried out using commercially available monoclonal antibodies to replace two marks, H3K27me3 and H3K9me3 for which polyclonal antibodies were used in the previous study9 (Fig. 2A–D, Fig. S3A,B). With few exceptions, H3K27me3 antibody used in this study provided an overall similar pattern of distribution over the genome supported by a strong Pearson’s correlation coefficient (0.80) (Fig. S3A). We used ChIP-qPCR and validated randomly chosen loci which further supports the accuracy of H3K27me3 mapping (Fig. 2C). H3K27me3 polyclonal antibody was previously tested in a peptide competition assay and showed very little cross reactivity which is in line with the strong correlation coefficient between the two mapped datasets (26, this study). However, 69 genes and 127 TEs were not recovered using H3K27me3 monoclonal antibody indicating that these regions were false positives in the previous mapping using the polyclonal antibody (Fig. 2A).

Figure 2
figure 2

(A) Venn diagram of H3K27me3 on genes (left) and TEs (right). (B) Venn diagram of H3K9me3 on genes (left) and TEs (right). (C) ChIP-qPCR validation of randomly chosen loci for H3K27me3 (C) and H3K9me3 (D). Mock values are indistinguishable from 0 in (C). (E) The enrichment profile of H3K4me2, H3K4me3, H3K9_14Ac, H3K9me2, H3K9me3MonoAb, H3K27me3MonoAb along genes (upstream TSS, coding region, downstream TES). Average tag density is the number of sequence reads per gene.

Using a monoclonal H3K9me3 antibody, the current study revealed a negative correlation with the previous work that employed a polyclonal antibody, demonstrating a higher specificity of the former (Fig. S2D). The polyclonal antibody which was not tested for cross reactivity is likely recognizing other epitopes/marks such as H3K9me1/me2. Randomly chosen loci, targets of H3K9me3 monoclonal antibody were validated by ChIP-qPCR indicating the reliability of chromatin immunoprecipitation and mapping (Fig. 2D).

Using monoclonal H3K9me3 antibody revealed a different pattern of distribution of genes and transposable elements. H3K9me3 monoAb covered fewer genes and transposable elements which is expected considering the specificity of the antibody. H3K9me3 targeted mainly genes compared to the previous mapping where the majority of targets were TEs. A total of 323 genes and 664 TEs were shared between the two antibodies (Fig. 2B). Both H3K9me3 and H3K37me3 are broad marks and do not have clearly defined peak summits compared to more localized marks such as lysine9/14 acetylation of histone H3 (Fig. 2E). For both marks, peaks overlapped significantly downstream transcriptional start sites (TSS) over gene bodies (Fig. S3E,F).

Considering the differences in targeted genic regions using H3K9me3 monoclonal antibody compared to previous mapping, we looked at the effect of H3K9me3 localization on transcriptional regulation of genes and TEs. First, we considered H3K9me3 only marked genes and TEs, and found a clear repression when TEs are marked by H3K9me3 with levels comparable to H3K27me3 marked TEs, while genes seem to show moderate downregulation compared to marks known to be repressive such as H3k27me3 (Fig. 3A,B). When we consider H3K9me3 co-occurrence with other histone marks, there is a downregulation of genes when it co-occurs with repressive marks and an upregulation when co-localized with permissive marks (Fig. 3C). Overall, H3K9me3 seems to be a repressive mark in P. tricornutum.

Figure 3
figure 3

Expression profiles of genes and TEs marked by different histones marks in P. tricornutum. Expression levels of TEs (A) and genes (B) marked by H3K9me3 or H3K27me3 alone or combined with repressive or permissive marks. (C) Co-occurrence analysis of epigenetic marks in P. tricornutum. A co-occurrence plot for the ChIP-seq peaks of different histone modifications. For each set of intersection, a black filled circle is drawn. The size of the intersections is depicted in vertical bar. The vertical black line connects the different dataset that intersect. The number of genes and TEs for each chromatin state is shown on the top of the bars.

PhaeoEpiView was implemented as a Jbrowse2 instance34 and made public on a virtual machine hosted at Nantes University datacenter (Fig. 1). It can currently display one track for each of the genes, TEs, transcript levels, McrBC, Bisulfite-seq and Nanopore DNA methylation tracks and five histone PTMs (H3K9/14Ac, H3K4me2, H3K4me3, H3K9me2, H3K9me3, H3K27me3). The browser will be regularly updated with relevant epigenomic data when published in the future, turning PhaeoEpiView into a dynamic platform for a comprehensive genomic and epigenomic resource of the model microalgae P. tricornutum.

Read more here: Source link