A first insight into the Polish Bochnia Salt Mine metagenome

Sequencing statistics

NGS was used across the V3-V4 region of the 16S rRNA gene to sequence 24 environmental samples. The mean number of ASVs per sample obtained after DADA2 was 72,050 ± 31,480, ranging from 22,568 to 143,267. From a total of 1,729,209 contigs remaining after quality control and removal of chimeras, a total of 3655 unique ASVs were obtained. These ASVs represented 46 phyla, 93 classes, 209 orders, 352 families, and 635 genera within the domains of bacteria and archaea.

Sampling season comparison

To compare possible changes between sampling seasons, the results of 16S rRNA amplicon sequencing are presented for 17 samples that were selected from sites analyzed in both seasons. The Bray–Curtis distance was used for Principal Coordinate Analysis (PCoA); the results indicate that the samples were clustered based on sampling sites (Fig. 2). Only samples from the RDB site (purple) were not placed in a homogeneous cluster. However, clustering by season was not demonstrated. A comparison of Bray–Curtis distances showed no statistically significant difference between seasons (pseudo-F = 1.020224, p = 0.381). The difference between locations was globally significant (pseudo-F = 3.67932, p = 0.001). The pairwise tests identified significant differences between RDG and RA10 (pseudo-F = 9.047075, p = 0.043), RDG and RDB (pseudo-F = 3.361130, p = 0.023), and RDB and RD10 (pseudo-F = 3.245796, p = 0.029). No statistically significant differences were found between locations based on the Shannon index. The results suggest that the microbiome of each site was stable over time, allowing samples from both seasons to be used to compare the microbiome of the mine sampling sites in the further analyses.

Fig. 2
figure 2

PCoA calculated on Bray–Curtis distance for samples collected in a single location in both seasons. Yellow—RDG, blue—RA6, orange—RD10, purple—RDB, green—RD13, red—RA10

Sampling site and mine level comparison

The microbiomes of all 24 brine samples from individual sampling sites were compared. PCoA based on Bray–Curtis distances confirmed partial clustering of samples depending on the sampling site (Figure S1). Separate clusters were formed by samples from the RDG and RD10/Wernier sites. The results also indicate that samples from other sampling sites formed one common, mixed cluster. No statistically significant differences were found between individual sites or mine levels based on the Shannon index. However, analysis based on the Bray–Curtis distance revealed a statistically significant difference at the global level (pseudo-F = 2.707996, p = 0.001), pairwise differences between Level 1 and Level 3 (pseudo-F = 3.633673, p = 0.002) and between Level 1 and Level 4 (pseudo-F = 2.268532, p = 0.015). Figure 3 shows the biplot projecting the taxon abundance onto the principal component matrix generated from the Bray–Curtis distance. The taxa most strongly influencing the location of samples on the PCoA were Halobacteria, Gammaproteobacteria, and Clostridia, at the class level, and Halomicrobiaceae, Thiohalorhabdaceae, and Halomonadaceae at the family level.

Fig. 3
figure 3

Biplot projecting taxonomic abundance at (a) class and (b) family level onto a principal component matrix (PCM) generated based on the Bray–Curtis distance

ANCOM analysis showed that microorganisms belonging to the Chlamydiae class of the phylum Verrucomicrobiota were significantly more abundant in the samples from level 1 of the mine than other levels, where the members of this class were almost absent. The mean content of the Chlamydiae sequence was 3.5% in samples from level 1 and 0.04% for level 4; the sequence was absent from level 3. The highest occurrence of this taxon was observed in samples from the RD13 site, where Chlamydiae accounted for 10.07% of all ASVs in the 2017 season, and 12.28% in the 2019 season. Samples from the RDB and RDG sites on level 1, like the other mine levels, had low levels of ASVs belonging to the Chlamydiae (< 1%). Within this class, most sequences belonged to the Simkaniaceae, but sequences belonging to families such as Omnitrophaceae or Puniceicoccaceae were also found at lower frequencies. Samples from locations RA3, RD13, and RDB had an increased proportion of Desulfobacteraceae representatives in the microbiota, and samples from RD13 and RDB sites additionally contained more Thioalkalispiraceae than other locations. The latter are microorganisms that require sulphur for growth, and their presence may be related to the structure of the mine rocks, especially the presence of anhydrite which may be a source of this element.

Samples from level 1 and level 4 of the mine contained mainly bacterial ASVs (74.8% ± 12.2% and 61.5% ± 14.8%, respectively), while samples from level 3 had a more balanced microbiome, consisting mainly of archaea (52.2% ± 7%) and bacteria (47.4% ± 7.1%). The two predominant phyla were Proteobacteria and Halobacterota. The samples from the first level of the mine contained the most Proteobacteria (31.7% ± 18.7%); the main contributors were the two samples from the RDB site, which contained 81.9% and 62.9% of Proteobacteria, respectively. In contrast, the samples from levels 3 and 4 contained 22.5% ± 10.1% and 29.7% ± 10.7% of Proteobacteria, respectively. The mean prevalence of Halobacterota was 19.5% ± 10.4% for samples from level 1, 42.3% ± 6.2% for samples from level 3, and 30.0% ± 13.8% for samples from level 4.

It is also worth noting that all samples contained bacteria belonging to the phylum Desulfobacterota, capable of metabolizing sulphur compounds. This taxon was most abundant in the first mine level, and particularly the sample from the RDB sampling site (31.5%), where the mean value for this level was 9.7%. Most of the Desulfobacterota in the RDB site were found to belong to the genera Desulfovermiculus and MSBL7; these were commonly accompanied by the genus Bradymonadaceae in other samples. The phyla Nanohaloarchaeota and Firmicutes were also present in all samples, both at an average level of 6.4%.

Samples from the RDG site were characterized by a high percentage of Halanaerobiaeota: its mean prevalence was 18.7%, compared to 3.7% for all samples. The main genera in the Halanaerobiaeota were Halanaerobium and Halanaerobacter, which were responsible for most of the ASVs assigned to this phylum. One interesting case was presented by the sample from the RA6 site, where an important element of the microbiome was the phylum Patescibacteria, comprising microorganisms with highly reduced genomes (< 1Mbp). In this sample, Patescibacteria accounted for 21.6% of the total ASVs, while the mean value for all samples was 3.5%.

In the RA6 sample, the dominant species within the Patescibacteria was Candidatus Falkowbacteria, which accounted for 18.6% of the microbial composition of the site. The taxonomic composition of the samples is shown in Fig. 4.

Fig. 4
figure 4

Taxonomic composition of samples at phylum level. Samples are sorted by salt mine level and sampling sites

In total, 46 phyla were identified throughout all the 24 analyzed samples. Nine of these phyla were represented in all samples, making up the main part of their microbiome: Halobacterota, Proteobacteria, Bacteroidota, Desulfobacterota, Nanohaloarchaeota, Firmicutes, Patescibacteria, Halanaerobiaeota, and Acetothermia. Only two of these common phyla: Nanohaloarchaeota and Halobacterota, belonged to the Archaea. Moreover, the presence of Patescibacteria in all samples may be of interest since they are microorganisms with strongly reduced genomes, which is associated with limited possibilities of adaptation to stress conditions (Tian et al. 2020).

The taxa present in all samples comprised 14 of 96 classes, 20 of 209 orders, 22 of 352 families, and 26 of 635 genera. The genera found in all samples included Natronomonas, Nanosalinaceae, Thiohalorhabdus, Salinisphaera, Halofilum, Haloplanus, Halobacterium, Halolamine, Desulfovermiculus, Bradymonadaceae, Alcanivorax, Halodesulfurarchaeum, Acetothermiia, Abscondimielaslaas, Pseudohlaxla, and Abscondimielaslaas. The other genera were non-cultivable bacteria (designated unculturable in the SILVA database) belonging to the families Balneolaceae, Halomonadaceae, Balneolaceae, Halomicrobiaceae, and Micavibrionaceae or to the orders and classes (lowest level designated) Thermovenabulales, Chitinophagales, and Bacteroidia. At genus level, the most common taxa also belonged to the Bacteria (19 taxa) and only a small part were representatives of the Archaea (seven taxa); this is interesting as the very high salinity of the brine should favour the presence of extreme halophiles from the Archaea domain. The most representative phylum was Proteobacteria, which included 10 of the designated 26 genera. The presence of numerous taxa with sulphur-based metabolism may be associated with the occurrence of anhydrite in the mine (Garlicki 2008).

Metabolic pathway analysis

The ASVs were annotated with BioCyc ID metabolic pathways using the PICRUSt2 tool. PCoA based on Bray–Curtis distance was used to cluster samples in three main groups, as previously with the taxa: one group contained samples from the Wernier and RA10 sites; a second group from the RDG site, with one sample from the 2019 season being an outlier, unlike the analysis at the ASVs level; and a third group that included the remaining samples, although in this case a clear clustering pattern was observed (Figure S2).

Based on the biplot, five main pathways influencing the placement of samples on the PCoA were distinguished: PWY-5304 (sulphur oxidation superpathway (Acidianus ambivalens)), FAO-PWY (fatty acid β-oxidation I (generic)), PWY-7904 (tRNA- uridine 2-thiolation (thermophilic bacteria)), PWT-5971 (palmitate biosynthesis II (type II fatty acid synthase)), ANAEROFRUCAT-PWY (homolactic fermentation) (Fig. 5). The samples from the Wernier and RA10 sites demonstrated increased abundance of the PWY-5304, PWY-7904, and PWT-5971 pathways, while samples from the RDG site favoured the ANAEROFRUCAT-PWY. ANCOM analysis indicated that the samples from level 4 of the mine had higher participation of the P162-PWY, i.e. L-glutamate degradation V (via hydroxyglutarate), and PWY-5177 (glutaryl-CoA degradation) pathways than the other samples. The samples from the RA6 site demonstrated the highest share of the PWY-5177 (glutaryl-CoA degradation) pathway. In contrast, samples from RA3, RA6, RD13, RDG, RDB sites were characterized by increased levels of the P161-PWY, i.e. acetylene degradation (anaerobic) pathway. The only pathway present in samples from the RDB site was PWY-7046, i.e. the 4-coumarate degradation (anaerobic) pathways. In addition, samples from sites RA10, RD10, RD3, and Wernier had lower levels of the PWY-6749 (CMP-legionaminate biosynthesis I) pathway than the others.

Fig. 5
figure 5

Biplot projecting metabolic pathway abundance onto a principal component matrix (PCM) generated from the Bray–Curtis distance. Red—RA10 (level 4), cadet blue—Wernier (level 3), light blue—RDB (level 1), purple—RD13 (level 1), dark blue—RA3 (level 4), green—RD10 (level 1), yellow—RD3 (level 1), orange—RA6 (level 4), pink—RDG (level 1)

Core feature analysis showed that 305 of 415 (73.5%) of the identified metabolic pathways were common to all samples. At least half of the samples contained 364 of the 415 lanes (87.7%). Based on the Shannon index, a statistically significant difference was demonstrated between mine level 1 and mine level 3 (KW global test result p = 0.007 and q = 0.008 for this pair), with samples from level 3 showing significantly lower path diversity than samples from level 1. This observation was also confirmed by the Bray–Curtis distance (p = 0.003; q = 0.003). This relationship was also evident in PCoA.

Shotgun metagenomics analysis

Four samples were included in the analysis: W1 from the first level (Danielowiec), WSO4 from the second level (Sobieski), POP27 from the third level (Wernier), and W81 from the sixth level (Sienkiewicz). The total length of the metagenomes ranged from 25,265,906 to 71,844,617 bp. Each was composed of between 12,477 and 25,733 contigs longer than 1000 bp. The range of GC content was 52.0–61.7%. The number of CDS ranged from 14,004 to 49,587. The statistics concerning de novo assembly and binning are presented in Table S1, and the taxonomic assignment of reads at the phylum level is presented in Table 1.

Table 1 Taxonomic composition of metagenomic samples at the phylum level. The category “Others” contains reads assigned to viruses and phyla with an abundance lower than 0.1% in each of the samples. Unassigned reads were excluded from taxa abundance calculation

The results indicate the dominance of Euryarchaeota with its prevalence ranging from 62.63% in W1 to 92.67% in W81. The second most represented phylum was Proteobacteria (1.79%—W81, 17.34%—POP27). Moreover, in sample W1, where the greatest diversity of taxa was observed, a significant proportion of Actinobacteria (2.89%) was found. At the genus level, all samples were characterized by a microbiome composition typical of high-salinity environments. Almost all identified genera belonged to the Halobacteria family, with Halorubrum being the most abundant genus: it was the most abundant genus in POP27, WSO4, and W81 samples, with 12.1%, 14.5%, and 16.1%, respectively. In sample W1, the most common genus was Natronomonas, accounting for 4.7% of the microbial community, and Halorubrum was the second most common, with an abundance of 3.9%. Natronomonas was also present as an important part of the microbiomes of the other samples. Another important taxon was Halobacterium, which was detected in all samples, and accounted for 3.3 to 16.1%.

Sample W1 was found to demonstrate a unique microbiome composition. First, more than 25% of the microbial community in this sample comprised bacteria, which was not the case in the other samples. It was also the only sample where bacteria other than Proteobacteria, belonging to the Bacteroidetes, Balneolaeota, Firmicutes, and Actinobacteria, accounted for more than 1% of the microorganisms. In addition, as mentioned earlier, the dominant type of archaea in this sample was Natronomonas, which was not the case with other samples. These findings emphasize the slightly different nature of the W1 sample compared to the first level of the mine, thus confirming the observations made through 16S rRNA analysis. Krona charts for each sample at the genus level are shown in Fig. 6.

Fig. 6
figure 6

Taxonomic composition at the genus level based on shotgun sequencing data

The dRep tool was used for dereplication of the obtained MAGs. Ultimately, 16 unique MAGs with less than 25% contamination and more than 75% completeness were selected. The bin pairs WSO4.1-W81.7, WSO4.13-POP27.10, WSO4.4-POP27.1, WSO4.22-POP27.15, and WSO4.8-POP27.16 had paired Average Nucleotide Identity (ANI) values above 99%, which indicated that they belonged to the same species and probably the same strain. Therefore, only one genome from each pair was selected for further analysis as representative of the group. Finally, the following number of MAGs was identified in each sample: three in POP27, two in W1, seven in W81, and four in WSO4. The highest number of MAGs was identified in W81; this sample demonstrated the least diversity in the microbial community, which could have resulted in a better recovery of contigs belonging to the identified MAGs. The completeness of MAGs ranged from 76.2 to 98.28%, and contamination from 1.4 to 24.61%. The resulting MAGs were between 826,353 and 5,341,881 bp in length. Detailed statistics on the quality of the bins are presented in Table S2.

Taxa were assigned to bins using the GTDB-Tk tool, as shown in Table 2. Classification to species level was only possible in bin WSO4.4; the results indicated the species Halobacterium bonnevillei. In the remaining cases, the classification ended at the genus or family level, which may indicate that the bins belong to unknown species. Only four of the obtained MAGs were identified as bacterial: MAG POP27.15, assigned to the genus Halospina, W1.7 assigned to the order T1Sed10-126, and two MAGs, POP27.16 and WSO4.8, assigned to the family Salinisphaeraceae. The other MAGs identified as archaeal belonged to the Halobacteria and Nanosalinia classes. Interestingly, only two MAGs belonged to the Nanosalinia, and both were identified in the W81 sample.

Table 2 Taxonomic classification of unique bins

The BGCs were identified using two methods. The first one, antiSMASH, is based on the use of rules related to the construction of a cluster, and strongly focuses on the already existing BGCs. The second method, DeepBGC, is based on neural networks that give a better chance to identify previously undescribed BGCs, where functional domain analysis was applied. Metagenomes were also screened for the presence of AMP-encoding genes. The total number of BGCs detected in the metagenomes, the number of BGCs for each class, and the total number of detected AMPs are presented in Table 3.

Table 3 BGC and AMP statistics

It can be seen that DeepBGC enabled the detection of significantly more potential BGCs than AntiSMASH. Both methods identified the highest number of potential BGCs in the sample from the third level of the mine, and the lowest number in the sample from level 1. It was also noteworthy that both methods detected BGCs for terpenes in all samples. In sample POP27, antiSMASH identified the complete operon ectABC, related to the biosynthesis of ectoine, and the operon iucABCD, encoding aerobactin biosynthetic proteins. A BGC related to thiopeptide synthesis was also identified. The metagenome of POP27 contained numerous BGCs organized around genes coding phytoene synthase, which may suggest that these clusters were related to the biosynthesis of carotenoids. In contrast, antiSMASH identified only three clusters in sample W1; these appear to be incomplete and, therefore, cannot be properly characterized. In sample W81, two BGCs associated with the production of class II lanthipeptides were identified. One of the clusters can most likely produce three lanthipeptides: MSVAIDNKAVIGG – RKKQFDAEFEDhbNDhbDDKDhaDDhbLGPAKLCIGDhaFDhbCVLNDhaRVVVP, VCERDASRLVNVAVRSRSQHSRFAVFNVSLSSSWLSIDTETNELLTSQTA – LDhbRQCRDhbNYCCNQDhbPIFDhbWLDhbV, LSIDTETNELLTSQTG – LDhbRQCLRAFRVRQELYRKPVCEALKHAVLADhbQQRFEINDVLRVDhaVAWKYALMDhbKWLYALRERF and the other is related to the production of four lanthipeptides: MSVAMSDDAGATVNKQAYAEEFNQSAPSTDHDKGEDTVGA – NCYLNDhaIVCDhbFDhbDhbGGQ, VPIIRTFIFLKTAATA – FERWGVRLCRARDhbPAEFLKLIIRLHLKNIAFKHDhaDhaFR, MAMAAPRRSGTSNESRSRSPAA – ADhbNDhaADhbGADhbMPDhbNCADhbVLAIDha, VSHHRHDEVPSA – LVEIAQQDhbAQQERERHVRECAAPGADVV. In the case of the WSO4 sample, as in POP27, the complete ectABC operon was identified. BGCs associated with the synthesis of a RiPP-like protein, most likely similar to Linocin M18, were also identified in this sample.

Read more here: Source link