Molecular pathways identified from single nucleotide polymorphisms demonstrate mechanistic differences in systemic lupus erythematosus patients of Asian and European ancestry

Identification of ancestry-dependent and independent non-HLA SLE-associated variants and downstream target genes

Despite the success achieved by GWAS in mapping polygenic disease risk loci in SLE, the biological implications of the majority of identified variants has remained unknown. To gain a broader view of how inherited genetic variation impacts disease risk, we took the global approach of integrating SNPs with a range of association significance to generate a cohort of predicted genes that could ultimately be pruned and mapped to functional pathways for analysis. Immunochip-based association analyses have identified 700 single-nucleotide polymorphisms (SNPs) reported as significantly associated with SLE in patients of East Asian (AsA) ancestry6 and 757 SNPs associated with disease in European (EA) populations (Fig. 1A)8. Twenty SNP associations (< 1.5%) were shared between ancestries. In both ancestries, approximately 70% of SNPs were found in non-coding regions (intergenic and intronic), and 8% of SNPs were in coding regions (3′UTRs, 5′UTRs, synonymous and non-synonymous) (Fig. 1B and Supplemental Table S1). AsA populations had a significantly higher percentage of SLE-associated SNPs in non-coding (nc)RNAs (lncRNA and miRNA), whereas EA populations had more SLE-associated SNPs located within regulatory regions, including enhancers, promoters, open chromatin, and transcription factor binding sites (Fig. 1B).

Figure 1
figure 1

Mapping the functional genes associated with SLE-Immunochip SNPs. (A) Venn diagram depicting the ancestral overlap of all SLE-associated Immunochip SNPs. (B) Distribution of genomic functional categories for all EA and AsA non-HLA associated SLE SNPs. Genomic category comparisons between ancestral groups were performed using a 2-proportion z test. P values were 2-tailed, and asterisks indicate a significance threshold of p < 0.05. (C) Functional SNP-associated genes are derived from 4 sources, including eQTL analysis (E-Genes), regulatory regions (T-Genes), coding regions (C-Genes) and proximal gene-SNP annotation (P-Genes). (D, E) Venn diagrams showing the overlap of all EA (D) and AsA (E) associated E-, T-, C- and P-Genes.

We used a bioinformatics-based approach to identify the most plausible genes affected by the SLE-SNP association. As previously described11,12, we first determined whether there was evidence that the SNP was an expression quantitative trait loci (eQTL) using the GTEx database (version 8) and the Blood eQTL browser13. 226 EA- and 405 AsA- SLE-associated eQTLs linked to 730 and 1225 expression genes (E-Genes), respectively (Fig. 1C–E and Supplemental Table S2). eQTLs were identified for nearly 60% of the total number of AsA Immunochip SNPs, compared to 29% of SLE-associated SNPs in the EA data (Supplemental Fig. S1). Whereas the number of AsA eQTLs was higher across all genomic categories (coding, non-coding and regulatory regions as well as ncRNAs), the proportion of AsA SLE-associated SNPs linked to ncRNAs (51%; 42/82) was nearly three times higher than that observed in EA populations (18%; 8/44) (Supplemental Fig. S1 and Supplemental Table S1). Next, we sought to identify SNPs within distal and cis regulatory elements (e.g., enhancers and promoters), using the computational tools GeneHancer and HACER (Human ACtive Enhancers to interpret Regulatory variants), both of which connect regulatory SNPs with downstream target genes (T-Genes)14,15. Together, GeneHancer and HACER identified 105 SLE-associated SNPs (59 EA, 36 AsA) overlapping distal regulatory elements or promoters predicted to impact the expression of 964 T-Genes (617 EA, 350 AsA) (Fig. 1C–E and Supplemental Table S2). For variants located in coding regions, 44 SNPs (21 EA, 23 AsA) were associated with either non-synonymous or nonsense changes in 47 genes (C-Genes; 20 EA, 27 AsA) (Supplemental Table S2). The remaining SNPs that were not linked to E-, T- or C-Genes were assigned to the closest proximal gene (P-Gene), identifying an additional 956 P-Genes (487 EA, 469 AsA) (Supplemental Table S2).

Overlapping EA and AsA SNP-linked E-, T-, C- and P-Genes are depicted in Fig. 1D,E, respectively. No genes were shared within all four groups within either ancestry, and we observed limited commonality between T-, P- and E-Genes, with only 20 genes shared among the three groups in EA and 15 genes shared in AsA. It is notable that of the total of 3,432 SNP-linked genes, < 10% (327) overlapped between AsA and EA lupus cohorts (Fig. 2A).

Figure 2
figure 2

Functional characterization of SNP-associated genes. (A) Venn diagram depicting the overlap between all EA- and AsA-SNP associated genes. (B, C) Bubble plots depict ancestry-dependent and independent SNP-associated genes analyzed to determine enrichment using functional definitions from the BIG-C (Biologically Informed Gene Clustering) annotation library and I-Scope for hematopoietic cell enrichment. Enrichment was defined as any category with an odds ratio (OR) > 1 and a − log (p-value) > 1.33. (D) Heatmap (generated by GraphPad Prism 8.3; www.graphpad.com) visualization of the top five significant IPA canonical pathways and (E) bubble plot showing gene ontogeny (GO) terms for each gene list organized by ancestry. Top pathways with OR > 1 and − log (p-value) > 1.33 are listed.

Characterization of gene signatures

We next completed a series of bioinformatic analyses to determine the overall biological function of the 1349 genes associated with SLE in EA and 1429 genes associated with SLE in AsA, as well as the 327 SLE-associated genes common to both ancestries. Analysis of genes linked through EA revealed enrichment in processes related primarily to adaptive immune function, including the functional category for interferon stimulated genes, canonical pathways for TH1 and TH2 activation pathway and Macrophage classical activation signaling pathway, and GO terms for the Regulation of B cell proliferation (GO:0030888) and the Regulation of T cell proliferation (GO:0042129). Genes linked to SNPs associated with SLE in the AsA cohort were enriched in categories related to pathogen-influenced signaling, such as Role of PRRs in the recognition of bacteria and viruses, and the Positive regulation of lymphocyte differentiation (GO:0045621), as well as those representing more diverse biological functions, such as Regulation of oxidative stress-induced neuron death (GO:1903203) and DNA ligation involved in DNA repair (0051103). Shared genes were distributed in a range of adaptive and innate immune gene categories (Fig. 2B,D,E).

In addition, EA- and AsA-derived gene sets were examined using a clustering program that detects immune and inflammatory cell type signatures within large gene lists to identify dominant immune cell populations driving disease pathology within each ancestry16. Consistent with our pathway analysis, EA exhibited strong enrichment in cellular categories for myeloid, T, and B cells, whereas SLE-associated genes in AsA were not enriched in any cellular category (Fig. 2C). Independent analysis of shared genes revealed enrichment in the T, B and myeloid, and the NK or T cell categories. Finally, parallel analyses examining P-Genes separately from E-, T-, and C-Genes were conducted to assess the potential overrepresentation of immune-based processes because of the Immunochip design bias17. As expected, P-Genes (384 EA, 253 AsA) were enriched in immunologically-driven functional categories and pathways; exclusion of P-Genes resulted in only minor alterations to overall categorization in either ancestral background (Supplemental Fig. S2).

Delineation of signaling pathways identified by ancestry-specific SNP-associated genes

To assess ancestry-driven key signaling pathways in greater detail, ancestry-based protein–protein interaction (PPI) networks consisting of EA-associated, AsA-associated, or ancestry-independent genes were constructed using STRING, visualized in Cytoscape and clustered using MCODE (Supplemental Table S3). To provide an additional level of functional annotation, clusters contributing to overall immune function, tissue repair, mechanisms of cellular stress, cell motility, metabolic function or general cell function were grouped together. EA-associated genes were dominated by the functional category for interferon stimulated genes observed in cluster 2 (118 genes) (Fig. 3A), along with multiple canonical pathways related to the activation of pattern recognition receptors and downstream type I interferon signaling (Supplemental Table S4). Cluster 7 revealed additional enrichment in lymphocyte activation and differentiation, such as the TH1 and TH2 activation pathway that was also represented in the shared gene network, and cellular enrichment for cells of myeloid and/or lymphoid origin. Notably, the EA-associated network lacked evidence of cell motility and cell stress/injury, whereas metabolic function was represented by clusters 12 and 13 enriched in retinoid X receptor activation (LXR/RXR activation, PPARα/RXRα activation) involved in the regulation of lipid metabolism, inflammation, and cholesterol bile acid catabolism. Pathways associated with SLE in AsA were indicative of a diverse range of biological processes with protein metabolic functions dominating clusters 2 and 17 (Fig. 3B), whereas clusters 3 and 6 were enriched in multiple canonical pathways related to cytokine production and signaling (Supplemental Table S4). Interestingly, genes linked to SNPs associated in the AsA cohort did not include a unique interferon signature, but instead coalesced into multiple small clusters related to mitochondrial dysfunction (clusters 9 and 19) and metabolism, evident in clusters 16, 22 and 30. Additionally, AsA-associated gene clusters were enriched in chromatin remodeling found in cluster 1, along with evidence of cell motility (clusters 11, 12, 23 and 25). AsA cellular enrichment was dominated by monocytes and myeloid lineage cells.

Figure 3
figure 3

Key pathways determined by EA and AsA-associated genes. Cluster metastructures for EA (A), AsA (B) and the shared gene cohort (C) were generated based on PPI networks, clustered using MCODE and visualized in Cytoscape. Cluster size indicates the number of genes per cluster, edge weight indicates the number of inter-cluster connections and color indicates the number of intra-cluster connections. Enrichment for each cluster was determined by BIG-C and IPA; clusters were then grouped and categorized according to overall function (immune, tissue repair, metabolic, motility or general). Grey boxes indicate categories lacking relevant clusters. (D) Venn diagram showing the number of overlapping pathways motivated by EA or AsA predicted genes. Representative pathways are listed.

Pathways exemplified by SLE-linked genes in both EA and AsA appear to be a blend of the pathways enriched within each ancestry. Common pathways included Interferon signaling, TH1 and TH2 activation pathway, Complement system and Leukocyte extravasation (Fig. 3C and Supplemental Table S5). Figure 3D depicts a selection of both the unique and overlapping canonical pathways motivated by the EA-associated and AsA-associated gene sets. We also carried out a parallel series of bioinformatics analyses to determine the biological function of the full array of EA (1676) and AsA (1756) SNP-predicted genes, including those associated with both ancestries (Supplemental Fig. S3 and Supplemental Table S5). As expected, shared genes were evenly distributed throughout each large network and subsequent connectivity mapping revealed the addition of several new clusters to both the EA and AsA networks. For example, the full EA network gained several clusters contributing to cell motility enriched in integrin signaling and granulocyte diapedesis (clusters 34 and 35), whereas the enlarged AsA network gained multiple clusters enriched in immune function (clusters 9, 12 and 31) and interferon signaling (cluster 3), as well as enrichment in a more diverse array of cell types, including T and, B cells, neutrophils and NK/T cells. Both networks acquired a small cluster enriched in the functional category for reactive oxygen species (ROS) protection (EA cluster 22, AsA cluster 24) driven by the glutathione peroxidases GPX4, an antioxidant enzyme involved in ferroptosis18, and GPX3 an ROS scavenger. Overall, the addition of the shared gene cohort to each network highlights many common pathways and biological functions, but still revealed ancestry-driven differences. Taken together, unique pathways identified via EA analyses appear dominated by immune-related signaling as well as T, B and myeloid cells, whereas those in AsA analyses are dominated by biological processes related to altered metabolism and mitochondrial dysfunction.

Validation of AsA-enriched molecular pathways using summary GWAS data

To test our pathway predictions, we combined summary data from previously reported GWA studies7,9 that identified 1350 SNPs associated with SLE in patients of East AsA ancestry (Supplemental Table S6). Of these SNPs, 68% were located in non-coding regions, 6.5% were in coding regions, 2.7% were in regulatory regions and 22% were located within or proximal to non-coding RNAs (Supplemental Fig. S4). Validation AsA-associated GWAS SNPs exhibited limited commonality when compared to Immunochip SNPs, with < 1% of either EA- or AsA-associated Immunochip SNPs overlapping GWAS SNPs, and only 3 SNPs common to all 3 datasets (Supplemental Fig. S4). We next applied our same bioinformatics-driven methodology to generate a validation gene cohort composed of 1321 E-Genes, 307 T-Genes, 17 C-Genes and 974 P-Genes (Supplemental Table S7). Connectivity mapping of all validation genes were used to create PPI networks that were clustered as described above (Fig. 4A). Examination of each cluster revealed functional similarity to those derived from AsA Immunochip-associated genes. For example, clusters 1, 3, 4, 5 and 6 share hallmarks of tissue repair and remodeling exemplified by categories for mRNA processing, pro-cell cycle and protein degradation (proteasome, lysosome, endocytosis). Additionally, we observed smaller clusters (21, 27 and 28) representative of processes involved in metabolic function, and clusters (13, 18 and 24) characteristic of cell stress and injury, including the Inhibition of ARE-mediated degradation pathway and Mitochondrial dysfunction canonical pathways (Supplemental Table S8). Cluster 9 contained a small interferon-stimulated gene signature consisting of IFI27, IFI44 and RSAD2 (Supplemental Table S3). Cellular categories were again dominated by monocytes, T cells, NK cells, B cells and plasmacytoid (p)DCs and are consistent with findings observed with AsA Immunochip-associated genes. In contrast to AsA-associated genes, where we observed large, highly connected clusters, an equivalent cohort of apparently random genes generally formed smaller clusters, exhibited fewer intra- and inter-cluster connections, and were primarily enriched in functional categories mostly related to basic cell function (general cell surface, secreted and ECM) (Fig. 4B,C). As shown in Fig. 4D, which displays the number of genes (and percentage of total genes) assigned to each functional category, random genes are skewed toward general cell function, whereas AsA-associated genes are more prevalent in the overall immune (15.3% of genes), tissue repair (53.4%) and cell stress (7%) categories. The random gene network also lacked evidence of cell movement and the diversity of cellular enrichment identified from AsA SNP-associated genes (Fig. 4B).

Figure 4
figure 4

AsA Immunochip-based pathways are supported by summary GWAS from AsA SLE patients. Using SNP-predicted genes from the AsA GWAS validation SNP-set (A) or an equivalently sized cohort of random genes (B) metastructures were generated based on PPI networks, clustered using MCODE and visualized in Cytoscape. Cluster size indicates the number of genes per cluster, edge weight indicates the number of inter-cluster connections and color indicates the number of intra-cluster connections. Enrichment for each cluster was determined by BIG-C and IPA; clusters were then grouped and categorized according to overall function (immune, tissue repair, metabolic, motility or general). Grey boxes indicate categories lacking relevant clusters. (C) Quantitation of cluster size, intra-cluster connections and inter-cluster connections network is displayed. Error bars represent the 95% confidence interval; asterisks (***) indicate a p-value < 0.001 using Welch’s t-test. (D) Quantitation of AsA GWAS (black bars) and random (red) genes falling into each BIG-C category and grouped by overall functionality.

Validation of AsA-enriched molecular pathways using gene expression data

SNP-based pathway predictions were also tested in differential gene expression analyses from whole blood samples collected from AsA SLE patients with active disease and renal involvement compared with healthy controls (E-MTAB-11191, Supplementary Table S11). Overall, differential expression (DE) analysis revealed 5886 DE genes (DEGs) enriched in functional categories for interferon stimulated genes, gene expression, RNA processing and metabolism (Fig. 4 and Supplemental Fig. S5). A total of 685 AsA and 300 EA SNP-predicted genes were shared with AsA SLE DEGs, and 144 genes, representative of type I and type II interferon signaling, were shared among all three groups (Supplemental Fig. S6). Genes common to AsA DEGs and AsA SNP-predicted genes were enriched in RNA processing and translation, whereas DEGs shared with EA SNP-predicted genes were specifically enriched in type I interferon/cytokine signaling. Connectivity mapping and pathway analysis of AsA DEGs again revealed striking commonality with AsA SNP-associated genes from both the Immunochip and summary GWAS, exemplified by cluster enrichment in RNA splicing/processing, ubiquitylation, chromatin remodeling and metabolic function (glycolysis, TCA cycle, fatty acid biosynthesis and peroxisomes), and canonical pathways for Spliceosome cycle (cluster 2), Inhibition of ARE-mediated RNA degradation (clusters 12, 20 and 25), and Mitochondrial dysfunction (cluster 16) (Fig. 5A and Supplemental Table S9). The distribution of DE genes in AsA for each overall category were also similar to that observed with AsA-associated genes (Immunochip and GWAS) but differed from both the EA-Immunochip predicted or random genes (Fig. 5B).

Figure 5
figure 5

Asian-associated pathways are validated with gene expression data from AsA SLE patients. (A) Using differentially expressed (DE) genes from AsA whole blood samples (E-MTAB-11191), metastructures were generated based on PPI networks, clustered using MCODE and visualized in Cytoscape. Cluster size indicates the number of genes per cluster, edge weight indicates the number of inter-cluster connections and color indicates the number of intra-cluster connections. Enrichment for each cluster was determined by BIG-C and IPA; clusters were then grouped and categorized according to overall function (immune, tissue repair, metabolic, motility [Continued from previous page] or genera cell funtionl. (B) Bar graph showing the precent of associated (EA/AsA immunochip and AsA GWAS), differentially expressed and random genes falling into each overall functional category.

To test the pathway predictions, Gene Set Variation Analysis (GSVA)19 was applied to determine the relative enrichment of gene signatures identified in peripheral blood mononuclear cell (PBMC) samples from SLE patients (EA and AsA) and controls (Supplemental Table S10). In FDAPBMC1, a dataset composed of EA patients (Supplemental Table S11), all 7 IFN gene signatures (IGS) and signatures for the RIG-I pathway and DNA/RNA sensors were strongly enriched in SLE PBMCs compared to controls (Fig. 6A). In contrast, only the signatures for IFNA2, IFNB1, IFNW1 and the Type I core were enriched in SLE PBMCs from AsA patients in GSE81622 (Fig. 6B). GSVA using a random group of genes did not separate SLE from controls in either dataset.

Figure 6
figure 6

SNP-associated pathways inform gene signatures for GSVA analysis in patient PBMC datasets. GSVA enrichment scores were generated for PBMCs in EA and AsA SLE patients and healthy controls from FDAPBMC1 (EA-only patients and controls) and GSE81622 (AsA-only patients and controls). GSVA scores for type I and type II interferon-based gene signatures (A, B), metabolic gene signatures (C, D), cellular processes (E, F) and individual cell type signatures (G, H) are shown. Asterisks (*) indicate a p-value < 0.05 using Welch’s t-test comparing SLE to control.

GSVA enrichment scores using signatures for complement activation and metabolic pathways, including mitochondrial oxidative phosphorylation (OXPHOS), the TCA cycle and glycolysis were able to separate AsA SLE patients, but not EA patients, from healthy controls (Fig. 6C,D). Consistent with our predictions, AsA SLE patients exhibited significant enrichment in signatures for mitochondrial dysfunction and oxidative stress. As shown in Fig. 6E–H, enrichment of a number of pathways and cell types in both EA and AsA SLE cohorts were noted, including TLR signaling, inflammasome, TNF signaling and monocyte/myeloid lineage cells, whereas AsA patients exhibited additional enrichment in pro-cell cycle, low density granulocytes (LDGs) and B cells. Varying degrees of T cell/NK cell lymphopenia were evident in both ancestral populations.

Differential Inflammatory cell metabolism and activation status by ancestry

Because dysfunctional metabolic reprogramming can directly influence and exacerbate defective immune responses, we carried out linear regression analysis between the GSVA scores for individual cell signatures and the glycolysis and oxidative phosphorylation gene signatures to interrogate the metabolic status of immune cells from EA and AsA PBMC datasets. Enrichment scores for the glycolysis gene signature in AsA SLE patients exhibited positive correlation with both the monocyte/myeloid (R2 = 0.14, p = 0.03) and B cell signatures (R2 = 0.12, p = 0.051) (Fig. 7A). In contrast, glycolysis was not associated with gene signatures representing monocyte/myeloid cells, T cells or NK cells from EA patients, whereas the B cell category exhibited significant negative correlation (R2 = 0.26, p = 0.03). The gene signature for oxidative phosphorylation lacked positive correlation with all cell types in AsA patients, but did exhibit significant correlation (R2 = 0.6, p = 0.0003) with T cells in individuals of EA ancestry (Fig. 7A).

Figure 7
figure 7

Linear regression to examine the relationship between cell types, biological processes and inflammatory cytokines. (A) Linear regression analysis showing the relationship between GSVA scores for glycolysis, oxidative phosphorylation or oxidative stress and individual cell types (pDCs, monocyte/myeloid, B cells, T cells and NK cells) for FDAPBMC1 (EA, upper panels) and GSE81622 (AsA, lower panels). In (B), GSVA enrichment scores for the indicated cellular processes were generated for purified CD14+ monocytes from EA and AsA SLE patients (GSE164457). Using GSE164457, linear regression was used to examine the relationship between cellular processes and SLEDAI (C), anti-dsDNA titers in active patients (SLEDAI ≥ 6) (D) and GSVA scores for IFNA2 (E). Categories with linear regression p values < 0.05 are in bold; R2 predictive values are listed after the GSVA enrichment category. *Asterisks indicate significant relationship between functional categories. N.s., not significant.

Since altered immune cell metabolism is associated with heightened oxidative cell stress responses in SLE20, we used the same linear regression approach to test the relationship between different immune cell types and GSVA enrichment scores for oxidative stress (Fig. 7A). Monocyte/myeloid cells from AsA patients were the only cell type demonstrating a significant positive relationship with the gene signature for oxidative stress (R2 = 0.3, p = 0.0008). These results were confirmed in purified CD14+ monocytes isolated from EA and AsA SLE patients (California Lupus Epidemiology Study, CLUES; GSE16445721) (Supplemental Table S11) showing significant enrichment of gene signatures for oxidative stress, glycolysis and IFNA2, along with a trend toward elevated OXPHOS and mitochondrial dysfunction, specifically in AsA SLE patients (Fig. 7B). Given that enhanced oxidative stress and concomitant mitochondrial dysfunction have been shown to correlate with disease activity, DNA damage, and autoantibody production22,23,24, we next examined the potential link between enrichment scores for mitochondrial dysfunction and SLE disease activity index (SLEDAI) score in these patients. GSVA scores for mitochondrial dysfunction demonstrated a significant positive relationship with SLEDAI in AsA but not EA SLE monocytes (Fig. 7C). Similarly, enrichment scores for oxidative stress and mitochondrial dysfunction showed a significant positive correlation with anti-dsDNA titers among AsA SLE patients with active disease (SLEDAI ≥ 6) compared to EA patients (Fig. 7D). In addition, overall complement C3 levels were lower in AsA patients and demonstrated a significant negative correlation with both anti-dsDNA and SLEDAI (Supplemental Fig. S7) in accordance with the observation that complement depletion and anti-dsDNA antibodies are often associated with elevated disease activity25.

Finally, given that mitochondrial dysfunction and nucleic acid sensing are potent inducers of interferons26, we tested for a relationship between IFNA2 enrichment scores and enrichment scores for the mitochondrial dysfunction, DNA/RNA sensors and TLR pathways (Fig. 7E). In EA patients, GSVA scores for the DNA/RNA sensors (R2 = 0.62, p < 0.0001) and TLR pathway (R2 = 0.17, p = 0.0013) signatures, but not mitochondrial dysfunction, were associated with IFNA2. In contrast, all three signatures exhibited a positive, significant correlation with IFNA2 in AsA patients.

Read more here: Source link