Plasmodium falciparum is evolving to escape malaria rapid diagnostic tests in Ethiopia

Study design and data collection

We performed a cross-sectional, multisite study in 11 districts along Ethiopia’s borders with Eritrea, Sudan and South Sudan, located within three of its nine administrative regions. On average, ten health facilities were selected from each district, including four districts of Amhara Region (northwest Ethiopia), six districts of Tigray Region (north Ethiopia) and one district of Gambella region (southwest Ethiopia) during the 2017–2018 peak malaria transmission season (September–December, although enrolment in Gambella was completed in April 2018) (Fig. 1). Per WHO protocol30, each facility passively enroled participants presenting with symptoms of malaria (fever, headache, joint pain, feeling cold, nausea and/or poor appetite), with sample size proportionally allocated to each facility based on the previous year’s malaria case load. All participants provided informed consent, participated in interview questionnaires, underwent blood collection for RDT testing using two types of RDT and were treated according to national guidelines. Data were double-entered into Epi Info (v.3.2), and discrepancies resolved using original paper forms by consensus. Ethical approval was obtained from the Ethiopia Public Health Institute (EPHI) Institutional Review Board (IRB; protocol EPHI-IRB-033-2017) and WHO Research Ethics Review Committee (protocol ERC.0003174 001). Processing of de-identified samples and data at the University of North Carolina at Chapel Hill (UNC) was determined to constitute non-human subjects research by the UNC IRB (study 17-0155). The study was determined to be non-research by the Centers for Disease Control (CDC) and Prevention Human Subjects office (0900f3eb81bb60b9). Experiments were performed in accordance with relevant guidelines and regulations.

Field sample evaluation

Study participants were evaluated using both a CareStart Pf/Pv (HRP2/Pv-pDH) RDT (Access Bio, catalogue no. RM VM-02571) and an SD Bioline Malaria Ag P.f. (HRP2/Pf-LDH) RDT (Alere, catalogue no. 05FK90). For the CareStart RDT, 5 µl of capillary whole blood was collected by fingerprick and transferred to the RDT sample well, along with 60 µl of buffer solution. Results were read at 20 min. The SD Bioline RDT followed the same protocol, but with four drops of buffer added and the results read in a 15–30 min window. Participants testing positive by either RDT were first prescribed treatment, according to Ethiopian national guidelines50.

Cases with any positive HRP2 or Pf-LDH RDT band were considered positive for P. falciparum malaria. Cases that were Pf-LDH-positive but HRP2-negative on both RDTs were considered potential candidates for pfhrp2/3 gene deletion and defined as ‘discordant’. These participants, along with a subset of HRP2-positive and HRP2-negative controls, provided further informed consent for additional blood collection for dried blood spot (DBS) preparation. At least two DBS samples (50 µl per spot) were collected on Whatman 903 protein saver cards (GE Healthcare; catalogue no. 10534612) from consenting participants. DBS samples were stored in plastic bags with desiccant. A randomly selected subset of DBS samples was sent for molecular analysis to UNC and for serological analysis to the CDC.

DNA extraction and PCR assays

DNA was extracted from three 6 mm punches per DBS sample using Chelex-100 (Bio-Rad, catalogue no. 1422822) and saponin (MilliporeSigma, catalogue no. 47036-250G-F) as described previously51. Quantitative PCR (qPCR) assays were first performed in duplicate for the P. falciparum lactate dehydrogenase gene (pfldh)52. To avoid the risk of misclassification due to DNA concentrations below the limit of detection for pfhrp2/3 PCR assays, further analysis was restricted to samples with >100 parasites per µl by qPCR (Extended Data Fig. 1)13. PCR assays targeting exon 2 of pfhrp2 and pfhrp3 were then performed in duplicate as described previously6, except that PCR reactions were performed as single-step, 45-cycle assays, using 10 µl of template and AmpliTaq Gold 360 Master Mix (Thermo Fisher Scientific, catalogue no. 4398813) in a 25 µl reaction volume. In addition to no-template and P. falciparum 3D7 strain (pfhrp2+/3+) positive controls, pfhrp2 assays included an additional DD2 strain (pfhrp2−/3+) control and pfhrp3 assays included an additional HB3 strain (pfhrp2+/3−) control. Finally, an additional single-copy gene, real-time PCR assay targeting P. falciparum β-tubulin was performed to confirm that sufficient parasite DNA remained in samples with a negative pfhrp2/3 PCR result13. Pfhrp2/3 genotyping calls were made in samples with pfldh qPCR parasitaemia >100 parasites per µl to avoid misclassification in the setting of amplification failure due to low target DNA concentration. A pfhrp2 or pfhrp3 positive call required one or more replicate with distinct band(s) with the expected fragment length. A negative call required both pfhrp2 or pfhrp3 replicates to be negative. Detailed reaction conditions for all PCR assays are described in the Supplementary File.

Serological assays

The presence of HRP2, pan-LDH and aldolase antigenaemia was assessed in a subset of DBS samples (single 6 mm punch) using a multiplex bead-based immunoassay exactly as described previously12. Within this multiplex assay, capture and detection antibodies against the HRP2 antigen would also recognize similar epitopes on the HRP3 antigen, so unique signals for these two antigens cannot be obtained.

Prevalence estimates

We estimated the prevalence of P. falciparum infections expected to have false-negative HRP2-based RDT results due to pfhrp2 deletions as follows. First, we calculated the proportion of all RDT-positive P. falciparum cases (HRP2+ or Pf-LDH+ on any RDT) with the discordant RDT profile (HRP2− on both RDTs, but Pf-LDH+), overall and by region. Second, we calculated the observed concordance between the discordant RDT profile and a pfhrp2-negative PCR call, overall. Prevalence estimates and 95% CIs were then back-transformed overall and by region using the ci.impt function within the asbio R package (v.1.5-5), which generates CI values for the product of two proportions using delta derivation. This allowed us to estimate with confidence the proportion of P. falciparum infections with both pfhrp2 deletions and false-negative HRP2-based RDT results, overall and by region. As a sensitivity analysis, we also estimated the proportion of those with a discordant RDT and a pfhrp2-negative PCR call (directly multiplying the true proportion of P. falciparum-positive individuals with a discordant RDT profile, overall and by region, by 0.727, or the overall proportion of discordant RDT samples that had a pfhrp2− PCR result). The 95% CI values were then generated using bootstrapping (1,000 iterations). The prevalence estimates and CI values generated by the two approaches were similar (Supplementary Table 3).

Pfhrp2/3 molecular inversion probe (MIP) development

Pfhrp2, pfhrp3 and the flanking regions within a 100 kb window surrounding each gene were targeted for MIP designs using MIPTools53. A tiled design strategy was employed that involved multiple, overlapping probes spanning each gene target. Twenty-two genes flanking pfhrp2 and 31 genes flanking pfhrp3 were used in the design, of which 11 and 19 were successful on the first design try, respectively. A second attempt was not made for designs for the flanking genes. A total of 241 probes were designed: 9 for pfhrp2, 9 for pfhrp3 and 223 for the flanking genes. MIPs were designed using the 3D7 (v.3) reference genome avoiding hybridization arms in variant regions when possible. Eighty alternative probes accommodating potential variants in the highly variable pfhrp2 and pfhrp3 genes were also created. A 15.5 kb segment centromeric to pfhrp3 on chromosome 13 between positions 2,792,000 and 2,807,500 is duplicated on chromosome 11 between positions 1,918,007 and 1,933,488, with 99.4% sequence identity. Therefore, the target genes falling into this region were multicopy genes and their probes were designed to bind to both loci on the genome (see Supplementary Table 5 for the design overview including all genes targeted, MIPs designed and genomic coordinates). Probes were ordered from Integrated DNA Technologies as 200 pmol ultramer oligos. Probe sequences are provided in the Supplementary Table 6.

MIP capture and deep sequencing of clinical samples

All DNA samples extracted by UNC underwent MIP capture using the capture and amplification methods exactly as described by Verity et al.54, with the exception of oligonucleotides (the pfhrp2/3 MIP oligonucleotide panel described above was used) and controls (we selected a different set of controls that are informative for pfhrp2/3 deletion characterization). All MIP captures included multiple controls: 3D7 (pfhrp2+/3+), DD2 (pfhrp2−/3+), HB3 (pfhrp2+/3−) laboratory strains; as well as low- and high concentration mixes (1% HB3, 10% DD2, 89% 3D7) at densities of 250 and 1,000 parasites per µl, respectively. Samples were sequenced on the Illumina NextSeq 550 instrument using 150 bp paired-end sequencing and dual indexing.

Subtelomeric profiling and variant calling with MIP data

Read mapping and variant calling were carried out using MIPTools (v. MIPTools uses the MIPWrangler algorithm (v.1.2.0)55 to create high-quality consensus sequences from sequence read data utilizing unique molecular indexes (UMIs) of MIPs, maps those sequences to the reference genome using bwa (v.0.7.17) and removes off-target sequences as described previously33,54. Deletion calls were limited to samples that had high coverage to avoid false positives. Considering the high frequency of large deletions present in the sample set, the coverage threshold was based on a subset of probes that were present on >60% of the samples, none of which overlapped with the chromosome 8 or 13 deletions. Samples with a median coverage of fewer than five UMIs for this subset of probes were excluded from analysis.

Structural profiling was performed using the UMI count table (Supplementary Table 7). The count table was converted to a presence/absence table such that if a probe had more than one UMI for a given sample, it was accepted as present (that is, not deleted). Samples were clustered into subtelomeric structural profile groups based on this table using the hierarchical clustering algorithm AgglomerativeClustering of the Python module Scikit-learn (v.0.20)56 using only the regions involved in the deletion events of the corresponding chromosome (position >1,372,615 for chromosome 8 and position > 2,806,319 for chromosome 13). Samples were grouped into their final subtelomeric structural profile based on visual inspection of the resulting clusters.

Initial variant calls were made using freebayes (v1.3.1) via MIPTools with the following options:–pooled-continuous–min-base-quality 1–min-alternate-fraction 0.01–min-alternate-count 2–haplotype-length -1–min-alternate-total 10–use-best-n-alleles 70–genotype-qualities. Variants were processed using MIPTools to filter for: variant quality >1, genotype quality >1, average alternate allele quality >15, minimum depth >2 UMIs; and make final genotype calls based on the major allele (within-sample allele frequency >0.5). In addition, the following variants were removed from the final call set: those that were observed as a major allele in less than two samples (singletons), not supported by more than two UMIs in at least three samples, present on multicopy genes, and indels. Variant calls were further filtered for missingness to avoid imputation in EHH calculations: samples missing calls for >50% of the variants were removed, variants missing calls in >50% of the samples were removed. Variants calls were converted and.hap files (Supplementary Table 8) for use with the rehh package (v3.1.2) in R.

Assessment of MIP calls using whole-genome sequencing

We performed WGS on a subset of samples selected by convenience to assess the accuracy of MIP pfhrp2/3 deletion calls. DNA extracted from samples with discordant RDT results were selected for P. falciparum selective whole-genome amplification (sWGA) and whole-genome sequencing exactly as described previously31. In brief, DNA was first subjected to two separate sWGA reactions using the Probe_10 primer set described by Oyola et al57 and the JP9 primer set31. sWGA products were then pooled in equal volumes and acoustically sheared using a Covaris E220 instrument before to sequencing library preparation using Kappa Hyper library preps (Roche, catalogue no. KK8504). Indexed libraries were then pooled and sequenced on an Illumina HiSeq 4000 instrument using 150 bp, paired-end sequencing. Sequencing reads were deposited into NCBI’s Sequence Read Archive (PRJNA742125).

Published whole-genome sequencing data retrieval

Fastq files from 25 Ethiopian samples included in the MalariaGEN genome variation project19 and three laboratory strains (3D7, HB3 and DD2) from MalariaGEN genetic crosses project58 were downloaded from the European Nucleotide Archive using fasterq-dump (v.2.10.8) and sample accession numbers on 19 September 2020 (Supplementary Table 9).

WGS data analysis

All fastq files were processed as follows. Adaptor and quality trimming was performed using Trimmomatic (v.0.39) with the recommended options (seed mismatches:2, palindrome clip threshold:30, simple clip threshold:10, minAdapterLength:2, keepBothReads LEADING:3 TRAILING:3 MINLEN:36). Trimmed fastq files were mapped to 3D7 reference genome (v.3.0) concatenated to human genome (hg38, downloaded from the US National Institutes of Health National Center for Biotechnology and Information database on 2 December 2015, and available at to avoid incorrect mapping of reads originating from host DNA using bowtie2 (v.2.3.0) with the ‘–very-sensitive’ option. Reads mapping to the parasite chromosomes were selected and optical duplicates were removed using the sambamba (v.0.7.1) view and markdup commands, respectively. Read coverage was calculated using samtools (v.1.9) depth command with options ‘-a -Q1 -d0’, filtering reads with mapping quality of zero. Variants were called only for the regions of interest using freebayes (v.1.3.1) with the following options: ‘–use-best-n-alleles 70–pooled-continuous–min-alternate-fraction 0.01–min-alternate-count 2–min-alternate-total 10–genotype-qualities–haplotype-length -1–min-mapping-quality 15 -r region’. Regions of interest were from 300 kb centromeric to the deletions to chromosome ends (positions 1,074,000 to 1,472,805 and 2,505,000 to 2,925,236 for chromosomes 8 and 13, respectively).

Variants were filtered for: variant quality >20, genotype quality >15, average alternate allele quality >15, minimum depth >4 reads. In addition, the following variants were removed from the final call set: those that were never observed as a major allele in any sample, not supported by more than ten reads in at least one sample, and indels. Final genotype calls were based on the major allele (within-sample allele frequency >0.5). Variant calls were further filtered for missingness to avoid imputation in EHH calculations: samples missing calls for >95% of the variants were removed, variants missing calls in >10% of the samples were removed.

Telomeric profiling of the published genomes was carried out by visual inspection of depth-of-coverage plots (Extended Data Figs. 8 and 9). Summary statistics were generated (Supplementary Table 10) using the Python pandas module (v.0.23).

Statistical and population genetic analysis

Data collected during the participant’s study visit (clinical data and RDT results) were linked to laboratory results via the barcode number transcribed on DBS sent to the UNC and CDC laboratories. Samples in the dataset with missing or duplicate barcodes were arbitrated using original paper questionnaires by the EPHI data centre. An analysis dataset that included both PCR and field data was created including all samples that we could confidently merge by both barcode number and region label.

Statistical analysis was performed using R (v.3.6.0, R Core Team; All 95% CI values were generated using one-sample proportions test with Yates’ continuity correction (R package binom.confint, v.1.1-1), and unweighted Cohen’s kappa estimates were generated using the psych (v.1.8.1) and epiR (v.1.0-15) packages. Lilliefors test was used to test for normality, and P values calculated using chi-squared test for sex and the Kruskal–Wallis test for age and parasite density. ArcGIS (desktop v.10.5, ESRI) was utilized for mapping, with additional annotation performed using PowerPoint (v.16.31, Microsoft).

EHH statistics were calculated to evaluate the regions flanking the pfhrp2 and pfhrp3 genes for signatures of recent positive selection36 using the rehh package (v.3.1.2)59. EHH statistics were calculated using the data2haplohh and calc_ehh functions, haplotype furcations were calculated using calc_furcation, and plots were generated using the package’s plot function and annotated using Inkscape (v.0.92).

Complexity of infection (COI) for each sample was calculated using McCOILR (v.1.3.0,, an Rcpp wrapper for THE REAL McCOIL60 with the options maxCOI = 25, totalrun = 2,000, burnin = 500, M0 = 15, err_method = 3. The same variant set used in the EHH analysis was used for the COI calculations, except that variants whose within-sample allele frequency were between 0.05 and 0.95 were called heterozygote for COI analysis.

Statistics and reproducibility

Sample sizes were chosen based on the WHO protocol30. Sources of data and samples included in the study are outlined in Fig. 2. DBS samples were only collected from a subset of participants based on the WHO protocol. Molecular, immunological and sequencing assays were performed on random subsets selected by EPHI. Most analyses were limited to samples that could be matched unambiguously across datasets. For example, any DBS samples found to have identical participant IDs were excluded from analysis. Similarly, DBS labelled with a participant and region ID that did not match clinical data were excluded from most analyses. These accounted for a minority of participants. Discordances in participant IDs and DBS sample labels were resolved whenever possible.

Field staff were not blinded to malaria RDT results because they were used to inform clinical care according to national guidelines. Pfhrp2/3 deletion calls using MIP sequencing were made by an investigator who was blinded to clinical data (including RDT results), HRP2 immunoassay results and pfhrp2/3 deletion calls using PCR.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Read more here: Source link