rprimer: an R/bioconductor package for design of degenerate oligos for sequence variable viruses | BMC Bioinformatics

Case study

To illustrate and evaluate the functionality of rprimer, two assays targeted at norovirus GI were designed.

Assay A: RT-qPCR for quantitative detection of norovirus GI

The first task was to design a broadly reactive RT-qPCR assay for quantitative detection. For this, we decided to use the ‘ambiguous’ primer design strategy with strict settings for degeneracy (no more than two variants for both primers and probes), but with a high threshold for ambiguous bases (10%). We had to shorten the minimum probe length to 16 to find any potential assays (the default length is 18–22). These options resulted in 498 assay candidates, all within a conserved stretch from position 5324 to 5429 in the input alignment (corresponding to position 5276 to 5381 in the norovirus GI reference sequence, NC_001959.2) (Fig. 3). To select a final candidate, we first filtered the design output dataset upon score, which reduced the number of assays to 24. For these, we ran checkMatch to identify the best candidates regarding both on- and off-target matches. Figure 4 shows the nucleotide distribution at the oligo binding regions of the selected final assay (Assay A), together with the proportion of matching and mismatching target sequences.

Fig. 3
figure 3

Visual summary of the design of Assay A. The workflow and function calls are displayed to the left and the output from the generic plotData function to the right. Top: Consensus profile from the input alignment. High identity in combination with low gap proportion indicate high sequence conservation. Dots show the value at each position and black lines represent centred running averages. The highlighted area indicates target regions of the candidate assays (added after the design was completed). Middle: All oligo candidates. Bottom: All assay candidates

Fig. 4
figure 4

Binding region and match information for Assay A and Assay B. For each assay, the upper plots indicate the nucleotide distribution in the target alignment within the oligo binding regions (left: forward primer; middle: probe; right: reverse primer). The sequence is shown in the same direction as the oligo (i.e. in oligo 5′-3′ direction). The lower plots show the proportion of target sequences in the input alignment that matches perfectly, or with one, two, three or four or more mismatches to the oligo within the intended binding region (on-target match). They also show the proportion of target sequences that match (with maximum of two mismatches) to at least one sequence variant of the oligo outside the intended binding region (off-target match). Assay B was designed using the mixed strategy and the consensus and degenerate part of the primer is indicated. The plots were generated using the plotData function

We evaluated Assay A together with a standardised reference assay (Table 1) [26] on 10 stool samples previously identified as testing positive for norovirus GI. All samples were successfully amplified and quantified using both assays. The assays agreed well in quantification: the estimated concentration was on average 1.1 times higher (95% CI 0.89–1.3 times) with Assay A than with the reference assay (Table 2). Both assays provided acceptable sigmoidal-shaped amplification curves (Fig. 5). Based on the standard curve, the amplification efficiency was 105% for Assay A and 92% for the reference assay (for a well-performing qPCR assay, the value often lies between 95 and 105% [1]). The R2 value was 1.00 for both assays (0.98 or higher is considered acceptable [1]), and the y-intercept (the Cq value corresponding to 1 copy/µl in the sample well) was 40.20 for Assay A and 40.72 for the reference assay.

Table 2 Quantification and genotyping results from the case study
Fig. 5
figure 5

Evaluation of assay performance using 10 stool samples testing positive for norovirus GI. Top: Fluorescence data from RT-qPCR with Assay A (designed using rprimer), in comparison to a reference assay. Each sample was run in duplicate wells. Bottom: Gel picture following RT-PCR with Assay B (designed using rprimer). NTC: no template control. The intended amplicon size is 924 bp

The conserved region described above constitutes the target for most previously published RT-qPCR assays for norovirus GI (e.g. [31,32,33,34,35]. To the best of our knowledge, no previously published assay is identical to Assay A, although some oligos/assays differ by only a few bases.

Assay B: RT-PCR and Sanger sequencing for genotyping of norovirus GI

The next task was to design a conventional RT-PCR assay for genotyping. Norovirus genotyping is ideally performed by sequencing parts of both the polymerase and capsid gene. From an analysis of publicly available sequences, we identified a minimum region of interest (ROI) required to obtain reliable genotyping results. We masked all positions except for 1000 nucleotides on each side of the ROI in the target alignment. The remaining potential primer binding sites were highly variable, and hence we opted for the ‘mixed’ primer design strategy. To provide sufficiently high melting temperature despite any potential 5′ end mismatches, we decided to design relatively long primers (25–40 nt). The maximum allowed degeneracy was set to four, and we opted for amplicon lengths of 300–1000 base pairs.

In the assays generated, we filtered for primers with high 3′ coverage (provided by the coverage variable in the output dataset) and high 5′ conservation (provided by the identity variable). The binding regions of the final candidate primers (Assay B) are shown in Fig. 4. The average coverage was 1 for both the forward and reverse primer, meaning that their 3′ ends matched perfectly to all sequences in the input alignment. Consequently, all mismatches shown in Fig. 4 are located within the 5′ end consensus part of the primers.

To enable amplification of mismatching sequences in the initial amplification rounds, we used a PCR protocol with an annealing temperature of 50 °C for the first 10 cycles, followed by 60 °C for the remaining 30 cycles. We used the consensus parts of the PCR primers as sequencing primers (Table 2) since they matched the amplified products perfectly. All samples were successfully amplified, sequenced and genotyped with Assay B (Fig. 5 and Table 2).

We identified two potential limitations with Assay B. First, one of the sequence variants of the reverse primer is self-complementary at the six terminal bases at the 3′ end (the change in Gibb’s free energy, delta G, for the formation is − 7.12 kcal/mol, as estimated by OligoAnalyzer 3.1). This probably causes primer-dimers (see the gel picture in Fig. 5), which can lower the RT and PCR efficiency. Second, 29% of the target sequences in the input alignment mismatched, with at least four mismatches to the consensus part of the forward primer (Fig. 4). Further optimisation, such as nesting, may therefore be needed for use on low-concentration targets. Overall, this underlines a difficulty in global primer design towards sequence variable targets, i.e. it is not always possible to find conserved binding regions that fulfil all desired design criteria.

We could not find any previous publication using primers identical to those in Assay B. However, one assay for polymerase-capsid-based genotyping of norovirus GI, described by van Beek et al. [36], covers position 4499–5683 of the norovirus GI reference sequence (NC_001959.2). Assay B covers position 4758–5678.

Package performance

To assess package performance, we estimated the time required for a generic workplace laptop to design Assay A and B. Both tasks were completed within a few seconds (Fig. 6, left). We also estimated how long it took to perform checkMatch (a complementary step to the workflow) on all oligos generated from each design process (A, n = 184, B, n = 128) towards all 128 target sequences. The median time to run checkMatch was 1 min and 8 s for A and 31 s for B (Fig. 6, right).

Fig. 6
figure 6

Evaluation of rprimer package performance. The violin plots show the time required to perform each step of the oligo design process in assay A and B, on a generic workplace laptop using a multiple DNA sequence alignment of 128 norovirus GI sequences. The checkMatch function was run with all oligo candidates (A, n = 184, B, n = 128) towards 128 target sequences. Each step was repeated 20 times

These tests were performed on oligos with low degeneracy (a maximum of two variants per oligo was allowed for Assay A and four for Assay B). When higher degeneracy is selected, the time to generate oligos will increase because more regions will qualify as binding sites, and the parameters (Tm, GC-content etc.) will be calculated for more sequence variants. To illustrate this, we changed the degeneracy to 64 (the highest possible value) and re-ran designOligos. For this task, the median time to generate oligos was 20 s for assay A (instead of 1.5 s when the degeneracy was two; Fig. 6) and 4 min and 10 s for assay B (instead of 7.2 s when the degeneracy was two; Fig. 6). With a maximum degeneracy of 64, 691 oligo candidates were identified for A and 4971 for B.

Comparison with other software, strengths and limitations

Direct comparisons with other software are difficult. Many available programs are highly specialised to address specific needs, and there was an inevitable risk of bias when comparing a tool we developed ourselves with tools that we had not used previously. With these limitations in mind, we assessed rprimer together with three other freely available programs: Gemi [10], DECIPHER [16, 17], and openPrimeR [18] (the two latter are R packages). We imported all 128 norovirus GI sequences used for designing the assays in this study, in the format requested by the respective software. To obtain comparable results, we tried to mimic the design settings used for Assay A as much as possible. We did not attempt to replicate the design of Assay B because none of the other programs provided the option to generate partially degenerate primers.

Gemi and DECIPHER designed assays rapidly (within a few seconds) and identified the same region as Assay A as the top candidate (although designing assays with probes was not possible with DECIPHER). For DECIPHER, the top-scoring forward primer was identical to Assay A and the reverse primer differed by only a few bases. For our purposes, both Gemi and DECIPHER were less extensive than rprimer, but enabled specification of e.g. primer and amplicon length and the maximum number of degenerate positions (Gemi) or degeneracy (DECIPHER, referred to as permutations).

openPrimeR is also designed to sequence variable targets and aims to find the minimal number of multiplex compatible primers required to amplify all specified target sequences of interest. rprimer is not designed to generate multiplex primers and does not attempt to minimise the number of primers in the same sense. Instead, it calculates the degeneracy of each oligo obtained from the IUPAC consensus sequence, which may result in oligos with redundant sequence variants (i.e. full coverage could have been achieved with fewer sequence variants). In addition, openPrimeR is more extensive than rprimer and evaluates the potential for e.g. self-dimerisation and secondary structure formation.

With openPrimeR, we first attempted to design primers from all 128 (unaligned) full-length genomes, but interrupted the process after several hours. Based on this, we decided to design only forward primers, from a subset of five randomly selected genomes (and only from a 500 nt region covering the most conserved part of the genome). We decided to allow a maximum of mismatches and not more than two primer sequence variants for full coverage. The process took about 10 min and the top scoring primer covered position 5318–5335 in the norovirus reference sequence (NC_001959.2), which is similar to the probe of Assay A (position 5319–5334, Table 1). However, it should be noted that openPrimeR (with the option we selected) aligns target sequences as part of the primer design procedure, whereas our software uses aligned sequences as input.

A strength of rprimer lies in its simplicity – the approach to oligo and assay design is intuitive and widely established. The workflow consists of several steps and the user has complete control over the parameters that constrain the design. When no oligos or assays are generated, the user can simply go back and adjust (relax) the parameters. To verify that the oligos have been generated correctly, the user can check how the oligos match each target sequence (checkMatch) and examine the intended binding region in the input alignment (plotData; see Fig. 4).

Compared with other software, additional strengths of rprimer are its capacity for visualisation (exemplified in Figs. 3, 4) and its flexibility in oligo design. For instance, rprimer can design assays with probes (possible with Gemi, but not with the two other programs), provides the possibility to set a threshold value for degenerate bases (not possible with any of the three programs) and gives two options for primer design (fully or partially degenerate, not possible with any of the three programs). By using partially degenerate primers, we were able to amplify and sequence a highly variable region of the norovirus GI genome. Moreover, both openPrimeR and DECIPHER recommend/require external third-party software, while rprimer has no such dependencies, which makes installation easier.

The most apparent limitation with rprimer is the inability to check for primer-dimer and hairpin formation, which openPrimeR can do. Moreover, rprimer checks for off-target interactions within the target alignment but cannot analyse them in reference datasets, which DECIPHER can do. To tackle these limitations, we made great efforts to select the most widely-adopted data structures as inputs and outputs, to promote interoperability with other software capable of performing these or other additional analyses.

Intended use

In addition to norovirus GI, we have used rprimer to develop oligos and assays for several viral species with different sequence variability, ranging from those with moderate variability (e.g. hepatitis A virus) to extremely high variability (e.g. hepatitis C virus). We did not evaluate these assays in the laboratory, but compared them with previously published assays. This showed that the assays generated by rprimer were similar to several widely used assays, confirming the accuracy of rprimer (unpublished data). Note, however, that rprimer is not recommended for highly conserved target species, and we have not evaluated the tool for other targets than viruses.

Future development

For future versions of rprimer, we plan to implement calculations of delta G for possible self- and heterodimer structures for candidate oligos and assays, as an integral part of the oligo design process. We also intend to further evaluate and update the oligo and assay scoring or filtering system.

Read more here: Source link