extendedSequence length does not match the 34 bp required by DeepCpf1

Hi,

I’m using the CRISPRseek development version 1.35.2, installed from GitHub (hukai916/CRISPRseek).

I wanted to calculate the CFD score and the gRNA efficacy of a Cas12 sgRNA (file my_sgrna.fa) using DeepCpf1.

my_sgrna.fa contains the PAM (TTTT) followed by the 20 bp sgRNA:

>sgrna1
TTTTTGTCTTTAGACTATAAGTGC

Command:

offTargetAnalysis(inputFilePath = "my_sgrna.fa",
                  format = "fasta",
                  header = FALSE,
                  exportAllgRNAs = "fasta",
                  findgRNAs = FALSE,
                  findgRNAsWithREcutOnly = FALSE,
                  findPairedgRNAOnly = FALSE,
                  annotatePaired = FALSE,
                  annotateExon = TRUE,
                  scoring.method = "CFDscore",
                  min.score = 0,
                  topN = 10,
                  topN.OfftargetTotalScore = 10,
                  calculategRNAefficacyForOfftargets = FALSE,
                  PAM = "TTTN",
                  PAM.pattern = "^TTTN",
                  PAM.location = "5prime",
                  allowed.mismatch.PAM = 2,
                  baseBeforegRNA = 8,
                  PAM.size = 4,
                  gRNA.size = 20,
                  baseAfterPAM = 26,
                  overlap.gRNA.positions = c(19, 23),
                  subPAM.position = c(1,2),
                  rule.set = "DeepCpf1",
                  chromToSearch = c("chr1"),
                  max.mismatch = 4,
                  BSgenomeName = BSgenome.Hsapiens.UCSC.hg19,
                  txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
                  orgAnn = org.Hs.egSYMBOL,
                  enable.multicore = TRUE,
                  n.cores.max = 1,
                  outputDir = ".",
                  overwrite = TRUE)

The log ends with a warning that the extended sequence does not have the required length:

Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
Loading required package: IRanges
Loading required package: XVector
Attaching package: ‘Biostrings’
Loading required package: BSgenome
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: rtracklayer
Attaching package: ‘rtracklayer’
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase

[...]

Validating input ...
>>> Finding all hits in sequence chr1 ...
>>> DONE searching
Building feature vectors for scoring ...
Calculating scores ...
Annotating, filtering and generating reports ...
Done annotating
Add RE information...
write gRNAs to bed file...
Scan for REsites in flanking region...
Done. Please check output files in directory 
 ./ 
Warning message:
In deepCpf1(extendedSequence = extendedSequence, chrom_acc = chrom_acc) :
  None of the extendedSequences has length of 34 which is required for DeepCpf1 algorithm!

Summary.xls reports NA for the gRNA efficacy:

names                            grna1
forViewInUCSC                    chr1:212469989-212470012
extendedSequence                 GTGCCTGCTTTTTGTCTTTAGACTATAAGTGCTTTGAGACCCAAGACCATATTTTCCT
gRNAefficacy                     NA
gRNAsPlusPAM                     TTTNTGTCTTTAGACTATAAGTGC
top5OfftargetTotalScore          1.19845
top10OfftargetTotalScore         1.369338
top1Hit.onTarget.MMdistance2PAM  NMM
topOfftarget1MMdistance2PAM      2,11,12,17
topOfftarget2MMdistance2PAM      2,4,9,17
topOfftarget3MMdistance2PAM      2,4,9,19
topOfftarget4MMdistance2PAM      1,2,5,15
topOfftarget5MMdistance2PAM      5,9,10,20
topOfftarget6MMdistance2PAM      4,9,13,19
topOfftarget7MMdistance2PAM      2,9,11,13
topOfftarget8MMdistance2PAM      12,13,14,20
topOfftarget9MMdistance2PAM      9,11,13,20
topOfftarget10MMdistance2PAM     4,13,14,19
REname                           (empty)
uniqREin200                      (empty)
uniqREin100                      (empty)

This happens because the extendedSequence is 8 bases + 4 bases (PAM) + 20 bases (sgRNA) + 26 bases = 58 bases, rather than the 4 + 4 + 20 + 6 = 34 bases required for DeepCpf1 to work:

  • GTGCCTGC prePAM
  • TTTT PAM
  • TGTCTTTAGACTATAAGTGC sgRNA
  • TTTGAGACCCAAGACCATATTTTCCT postsgRNA
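
The breakdown above can be reproduced in base R. This is a minimal sketch that only uses the sequence and segment widths reported in this post; the variable names are illustrative and nothing here calls CRISPRseek.

```r
# Observed extendedSequence, copied from the Summary.xls row above.
ext_seq <- "GTGCCTGCTTTTTGTCTTTAGACTATAAGTGCTTTGAGACCCAAGACCATATTTTCCT"

# Segment widths as they appear in the output: prePAM, PAM, sgRNA, postsgRNA.
widths <- c(prePAM = 8, PAM = 4, sgRNA = 20, postsgRNA = 26)

# Turn the widths into 1-based start/end coordinates and cut the string.
ends   <- cumsum(widths)
starts <- ends - widths + 1
segments <- substring(ext_seq, starts, ends)
names(segments) <- names(widths)
print(segments)

# Compare the observed length with the 34 nt named in the warning.
observed_length <- nchar(ext_seq)
required_length <- 34
cat("observed:", observed_length, "required:", required_length, "\n")
```

The segments sum to 58 nt, which is why none of the extended sequences passes the 34 nt length check in the warning.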

According to the manual, the following flags in the command should produce a 34-base extended sequence, but they don’t:

                  baseBeforegRNA = 8, # 4 + PAM (4)
                  PAM.size = 4,
                  gRNA.size = 20,
                  baseAfterPAM = 26,  # sgRNA (20) + 6

which I expected to sum to 4 + 4 + 20 + 6 = 34.
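
For reference, the 34 nt window I expected can be cut out of the 58 nt sequence by hand. The sketch below assumes the 4 + 4 + 20 + 6 layout described in this post; `trim_to_deepcpf1_window` is a hypothetical helper, not a CRISPRseek function, and I have not checked whether scoring the trimmed sequence gives a valid DeepCpf1 efficacy.

```r
# Hypothetical helper (not part of CRISPRseek): cut a
# flank5 + PAM + gRNA + flank3 window out of a longer extended sequence,
# given how many bases precede the PAM in that sequence.
trim_to_deepcpf1_window <- function(ext_seq, bases_before_pam,
                                    pam_size = 4, grna_size = 20,
                                    flank5 = 4, flank3 = 6) {
  start <- bases_before_pam - flank5 + 1
  end   <- bases_before_pam + pam_size + grna_size + flank3
  # Fail early if the requested window does not fit inside the sequence.
  stopifnot(start >= 1, end <= nchar(ext_seq))
  substr(ext_seq, start, end)
}

# 58 nt extendedSequence from Summary.xls; 8 bases precede the PAM.
ext_seq <- "GTGCCTGCTTTTTGTCTTTAGACTATAAGTGCTTTGAGACCCAAGACCATATTTTCCT"
window34 <- trim_to_deepcpf1_window(ext_seq, bases_before_pam = 8)

print(window34)
print(nchar(window34))  # 4 + 4 + 20 + 6 = 34
```

This only illustrates which bases the 34 nt window would contain; it does not explain why the flags are ignored.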

Is this a bug, or am I doing something wrong?

UPDATE
Actually, the values set for baseBeforegRNA and baseAfterPAM do not seem to matter: setting both to 1, for example, yields the same result and the same extendedSequence as above, so there must be a bug.

Thanks a lot,
Miguel
