Control: found -1 38.90+dfsg-1
Control: tag -1 confirmed

Hi all,

Andreas Tille, on 2021-09-30:
> On Thu, Sep 30, 2021 at 01:22:23PM -0400, Robert wrote:
> > The bbmap package does not ship the needed resource files, which causes
> > some of the included tools not to work; e.g. bbduk, when trying to
> > process some fastq data, crashes with output like [1].
>
> Thanks a lot for the report.  It's extremely helpful since several of our
> maintainers are not using this software and we really need to rely on
> user input.

Thank you Robert!  Your report is very useful indeed!

[…]
> > $ bbduk.sh in1=fwd.fastq in2=rev.fastq ktrim=r k=21 mink=8 hdist=2 ftm=5
> > tpe tbo threads=48 out=out.fastq
> > java -ea -Xmx76702m -Xms76702m -cp /usr/share/java/bbmap.jar jgi.BBDuk
> > in1=fwd.fastq in2=rev.fastq ktrim=r k=21 mink=8 hdist=2 ftm=5 tpe tbo
> > threads=48 out=out.fastq
> > Executing jgi.BBDuk [in1=fwd.fastq, in2=rev.fastq, ktrim=r, k=21, mink=8,
> > hdist=2, ftm=5, tpe, tbo, threads=48, out=out.fastq]
> > Version 38.90
> >
> > Set threads to 48
> > maskMiddle was disabled because useShortKmers=true
> > Warning! Cannot find primes.txt.gz
> > /tmp/bbduk_test/file:/usr/share/java/bbmap.jar!/primes.txt.gz
> > at jgi.BBDuk.main(BBDuk.java:78)
>
> If we could turn this into a test I could upload including test.

Andreas, I pulled some data files from python-biopython-doc, and
I think I managed to reproduce the problem on my end:

    $ bbduk.sh in1=/usr/share/doc/python-biopython-doc/Tests/Quality/example.fastq in2=/usr/share/doc/python-biopython-doc/Tests/Quality/solexa_example.fastq ktrim=r k=21 mink=8 hdist=2 ftm=5 tpe tbo threads=48 out=out.fastq
    java -ea -Xmx7195m -Xms7195m -cp /usr/share/java/bbmap.jar jgi.BBDuk in1=/usr/share/doc/python-biopython-doc/Tests/Quality/example.fastq in2=/usr/share/doc/python-biopython-doc/Tests/Quality/solexa_example.fastq ktrim=r k=21 mink=8 hdist=2 ftm=5 tpe tbo threads=48 out=out.fastq
    Executing jgi.BBDuk [in1=/usr/share/doc/python-biopython-doc/Tests/Quality/example.fastq, in2=/usr/share/doc/python-biopython-doc/Tests/Quality/solexa_example.fastq, ktrim=r, k=21, mink=8, hdist=2, ftm=5, tpe, tbo, threads=48, out=out.fastq]
    Version 38.93

    Set threads to 48
    maskMiddle was disabled because useShortKmers=true
    Warning! Cannot find primes.txt.gz /home/emollier/tmp/bbduk_test/file:/usr/share/java/bbmap.jar!/primes.txt.gz
    java.lang.Exception
        at dna.Data.findPath(Data.java:1247)
        at dna.Data.findPath(Data.java:1194)
        at shared.Primes.fetchPrimes(Primes.java:167)
        at shared.Primes.<clinit>(Primes.java:177)
        at kmer.ScheduleMaker.<clinit>(ScheduleMaker.java:155)
        at jgi.BBDuk.<init>(BBDuk.java:964)
        at jgi.BBDuk.main(BBDuk.java:78)
    Exception in thread "main" java.lang.ExceptionInInitializerError
        at kmer.ScheduleMaker.<clinit>(ScheduleMaker.java:155)
        at jgi.BBDuk.<init>(BBDuk.java:964)
        at jgi.BBDuk.main(BBDuk.java:78)
    Caused by: java.lang.NullPointerException
        at fileIO.ByteFile.<init>(ByteFile.java:43)
        at fileIO.ByteFile1.<init>(ByteFile1.java:98)
        at fileIO.ByteFile1.<init>(ByteFile1.java:94)
        at shared.Primes.fetchPrimes(Primes.java:169)
        at shared.Primes.<clinit>(Primes.java:177)
        ... 3 more

I tested the patch provided by Robert and applied by Andreas, and
with it the processing gets much further.

For the autopkgtest, note that I had to pick a dataset with the
same dimensions in both files; otherwise the processing fails,
presumably because of intrinsic inconsistencies in the data:

    $ bbduk.sh in1=/usr/share/doc/python-biopython-doc/Tests/Quality/wrapping_as_sanger.fastq in2=/usr/share/doc/python-biopython-doc/Tests/Quality/wrapping_as_solexa.fastq ktrim=r k=21 mink=8 hdist=2 ftm=5 tpe tbo threads=48 out=out.fastq
    java -ea -Xmx7140m -Xms7140m -cp /usr/share/java/bbmap.jar jgi.BBDuk in1=/usr/share/doc/python-biopython-doc/Tests/Quality/wrapping_as_sanger.fastq in2=/usr/share/doc/python-biopython-doc/Tests/Quality/wrapping_as_solexa.fastq ktrim=r k=21 mink=8 hdist=2 ftm=5 tpe tbo threads=48 out=out.fastq
    Executing jgi.BBDuk [in1=/usr/share/doc/python-biopython-doc/Tests/Quality/wrapping_as_sanger.fastq, in2=/usr/share/doc/python-biopython-doc/Tests/Quality/wrapping_as_solexa.fastq, ktrim=r, k=21, mink=8, hdist=2, ftm=5, tpe, tbo, threads=48, out=out.fastq]
    Version 38.93

    Set threads to 48
    maskMiddle was disabled because useShortKmers=true
    0.018 seconds.
    Initial:
    Memory: max=7486m, total=7486m, free=7467m, used=19m

    ****** WARNING! A KMER OPERATION WAS CHOSEN BUT NO KMERS WERE LOADED. ******
    ****** YOU NEED TO SPECIFY A REFERENCE FILE OR LITERAL SEQUENCE. ******

    Input is being processed as paired
    Changed from ASCII-33 to ASCII-64 on input 9: 57 -> 26
    Started output streams:  0.032 seconds.
    Processing time:         0.148 seconds.

    Input:                   6 reads              820 bases.
    FTrimmed:                4 reads (66.67%)     10 bases (1.22%)
    Trimmed by overlap:      0 reads (0.00%)      0 bases (0.00%)
    Total Removed:           0 reads (0.00%)      10 bases (1.22%)
    Result:                  6 reads (100.00%)    810 bases (98.78%)

    Time:                    0.182 seconds.
    Reads Processed:         6      0.03k reads/sec
    Bases Processed:         820    0.00m bases/sec

Maybe this can be used as a stub for autopkgtest?

By the way, this problem is also reproducible in bullseye.

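To make the idea a bit more concrete, here is a rough sketch of what
such a stub might look like.  It is only a sketch under several
assumptions: the helper names log_is_clean and run_smoke_test are
made up for illustration, python-biopython-doc would have to be
declared as a test dependency, and I lowered threads to 1 on the
guess that test beds have few cores.  Since I have not verified how
bbduk.sh reports failures through its exit status, the script also
scans the log for the error signatures seen in this bug.

```shell
#!/bin/sh
# Hypothetical autopkgtest stub; helper names are illustrative only.

# Sample inputs taken from the successful run above
# (assumes python-biopython-doc is installed on the test bed).
DATA=/usr/share/doc/python-biopython-doc/Tests/Quality
IN1="$DATA/wrapping_as_sanger.fastq"
IN2="$DATA/wrapping_as_solexa.fastq"

# Return 0 if the log contains none of the failure signatures
# observed in this bug report, 1 otherwise.
log_is_clean() {
    if grep -q -e 'Cannot find primes.txt.gz' \
               -e 'ExceptionInInitializerError' \
               -e 'NullPointerException' "$1"; then
        return 1
    fi
    return 0
}

# Run bbduk.sh with the options from the report (threads lowered to 1,
# an assumption) and fail on a non-zero exit status or a dirty log.
run_smoke_test() {
    workdir=$(mktemp -d) || return 1
    bbduk.sh in1="$IN1" in2="$IN2" ktrim=r k=21 mink=8 hdist=2 ftm=5 \
        tpe tbo threads=1 out="$workdir/out.fastq" >"$workdir/log" 2>&1
    status=$?
    cat "$workdir/log"
    if [ "$status" -eq 0 ] && log_is_clean "$workdir/log"; then
        result=0
    else
        result=1
    fi
    rm -rf "$workdir"
    return "$result"
}

# Only run when both the tool and the sample data are present.
if command -v bbduk.sh >/dev/null 2>&1 && [ -r "$IN1" ] && [ -r "$IN2" ]; then
    run_smoke_test || exit 1
fi
```

The log check is deliberately narrow: it only looks for the messages
quoted in this report, so it would catch a regression of this bug but
not unrelated failures.
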
Have a nice day,  :)
-- 
Étienne Mollier <emoll...@emlwks999.eu>
Fingerprint: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.