Make.contigs() problem with the oligos option? – mothur bugs

Dear All,

Can anybody explain the following problem with make.contigs()?

I used mothur v.1.48.0.

  1. When the oligos option is NOT included in make.contigs(), the result looks normal and correct.

make.file(inputdir=D:\City_bumblebees\z_analysis\microbes\datasets, type=fastq, prefix=framgement)
make.contigs(file=framgement.files)
summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)

The output includes Group_0, and the groups total 1259009 sequences (see below):

Group count:
Group_0 62092
Group_1 65104
Group_10 64332
Group_11 66286
Group_12 50398
Group_13 58625
Group_14 42531
Group_15 42162
Group_16 63950
Group_17 64915
Group_18 68091
Group_19 45224
Group_2 64619
Group_20 48305
Group_3 68941
Group_4 59792
Group_5 62512
Group_6 65391
Group_7 65739
Group_8 66878
Group_9 63122

Total of all groups is 1259009

It took 433 secs to process 1259009 sequences.

Output File Names:
framgement.trim.contigs.fasta
framgement.scrap.contigs.fasta
framgement.contigs_report
framgement.contigs.count_table

mothur > summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)

Using 12 processors.

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      301  301     0       3        1
2.5%-tile:   1      456  456     0       4        31476
25%-tile:    1      479  479     0       4        314753
Median:      1      481  481     0       6        629505
75%-tile:    1      481  481     0       6        944257
97.5%-tile:  1      481  481     4       6        1227534
Maximum:     1      601  601     48      291      1259009
Mean:        1      476  476     0       5

total # of seqs: 1259009

It took 17 secs to summarize 1259009 sequences.

Output File Names:
framgement.trim.contigs.summary

  2. However, when the oligos option is included in make.contigs(), the first group (Group_0) is dropped and no longer appears in the result.

The same data were used as before.

make.file(inputdir=D:\City_bumblebees\z_analysis\microbes\datasets, type=fastq, prefix=framgement)
make.contigs(file=framgement.files, oligos=oligos_file_framgement.txt)
summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)

My oligos_file_framgement.txt is:
primer ACTCCTACGGGAGGCAGCAG GGACTACHVGGGTWTCTAAT v3-v4
barcode ATGAAG TGCAAG LYH01
barcode AGCATG TTGACG LYH02
barcode GTGAAC CTGTTC LYH10
barcode CGCATA GTACTC LYH11
barcode TGTGCA CCGTAA LYH12
barcode AGTTCC TGAATG LYH13
barcode GTACTT CCAGCT WXC01
barcode CAGATC GTGAAA WXC02
barcode TAATCG ACTTGA WXC03
barcode ATCACG TACAGC WXC04
barcode GAGATA CTAGCT WXC05
barcode CGCGGT GAGTGG WXC06
barcode ACCTAA TCATTC km01
barcode GTTTCG CTATAC km02
barcode CATTCG GACTTC km04
barcode TCCACA ATTGCG km05
barcode CGGAAT GGTAGC km11
barcode TAACGA ATATGT km12
barcode AGAGTA TTAGGC km13
barcode AGAGCT TGCCAA km14
barcode GCACAA CCTTCT km15
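
Before looking at the group counts, the oligos file itself can be sanity-checked. The following is only a minimal sketch in Python, not part of mothur: `parse_oligos` and `duplicates` are hypothetical helper names, the sketch assumes the paired layout shown above (a keyword, two sequences, and a name per line), and it assumes that lines starting with `#` can be skipped. It only checks the field count and looks for repeated barcode pairs or sample names.

```python
from collections import Counter


def parse_oligos(text):
    """Split oligos text into primer and barcode records (paired layout assumed)."""
    primers, barcodes = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        kind = fields[0].lower()
        if kind not in ("primer", "barcode"):
            continue  # other line types are outside the scope of this sketch
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        record = (fields[1].upper(), fields[2].upper(), fields[3])
        if kind == "primer":
            primers.append(record)
        else:
            barcodes.append(record)
    return primers, barcodes


def duplicates(items):
    """Return the values that occur more than once, sorted."""
    return sorted(key for key, n in Counter(items).items() if n > 1)


# First three lines of the oligos file from this post, used as sample input.
sample = """\
primer ACTCCTACGGGAGGCAGCAG GGACTACHVGGGTWTCTAAT v3-v4
barcode ATGAAG TGCAAG LYH01
barcode AGCATG TTGACG LYH02
"""

primers, barcodes = parse_oligos(sample)
print(len(primers), "primer pair(s),", len(barcodes), "barcode pair(s)")
print("duplicate barcode pairs:", duplicates((f, r) for f, r, _ in barcodes))
print("duplicate sample names:", duplicates(name for _, _, name in barcodes))
```

To check the whole file, the `sample` string could be replaced with the contents of oligos_file_framgement.txt read from disk.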

Group count:
Group_1 57332
Group_10 55196
Group_11 56969
Group_12 41310
Group_13 53239
Group_14 37998
Group_15 35616
Group_16 56717
Group_17 58338
Group_18 61664
Group_19 40230
Group_2 57177
Group_20 42633
Group_3 60340
Group_4 52715
Group_5 53496
Group_6 56403
Group_7 56783
Group_8 57145
Group_9 53064

Total of all groups is 1044365
It took 259 secs to process 1259009 sequences.

Here, the first group (Group_0) is no longer included in the results, and the total drops to 1044365 sequences. The contigs are also shorter, because the primers and barcodes have been trimmed off.
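
To make the comparison between the two runs explicit, the two "Group count" listings can be compared directly. This is a small Python sketch and not a mothur feature; `parse_group_counts` and `compare_runs` are hypothetical helper names, and the numbers are copied from the two outputs in this post. It separates the reads lost because a group disappeared from the reads lost within the groups that remain.

```python
def parse_group_counts(text):
    """Read 'GroupName count' lines into a dict, ignoring any other line."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def compare_runs(before, after):
    """Return the groups missing from `after` and the reads lost in each category."""
    missing = sorted(set(before) - set(after))
    lost_in_missing = sum(before[g] for g in missing)
    lost_in_shared = sum(before[g] - after[g] for g in after if g in before)
    return missing, lost_in_missing, lost_in_shared


# Group counts from the run WITHOUT the oligos option.
without_oligos = parse_group_counts("""
Group_0 62092
Group_1 65104
Group_10 64332
Group_11 66286
Group_12 50398
Group_13 58625
Group_14 42531
Group_15 42162
Group_16 63950
Group_17 64915
Group_18 68091
Group_19 45224
Group_2 64619
Group_20 48305
Group_3 68941
Group_4 59792
Group_5 62512
Group_6 65391
Group_7 65739
Group_8 66878
Group_9 63122
""")

# Group counts from the run WITH the oligos option.
with_oligos = parse_group_counts("""
Group_1 57332
Group_10 55196
Group_11 56969
Group_12 41310
Group_13 53239
Group_14 37998
Group_15 35616
Group_16 56717
Group_17 58338
Group_18 61664
Group_19 40230
Group_2 57177
Group_20 42633
Group_3 60340
Group_4 52715
Group_5 53496
Group_6 56403
Group_7 56783
Group_8 57145
Group_9 53064
""")

missing, lost_in_missing, lost_in_shared = compare_runs(without_oligos, with_oligos)
print("groups missing with oligos:", missing)
print("reads lost with the missing groups:", lost_in_missing)
print("reads lost within the remaining groups:", lost_in_shared)
print("total reads lost:", lost_in_missing + lost_in_shared)
```

With these numbers, Group_0 accounts for 62092 of the 214644 missing sequences; the other 152552 were lost from the 20 groups that are still present, each of which is smaller than in the first run.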

mothur > summary.seqs(fasta=framgement.trim.contigs.fasta, count=framgement.contigs.count_table)

Using 12 processors.

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      277  277     0       3        1
2.5%-tile:   1      404  404     0       4        26110
25%-tile:    1      428  428     0       4        261092
Median:      1      429  429     0       6        522183
75%-tile:    1      429  429     0       6        783274
97.5%-tile:  1      429  429     4       6        1018256
Maximum:     1      549  549     37      264      1044365
Mean:        1      424  424     0       5

total # of seqs: 1044365

It took 10 secs to summarize 1044365 sequences.

Output File Names:
framgement.trim.contigs.summary
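
The shorter contig lengths can be checked against the oligos file with a little arithmetic. The Python sketch below is only a rough check: it assumes that both primers and both 6-base barcodes are removed from every contig, and it compares percentiles of two different sets of sequences, so the differences need not match exactly. All numbers are copied from the two summary.seqs outputs in this post.

```python
# Primer pair and barcode length taken from oligos_file_framgement.txt.
FORWARD_PRIMER = "ACTCCTACGGGAGGCAGCAG"
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"
BARCODE_LENGTH = 6  # every barcode in the file is 6 bases long

# Bases expected to be removed if both primers and both barcodes are trimmed.
expected_trim = len(FORWARD_PRIMER) + len(REVERSE_PRIMER) + 2 * BARCODE_LENGTH

# NBases column from the run without the oligos option.
nbases_without = {
    "Minimum": 301, "2.5%-tile": 456, "25%-tile": 479, "Median": 481,
    "75%-tile": 481, "97.5%-tile": 481, "Maximum": 601,
}
# NBases column from the run with the oligos option.
nbases_with = {
    "Minimum": 277, "2.5%-tile": 404, "25%-tile": 428, "Median": 429,
    "75%-tile": 429, "97.5%-tile": 429, "Maximum": 549,
}

print("expected trim:", expected_trim, "bases")
for label, before in nbases_without.items():
    difference = before - nbases_with[label]
    print(f"{label:<11} {before} -> {nbases_with[label]}  (difference {difference})")
```

The expected trim is 20 + 20 + 6 + 6 = 52 bases, which equals the drop in the median from 481 to 429, so the length change is consistent with primer and barcode trimming.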

Zhenghua.
