Line length limit on input FASTA file: 65,536 characters (limit imposed by BioPerl)

Hello,

I’m trying to run the following command:

agat_sp_extract_sequences.pl -g JU2526_Y39G10AR.22.gff -f JU2526*_region.fa -p

And it throws the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the file must be less than 65,536 characters. Line 2 is 67824 chars.
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/Root/Root.pm:447
STACK: Bio::DB::IndexedBase::_check_linelength /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:757
STACK: Bio::DB::Fasta::_calculate_offsets /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/Fasta.pm:227
STACK: Bio::DB::IndexedBase::_index_files /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:659
STACK: Bio::DB::IndexedBase::index_file /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:487
STACK: Bio::DB::IndexedBase::new /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:364
STACK: /home/lgs6452/.conda/envs/exonerate_env/bin/agat_sp_extract_sequences.pl:125
-----------------------------------------------------------

It would appear that, because your scripts use BioPerl, they won’t accept single-line FASTA files in which a sequence line reaches 65,536 characters or more. Would it be possible to pre-process the FASTA files within your scripts (i.e. convert them from single-line to multi-line) so that they work regardless of the input format? While it’s straightforward enough to convert the FASTA file before running your scripts, it would be far more convenient to have the script do it. It would probably save you a tonne of time with confused users, too.
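
For anyone hitting the same error in the meantime, here is a minimal Python sketch of the kind of pre-processing I mean. It is not part of AGAT or BioPerl. The function name `wrap_fasta`, the 60-character width and the file names in the usage note are my own assumptions, and it simply re-wraps every sequence to a fixed line width so that no line comes near the 65,536-character limit.

```python
import sys
from typing import Iterable, Iterator


def wrap_fasta(lines: Iterable[str], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines with each sequence re-wrapped to `width` characters.

    Hypothetical helper: header lines (starting with '>') pass through
    unchanged, and all sequence lines of a record are joined and re-split,
    so every line except the last in a record has the same length.
    """
    buffer = ""  # sequence characters not yet emitted for the current record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if buffer:
                yield buffer  # flush the tail of the previous record
                buffer = ""
            yield line
        else:
            buffer += line
            # Emit every complete chunk, keep the remainder for later.
            full = len(buffer) - len(buffer) % width
            for start in range(0, full, width):
                yield buffer[start:start + width]
            buffer = buffer[full:]
    if buffer:
        yield buffer  # flush the tail of the final record


if __name__ == "__main__":
    # Usage (file names are placeholders):
    #   python wrap_fasta.py input.fa wrapped.fa
    in_path, out_path = sys.argv[1], sys.argv[2]
    with open(in_path) as src, open(out_path, "w") as dst:
        for out_line in wrap_fasta(src):
            dst.write(out_line + "\n")
```

The wrapped file can then be passed to `-f` in place of the original. Re-wrapping whole records, rather than splitting each input line on its own, keeps the line lengths within a record uniform, which as far as I understand is what the BioPerl FASTA indexer expects.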

Thanks,

Lewis

PS: I’ve only begun using AGAT but it seems like it will largely solve the constant pain of working with GFF3 files. Huge thanks for developing it!
