Tag: Seqio.Parse

MultiProcessing on SeqIO biopython

Hello, I would like to parse a wheat genome (13 Gb) quickly in order to cut each sequence, count the fragment lengths, and store them in a pandas DataFrame. Is it advisable to use multiprocessing with the SeqIO.parse command? Does it save time? Any experiences/recommendations…

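A minimal sketch of the idea raised in the question above, not a benchmarked answer: each sequence is cut at a recognition site and the fragment lengths are tallied across worker processes. The site "GAATTC", the cut offset, and the function names are assumptions made for illustration, and the sequences are plain strings (for example str(record.seq) taken from each record that SeqIO.parse yields). Whether this saves time depends on how much of the run is spent parsing versus cutting.

```python
from collections import Counter
from multiprocessing import Pool


def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Return the lengths of the fragments obtained by cutting seq at every
    occurrence of site. site and cut_offset are illustrative assumptions."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # cut position inside the site
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def count_fragment_lengths(sequences, workers=4):
    """Tally fragment lengths over an iterable of sequence strings,
    cutting the sequences in separate worker processes."""
    counts = Counter()
    with Pool(workers) as pool:
        # imap_unordered avoids building a list of all sequences first.
        for lengths in pool.imap_unordered(fragment_lengths, sequences):
            counts.update(lengths)
    return counts
```

On platforms that start workers with "spawn", count_fragment_lengths should be called from under an `if __name__ == "__main__":` guard. The resulting Counter can be turned into a pandas DataFrame afterwards if needed.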

Extracting organism and seq from fasta

Hi, I am trying to extract sequences from a FASTA file (from a database) that match a specific organism/species keyword, using a .txt file containing the relevant headers. Do you know how I can do this in Python, as the Biopython guide I’ve…


Fasta file reading python

Answer by Aidan Golden: I think you can just use Biopython. It is indeed wrong today; I edited the answer, since it has been possible to use str(sequence) for a long time now. Very useful answer from 7 years ago! FYI, in the current version of Biopython (1.69), fasta.seq.tostring() is obsolete; use str(fasta.seq) instead. Nicely…


Question : Improve genbank feature addition

I am trying to add more than 70000 new features to a GenBank file using Biopython. I have this code: from Bio import SeqIO from Bio.SeqFeature import SeqFeature, FeatureLocation fi = "myoriginal.gbk" fo = "mynewfile.gbk" for result in…


Biopython: Bio.SeqUtils.molecular_weight for a fasta file

I must write a function that, given a file_name, calculates the molecular weight of only the unambiguous sequences and returns each sequence id with the corresponding molecular weight. I tried to use Bio.SeqUtils.molecular_weight to calculate the molecular weight, but I couldn’t do it since SeqUtils.molecular_weight works with…


FastTree error while constructing tree

Hey all, I am trying to infer a phylogeny from a multiple sequence alignment using the FastTree program. However, the program gives me an error when I run it on the alignment, and I cannot figure out what the error means (it is not very informative). My…


What is the correct syntax for BioPythons SeqIO.parse()

When reading in an assembly with Biopython’s SeqIO, the tutorial indicates that to read in multiple records one should do the following: records = list(SeqIO.parse("somefile.fasta", "fasta")). This produces the expected behaviour of a subscriptable list of records. However, this syntax also functions…


Replace fasta header using bash : bioinformatics

Hello people, I got stuck with my new script and perhaps you can help me. Its goal is to take an input table with queries and subjects (produced by a local BLAST) and replace the query names with subject names in the corresponding FASTA file. In detail, the table input file…


Remote blast query limit

Hello! How many BLAST queries can be processed by remote BLAST calls with Biopython’s Bio.Blast.NCBIWWW.qblast or BLAST+ with the -remote flag? When I go above 1 sequence I get the following message near the top of my XML results file (and no results): internal_error: (Severe Error)…


parsing gbk files (antismash result)

Hello, I used antiSMASH from the CLI and got 700 gbk files (one gbk file per analyzed genome). I used the following script to retrieve the predicted products from the gbk files: from Bio import SeqIO import glob for files in glob.glob("*.gbk"):…


Seqio.Parse Some Error

I am a beginner in the bioinformatics world. I am following an exercise on Biopython but I am stuck here. I am not sure why the print command is not working. Please let me know how to correct this step. > from Bio import SeqIO > for seq_record in SeqIO.parse("…


Linearize fasta files

Program versions used: BBMap – v. 38.32, Seqtk – v. 1.3-r106, Seqkit – v. 0.8.1, Perl – v. 5.16.3, Python – v. 3.6.6, sed – v. 2.2.2

$ time (cat Homo_sapiens.GRCh38.dna.primary_assembly.fa > /dev/null)
real 0m1.050s
user 0m0.002s
sys 0m1.045s

With BBMap – reformat.sh

$ time reformat.sh -Xmx40g in=Homo_sapiens.GRCh38.dna.primary_assembly.fa fastawrap=0)
java -ea -Xmx40g -cp bbmap/current/ jgi.ReformatReads…

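For readers unfamiliar with the term, "linearizing" a FASTA file means joining the wrapped sequence lines of each record into a single line. The sketch below shows the idea in plain Python; it is an illustration only, not one of the tools benchmarked in the post, and the function name linearize is hypothetical.

```python
import io


def linearize(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object,
    with each record's wrapped sequence lines joined into one string."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


# Small in-memory example instead of a real genome file.
wrapped = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nTTAA\n")
for header, sequence in linearize(wrapped):
    print(header)
    print(sequence)
```

Because records are yielded one at a time, only a single record is held in memory, though a whole chromosome still has to fit.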

Get chromosome sizes from fasta file

Hello, I’m wondering whether there is a program that could calculate chromosome sizes from any FASTA file? The idea is to generate a tab-delimited file like the one expected by bedtools genomecov, for example. I know there’s the fetchChromSizes program from UCSC, but…

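One way to picture the task in the question above is the plain-Python sketch below, which sums the sequence line lengths per record and prints name and size separated by a tab. It is an illustration under assumptions, not the answer given in the thread: the function name chrom_sizes is hypothetical, and the record name is taken to be the first whitespace-delimited word of each header.

```python
import io


def chrom_sizes(handle):
    """Yield (name, length) for each record in a FASTA file-like object."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Small in-memory example; a real run would pass an open file instead.
demo = io.StringIO(">chr1 test\nACGT\nAC\n>chr2\nGGG\n")
for name, length in chrom_sizes(demo):
    print(f"{name}\t{length}")
```

Redirecting the printed lines to a file gives a two-column name/size table of the kind the question describes.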

Fastest way to perform BLAST search using a multi-FASTA file against a remote database

I have a multi-FASTA file containing ~125 protein sequences. I need to perform a BLASTP search against the remote nr database. I tried using NcbiblastpCommandline, but the issue is that it only accepts files as input….
