Tag: Seqio.Parse

‘SeqRecord’ object has no attribute ‘transcribe’

I am learning how to use Python, and I need to get the RNA sequences from the DNA sequences in a multi-FASTA file, but whenever I try, I keep getting this error. Here is my code: from Bio import SeqIO…

Continue Reading ‘SeqRecord’ object has no attribute ‘transcribe’
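In Biopython, transcribe() is a method of the Seq object, not of SeqRecord, so the usual fix is to call record.seq.transcribe() instead of record.transcribe(). Below is a minimal plain-Python sketch of the same task for readers without Biopython. It assumes the input holds coding-strand DNA, where transcription simply replaces T with U (the rule Seq.transcribe() applies). The helper names read_fasta and transcribe_fasta are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a multi-FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def transcribe_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Transcribe each coding-strand DNA record to RNA by swapping T for U."""
    for header, dna in read_fasta(handle):
        yield header, dna.replace("T", "U").replace("t", "u")


example = io.StringIO(">gene1\nATGACC\nTTA\n>gene2\natgc\n")
for header, rna in transcribe_fasta(example):
    print(header, rna)  # gene1 AUGACCUUA, then gene2 augc
```

With Biopython installed, the equivalent loop is `for record in SeqIO.parse("input.fasta", "fasta"): print(record.id, record.seq.transcribe())`. The file name here is a placeholder.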

biopython – Parsing a GenBank file and outputting specific feature information to a CSV using Biopython

I am trying to parse a GenBank file, extract particular feature information, and write that information to a CSV file. The example GenBank file looks like this: SBxxxxxx.LargeContigs.gbk LOCUS scaffold_31 38809 bp DNA UNK 01-JAN-1980 DEFINITION scaffold_31. ACCESSION scaffold_31 VERSION scaffold_31 KEYWORDS . SOURCE . ORGANISM…

Continue Reading biopython – Parsing a GenBank file and outputting specific feature information to a CSV using Biopython
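Here is a minimal sketch of the GenBank-to-CSV step. It assumes feature objects shaped like Biopython's SeqFeature: a .type string and a .qualifiers dict that maps names to lists of values. The function features_to_csv, the default feature type, and the qualifier names are illustrative choices, not taken from the original post. SimpleNamespace objects stand in for real features so that the sketch runs without Biopython.

```python
import csv
import io
from types import SimpleNamespace
from typing import Iterable, Sequence, TextIO


def features_to_csv(features: Iterable, out: TextIO,
                    feature_type: str = "CDS",
                    qualifiers: Sequence[str] = ("locus_tag", "gene", "product")) -> int:
    """Write a header row plus one row per feature of `feature_type`; return the row count.

    Qualifier values are lists in Biopython, so multiple values are joined
    with ';' and missing qualifiers become empty cells.
    """
    writer = csv.writer(out)
    writer.writerow(["type", *qualifiers])
    rows = 0
    for feat in features:
        if feat.type != feature_type:
            continue
        writer.writerow([feat.type, *(";".join(feat.qualifiers.get(q, [])) for q in qualifiers)])
        rows += 1
    return rows


# Stand-ins for SeqFeature objects.
demo = [
    SimpleNamespace(type="source", qualifiers={"organism": ["x"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["T_0001"], "product": ["hypothetical protein"]}),
]
buf = io.StringIO()
features_to_csv(demo, buf)
print(buf.getvalue())
```

To cover every record in a file while writing the header only once, pass `itertools.chain.from_iterable(rec.features for rec in SeqIO.parse(path, "genbank"))`. Open the output with `open(out_path, "w", newline="")`, as the csv module documentation recommends.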

How can I print and write the strain/isolate/voucher number of a SeqRecord object in Biopython?

The isolate is a qualifier of the source feature, which you can access like so: from Bio import SeqIO from pprint import pprint # Read GenBank file for rec in SeqIO.parse("genome.gb", "genbank"): source = rec.features[0] pprint(source.qualifiers) This prints: OrderedDict([('organism', ['Amauroderma calcitum']), ('mol_type', ['genomic DNA']), ('isolate', ['FLOR 50931']), ('db_xref', ['taxon:1774182']), ('country',…

Continue Reading How can I print and write the strain/isolate/voucher number of a SeqRecord object in Biopython?
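The excerpt treats the first feature as the source feature, which is the usual layout but is not guaranteed. The sketch below is a slightly more defensive version. It assumes Biopython-style features (.type, and .qualifiers with list values), looks the source feature up by type, and returns whichever of strain, isolate, or specimen_voucher it finds first. The helper name source_identifier and the order of the keys are hypothetical choices.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Sequence


def source_identifier(features: Iterable,
                      keys: Sequence[str] = ("strain", "isolate", "specimen_voucher")) -> Optional[str]:
    """Return the first strain/isolate/voucher value on the source feature, or None."""
    for feat in features:
        if feat.type == "source":
            for key in keys:
                values = feat.qualifiers.get(key)
                if values:
                    return values[0]  # qualifier values are lists in Biopython
            return None
    return None


features = [
    SimpleNamespace(type="source", qualifiers={
        "organism": ["Amauroderma calcitum"], "isolate": ["FLOR 50931"]}),
    SimpleNamespace(type="gene", qualifiers={}),
]
print(source_identifier(features))  # FLOR 50931
```

Inside the excerpt's loop, this becomes `source_identifier(rec.features)`. To save the results, write each value next to `rec.id`, for example `print(rec.id, value, sep="\t", file=out)`.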

Parsing GenBank file: get locus tag vs product

Because your sample GenBank file was incomplete, I went online to find a sample file that could be used as an example, and I found this file. Using this code and the Bio::GenBankParser module, I parsed it, guessing which parts of the structure you were after. In this case, “features”…

Continue Reading Parsing GenBank file: get locus tag vs product

Append assembly accession to nucleotide accession number in RefSeq Genbank file

Hi everyone, when I want to add the filename to the contig headers in a multi-FASTA file, I usually use: for F in *.fasta; do N=$(basename $F .fasta) ; bbrename.sh in=$F out=${N}_mod.fasta prefix=$F addprefix=t ; done However, this…

Continue Reading Append assembly accession to nucleotide accession number in RefSeq Genbank file

bioinformatics – how to replace seqIDs in a fasta file with new seqIDs using biopython

I have a FASTA file that looks like this: >00009c1cc42953fb4702f6331325c7cc TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGTTGTTAAGTCAGTGGTGAAATCGTGTGGCTCAACCATACGGAGCCATTGAAACTGGCGACCTTGAGTGTAAACGAGGTAGGCGGAATGTGACGTGTAGCGGTGAAATGCTTAGATATGTCACAGAACCCCGATTGCGAAGGCAGCTTACCAGCATACAACTGAC >000118a5e731455e942c61a82a40367a623088d0 AGAGTTTTATCCTGGCTCAGGATGAACGCTAGCGGCAGGCCTAATACATGCAAGTCGGACGGGATCTAAATTTAAGCTTGCTTAAGTTTAGTGAGAGTGGCGCACGGGTGCGTAACGCGTGAGCAACCTACCCATATCAGGGGGATAGCCCGAAGAAATTCGGATTAACACCGCATAACACAGCAATCTCGCATGAGATCACTGTTAAATATTTATAGGATATGGATGGGCTCGCGTGACATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCAACGATGTCTAGGGGCTCTGAGAGGAGAATCCCCCACACTGGTACTGAGACACGGACCAGACTCCTACGGGAGGCAGCAGTAAGGATTATTGGTCAATGGAGGGAACTCTGAACCAGCCATGCCGCGTGCAGGATGACTGCCCTATGGGTTGTAAACTGCTTTTGTCTGGGAATAAACCTTGATTCGTGAATCAAGCTGAATGTACCAGAAGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGCTTTATAAGTCAGAGGTGAAAGACGGCAGCTTAACTGTCGCAGTGCCTTTGATACTGTATAGCTTGAATATCGTTGAAGATGGCGGAATGAGACAAGTAGCGGTGAAATGCATAGATATGTCTCAGAACTCCGATTGCGAAGGCAGCTGTCTAAGCGGCAATTGACGCTGATGCACGAAAGCGTGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGATAACTGGATGTTGGCGATACACAGTCAGCGTCTTAGCGAAAGCGTTAAGTTATCCACCTGGGGAGTACGCCCGCAAGGGTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGAGGAGCATGTGGTTTAATTCGATGATACGCGAGGAACCTTACCCGGGCTTGAAAGTTAGTGAATGCGACAGAGACGTCTCAGTCCTTCGGGACACGAAACTAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTATGTTTAGTTGCCAGCATGTAATGATGGGGACTCTAAACAGACTGCCTGCGTAAGCAGCGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGTCCGGGGCTACACACGTGCTACAATGGATGGTACAGCGGGCAGCTACACAGCAATGTGATGCTAATCTCTAAAAGCCATTCACAGTTCGGATAGGGGTCTGCAACTCGACCCCATGAAGTTGGATTCGCTAGTAATCGCGTATCAGCAATGACGCGGT And I want to add microbial taxonomy to the seq IDs, like so: d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Bacteroidales_RF16_group; g__Bacteroidales_RF16_group; s__uncultured_bacterium|00009c1cc42953fb4702f6331325c7cc d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Sphingobacteriales; f__Sphingobacteriaceae; g__Sphingobacterium; s__uncultured_bacterium|000118a5e731455e942c61a82a40367a623088d0 where the original seqID is appended to the taxonomy…

Continue Reading bioinformatics – how to replace seqIDs in a fasta file with new seqIDs using biopython
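Here is a minimal sketch of the header rewrite. It assumes the taxonomy strings are already loaded into a dict keyed by the original seqID; how that table is read is not shown. Headers without a mapping are left unchanged, and any description after the ID is dropped from rewritten headers. The function name add_taxonomy is hypothetical. Because it works line by line, it streams the file instead of holding it in memory.

```python
import io
from typing import Dict, Iterable, Iterator


def add_taxonomy(lines: Iterable[str], taxonomy: Dict[str, str]) -> Iterator[str]:
    """Rewrite '>ID ...' headers as '>TAXONOMY|ID'; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # first word of the header is the ID
            if seq_id in taxonomy:
                line = f">{taxonomy[seq_id]}|{seq_id}\n"
        yield line


taxonomy = {
    "00009c1cc42953fb4702f6331325c7cc":
        "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
        "f__Bacteroidales_RF16_group; g__Bacteroidales_RF16_group; s__uncultured_bacterium",
}
fasta = io.StringIO(">00009c1cc42953fb4702f6331325c7cc\nTACGGAGG\n>other\nACGT\n")
print("".join(add_taxonomy(fasta, taxonomy)), end="")
```

With Biopython, the same effect comes from setting `record.id = f"{tax}|{record.id}"` and `record.description = ""` before `SeqIO.write`. Clearing the description stops the old header text from being written after the new ID.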

Optimize a script that extracts features from a FASTA file using Biopython

Hey, I have a script that extracts features from a large FASTA file (1767 MB) using Biopython. I am submitting it as a bash job on a remote server over SSH, and the job has been running for two days now. Is there a way to optimize my script? I think maybe the problem…

Continue Reading Optimize a script that extracts features from a FASTA file using Biopython

Analyzing and slicing FASTQ file entries using Python

I have the code pasted below, which runs over FASTQ file entries to compare specific parts of each read and remove redundant copies of the same sequence (based on the miRNA + umi_seq combination). I save the entry IDs and then make…

Continue Reading Analyzing and slicing FASTQ file entries using Python
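Here is a minimal sketch of deduplication keyed on a (miRNA, UMI) pair. It assumes plain four-line FASTQ records and that both parts can be cut from the read at fixed positions. The default slice boundaries are placeholders, not values from the original post, so adjust them to the actual library layout. Only the first read seen for each key is kept, and a set gives constant-time membership checks.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines: header, sequence, '+', quality."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def dedup_by_mirna_umi(handle: TextIO,
                       mirna: slice = slice(0, 22),   # hypothetical miRNA position
                       umi: slice = slice(22, 34)     # hypothetical UMI position
                       ) -> Iterator[List[str]]:
    """Keep the first record for each (miRNA, UMI) combination."""
    seen = set()
    for record in fastq_records(handle):
        seq = record[1].strip()
        key = (seq[mirna], seq[umi])
        if key not in seen:
            seen.add(key)
            yield record


# Usage: for rec in dedup_by_mirna_umi(open("reads.fastq")): out.writelines(rec)
```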

Fasta File Python

How do I go about extracting elements from a FASTA file? For example, if I want a list of all the IDs and a second list with the length of each sequence, how do I do that in base Python without using any libraries? for line in…

Continue Reading Fasta File Python
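Here is a base-Python sketch with no imports. It assumes the ID is the first whitespace-separated word after ">" and that sequences may wrap across several lines. The file name in the usage comment is hypothetical.

```python
def ids_and_lengths(lines):
    """Return two parallel lists, (ids, lengths), from FASTA text lines."""
    ids, lengths = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")  # ID = first word of the header
            lengths.append(0)                         # new counter for this record
        elif lengths:
            lengths[-1] += len(line)                  # wrapped lines add to the current record
    return ids, lengths


# With a real file:
# with open("sequences.fasta") as fh:
#     ids, lengths = ids_and_lengths(fh)
sample = [">seq1 some description\n", "ACGT\n", "AC\n", ">seq2\n", "GGG\n"]
print(ids_and_lengths(sample))  # (['seq1', 'seq2'], [6, 3])
```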

Bioinformatics script using Python/Biopython/Clustalw using stdout to iterate over a directory of proteins

What exactly is the error you are seeing? You shouldn’t set sys.stderr and sys.stdout to string values (the clustalw_cline() call returns ClustalW's stdout and stderr as strings), because then you won’t be able to write anything to stdout from Python. I tried to clean up and correct your code below…

Continue Reading Bioinformatics script using Python/Biopython/Clustalw using stdout to iterate over a directory of proteins
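The key point in the excerpt is to keep the tool's output in ordinary variables rather than assigning it to sys.stdout or sys.stderr. Here is a minimal sketch with the standard subprocess module, assuming one external command per input file. build_cmd is a hypothetical callback, so the exact ClustalW command line is left to the caller. The demo substitutes a harmless Python one-liner for the aligner.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def run_per_file(paths: Iterable[Path],
                 build_cmd: Callable[[Path], List[str]]) -> Dict[Path, Tuple[str, str]]:
    """Run one command per input file and keep its stdout/stderr as strings.

    sys.stdout and sys.stderr are never reassigned, so print() keeps working.
    """
    results = {}
    for path in paths:
        proc = subprocess.run(build_cmd(path), capture_output=True, text=True)
        if proc.returncode != 0:
            print(f"{path}: exit code {proc.returncode}", file=sys.stderr)
        results[path] = (proc.stdout, proc.stderr)
    return results


# For a directory of proteins: paths = sorted(Path("proteins").glob("*.fasta"))
demo = run_per_file([Path("protein1.fasta")],
                    lambda p: [sys.executable, "-c", f"print('aligned {p.name}')"])
print(demo)
```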

Replace sequences between files using Biopython

As you have written it, every time you write a new sequence, you’re overwriting the previous one. Try storing your records in a list and then writing out the list once the loop has finished. to_write = [] for seq1 in SeqIO.parse(r"c:\Users\Sergio\Desktop\nsp.fasta", "fasta"): for seq2 in SeqIO.parse(r"c:\Users\Sergio\Desktop\wsp.fasta", "fasta"): if seq2.id…

Continue Reading Replace sequences between files using Biopython
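Besides writing once after the loop, it helps to avoid re-parsing the second file inside the outer loop, because that makes the work grow with the product of the two file sizes. The sketch below assumes the goal is to take the sequence from the second file whenever IDs match, and that records are (id, sequence) pairs. With Biopython, `SeqIO.to_dict(SeqIO.parse(path, "fasta"))` builds an equivalent index keyed by record ID. The function name replace_sequences is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (id, sequence)


def replace_sequences(originals: Iterable[Record],
                      replacements: Iterable[Record]) -> List[Record]:
    """Return originals with sequences swapped in wherever an ID matches."""
    lookup: Dict[str, str] = dict(replacements)  # built once; O(1) lookups afterwards
    return [(rec_id, lookup.get(rec_id, seq)) for rec_id, seq in originals]


nsp = [("s1", "AAAA"), ("s2", "CCCC")]
wsp = [("s2", "GGGG"), ("s9", "TTTT")]
to_write = replace_sequences(nsp, wsp)
print(to_write)  # [('s1', 'AAAA'), ('s2', 'GGGG')]
# Write the collected list once, after the loop, e.g.:
# with open("out.fasta", "w") as out:
#     out.writelines(f">{i}\n{s}\n" for i, s in to_write)
```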

How to print the first few records using SeqIO from Biopython

There are numerous ways to do this. The one closest to your current structure would be to add a break when the index hits 19 (that is the 20th record, since counting starts at 0): from Bio import SeqIO for index, record in enumerate(SeqIO.parse("e_coli_k12_dh10b.faa", "fasta")): print(record.description, len(record.seq)) if index ==…

Continue Reading How to print the first few records using SeqIO from Biopython
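An alternative to counting with enumerate and breaking is itertools.islice, which stops after the first N items of any iterator, including the one returned by SeqIO.parse. Here is a minimal sketch with a hypothetical stand-in generator so that it runs without Biopython.

```python
from itertools import islice


def fake_records():
    """Stand-in for SeqIO.parse(...): yields (description, sequence) pairs forever."""
    n = 0
    while True:
        n += 1
        yield f"protein_{n}", "M" * n


# With Biopython:
# for record in islice(SeqIO.parse("e_coli_k12_dh10b.faa", "fasta"), 20):
#     print(record.description, len(record.seq))
first_20 = list(islice(fake_records(), 20))
for description, seq in first_20[:3]:
    print(description, len(seq))
```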

SeqIO object gets cleared away after being accessed

I’m using Biopython to parse a FASTQ file, and I found that the SeqIO object gets cleared once I have accessed it. from Bio import SeqIO record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq', 'fastq') for record in record_fastqIO: print(record.id) This script works perfectly. But if I add one line to the script: from Bio import…

Continue Reading SeqIO object gets cleared away after being accessed
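SeqIO.parse returns an iterator, and an iterator can be consumed only once, so a second loop over the same object sees nothing. The plain-Python sketch below reproduces this with a generator standing in for SeqIO.parse. It also shows the two usual fixes: call the parser again, or materialise the records with list() when they fit in memory.

```python
def parse_ids():
    """Stand-in for SeqIO.parse(...): a generator of record IDs."""
    yield from ["read1", "read2", "read3"]


records = parse_ids()
first_pass = [r for r in records]
second_pass = [r for r in records]   # the iterator is already exhausted
print(first_pass, second_pass)       # ['read1', 'read2', 'read3'] []

# Fix 1: create a fresh iterator for each pass.
again = [r for r in parse_ids()]

# Fix 2: store the records once, then iterate as often as needed.
record_list = list(parse_ids())
print(len(record_list), record_list[0])  # 3 read1
```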

MultiProcessing on SeqIO biopython

Hello, I would like to parse a wheat genome (13 Gb) quickly, in order to cut each sequence, count the fragment lengths, and store them in a pandas DataFrame. Is it advisable to use multiprocessing with the SeqIO.parse command? Does it save time? Any experiences/recommendations…

Continue Reading MultiProcessing on SeqIO biopython
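Whether multiprocessing helps depends on where the time goes. Parsing the file stays sequential, so parallelism only pays off for the per-record work done after parsing. A genome with a few very large chromosomes also offers only a few records to spread across workers. The sketch below assumes that "cutting" means splitting each sequence just before every occurrence of a recognition site. The site and all helper names are hypothetical. Records go to a multiprocessing.Pool, and only the fragment lengths come back.

```python
from multiprocessing import Pool
from typing import Dict, Iterable, List, Tuple


def fragment_lengths(seq: str, site: str) -> List[int]:
    """Cut `seq` just before each occurrence of `site`; return the piece lengths."""
    cuts = [0]
    pos = seq.find(site, 1)  # a site at position 0 would only create an empty piece
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    cuts.append(len(seq))
    return [end - start for start, end in zip(cuts, cuts[1:])]


def _worker(job: Tuple[str, str, str]) -> Tuple[str, List[int]]:
    rec_id, seq, site = job
    return rec_id, fragment_lengths(seq, site)


def fragment_lengths_parallel(records: Iterable[Tuple[str, str]], site: str,
                              processes: int = 4) -> Dict[str, List[int]]:
    """Distribute (id, sequence) records over worker processes; results keep input order."""
    jobs = ((rec_id, seq, site) for rec_id, seq in records)
    with Pool(processes) as pool:
        return dict(pool.imap(_worker, jobs))
```

Call fragment_lengths_parallel from inside an `if __name__ == "__main__":` guard, which is required on platforms that start workers by spawning. For example, pass `((r.id, str(r.seq)) for r in SeqIO.parse(path, "fasta"))` and a site such as "GAATTC". The returned dict can then be flattened into rows for a pandas DataFrame.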

Extracting organism and seq from fasta

Hi, I am trying to extract sequences from a database FASTA file that match specific organism/species keywords, using a .txt file containing the relevant headers. Do you know how I can do this in Python, as the Biopython guide I’ve…

Continue Reading Extracting organism and seq from fasta
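Here is a minimal sketch of the filtering step. It assumes the .txt file holds one keyword or header fragment per line, and it keeps a record when any of them occurs in the record's header line, ignoring case. The function names are hypothetical. Biopython users would apply the same test to record.description inside a SeqIO.parse loop.

```python
import io
from typing import Iterable, Iterator, List


def load_keywords(lines: Iterable[str]) -> List[str]:
    """One keyword per line; blank lines are ignored; matching is case-insensitive."""
    return [line.strip().lower() for line in lines if line.strip()]


def filter_by_keywords(fasta_lines: Iterable[str], keywords: List[str]) -> Iterator[str]:
    """Yield header and sequence lines of records whose header mentions a keyword."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line.lower()
            keep = any(k in header for k in keywords)
        if keep:
            yield line


keywords = load_keywords(io.StringIO("Homo sapiens\n\nMus musculus\n"))
db = io.StringIO(">p1 Protein A Homo sapiens\nMKT\n>p2 Protein B Danio rerio\nMAA\n")
print("".join(filter_by_keywords(db, keywords)), end="")
```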

Fasta file reading python

Answer by Aidan Golden: I think you can just use Biopython. It is indeed wrong today. I edited the answer, since it has been possible to use str(sequence) for a long time now. Very useful answer from 7 years ago! FYI, in the current version of Biopython (1.69), fasta.seq.tostring() is obsolete; use str(fasta.seq) instead. Nicely…

Continue Reading Fasta file reading python

Question : Improve genbank feature addition

I am trying to add more than 70,000 new features to a GenBank file using Biopython. I have this code: from Bio import SeqIO from Bio.SeqFeature import SeqFeature, FeatureLocation fi = "myoriginal.gbk" fo = "mynewfile.gbk" for result in…

Continue Reading Question : Improve genbank feature addition

Biopython: Bio.SeqUtils.molecular_weight for a fasta file

I must write a function that, given a file_name, calculates the molecular weight of only the unambiguous sequences and returns each sequence ID with its corresponding molecular weight. I tried to use Bio.SeqUtils.molecular_weight to calculate the molecular weight, but I couldn’t, since SeqUtils.molecular_weight works with…

Continue Reading Biopython: Bio.SeqUtils.molecular_weight for a fasta file
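Here is a minimal sketch of the filtering logic. It assumes DNA records given as (id, sequence) pairs and treats a sequence as unambiguous when it contains only A, C, G, and T. The weight function is passed in as a parameter. With Biopython installed you would pass Bio.SeqUtils.molecular_weight, which raises ValueError on ambiguous letters. The demo uses len only as a placeholder so that the sketch runs without Biopython; it is not a molecular weight.

```python
from typing import Callable, Dict, Iterable, Tuple

UNAMBIGUOUS_DNA = frozenset("ACGT")


def unambiguous_weights(records: Iterable[Tuple[str, str]],
                        weight_fn: Callable[[str], float]) -> Dict[str, float]:
    """Map record ID -> weight_fn(sequence), skipping sequences with ambiguity codes."""
    result = {}
    for rec_id, seq in records:
        seq = seq.upper()
        if seq and set(seq) <= UNAMBIGUOUS_DNA:
            result[rec_id] = weight_fn(seq)
    return result


records = [("ok", "ACGTAC"), ("has_N", "ACGNNT"), ("has_R", "AGRT")]
# Placeholder weight function; with Biopython use
# `from Bio.SeqUtils import molecular_weight` and pass molecular_weight instead.
print(unambiguous_weights(records, weight_fn=len))  # {'ok': 6}
```

To match the requested signature, feed it `((r.id, str(r.seq)) for r in SeqIO.parse(file_name, "fasta"))`.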

FastTree error while constructing tree

Hey all, I am trying to infer a phylogeny from a multiple sequence alignment using the FastTree program. However, FastTree gives me an error when I run it on the alignment, and I cannot figure out what the error is saying (it is not very informative). My…

Continue Reading FastTree error while constructing tree
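The excerpt does not show the error message, so the cause is unknown. Two input problems worth ruling out before running any tree builder are rows of unequal length and duplicate sequence names. Here is a minimal plain-Python check for both, assuming the alignment is available as (name, aligned sequence) pairs. The function name check_alignment is hypothetical, and passing these checks does not guarantee that FastTree will accept the file.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_alignment(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems found in an alignment; empty if none."""
    rows = list(rows)
    if not rows:
        return ["alignment is empty"]
    problems = []
    lengths = Counter(len(seq) for _, seq in rows)
    if len(lengths) > 1:
        problems.append(f"rows have different lengths: {dict(sorted(lengths.items()))}")
    names = Counter(name for name, _ in rows)
    dupes = sorted(name for name, count in names.items() if count > 1)
    if dupes:
        problems.append(f"duplicate names: {', '.join(dupes)}")
    return problems


print(check_alignment([("a", "AC-GT"), ("b", "ACGGT"), ("a", "AC-T")]))
```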

What is the correct syntax for Biopython's SeqIO.parse()?

When reading in an assembly with Biopython’s SeqIO, the tutorial indicates that when reading in multiple records one should do the following: records = list(SeqIO.parse("somefile.fasta", "fasta")) This produces the expected behaviour: a subscriptable list of records. However, this syntax also functions…

Continue Reading What is the correct syntax for Biopython's SeqIO.parse()?

Replace fasta header using bash : bioinformatics

Hello everyone, I got stuck on my new script, and perhaps you can help me. Its goal is to take an input table of queries and subjects (generated by a local BLAST search) and replace the query names with the subject names in the corresponding FASTA file. In detail, the table input file…

Continue Reading Replace fasta header using bash : bioinformatics

Remote blast query limit

Hello! How many BLAST queries can be processed by remote BLAST calls with Biopython’s Bio.Blast.NCBIWWW.qblast or BLAST+ with the -remote flag? When I go above one sequence, I get the following message near the top of my XML results file (and no results): internal_error: (Severe Error)…

Continue Reading Remote blast query limit
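The excerpt does not establish what the remote limit is. One common workaround is to submit the queries in small batches, each as its own multi-FASTA string, spacing out submissions in line with NCBI's usage guidelines. Here is a minimal sketch of the batching for (id, sequence) pairs. The batch size is a tunable guess. The commented qblast call only marks where each batch would go; it is not a tested submission.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_batches(records: Iterable[Tuple[str, str]], batch_size: int) -> Iterator[str]:
    """Group (id, sequence) records into multi-FASTA strings of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for rec_id, seq in records:
        batch.append(f">{rec_id}\n{seq}\n")
        if len(batch) == batch_size:
            yield "".join(batch)
            batch = []
    if batch:
        yield "".join(batch)


queries = [("q1", "MKTAY"), ("q2", "MSSHE"), ("q3", "MALWM")]
for query in fasta_batches(queries, batch_size=2):
    # With Biopython: handle = NCBIWWW.qblast("blastp", "nr", query)  (not run here)
    print(query, end="")
```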

parsing gbk files (antismash result)

Hello, I ran antiSMASH from the CLI and got 700 .gbk files (one per analyzed genome). I used the following script to retrieve the predicted products from the .gbk files: from Bio import SeqIO import glob for files in glob.glob("*.gbk"):…

Continue Reading parsing gbk files (antismash result)
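If full Biopython parsing is not required, a plain-text scan is a quick way to collect /product values across many .gbk files. Here is a minimal sketch. It assumes the files sit in the working directory, and the function names are hypothetical. Wrapped values are rejoined by collapsing whitespace. The scan collects every /product qualifier, including CDS products, so parse with SeqIO and filter on feature.type if only the region or cluster predictions are wanted.

```python
import glob
import os
import re
from typing import Dict, List

# [^"]* also matches newlines, so values wrapped over several lines are captured whole.
PRODUCT_RE = re.compile(r'/product="([^"]*)"')


def products_in_text(text: str) -> List[str]:
    """Return every /product value in a GenBank text, with wrapped lines rejoined."""
    return [" ".join(value.split()) for value in PRODUCT_RE.findall(text)]


def products_by_file(pattern: str = "*.gbk") -> Dict[str, List[str]]:
    """Map each file matching `pattern` to the /product values it contains."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            results[os.path.basename(path)] = products_in_text(fh.read())
    return results


sample = (
    '     region          1..40000\n'
    '                     /product="NRPS"\n'
    '     CDS             100..900\n'
    '                     /product="non-ribosomal\n'
    '                     peptide synthetase"\n'
)
print(products_in_text(sample))
```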

Seqio.Parse Some Error

I am a beginner in the bioinformatics world. I am following an exercise on Biopython, but I am stuck here. I am not sure why the print command is not working. Please let me know how to correct this step. > from Bio import SeqIO > for seq_record in SeqIO.parse("…

Continue Reading Seqio.Parse Some Error

Linearize fasta files

Program versions used: BBMap v. 38.32, Seqtk v. 1.3-r106, Seqkit v. 0.8.1, Perl v. 5.16.3, Python v. 3.6.6, sed v. 2.2.2. $ time (cat Homo_sapiens.GRCh38.dna.primary_assembly.fa > /dev/null) real 0m1.050s user 0m0.002s sys 0m1.045s. With BBMap reformat.sh: $ time (reformat.sh -Xmx40g in=Homo_sapiens.GRCh38.dna.primary_assembly.fa fastawrap=0) java -ea -Xmx40g -cp bbmap/current/ jgi.ReformatReads…

Continue Reading Linearize fasta files
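For comparison with the tools in the benchmark above, linearizing in plain Python means writing each header followed by its sequence joined onto a single line. Here is a minimal streaming sketch that holds only one record's sequence in memory at a time. No timings are implied.

```python
import io
import sys
from typing import Iterable, TextIO


def linearize(lines: Iterable[str], out: TextIO) -> None:
    """Write FASTA with every sequence on a single line."""
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if chunks:
                out.write("".join(chunks) + "\n")  # flush the previous record's sequence
                chunks = []
            out.write(line + "\n")
        elif line:
            chunks.append(line)
    if chunks:
        out.write("".join(chunks) + "\n")


linearize(io.StringIO(">chr1\nACGT\nACGT\n>chr2\nGG\n"), sys.stdout)
```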

Get chromosome sizes from fasta file

Hello, I’m wondering whether there is a program that can calculate chromosome sizes from any FASTA file. The idea is to generate a tab-delimited file like the one expected by bedtools genomecov, for example. I know there’s the fetchChromSizes program from UCSC, but…

Continue Reading Get chromosome sizes from fasta file

Fastest way to perform BLAST search using a multi-FASTA file against a remote database

I have a multi-FASTA file with ~125 protein sequences. I need to perform a BLASTP search against the remote nr database. I tried using NcbiblastpCommandline, but the issue is that it only accepts files as input…

Continue Reading Fastest way to perform BLAST search using a multi-FASTA file against a remote database
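Because the command-line route expects a file, sequences held in memory (for example after filtering) can first be written to a temporary FASTA file. That path is then handed to the BLAST+ blastp program, whose -query, -db, -remote, -outfmt, and -out options cover this case. Here is a minimal sketch, assuming blastp is on PATH. It only builds the command; run it with subprocess.run(cmd, check=True). The output file name is hypothetical.

```python
import os
import tempfile
from typing import Iterable, List, Tuple


def write_temp_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Write (id, sequence) pairs to a temporary FASTA file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".fasta")
    with os.fdopen(fd, "w") as fh:
        for rec_id, seq in records:
            fh.write(f">{rec_id}\n{seq}\n")
    return path


def remote_blastp_cmd(query_path: str, out_path: str, db: str = "nr") -> List[str]:
    """BLAST+ blastp command for a remote search; -outfmt 5 requests XML output."""
    return ["blastp", "-query", query_path, "-db", db, "-remote",
            "-outfmt", "5", "-out", out_path]


query = write_temp_fasta([("q1", "MKTAYIAKQR"), ("q2", "MSSHEEVKQL")])
cmd = remote_blastp_cmd(query, "results.xml")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # requires BLAST+ and network access
os.remove(query)  # remove the temporary file once the search has finished
```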