Solved: the code below is giving an error. Please assist.

To obtain the human protein sequences in multi-FASTA format, I have written the following script in Python:

# Load necessary modules
from Bio import SeqIO
import gzip

# Read in human genome file
genome_file = "hg38.fa.gz"
with gzip.open(genome_file, 'rt') as f:
    genome = SeqIO.parse(f, 'fasta')

# Read in RefSeq table
refseq_file = "[path to RefSeq table file]"
with open(refseq_file, 'r') as f:
    refseq = SeqIO.parse(f, 'tab')

# Create dictionary of gene sequences
gene_dict = {}
for record in genome:
    gene_name = record.id.split()[0]
    gene_dict[gene_name] = record.seq

# Create dictionary of protein sequences
protein_dict = {}
for record in refseq:
    if record.features:
        for feature in record.features:
            if feature.type == 'CDS':
                gene_name = feature.qualifiers['gene'][0]
                gene_seq = gene_dict.get(gene_name, None)
                if gene_seq is not None:
                    protein_seq = gene_seq[feature.location.start.position:feature.location.end.position].translate()
                    protein_name = f">{record.id}:{record.name}:{gene_name}:{feature.qualifiers['protein_id'][0]}"
                    protein_dict[protein_name] = protein_seq

# Write output file
output_file = "[output file name]"
with open(output_file, 'w') as f:
    for protein_name, protein_seq in protein_dict.items():
        f.write(f"{protein_name}\n{protein_seq}\n")

Error:

Each line should have one tab separating the title and sequence, this line has 11 tabs: 'chr1\t67092164\t67109072\tXM_011541469.2\t0\t-\t67093004\t67103382\t0\t5\t1440,187,70,145,44,\t0,3070,4087,11073,16864,\n'
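This message comes from Biopython's "tab" format, which reads files with exactly two tab-separated columns (an ID and a sequence). The row quoted in the error has 12 fields and looks like UCSC BED12, so it is an annotation table, not a sequence file. A minimal sketch of reading such a row with the standard csv module instead, assuming the BED12 column order (the field names are the UCSC ones and are not stated in the post; read_bed12 is a hypothetical helper):

```python
import csv
import io

# Assumed BED12 column names (UCSC convention); the post does not name them.
BED12_FIELDS = ["chrom", "chromStart", "chromEnd", "name", "score", "strand",
                "thickStart", "thickEnd", "itemRgb", "blockCount",
                "blockSizes", "blockStarts"]


def read_bed12(handle):
    """Yield one dict per data row of a tab-separated BED12 file."""
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        yield dict(zip(BED12_FIELDS, fields))


# The row quoted in the error message, used here as in-memory test input.
line = ("chr1\t67092164\t67109072\tXM_011541469.2\t0\t-\t67093004\t67103382"
        "\t0\t5\t1440,187,70,145,44,\t0,3070,4087,11073,16864,\n")
rows = list(read_bed12(io.StringIO(line)))
print(rows[0]["name"], rows[0]["strand"])
```

With a real file, the same function would take the handle returned by open(refseq_file) in place of the io.StringIO object.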

Requirement:

The ID field describes what the sequence is. You should use the concatenation (with colon “:” as the delimiter) of the RefSeq table name and name2 fields as the ID. For example, for the first record in the RefSeq table, the corresponding ID should be “>NM_001276352.2:C1orf141”. The sequence field simply records the corresponding sequence, all on one line. For example:
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
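The required record layout can be sketched as a small formatting function. This is only an illustration: format_record is a hypothetical helper, and the short sequence passed to it is a placeholder, not the real protein for that transcript.

```python
def format_record(name, name2, protein):
    """Return a two-line record: '>name:name2', then the sequence on one line."""
    one_line = "".join(str(protein).split())  # drop embedded spaces and line breaks
    return f">{name}:{name2}\n{one_line}\n"


# name and name2 are taken from the first table row; the sequence is a placeholder.
record = format_record("XM_011541469.2", "C1orf141", "MVLSPADK\nTNVKAAWG")
print(record, end="")
```

Building the header from only name and name2 keeps the ID in the two-part form the requirement asks for, unlike the four-part header built in the script.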

RefSeq table data:

#"bin" name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
0 XM_011541469.2 chr1 67092164 67109072 67093004 67103382 5 67092164,67095234,67096251,67103237,67109028, 67093604,67095421,67096321,67103382,67109072, 0 C1orf141 cmpl cmpl 0,2,1,0,-1,
0 XM_017001276.2 chr1 67092164 67131227 67093004 67127240 9 67092164,67095234,67096251,67103237,67111576,67115351,67125751,67127165,67131141, 67093604,67095421,67096321,67103382,67111644,67115464,67125909,67127257,67131227, 0 C1orf141 cmpl cmpl 0,2,1,0,1,2,0,0,-1,
0 XM_011541467.2 chr1 67092164 67131227 67093004 67127240 9 67092164,67095234,67096251,67103237,67111576,67115351,67125751,67127165,67131141, 67093604,67095421,67096321,67103343,67111644,67115464,67125909,67127257,67131227, 0 C1orf141 cmpl cmpl 0,2,1,0,1,2,0,0,-1,
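Each row of this table describes one transcript by coordinates, so the protein has to be built by clipping every exon to the cdsStart..cdsEnd interval, joining the pieces, reverse-complementing on the minus strand, and translating. Below is a rough sketch of that idea using only the standard library. It assumes that the coordinates are 0-based and half-open (the UCSC convention), that each row is available as a dict keyed by the header names with the strand column present, and that the chromosome sequence is a plain string. The function names are hypothetical. It ignores exonFrames and incomplete CDS annotations.

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def cds_blocks(row):
    """Return (start, end) pairs for each exon clipped to the coding region."""
    starts = [int(x) for x in row["exonStarts"].rstrip(",").split(",")]
    ends = [int(x) for x in row["exonEnds"].rstrip(",").split(",")]
    cds_start, cds_end = int(row["cdsStart"]), int(row["cdsEnd"])
    blocks = []
    for start, end in zip(starts, ends):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:  # skip exons that lie entirely in the UTR
            blocks.append((lo, hi))
    return blocks


def translate_transcript(row, chrom_seq):
    """Splice the CDS out of chrom_seq and translate up to the first stop codon."""
    cds = "".join(chrom_seq[lo:hi] for lo, hi in cds_blocks(row)).upper()
    if row["strand"] == "-":
        cds = cds.translate(COMPLEMENT)[::-1]  # reverse complement
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons containing N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example: a made-up 20-base "chromosome" with two exons, not real hg38 data.
toy_chrom = "GGATGGCCTTTTAAATAACC"
toy_row = {"strand": "+", "cdsStart": "2", "cdsEnd": "18",
           "exonStarts": "0,12,", "exonEnds": "8,20,"}
print(translate_transcript(toy_row, toy_chrom))  # MAK
```

With real data, the rows could come from csv.DictReader(handle, delimiter="\t") and chrom_seq from the matching chromosome record in hg38.fa.gz, read while the gzip file is still open. That wiring is an assumption and is not tested here.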
