Get chromosome sizes from fasta file
Hello,
I’m wondering whether there is a program that could calculate chromosome sizes from any fasta file? The idea is to generate a tab file like the one expected in bedtools genomecov for example.
I know there’s the fetchChromSize program from UCSC, but not all genomes are available over there (I need TAIR10 for instance). I’ve read this topic already.
I would like a tool that can deal with any genome regardless of the database. If it doesn’t exist I guess it’s possible to just parse fasta files, but I’d be surprised if no one else had done it before!
Cheers
• 23k views
-
Another good option is to use the faSize command from UCSC tools. It works on any fasta file, not just UCSC genomes.
-
You run it like this:
faSize -detailed -tab file.fasta
-
Default behavior is to print to STDOUT, so you can redirect the output to a file like this:
faSize -detailed -tab file.fasta > output.txt
-
The output will be a tab delimited file with sequence name on the first column and the length on the second.
-
You can easily install faSize and other tools from UCSC using Bioconda anaconda.org/bioconda/ucsc-fasize
samtools was crashing because I included the HLA’s,
HLA-A*01:01:38L 3374
Traceback (most recent call last):
File "/usr/local/bin/faidx", line 11, in <module>
load_entry_point('pyfaidx==0.5.2', 'console_scripts', 'faidx')()
File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 197, in main
write_sequence(args)
File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 50, in write_sequence
outfile.write(transform_sequence(args, fasta, name, start, end))
File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 120, in transform_sequence
line_len = fasta.faidx.index[name].lenc
KeyError: 'HLA-A*01'
Finally just did:
from Bio import SeqIO
for rec in SeqIO.parse("hg38-Mix.fa","fasta"):
print rec.id+"t"+str(len(rec.seq))
Traffic: 1103 users visited in the last hour
Read more here: Source link