I’m wondering whether there is a program that can calculate chromosome sizes from any FASTA file. The idea is to generate a tab-delimited file like the genome file that bedtools genomecov expects, for example.
I know about the fetchChromSizes program from UCSC, but not all genomes are available there (I need TAIR10, for instance). I’ve already read this topic.
I would like a tool that can handle any genome, regardless of the database. If none exists, I guess I could just parse the FASTA files myself, but I’d be surprised if no one had done it before!
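Parsing the FASTA directly is indeed straightforward. Below is a minimal, dependency-free sketch. It assumes plain (uncompressed) FASTA and names each sequence by the first whitespace-delimited word of its header, which matches how `samtools faidx` names sequences. The function names `fasta_lengths` and `write_genome_file` are hypothetical. Because it streams the file line by line, it never holds a whole chromosome in memory.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, length) for every record in an iterable of FASTA lines.

    The name is the header text up to the first whitespace, the same
    convention samtools faidx uses.
    """
    name = None
    length = 0
    for line in lines:
        line = line.strip()  # drop the newline (\n or \r\n) and edge whitespace
        if line.startswith(">"):
            if name is not None:
                yield name, length  # finish the previous record
            words = line[1:].split()
            name = words[0] if words else ""
            length = 0
        elif name is not None:
            # Count sequence characters; lines before the first header are ignored.
            length += len(line)
    if name is not None:
        yield name, length  # finish the last record


def write_genome_file(fasta_path: str, out: TextIO) -> None:
    """Write one 'name<TAB>length' line per record, bedtools genome-file style."""
    with open(fasta_path) as handle:
        for name, length in fasta_lengths(handle):
            out.write(f"{name}\t{length}\n")
```

For example, `write_genome_file("TAIR10.fa", sys.stdout)` (hypothetical file name) prints one `name<TAB>length` line per sequence. You can pass an open file instead of `sys.stdout` to save the result for `bedtools genomecov -g`.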
samtools was crashing because I had included the HLA contigs, and pyfaidx’s `faidx` command failed as well:
```
Traceback (most recent call last):
  File "/usr/local/bin/faidx", line 11, in <module>
    load_entry_point('pyfaidx==0.5.2', 'console_scripts', 'faidx')()
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 197, in main
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 50, in write_sequence
    outfile.write(transform_sequence(args, fasta, name, start, end))
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 120, in transform_sequence
    line_len = fasta.faidx.index[name].lenc
```
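If `samtools faidx genome.fa` does succeed, the index it writes (`genome.fa.fai`) already holds what bedtools needs. Its first two tab-separated columns are the sequence name and its length, so `cut -f1,2 genome.fa.fai` yields a genome file directly. The Python sketch below performs the same conversion and assumes a standard five-column `.fai`. The names `fai_to_genome` and `convert_fai` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def fai_to_genome(fai_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, length) pairs from the lines of a FASTA .fai index.

    A FASTA .fai has five tab-separated columns (NAME, LENGTH, OFFSET,
    LINEBASES, LINEWIDTH); a bedtools genome file needs only the first two.
    """
    for line in fai_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        yield fields[0], int(fields[1])


def convert_fai(fai_path: str, out_path: str) -> None:
    """Write 'name<TAB>length' lines for every entry in fai_path to out_path."""
    with open(fai_path) as src, open(out_path, "w") as dst:
        for name, length in fai_to_genome(src):
            dst.write(f"{name}\t{length}\n")
```

For example, `convert_fai("TAIR10.fa.fai", "TAIR10.chrom.sizes")` (hypothetical paths). This route reads only the small index, not the sequence, but it still depends on `samtools faidx` being able to index the FASTA in the first place.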
In the end, I just used Biopython:
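```python
from Bio import SeqIO
# Print one "name<TAB>length" line per record, the format bedtools genomecov -g expects
for rec in SeqIO.parse("hg38-Mix.fa", "fasta"):
    print("%s\t%d" % (rec.id, len(rec)))
```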