Get chromosome sizes from fasta file

Get chromosome sizes from fasta file



I’m wondering whether there is a program that could calculate chromosome sizes from any fasta file? The idea is to generate a tab file like the one expected in bedtools genomecov for example.

I know there’s the fetchChromSize program from UCSC, but not all genomes are available over there (I need TAIR10 for instance). I’ve read this topic already.

I would like a tool that can deal with any genome regardless of the database. If it doesn’t exist I guess it’s possible to just parse fasta files, but I’d be surprised if no one else had done it before!





pip install pyfaidx
faidx input.fasta -i chromsizes > sizes.genome

That’s what you’re looking for if you want to use bedtools genomecov, but you can transform to a BED file as well using -i bed.

Just found out faidx is available with samtools so another possibility is:

samtools faidx input.fa
cut -f1,2 input.fa.fai > sizes.genome

updated 2.3 years ago by


written 5.6 years ago by



  • Another good option is to use the faSize command from UCSC tools. It works on any fasta file, not just UCSC genomes.

  • You run it like this: faSize -detailed -tab file.fasta

  • Default behavior is to print to STDOUT, so you can redirect the output to a file like this:
    faSize -detailed -tab file.fasta > output.txt

  • The output will be a tab delimited file with sequence name on the first column and the length on the second.

  • You can easily install faSize and other tools from UCSC using Bioconda

samtools was crashing because I included the HLA’s,

HLA-A*01:01:38L 3374


Traceback (most recent call last):
  File "/usr/local/bin/faidx", line 11, in <module>
    load_entry_point('pyfaidx==0.5.2', 'console_scripts', 'faidx')()
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/", line 197, in main
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/", line 50, in write_sequence
    outfile.write(transform_sequence(args, fasta, name, start, end))
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/", line 120, in transform_sequence
    line_len = fasta.faidx.index[name].lenc
KeyError: 'HLA-A*01'

Finally just did:

from Bio import SeqIO
for rec in SeqIO.parse("hg38-Mix.fa","fasta"):

updated 2.3 years ago by


written 2.9 years ago by


before adding your answer.

Traffic: 1103 users visited in the last hour

Read more here: Source link