Accessing the reference genome from the Genome database (NCBI) with Biopython

Hello all,

I would like to retrieve the RefSeq UID of the reference genome for a given taxonomy ID, using the Genome database with Biopython.

Let me illustrate what I mean with screenshots. I search the Genome database using a taxonomy ID. It returns a single result, and then I click on the “Reference genome” link.

[Image: search for a specific genome by taxonomy ID]

Now I scroll to the bottom of the page and get the RefSeq reference genome UID for the given taxonomy ID.

After clicking the link, I can get the RefSeq UID.

Is it possible to achieve this using Biopython?


taxonomyID


genome


reference


biopython


Using Entrez Direct (output truncated to save space):

$ esearch -db taxonomy -query "1005566  [taxID]" | elink -target nuccore | efetch -format docsum | xtract -pattern DocumentSummary -if SourceDb -contains refseq -element Caption,Title,SourceDb
NZ_AMUP00000000 Escherichia coli 07798, whole genome shotgun sequencing project refseq
NZ_JH964525 Escherichia coli 07798 strain 7798 E07798.contig.252, whole genome shotgun sequence refseq
NZ_JH964524 Escherichia coli 07798 strain 7798 E07798.contig.251, whole genome shotgun sequence refseq
NZ_JH964523 Escherichia coli 07798 strain 7798 E07798.contig.249, whole genome shotgun sequence refseq
NZ_JH964522 Escherichia coli 07798 strain 7798 E07798.contig.248, whole genome shotgun sequence refseq
NZ_JH964521 Escherichia coli 07798 strain 7798 E07798.contig.247, whole genome shotgun sequence refseq
NZ_JH964520 Escherichia coli 07798 strain 7798 E07798.contig.246, whole genome shotgun sequence refseq
NZ_JH964519 Escherichia coli 07798 strain 7798 E07798.contig.245, whole genome shotgun sequence refseq
NZ_JH964518 Escherichia coli 07798 strain 7798 E07798.contig.244, whole genome shotgun sequence refseq
NZ_JH964517 Escherichia coli 07798 strain 7798 E07798.contig.241, whole genome shotgun sequence refseq

If you only want NC_* accessions, filter the output with grep:

$ esearch -db taxonomy -query "511145  [taxID]" | elink -target nuccore | efetch -format docsum | xtract -pattern DocumentSummary -if SourceDb -contains refseq -element Caption,Title,SourceDb | grep NC
NC_000913   Escherichia coli str. K-12 substr. MG1655, complete genome  refseq
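
Since the question asks about Biopython, here is a rough sketch of the same taxonomy -> nuccore lookup using Bio.Entrez. It is not from the answer above and rests on a few assumptions: the function names, the `max_links` cap and the e-mail address are made up for illustration, and instead of the SourceDb field it filters on the accession prefix (RefSeq accessions start with two letters and an underscore, such as NC_ or NZ_), reading the accession from the "Caption" field of the document summary.

```python
import re

# RefSeq accessions start with two letters and an underscore (NC_, NZ_, ...).
REFSEQ_ACCESSION = re.compile(r"^[A-Z]{2}_")


def filter_refseq(summaries, prefix=None):
    """Return (accession, title) pairs whose accession looks like RefSeq.

    summaries: iterable of mappings with "Caption" and "Title" keys.
    prefix: optional accession prefix such as "NC_" to narrow the result.
    """
    kept = []
    for doc in summaries:
        accession = str(doc["Caption"])
        if not REFSEQ_ACCESSION.match(accession):
            continue
        if prefix and not accession.startswith(prefix):
            continue
        kept.append((accession, str(doc["Title"])))
    return kept


def refseq_records_for_taxid(taxid, email, prefix=None, max_links=200):
    """Network part (hypothetical helper): taxonomy -> nuccore -> summaries."""
    # Imported here so filter_refseq can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    # Equivalent of: elink -target nuccore
    handle = Entrez.elink(dbfrom="taxonomy", db="nuccore", id=str(taxid))
    linksets = Entrez.read(handle)
    handle.close()

    ids = []
    for linkset in linksets:
        for linkdb in linkset.get("LinkSetDb", []):
            ids.extend(link["Id"] for link in linkdb["Link"])
    # Drop duplicates (several link types may list the same record) and
    # cap the number of summaries requested (assumed limit, adjust as needed).
    ids = list(dict.fromkeys(ids))[:max_links]
    if not ids:
        return []

    # Equivalent of: efetch -format docsum
    handle = Entrez.esummary(db="nuccore", id=",".join(ids))
    summaries = Entrez.read(handle)
    handle.close()

    return filter_refseq(summaries, prefix=prefix)
```

Something like `refseq_records_for_taxid(511145, "you@example.org", prefix="NC_")` would then play the role of the `grep NC` pipeline, though I have not checked it against every taxon, and large taxa may need the links fetched in batches.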

