How to find newly submitted accessions in NCBI

Dear all,

I want to automate the identification of newly submitted plant accessions in NCBI. I have been browsing the NCBI FTP server, but I have not yet found a file or directory that lists all SRA accessions:

ftp.ncbi.nlm.nih.gov/

Does anybody have an idea where I could find this list?



If you are looking for SRA accession numbers, you should search the SRA database.

Or, from the command line with NCBI's Entrez Direct tools (esearch/efetch), I gave it a go:

esearch -db sra -query '"2022/11/28"[Publication Date]' | efetch -format runinfo > 2022-11-18.csv
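Since the goal is a daily job for plant data, the command above can be wrapped in two small shell functions and narrowed by taxonomy. This is a sketch, not a tested recipe: the function names are hypothetical, it assumes Entrez Direct is installed, and it assumes the SRA index accepts the usual lineage filter txid33090[Organism:exp] (33090 is the NCBI Taxonomy ID for Viridiplantae, and ":exp" includes all descendant taxa).

```shell
#!/usr/bin/env bash
# Sketch: fetch SRA runinfo for plant runs published on a given day.
# Assumes Entrez Direct (esearch/efetch) is on PATH; function names are hypothetical.

# Build the Entrez query for one day. Input is YYYY-MM-DD; the query uses YYYY/MM/DD.
# Restricts results to Viridiplantae (taxid 33090) and all of its descendant taxa.
plant_sra_query() {
  local day
  day=$(printf '%s' "$1" | tr '-' '/')
  printf '"%s"[Publication Date] AND txid33090[Organism:exp]\n' "$day"
}

# Run the search and save the runinfo table as <day>.csv in the current directory.
fetch_plant_runinfo() {
  local day="$1"
  esearch -db sra -query "$(plant_sra_query "$day")" \
    | efetch -format runinfo > "${day}.csv"
}
```

For a cron job, calling fetch_plant_runinfo "$(date -d yesterday +%F)" should work with GNU date; BSD/macOS date needs a different flag (-v-1d).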

How many lines?

cat 2022-11-18.csv | wc -l

prints:

4638

Looks like today, Nov 28, 2022, there were roughly 4,600 runs published in SRA (the 4638 lines include the CSV header)… whoa, I did not expect that… I am extraordinarily surprised, to be honest. That is a lot of data.
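To count runs rather than lines, the header has to be skipped. Here is a minimal sketch with a hypothetical function name; it assumes the first column is named "Run" and also skips blank lines and any repeated header lines, which efetch may emit if the results are retrieved in several batches.

```shell
#!/usr/bin/env bash
# Count data rows in an efetch runinfo CSV, skipping header lines
# (first field "Run") and blank lines. Function name is hypothetical.
count_runs() {
  awk -F, '$1 != "" && $1 != "Run" { n++ } END { print n + 0 }' "$1"
}
```

Usage: count_runs 2022-11-18.csv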

What is the size of all that data?

cat 2022-11-18.csv | csvcut -c size_MB | grep -v size | datamash sum 1

prints:

2471916

which works out to roughly 2.5 terabytes (2,471,916 MB).
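If csvkit and datamash are not installed, plain awk can do the same sum. This sketch (hypothetical function name) finds the size_MB column by its header name, assumes no field contains a quoted comma, and treats 1 TB as 10^6 MB.

```shell
#!/usr/bin/env bash
# Sum the size_MB column of a runinfo CSV without csvkit/datamash.
# The column index is taken from the header row; repeated headers are skipped.
# Assumes no field contains a quoted comma; 1 TB is taken as 10^6 MB.
sum_size_mb() {
  awk -F, '
    $1 == "Run" { for (i = 1; i <= NF; i++) if ($i == "size_MB") col = i; next }
    col && $1 != "" { total += $col }
    END { printf "%d MB (%.2f TB)\n", total, total / 1e6 }
  ' "$1"
}
```

On the file above this should report the same total as the datamash pipeline, plus the terabyte conversion.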

NCBI also publishes a file of SRA accession numbers that is updated daily (it is almost a gigabyte, so a fairly large download). It appears to list accessions going back several years (the first entries in the file are dated 2014) and is current up to the date it was generated:

$ head NCBI_SRA_Datalist 
Submission  Run Date
DRA000001   DRR000001   2014-05-26T10:22:28Z
DRA000002   DRR000002   2014-05-26T11:00:19Z
DRA000003   DRR000003   2014-05-26T11:07:49Z
DRA000003   DRR000004   2014-05-26T11:07:46Z

$ tail NCBI_SRA_Datalist
SRA1548151  SRR22428598 2022-11-28T18:25:46Z
SRA1548154  SRR22428656 2022-11-28T18:34:47Z
SRA1548154  SRR22428657 2022-11-28T18:33:44Z
SRA1548154  SRR22428658 2022-11-28T18:33:31Z
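Because the Date column holds ISO 8601 UTC timestamps, new entries can be selected with a plain string comparison. A minimal sketch, assuming the three whitespace-separated columns shown above (Submission, Run, Date) and a hypothetical function name:

```shell
#!/usr/bin/env bash
# Print Run accessions whose timestamp is on or after a cutoff.
# Assumes columns Submission, Run, Date (whitespace-separated) and ISO 8601
# timestamps, which sort correctly as plain strings. Header line is skipped.
new_runs_since() {
  local list="$1" cutoff="$2"   # cutoff e.g. 2022-11-28 or 2022-11-28T12:00:00Z
  awk -v cutoff="$cutoff" 'NR > 1 && NF >= 3 && $3 >= cutoff { print $2 }' "$list"
}
```

Storing the timestamp of the previous run (for example in a small state file) and passing it as the cutoff would return only accessions added since the last check. This file has no organism column, so picking out plant runs would still need a lookup step, such as an Entrez query restricted by organism.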

