How to find newly submitted accessions in NCBI
Dear all,
I want to automate a process to identify newly submitted plant accessions in NCBI. I have been scanning the NCBI FTP server, but I have not yet found a location that lists all SRA accessions.
Does anybody have an idea where I could find this list?
If you are looking for SRA accession numbers, you should search the SRA database, either on the web or from the command line. I gave the command line a go:
esearch -db sra -query '"2022/11/28"[Publication Date]' | efetch -format runinfo > 2022-11-18.csv
How many lines?
cat 2022-11-18.csv | wc -l
prints:
4638
Looks like on Nov 28, 2022 there were 4638 lines of run info (one of them the CSV header), so roughly 4,600 runs were published at SRA that day … whoa, I did not expect that … I am extraordinarily surprised, to be honest. That is a lot of data.
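Since the question was about plants specifically, the same search can probably be narrowed by organism. The sketch below is only an illustration: the function names `build_sra_query` and `fetch_plant_runinfo` are made up here, and the `Viridiplantae[Organism]` term is an assumption about how the SRA search field resolves taxonomy, so check the hit count on the web interface before relying on it.

```shell
# Build an SRA query for one publication day, restricted to a taxon.
# Both helper names are hypothetical; the Organism filter is an assumption.
build_sra_query() {
    day="${1:-$(date +%Y/%m/%d)}"    # defaults to today, formatted YYYY/MM/DD
    taxon="${2:-Viridiplantae}"      # assumed to cover green plants
    printf '"%s"[Publication Date] AND %s[Organism]' "$day" "$taxon"
}

# Run the query with Entrez Direct and save the run info as CSV.
# Needs esearch/efetch on the PATH and network access.
fetch_plant_runinfo() {
    day="$1"
    out="$2"
    esearch -db sra -query "$(build_sra_query "$day")" |
        efetch -format runinfo > "$out"
}
```

Calling `fetch_plant_runinfo 2022/11/28 plants.csv` from a daily cron job would then give one CSV per day, which seems close to the automation asked for.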
What is the size of all that data?
cat 2022-11-18.csv | csvcut -c size_MB | grep -v size | datamash sum 1
prints:
2471916
which comes to about 2.4 terabytes.
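The exact figure depends on whether a terabyte is counted in powers of 1000 or 1024. As a quick check of the arithmetic (the helper name `mb_to_tb` is made up for this example, and I am assuming size_MB really is in megabytes):

```shell
# Convert a size in megabytes to terabytes under both conventions.
mb_to_tb() {
    awk -v mb="$1" 'BEGIN {
        printf "%.2f TB (decimal), %.2f TiB (binary)\n", mb / 1e6, mb / (1024 * 1024)
    }'
}

mb_to_tb 2471916
# prints: 2.47 TB (decimal), 2.36 TiB (binary)
```

Either way it is somewhere around two and a half terabytes for a single day.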
NCBI publishes a file containing SRA accession numbers. It is updated daily (the file is almost a gigabyte, so it is a largish download). The accession numbers appear to start a ways back and run up to the current date.
$ head NCBI_SRA_Datalist
Submission Run Date
DRA000001 DRR000001 2014-05-26T10:22:28Z
DRA000002 DRR000002 2014-05-26T11:00:19Z
DRA000003 DRR000003 2014-05-26T11:07:49Z
DRA000003 DRR000004 2014-05-26T11:07:46Z
$ tail NCBI_SRA_Datalist
SRA1548151 SRR22428598 2022-11-28T18:25:46Z
SRA1548154 SRR22428656 2022-11-28T18:34:47Z
SRA1548154 SRR22428657 2022-11-28T18:33:44Z
SRA1548154 SRR22428658 2022-11-28T18:33:31Z
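To pull only the new runs out of that file, something like the following might do. It is a sketch under the assumption that the file keeps the three whitespace-separated columns and the header line shown above; `runs_since` is a name invented for this example. ISO 8601 timestamps in the same format sort correctly as plain strings, so a string comparison is enough.

```shell
# Print run accessions whose date is on or after a cutoff.
# Assumes columns: Submission, Run, Date, plus one header line.
runs_since() {
    cutoff="$1"    # e.g. 2022-11-28T00:00:00Z
    file="$2"
    # Skip the header (NR > 1); compare timestamps as strings.
    awk -v cutoff="$cutoff" 'NR > 1 && $3 >= cutoff { print $2 }' "$file"
}
```

For example, `runs_since 2022-11-28T00:00:00Z NCBI_SRA_Datalist > new_runs.txt` would list the runs from that day onward. Note that this listing has no organism column, so finding the plant runs among them would still take a separate lookup.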