Biopython’s ESearch for PubMed does not give the same results as the web search

I can think of two factors that might cause different results between Biopython and the web search:

  1. Unless the query you give Biopython already carries explicit field tags, PubMed translates it before retrieving results. Example: <sclerosis> is translated to <"sclerosis"[MeSH Terms] OR "sclerosis"[All Fields]>
  2. As GenoMax pointed out, the database version that Biopython is using might be older than that of the webpage.
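
One way to take the first factor out of the comparison is to give both interfaces the same explicit, field-tagged query. The sketch below is only an illustration: tag_term is a hypothetical helper (it is not part of Biopython) that mimics the shape of the translation quoted above for a single word. PubMed's real term mapping is more involved, so this is a way to pin down one query string, not a reimplementation of it.

```python
def tag_term(term, fields=("MeSH Terms", "All Fields")):
    """Build an explicit field-tagged PubMed query for a single term.

    Hypothetical helper for illustration only: it copies the form of the
    translation quoted above and does not reproduce PubMed's term mapping.
    """
    term = term.strip()
    if not term:
        raise ValueError("term must not be empty")
    # One quoted, tagged clause per field, joined with OR.
    return " OR ".join('"%s"[%s]' % (term, field) for field in fields)


if __name__ == "__main__":
    print(tag_term("sclerosis"))
```

The resulting string can be passed as term= to Entrez.esearch and pasted into the web search box, so that both should be working from the same query.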

You can find out what your query is translated to as well as the database build and last update as follows:

from Bio import Entrez

# NCBI asks every E-utilities client to identify itself with an email address.
Entrez.email = "example@mail.com"

def search(query):
    # ESearch returns the matching PMIDs and the query as PubMed translated it.
    handle = Entrez.esearch(db='pubmed',
                            sort="pub date",
                            retmax='10',
                            retmode="xml",
                            term=query)
    results = Entrez.read(handle)
    handle.close()
    print('Count: ' + results['Count'])
    print('QueryTranslation: ' + results['QueryTranslation'])
    return results

def get_info(db):
    # EInfo reports the database build and the time of its last update.
    handle = Entrez.einfo(db=db)
    results = Entrez.read(handle)
    handle.close()
    print('DbBuild: ' + results['DbInfo']['DbBuild'])
    print('LastUpdate: ' + results['DbInfo']['LastUpdate'])
    return results['DbInfo']

def fetch_details(id_list):
    # EFetch retrieves the full records for a list of PMIDs.
    ids = ",".join(id_list)
    handle = Entrez.efetch(db='pubmed',
                           retmode="xml",
                           id=ids)
    results = Entrez.read(handle)
    handle.close()
    return results

if __name__ == '__main__':
    query = 'sclerosis'
    results = search(query)
    db_info = get_info('pubmed')
    id_list = results['IdList']
    papers = fetch_details(id_list)
    # Print the titles of the retrieved records, numbered from 1.
    for i, paper in enumerate(papers['PubmedArticle']):
        print("%d) %s" % (i + 1, paper['MedlineCitation']['Article']['ArticleTitle']))

Output:

Count: 170232
QueryTranslation: "sclerosis"[MeSH Terms] OR "sclerosis"[All Fields]
DbBuild: Build210622-2217m.2
LastUpdate: 2021/06/23 06:55
1) Fibrosis as a common trait in amyotrophic lateral sclerosis tissues.
2) Lower and upper motor neuron involvement and their impact on disease prognosis in amyotrophic lateral sclerosis.
3) Predictive value of sub classification of focal segmental glomerular sclerosis in Oxford classification of IgA nephropathy.
4) Bushen Yijing Decoction (BSYJ) exerts an anti-systemic sclerosis effect via regulating MicroRNA-26a /FLI1 axis.
5) Hodgkin lymphoma involving extranodal sites in head and neck: report of twenty-nine cases and review of three-hundred and fifty-seven cases.
6) Galangin ameliorates experimental autoimmune encephalomyelitis in mice via modulation of cellular immunity.
7) 11C-PK11195 plasma metabolization has the same rate in multiple sclerosis patients and healthy controls: a cross-sectional study.
8) Multiple sclerosis: why we should focus on both sides of the (auto)antibody.
9) Teriflunomide provides protective properties after oxygen-glucose-deprivation in hippocampal and cerebellar slice cultures.
10) Neuroimmune connections between corticotropin-releasing hormone and mast cells: novel strategies for the treatment of neurodegenerative diseases.

Comparing the Biopython and web search results for the translated query, I get 170,232 vs. 170,426 results. The top 10 results are the same, albeit in a slightly different order.
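
To make this kind of comparison less manual, the two result sets can be diffed directly. The following is a minimal sketch using only the standard library. compare_results is a hypothetical helper, the PMIDs in the example are made-up placeholders, and the web IDs are assumed to have been copied or exported by hand from the PubMed page.

```python
def compare_results(api_count, web_count, api_ids, web_ids):
    """Summarise how an ESearch result differs from the web search result."""
    # Normalise to strings so IDs read from different sources compare equal.
    api_ids = [str(i) for i in api_ids]
    web_ids = [str(i) for i in web_ids]
    return {
        "count_difference": int(web_count) - int(api_count),
        "same_ids": set(api_ids) == set(web_ids),
        "same_order": api_ids == web_ids,
        "only_in_api": sorted(set(api_ids) - set(web_ids)),
        "only_in_web": sorted(set(web_ids) - set(api_ids)),
    }


if __name__ == "__main__":
    # Counts are the ones reported above; the PMIDs are placeholders.
    summary = compare_results("170232", 170426,
                              ["111", "222", "333"],
                              ["222", "111", "333"])
    for key, value in summary.items():
        print(key, value)
```

With the counts reported above, count_difference comes out as 194, and a top-10 list that matches apart from ordering shows up as same_ids being true while same_order is false.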
