Accessing Uniprot Info Via Python


Does anyone here regularly access UniProt data using Python? If so, how?

I tried downloading through GitHub but was unable to figure out the installation. What does everyone here use?







I do not regularly access UniProt from Python, but just today I solved a related Rosalind task. My solution uses the urllib library to download the data:

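import urllib.request
code = "Q7Z7W5"
# Prepend the UniProt entry URL; the request fetches the flat-text (.txt) record.
data = urllib.request.urlopen("" + code + ".txt").read().decode("utf-8")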

It then uses split() to process the file line by line. Each line starts with a two-character code, such as “DR”, and its content is reasonably well structured, which lets you extract the GO term annotations that the Rosalind task asks for.
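As a rough sketch of that line-by-line processing, the snippet below pulls GO annotations out of "DR" lines. The function name extract_go_terms and the sample text are made up for illustration. The sample is hand-written in the flat-file layout, not a real UniProt entry, and the parsing assumes the "DR   GO; id; aspect:name; evidence." layout.

```python
def extract_go_terms(flat_text):
    """Return (GO id, aspect, term name) tuples from UniProt flat-file text."""
    terms = []
    for line in flat_text.splitlines():
        # Each line starts with a two-character code; "DR" marks cross-references.
        if not line.startswith("DR   GO;"):
            continue
        # Payload after the code looks like: "GO; GO:0005524; F:ATP binding; IEA:..."
        fields = [f.strip() for f in line[5:].split(";")]
        go_id = fields[1]
        # The aspect letter (C, F or P) precedes the first colon of the third field.
        aspect, _, name = fields[2].partition(":")
        terms.append((go_id, aspect, name))
    return terms


# Hand-written sample lines for illustration only (not a real UniProt entry).
SAMPLE = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
DR   GO; GO:0005634; C:nucleus; IEA:UniProtKB-SubCell.
DR   GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.
"""

for go_id, aspect, name in extract_go_terms(SAMPLE):
    print(go_id, aspect, name)
```

With real data, the downloaded text from the snippet above would be passed in place of SAMPLE.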

It amazes me that this simple question is not answered correctly in any of the Biostars threads on this issue (“how can I get UniProt sequences via Python”). It took me some time to find out how to pass multiple IDs in the query; hopefully this is useful for future visitors.

from io import StringIO
from typing import List
import urllib.parse
import urllib.request

import pandas as pd


def get_uniprot_sequences2(uniprot_ids: List) -> pd.DataFrame:
    """
    Retrieve uniprot sequences based on a list of uniprot sequence identifiers.

    For large lists it is recommended to perform batch retrieval.
    documentation which columns are available:

    this python script is based on

    Parameters:
        uniprot_ids: List, list of uniprot identifiers

    Returns:
        pd.DataFrame, pandas dataframe with uniprot id column and sequence
    """
    url = ""  # This is the web server to retrieve the UniProt data
    params = {
        'from': "ACC",
        'to': 'ACC',
        'format': 'tab',
        'query': " ".join(uniprot_ids),  # multiple ids are separated by spaces
        'columns': 'id,sequence'}

    # Encode the parameters as an ASCII form body, which makes this a POST request
    data = urllib.parse.urlencode(params)
    data = data.encode('ascii')
    request = urllib.request.Request(url, data)
    with urllib.request.urlopen(request) as response:
        res = response.read()
    # The response is tab-separated text
    df_fasta = pd.read_csv(StringIO(res.decode("utf-8")), sep="\t")
    df_fasta.columns = ["Entry", "Sequence", "Query"]
    # it might happen that 2 different ids for a single query id are returned, split these rows
    return df_fasta.assign(Query=df_fasta['Query'].str.split(',')).explode('Query')
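The docstring recommends batch retrieval for large lists but does not show it. A minimal sketch of the splitting step could look like the following. The helper name chunk_ids, the default batch size of 100, and the accession list are assumptions for illustration and do not come from the original answer.

```python
def chunk_ids(ids, batch_size=100):
    """Yield consecutive slices of ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Hypothetical accession list, used only to show the batching.
ids = ["Q7Z7W5", "P12345", "P69905", "P68871", "P01308"]
for batch in chunk_ids(ids, batch_size=2):
    # Each batch becomes one space-separated 'query' value.
    print(" ".join(batch))
```

Each batch could then be passed to get_uniprot_sequences2 and the resulting frames combined with pd.concat.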
