Ensembl ID mapping GRCh37 vs GRCh38


I currently have a large list of Ensembl protein IDs (ENSP) from GRCh37. I need to map these IDs to the entry names listed on the UniProt website (e.g. ‘CASPE_HUMAN’). I am having trouble doing this with the UniProt dataset, since it is up to date with the GRCh38 Ensembl IDs. Right now, I have a dataset that maps GRCh37 IDs to UniProtKB-AC (e.g. P31944), but some of these UniProt accessions are obsolete. Is there a way to see which Ensembl IDs were updated in GRCh38? My overall goal is to find the updated UniProt IDs for my list of GRCh37 IDs.

I would love to have a dataframe that looks like this (I am currently using Python):

GRCh37_ID   GRCh38_ID   Old UniProt   New UniProt
ENSP001     ENSP001     P1234         P1234
ENSP002     ENSP004     P4567         P5632
ENSP003     ENSP009     P1292         P1292
ENSP004     ENSP0012    P1434         P2434
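
A minimal sketch of how a table with this layout could be assembled with pandas, assuming three lookup tables already exist. The names `old_map`, `id_map`, `new_map`, and `build_mapping_table` are hypothetical, and the IDs are the placeholders from the table above, not real identifiers. The `id_map` table (GRCh37 ENSP to GRCh38 ENSP) is exactly the piece that is still missing, so it is filled in by hand here only for illustration.

```python
import pandas as pd

# Hypothetical input: GRCh37 ENSP -> UniProt accession (possibly obsolete).
old_map = pd.DataFrame({
    "GRCh37_ID": ["ENSP001", "ENSP002", "ENSP003", "ENSP004"],
    "Old UniProt": ["P1234", "P4567", "P1292", "P1434"],
})

# Hypothetical input: GRCh37 ENSP -> GRCh38 ENSP (the missing mapping).
id_map = pd.DataFrame({
    "GRCh37_ID": ["ENSP001", "ENSP002", "ENSP003", "ENSP004"],
    "GRCh38_ID": ["ENSP001", "ENSP004", "ENSP009", "ENSP0012"],
})

# Hypothetical input: GRCh38 ENSP -> current UniProt accession.
new_map = pd.DataFrame({
    "GRCh38_ID": ["ENSP001", "ENSP004", "ENSP009", "ENSP0012"],
    "New UniProt": ["P1234", "P5632", "P1292", "P2434"],
})


def build_mapping_table(old_map, id_map, new_map):
    """Join the three lookups; left joins keep every GRCh37 ID,
    leaving NaN where no GRCh38 ID or new accession is found."""
    table = old_map.merge(id_map, on="GRCh37_ID", how="left")
    table = table.merge(new_map, on="GRCh38_ID", how="left")
    return table[["GRCh37_ID", "GRCh38_ID", "Old UniProt", "New UniProt"]]


table = build_mapping_table(old_map, id_map, new_map)

# Flag rows whose accession differs between the two mappings.
table["changed"] = table["Old UniProt"] != table["New UniProt"]
print(table)
```

Left joins are used so that GRCh37 IDs without a counterpart stay visible as rows with missing values instead of silently dropping out.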

After this, I could just grab the new UniProt ID that corresponds to each of my old GRCh37 IDs and look up its entry name.
Is this possible? I’ve been struggling to figure this out.

I started with a list of Ensembl translation/protein stable IDs (ENSPs) for GRCh37 and I want to find their UniProtKB/Swiss-Prot IDs. The issue is that when I use BioMart, the results include some UniProtKB/Swiss-Prot IDs that are no longer in UniProt (so I can’t find an entry name for them). To work around this, I was thinking I could find the corresponding ENSPs for GRCh38 and then look up their UniProtKB/Swiss-Prot IDs, since those should be more up to date. The problem is that I don’t know how to map the old ENSPs to the new ones.
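
One possible first step, sketched below, is to check which of the GRCh37 ENSPs still appear in a GRCh38 ID list once the version suffix is removed. This rests on an assumption that should be verified: that a translation carried over between assemblies keeps the same stable ID and only its version number changes. IDs that do not reappear would still need to be looked up separately, for example in Ensembl's ID history. The function names and the example IDs are hypothetical placeholders.

```python
def strip_version(stable_id):
    """Drop a trailing '.N' version suffix from a stable ID, if present."""
    return stable_id.split(".", 1)[0]


def split_by_presence(grch37_ids, grch38_ids):
    """Split GRCh37 IDs into those found in the GRCh38 list and those not.

    Assumption: a stable ID that persists keeps the same identifier,
    so comparing version-stripped IDs is enough to detect it.
    """
    current = {strip_version(i) for i in grch38_ids}
    retained, missing = [], []
    for raw_id in grch37_ids:
        stable_id = strip_version(raw_id)
        if stable_id in current:
            retained.append(stable_id)
        else:
            missing.append(stable_id)
    return retained, missing


# Placeholder IDs for illustration only, not real Ensembl identifiers.
grch37_ids = ["ENSP00000000001.1", "ENSP00000000002.3", "ENSP00000000003.1"]
grch38_ids = ["ENSP00000000001.4", "ENSP00000000003.1", "ENSP00000000009.2"]

retained, missing = split_by_presence(grch37_ids, grch38_ids)
print("retained:", retained)
print("missing:", missing)
```

The retained IDs could then be queried directly against a current mapping, which would leave only the missing ones to resolve by hand or through an ID history lookup.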







