How to get protein ID from gene ID (batch entrez)
Can someone suggest how to get protein IDs from gene IDs in batch (Batch Entrez)?
I have hundreds of gene names like AaeL_AAEL004207 with gene ID 5564359. The protein IDs can be looked up manually one by one, but with hundreds of genes that is obviously not a good idea. Can anyone suggest a better way?
With Entrez Direct:
epost -db gene -id 5564359 | elink -target protein | efetch -format uid
157105044
You can include multiple gene IDs (at least 500) in the -id part, separated by commas. Here’s a script:
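#!/bin/bash
# Convert a list of Gene IDs (one per line) to protein UIDs with Entrez Direct.

# Stop early if the Entrez Direct tools are not installed
if ! command -v epost >/dev/null 2>&1
then
    echo "Entrez Direct not in $PATH"
    exit 1
fi

if [ -n "$1" ]
then
    # Split the input into chunks of 500 IDs: input.aa, input.ab, ...
    split -l 500 "$1" input.
    for f in input.*
    do
        # Join the IDs of one chunk with commas and drop the trailing comma
        ids=$(tr "\n" "," < "$f" | sed 's/,$//')
        epost -db gene -id "$ids" | elink -target protein | efetch -format uid > "$f.output"
        # paste pairs line N of the input with line N of the output, so this
        # assumes one protein UID per gene, returned in input order
        paste "$f" "$f.output" > "$f.result"
        rm "$f" "$f.output"
    done
    cat input.*.result > "$1.output"
    rm input.*.result
else
    printf 'Usage: sh convertGeneIDs listOfGeneIDs\nOutput: geneID\tproteinID\n'
fi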
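The script pairs input and output lines with paste, which only lines up if every gene returns exactly one protein UID in the same order as the input. If some genes have several proteins or none, a slower but unambiguous variant is to run the same pipeline once per gene. A minimal sketch, assuming the same three Entrez Direct commands as above; gene_to_protein is a hypothetical helper name, not part of Entrez Direct:

```shell
#!/bin/bash
# Sketch: print "geneID<TAB>proteinUID" for each Gene ID read from stdin,
# one lookup per gene. gene_to_protein is a hypothetical helper name.
gene_to_protein() {
    local gene uid
    while IFS= read -r gene
    do
        # Skip blank lines in the input
        [ -n "$gene" ] || continue
        # Same pipeline as in the batch script, run for a single gene so each
        # protein UID can be labelled with the gene it came from.
        # </dev/null keeps epost from consuming the rest of the ID list on stdin.
        epost -db gene -id "$gene" </dev/null |
            elink -target protein |
            efetch -format uid |
            while IFS= read -r uid
            do
                printf '%s\t%s\n' "$gene" "$uid"
            done
    done
}
```

Usage would look like: gene_to_protein < listOfGeneIDs > listOfGeneIDs.output. A gene with several linked proteins gets one output line per protein, and a gene with none gets no line. This makes one round of requests per gene, so for hundreds of IDs it will be noticeably slower than the batched version.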