How to get protein ID from gene ID (batch entrez)
Hi,
Can someone suggest how to get protein IDs from gene IDs (Batch Entrez)?
I have hundreds of gene names like AaeL_AAEL004207 with gene ID 5564359. I can look up the protein ID for each one manually, but with hundreds of genes that is obviously not a good idea. Can anyone suggest a better way?
Thanks
With Entrez Direct:
epost -db gene -id 5564359 | elink -target protein | efetch -format uid
157105044
You can include multiple gene IDs (at least 500) in the -id part, separated by commas. Here's a script that does this for a file of gene IDs, 500 at a time:
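#!/bin/bash
# Convert a file of Entrez Gene IDs (one per line) to protein UIDs, 500 IDs per request.
# Make sure Entrez Direct is installed
if ! command -v epost > /dev/null 2>&1
then
    echo "Entrez Direct not in $PATH"
    exit 1
fi
if [ -n "$1" ]
then
    # Split the input into chunks of 500 IDs: input.aa, input.ab, ...
    split -l 500 "$1" input.
    for f in input.*
    do
        # Join the IDs in this chunk with commas
        ids=$(tr "\n" "," < "$f")
        epost -db gene -id "$ids" | elink -target protein | efetch -format uid > "$f.output"
        # Pairs line N of the input with line N of the output, which assumes
        # exactly one protein UID per gene ID, returned in input order
        paste "$f" "$f.output" > "$f.result"
        rm "$f" "$f.output"
    done
    cat input.*.result > "$1.output"
    rm input.*.result
else
    printf 'Usage: sh convertGeneIDs listOfGeneIDs\nOutput: geneID\tproteinID\n'
fi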
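The script above pairs gene IDs with protein IDs by line position, so the pairing only holds if every gene returns exactly one protein, in input order. If that is not guaranteed for your genes, a slower but unambiguous alternative is one lookup per gene. Below is a minimal sketch of that idea, not a tested replacement: lookup_proteins and map_genes are hypothetical helper names, the pipeline inside lookup_proteins is the same one shown at the top of this answer, and the < /dev/null redirect is a precaution (an assumption on my part) so the Entrez Direct commands cannot consume the loop's input. Genes with no linked protein are written with NA.

```shell
#!/bin/bash
# Sketch: map each gene ID to all of its protein UIDs, one lookup per gene.

# Hypothetical helper: protein UIDs linked to a single gene ID.
# Uses the same epost | elink | efetch pipeline as the one-liner above.
lookup_proteins() {
    epost -db gene -id "$1" < /dev/null | elink -target protein | efetch -format uid
}

# Hypothetical helper: read gene IDs from stdin (one per line) and print
# "geneID<TAB>proteinID", one row per linked protein, or NA if none came back.
map_genes() {
    while read -r gene
    do
        # Skip blank lines
        [ -n "$gene" ] || continue
        proteins=$(lookup_proteins "$gene")
        if [ -z "$proteins" ]
        then
            printf '%s\tNA\n' "$gene"
        else
            # UIDs are plain numbers, so word splitting on whitespace is safe here
            for p in $proteins
            do
                printf '%s\t%s\n' "$gene" "$p"
            done
        fi
    done
}

# Only run when a file of gene IDs is given as the first argument
if [ -n "$1" ]
then
    map_genes < "$1" > "$1.output"
fi
```

This makes one request per gene, so it is much slower than batching 500 IDs at a time, and for long lists you may need to pause between requests to stay within NCBI's rate limits.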