Download nucleotide sequence with locus_tag


I have a list of locus_tags, and my idea was to download them using esearch, but the downloaded file is not the desired gene: instead, the nucleotide sequence of the entire contig is downloaded.

In this example, my gene of interest is 830 nt long.

esearch -db nucleotide -query "JG64_RS07240" | efetch -format fasta > gen.fasta

Is there a way to get only my sequence of interest with esearch, rather than the whole contig?

I know I could do it manually, but I have more than 400 locus_tags that do not have a GI number.

Thanks for reading; any response is appreciated.



You can do this:

$ esearch -db nucleotide -query "JG64_RS07240" | efetch -format gene_fasta | awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' | grep -F "[locus_tag=JG64_RS07240]" | tr "\t" "\n"
>lcl|NZ_JQQM01000039.1_gene_17 [locus_tag=JG64_RS07240] [location=19567..20385] [gbkey=Gene]
ATGAAAAAACTTTCGATTTTGGCTATCTCCGTTGCACTCTTTGCAAGCATTACCGCTTGTGGTGCTTTCGGTGGTCTGCCAAGCCTAAAAAGCTCTTTTGTTCTGAGCGAGGACACAATCCCAGGGACAAACGAAACCGTAAAAACGTTACTTCCCTACGGATCTGTGATCAACTATTACGGATACGTAAAGCCAGGACAAGCGCCGGACGGTTTAGTCGATGGAAACAAAAAAGCATACTATCTCTATGTTTGGATTCCTGCCGTAATCGCTGAAATGGGAGTTCGTATGATTTCCCCAACAGGCGAAATCGGTGAGCCAGGCGACGGAGACTTAGTAAGCGACGCTTTCAAAGCGGCTACCCCAGAAGAAAAATCAATGCCACATTGGTTTGATACTTGGATCCGTGTAGAAAGAATGTCGGCGATTATGCCTGACCAAATCGCCAAAGCTGCGAAAGCAAAACCAGTTCAAAAATTGGACGATGATGATGATGGTGACGATACTTATAAAGAAGAGAGACACAACAAGTACAACTCTCTTACTAGAATCAAGATCCCTAATCCTCCAAAATCTTTTGACGATCTGAAAAACATCGACACTAAAAAACTTTTAGTAAGAGGTCTTTACAGAATTTCTTTCACTACCTATAAACCAGGTGAAGTGAAAGGATCTTTCGTTGCATCTGTTGGTCTGCTTTTCCCACCAGGTATTCCAGGTGTGAGCCCGCTGATCCACTCAAATCCTGAAGAATTGCAAAAACAAGCTATCGCTGCTGAAGAGTCTTTGAAAAAAGCTGCTTCTGACGCGACTAAGTAA
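As an alternative to linearizing the FASTA with awk and splitting it back with tr, a single awk pass can print the matching record directly. This is a minimal sketch, assuming the gene_fasta headers carry a `[locus_tag=...]` field as in the output above; `extract_locus_tag` is a hypothetical helper name, not part of EDirect. Matching the whole `[locus_tag=TAG]` string avoids a partial hit such as `JG64_RS07240` matching `JG64_RS072400`.

```shell
# Hypothetical helper (not part of EDirect): print the complete FASTA record
# (header plus all sequence lines) whose header contains "[locus_tag=TAG]".
# Assumes a gene_fasta file as produced by "efetch -format gene_fasta".
extract_locus_tag() {
  local tag="$1" file="$2"
  awk -v want="[locus_tag=${tag}]" '
    /^>/ { keep = (index($0, want) > 0) }  # literal match on each header line
    keep                                    # print header and sequence lines of the kept record
  ' "$file"
}
```

Usage, assuming the genes were first saved to a file: `extract_locus_tag JG64_RS07240 all_genes.fa > gen.fasta`. Line wrapping of the sequence is left as efetch produced it.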

If you have a list of locus tags (one per line in ids.txt), use a for loop.

First, fetch all the gene sequences from the record that contains your locus tags:

$ esearch -db nucleotide -query "JG64_RS07240" | efetch -format gene_fasta > all_genes.fa
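The command above only returns genes from records that match `JG64_RS07240`. If the 400 locus tags are spread over several contigs, one option is to search for all of them at once with an Entrez `OR` query. This is a sketch under assumptions: `build_or_query` is a hypothetical helper, and whether NCBI accepts a single query with hundreds of terms has not been checked here (very long lists may need to be split into batches).

```shell
# Hypothetical helper: join locus tags from a file (one per line) into one
# Entrez query "TAG1 OR TAG2 OR ...". Windows line endings are stripped and
# blank lines are skipped.
build_or_query() {
  local file="$1"
  tr -d '\r' < "$file" |
    awk 'NF { printf "%s%s", (n++ ? " OR " : ""), $1 } END { print "" }'
}
```

Usage: `esearch -db nucleotide -query "$(build_or_query ids.txt)" | efetch -format gene_fasta > all_genes.fa`. Note that this fetches the genes of every matching record, which can be large for whole contigs.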

$ for i in $(cat ids.txt); do awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < all_genes.fa | grep -F "[locus_tag=${i}]" | tr "\t" "\n" >> needed.fa; done

needed.fa will then contain the sequences you want.
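The loop above re-reads all_genes.fa once per locus tag. With 400 tags, a single awk pass that loads the ID list into a lookup table may be simpler. This is a minimal sketch, assuming `[locus_tag=...]` headers as shown earlier; `filter_by_locus_tags` is a hypothetical helper name. Records come out in the order of all_genes.fa, not of ids.txt, and the ID file must not be empty (the `NR == FNR` idiom relies on it).

```shell
# Hypothetical helper: print every record of a gene_fasta file whose
# [locus_tag=...] value appears in an ID file (one tag per line).
# Reads each file once. The ID file must be non-empty.
filter_by_locus_tags() {
  local ids="$1" fasta="$2"
  awk '
    NR == FNR { sub(/\r$/, ""); if (NF) want[$1] = 1; next }  # first file: load IDs
    /^>/ {
      keep = 0
      i = index($0, "[locus_tag=")
      if (i) {
        rest = substr($0, i + 11)                # text after "[locus_tag="
        tag = substr(rest, 1, index(rest, "]") - 1)
        keep = (tag in want)                     # exact tag comparison
      }
    }
    keep                                         # print lines of kept records
  ' "$ids" "$fasta"
}
```

Usage: `filter_by_locus_tags ids.txt all_genes.fa > needed.fa`.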

