I am currently trying to establish a metabolic pathway analysis pipeline for my lab, using complete genomes. The final goal is to be able to see the primary metabolic pathways from a complete/draft genome.
As a test, I am using the complete genome sequence of E. coli K-12 MG1655, which I downloaded as a FASTA file from NCBI. I then annotated the genome using the RAST Server and downloaded the annotated genes in amino acid multi-FASTA format, as requested by the KAAS page.
I then uploaded the annotated genes to KAAS, using BLAST with the Bi-Directional Best Hit method, and selected only eco (Escherichia coli K-12 MG1655) for the KEGG GENES data set.
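For reference, here is a minimal sketch of how the KO assignments returned by KAAS could be tallied. It assumes the KO list is a tab-separated text file with one query gene per line, where the gene ID is followed by a K number when one was assigned and stands alone otherwise. The function name `parse_ko_assignments` and the gene IDs and K numbers in the example are placeholders for illustration, not real results.

```python
import io


def parse_ko_assignments(handle):
    """Split a KO list into assigned and unassigned genes.

    Assumed format: one gene per line, "gene_id<TAB>K_number",
    or just "gene_id" when no KO was assigned.
    """
    assigned = {}    # gene ID -> K number
    unassigned = []  # gene IDs without a K number
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene_id = fields[0].strip()
        ko = fields[1].strip() if len(fields) > 1 else ""
        if ko:
            assigned[gene_id] = ko
        else:
            unassigned.append(gene_id)
    return assigned, unassigned


# Placeholder data standing in for a real KO list file.
example = io.StringIO("gene_1\tK00001\ngene_2\ngene_3\tK00002\n")
assigned, unassigned = parse_ko_assignments(example)
print(f"{len(assigned)} genes with a KO, {len(unassigned)} without")
```

With a real file, the `io.StringIO` object would be replaced by `open(path)` on the downloaded KO list. Comparing the two counts gives a rough idea of how much of the RAST annotation actually received a KO assignment.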
But when I look at the results, I see that not all BRITE hierarchies are listed, which is to be expected since the genes were annotated through RAST. However, I also noticed in the pathway maps that, while all the expected E. coli genes were there, the genes associated with the pathways include genes from mammals and archaea, not just from E. coli. I thought that limiting the data set to eco would prevent this, but that does not seem to be the case.
I am very new to bioinformatics and I am learning as I go, so I am sorry if there is an obvious answer to this question.
Thank you in advance.