blastx versus tblastx

Hello everyone,

I have a question about BLAST.
I admit that I do not understand all of it.

I have been asked to run blastx with an .fsa file of Arabidopsis thaliana sequences against an oak gene model, in order to see whether any sequences match between the two species.

My data is formatted like this:

Reference.fsa

>Qrob_P00010.2 69
ATGTCTGGCCCTGAAAA........

Arabidopsis FASTA file:

>AT3G25210.1 | Symbols:  | Tetratricopeptide repeat (TPR)-like superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140
ATGTCGGCGACACTCCGACGCCTCATTCTTCTCACC..............

To run blastx, I first built a protein database from my reference and then ran the search, using these commands:

"makeblastdb -in ref.fsa -dbtype prot -blastdb_version 5 -parse_seqids"
"blastx -query fasta_arabido.fsa -db ref.fsa -out ara.txt"

However, when I run blastx, I get no hits.

    Query= AT3G25210.1 | Symbols:  | Tetratricopeptide repeat (TPR)-like
superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140
Length=1140


***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96 
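A quick sanity check on the reference file may be useful at this point. The sketch below uses a hypothetical helper, `fasta_acgt_fraction` (not part of BLAST+), to print the fraction of sequence letters that are A, C, G, T or N. A value near 1.00 suggests nucleotide data. Since those letters are also valid amino acid codes, a database built from such a file with `-dbtype prot` would most likely treat the bases as protein residues, which could explain an empty blastx result. It assumes a plain-text FASTA file and the standard `grep`, `tr`, `wc` and `awk` tools.

```shell
# Hypothetical helper (not part of BLAST+): print the fraction of sequence
# letters in a FASTA file that are A, C, G, T or N (case-insensitive).
fasta_acgt_fraction() {
    # Count all sequence characters, ignoring header lines and whitespace.
    total=$(grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c)
    # Count only the characters that look like nucleotides.
    acgt=$(grep -v '^>' "$1" | tr -cd 'ACGTNacgtn' | wc -c)
    awk -v a="$acgt" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.2f\n", a / t; else print "0.00" }'
}

# Example call on the reference used above:
# fasta_acgt_fraction ref.fsa
```

This is only a heuristic: short or low-complexity protein sequences can also score high, so the result is a hint rather than a proof.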

I saw that I could try tblastx instead, this time building my reference as a nucleotide database, and I get the results below, which seem correct.

"makeblastdb -in ref.fsa -dbtype nucl -blastdb_version 5 -parse_seqids"
"tblastx -query fasta_arabido.fsa -db ref.fsa -out ara.txt"

ara.txt

    Query= AT3G25210.1 | Symbols:  | Tetratricopeptide repeat (TPR)-like
superfamily protein | chr3:9180348-9181487 FORWARD LENGTH=1140

Length=1140
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value   N

Qrob_P0702440.2 1323                                                  514     9e-146  1
>Qrob_P0702440.2 1323
Length=1323

 Score = 514 bits (1116),  Expect = 9e-146
 Identities = 204/312 (65%), Positives = 259/312 (83%), Gaps = 0/312 (0%)
 Frame = +1/+1

Query  157   RTRTPLETQFETWIQNLKPGFTNSDVVIALRAQSDPDLALDIFRWTAQQRGYKHNHEAYH  336
             R++T LETQFETW+QNLKPGFT SDV   L +QSDPDLALD+FRWT  QRGY H H  Y 
Sbjct  190   RSKTQLETQFETWVQNLKPGFTPSDVEHTLWSQSDPDLALDLFRWTTLQRGYTHTHATYF  369

Query  337   TMIKQAITGKRNNFVETLIEEVIAGACEMSVPLYNCIIRFCCGRKFLFNRAFDVYNKMLR  516
             T+IK  ++ KR    ETLIEEV++GAC++++PLYN II+FCC ++ LFNRAFDVY KM  
Sbjct  370   TIIKILVSNKRYGLAETLIEEVLSGACDINIPLYNYIIKFCCDKRSLFNRAFDVYKKMYN  549

Query  517   SDDSKPDLETYTlllssllKRFNKLNVCYVYLHAVRSLTKQMKSNGVIPDTFVLNMIIKA  696
             S++ KP+L+TY++L + LL+RFNKLNVCY+YL + +SL+KQMK+ GVIPDT+VLNMIIKA
Sbjct  550   SENCKPNLQTYSMLFNLLLRRFNKLNVCYMYLQSAKSLSKQMKAAGVIPDTYVLNMIIKA  729

Query  697   YAKCLEVDEAIRVFKEMALYGSEPNAYTYSYLVKGVCEKGRVGQGLGFYKEMQVKGMVPN  876
             Y+KCLEVDEAIRVF+EM LYG EPNAYTY Y+VKG+CEKGRVGQG GFY+EM+ KG+VP+
Sbjct  730   YSKCLEVDEAIRVFREMGLYGCEPNAYTYGYMVKGLCEKGRVGQGFGFYEEMKGKGLVPS  909

Query  877   GSCYMVLICSLSMERRLDEAVEVVYDMLANSLSPDMLTYNTVLTELCRGGRGSEALEMVE  1056
              S YM+LICSL++ERR ++A+ VV+DML N + PD+LTY T+L  LCR GRG+EA E+++
Sbjct  910   SSSYMILICSLALERRFEDAIGVVFDMLGNFMGPDLLTYKTLLEGLCREGRGNEAFELLD  1089

Query  1057  EWKKRDPVMGER  1092
             E +KRD  M E+
Sbjct  1090  ELRKRDRSMSEK  1125

I don’t understand the real difference between blastx and tblastx. One searches a protein database and the other a nucleotide database, but is it because my reference file contains nucleotide sequences that I could not make a protein database from it?
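For reference, the BLAST+ programs pair query and database types as follows: blastx translates a nucleotide query in six frames and searches a protein database, whereas tblastx translates both a nucleotide query and a nucleotide database. The sketch below encodes that pairing in a hypothetical helper, `blast_dbtype_for` (an illustration only, not a BLAST+ command), which prints the `-dbtype` value each program expects from makeblastdb.

```shell
# Hypothetical helper: map a BLAST+ program name to the makeblastdb -dbtype
# that its database must have been built with.
blast_dbtype_for() {
    case "$1" in
        # Programs that search a nucleotide database
        # (tblastn and tblastx translate it on the fly).
        blastn|tblastn|tblastx) echo nucl ;;
        # Programs that search a protein database
        # (blastx translates the nucleotide query).
        blastp|blastx)          echo prot ;;
        *) echo "unknown program: $1" >&2; return 1 ;;
    esac
}

# Example: blast_dbtype_for tblastx
```

Under this pairing, a nucleotide reference such as the oak file above would fit tblastx (or blastn) with a `nucl` database, while blastx would need the reference as actual protein sequences.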

In your opinion, did I do the right thing?

Thank you in advance for your answer.

Have a nice day

Aka
