Why can I not reproduce some of TCGA’s MAF file contexts from the coding sequences of the mutated genes?


I am working with the mc3.v0.2.8.PUBLIC.maf.gz MAF file (downloaded from here) and I need to analyze the coding sequences (CDS) of the mutated genes. I only do this for SNP mutations that are silent, missense or nonsense. I also explicitly consider only mutations which have a valid context value (i.e. a string of 11 nucleotides, since it is the mutated base pair +/- 5 nucleotides). The strategy I tried was:

  1. Use Ensembl’s REST API (for GRCh37) to get the CDS sequences of all
    unique features in the MAF file. All features are transcripts, so the
    query [object_type] that I give to the API is always the Ensembl
    transcript ID which comes with the MAF file.
  2. For each mutation in the MAF, take the CDS position of the mutation
    (a column in the MAF) and extract this position +/- 5 nucleotides
    from the retrieved CDS sequence of the corresponding transcript.
    These are the “fetched contexts”.
  3. Finally, check whether the fetched contexts are equal to the contexts
    in the MAF, to verify that the strategy is correct.
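The three steps can be sketched roughly as follows. This is a minimal illustration, not the exact script I ran: the function names `fetch_cds`, `extract_context` and `contexts_match` are hypothetical, the CDS position is assumed to be already parsed into a 1-based integer, and `fetch_cds` assumes the GRCh37 Ensembl REST server and its `sequence/id` endpoint with `type=cds`.

```python
import urllib.request

# GRCh37 instance of the Ensembl REST API (assumed to be the server queried).
ENSEMBL_GRCH37 = "https://grch37.rest.ensembl.org"


def fetch_cds(transcript_id, timeout=30):
    """Step 1: fetch the CDS of one transcript as plain text (needs network)."""
    url = f"{ENSEMBL_GRCH37}/sequence/id/{transcript_id}?type=cds"
    req = urllib.request.Request(url, headers={"Content-Type": "text/plain"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def extract_context(cds, cds_pos, flank=5):
    """Step 2: return the base at 1-based cds_pos plus `flank` bases each side.

    Returns None when the window would run past either end of the CDS.
    """
    idx = cds_pos - 1              # convert to 0-based index
    start = idx - flank
    end = idx + flank + 1          # slice end is exclusive
    if start < 0 or end > len(cds):
        return None
    return cds[start:end]


def contexts_match(cds, cds_pos, maf_context, flank=5):
    """Step 3: compare the fetched context with the MAF context (case-insensitive)."""
    fetched = extract_context(cds, cds_pos, flank)
    return fetched is not None and fetched.upper() == maf_context.upper()
```

For the example discussed below, the call would be `extract_context(fetch_cds("ENST00000227163"), 379)`.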

Out of the 2,861,189 mutations that I am considering, only 1,369,738 have contexts that exactly match the fetched contexts. Since the MAF is based on NCBI’s build of GRCh37, I thought the differences might be due to this, so I took an example mutation which was mismatching (TCGA-02-0003-01A-01D-1490-08, ENST00000227163, CDS_Position: 379) and searched the CDS directly on NCBI. To my surprise, the CDS from NCBI matched perfectly with the context of the Ensembl CDS (TGTCCCCAGCC), while the MAF’s context (GGCTGGGGACA) appears to match neither Ensembl nor NCBI.

In fact, while I was writing this post, I also noticed that the codon column for the example mutation does not match the context in the MAF! It does match my fetched CDS from Ensembl or NCBI, though. What is going on here?
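One check that may be worth running on the mismatches is a comparison against the reverse complement of the fetched context. For the example mutation the two strings are exact reverse complements of each other, which would be consistent with the MAF context being reported on the genomic forward strand while the CDS follows the strand of the transcript. That interpretation is an assumption on my part, not something confirmed from the MC3 documentation, and the helper names below are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_context(fetched, maf_context):
    """Label how a fetched CDS context relates to the MAF context.

    Returns "same_strand", "reverse_complement" or "mismatch".
    """
    fetched = fetched.upper()
    maf_context = maf_context.upper()
    if fetched == maf_context:
        return "same_strand"
    if reverse_complement(fetched) == maf_context:
        return "reverse_complement"
    return "mismatch"


# Example mutation from the post (ENST00000227163, CDS position 379).
print(classify_context("TGTCCCCAGCC", "GGCTGGGGACA"))  # reverse_complement
```

Counting how many of the non-matching mutations fall into the `reverse_complement` class, and whether those transcripts are all annotated on the minus strand, would show whether strand orientation accounts for the discrepancy.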

