TSS of protein coding genes

Hello,

Geneid Genes (geneid.txt.gz) is an older transcript predictor algorithm that is based on the genome sequence alone and only relevant when you are working on a particular locus where you think that the manually curated gene models (Ensembl and RefSeq) have errors.

UCSC RefSeq (refGene.txt.gz) is a gene predictor developed at UCSC. This gene predictor uses protein, EST and cDNA annotations to derive a relatively restricted gene transcript set.

See our FAQ page for more information: genome.ucsc.edu/FAQ/FAQgenes.html#genename

You can use the Table Browser to extract the transcription start sites (TSS) of protein-coding genes. For example, to query the UCSC RefSeq (refGene) table on hg38, navigate to the Table Browser (genome.ucsc.edu/cgi-bin/hgTables) and make the following selections:

  1. Under Select dataset:

    clade: Mammal

    genome: Human

    assembly: Dec. 2013 (GRCh38/hg38)

    group: Genes and Gene Predictions

    track: NCBI RefSeq

    table: UCSC RefSeq (refGene)

  2. Set the region: to “genome”

  3. Click create next to “filter:”

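  4. On the “Filter on Fields from hg38.refGene” page, find the row that reads “cdsEnd is”, change ignored to “!=”, enter “cdsStart” in the text box next to it, then click submit. This keeps only protein-coding transcripts, because non-coding entries have cdsStart equal to cdsEnd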

  5. Set the output format to “Selected fields from primary and related tables”. This will allow you to select fields of interest. Click get output

  6. On the following page, scroll down to the Linked Tables section and select “hgFixed refLink” then click allow selection from checked tables

  7. You can then select the following fields:

    name Name of gene

    chrom Reference sequence chromosome or scaffold

    strand + or – for strand

    txStart Transcription start position

    protAcc protein accession

  8. Click get output

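This should display all the protein-coding transcripts with their transcription start positions and protein accession numbers. Note that txStart is the leftmost coordinate of the transcript, so it is the TSS only for genes on the + strand. For genes on the – strand the TSS is at txEnd, so you may also want to select txEnd in step 7.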
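
If you would rather work from a downloaded refGene.txt.gz than from the Table Browser, the same filter can be sketched in Python. This is only a minimal illustration, not an official UCSC tool: it assumes the standard refGene column order (a genePred table with a leading bin column), the function names are made up for this example, and the three sample records are invented rather than real annotations. It reports the raw txStart or txEnd value without converting the 0-based start coordinates used in the file, and it does not add the protein accession, which lives in the separate refLink table.

```python
import gzip

# Assumed column order of the refGene table (genePred with a leading bin column).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def coding_tss(lines):
    """Yield (name, chrom, strand, tss) for protein-coding transcripts."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        row = dict(zip(REFGENE_COLUMNS, line.split("\t")))
        # Same test as the Table Browser filter (cdsEnd != cdsStart):
        # non-coding transcripts have cdsStart equal to cdsEnd.
        if row["cdsStart"] == row["cdsEnd"]:
            continue
        # txStart is the leftmost coordinate, so on the minus strand
        # the transcription start site is at txEnd instead.
        if row["strand"] == "+":
            tss = int(row["txStart"])
        else:
            tss = int(row["txEnd"])
        yield row["name"], row["chrom"], row["strand"], tss


def coding_tss_from_file(path):
    """Read a local gzip-compressed refGene dump; the path is up to the caller."""
    with gzip.open(path, "rt") as handle:
        yield from coding_tss(handle)


# Invented records for illustration only: two coding, one non-coding.
EXAMPLE_LINES = [
    "0\tNM_EXAMPLE1\tchr1\t+\t1000\t5000\t1200\t4800\t1\t1000,\t5000,\t0\tGENEA\tcmpl\tcmpl\t0,",
    "0\tNM_EXAMPLE2\tchr2\t-\t2000\t9000\t2500\t8500\t1\t2000,\t9000,\t0\tGENEB\tcmpl\tcmpl\t0,",
    "0\tNR_EXAMPLE3\tchr3\t+\t300\t900\t900\t900\t1\t300,\t900,\t0\tGENEC\tunk\tunk\t-1,",
]

if __name__ == "__main__":
    for record in coding_tss(EXAMPLE_LINES):
        print("\t".join(str(field) for field in record))
```

Run on the invented records, this prints the two coding transcripts (position 1000 for the + strand entry, 9000 for the – strand entry) and skips the non-coding one. For real data you would call coding_tss_from_file with the path to your downloaded file.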

If you have any follow-up questions, our public help desk can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.

