Please recommend a tutorial/course on genetic data analysis
Hello everyone! I’m by no means a bioinformatician, but I would like to learn the craft (my background is in chemistry, computer science and machine learning; I do ML-supported drug design).
I would like to analyse human genetic data. Specifically, the task is as follows: given a pair of FASTQ files (produced by Illumina), I would like to get a list of the mutations in each gene present in the data, in the form GENE <position> A>C or as an rs number. The data contains cDNA reads.
So far I have managed to run this pipeline to completion: gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
However, I cannot make sense of the results. The pipeline produced some VCF files, but the SNPs do not seem to be annotated with genes, or at least I cannot read them correctly :(.
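For reference, a VCF is plain tab-separated text whose first five columns are CHROM, POS, ID, REF and ALT. Below is a minimal sketch, assuming an uncompressed VCF, that turns each data line into the "<position> A>C" form from the question. The ID column holds an rs number only if a step in the pipeline filled it in (e.g. from dbSNP); otherwise it is ".". The names `Variant`, `parse_vcf_line` and `describe` are hypothetical helpers, not part of any tool in the pipeline.

```python
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    vid: Optional[str]  # e.g. an rs number; None when the ID column is '.'
    ref: str
    alts: List[str]


def parse_vcf_line(line: str) -> Optional[Variant]:
    """Parse one tab-separated VCF data line; return None for header lines."""
    if line.startswith("#"):
        return None
    chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    # ALT may list several comma-separated alleles
    return Variant(chrom, int(pos), None if vid == "." else vid, ref, alt.split(","))


def describe(v: Variant) -> List[str]:
    """Use the ID if present, else one 'CHROM <pos> REF>ALT' string per ALT allele."""
    if v.vid is not None:
        return [v.vid]
    return [f"{v.chrom} {v.pos} {v.ref}>{a}" for a in v.alts]
```

Note that CHROM is simply the name of the reference sequence the read aligned to, so this gives positions only; gene names have to come from an annotation step.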
I used this reference genome:
ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
Could you please recommend a simple pipeline, tutorial or course for learning basic SNP analysis? Or, alternatively, explain how to use the tools mentioned above?
The sandbox.bio SNP alignment tutorial is very nice: sandbox.bio/tutorials?id=dna-secrets
It uses the lambda phage reference genome, which is small enough to run on your computer. Learning with the much larger human genome takes longer but doesn’t teach you that much more.
Looking at your tutorial: raw_snps.vcf will not contain any gene names, only the positions of the SNPs in the reference genome. filtered_snps_final.ann.vcf should contain gene names, but I expect the majority of variants to be upstream or downstream of genes, just like in ‘real life’.
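To pull gene names out of the annotated file, here is a minimal sketch, assuming filtered_snps_final.ann.vcf was annotated by SnpEff. SnpEff writes an `ANN=` entry in the INFO column (8th column) containing comma-separated annotations whose pipe-separated fields begin Allele | Annotation | Annotation_Impact | Gene_Name. The helper names `parse_ann` and `summarise` are hypothetical. For simplicity the sketch keeps only the first annotation per record (SnpEff sorts annotations by putative impact, highest first) and also tallies the effect types, which is a quick way to see how many variants are upstream/downstream rather than inside genes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_ann(info: str) -> List[Tuple[str, str, str]]:
    """Return (allele, effect, gene) tuples from the ANN entry of a VCF INFO column."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            out = []
            for ann in item[len("ANN="):].split(","):
                sub = ann.split("|")
                # SnpEff ANN fields: Allele | Annotation | Annotation_Impact | Gene_Name | ...
                if len(sub) > 3:
                    out.append((sub[0], sub[1], sub[3]))
            return out
    return []  # record has no SnpEff annotation


def summarise(lines: Iterable[str]) -> Tuple[List[str], Counter]:
    """Build 'GENE <pos> REF>ALT' strings and count effect types (first annotation only)."""
    rows: List[str] = []
    effects: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        f = line.rstrip("\n").split("\t")
        pos, ref, info = int(f[1]), f[3], f[7]
        anns = parse_ann(info)
        if anns:
            allele, effect, gene = anns[0]
            rows.append(f"{gene} {pos} {ref}>{allele}")
            effects[effect] += 1
    return rows, effects
```

Usage: open the file and pass the handle to `summarise`, then print `effects.most_common(10)` to see which annotation types dominate. Values such as `upstream_gene_variant` or `downstream_gene_variant` mean the variant lies near, not in, the named gene.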