Please recommend a tutorial/course on genetic data analysis
Hello everyone! I’m by no means a bioinformatician, but I would like to learn some of the craft (my background is chemistry/computer science/machine learning; I do ML-supported drug design).
I would like to analyse human genetic data. Specifically, the task is as follows: given a pair of FASTQ files (produced by Illumina), I would like to get a list of the mutations in each gene present in the data, either in the form GENE <position> A>C or as an rs number. The data contains cDNA reads.
The sandbox.bio SNP alignment tutorial is very nice: sandbox.bio/tutorials?id=dna-secrets
It uses the lambda phage reference genome, which is small enough to run on your computer. Learning with the much larger human genome takes longer but doesn’t teach you that much more.
Looking at your tutorial: raw_snps.vcf will not contain any gene names, only the positions of SNPs in the reference genome. filtered_snps_final.ann.vcf should contain gene names, but I expect the majority of variants to be annotated as up- or downstream of genes, just like in ‘real life’.
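To get from the annotated VCF to the "GENE <position> A>C" list you asked for, something like the following might be a starting point. It is only a minimal sketch and makes assumptions: that the .ann.vcf was annotated by SnpEff (or another tool writing the same pipe-separated ANN field in INFO, with the allele first, the effect second and the gene name fourth), and that the ID column already holds rs numbers, which is only the case if the VCF was annotated against dbSNP. The function name gene_variants and the SKIP_EFFECTS set are my own, not part of any library.

```python
# Effects dropped by default: variants that lie outside genes.
SKIP_EFFECTS = frozenset({
    "upstream_gene_variant",
    "downstream_gene_variant",
    "intergenic_region",
})


def gene_variants(vcf_lines, skip_effects=SKIP_EFFECTS):
    """Yield 'GENE CHROM:POS REF>ALT [ID] (effect)' strings.

    Assumes a SnpEff-style ANN entry in the INFO column:
    Allele|Annotation|Impact|Gene_Name|...
    """
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        chrom, pos, var_id, ref = cols[0], cols[1], cols[2], cols[3]
        info = cols[7]
        # Pull the ANN=... entry out of the semicolon-separated INFO column.
        ann = next((f[4:] for f in info.split(";") if f.startswith("ANN=")), "")
        seen = set()
        for entry in ann.split(","):
            parts = entry.split("|")
            if len(parts) < 4:
                continue
            allele, effect, gene = parts[0], parts[1], parts[3]
            # Combined effects are joined with '&'; skip only if all are unwanted.
            if not gene or set(effect.split("&")) <= skip_effects:
                continue
            # Report each gene/allele pair once, even if several transcripts match.
            if (gene, allele) in seen:
                continue
            seen.add((gene, allele))
            rs = f" {var_id}" if var_id != "." else ""
            yield f"{gene} {chrom}:{pos} {ref}>{allele}{rs} ({effect})"
```

You would call it with an open file, e.g. `with open("filtered_snps_final.ann.vcf") as fh:` and then `for row in gene_variants(fh): print(row)`. Pass `skip_effects=frozenset()` if you want to see the up- and downstream hits as well. Note that the positions are genomic coordinates in the reference, not positions within the gene.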