Need help understanding a GWAS in an article

Hello r/bioinformatics.

I’m a student studying plant selection (with an interest in bioinformatics), and for a piece of coursework (yes, I know using Reddit for homework is frowned upon) I need to present an example of GWAS applied to biotic stress. I found this article, which uses GWAS (among other things) to locate genes related to resistance to a virus in maize.

However, I’m caught in a bit of a sunk cost situation: I have already invested a lot of time describing the article, and only now do I realize that I don’t really understand the authors’ figures.

So I thought I’d ask here, since people here are likely to be experts in what I’m struggling with, but don’t hesitate to redirect me to a more relevant subreddit if there is one.

I’m mostly talking about figures 4 and 5:

In figure 4, I don’t understand what the authors mean by the “traits” labelled GLM14HZ, GLM15HB, etc. Also, perhaps this is just me not remembering my classes well, but how can a GWAS “account for” minor allele frequency, as they claim to do?
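To make the MAF part of my question concrete: my best guess is that “accounting for” it means filtering out SNPs whose rarer allele is too infrequent before running the association test, but I’d like confirmation. Below is a minimal sketch of that guess. The 0/1/2 genotype coding, the SNP names, and the 0.05 cutoff are my own assumptions for illustration, not taken from the article.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP.

    `genotypes` holds, per diploid individual, the count (0, 1 or 2) of the
    alternate allele; None marks a missing call. This coding is an assumption.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotype calls for this SNP")
    # Each diploid individual carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(genotype_table, min_maf=0.05):
    """Keep only SNPs whose MAF is at least `min_maf` (hypothetical cutoff)."""
    return {
        snp: genotypes
        for snp, genotypes in genotype_table.items()
        if minor_allele_frequency(genotypes) >= min_maf
    }


# Made-up genotypes, purely for illustration.
genotype_table = {
    "snp_common": [0, 1, 2, 1],      # MAF = 4 / 8 = 0.5
    "snp_rare": [0] * 19 + [1],      # MAF = 1 / 40 = 0.025
}
kept = filter_by_maf(genotype_table)
print(sorted(kept))
```

If that is indeed what the authors did, then only the SNPs that survive this kind of filter would ever appear in their plots.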

Figure 5, while very pretty to look at, puzzles me. The authors used 4 different GWAS models in this study (I didn’t understand the fine differences between them; it’s not my priority, but I’d love an ELI5). Throughout the article they say things like “We used an intersection of significant markers detected by all methods and found that 22 common markers were detected by the four models”. However, in figure 5, if I understood it properly, only the GLM model shows any SNPs with a -log(p-value) above the threshold. So how do you explain this discrepancy between what is stated in the text of the article (“multiple SNPs were found by every model and we took those that were in common”) and what is shown in figure 5 (“only the GLM has significant SNPs”)?
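Here is how I read “intersection of significant markers”, as a small sketch. Everything in it is hypothetical: the p-values and SNP names are made up, `model_B` to `model_D` are placeholders for the three models other than GLM, and the threshold of 5 on the -log10 scale is arbitrary. It is only meant to show my reasoning, not the authors’ actual pipeline.

```python
import math


def significant_markers(pvalues, threshold):
    """Return the markers whose -log10(p) is above `threshold`."""
    return {
        marker
        for marker, p in pvalues.items()
        if -math.log10(p) > threshold
    }


def common_markers(pvalues_by_model, threshold):
    """Return the markers that are significant in every model."""
    per_model = [
        significant_markers(pvalues, threshold)
        for pvalues in pvalues_by_model.values()
    ]
    return set.intersection(*per_model)


# Made-up p-values; model names other than GLM are placeholders.
pvalues_by_model = {
    "GLM": {"snp_a": 1e-9, "snp_b": 1e-7, "snp_c": 2e-2},
    "model_B": {"snp_a": 1e-8, "snp_b": 1e-3, "snp_c": 5e-1},
    "model_C": {"snp_a": 1e-7, "snp_b": 1e-6, "snp_c": 1e-1},
    "model_D": {"snp_a": 1e-6, "snp_b": 1e-2, "snp_c": 3e-1},
}
threshold = 5.0  # arbitrary cutoff on the -log10(p) scale

print(common_markers(pvalues_by_model, threshold))
```

Under this reading, a marker can only end up in the intersection if it is above the threshold in every single model, which is exactly why a figure where only GLM crosses the line confuses me.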

Thanks in advance for the help, and sorry if it’s the wrong sub.
