I have some questions about the design formula in DESeq2.
I am still quite a beginner in R and have a large dataset to analyze, so I have been trying different formulas to find the one that best describes my design, but I have trouble understanding which one is best.
My experimental setup is a time course with 4 biological replicates (patients). The baseline is timepoint v0, followed by 4 additional timepoints v1, v2, v3 and v4. We expect something to happen at timepoints v2 and v3, and almost everything probably goes back to baseline at timepoint v4.
I have tried the design formula ~ time + patient
together with the reduced formula ~ patient to see the overall effect of time. Would that be the correct way to go?
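To make the comparison concrete, here is a minimal sketch of a likelihood ratio test with this full and reduced formula. It assumes DESeq2 is installed and uses simulated counts from `makeExampleDESeqDataSet` as a stand-in for the real data; the sample layout (4 patients, 5 timepoints, 500 genes) and the object names are illustrative assumptions, not my actual dataset.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data (hypothetical): 500 genes, 20 samples = 4 patients x 5 timepoints
dds <- makeExampleDESeqDataSet(n = 500, m = 20)
dds$time <- factor(rep(paste0("v", 0:4), times = 4), levels = paste0("v", 0:4))
dds$patient <- factor(rep(paste0("pt", 1:4), each = 5))

# Full model: time effect plus a per-patient offset
design(dds) <- ~ time + patient

# Likelihood ratio test: full model against the reduced model without time
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ patient)
res_lrt <- results(dds_lrt)

# Number of genes with evidence of a change at any timepoint (adjusted p < 0.05)
sum(res_lrt$padj < 0.05, na.rm = TRUE)
```

With `test = "LRT"`, the p-values in `res_lrt` refer to the whole time term being dropped, while the log2 fold change column still shows a single coefficient, so it should not be read as "the" time effect.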
Here I was quite confused, because when I change the colData description of the patients from the numbers 1, 2, 3, 4 to pt1, pt2, pt3, pt4, I get different results.
With numerical patients 1, 2, 3, 4, resultsNames(dds) gives:
 "Intercept" "time_v1_vs_v0" "time_v2_vs_v0" "time_v3_vs_v0" "time_v4_vs_v0" "patient"
With non-numerical patients (pt1, ...), there are additional comparisons such as pt1_vs_pt2 and so on, and the results also differ (a different number of significant genes).
Why is that, and what would be the correct way to do it? I would assume that the character version is right, since I see that being used everywhere.
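The difference can be reproduced in base R without DESeq2 by looking at the model matrix each coding produces. This is a small sketch on a hypothetical sample sheet (4 patients by 5 timepoints); the column names come from `model.matrix` and are formatted differently from what resultsNames(dds) prints, but the number of patient columns is the point.

```r
# Hypothetical sample sheet: 4 patients x 5 timepoints
coldata <- expand.grid(time = paste0("v", 0:4), patient = 1:4)
coldata$time <- factor(coldata$time, levels = paste0("v", 0:4))

# Patient as a number: treated as a continuous covariate, one slope column
mm_num <- model.matrix(~ time + patient, data = coldata)
colnames(mm_num)

# Patient as a factor: one indicator column per non-reference patient
coldata$patient_f <- factor(paste0("pt", coldata$patient))
mm_fac <- model.matrix(~ time + patient_f, data = coldata)
colnames(mm_fac)
```

The numeric coding yields a single `patient` column, which models expression as changing linearly with the patient number. The factor coding yields three patient columns, one per patient relative to the reference level, which matches the idea of patient as a grouping variable.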
Additionally, to test for timepoint-specific differences, would it be correct to use the full formula ~ time + patient, contrast timepoint v1 against timepoint v0, and repeat this for each timepoint, as long as I correct for the additional multiple testing afterward?
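As a sketch of what I mean by per-timepoint contrasts, the following fits the full model once with the default Wald test and then extracts each timepoint against v0. It again assumes DESeq2 is installed and uses simulated counts from `makeExampleDESeqDataSet` as a placeholder; the sample layout and object names are hypothetical.

```r
library(DESeq2)

set.seed(2)
# Simulated stand-in data (hypothetical): 500 genes, 4 patients x 5 timepoints
dds <- makeExampleDESeqDataSet(n = 500, m = 20)
dds$time <- factor(rep(paste0("v", 0:4), times = 4), levels = paste0("v", 0:4))
dds$patient <- factor(rep(paste0("pt", 1:4), each = 5))
design(dds) <- ~ time + patient

# Fit the full model once (Wald test by default)
dds <- DESeq(dds)

# One results table per timepoint, each compared against baseline v0
timepoints <- paste0("v", 1:4)
res_list <- lapply(timepoints, function(tp) {
  results(dds, contrast = c("time", tp, "v0"))
})
names(res_list) <- timepoints

# Significant genes per contrast (adjusted p < 0.05)
sapply(res_list, function(r) sum(r$padj < 0.05, na.rm = TRUE))
```

Note that `padj` in each table is adjusted across genes within that one contrast only, not across the four timepoint comparisons.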
I also have 10 different subsets, and I have run this whole analysis on each of them. Can I do a combined comparison with something like
~ subset + time + patient
and then again the reduced formula
and see the overall effects? Or how would I see the overall effects across subsets over time?
Thank you so much for any answers,