First time posting so apologies in advance if I missed any guidelines after reading through the posting guide.
I have a question about model design for DESeq2 with nested “replicates”. I’ve read through this post and this relevant section in the DESeq2 vignette, as well as many others. While these two almost answer my question, I feel like my predicament is still a bit different.
I have a study designed as follows:
I am interested in the Tissue Type-specific effects of the Condition while controlling for individual tissue donor effects.
For each tissue donor (Normal or Diseased, n=2 donors for each type), I have two different treatment conditions (Control or Treated). For each treatment condition, I have three replicates. The three replicates for each condition, each within a donor, were performed and sequenced independently of one another. With that in mind, I wouldn’t immediately consider these technical replicates, but I can see how they could be interpreted that way. My issue is how to account for this sort of “nested replicates” (the triplicates for the condition nested within the tissue donor replicate) in a model design for differential expression analysis using DESeq2.
My question is, what is the best way to go about model design with biological replicates nested within donor replicates while including an interaction term to see if the Condition effect is different based on Tissue Type?
I’ve had a few thoughts come to mind:
1) creating a grouping variable similar to what is described in the DESeq2 vignette here.
First I would create a “group” variable which just concatenates all the other variables:
design = ~group
This method allows me to perform contrasts for each treatment, which is useful for within-donor comparisons, but it is not exactly what I’m after.
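For reference, a minimal sketch of how the “group” variable in option 1 could be built. The sample table and all names in it (`coldata`, donors `D1` to `D4`) are made up to match the layout described above, and only base R is used so it runs without DESeq2:

```r
# Hypothetical sample table: 4 donors (2 per tissue type),
# 2 stimulations per donor, 3 replicates per stimulation = 24 samples
coldata <- expand.grid(
  Replicate    = 1:3,
  Stimulation  = c("Control", "Treated"),
  Tissue_Donor = c("D1", "D2", "D3", "D4"),
  stringsAsFactors = FALSE
)
coldata$Tissue_Type <- ifelse(coldata$Tissue_Donor %in% c("D1", "D2"),
                              "Normal", "Diseased")

# Concatenate the variables into a single grouping factor
coldata$group <- factor(paste(coldata$Tissue_Type,
                              coldata$Tissue_Donor,
                              coldata$Stimulation, sep = "_"))

# One level per donor x stimulation cell, three samples in each
n_groups  <- nlevels(coldata$group)
per_group <- table(coldata$group)

# Design matrix for design = ~group: one coefficient per group
mm_group <- model.matrix(~ group, data = coldata)

# With a DESeq2 object built on this table, a within-donor comparison
# would then be requested as (group labels are the made-up ones above):
#   results(dds, contrast = c("group", "Normal_D1_Treated", "Normal_D1_Control"))
```

Each contrast here compares two cells of three samples, which is why this only gets at within-donor comparisons.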
2) pooling all of the replicates
design = ~Tissue_Type + Stimulation + Tissue_Type:Stimulation
This method combines triplicate conditions within the duplicate tissue types. One of my qualms about this is the artificial increase in statistical power I get by essentially saying I have n=6, whereas I really only have n=2 donors and then these quasi-biological replicates within each donor. This also eliminates any ability for me to control for donor differences.
This method allows me to get around the “model not full rank” error that would occur if I used the following:
design = ~ Tissue_Donor + Tissue_Type*Stimulation
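To make the rank problem concrete, here is a small base R check comparing the pooled design with the donor design. The sample table is hypothetical (same layout as my study, made-up donor labels), and it only inspects the design matrices rather than running DESeq2:

```r
# Hypothetical sample table: 4 donors, 2 stimulations, 3 replicates each
coldata <- expand.grid(
  Replicate    = 1:3,
  Stimulation  = c("Control", "Treated"),
  Tissue_Donor = c("D1", "D2", "D3", "D4"),
  stringsAsFactors = FALSE
)
coldata$Tissue_Type <- factor(
  ifelse(coldata$Tissue_Donor %in% c("D1", "D2"), "Normal", "Diseased"),
  levels = c("Normal", "Diseased")
)
coldata$Stimulation  <- factor(coldata$Stimulation, levels = c("Control", "Treated"))
coldata$Tissue_Donor <- factor(coldata$Tissue_Donor)

# Pooled design: ignores donor entirely
mm_pooled <- model.matrix(~ Tissue_Type + Stimulation + Tissue_Type:Stimulation,
                          data = coldata)

# Donor design: each donor belongs to exactly one tissue type, so the
# Tissue_Type column is a linear combination of the donor columns
mm_donor <- model.matrix(~ Tissue_Donor + Tissue_Type * Stimulation,
                         data = coldata)

pooled_check <- c(columns = ncol(mm_pooled), rank = qr(mm_pooled)$rank)
donor_check  <- c(columns = ncol(mm_donor),  rank = qr(mm_donor)$rank)

print(pooled_check)  # columns equal rank: full rank
print(donor_check)   # one more column than the rank: not full rank
```

The donor design has one column more than its rank because Tissue_Type is completely determined by Tissue_Donor, which is the confounding that triggers the error.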
In this scenario, I would create another grouping variable (which I’m calling donor.nested) that numbers the donors within each tissue type. Each individual donor is nested within a tissue type but can have the three observations across any of the stimulation conditions.
design = ~ Tissue_Type + Tissue_Type:donor.nested + Tissue_Type:Stimulation
This method allows me to control for the individual donor effects. I can then look at tissue-specific treatment effects (e.g. Control vs Treated in Healthy tissue), or test whether the treatment effect differs across tissues (e.g. the Treated effect in Healthy vs Diseased tissue). As I mentioned, I essentially took this straight from the DESeq2 vignette, but it’s not exactly the same situation (i.e. I have multiple donors across groups AND these quasi-biological replicates).
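A sketch of how donor.nested could be constructed and what the resulting design matrix looks like, again on a hypothetical sample table with made-up donor labels. The DESeq2 calls in the comments follow the pattern from the vignette, and the coefficient names there are my assumption of what `resultsNames(dds)` would report:

```r
# Hypothetical sample table: 4 donors, 2 stimulations, 3 replicates each
coldata <- expand.grid(
  Replicate    = 1:3,
  Stimulation  = c("Control", "Treated"),
  Tissue_Donor = c("D1", "D2", "D3", "D4"),
  stringsAsFactors = FALSE
)
coldata$Tissue_Type <- factor(
  ifelse(coldata$Tissue_Donor %in% c("D1", "D2"), "Normal", "Diseased"),
  levels = c("Normal", "Diseased")
)
coldata$Stimulation <- factor(coldata$Stimulation, levels = c("Control", "Treated"))

# Renumber donors within each tissue type: D1/D2 -> 1/2, D3/D4 -> 1/2
nested_id <- c(D1 = "1", D2 = "2", D3 = "1", D4 = "2")
coldata$donor.nested <- factor(unname(nested_id[coldata$Tissue_Donor]))

mm_nested <- model.matrix(
  ~ Tissue_Type + Tissue_Type:donor.nested + Tissue_Type:Stimulation,
  data = coldata
)

print(colnames(mm_nested))
nested_check <- c(columns = ncol(mm_nested), rank = qr(mm_nested)$rank)
print(nested_check)

# Assumed usage with a fitted DESeq2 object (names to be checked
# against resultsNames(dds) before relying on them):
#   treatment effect in Normal tissue:
#     results(dds, name = "Tissue_TypeNormal.StimulationTreated")
#   difference in treatment effect, Diseased vs Normal:
#     results(dds, contrast = list("Tissue_TypeDiseased.StimulationTreated",
#                                  "Tissue_TypeNormal.StimulationTreated"))
```

This design is full rank and gives one Stimulation coefficient per tissue type, but the triplicates within a donor are still treated as independent samples, which is the part I am unsure about.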
Some other thoughts that have come to me:
- Randomly sample a single replicate from the triplicates within each condition and run the same model as in Option 2 above. I could perform this however many times I want and see which DEGs show up at each iteration.
- Collapse the Replicates using the collapseReplicates function in DESeq2. As I understand it, this really should only be done for true sequencing technical replicates.
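The random sampling idea in the first bullet could look roughly like this. It is only a sketch of the index selection on a hypothetical sample table; the seed, the donor labels, and the number of iterations are arbitrary, and the actual DESeq2 fit on each subset is left as a comment:

```r
set.seed(1)  # arbitrary seed, only so the draw is reproducible

# Hypothetical sample table: 4 donors, 2 stimulations, 3 replicates each
coldata <- expand.grid(
  Replicate    = 1:3,
  Stimulation  = c("Control", "Treated"),
  Tissue_Donor = c("D1", "D2", "D3", "D4"),
  stringsAsFactors = FALSE
)

# Row indices grouped by donor x stimulation cell (8 cells of 3 rows)
cells <- split(seq_len(nrow(coldata)),
               list(coldata$Tissue_Donor, coldata$Stimulation))

# Draw one row index from each cell
draw_subset <- function() {
  pick_one <- function(idx) idx[sample.int(length(idx), 1)]
  sort(vapply(cells, pick_one, integer(1)))
}

keep <- draw_subset()
sub  <- coldata[keep, ]

# Repeating the draw gives one column of kept row indices per iteration
draws <- replicate(5, unname(draw_subset()))

# In a DESeq2 workflow, each column of `draws` would index the samples,
# e.g. dds[, draws[, i]], before refitting the model on that subset.
```

Each subset has one sample per donor per condition, so the replication in the model then comes only from the donors.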
So, after all that, I’m curious about people’s opinions on how to approach this. I understand that there is likely not one good answer here, but I’m open to suggestions!
Thanks to all in advance!