I have two RNA-seq datasets generated on an Illumina NovaSeq (same experimental design but different depth: 25M and 15M reads per sample for Run 1 and Run 2, respectively).
The dataset looks like this:

Samples   Condition  Run
Sample_1  A          R1
Sample_2  B          R1
Sample_3  A          R1
Sample_4  B          R1
Sample_5  A          R1
Sample_6  B          R1
Sample_7  A          R2
Sample_8  B          R2
I want to do DE analysis using DESeq2. Since I have to analyze these samples together, I added a Run factor to my colData, and I am trying to use the collapseReplicates() function to "collapse my technical replicates":
dds <- DESeqDataSetFromMatrix(countData = count, colData = coldata, design = ~ Condition)
ddsColl <- collapseReplicates(dds, groupby = dds$Condition, run = dds$Run)
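For readers unfamiliar with the collapsing step: collapseReplicates() sums the counts of all columns that share the same level of the groupby factor. The base-R sketch below imitates only that summing, so that the effect of a given grouping is easy to inspect. The count values and the `group` vector are made up for illustration, and `rowsum()` is a stand-in here, not what DESeq2 calls internally.

```r
# Toy count matrix: 3 genes x 4 samples (made-up values, not real data)
counts <- matrix(
  c(2, 6, 2, 0,
    3, 0, 0, 0,
    2, 3, 1, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene_", 1:3), paste0("Sample_", 5:8))
)

# Hypothetical grouping factor: columns with the same level are summed together
group <- factor(c("A", "B", "A", "B"))

# rowsum() sums rows by group, so transpose to sum columns, then transpose back
collapsed <- t(rowsum(t(counts), group))
collapsed
```

With a grouping like this one, every column of a given level ends up in a single summed column, so it is worth checking that the groupby factor identifies one biological sample per level rather than a whole condition.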
However, after merging the two datasets (R1 and R2) by gene ID, there are NAs in my count matrix, because some genes appear in the count table of only one run. For example, Sample_5 and Sample_6 are from Run 1, and Sample_7 and Sample_8 are from Run 2:
gene.id  Sample_5  Sample_6  Sample_7  Sample_8
gene_1   2         6         2         0
gene_2   3         0         0         0
gene_3   2         3         NA        NA
gene_4   NA        NA        1         2
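A minimal sketch of how such NAs can arise and two ways they might be handled, using the values from the table above. The split into `run1` and `run2` data frames is an assumption about how the per-run tables look. Replacing NA with 0 is only sensible under the further assumption that a gene missing from a run's table really had zero reads in that run (for example, both runs were counted against the same annotation and zero-count genes were dropped); if that does not hold, keeping only the shared genes may be the safer choice.

```r
# Per-run count tables (values from the example; the split itself is assumed)
run1 <- data.frame(gene.id  = c("gene_1", "gene_2", "gene_3"),
                   Sample_5 = c(2, 3, 2),
                   Sample_6 = c(6, 0, 3))
run2 <- data.frame(gene.id  = c("gene_1", "gene_2", "gene_4"),
                   Sample_7 = c(2, 0, 1),
                   Sample_8 = c(0, 0, 2))

# Full outer join on gene.id: genes missing from one run get NA in its columns
merged <- merge(run1, run2, by = "gene.id", all = TRUE)

# Option 1 (under the stated assumption): treat "not reported" as zero counts
filled <- merged
filled[is.na(filled)] <- 0

# Option 2: keep only genes reported in both runs (inner join)
shared <- merge(run1, run2, by = "gene.id")
```

Either result can then be turned into a matrix with gene IDs as row names before being passed to DESeqDataSetFromMatrix(), which does not accept NA values in the count matrix.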
My question is: what should I do with these NAs? Is it appropriate to convert the NAs to 0, considering that I will then run collapseReplicates() on the data?
Please correct me if I am wrong on any point.
Many thanks, Nicole