deseq2 machine sizing best practices for very large data set


@aa611017


I want to perform differential expression analysis on a data set containing 17,000 samples. The salmon quant.sf files total about 1.5 TB.
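For reference, my rough plan for importing the quantifications looks something like the following sketch using tximport; the file paths, sample sheet, and tx2gene table are placeholders, not my actual files:

```r
library(tximport)
library(DESeq2)

# Placeholder sample sheet with a 'sample_id' and a 'condition' column
samples <- read.csv("samples.csv")

# One quant.sf per sample directory (placeholder layout)
files <- file.path("quants", samples$sample_id, "quant.sf")
names(files) <- samples$sample_id

# Two-column data.frame mapping transcript IDs to gene IDs
tx2gene <- read.csv("tx2gene.csv")

txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, colData = samples,
                                design = ~ condition)
```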

Based on my naive understanding of R and R packages, I believe I will need to run on a single very large machine; that is to say, I cannot take advantage of a cluster of machines.

I read the section in the vignette on 'Using parallelization'.
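As I understand that section, the call pattern would be something like the sketch below; the worker count is a guess, and I use a toy data set from `makeExampleDESeqDataSet()` only to show the calls, since my real object would come from the imported salmon quantifications:

```r
library(DESeq2)
library(BiocParallel)

# Register a multicore backend; 16 workers is a placeholder number
register(MulticoreParam(workers = 16))

# Toy data set just to illustrate the parallel call pattern
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

dds <- DESeq(dds, parallel = TRUE)
res <- results(dds, parallel = TRUE)
```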

Is there a rule of thumb for machine sizing?

I plan to run my analysis in either AWS or GCP so I should be able to access a very large machine.

Can you recommend a Docker image?

Any suggestions for how much SSD, memory, swap, CPU, etc. I should use, and what the run time is likely to be?

Should I consider porting a bare-bones version to something like Apache Spark so I can throw a lot of machines at the problem?

Kind regards

Andy




DESeq2

