I have some mouse RNA-seq data (around 200 GB) and I want to run an RNA-seq analysis (QC, mapping, quantification, differential expression analysis), but I don't know how to choose a server. Could anyone tell me how much CPU, GPU, threads, and memory would be appropriate for processing such a dataset, taking time and cost into account?
To give you a concrete idea of the minimum configuration you will need to do this in the cloud:
A. If you choose a mapping-based method like salmon (or kallisto), which uses the transcriptome sequence:
For salmon (probably similar for kallisto) there are two ways to do the alignment. One is against just the transcriptome; for that you will need ~4 GB of RAM per sample. It is generally recommended that you include genome decoys, which bumps the memory requirement up to ~20 GB of RAM for human/mouse genomes. This is for 1 sample. If you want to run multiple samples in parallel, you will need to multiply this requirement by the number of samples you want to run in parallel.
B. If you use an aligner like STAR or bbmap with the genome sequence:
You will need about 40 GB of RAM to create genome indexes and do alignments. This requirement is for one sample. (Note: the subread aligner can work in ~8 GB of RAM, but it may be the only splice-aware aligner that can.)
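To make the memory arithmetic for options A and B concrete, here is a rough Python sketch. The per-sample figures are the approximate numbers quoted above, not measurements. The function names and the 2 GB reserved for the operating system are my own assumptions. It also follows the simple "multiply by the number of parallel samples" rule, which may overestimate for tools that can share an index between processes.

```python
# Approximate per-sample RAM in GB, taken from the rough figures quoted above.
PER_SAMPLE_RAM_GB = {
    "salmon_transcriptome": 4,   # transcriptome-only index
    "salmon_with_decoys": 20,    # transcriptome plus genome decoys (human/mouse)
    "star_genome": 40,           # genome index and alignment
    "subread_genome": 8,         # subread aligner
}


def ram_needed_gb(method: str, parallel_samples: int) -> int:
    """RAM needed to run `parallel_samples` samples at once with `method`."""
    if parallel_samples < 1:
        raise ValueError("parallel_samples must be at least 1")
    return PER_SAMPLE_RAM_GB[method] * parallel_samples


def max_parallel_samples(method: str, total_ram_gb: int, reserve_gb: int = 2) -> int:
    """Number of samples that fit in RAM at once.

    `reserve_gb` is an assumed allowance for the operating system.
    """
    usable = total_ram_gb - reserve_gb
    return max(usable // PER_SAMPLE_RAM_GB[method], 0)


if __name__ == "__main__":
    # Compare the methods on a hypothetical 64 GB machine.
    for method in PER_SAMPLE_RAM_GB:
        print(method, "fits", max_parallel_samples(method, 64), "sample(s) at once")
```

When comparing instance types, pick the method first and then check how many samples a given amount of RAM allows in parallel. A result of 0 means that machine is too small for that method.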
For either method you will need cloud disk storage. Your data will take up 200 GB, plus some space for the programs you need. You will also need space for temporary files, genome indexes, and output results. Figure on having at least 1 TB available. There are charges to move data into and out of the cloud, so keep that in mind. You will want to have at least 8 cores available.
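The disk estimate can be sketched the same way. Only the 200 GB of raw data and the 1 TB recommendation come from the text above. The index size, the scratch and output ratios, and the software allowance below are placeholder assumptions that you should replace with numbers from your own pipeline.

```python
def disk_budget_gb(raw_gb: float,
                   index_gb: float = 50,        # assumed size of genome indexes
                   scratch_ratio: float = 1.0,  # assumed temp space relative to raw data
                   output_ratio: float = 1.0,   # assumed output size relative to raw data
                   software_gb: float = 20) -> dict:
    """Break a rough disk budget into its parts and total them (all values in GB)."""
    parts = {
        "raw_data": raw_gb,
        "indexes": index_gb,
        "scratch": raw_gb * scratch_ratio,
        "outputs": raw_gb * output_ratio,
        "software": software_gb,
    }
    parts["total"] = sum(parts.values())
    return parts


def fits(budget: dict, provisioned_gb: float) -> bool:
    """True if the estimated total fits on a volume of `provisioned_gb`."""
    return budget["total"] <= provisioned_gb


if __name__ == "__main__":
    budget = disk_budget_gb(200)
    for name, size in budget.items():
        print(f"{name}: {size:.0f} GB")
    print("fits on a 1 TB volume:", fits(budget, 1000))
```

With these placeholder values the estimate stays below 1 TB, so the 1 TB figure leaves some headroom. Alignment outputs can grow well beyond the raw data size, so that headroom is worth keeping.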
If you have never used the cloud before, it will take some time to familiarize yourself with everything, and that will add time/cost. There are calculators on AWS/Google that let you estimate costs, but use them only as a rough guide.
Note: If you have a reasonably new laptop with 8 GB (preferably 16 GB) of RAM, you may be able to do the analysis locally (at your own pace). That would save the cost of the cloud, as others have noted.