Why is the STARsolo estimated cell number very different from Cell Ranger v5?

Hello,

My results from cellranger and starsolo are very different.

Cell Ranger count with default parameters estimated about 9400 cells, while STARsolo estimated about 3600. I am confused by the large discrepancy. In the Cell Ranger results, there are two large cell clusters with very high UMI counts (>25k), while the remaining clusters have UMI counts <5k. The STARsolo results have a more even distribution of UMI counts across clusters.
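One way to see which barcodes account for the gap is to compare the two filtered barcode lists directly. The following is a minimal sketch, not part of either tool. The function names are hypothetical, the file paths in the comments are assumptions based on the usual output layout, and it assumes Cell Ranger barcodes carry a `-1` suffix that STARsolo barcodes lack.

```python
import gzip


def normalize(barcode):
    """Strip whitespace and a trailing GEM-well suffix such as '-1'."""
    return barcode.strip().split("-")[0]


def load_barcodes(path):
    """Read one barcode per line from a plain or gzipped barcodes.tsv."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {normalize(line) for line in handle if line.strip()}


def compare_calls(cellranger, starsolo):
    """Count barcodes called by both tools or by only one of them."""
    return {
        "both": len(cellranger & starsolo),
        "cellranger_only": len(cellranger - starsolo),
        "starsolo_only": len(starsolo - cellranger),
    }


# Example with assumed paths (adjust to the actual output directories):
# cr = load_barcodes("s1/outs/filtered_feature_bc_matrix/barcodes.tsv.gz")
# ss = load_barcodes("s1Solo.out/Gene/filtered/barcodes.tsv")
# print(compare_calls(cr, ss))
```

If most of the `cellranger_only` barcodes turn out to be the low-UMI (<5k) cells, the difference would point to cell calling rather than to mapping or UMI counting.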

I wonder whether STARsolo uses a doublet filter for cell calling. What is the reason for the difference?
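For context, here is a rough sketch of a knee-style cell-calling threshold. As far as I can tell from the STAR manual, when `--soloCellFilter` is not set (as in my command below), STARsolo defaults to `CellRanger2.2 3000 0.99 10`, i.e. expected cells, robust-maximum percentile, and max-to-min UMI ratio. Recent Cell Ranger versions add an EmptyDrops-style step that can also call low-UMI barcodes. The function below is my own illustration of the simple threshold idea, not STAR's actual code, and its name and rounding details are assumptions.

```python
def knee_filter(umi_counts, expected_cells=3000, max_percentile=0.99, max_min_ratio=10):
    """Call cells with a simple 'robust maximum / ratio' UMI threshold.

    Returns (threshold, number_of_called_cells).
    """
    ranked = sorted(umi_counts, reverse=True)
    if not ranked:
        return 0, 0
    # Robust maximum: the UMI count at the top (1 - max_percentile) fraction
    # of the expected number of cells (rank 30 with the default parameters).
    index = min(len(ranked) - 1, int(expected_cells * (1 - max_percentile)))
    robust_max = ranked[index]
    # Any barcode within a factor of max_min_ratio of the robust maximum is a cell.
    threshold = max(1, round(robust_max / max_min_ratio))
    n_cells = sum(1 for count in ranked if count >= threshold)
    return threshold, n_cells


# Synthetic example: a few very high-UMI barcodes raise the threshold,
# so barcodes with fewer UMIs than robust_max / 10 are not called.
counts = [30000] * 100 + [5000] * 200 + [2000] * 300 + [10] * 1000
threshold, n_cells = knee_filter(counts)
print(threshold, n_cells)
```

With a threshold of this kind, a population of very high-UMI barcodes (such as the >25k clusters) can push the cutoff above a lower-UMI population, which might be what is happening here.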

Thank you!

Details:

starsolo:

STAR --genomeDir starsolo --soloType CB_UMI_Simple --soloCBwhitelist 10x_V3_whitelist.txt --soloUMIlen 12 --readFilesIn ${wd}/2270183_P7_2_S2_L001_R2_001.fastq.gz,${wd}/2270183_P7_2_S2_L002_R2_001.fastq.gz ${wd}/2270183_P7_2_S2_L001_R1_001.fastq.gz,${wd}/2270183_P7_2_S2_L002_R1_001.fastq.gz --runThreadN 20 --outFileNamePrefix s1 --outSAMtype BAM SortedByCoordinate --outReadsUnmapped elp1_s2_Unmapped --twopassMode Basic --chimSegmentMin 20 --readFilesCommand zcat --clipAdapterType CellRanger4 --outFilterScoreMin 30 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR

Barcodes.stats:
nNoAdapter 0
nNoUMI 0
nNoCB 0
nNinCB 0
nNinUMI 6289
nUMIhomopolymer 286950
nTooMany 0
nNoMatch 21844303
nMismatchesInMultCB 0
nExactMatch 887950279
nMismatchOneWL 5268661
nMismatchToMultWL 19035523
Features.stats:
nUnmapped 89644891
nNoFeature 373085155
nAmbigFeature 13249278
nAmbigFeatureMultimap 11984809
nTooMany 1363653
nNoExactMatch 0
nExactMatch 427453001
nMatch 434911486
nCellBarcodes 2149182
nUMIs 131380805

Summary.csv:
Number of Reads,934392005
Reads With Valid Barcodes,0.974849
Sequencing Saturation,0.697914
Q30 Bases in CB+UMI,0.948347
Q30 Bases in RNA read,0.923572
Reads Mapped to Genome: Unique+Multiple,0.898433
Reads Mapped to Genome: Unique,0.756247
Reads Mapped to Transcriptome: Unique+Multipe Genes,0.479628
Reads Mapped to Transcriptome: Unique Genes,0.465449
Estimated Number of Cells,3618
Reads in Cells Mapped to Unique Genes,239834914
Fraction of Reads in Cells,0.551457
Mean Reads per Cell,66289
Median Reads per Cell,59441
UMIs in Cells,67707559
Mean UMI per Cell,18714
Median UMI per Cell,16493
Mean Genes per Cell,4247
Median Genes per Cell,4175
Total Genes Detected,23201

cellranger:

$cellranger count --id=s1 --transcriptome=refdata-gex-mm10-2020-A --fastqs=2-1649641 --localcores=20 --localmem=300

Estimated Number of Cells | 9449
Mean Reads per Cell | 98887
Median Genes per Cell | 1583
Number of Reads | 934392005
Valid Barcodes | 97.10%
Sequencing Saturation | 69.90%
Q30 Bases in Barcode | 94.90%
Q30 Bases in RNA Read | 92.40%
Q30 Bases in UMI | 94.70%
Reads Mapped to Genome | 90.00%
Reads Mapped Confidently to Genome | 85.30%
Reads Mapped Confidently to Intergenic Regions | 7.00%
Reads Mapped Confidently to Intronic Regions | 26.40%
Reads Mapped Confidently to Exonic Regions | 51.90%
Reads Mapped Confidently to Transcriptome | 48.20%
Reads Mapped Antisense to Gene | 2.70%
Fraction Reads in Cells | 64.80%
Total Genes Detected | 23824
Median UMI Counts per Cell | 3577
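A quick arithmetic check on the numbers above suggests that "Mean Reads per Cell" is not defined the same way in the two summaries, so those two rows should not be compared directly. This is only a sketch based on the values I posted. The variable names are mine, and it assumes Cell Ranger processed the same 934392005 reads that STARsolo reports.

```python
# STARsolo values copied from Summary.csv and Features.stats
starsolo_cells = 3618
starsolo_reads_in_cells = 239834914  # Reads in Cells Mapped to Unique Genes
starsolo_umis_in_cells = 67707559    # UMIs in Cells
features_n_match = 434911486         # Features.stats nMatch
features_n_umis = 131380805          # Features.stats nUMIs

# Cell Ranger values
cellranger_cells = 9449
total_reads = 934392005  # assumed identical to STARsolo's Number of Reads

# STARsolo: per-cell means are based on reads/UMIs inside called cells
starsolo_mean_reads = starsolo_reads_in_cells / starsolo_cells
starsolo_mean_umis = starsolo_umis_in_cells / starsolo_cells
# Saturation as 1 - (unique UMIs / reads matched to features)
starsolo_saturation = 1 - features_n_umis / features_n_match

# Cell Ranger: mean reads per cell is total sequenced reads / called cells
cellranger_mean_reads = total_reads / cellranger_cells

print(f"STARsolo mean reads per cell:    {starsolo_mean_reads:.1f}")
print(f"STARsolo mean UMIs per cell:     {starsolo_mean_umis:.1f}")
print(f"STARsolo sequencing saturation:  {starsolo_saturation:.4f}")
print(f"Cell Ranger mean reads per cell: {cellranger_mean_reads:.1f}")
```

These reproduce the reported 66289, 18714, 0.697914, and 98887 to within rounding. So the STARsolo figure appears to count only reads in cells mapped to unique genes, while the Cell Ranger figure divides all sequenced reads by the number of cells.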
