CB tag disappears from the BAM after samtools sort -t CB

 

I've been trying to set up an analysis pipeline for RNA velocity on AWS EC2. As a test case I used one of the 10x datasets, "10k Peripheral blood mononuclear cells (PBMCs) from a healthy donor, Single Indexed". To save time and cost, I first used samtools to sort the 10k PBMC BAM file from 10x by cell barcode with the following command:

samtools sort -l 7 -m 2048M -t CB -O BAM -@100 -o /temp/home/cellsorted_PBMC.bam /temp/home/PBMC_10K.bam

and then ran:

velocyto run -b filtered_feature_bc_matrix/barcodes.tsv -o /temp/home -m GRCh38_rmsk.gtf cellsorted_PBMC.bam.bam refdata-gex-GRCh38-2020-A/genes/genes.gtf

velocyto complained that there is no CB tag in the 10k PBMC BAM. When I examined the sorted BAM, I saw no CB tag at all, as in this record:

A00519:643:HCMYWDSXY:4:2172:22525:26287 16  chr1    148893  255 91M *   0   0   ACATGGCAAGATCCCGTCTCTATGATAAAAAATTAGCTGGACATGGTGGCACATGTCTGTAGTCCCAGCTACTTGGGAGACTGAAGTGAGA FFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6  HI:i:2  AS:i:89 nM:i:0  RG:Z:SC3_v3_NextGem_SI_PBMC_10K:0:1:HCMYWDSXY:4 TX:Z:ENST00000484859,+724,91M   GX:Z:ENSG00000241860    GN:Z:AL627309.5 fx:Z:ENSG00000241860    RE:A:E  MM:i:1  xf:i:17 CR:Z:GCAGCTCTGTGAATAT   CY:Z:FFFFFFFFFFFFFFFF   UR:Z:TCTAAAACCTAC   UY:Z:FFFFFF:FFFFF   UB:Z:TCTAAAACCTAC

The original, unsorted BAM does have the CB tag:

A00519:643:HCMYWDSXY:3:2144:3649:12790  16  chr1    498309  1   65M26S  *   0   0   GGCCAAAATATGTAAGCACATTTGCATTTATTAGGCACTTTATTTCCATTATTACACTGTGATATCCCATGTACTCTGCGTTGATACCACT F,,,FF:F,FFFFF:FFFF:FFFFFFFFFFFFF:FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFF:FFF:FFFFFFFFFFF NH:i:4  HI:i:1  AS:i:61 nM:i:1  ts:i:26 RG:Z:SC3_v3_NextGem_SI_PBMC_10K:0:1:HCMYWDSXY:3 RE:A:I  xf:i:0  CR:Z:TCATTGTAGTATAGAC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TCATTGTAGTATAGAC-1 UR:Z:ACTCTAATCTGC   UY:Z:FFFFF:FFFFFF   UB:Z:ACTCTAATCTGC
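A quick way to tell whether the CB tag is missing from every record, or only from the records near the top of the file, is to count tagged and untagged records over the whole BAM. The sketch below is only an illustration: `count_cb` is a hypothetical helper name (not part of samtools), it reads SAM text on stdin, and the BAM path in the usage comment is taken from the sort command above.

```shell
# Hypothetical helper: count SAM records with and without a CB:Z: tag.
# Reads SAM text on stdin and prints a single summary line.
count_cb() {
  awk -F'\t' '
    /^@/ { next }                        # skip SAM header lines
    {
      found = 0
      for (i = 12; i <= NF; i++)         # optional tags start at column 12
        if ($i ~ /^CB:Z:/) { found = 1; break }
      if (found) has++; else missing++
    }
    END { printf "with_CB=%d without_CB=%d\n", has, missing }
  '
}

# Usage (assumes samtools is on PATH):
# samtools view /temp/home/cellsorted_PBMC.bam | count_cb
# samtools view /temp/home/PBMC_10K.bam        | count_cb
```

If both files report the same counts, the sort has not removed any tags and the untagged records are simply grouped together in the sorted output.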

Interestingly, when I sorted a smaller, truncated version of PBMC_10K.bam (created with samtools view -h Parent_NGSC3_DI_PBMC_possorted_genome_bam.bam | head -n 10000 | samtools view -bS > test.bam) using the exact same samtools command, the CB tag was preserved in the sorted BAM.

Does anybody have any idea why sorting the entire PBMC_10K.bam by CB removes the CB tag from the sorted BAM, while the CB tags are preserved when I sort the smaller version of the same BAM? I'd appreciate any pointers. Thanks.

I am using:
samtools --version
samtools 1.11
Using htslib 1.11
Copyright (C) 2020 Genome Research Ltd
