CROP-seq data analysis
I am a newbie to single-cell sequencing analysis. I have to analyze CROP-seq data, and I am going through the following paper: www.nature.com/articles/nmeth.4177. I have to use Cell Ranger (instead of the Drop-seq software) as the first step to process the single-cell data. I wanted to know how to assign a guide RNA to each single cell; the paper is not very clear on how to go about it. Any help would be appreciated.
I expect your data come from a 10x Genomics machine, right?
This is how I do it:
1) create a fasta file with reference sequences that sgRNA reads will map to. The sequences look like this:
U6promoter-sgRNA1-tracrRNA
U6promoter-sgRNA2-tracrRNA
U6promoter-sgRNA3-tracrRNA
...
where U6promoter and tracrRNA are the constant sequences that flank each of your sgRNAs.
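A minimal sketch of step 1, assuming the sgRNA spacers are available as a name-to-sequence dictionary. The two flank strings are placeholders I made up for illustration, not the real U6 promoter or tracrRNA sequences; take the real ones from your vector map. The function name `build_guide_fasta` is also just illustrative.

```python
# Placeholder flanks (NOT real vector sequences): replace with the constant
# sequences found immediately upstream and downstream of the spacer.
U6_FLANK = "ACGTACGTAC"      # hypothetical stand-in for the end of the U6 promoter
TRACR_FLANK = "TGCATGCATG"   # hypothetical stand-in for the start of the tracrRNA


def build_guide_fasta(guides, upstream=U6_FLANK, downstream=TRACR_FLANK):
    """Return FASTA text with one record per sgRNA construct."""
    records = []
    for name, spacer in guides.items():
        # Each reference is: upstream flank + spacer + downstream flank.
        records.append(f">{name}\n{upstream}{spacer.upper()}{downstream}\n")
    return "".join(records)


# Example spacers are invented for the demo.
guides = {
    "sgRNA1": "AAAACCCCGGGGTTTTAAAA",
    "sgRNA2": "TTTTGGGGCCCCAAAATTTT",
}
fasta_text = build_guide_fasta(guides)
print(fasta_text)
```

The resulting text can be written to a file and indexed with your aligner of choice. How much flank to include is a judgment call: it only has to be long enough for the reads to align.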
2) I map the R2 reads to this file using BWA (i.e. in a non-spliced manner), which produces a BAM file. Then I filter this BAM to keep uniquely mapped reads only. For these reads, I need their corresponding R1 reads, which carry the cell barcode. This can be done with a little script using e.g. Picard.
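The pairing in step 2 can also be sketched in plain Python, without Picard. This is only an illustration under several assumptions: the alignments are available as SAM text (e.g. converted from the BAM), "uniquely mapped" is approximated by a MAPQ cutoff, and the cell barcode is the first 16 bases of R1, as in 10x 3' chemistry. The function names, the cutoff of 20 and the barcode length are my choices, so check them against your own setup.

```python
def read_fastq_barcodes(r1_lines, barcode_len=16):
    """Map read name -> cell barcode from R1 FASTQ lines (4 lines per record)."""
    lines = [ln.rstrip("\n") for ln in r1_lines]
    barcodes = {}
    for i in range(0, len(lines) - 3, 4):
        name = lines[i][1:].split()[0]          # drop '@' and any comment
        barcodes[name] = lines[i + 1][:barcode_len]
    return barcodes


def confident_hits(sam_lines, min_mapq=20):
    """Yield (read name, reference name) for primary, confidently mapped reads."""
    for line in sam_lines:
        if line.startswith("@"):                 # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, ref, mapq = fields[0], int(fields[1]), fields[2], int(fields[4])
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary alignment
        if flag & (0x4 | 0x100 | 0x800) or mapq < min_mapq:
            continue
        yield name, ref


def pair_guides_with_barcodes(sam_lines, r1_lines, barcode_len=16):
    """Return a list of (sgRNA, cell barcode) tuples, one per confident read."""
    barcodes = read_fastq_barcodes(r1_lines, barcode_len)
    return [(ref, barcodes[name])
            for name, ref in confident_hits(sam_lines)
            if name in barcodes]


# Tiny invented example: read1 maps to sgRNA1, read2 is unmapped.
r1 = [
    "@read1 1:N:0", "AAACCCAAGAAACACT" + "GGGGGGGGGG", "+", "I" * 26,
    "@read2 1:N:0", "TTTGGGTTCTTTGTGA" + "CCCCCCCCCC", "+", "I" * 26,
]
sam = [
    "@SQ\tSN:sgRNA1\tLN:40",
    "read1\t0\tsgRNA1\t5\t60\t30M\t*\t0\t0\t" + "A" * 30 + "\t" + "I" * 30,
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t" + "C" * 30 + "\t" + "I" * 30,
]
pairs = pair_guides_with_barcodes(sam, r1)
print(pairs)
```

For real data you would stream the files rather than hold R1 in memory, but the logic stays the same: join the two reads on the read name.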
3) Now I have a table with 2 columns: sgRNA identity and cell barcode. I can count the reads of each sgRNA in each cell, decide which cells carry which knockout, and add this info manually to my Seurat object with gene expression.
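The counting in step 3 could look roughly like this. It is a sketch that assumes the two-column table is already in memory as (sgRNA, cell barcode) tuples; the example rows and the function name are invented.

```python
from collections import Counter, defaultdict


def count_guides_per_cell(pairs):
    """Return {cell barcode: Counter({sgRNA: read count})}."""
    counts = defaultdict(Counter)
    for guide, barcode in pairs:
        counts[barcode][guide] += 1
    return counts


# Invented example rows: (sgRNA identity, cell barcode)
pairs = [
    ("sgRNA1", "AAACCCAAGAAACACT"),
    ("sgRNA1", "AAACCCAAGAAACACT"),
    ("sgRNA2", "AAACCCAAGAAACACT"),
    ("sgRNA2", "TTTGGGTTCTTTGTGA"),
]
counts = count_guides_per_cell(pairs)
for barcode, guide_counts in counts.items():
    # most_common(1) gives the sgRNA with the most reads in this cell
    print(barcode, dict(guide_counts), guide_counts.most_common(1)[0])
```

The per-cell call can then be exported, e.g. as a barcode-to-sgRNA table, and joined to the cell metadata of the expression object.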
A somewhat easier way to do this is to add those sgRNA+surrounding sequences directly into your STAR reference and let CellRanger think these are genes on an extra chromosome. This way, you’ll get the different sgRNAs directly as “genes” in the CellRanger output matrix. This approach is slightly less precise because some reads may get spurious spliced alignments, but it should be fine. As far as I remember, this is pretty much how they did it in the original Datlinger paper.
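For the reference-based route, the annotation side might be generated along these lines. This is a sketch under assumptions: it puts each construct on its own small contig (a simplification of the "extra chromosome" idea), with a single exon spanning the whole construct, and the contig names must match the FASTA headers you append to the genome. The function name, the `source` label and the example lengths are invented.

```python
def guide_gtf_lines(guides, upstream_len, downstream_len, source="cropseq"):
    """Return one GTF exon line per sgRNA construct.

    Each construct is treated as its own contig and as a single-exon "gene"
    covering the full construct (flank + spacer + flank).
    """
    lines = []
    for name, spacer in guides.items():
        length = upstream_len + len(spacer) + downstream_len
        attrs = f'gene_id "{name}"; transcript_id "{name}"; gene_name "{name}";'
        # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
        lines.append("\t".join(
            [name, source, "exon", "1", str(length), ".", "+", ".", attrs]
        ))
    return lines


guides = {"sgRNA1": "AAAACCCCGGGGTTTTAAAA", "sgRNA2": "TTTTGGGGCCCCAAAATTTT"}
gtf_lines = guide_gtf_lines(guides, upstream_len=10, downstream_len=10)
print("\n".join(gtf_lines))
```

These lines would be appended to the gene annotation GTF, and the matching construct sequences to the genome FASTA, before rebuilding the reference.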
If you want something MUCH faster than CellRanger, use STARsolo.
In my experience, it is better to assign guides to cells on the basis of read counts rather than UMI counts. This might be different in your datasets, though. In any case, expect noise (i.e. reads from multiple different sgRNA sequences in a single cell). You’ll have to deal with this somehow, either with a manually chosen read count threshold or with a statistical test.
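One simple way to handle that noise is a threshold rule: call a guide only if the top sgRNA has enough reads and clearly dominates the cell. The sketch below shows this; the cutoffs (`min_reads=5`, `min_fraction=0.8`) are arbitrary illustrative values, not recommendations, and should be tuned on your own read count distributions.

```python
def assign_guide(guide_counts, min_reads=5, min_fraction=0.8):
    """Return the dominant sgRNA for one cell, or None if the call is ambiguous.

    guide_counts: dict mapping sgRNA name -> read count in this cell.
    """
    total = sum(guide_counts.values())
    if total == 0:
        return None
    guide, top = max(guide_counts.items(), key=lambda kv: kv[1])
    # Require both enough absolute support and clear dominance over other guides.
    if top >= min_reads and top / total >= min_fraction:
        return guide
    return None


print(assign_guide({"sgRNA1": 40, "sgRNA2": 2}))   # one guide clearly dominates
print(assign_guide({"sgRNA1": 6, "sgRNA2": 5}))    # two guides at similar levels
print(assign_guide({"sgRNA1": 3}))                 # too few reads to call
```

Cells that return `None` can be excluded or inspected separately, e.g. as possible doublets or multiply infected cells.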
Another question is what to do afterwards. There is no generally accepted framework for CROP-seq data processing, and it heavily depends on your application. The few tools that exist (e.g. MUSIC, scMAGeCK) haven’t worked well for me, but you can give them a try. You can check e.g. the papers from Jonathan S. Weissman’s group (around Perturb-seq) for some inspiration on what you can do next with your data.