Analysis of shRNA/CRISPR screens in 2021

I’ve used MAGeCK for CRISPR screens and it works great.

A few things:

  • By default it doesn’t allow mismatches between read and library, but
    I’ve still always had good (>= ~80%) mapping rates. I’ve had better
    mapping results with paired-end reads (if one read fails to align
    because of a mismatch, the second read might succeed)
  • You may need to use cutadapt to remove technical nucleotides from your reads (mageck tries to figure this out automatically, but it doesn’t always work, especially if you have adapters on both ends of your reads or the adapter length varies between reads)
  • When running mageck mle, you can play around with designs (e.g. put cell lines and/or treatment/control in your design formula)
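To make the design point concrete, here is a minimal sketch that writes a design matrix with a cell-line column and a treatment column. It assumes the layout described in the MAGeCK documentation (tab-delimited, a first column of sample labels matching the count table, then a "baseline" column of all 1s); the sample and column names are made up for illustration, so check the docs for your version before relying on it.

```python
def build_design_matrix(samples, factors):
    """Return design-matrix text: one row per sample, one 0/1 column per factor.

    samples: list of sample labels (hypothetical names here).
    factors: dict mapping a column name to the set of samples coded as 1.
    """
    header = ["Samples", "baseline"] + list(factors)
    lines = ["\t".join(header)]
    for sample in samples:
        # The baseline column is 1 for every sample.
        row = [sample, "1"]
        row += ["1" if sample in factors[name] else "0" for name in factors]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical experiment: two cell lines (A, B), each with control and drug.
samples = ["A_ctrl", "A_drug", "B_ctrl", "B_drug"]
matrix_text = build_design_matrix(
    samples,
    {
        "cellline_B": {"B_ctrl", "B_drug"},  # cell-line effect relative to A
        "drug": {"A_drug", "B_drug"},        # treatment effect
    },
)
print(matrix_text)
```

Saving that text to a file and passing it to mageck mle as the design matrix should give you one beta score per non-baseline column, so you can compare fits with and without the cell-line term.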

The main downside of MAGeCK is that mageck mle has a bunch of options and ways to do the analysis, and it’s difficult to figure out which one is ideal. E.g. is it better to use permutation p-values or Wald p-values? Should you use the control sgRNAs for normalization? (If you normalize against non-targeting sgRNAs, you may get inflated false positives, because non-targeting sgRNAs don’t behave the same as sgRNAs targeting non-essential loci; not sure if it’s the same deal with shRNAs.)
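To illustrate why the choice of normalization set matters, here is a rough sketch of median-ratio size factors estimated from either all sgRNAs or only a chosen subset (e.g. controls). This is a generic illustration, not MAGeCK’s actual code; the function name and the toy counts are assumptions.

```python
import math
from statistics import median


def median_ratio_size_factors(counts, sgrnas=None):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: dict mapping sgRNA id -> list of raw counts, one per sample.
    sgrnas: optional subset of ids to estimate from (e.g. control sgRNAs);
            if None, all sgRNAs are used.
    """
    ids = list(counts) if sgrnas is None else [s for s in sgrnas if s in counts]
    # Zero counts have no geometric mean on the log scale, so skip those sgRNAs.
    usable = [s for s in ids if all(c > 0 for c in counts[s])]
    if not usable:
        raise ValueError("no sgRNA has non-zero counts in every sample")
    n_samples = len(counts[usable[0]])
    geo_means = {
        s: math.exp(sum(math.log(c) for c in counts[s]) / n_samples)
        for s in usable
    }
    return [
        median(counts[s][j] / geo_means[s] for s in usable)
        for j in range(n_samples)
    ]


# Toy counts (made up): two control sgRNAs and one targeting sgRNA, two samples.
toy_counts = {
    "ctrl_1": [100, 200],
    "ctrl_2": [50, 100],
    "geneX_1": [400, 100],
}
control_factors = median_ratio_size_factors(toy_counts, ["ctrl_1", "ctrl_2"])
print(control_factors)
```

Whatever set you pass in is implicitly assumed to be unchanged between samples, which is why a control set that behaves differently from typical non-essential guides can shift every other sgRNA’s normalized count.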

You just have to run it and see if there’s anything funky (e.g. a super skewed beta score distribution, too few genes meeting your FDR threshold [when you expect more], positive controls that don’t look as expected, etc.).
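A couple of these checks are easy to script. The sketch below assumes you have already pulled the per-gene beta scores and FDR values out of the gene summary into plain lists; the helper names and the 0.05 cutoff are just illustrative choices.

```python
from statistics import mean, pstdev


def beta_skewness(betas):
    """Moment-based skewness; values far from 0 suggest a lopsided distribution."""
    m = mean(betas)
    sd = pstdev(betas)
    if sd == 0:
        return 0.0
    return sum(((b - m) / sd) ** 3 for b in betas) / len(betas)


def count_hits(fdrs, threshold=0.05):
    """Number of genes whose FDR is below the chosen threshold."""
    return sum(1 for q in fdrs if q < threshold)


# Toy values (made up) standing in for columns of a gene summary table.
betas = [-1.0, 0.0, 1.0]
fdrs = [0.01, 0.2, 0.04, 0.5]
print(beta_skewness(betas), count_hits(fdrs))
```

Neither number tells you the analysis is right; they just flag runs worth a closer look before you interpret hits.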

I recommend using a dedicated tool because it has been peer-reviewed, validated by multiple labs, etc.; don’t use DESeq2/edgeR or make up your own workflow (trying to re-invent a wheel that labs at the top institutes work full time on developing never ends up working well, IMHO).

As for comparing MAGeCK vs. other tools, I’m not sure; I haven’t come across any benchmarking papers that I find reliable. Different tools will always produce different results and make different assumptions about your data. There are always better ways, in theory, to analyze data, but based on what we currently know MAGeCK seems to get the job done. The best thing to do is to extensively validate your screen (if not biological validation, do extensive technical validation: check known essential genes, known non-essential genes, do GO analysis, etc. to see if things are behaving as expected).
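One simple technical check along those lines is to ask how well the scores separate known essential from known non-essential genes. Here is a minimal sketch that computes an AUC-style number (the fraction of essential/non-essential pairs in which the essential gene has the lower score). The gene labels and scores are placeholders; in practice you would plug in your own reference gene sets and a score where lower means stronger depletion.

```python
def separation_auc(scores, essential, nonessential):
    """Fraction of (essential, non-essential) pairs where the essential gene
    scores lower. 1.0 means perfect separation, 0.5 means no separation.

    scores: dict mapping gene -> score (lower = more depleted).
    """
    ess = [scores[g] for g in essential if g in scores]
    non = [scores[g] for g in nonessential if g in scores]
    if not ess or not non:
        raise ValueError("need at least one scored gene in each reference set")
    wins = 0.0
    for e in ess:
        for n in non:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5  # ties count as half
    return wins / (len(ess) * len(non))


# Placeholder genes and scores, purely for illustration.
toy_scores = {"ESS_1": -2.1, "ESS_2": -1.4, "NON_1": 0.1, "NON_2": -0.2}
auc = separation_auc(toy_scores, ["ESS_1", "ESS_2"], ["NON_1", "NON_2"])
print(auc)
```

If that number sits near 0.5 for a dropout screen, something upstream (mapping, normalization, design) probably deserves a second look before you trust the hit list.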

Just my thoughts!
