MultiQC not working correctly – dataset-collection

@stealsh

I’ve seen this before, too, for about the last 4-5 months, since collections changed a bit in the 21.09 release. FastQC and MultiQC no longer work in series the way they used to.

Details: The problem comes from how the data is organized and where the sample names are derived from in a nested collection. The elements at the top level of the nested structure are named the same: one “forward” and one “reverse”. The actual sample names are one level deeper.

I couldn’t figure out how to solve it before, gave up, and no one else reported the issue. I will create the test again and ask others to review it. There is probably a solution, and it would probably involve organizing the collection differently. “Flatten collection” was one option I reviewed, but that didn’t produce the MultiQC output properly either: forward and reverse from the same sample had the same “identifier”, and MultiQC used that identifier as the name, so again there was data loss from common naming. “Rename collections” was problematic, too, but I forget why.
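To make the naming collision concrete, here is a toy Python sketch. It does not use any Galaxy or MultiQC code. The dictionary layout, the report names, and both helper functions are made up for illustration, and it only assumes that reports are keyed by a single identifier, as described above.

```python
# Toy model of a nested collection laid out as described in this post:
# top-level elements "forward" and "reverse", sample names one level deeper.
# All names and values here are hypothetical placeholders.
nested = {
    "forward": {"sampleA": "A_fwd_fastqc", "sampleB": "B_fwd_fastqc"},
    "reverse": {"sampleA": "A_rev_fastqc", "sampleB": "B_rev_fastqc"},
}


def label_by_inner(collection):
    """Key each report by the inner identifier only (the sample name)."""
    reports = {}
    for direction, samples in collection.items():
        for sample, report in samples.items():
            # Same key for forward and reverse: the later one overwrites.
            reports[sample] = report
    return reports


def label_by_both(collection, sep="_"):
    """Key each report by sample name plus direction, so keys stay unique."""
    reports = {}
    for direction, samples in collection.items():
        for sample, report in samples.items():
            reports[f"{sample}{sep}{direction}"] = report
    return reports


by_inner = label_by_inner(nested)
by_both = label_by_both(nested)

print(len(by_inner), sorted(by_inner))  # 2 reports survive out of 4
print(len(by_both), sorted(by_both))    # all 4 reports are kept
```

If the identifiers could be made unique per sample and direction, along the lines of `label_by_both`, nothing would be overwritten. Whether a collection operation can produce such identifiers here is exactly the open question.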

If there isn’t a good workaround, I will open a ticket. In either case, expect another reply tomorrow with an update. The FastQC tool itself might need a change, or maybe MultiQC (although that tool is trickier to change).

Meanwhile, one of these might work, and probably only the latter:

  1. Expand the collection and drag and drop the datasets from inside to the MultiQC tool input. This involves a LOT of clicking.
  2. Or unhide the datasets in your history, then multi-select those for the input. I think this worked only when all forward datasets were combined, then all reverse, but not both together. Be warned that this will create a lot of clutter in the history. Maybe copy just the FastQC output into a new history and try it there, so any tests are easier to get rid of.

Thanks for reporting this! And @gbbio, if you can think of a way to do this, feel free to add more to our replies. It is easily replicated: put any two pairs in a collection, then run FastQC > MultiQC. MultiQC is only able to report back one pair, not both, no matter how the collection is arranged. I guess one option is to create some new collections just for input to MultiQC, but that doesn’t combine by sample ID. Maybe I missed something obvious that fresh eyes will find :)
