The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by lakhujanivijay and Istvan Albert.
On the optimistic performance evaluation of newly introduced bioinformatic methods | SpringerLink (link.springer.com)
So you read about a new, cool method that "improves" on previous methods. But is it really an improvement?
Most research articles presenting new data analysis methods claim that “the new method performs better than existing methods,” but the veracity of such statements is questionable. Our manuscript discusses and illustrates consequences of the optimistic bias occurring during the evaluation of novel data analysis methods, that is, all biases resulting from, for example, selection of datasets or competing methods, better ability to fix bugs in a preferred method, and selective reporting of method variants.
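The selective-reporting bias mentioned in the abstract can be illustrated with a small simulation. This is a sketch of our own, not the authors' analysis: it assumes every method variant has the same true accuracy (0.8) and that each evaluation is independent binomial sampling on 100 test items. That independence is a simplification, since real variants scored on the same dataset have correlated errors. Even so, reporting only the best-scoring variant inflates the apparent accuracy although no variant is actually better.

```python
import random
import statistics


def best_reported_accuracy(n_variants, n_samples, true_acc, rng):
    """Score n_variants variants that all share the SAME true accuracy,
    then keep only the best observed score (selective reporting)."""
    scores = []
    for _ in range(n_variants):
        # Each test item is classified correctly with probability true_acc.
        correct = sum(rng.random() < true_acc for _ in range(n_samples))
        scores.append(correct / n_samples)
    return max(scores)


def simulate(n_variants=10, n_samples=100, true_acc=0.8, n_trials=1000, seed=42):
    """Mean reported accuracy over many hypothetical repeated 'papers'."""
    rng = random.Random(seed)
    return statistics.mean(
        best_reported_accuracy(n_variants, n_samples, true_acc, rng)
        for _ in range(n_trials)
    )


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:2d} variants tried -> mean reported accuracy {simulate(n_variants=k):.3f}")
```

With one variant the reported score hovers around the true 0.8; the more variants are tried and only the winner is reported, the further the reported score drifts upward.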
submitted by: Istvan Albert
CLIMB-BIG-DATA | Cloud Infrastructure for Microbial Bioinformatics (www.climb.ac.uk)
Antimicrobial resistance is a critical universal issue and scientists need reliable, fast, reproducible tools for their research. The aim of this hackathon is to improve upon/build/extend bioinformatics tools and methods for the AMR community. This year’s hackathon has a special focus on antimicrobial resistance in bacteria.
submitted by: Istvan Albert
An FM-index of 400k SARS-CoV-2 genomes (lh3.github.io)
Leonardo Martins tweeted that xz can compress 1.4 million SARS-CoV-2 genomes, stored in a 39GB FASTA file, down to 74MB. That is a very impressive compression ratio! This reminds me of my earlier work on FM-index construction.
For an experiment, I downloaded ~400k SARS-CoV-2 genomes from EBI's COVID-19 data portal (GISAID has ~1.5M genomes but imposes restrictions) and generated an FM-index of these sequences on both strands with ropebwt2.
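The core idea of an FM-index can be shown with a toy version. This is a naive sketch for illustration only, not ropebwt2's construction algorithm: it sorts all suffixes directly (fine for a few hundred bases, hopeless for 400k genomes), adds each sequence together with its reverse complement so queries hit both strands, separates sequences with `$`, and counts pattern occurrences by backward search. The class and method names are hypothetical.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


class ToyFMIndex:
    """Hypothetical toy FM-index over a small DNA collection, both strands."""

    def __init__(self, seqs):
        parts = []
        for s in seqs:
            parts.extend([s, revcomp(s)])  # index both strands
        text = "$".join(parts) + "$"
        # Naive suffix array: sorts full suffix strings, toy inputs only.
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        # BWT = character preceding each sorted suffix (i == 0 wraps to the final '$').
        self.bwt = "".join(text[i - 1] for i in sa)
        # C[ch] = number of characters in the text that sort before ch.
        counts = Counter(text)
        self.C, total = {}, 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i] = occurrences of ch in bwt[:i]
        self.occ = {}
        for ch in counts:
            row = [0]
            for b in self.bwt:
                row.append(row[-1] + (b == ch))
            self.occ[ch] = row

    def count(self, pattern: str) -> int:
        """Backward search: occurrences of pattern on either strand."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    idx = ToyFMIndex(["ACGTACGT", "GATTACA"])
    print(idx.count("ACGT"))  # 4: "ACGTACGT" is its own reverse complement
```

Query time depends only on the pattern length, not on the collection size, which is what makes an FM-index attractive for huge, highly redundant collections like this one.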
submitted by: Istvan Albert
from 5 GB to 74MB is the difference between the gzip and xz (lzma) formats for storing the 1.4m SARS-CoV2 genomes. This file would have taken 39GB uncompressed. pic.twitter.com/iDxhVSBrg7
— Leonardo Martins (@leomrtns) May 14, 2021
A 39GB file containing SARS-CoV-2 genomes can be compressed to just 74MB when using the xz program.
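For scale, 39 GB to 74 MB is roughly a 500-fold reduction, versus about 8-fold (39 GB to 5 GB) for gzip. The sketch below is a hypothetical toy, not the real dataset: it builds a synthetic collection of near-identical 29,903-base genomes (one random reference plus a few substitutions per copy) and compares Python's standard-library `gzip` and `lzma` modules at the command-line default levels. One plausible contributor to the gap is that DEFLATE (gzip) can only refer back 32 KiB, while xz's default preset uses an 8 MiB dictionary; ratios on real data will differ.

```python
import gzip
import lzma
import random

GENOME_LEN = 29_903  # length of the SARS-CoV-2 reference genome


def toy_fasta(n_genomes=50, snps_per_genome=5, seed=0) -> bytes:
    """Hypothetical stand-in for a SARS-CoV-2 collection: a random 'reference'
    plus n_genomes copies, each differing from it by a few substitutions."""
    rng = random.Random(seed)
    ref = [rng.choice("ACGT") for _ in range(GENOME_LEN)]
    records = []
    for i in range(n_genomes):
        seq = ref[:]
        for pos in rng.sample(range(GENOME_LEN), snps_per_genome):
            seq[pos] = rng.choice("ACGT".replace(seq[pos], ""))
        s = "".join(seq)
        lines = [s[j:j + 60] for j in range(0, len(s), 60)]  # 60-column FASTA
        records.append(f">genome_{i}\n" + "\n".join(lines) + "\n")
    return "".join(records).encode()


raw = toy_fasta()
gz = gzip.compress(raw, compresslevel=6)  # DEFLATE, 32 KiB window (gzip CLI default level)
xz = lzma.compress(raw, preset=6)         # .xz container, LZMA2 (xz CLI default preset)

if __name__ == "__main__":
    for name, data in [("raw", raw), ("gzip", gz), ("xz", xz)]:
        print(f"{name:5s} {len(data):>10,d} bytes  ratio {len(raw) / len(data):7.1f}x")
```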
submitted by: Istvan Albert
www.youtube.com/watch?v=6skphXuBbd4&list=PLdl4u5ZRDMQRA_Fvfg9Bour_x56irZiA2 (www.youtube.com)
Watch the BioC 2021 talks online, 58 videos in total.
submitted by: Istvan Albert
GitHub – GoekeLab/xpore: Identification of differential RNA modifications from nanopore direct RNA sequencing (github.com)
xPore is a Python package for identification and quantification of differential RNA modifications from direct RNA sequencing.
submitted by: Istvan Albert
GitHub – marbl/merqury: k-mer based assembly evaluation (github.com)
Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness.
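A minimal sketch of the two headline numbers, assuming the formulas as we understand them from the Merqury paper. Completeness is the fraction of "solid" read k-mers (here, hypothetically, those seen at least `min_count` times) that also occur in the assembly. QV is derived from k-mers found only in the assembly, with per-base accuracy P = (1 - K_asm_only / K_total)^(1/k) and QV = -10 log10(1 - P). This pure-Python toy with canonical k-mers is only illustrative; the real tool works on meryl k-mer databases, and the function names here are hypothetical.

```python
import math
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmers(seq: str, k: int):
    """Yield the canonical form (lexicographic min of k-mer and its revcomp)."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, revcomp(kmer))


def evaluate(assembly: str, reads, k: int = 21, min_count: int = 2):
    """Return (completeness, QV) for one assembly against a read set."""
    read_counts = Counter(km for r in reads for km in canonical_kmers(r, k))
    # 'Solid' k-mers: seen often enough in reads to be unlikely read errors.
    solid = {km for km, c in read_counts.items() if c >= min_count}
    asm_kmers = list(canonical_kmers(assembly, k))  # every position counted
    completeness = len(solid & set(asm_kmers)) / len(solid)
    # k-mers present in the assembly but never seen in the reads.
    asm_only = sum(1 for km in asm_kmers if km not in read_counts)
    if asm_only == 0:
        return completeness, math.inf  # no detectable errors
    p_base_correct = (1 - asm_only / len(asm_kmers)) ** (1 / k)
    qv = -10 * math.log10(1 - p_base_correct)
    return completeness, qv
```

A single substitution in the assembly creates up to k novel k-mers at once, which is why the formula takes the k-th root to get back to a per-base error rate.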
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere.