Mean and SD read length from a range of fastq files

Hi all,

I’m trying to write some code to generate mean read length data from a range of fastq files.
awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' HG1.fastq > readLength.txt

I’ve got as far as this from looking through other posts and trying to improve on them, but I’m stuck on a couple of things. This command only works on a single file, and it reports the length of each read in that file separately.

I want to run a single command so that the mean and standard deviation (SD) of read lengths from all .fastq files in a folder are reported in a single .txt file, one sample per line. I guess SD might be difficult to calculate in a single command, so even just the mean read length would help.

E.g. the first 5 files in my folder are:
ru1.fastq
ru2.fastq
hg3.fastq
hg25.fastq
ru7.fastq

Obviously I’m a bit of a novice at this, so all help would be appreciated!

Thanks a lot
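
One possible approach to the request above, as a minimal sketch rather than a definitive answer: loop over the .fastq files and let awk accumulate the count, sum and sum of squares of the read lengths for each file. It assumes plain 4-line FASTQ records (sequence on the second line of each record, no wrapped sequences, no gzip compression). The function name read_length_stats and the output columns (sample, number of reads, mean, SD) are illustrative choices, and the SD is the sample SD (n-1 denominator), which is also an assumption.

```shell
#!/bin/sh
# Sketch: per-file mean and SD of read lengths for all .fastq files in a folder.
# Assumes 4-line FASTQ records; read_length_stats is a hypothetical helper name.
read_length_stats() {
    dir="${1:-.}"                      # folder to scan, defaults to current folder
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue        # skip if no .fastq file matches
        awk -v name="$(basename "$f" .fastq)" '
            NR % 4 == 2 {              # sequence line of each record
                len = length($0)
                n++
                sum += len
                sumsq += len * len
            }
            END {
                if (n == 0) { print name "\t0\tNA\tNA"; exit }
                mean = sum / n
                if (n > 1) {
                    # sample variance (n-1 denominator)
                    var = (sumsq - n * mean * mean) / (n - 1)
                    if (var < 0) var = 0   # guard against rounding error
                    printf "%s\t%d\t%.2f\t%.2f\n", name, n, mean, sqrt(var)
                } else {
                    # SD is undefined for a single read
                    printf "%s\t%d\t%.2f\tNA\n", name, n, mean
                }
            }' "$f"
    done
}
```

With the function defined, running read_length_stats /path/to/folder > readLength.txt would write one tab-separated line per sample. Files are processed in the shell's glob order, so hg25.fastq would come before hg3.fastq and both before ru1.fastq.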


Tags: fastq, awk, sed, sequencing
