Script To Calculate Dinucleotide Frequency For Many Sequences

Hi, everyone,

I need to calculate dinucleotide frequency for many sequences. As i know, the R command count(seq, 2, freq = TRUE) can calculate only one sequence. I am new in perl. Does anyone has reliable script to complete this task? I downloaded one script (shown below), but when running it, obviously there is error message:

Use of uninitialized value $G[0] in array element at dinucleotide.pl line 31.
Use of uninitialized value $G[0] in multiplication (*) at dinucleotide.pl line 32.
..................

Thank you very much!

#!/usr/bin/perl -w

##
## Display all dinucleotide frequencies along with their expected
## values under the independence hypothesis.
##

##  Hashes for translating between bases and numbers
%H = ("A", 0, "T", 1, "G", 2, "C", 3);
%HI = (0, "A", 1, "T", 2, "G", 3, "C");

## The data file comes from the command line.
$file = shift @ARGV;

## Read the data.
open(F, "$file") or die "Unable to open $filen";
while (<F>)
{
    chomp;
    $K = length($_);
    foreach $j (0..$K-1) { push(@G, $H{substr($_, $j, 1)}); }
}
close(F);

## The number of genes.
$L = scalar(@G);

## Get the marginal and joint frequencies.
foreach $i (0..$L-2)
{
    ++$M[$G[$i]];
    ++$J[4 * $G[$i] + $G[$i + 1]];
}

## Normalize.
foreach $j (0..3) { $M[$j] /= ($L - 1); }
foreach $j (0..15) { $J[$j] /= ($L - 1); }

## Display the marginals.
foreach $j (0..3) { print sprintf("%s: %fn", $HI{$j}, $M[$j]); }
print "n";

## Display the dinucleotide frequencies.
foreach $j (0..3)
{
    foreach $k (0..3)
    {
    $P = $M[$j] * $M[$k];
    $Q = 4 * $j + $k;
    print sprintf("%s%s: %ft%fn", $HI{$j}, $HI{$k}, $J[$Q], $P);
    }
}

Read more here: Source link