# Script To Calculate Dinucleotide Frequency For Many Sequences

Hi, everyone,

I need to calculate dinucleotide frequencies for many sequences. As far as I know, the R command `count(seq, 2, freq = TRUE)` handles only one sequence at a time. I am new to Perl. Does anyone have a reliable script for this task? I downloaded one script (shown below), but when I run it, it prints warnings such as:

``````
Use of uninitialized value $G in array element at dinucleotide.pl line 31.
Use of uninitialized value $G in multiplication (*) at dinucleotide.pl line 32.
..................
``````

Thank you very much!

``````
#!/usr/bin/perl -w

##
## Display all dinucleotide frequencies along with their expected
## values under the independence hypothesis.
##

##  Hashes for translating between bases and numbers
%H = ("A", 0, "T", 1, "G", 2, "C", 3);
%HI = (0, "A", 1, "T", 2, "G", 3, "C");

## The data file comes from the command line.
$file = shift @ARGV;

open(F, "$file") or die "Unable to open $file\n";
while (<F>)
{
    chomp;
    $K = length($_);
    foreach $j (0..$K-1) { push(@G, $H{substr($_, $j, 1)}); }
}
close(F);

## The number of bases read (all lines of the file are concatenated).
$L = scalar(@G);

## Get the marginal and joint frequencies.
foreach $i (0..$L-2)
{
    ++$M[$G[$i]];
    ++$J[4 * $G[$i] + $G[$i + 1]];
}

## Normalize.
foreach $j (0..3) { $M[$j] /= ($L - 1); }
foreach $j (0..15) { $J[$j] /= ($L - 1); }

## Display the marginals.
foreach $j (0..3) { print sprintf("%s: %f\n", $HI{$j}, $M[$j]); }
print "\n";

## Display the dinucleotide frequencies.
foreach $j (0..3)
{
    foreach $k (0..3)
    {
        $P = $M[$j] * $M[$k];
        $Q = 4 * $j + $k;
        print sprintf("%s%s: %f\t%f\n", $HI{$j}, $HI{$k}, $J[$Q], $P);
    }
}
``````
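
A likely cause of the warnings: the script looks up every character of every line in `%H`, so anything other than an uppercase `A`, `T`, `G` or `C` (a FASTA header line, lowercase bases, `N`) pushes `undef` onto `@G`, and that `undef` is later used as an array index and in the multiplication. The script also pools the whole file into a single sequence. Below is a minimal sketch of a per-sequence version, not a tested replacement. It assumes the input is a FASTA file, that pairs containing anything other than A, C, G, T should be skipped, and that frequencies are taken over the remaining overlapping pairs. The names `dinucleotide_freq` and `read_fasta` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 16 dinucleotides in a fixed order, used for counting and for output columns.
my @BASES = qw(A C G T);
my @PAIRS;
for my $x (@BASES) { for my $y (@BASES) { push @PAIRS, "$x$y" } }

# Return a hash ref mapping each dinucleotide to its relative frequency
# among the overlapping pairs of $seq that consist only of A, C, G, T.
sub dinucleotide_freq {
    my ($seq) = @_;
    $seq = uc $seq;
    # Start every count at 0 so unseen pairs never produce "uninitialized" warnings.
    my %count = map { $_ => 0 } @PAIRS;
    my $total = 0;
    for my $i (0 .. length($seq) - 2) {
        my $pair = substr($seq, $i, 2);
        next unless exists $count{$pair};   # assumption: skip pairs with N, gaps, etc.
        $count{$pair}++;
        $total++;
    }
    if ($total) { $_ /= $total for values %count; }
    return \%count;
}

# Read FASTA records from a filehandle; returns a list of [name, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                 # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>(\S*)/) {
            push @records, [$1, ''];        # header line starts a new record
        } elsif (@records) {
            $records[-1][1] .= $line;       # sequence lines are concatenated
        }
    }
    return @records;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Unable to open $file: $!\n";
    print join("\t", 'sequence', @PAIRS), "\n";
    for my $rec (read_fasta($fh)) {
        my $freq = dinucleotide_freq($rec->[1]);
        print join("\t", $rec->[0], map { sprintf('%.4f', $freq->{$_}) } @PAIRS), "\n";
    }
    close($fh);
}
```

Saved under a hypothetical name such as `dinuc_per_seq.pl`, it could be run as `perl dinuc_per_seq.pl sequences.fa > freqs.tsv`, giving one tab-separated row per sequence with the 16 frequencies as columns.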