Perl tutorial
Perl is a quick, easy scripting language that can perform
otherwise complicated tasks in very simple ways. Its main
strengths include file manipulation (e.g. reading files, rearranging
columns) and searching for patterns within a file. It's also
much better at math than C-shell scripting.
Creating and Running a Perl Script
To use Perl, open a text editor and at the top type:
#!/usr/bin/perl -w
The #!/usr/bin/perl tells the shell
where the Perl interpreter is located (so the file can be run as Perl
code), and the -w is an optional flag that tells Perl to print
warnings about questionable code, such as using a variable that was
never given a value.
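As a small illustration of what -w reports, here is a sketch of a script that uses a variable before giving it a value. The file name warn_demo.pl and the variable names are made up for this example; the exact wording of the warning can differ between Perl versions.

```perl
#!/usr/bin/perl -w
# warn_demo.pl (hypothetical file name)
# $total is declared but never assigned, so it is undefined.
my $total;
# With -w, Perl prints a "Use of uninitialized value" warning here,
# then treats the undefined value as 0 and carries on.
my $sum = $total + 1;
print "sum is $sum\n";
```

The script still runs to the end; the warning goes to the terminal alongside the normal output, which is why -w is handy while you are still debugging.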
Once you have your program written in the file, save it as something
like myprogram.pl and, on the Unix command line, type chmod +x
myprogram.pl (chmod means "change mode" and the +x says make it
executable). To run the program, type ./myprogram.pl (or just
myprogram.pl if the current directory is in your PATH).
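A minimal first script might look like the sketch below. The file name hello.pl and the subroutine name greeting are made up for illustration; "use warnings" is used here as a per-file way of switching on the same warnings as -w.

```perl
#!/usr/bin/perl
# hello.pl - a minimal first script (hypothetical file name)
use strict;      # require variables to be declared with "my"
use warnings;    # turn on warnings for this file, much like -w

# Build the greeting in a subroutine so it is easy to reuse.
sub greeting {
    my ($name) = @_;
    return "Hello, $name!";
}

print greeting("world"), "\n";
```

Save it, run chmod +x hello.pl once, and then ./hello.pl prints Hello, world!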
Some of the key things to learn:
- variables: $scalar, @array, $array[$i] (<-- this last one is
an element of an array, which is itself a scalar, so it gets a $ out
front.)
- reading from the command line (whether arguments are entered while
running the program or prompted for later)
- if/else syntax - pretty standard
- filehandles, reading in files line by line with a while statement,
and the default variable $_ where each line is stored
- pattern matching and grabbing parts of the patterns (e.g. a
particular column, or something that starts with 'astr' or whatever)
into variables
- search and replace - basically just pattern matching with an extra
piece that tells it to replace what it's found with something else
- subroutines - ask me about these later if you like
- system command - interface with Unix anytime you want!
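A few of the items in this list (variables, pattern matching, search and replace, and subroutines) can be sketched in one short script. All of the names here ($star, @words, count_matching, shorten) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalars hold one value; arrays hold an ordered list of values.
my $star  = "astro";
my @words = ("astronomy", "biology", "astrophysics");

# One element of an array is a scalar, so it takes a $ in front.
my $first = $words[0];
# An array used where a single value is expected gives its length.
my $count = scalar(@words);

# Subroutine: arguments arrive in @_; return hands a value back.
sub count_matching {
    my ($prefix, @list) = @_;
    my $n = 0;
    foreach my $word (@list) {
        # ^ anchors the match at the start; \Q...\E treats the
        # prefix as plain text rather than as a pattern.
        $n++ if $word =~ /^\Q$prefix\E/;
    }
    return $n;
}

# Search and replace: s/pattern/replacement/ edits the string in place.
sub shorten {
    my ($text) = @_;
    $text =~ s/^astro/A-/;
    return $text;
}

print "$first is one of $count words\n";
print count_matching($star, @words), " words start with '$star'\n";
print shorten($words[2]), "\n";
```

Because shorten works on its own copy of the argument, the original entries in @words are left unchanged.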
The Ultimate Example! Contains (just about) all of the above!
#!/usr/bin/perl -w
# readwrite.pl
#
# Description: This program reads in $numfiles number of files called in1.txt,
# in2.txt, etc., line by line and writes them out again into files out1.txt,
# out2.txt, etc. in new directories called dir1, dir2, etc. It switches two of
# the columns of the original file when writing out the new file.
#
# Instructions: The files to be read in should be called in#.txt, where # is a
# number beginning at 1, and the new files will be called out#.txt.
#
# cmcgleam, 9/28/05
# Run the subroutine main - which, here, is really just the entire program.
&main;
sub main
{
    # Declare variables. "my" makes them local to the enclosing block.
    my($numlines, $numfiles);
    # The user can enter the number of files on the command line when they
    # run the program, or else they will be prompted to enter it.
    # $#ARGV is the index of the last element in the array @ARGV that's
    # automatically created when the user enters arguments when running
    # the program. The array begins at 0, so $#ARGV + 1 is the number of
    # arguments. For example, here, the user would type "readwrite.pl 3"
    # if there are three files to read in and write out.
    if ($#ARGV + 1 != 1)
    {
        print "Enter the number of files you wish to read in: \n";
        $numfiles = <STDIN>;
        # Strip the trailing newline from what the user typed.
        chomp($numfiles);
    }
    else
    {
        $numfiles = $ARGV[0];
    }
    # Read in the files line by line, snatch the line entries into variables,
    # and print them out into new files, switching around columns 2 and 3.
    for (my $i=1; $i<=$numfiles; $i++)
    {
        # We'll count the number of lines in the file. Initialize this to 0.
        $numlines = 0;
        # Tell the program where your files are, and set what's called a
        # filehandle that you can use for referencing the files in the future.
        # These can be called anything; here I call them FILE and OUTFILE.
        # Note the use of Unix commands for file input and output, and also
        # the system command, which can be used in general whenever you
        # want to execute a Unix command.
        # "or die" stops the program with a message if the open fails.
        open(FILE, "more in$i.txt |") or die "Cannot run more: $!";
        system("mkdir dir$i");
        open(OUTFILE, "> dir$i/out$i.txt") or die "Cannot write dir$i/out$i.txt: $!";
        # The while (<FILE>) statement tells Perl to read the file line by
        # line, for as long as lines remain. It does the stuff in { } to
        # each line before moving on to the next.
        while (<FILE>)
        {
            # PATTERN MATCHING
            # =~ means find the specified pattern in $_, the default variable
            #   which currently refers to the line you're on in the file.
            # / is put at the beginning and end of your complete search pattern
            # ^ means at the beginning of the line
            # \ starts a special sequence such as \s or \S
            # \s matches whitespace (spaces, tabs)
            # \S matches non-whitespace
            # * means zero or more times
            # + means one or more times
            # . matches any character (at the end, I have .* which means any
            #   character, zero or more times)
            # parentheses ( ) put the thing that's matched into a variable.
            #   these automatically number themselves, $1, $2, $3... etc.
            # Find at least 4 columns and save the first 4 into $1-$4.
            $_ =~ /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+).*/;
            # Print to the outfile, switching two columns.
            print OUTFILE "$1 $3 $2 $4 \n";
            $numlines++;
        }
        # This will print to the command line. You can insert variable names
        # into a print statement and Perl will replace them with the actual
        # variable values. The \n means put a "newline" after the sentence.
        print "There are $numlines lines in file in$i.txt. \n";
        # Close the input and output files.
        close(FILE);
        close(OUTFILE);
    }
}
The same code, without all the comments:
#!/usr/bin/perl -w
# readwrite2.pl
#
# Description: This program reads in $numfiles number of files called in1.txt,
# in2.txt, etc., line by line and writes them out again into files out1.txt,
# out2.txt, etc. in new directories called dir1, dir2, etc. It switches two of
# the columns of the original file when writing out the new file.
#
# Instructions: The files to be read in should be called in#.txt, where # is a
# number beginning at 1, and the new files will be called out#.txt.
#
# cmcgleam, 9/28/05
&main;
sub main
{
    my($numlines, $numfiles);
    if ($#ARGV + 1 != 1)
    {
        print "Enter the number of files you wish to read in: \n";
        $numfiles = <STDIN>;
        chomp($numfiles);
    }
    else
    {
        $numfiles = $ARGV[0];
    }
    for (my $i=1; $i<=$numfiles; $i++)
    {
        $numlines = 0;
        open(FILE, "more in$i.txt |") or die "Cannot run more: $!";
        system("mkdir dir$i");
        open(OUTFILE, "> dir$i/out$i.txt") or die "Cannot write dir$i/out$i.txt: $!";
        while (<FILE>)
        {
            $_ =~ /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+).*/;
            print OUTFILE "$1 $3 $2 $4 \n";
            $numlines++;
        }
        print "There are $numlines lines in file in$i.txt. \n";
        close(FILE);
        close(OUTFILE);
    }
}
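The listings above lean on Unix commands (more, mkdir) and on a pattern match that assumes every line has at least four columns. As a hedged alternative, the sketch below does the same job with Perl's built-in split, mkdir, and the three-argument form of open, and it skips lines that are too short. This is a variation on readwrite.pl, not the original program; swap_columns and copy_swapped are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first four columns with columns 2 and 3 exchanged,
# or undef when the line has fewer than four columns.
sub swap_columns {
    my ($line) = @_;
    # split ' ' drops leading whitespace and splits on runs of whitespace.
    my @cols = split ' ', $line;
    return undef if @cols < 4;
    return "$cols[0] $cols[2] $cols[1] $cols[3]";
}

# Read $in, write the swapped lines to $out, return the lines read.
sub copy_swapped {
    my ($in, $out) = @_;
    # Three-argument open keeps the mode separate from the file name.
    open(my $infh,  '<', $in)  or die "Cannot read $in: $!";
    open(my $outfh, '>', $out) or die "Cannot write $out: $!";
    my $numlines = 0;
    while (my $line = <$infh>) {
        $numlines++;
        my $swapped = swap_columns($line);
        # Lines with fewer than four columns are counted but not written.
        print $outfh "$swapped\n" if defined $swapped;
    }
    close($infh);
    close($outfh) or die "Cannot close $out: $!";
    return $numlines;
}

# Process in1.txt, in2.txt, ... up to the number given on the command line.
my $numfiles = @ARGV ? $ARGV[0] : 0;
for my $i (1 .. $numfiles) {
    # Perl's built-in mkdir avoids starting a shell.
    mkdir("dir$i") unless -d "dir$i";
    my $n = copy_swapped("in$i.txt", "dir$i/out$i.txt");
    print "There are $n lines in file in$i.txt.\n";
}
```

Run it the same way as readwrite.pl, giving the number of files as the argument. Keeping the column swap in its own subroutine makes it easy to try on a single line of text before pointing the script at real files.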