Special Variables
$_ is the “default input and pattern matching” variable; the default input is often the current line of a file
@_ is the array of incoming parameters to a subroutine
$. is the current line number of the filehandle most recently read
$$ is the current process ID
$^O is the name of the operating system (that’s a capital letter “O”, not a zero)
$#_ is the index number of the last parameter in @_
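Here is a small sketch that exercises the variables listed above. The sample text and the helper name last_index are made up for illustration, and the in-memory filehandle simply stands in for a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; a real script would open a file on disk.
my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "Could not open in-memory file: $!";

my @seen;
while (<$fh>) {                          # each line is read into $_
    chomp;                               # chomp works on $_ by default
    push @seen, sprintf("%d: %s", $., $_);   # $. is the current line number
}
close $fh;

print "$_\n" for @seen;
print "Process ID: ", $$, "\n";
print "Operating system: ", $^O, "\n";

# Hypothetical helper: returns the index of its last argument.
sub last_index { return $#_; }
print "Last index: ", last_index('a', 'b', 'c'), "\n";
```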
A basic pattern-matching loop
while ($my_var = <MY_FILE_HANDLE>) {
    if ($my_var =~ /search_pattern/) {
        # Notice that =~
        # It's the binding operator: it applies the
        # search/match to $my_var. The / / characters
        # mark the start and end of the pattern.
        print OUT_FILE_HANDLE $my_var;
        # We just printed the matching line to a second file handle
        # (OUT_FILE_HANDLE is assumed to be open for writing)
    }
    # Alternately, just dump the line:
    print $my_var;
}
If you run this loop but don’t name a loop variable, $_ is already waiting for you:
while (<MY_FILE_HANDLE>) {
    /search_pattern/ and print OUT_FILE_HANDLE;
    print;
}
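To try the implicit $_ loop without creating any files, here is a minimal sketch. The sample lines and the pattern /apple/ are invented for the example, and matches are collected in an array instead of being written to a second file handle.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real input file.
my $input = "apple pie\nbanana split\napple tart\n";
open(my $in, '<', \$input) or die "Could not open input: $!";

my @matches;
while (<$in>) {
    # A bare /pattern/ is matched against $_
    /apple/ and push @matches, $_;
}
close $in;

print @matches;
```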
@_ # The array of incoming parameters supplied to a subroutine.
@_
# The whole array
$_[0]
# The first element of the array (note the $: a single element is a scalar)
$_[1]
# The second element
$#_
# This one’s odd: it’s the index of the last element (which is not quite the same as the count, because this is a zero-based array).
sub call_me {
    print "Element zero is " . $_[0] . "\n";
    print "There were ", $#_ + 1, " parameters\n";
}
use English;
This loads the English module (used much like a pragma), which gives the punctuation variables readable names.
Allows addressing @_ as @ARG
Allows addressing $$ as $PID
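A quick way to convince yourself that these are only aliases is the sketch below. The subroutine name count_args is hypothetical, and the -no_match_vars import is optional; it merely avoids a regex slowdown on older Perls.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

# Hypothetical helper: counts its arguments through the @ARG alias.
sub count_args {
    return scalar @ARG;    # @ARG is another name for @_
}

print "Got ", count_args('x', 'y', 'z'), " arguments\n";
print "PID and \$\$ agree\n" if $PID == $$;   # $PID is another name for $$
```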
Regular Expressions
This is why we learned about =~.
$my_string = "I'm hard at work.\n";
if ($my_string =~ /work/) {
    print "He's working.\n";
}
Metacharacters
\n
# matches a newline
\t
# matches a tab
\d
# matches any single digit
\w
# matches any letter, digit or the underscore
\s
# matches any single whitespace character: space, tab, \n, \r
Capitalize \d, \w or \s to invert its meaning: \D matches any non-digit, \W any non-word character and \S any non-whitespace character.
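The following sketch shows these metacharacters pulling pieces out of a string. The sample string is invented, and each + simply means "one or more of the preceding item".

```perl
use strict;
use warnings;

# Hypothetical sample string containing a tab.
my $sample = "Room 42\tis free";

my ($digits)    = $sample =~ /(\d+)/;    # first run of digits
my ($word)      = $sample =~ /(\w+)/;    # first run of word characters
my ($nondigits) = $sample =~ /(\D+)/;    # first run of non-digits
my @parts       = split /\s+/, $sample;  # split on any whitespace

print "digits: $digits\n";
print "word: $word\n";
print "parts: ", join('|', @parts), "\n";
```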
^
# Beginning of line or string
/^string/
$
# End of line or string
/string$/
.
# Generic wildcard character: matches any ONE character (except a newline)
# so /x.z/ matches x1z, xSz, x-z, etc.
/str.ng/
*
# Preceding character match: matches the preceding character ZERO OR MORE TIMES
/s*ring/
One use of * is with the dot character, when any number of any characters could appear at that position:
/this.*/ Matches “this followed by anything.”
+
# Preceding character match: matches the preceding character ONE OR MORE TIMES
/xy+z/
?
# Preceding character match: matches the preceding character ZERO OR ONE TIMES
/xy?z/
Create groups of optional characters with parentheses:
/Fred(die)?/
Combining expressions
/^http:.+html?/
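To see how the quantifiers differ, and what the combined pattern accepts, here is a small sketch. The candidate strings and example.com URLs are made up, and the ^ and $ anchors are added to the short patterns so that each one must match the whole string.

```perl
use strict;
use warnings;

my @candidates = ('xz', 'xyz', 'xyyz');

my @star     = grep { /^xy*z$/ } @candidates;   # zero or more y
my @plus     = grep { /^xy+z$/ } @candidates;   # one or more y
my @question = grep { /^xy?z$/ } @candidates;   # zero or one y

print "star: @star\n";
print "plus: @plus\n";
print "question: @question\n";

# The combined pattern: starts with http:, then at least one
# character, then htm with an optional l.
my @urls = (
    'http://example.com/index.htm',
    'http://example.com/index.html',
    'ftp://example.com/index.html',
);
my @web = grep { /^http:.+html?/ } @urls;
print "web: $_\n" for @web;
```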
Character Classes
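/[qwerty]/
# matches any ONE of q, w, e, r, t or y
/[^qwerty]/
# matches any ONE character that is NOT q, w, e, r, t or y
# Be darn careful where that ^ is: first inside the brackets it negates the class; outside the brackets it anchors the match to the beginning.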
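As a rough illustration of the two meanings of ^, consider this sketch (the word is arbitrary):

```perl
use strict;
use warnings;

my $word = 'quartz';

# Count the characters that belong to the class: q, r and t.
my $in_class = () = $word =~ /[qwerty]/g;

# Capture the first character that is NOT in the class.
my ($first_other) = $word =~ /([^qwerty])/;

# ^ outside the brackets anchors: does the word START with a class member?
my $starts_in_class = $word =~ /^[qwerty]/ ? 1 : 0;

print "$in_class in class, first other is '$first_other'\n";
```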
Flags
/string/i
# case-insensitive
s/match_string/replacement_string/g
# search; replace; global
# On a plain match used in scalar context (such as a while condition),
# g also tells Perl to resume from its last position
# in the string on the next iteration
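The "resume from the last position" behavior is easiest to see in a loop. This sketch uses an invented string and the built-in pos() function, which reports where the last /g match ended.

```perl
use strict;
use warnings;

my $text = 'Cat, cat, CAT';

my @positions;
while ($text =~ /cat/gi) {        # i: ignore case, g: continue from last match
    push @positions, pos($text);  # offset just past the current match
}

print "Found ", scalar(@positions), " matches ending at @positions\n";
```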
Subexpressions
if ($_ =~ /(heck|darn|dang|fooey)/) {
    print "This mild cussing is present in this line: $1.\n";
}
The parentheses create a subexpression, and the $1 variable holds the text that the first one matched. If the match was “heck” then $1 = “heck”. If you have two subexpressions, you’ll have $1 and $2, and so forth:
$singer = "Wendy Wall";
$singer =~ /(\w+) (\w+)/;
# $1 holds "Wendy" and $2 holds "Wall"
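One caution: $1 and friends keep their old values when a match fails, so it is safer to use them only after a successful match. Here is a sketch of two common ways to do that. The sample line is invented, and the variable names are just for illustration.

```perl
use strict;
use warnings;

# Option 1: a match in list context returns the captured groups directly.
my $singer = 'Wendy Wall';
my ($first, $last) = $singer =~ /(\w+) (\w+)/;

# Option 2: only read $1 when the match succeeded.
my $line  = 'Well, darn it.';
my $found = $line =~ /(heck|darn|dang|fooey)/ ? $1 : '';

print "First: $first, last: $last, found: $found\n";
```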
Search and Replace
$sentence = "This is the usual cat and dog example. It mentions two cats.";
$sentence =~ s/cat/dog/g;
print $sentence;
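To make the effect of the g flag concrete, this sketch runs the same substitution with and without it on copies of the sentence. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $sentence = 'This is the usual cat and dog example. It mentions two cats.';

# Without g: only the first "cat" is replaced.
(my $first_only = $sentence) =~ s/cat/dog/;

# With g: every "cat" is replaced, including the one inside "cats".
(my $all = $sentence) =~ s/cat/dog/g;

# s///g also returns the number of substitutions it made.
my $copy  = $sentence;
my $count = ($copy =~ s/cat/dog/g);

print "$first_only\n$all\n$count substitutions\n";
```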
Resources
The formal Perldoc – http://perldoc.perl.org/perlre.html#Regular-Expressions
Perl Matching With Regular Expressions – a long page with very good detail – http://work.lauralemay.com/samples/perl.html
Troubleshooters.com – with many good examples – http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
Ringofsaturn.com – with more, and detailed, examples – http://networking.ringofsaturn.com/Unix/regex.php
Regular Expression Reference – useful, concise and highly recommended – http://www.regular-expressions.info/reference.html
Example: Apache Log Analysis
A line from an Apache log file looks like this:
132.62.20.9 - - [01/Nov/2000:00:00:19 -0400] "GET /news/home/index.htm HTTP/1.1" 200 2285
So let’s hack on this analyzer, called analyze.pl:
#!/usr/bin/perl
# We have to supply the log name as the first command argument
$logfile = $ARGV[0];
unless ($logfile) { die "Usage: analyze.pl <httpd log file>\n"; }
analyze($logfile);
report();

sub analyze {
    my ($logfile) = @_;
    open (LOG, '<', $logfile) or die "Could not open log $logfile - $!";
    while ($line = <LOG>) {
        @fields = split(/\s/, $line);
        # Make /about/ and /about/index.html the same URL.
        $fields[6] =~ s{/$}{/index.html};
        # Log successful requests by file type. URLs without an extension
        # are assumed to be text files.
        if ($fields[8] eq '200') {
            if ($fields[6] =~ /\.([a-z]+)$/i) {
                $type_requests{$1}++;
            } else {
                $type_requests{'txt'}++;
            }
        }
        # Log the hour of this request (only if the timestamp matched,
        # so a stale $1 is never counted)
        if ($fields[3] =~ /:(\d{2}):/) {
            $hour_requests{$1}++;
        }
        # Log the URL requested
        $url_requests{$fields[6]}++;
        # Log status code
        $status_requests{$fields[8]}++;
        # Log bytes, but only for results where byte count is non-zero
        if ($fields[9] ne "-") {
            $bytes += $fields[9];
        }
    }
    close LOG;
}

sub report {
    print "Total bytes requested: ", $bytes, "\n";
    print "\n";
    report_section("URL requests:", %url_requests);
    report_section("Status code results:", %status_requests);
    report_section("Requests by hour:", %hour_requests);
    report_section("Requests by file type:", %type_requests);
}

sub report_section {
    my ($header, %type) = @_;
    print $header, "\n";
    for $i (sort keys %type) {
        print $i, ": ", $type{$i}, "\n";
    }
    print "\n";
}
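The field numbers in the analyzer are easier to follow once you see what split produces for the sample log line. This sketch prints each field with its index and then applies the same hour and file-type patterns. It assumes the log uses the format shown above; other Apache log formats will put the fields elsewhere.

```perl
use strict;
use warnings;

# The sample log line from above.
my $line = '132.62.20.9 - - [01/Nov/2000:00:00:19 -0400] '
         . '"GET /news/home/index.htm HTTP/1.1" 200 2285';

my @fields = split /\s/, $line;

# Show which index each piece ends up in.
for my $i (0 .. $#fields) {
    print "$i: $fields[$i]\n";
}

# The same patterns the analyzer uses for the hour and the file type.
my ($hour) = $fields[3] =~ /:(\d{2}):/;
my ($type) = $fields[6] =~ /\.([a-z]+)$/i;

print "hour=$hour type=$type status=$fields[8] bytes=$fields[9]\n";
```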