Regular Expressions and Special Variables

Special Variables

$_ is the “default input and pattern matching” variable; the default input is often the current line of a file

@_ is the list of incoming parameters to a subroutine

See Well House Consultants

$. is the current input line number (for the file handle most recently read)

$$ is the current process ID

$^O is the operating system (that’s an “Oh”)

$#_ is the index number of the last parameter
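
A minimal sketch of a few of these variables in action. The sample text, the in-memory file handle and the helper name last_index are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory file handle
my $text = "first\nsecond\nthird\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @numbered;
while (<$fh>) {                          # each line lands in $_
    chomp;                               # chomp works on $_ by default
    push @numbered, join(': ', $., $_);  # $. is the current line number
}
close $fh;

# Hypothetical helper: returns the index of the last parameter in @_
sub last_index { return $#_ }

print "$_\n" for @numbered;
print "Last index: ", last_index('a', 'b', 'c'), "\n";
print "Process ", $$, " is running on ", $^O, "\n";
```

The process ID and operating system name will differ from machine to machine, so only the first four lines of output are predictable.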

A basic pattern-matching loop

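while ($my_var = <MY_FILE_HANDLE>) {

    if ($my_var =~ /search_pattern/) {

        # Notice that =~
        # It’s the “binding” operator: it applies the search/match to $my_var

        # The / / characters mark the start and end of the pattern.
        print OUT_FILE_HANDLE $my_var;
        # We just printed the matching line to a second file handle,
        # one opened for writing. (You can't print to a handle that
        # was opened for reading.)

    }
    # Alternately, just dump the line:
    print $my_var;

}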

If you run this loop but don’t name a loop variable, $_ is already waiting for you:

while (<MY_FILE_HANDLE>) {

# print with no arguments prints $_; OUT_FILE_HANDLE is a second handle opened for writing
/search_pattern/ and print OUT_FILE_HANDLE ;

print ;
}

@_ # The array of incoming parameters supplied to a subroutine.

@_
# The whole array

$_[0]
# The first element of the array (a single element takes the $ sigil)

$_[1]
# The second element

$#_
# This one’s odd: it’s the index of the last element (which is not quite the same as the count, because this is a zero-based array).

sub call_me {
    print "Element zero is " . $_[0] . "\n";
    print "There were ", $#_ + 1, " parameters\n";
}

use English;

This is a module, loaded with “use” just like a pragma.

Allows addressing @_ as @ARG

Allows addressing $$ as $PID
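
A short sketch of the long names at work. The subroutine name count_args is made up for this example, and the -no_match_vars import is an optional choice that leaves out the aliases for the regular expression match variables.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

# Hypothetical helper: counts its parameters through the long name
sub count_args {
    return scalar @ARG;    # @ARG is the same array as @_
}

print "Got ", count_args('x', 'y'), " arguments\n";
print "PID and \$\$ agree\n" if $PID == $$;
print "OS is $OSNAME\n";   # $OSNAME is the long name for $^O
```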

 

This is why we learned about =~.

$my_string = "I'm hard at work.\n";

if ($my_string =~ /work/) {
    print "He's working.\n";
}

Metacharacters

\n
# matches a newline

\t
# matches a tab

\d
# matches any single digit

\w
# matches any letter, digit or the underscore

\s
# matches any space (white space): space, tab, \n, \r

Capitalize \d, \w or \s to invert its meaning: \D matches any non-digit, \W any non-word character, and \S any non-whitespace character.
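
A small sketch of these classes applied to one string. The sample string is an assumption, and the parentheses used to pull out the matched text are explained under Subexpressions.

```perl
use strict;
use warnings;

# Hypothetical sample: a word, a space, two digits, a tab, another word
my $entry = "Room 42\tEast_Wing";

my ($digit)     = $entry =~ /(\d)/;     # first digit
my ($non_digit) = $entry =~ /(\D)/;     # first character that is not a digit
my ($word)      = $entry =~ /(\w+)/;    # first run of word characters
my ($last_word) = $entry =~ /(\w+)$/;   # word characters at the end (underscore counts)
my @parts       = split /\s/, $entry;   # split on each whitespace character

print "digit=$digit non_digit=$non_digit word=$word last=$last_word\n";
print "parts: ", join(' | ', @parts), "\n";
```

The space and the tab both count as \s, so the split yields three parts.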

^
# Beginning of line or string
/^string/

$
# End of line or string
/string$/

.
# Generic wildcard character: matches any ONE character (except a newline)
# so /x.z/ matches x1z, xSz, x-z, etc.
/str.ng/

*
# Preceding character match: matches the preceding character ZERO OR MORE TIMES
/s*ring/

One use of * is with the dot character, when any number of any characters could appear at that position:
/this.*/
Matches “this followed by anything.”

+
# Preceding character match: matches the preceding character ONE OR MORE TIMES
/xy+z/

?
# Preceding character match: matches the preceding character ZERO OR ONE TIMES
/xy?z/

Create groups of optional characters with parentheses:

/Fred(die)?/
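
To compare the three quantifiers side by side, here is a sketch that runs the patterns above against a few sample strings. The samples are assumptions, and the ^ and $ anchors are added so that each pattern must cover the whole string.

```perl
use strict;
use warnings;

my %matched_by;
for my $sample ('xz', 'xyz', 'xyyz') {
    my @hits;
    push @hits, 'xy*z' if $sample =~ /^xy*z$/;   # zero or more y
    push @hits, 'xy+z' if $sample =~ /^xy+z$/;   # one or more y
    push @hits, 'xy?z' if $sample =~ /^xy?z$/;   # zero or one y
    $matched_by{$sample} = "@hits";
    print "$sample is matched by: @hits\n";
}

# The group (die)? makes the whole "die" optional, not just the final "e"
my %fred;
for my $name ('Fred', 'Freddie', 'Fredd') {
    $fred{$name} = ($name =~ /^Fred(die)?$/) ? 'yes' : 'no';
    print "$name matches: $fred{$name}\n";
}
```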

 

Combining expressions

/^http:.+html?/
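
Reading the pattern piece by piece: ^http: anchors “http:” at the start, .+ takes one or more of any character, and html? accepts “htm” with an optional “l”. A sketch against some made-up URLs (example.com is a placeholder domain):

```perl
use strict;
use warnings;

# Hypothetical URLs for illustration
my @urls = (
    'http://example.com/index.html',
    'http://example.com/page.htm',
    'ftp://example.com/index.html',
    'http://example.com/index.html?page=2',
);

my %result;
for my $url (@urls) {
    $result{$url} = ($url =~ /^http:.+html?/) ? 'match' : 'no match';
    print "$url => $result{$url}\n";
}
```

The last URL matches too, because nothing anchors the pattern at the end of the string. Add $ after html? if the string must end there.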

 

Character Classes

/[qwerty]/
# matches any of q, w, e, r, t or y

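/[^qwerty]/
# matches any ONE character that is NOT q, w, e, r, t or y
# Be darn careful where that ^ is: inside the brackets it negates,
# outside them it anchors to the beginning of the line.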
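
A sketch that shows the difference the position of ^ makes. The three sample words are assumptions chosen for the example.

```perl
use strict;
use warnings;

my (%has_listed, %has_other, %starts_listed);
for my $word ('try', 'trap', 'zap') {
    $has_listed{$word}    = ($word =~ /[qwerty]/)  ? 1 : 0;  # contains one of the listed letters
    $has_other{$word}     = ($word =~ /[^qwerty]/) ? 1 : 0;  # contains a character outside the list
    $starts_listed{$word} = ($word =~ /^[qwerty]/) ? 1 : 0;  # begins with one of the listed letters
    print "$word: listed=$has_listed{$word} other=$has_other{$word} starts=$starts_listed{$word}\n";
}
```

Note that “try” contains only listed letters, so the negated class finds nothing to match in it.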

 

Flags

/string/i
# case-insensitive

s/match_string/replacement_string/g
# search; replace; global (replace every occurrence, not just the first)
# In a plain match (m//g) used as a loop condition, g also tells Perl
# to resume from its last position in the string on the next iteration
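
A sketch of both flags, assuming a made-up sample string. It also shows the “return to its last position” behaviour by recording pos() after each match in a loop.

```perl
use strict;
use warnings;

my $text = 'Cat, cat, CAT';                 # hypothetical sample

(my $once = $text) =~ s/cat/dog/i;          # without g: first match only
(my $all  = $text) =~ s/cat/dog/gi;         # with g: every match

my $count = () = $text =~ /cat/gi;          # list context: count all matches

my @positions;
while ($text =~ /cat/gi) {                  # each pass resumes after the last match
    push @positions, pos($text);
}

print "$once\n$all\n";
print "count=$count positions=@positions\n";
```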

 

Subexpressions

if ($_ =~ /(heck|darn|dang|fooey)/) {

    print "This mild cussing is present in this line: $1.\n";
}

The parentheses create a subexpression, and the $1 variable holds the string that the first subexpression matched. If the match was “heck” then $1 = “heck”. If you have two subexpressions, you’ll have $1 and $2, and so forth:

$singer = "Wendy Wall";
$singer =~ /(\w+) (\w+)/;
# $1 holds “Wendy” and $2 holds “Wall”
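
$1 and $2 keep whatever the last successful match put there, so it is safer to check that the match worked before using them. One way to do that, sketched here with a made-up helper named split_name, is to take the captures as a list:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (first, last), or an empty list if no match
sub split_name {
    my ($name) = @_;
    return $name =~ /^(\w+) (\w+)$/;   # in list context a match returns its captures
}

for my $singer ('Wendy Wall', 'Cher') {
    my ($first, $last) = split_name($singer);
    if (defined $first) {
        print "first=$first last=$last\n";
    } else {
        print "No two-word name in '$singer'\n";
    }
}
```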

 

Search and Replace

$sentence = "This is the usual cat and dog example. It mentions two cats.";
$sentence =~ s/cat/dog/g;
print $sentence;
# Prints: This is the usual dog and dog example. It mentions two dogs.

 

Resources

The formal Perldoc – http://perldoc.perl.org/perlre.html#Regular-Expressions

Perl Matching With Regular Expressions – a long page with very good detail – http://work.lauralemay.com/samples/perl.html

Troubleshooters.com – with many good examples – http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

Ringofsaturn.com – with more, and detailed, examples – http://networking.ringofsaturn.com/Unix/regex.php

Regular Expression Reference – useful, concise and highly recommended – http://www.regular-expressions.info/reference.html

 

A line from an Apache log file looks like this:

 132.62.20.9 - - [01/Nov/2000:00:00:19 -0400] "GET /news/home/index.htm HTTP/1.1" 200 2285

So let’s hack on this analyzer, called analyze.pl:

#!/usr/bin/perl

# We have to supply the log name as the first command argument
$logfile = $ARGV[0];

unless ($logfile) { die "Usage: analyze.pl <httpd log file>\n"; }

analyze($logfile);
report();

sub analyze {
    my ($logfile) = @_;

    open (LOG, $logfile) or die "Could not open log $logfile - $!";

    while ($line = <LOG>) {
        # Split on runs of whitespace, ignoring any leading whitespace
        @fields = split(' ', $line);

        # Make /about/ and /about/index.html the same URL.
        $fields[6] =~ s{/$}{/index.html};

        # Log successful requests by file type. URLs without an extension
        # are assumed to be text files.
        if ($fields[8] eq '200') {
            if ($fields[6] =~ /\.([a-z]+)$/i) {
                $type_requests{$1}++;
            } else {
                $type_requests{'txt'}++;
            }
        }

        # Log the hour of this request (only if the timestamp matched,
        # so a stale $1 from an earlier match is never counted)
        if ($fields[3] =~ /:(\d{2}):/) {
            $hour_requests{$1}++;
        }

        # Log the URL requested
        $url_requests{$fields[6]}++;

        # Log status code
        $status_requests{$fields[8]}++;

        # Log bytes, but only when a byte count is present
        # (the log shows "-" when no bytes were sent)
        if ($fields[9] ne "-") {
            $bytes += $fields[9];
        }
    }

    close LOG;
}

sub report {
    print "Total bytes requested: ", $bytes, "\n";

    print "\n";

    report_section("URL requests:", %url_requests);
    report_section("Status code results:", %status_requests);
    report_section("Requests by hour:", %hour_requests);
    report_section("Requests by file type:", %type_requests);
}

sub report_section {
    my ($header, %type) = @_;

    print $header, "\n";
    for $i (sort keys %type) {
        print $i, ": ", $type{$i}, "\n";
    }

    print "\n";
}
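
The analyzer depends on knowing which field number holds which value. A quick sketch that splits the sample log line shown earlier and prints each field with its index, then applies the same hour and file-type patterns:

```perl
use strict;
use warnings;

my $line = '132.62.20.9 - - [01/Nov/2000:00:00:19 -0400] "GET /news/home/index.htm HTTP/1.1" 200 2285';

my @fields = split ' ', $line;        # split on runs of whitespace
for my $i (0 .. $#fields) {
    print "$i: $fields[$i]\n";
}

my ($hour) = $fields[3] =~ /:(\d{2}):/;       # first two-digit group between colons
my ($ext)  = $fields[6] =~ /\.([a-z]+)$/i;    # letters after the final dot
print "hour=$hour type=$ext\n";
```

Field 6 is the URL, field 8 the status code and field 9 the byte count, which is why analyze.pl uses those three indexes.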

File Opening, Input, Output and Sorting

File Handles

open (FILE_HANDLE, "file.name");

Open for reading:

open (SONGS, "song_list.txt");

open (SONGS, "song_list.txt") or die "Can't open that file! Error is $!";

 

Retrieve lines using < > :

open (SONGS, "song_list.txt") or die "Can't open that file!";
for $line (<SONGS>) {
print $line;
}

 

Open for overwriting:

open (SONGS, ">song_list.txt") or die "Can't open that file!";
# Be aware that any original contents are now gone!

 

Open for appending:

open (SONGS, ">>song_list.txt") or die "Can't open that file!";
# The original contents are preserved, just appended onto.

 

Writing to the file:

open (SONGS, ">song_list.txt") or die "Can't open that file!";
print SONGS "Stairway to Heaven\n";
# Everything else is gone

 

Appending to the file:

open (SONGS, ">>song_list.txt") or die "Can't open that file!";
print SONGS "Copacabana\n";
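
The three modes can be tried in one pass. This sketch swaps in the three-argument form of open with lexical file handles (a common alternative to the bareword handles above, not the style used in these notes) and writes to a scratch file from the core File::Temp module so that no real song_list.txt is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically when the script exits
my (undef, $path) = tempfile(UNLINK => 1);

open(my $out, '>', $path) or die "Can't overwrite $path: $!";
print $out "Stairway to Heaven\n";      # any earlier contents are gone
close $out;

open(my $app, '>>', $path) or die "Can't append to $path: $!";
print $app "Copacabana\n";              # added after the existing line
close $app;

open(my $in, '<', $path) or die "Can't read $path: $!";
my @songs = <$in>;                      # read every line at once
close $in;
chomp @songs;

print "$_\n" for @songs;
```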

 

File Operations

Steve Litt’s PERLs of Wisdom: PERL File Input, Output and Sorting

(also see the root of this “tips” area, Steve Litt’s Perls of Wisdom)

Formatting and Printing: sprintf()

Perl File Handling: open, read, write and close files at Perlfect Solutions

-note the use of the returned error code (which isn’t always an error), $!

-and the “current line” special variable $_

-and the operator <FILE>

Perl tutorial at tizag.com

The Kinder, Gentler File Opening Tutorial at Perldoc.perl.org

File Handling at DevelopingWebs.net

Perl String Comparisons and String Replacements

Strings and String Comparisons

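Be careful which operators you use! Are you assigning (=), comparing numbers (==) or comparing strings (eq)?

$answer = "no";
# Use eq for strings. With == both strings are treated as the number 0,
# so the test would succeed even though the answer is "no".
if ($answer eq "yes")
{
print "Answer is Yes.\n";
}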
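
A sketch of the pitfall, using the same two strings. The “numeric” warning is switched off inside the block only so that the example runs quietly; with warnings on, Perl would point out the problem.

```perl
use strict;
use warnings;

my $answer = 'no';
my ($numeric_test, $string_test);

{
    no warnings 'numeric';   # silence the "isn't numeric" warning for this demo
    # Both strings become the number 0, so == says they are equal
    $numeric_test = ($answer == 'yes') ? 'equal' : 'different';
}

# eq compares the characters themselves
$string_test = ($answer eq 'yes') ? 'equal' : 'different';

print "With ==: $numeric_test\n";
print "With eq: $string_test\n";
```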

Exercise: do test operators make a difference?
Test this code:

$i = 11;
if ($i == "11") { print "Equivalent numbers.\n"; }
if ($i eq "11") { print "Equivalent strings.\n"; }

Getting Parts of Strings: the substr() function

substr (the_string, starting_position, number_of_chars_to_get)

$x = "Medical experiments for the lot of you!";
print substr($x, 0, 7); # "Medical"

Exercises:

Copy the two lines of code above into your working Perl file. Make sure they work.

What happens if you omit the number_of_chars_to_get? Try it, using the string above.

Select the word “experiments” using substr() and copy it into a variable.

Select a substring using negative numbers to count from the right instead of the left:

$x = "Medical experiments for the lot of you!";
print substr($x, -11, 3); # "lot"

Exercise:

Use substr() to print only the words “for the lot of you!”

Select a substring and assign a new value to it:

$x = "Program in Bash!\n";
substr($x, 11, 4) = "Perl"; # $x is now "Program in Perl!\n"
print $x;

Exercise:

Use the example above to assign the string “I’m in school to study biology” to $x, then select the word “biology” and replace it with

Splitting strings apart:

split (delimiter, the_string, [optional_max_number_of_results])

$s = "Welcome Back Kotter";
@s = split(/ /, $s); # "Welcome" "Back" and "Kotter"

Note the use of a regular expression as the first argument.
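
A sketch of how the delimiter pattern and the optional maximum change the result. The sample string, with two spaces after “Welcome”, is an assumption chosen to make the difference visible.

```perl
use strict;
use warnings;

my $s = 'Welcome  Back Kotter';         # note the double space

my @single  = split / /, $s;            # one space per split: leaves an empty string
my @runs    = split /\s+/, $s;          # a run of whitespace counts as one delimiter
my @limited = split /\s+/, $s, 2;       # at most two results; the rest stays together

print scalar(@single), " pieces: ", join('|', @single), "\n";
print scalar(@runs), " pieces: ", join('|', @runs), "\n";
print scalar(@limited), " pieces: ", join('|', @limited), "\n";
```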

Resources

See perldoc for more string functions: http://perldoc.perl.org/functions/substr.html

Formatting strings: http://www.tizag.com/perlT/perlstrings.php

Perl “if” Testing

if ($user eq "fred") 
{
print "Welcome in, Fred.\n";
}
elsif ($user eq "joe" || $user eq "jill")
{
print "Get out, $user\!\n";
}
else
{
print "No Admittance.\n";
}

unless ($age > 21 && $smile eq "big") 
{
print "You're too young for me.\n";
}
else
{
print "How about a date?\n";
}

Note the “and” ( && ) and “or” ( || ) operators.

Resources

http://www.tizag.com/perlT/perlif.php

Perl Debugging, Warnings and Diagnostics

Debugging

Using print( ) functions in “if” blocks

Testing small pieces of code

Warnings

Either:

use warnings; # this is a "pragma," and is placed at the top of a script, below the shebang

or:

#!/usr/bin/perl -w
# use the -w flag directly in the shebang

or:

perl -w myscript.pl
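
To see the kind of mistake warnings catch, here is a sketch that collects warnings in an array rather than letting them print to the screen. The __WARN__ handler is used only to make the warning visible inside the example, and the input string is made up.

```perl
use strict;
use warnings;

my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };   # collect warnings instead of printing

my $input = '3 apples';      # hypothetical user input
my $total = $input + 2;      # a non-numeric string used as a number

print "Total: $total\n";
print "Perl warned: $_" for @caught;
```

Without warnings enabled, the addition quietly produces 5 and the stray text goes unnoticed.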

Diagnostics: Full Documentation

Either:

use diagnostics;

or:

perl -Mdiagnostics myscript.pl

Resources

http://perldoc.perl.org/warnings.html

http://perldoc.perl.org/diagnostics.html

Perl “for” Loops

for $i (1, 2, 3, 4, 5, 6) {
  print "$i\n";
}

@integers = (1 .. 10);
$limit = 25;
for $i (@integers, 15, 21, 23 .. $limit) {
print "$i\n";
}

%months_days = (Jan => 31, Feb => 28, Mar => 31, Apr => 30, May => 31, Jun => 30, Jul => 31, Aug => 31, Sep => 30, Oct => 31, Nov => 30, Dec => 31);

# You could use this instead:
@months = keys %months_days;

# A single hash value takes the $ sigil. Hash keys come back in no particular order.
for $i (keys %months_days) {
print "$i has $months_days{$i} days.\n";
}

A foreach loop is great for looping through an array. With no loop variable named, each element lands in $_:

foreach (@myarray) {
  print;
}

#!/usr/bin/perl

# Calculate compound interest

# Prompt user for inputs
print "Enter the starting amount: ";
$start_amount = <STDIN>;
chomp $start_amount;    # remove the newline that came with the input

print "Enter the starting year: ";
$year = <STDIN>;
chomp $year;

print "How many years? ";
$duration = <STDIN>;
chomp $duration;

print "Enter the annual percentage rate: ";
$apr = <STDIN>;
chomp $apr;

# Do some nice formatting: provide column heads:
print "Year", "\t", "Savings Balance", "\t", "Interest", "\t", "New balance", "\n";

# Calculate interest for each year.
# Note where this loop begins and ends:
for $i (1 .. $duration) {
    print $year, "\t";

    $year++;

    print $start_amount, "\t";

    # First try this one
    # $interest = ($apr / 100) * $start_amount;

    # Then try it this way (it cuts the interest off at whole cents)
    $interest = int (($apr / 100) * $start_amount * 100) / 100;
    print $interest, "\t";

    $start_amount += $interest;

    print $start_amount, "\n";
} # Loop ends here.

print $year, "\t", $start_amount, "\n";
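
The int() trick above cuts the interest off at whole cents. A sketch comparing it with sprintf(), which rounds to the nearest cent and always shows two decimal places. The amount and rate here are made-up sample values.

```perl
use strict;
use warnings;

my $start_amount = 1234.56;   # hypothetical balance
my $apr          = 3.5;       # hypothetical annual percentage rate

my $raw       = ($apr / 100) * $start_amount;   # about 43.2096
my $truncated = int($raw * 100) / 100;          # cut off after two decimals
my $rounded   = sprintf('%.2f', $raw);          # rounded to two decimals
my $padded    = sprintf('%.2f', $truncated);    # same value, two decimals shown

print "Truncated: $truncated\n";
print "Rounded:   $rounded\n";
print "Padded:    $padded\n";
```

The truncated value prints without its trailing zero, which makes columns of money ragged; formatting with sprintf() keeps them aligned.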

Resources

http://www.tizag.com/perlT/perlfor.php

See similar examples at http://www.perl.com/pub/2000/10/begperl1.html

and http://ezinearticles.com/?Create-a-Compound-Interest-Calculator-in-Perl&id=2079734.