PHP I : Regular Expressions

Regular Expressions

Follow this lesson in Ullman Chapter 13. The scripts are located in the 13 directory.

If you aren’t familiar with regular expressions, see my Shell Scripting section on Regular Expressions.

Have you used grep? If you have, you’re going to recognize this syntax. If you haven’t, see my Linux Fundamentals section on grep.

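However, WATCH OUT if you’re a grep user! Some of the characters behave differently here. Plain grep uses basic regular expressions, where GNU grep needs a backslash before ( ) { } | + and ? to make them special. PHP’s regular expression functions use extended syntax, where those characters are special without a backslash.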

Pattern Matching Using Regular Expressions (regex)

Use it like this:

$pattern = "Flintstones";
$string = "Flintstones! Meet the Flintstones!";
ereg ($pattern, $string);

This makes a lot more sense when you think in terms of trying to find which member of an array matches a pattern, or which line of a file matches the pattern.
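A minimal sketch of the "which member of an array matches" idea, written with preg_grep() (a PCRE function) rather than ereg(), because the ereg functions no longer exist in PHP 7 and later. The array of lines is made-up sample data.

```php
<?php
// Hypothetical sample data, standing in for an array or the lines of a file.
$lines = [
    "Meet the Flintstones",
    "Meet the Jetsons",
    "Yabba dabba doo, Flintstones!",
];

// preg_grep() returns only the elements that match, keeping their original keys.
$matches = preg_grep('/Flintstones/', $lines);

foreach ($matches as $index => $line) {
    echo "$index: $line\n";
}
// prints:
// 0: Meet the Flintstones
// 2: Yabba dabba doo, Flintstones!
```

To search a file line by line, file('somefile.txt', FILE_IGNORE_NEW_LINES) produces this kind of array (the file name here is only a placeholder).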

PHP’s POSIX Extended Functions
Function	Purpose	Syntax
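ereg()	Match a pattern in a string	ereg ('pattern', 'string');
eregi()	Same, case-insensitive	eregi ('pattern', 'string');
ereg_replace()	Match and replace a pattern in a string	ereg_replace ( 'pattern', 'replacement', 'string' )
eregi_replace()	Same, case-insensitive	eregi_replace ( 'pattern', 'replacement', 'string' )
split()	Split a string into an array at each match of the pattern, returning at most limit elements if a limit is given	split ( 'pattern', 'string' [, limit] )
spliti()	Same, case-insensitive	spliti( 'pattern', 'string' [, limit] )
preg_match()	Match a pattern using Perl-compatible regular expressions (PCRE); the pattern must be wrapped in delimiters such as /.../	preg_match ('/pattern/', 'string');

Note: ereg(), eregi(), ereg_replace(), eregi_replace(), split() and spliti() were deprecated in PHP 5.3.0 and removed in PHP 7.0.0. On current PHP, use the PCRE functions instead: preg_match() (with the i modifier for case-insensitive matching), preg_replace() and preg_split().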
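Because the POSIX functions are gone from PHP 7.0 onward, here is a sketch of PCRE counterparts for the rows of the table. This mapping is an illustration, not taken from Ullman. PCRE patterns need delimiters (here /.../), and an i after the closing delimiter makes the match case-insensitive.

```php
<?php
$string = "Flintstones! Meet the Flintstones!";

// ereg()  ->  preg_match(): returns 1 on a match, 0 if none (false on error).
$found = preg_match('/Meet/', $string);

// eregi()  ->  preg_match() with the i (case-insensitive) modifier.
$foundCi = preg_match('/meet/i', $string);

// ereg_replace()  ->  preg_replace(): replaces every match by default.
$replaced = preg_replace('/Flintstones/', 'Jetsons', $string);

// split()  ->  preg_split(): the optional limit caps the number of pieces.
$pieces = preg_split('/ /', "Fred Barney Wilma Betty", 2);

var_dump($found, $foundCi, $replaced, $pieces);
```

With a limit of 2, the last piece holds the unsplit remainder ("Barney Wilma Betty").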

Literals

Literals match literally themselves:

ereg ('Flintstones', 'Meet the Flintstones');

will return TRUE.
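The same literal test on modern PHP, sketched with preg_match() since ereg() no longer runs there. Literals are case-sensitive unless the i modifier is added.

```php
<?php
// A literal pattern matches exactly those characters, in order.
$hit  = preg_match('/Flintstones/', 'Meet the Flintstones');   // 1
$miss = preg_match('/flintstones/', 'Meet the Flintstones');   // 0: case differs
$ci   = preg_match('/flintstones/i', 'Meet the Flintstones');  // 1: i ignores case

echo $hit, $miss, $ci, "\n"; // prints 101
```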

You can specify lists of literals:

Metacharacters and Quantifiers

.
Matches any single character.

*
Matches zero or more instances of the immediately preceding character. Example: C* would match C, CC or CCC … not to mention an empty string!

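+
Matches one or more instances of the immediately preceding character. Example: C+ would match C, CC or CCC …

?
Matches zero or one instance of the immediately preceding character, making it optional. Example: colou?r matches color or colour.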

( )
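Groups a subexpression, so a quantifier or | applies to the whole group. Example: (ab)+ matches ab, abab, ababab …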

|
Alternation (“or”): (mouse|cat|dog) matches mouse, cat or dog.

^
Represents the beginning of the string, so ^T matches any string starting with a T.

$
Represents the end of the string, so \.$ matches any string that ends with a period (the backslash makes the . literal).

\
The escape character: it means to take the next character literally, so you can search for characters like * that have special meanings: \*

{x}
Exactly x occurrences of the preceding character or expression

{x,y}
Between x and y occurrences of the preceding character or expression

{x,}
At least x occurrences of the preceding character or expression
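A quick self-check of the metacharacters and quantifiers above, sketched with preg_match() because ereg() is gone from PHP 7 and later; for these cases PCRE treats the symbols the same way. matches() is a hypothetical helper defined here, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): true if the PCRE $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Each row: pattern, subject, whether we expect a match.
$checks = [
    ['/^C.t$/',             'Cat',      true],   // . is any one character
    ['/^C*$/',              '',         true],   // * allows zero Cs
    ['/^C+$/',              '',         false],  // + needs at least one C
    ['/^colou?r$/',         'color',    true],   // ? makes the u optional
    ['/^(mouse|cat|dog)$/', 'dog',      true],   // alternation inside a group
    ['/^T/',                'Tuesday',  true],   // ^ anchors at the start
    ['/\.$/',               'The end.', true],   // escaped . anchored at the end
    ['/2\*3/',              '2*3',      true],   // \* is a literal asterisk
    ['/^x{3}$/',            'xxxx',     false],  // exactly 3, so 4 fails
    ['/^x{2,4}$/',          'xxx',      true],   // between 2 and 4
    ['/^x{2,}$/',           'x',        false],  // at least 2
];

foreach ($checks as [$pattern, $subject, $expected]) {
    $ok = matches($pattern, $subject) === $expected;
    printf("%-20s %-11s %s\n", $pattern, "'$subject'", $ok ? 'as expected' : 'SURPRISE');
}
```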

Character Classes

A character class is a set of characters in square brackets; it matches any one character from that set. For instance:

[HhJ]ello matches lines containing hello or Hello or Jello.

Put the ^ character first inside the brackets to negate the class:
[^a] matches any single character that is “NOT a”. (Outside brackets, ^ anchors the beginning of the string instead.)

Inside a character class (between [ and ]), the . and $ characters lose their special meanings: [.$] matches a literal period or a literal dollar sign.
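A short sketch of these character-class rules using preg_match() (PCRE), since ereg() is not available on PHP 7 and later.

```php
<?php
// [HhJ]ello: one character from the set H, h, J, then "ello".
$jello = preg_match('/[HhJ]ello/', 'Jello');   // 1

// ^ as the first character inside [ ] negates the class.
$notA = preg_match('/^[^a]$/', 'b');   // 1: b is not a
$isA  = preg_match('/^[^a]$/', 'a');   // 0

// Inside [ ], . and $ are ordinary characters.
$dot    = preg_match('/^[.$]$/', '.');  // 1: literal period
$dollar = preg_match('/^[.$]$/', '$');  // 1: literal dollar sign
$other  = preg_match('/^[.$]$/', 'x');  // 0: . is not a wildcard here

echo $jello, $notA, $isA, $dot, $dollar, $other, "\n"; // prints 110110
```

The patterns are in single quotes so PHP does not try to interpolate $ as a variable.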

Ranges of characters are also permitted:

[0-3] is the same as [0123]
[a-k] is the same as [abcdefghijk]
[A-C] is the same as [ABC]
[A-Ca-k] is the same as [ABCabcdefghijk]
[ \f\r\t\n\v] matches any white space
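A sketch of ranges and the whitespace class in action with PCRE functions. The pattern strings are single-quoted, so the \f \r \t \n \v escapes are passed through to PCRE, which interprets them.

```php
<?php
// A range stands for every character between its endpoints.
$inRange  = preg_match('/^[0-3]$/', '2');          // 1
$outRange = preg_match('/^[0-3]$/', '7');          // 0
$mixed    = preg_match('/^[A-Ca-k]+$/', 'Backed'); // 1: B from A-C, the rest from a-k

// The whitespace class used as a separator: split on any run of white space.
$words = preg_split('/[ \f\r\t\n\v]+/', "Fred\tBarney  Wilma\nBetty");

print_r($words); // Fred, Barney, Wilma, Betty
```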

There are also some alternate forms:

[[:alpha:]] is the same as [a-zA-Z]
[[:upper:]] is the same as [A-Z]
[[:lower:]] is the same as [a-z]
[[:digit:]] is the same as [0-9]
[[:alnum:]] is the same as [0-9a-zA-Z]
[[:space:]] matches any white space
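PCRE also accepts these POSIX named classes inside brackets, so they carry over to preg_match() and preg_replace(). A small sketch with made-up inputs:

```php
<?php
$zip   = preg_match('/^[[:digit:]]{5}$/', '90210');           // 1
$name  = preg_match('/^[[:upper:]][[:lower:]]+$/', 'Wilma');  // 1
$alnum = preg_match('/^[[:alnum:]]+$/', 'abc-123');           // 0: - is not alphanumeric

// Collapse any run of white space into a single space.
$clean = preg_replace('/[[:space:]]+/', ' ', "Yabba \t dabba\n doo");

echo $clean, "\n"; // prints: Yabba dabba doo
```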

Matching and Replacing Patterns

Do it like this:

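$pattern = "Flintstones";
$replacement = "Jetsons";
$string = "Flintstones! Meet the Flintstones!";
// The function returns a new string; $string itself is unchanged.
$result = eregi_replace ($pattern, $replacement, $string);
// $result is "Jetsons! Meet the Jetsons!"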
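On PHP 7 and later, where eregi_replace() no longer exists, a comparable sketch uses preg_replace() with the i modifier for case-insensitivity. The optional fourth argument, a limit, caps how many matches are replaced.

```php
<?php
$pattern     = '/flintstones/i';   // i = case-insensitive, like eregi_replace()
$replacement = 'Jetsons';
$string      = 'Flintstones! Meet the Flintstones!';

// preg_replace() returns a new string; $string itself is left unchanged.
$result = preg_replace($pattern, $replacement, $string);

echo $result, "\n"; // prints: Jetsons! Meet the Jetsons!

// Limit of 1: only the first match is replaced.
$firstOnly = preg_replace($pattern, $replacement, $string, 1);

echo $firstOnly, "\n"; // prints: Jetsons! Meet the Flintstones!
```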

To do out of class:

Review Chapter 13 of Ullman.