grep – Glenn Norman

One of the most useful commands you will use is grep. Its syntax is:

grep [-b] [-c] [-E] [-F] [-h] [-i] [-l] [-n] [-s] [-v] [-w] search_text source_text

Think of this command as “get report.” The key is, report of what?

grep options

-b	Precede each line by the block number on which it was found (GNU grep prints the byte offset instead). This can be useful in locating a block
-c	Print only a count of the lines that contain the pattern.
-E	“Extended” grep, using extended regular expressions. Same as the egrep command.
-F	“No regular expression” grep: treats the search text as a fixed string, which can return results more quickly because it does NOT interpret regular expressions. Same as the fgrep command.
-h	Prevents the name of the file containing the matching line from being prefixed to that line. Used when searching multiple files.
-i	Ignores case.
-l	Displays the names of the files that contain the text, not the matching lines themselves.
-n	Precede each line by its line number in the file (first line is 1).
-s	Suppress error messages about nonexistent or unreadable files.
-v	Inverts the match: prints the lines that don’t contain the text.
-w	Search for the expression as a word as if surrounded by \< and \>.
search_text	The string to match: the text that you are searching / not searching for.
source_text	The file or files (or other input) to search.
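
To see several of these options side by side, here is a minimal sketch. The file names and their contents (menu.txt, notes.txt) are invented for the illustration; only the options come from the table above.

```shell
# Hypothetical sample files, created in a scratch directory.
workdir=$(mktemp -d)
printf 'I like chocolate\nChocolate cookies\nvanilla ice cream\nchocolates\n' > "$workdir/menu.txt"
printf 'no sweets here\n' > "$workdir/notes.txt"

# -c: count the matching lines instead of printing them.
count=$(grep -c chocolate "$workdir/menu.txt")

# -i with -c: the same count, ignoring case.
count_nocase=$(grep -i -c chocolate "$workdir/menu.txt")

# -n: prefix each matching line with its line number.
numbered=$(grep -n vanilla "$workdir/menu.txt")

# -w: match "chocolate" only as a whole word, so "chocolates" is skipped.
whole_word=$(grep -w chocolate "$workdir/menu.txt")

# -l: list only the names of the files that contain a match.
files=$(cd "$workdir" && grep -l chocolate menu.txt notes.txt)

printf '%s\n' "$count" "$count_nocase" "$numbered" "$whole_word" "$files"
rm -rf "$workdir"
```

Notice that the plain search misses “Chocolate cookies” because of the capital C, which is exactly the case -i is for.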

Search patterns

grep uses regular expressions to perform character matching or string matching. The characters include:

.
Matches any single character.

*
Matches zero or more instances of the immediately preceding character. Example: C* would match C, CC or CCC … not to mention an empty string!

[]
Matches any character contained within the brackets.

^
Represents the beginning of the line, so if you specified ^T grep would search for any line starting with a T.

$
Represents the end of the line, so if you specified \.$ then grep would pull up any lines that ended with a period (.).

\
The escape character: it means to take the next character literally, so you can search for characters like * that have special meanings: \*
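
The characters above can be tried out on a few lines of text. This is only a sketch: the sample lines are made up, and they are fed to grep through a pipe rather than from a file.

```shell
# Hypothetical sample text, fed to grep on standard input.
sample=$(printf 'The end.\nTotal: 5*3\ncat\ncot\ncut\nct\n')

# ^T: lines that start with T.
starts_with_t=$(printf '%s\n' "$sample" | grep '^T')

# \.$: lines that end with a literal period.
ends_with_dot=$(printf '%s\n' "$sample" | grep '\.$')

# c.t: c, then any single character, then t.
any_char=$(printf '%s\n' "$sample" | grep 'c.t')

# c[ao]t: c, then either a or o, then t.
bracket=$(printf '%s\n' "$sample" | grep 'c[ao]t')

# co*t: c, then zero or more o characters, then t.
star=$(printf '%s\n' "$sample" | grep 'co*t')

# \*: a literal asterisk.
literal_star=$(printf '%s\n' "$sample" | grep '\*')

printf '%s\n' "$starts_with_t" "$ends_with_dot" "$any_char" "$bracket" "$star" "$literal_star"
```

The co*t search is the one to study: it finds ct as well as cot, because “zero or more” includes none at all.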

(See my Shell Scripting course page on Operators.)

grep: a caution about special characters

Note: Be careful using the characters $, *, [, ^, |, (, ), and \ in the pattern list because they are also meaningful to the shell. To be sure of your result, enclose the entire pattern list in single quotes (also called “literal quotes” or “hard quotes”): '…'.

grep '2 for $3' specials.txt
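
To see why the quotes matter, it helps to look at what the shell hands over before grep ever runs. The sketch below uses printf to show the argument as grep would receive it; the two “Sale” lines are an invented stand-in for specials.txt.

```shell
# Clear the shell's positional parameters so that $3 is empty,
# as it normally is at an interactive prompt.
set --

# Inside double quotes the shell expands $3 before grep ever runs ...
seen_double=$(printf '%s\n' "2 for $3")

# ... while single quotes pass the text through untouched.
seen_single=$(printf '%s\n' '2 for $3')

# Hypothetical stand-in for specials.txt, searched with the safe quoting.
matched=$(printf 'Sale: 2 for $3\nSale: 2 for 1\n' | grep '2 for $3')

printf '[%s]\n' "$seen_double" "$seen_single" "$matched"
```

With double quotes the search text silently shrinks to “2 for ” and would match both lines. With single quotes the dollar sign survives, and because it is not at the end of the pattern grep treats it as an ordinary character.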

Input and output

grep will begin to make sense if you’re really clear about what goes in and what comes out of this command by default.

The basic syntax of grep is:

grep string_to_match input

grep searches the file, and the output is a printout of lines that contain matches for string_to_match.

About text files

What’s the most relevant difference between text files and binary files?

Text files are composed of lines.

In other words, at the end of each line there is a line-ending marker: a line feed (lf) on Unix, or a carriage return and a line feed (cr/lf) on Windows. (Yes, there are other differences.)

Here’s the critical point:
Commands that process files will process them one line at a time.

End it right

Do you know that Unix text-file line endings are different from Windows line endings?

This is one really big reason why you shouldn’t use Word as your code editor.

Try gedit, kedit or nedit in Linux, pico on an AIX server, or a Unix-compatible text editor like Arachnophilia in Windows.

You should also see if your system includes the dos2unix utility.
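
If dos2unix isn’t available, the standard tr command can do the same job. The following is a sketch under that assumption, using an invented two-line file; it also shows how a stray carriage return can defeat a search that is anchored to the end of the line.

```shell
workdir=$(mktemp -d)

# Hypothetical file with Windows (cr/lf) line endings.
printf 'I like chocolate\r\nchocolate cookies\r\n' > "$workdir/windows.txt"

# The carriage return counts as part of the line, so a search anchored
# at the end of the line finds nothing. grep exits with a non-zero
# status when nothing matches; "|| true" keeps that from stopping a script.
before=$(grep -c 'cookies$' "$workdir/windows.txt" || true)

# Delete every carriage return with tr, then search again.
tr -d '\r' < "$workdir/windows.txt" > "$workdir/unix.txt"
after=$(grep -c 'cookies$' "$workdir/unix.txt")

printf 'before: %s  after: %s\n' "$before" "$after"
rm -rf "$workdir"
```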

What’s the text source?

You could simply name a file:

grep chocolate myfile.txt

which will return any lines in the file myfile.txt that contain the string “chocolate.” What you get back is a printout, to the terminal screen, of the matching lines.

What can you do with the result?

You don’t have to just let the matching lines print to the screen, though.

Use the pipe character | to redirect the results to another command, or use the redirect character > to write the results to a file. Use your imagination to see how useful this could be:

grep Denmark Hamlet.txt > term_paper_quotations.txt
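
Here is a small sketch of both techniques. The three lines standing in for Hamlet.txt are an assumption made for the example, and sort is just one of many commands that could sit on the receiving end of the pipe.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for Hamlet.txt.
printf 'Something is rotten in the state of Denmark.\nTo be, or not to be.\nDenmark is a prison.\n' > "$workdir/Hamlet.txt"

# > writes the matching lines to a file instead of the screen.
grep Denmark "$workdir/Hamlet.txt" > "$workdir/term_paper_quotations.txt"
saved=$(cat "$workdir/term_paper_quotations.txt")

# | hands the matching lines to another command; here, sort.
sorted=$(grep Denmark "$workdir/Hamlet.txt" | sort)

printf '%s\n' "$saved" "$sorted"
rm -rf "$workdir"
```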

Using standard input

grep will also accept standard input instead of a file, provided you don’t supply a file. Remember, though, that you will have to supply a standard input:

cat myfile.txt | grep chocolate

I like chocolate
chocolate cookies

Avoid ambiguities: Use the dash to indicate standard input

Leaving the filename argument blank, as in the previous example, will work just fine a lot of the time.

It gets confusing, though, if I’m trying to search the output of a cat command for the word Hamlet, for instance, and there happens to be a file named Hamlet. grep itself always treats the first argument after the options as the search string, but for a person reading the command it’s too easy to become unclear whether Hamlet is the filename argument or the search string.

That’s why we use a dash to indicate standard input, just for clarity:

cat myfile.txt | grep chocolate -

I like chocolate
chocolate cookies
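
The sketch below compares the two forms and adds one more use of the dash. The contents of myfile.txt are invented for the example.

```shell
workdir=$(mktemp -d)

# Hypothetical contents for myfile.txt.
printf 'I like chocolate\nplain toast\nchocolate cookies\n' > "$workdir/myfile.txt"

# No file argument: grep reads standard input.
implicit=$(cat "$workdir/myfile.txt" | grep chocolate)

# Explicit dash: the same search, with the input source spelled out.
explicit=$(cat "$workdir/myfile.txt" | grep chocolate -)

# The dash can also sit next to real file names, so one command can
# search standard input and a file. -c prints one count per source.
both=$(printf 'chocolate milk\n' | grep -c chocolate - "$workdir/myfile.txt")

printf '%s\n' "$implicit" "$explicit" "$both"
rm -rf "$workdir"
```

The first two searches return the same lines. In the third, grep labels each count with its source; the exact wording of the label for standard input varies between grep versions.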

Why we love standard input

When you think about it, the fact that grep accepts standard input is pretty wonderful. How many ways can we generate standard input?

By reading a file:

cat myfile.txt | grep chocolate -

By listing files:

ls -la | grep txt -

By listing processes:

ps aux | grep tty -

By listing users:

who | grep studenth -

Matching strings

The fact that we can feed grep input from a variety of sources isn’t the only great thing about grep.

What’s even more wonderful is that the matching string doesn’t have to be just a simple string, like “chocolate.”

The escape character: \

Now, consider this: $ is a special character within regular expressions; it means “end of line.”

So how do you search for lines containing the dollar sign without grep thinking you’re looking for a line ending?

You have to “escape” the special meaning of $:

\$
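
A short sketch of the difference, using an invented price list. Note the single quotes around every pattern, which keep the shell from touching the dollar sign or the backslash.

```shell
# Hypothetical price list, fed to grep on standard input.
prices=$(printf 'coffee $3\ntea 2\nmuffin $4.50\nrefund 3$\n')

# \$ matches a literal dollar sign anywhere in the line.
with_dollar=$(printf '%s\n' "$prices" | grep '\$')

# An unescaped $ at the end of the pattern is the end-of-line anchor.
ends_in_3=$(printf '%s\n' "$prices" | grep '3$')

# Both together: a literal dollar sign that is the last character.
dollar_last=$(printf '%s\n' "$prices" | grep '\$$')

printf '%s\n' "$with_dollar" "$ends_in_3" "$dollar_last"
```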

The . wildcard

Try this test: first, create two files:

touch hello.gif hello1gif

Now do a long listing of your directory and pipe it to grep to look for hello.gif.

Go ahead, actually do it before reading on.

What command did you try?

To search for a line containing the text hello.gif, the correct command is

ls | grep 'hello\.gif' -

since

ls | grep 'hello.gif' -

will match lines containing hello-gif, hello1gif, helloagif, etc.

Now try:

ls | grep hello\.gif

What results do you get? Why?
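
If the result is puzzling, one way to investigate is to ask the shell what it actually passes along. The sketch below repeats the experiment in a scratch directory and uses printf to reveal the argument grep would receive; the directory itself is an assumption of the example.

```shell
workdir=$(mktemp -d)
touch "$workdir/hello.gif" "$workdir/hello1gif"

# Quoted and escaped: the period is literal, so only hello.gif matches.
escaped=$(ls "$workdir" | grep 'hello\.gif' -)

# Quoted but not escaped: the period matches any single character.
wildcard=$(ls "$workdir" | grep 'hello.gif' -)

# Unquoted: the shell removes the backslash before grep starts.
# printf shows the argument that grep would actually be given.
received=$(printf '%s\n' hello\.gif)

printf '%s\n' "$escaped" "$wildcard" "$received"
rm -rf "$workdir"
```

Without quotes, the backslash is consumed by the shell, so grep sees hello.gif and the period goes back to being a wildcard.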

Matching pattern lists

How do you look for a list of possible matches, not just one?

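grep -e "support\|help\|windows" myfile.txt

searches for “support,” “help” and “windows” in the file myfile.txt.

ANY of these strings produces a match.

Note that the lowercase -e option simply marks the next argument as the search pattern. It is grep -E (capital E) that is the same as egrep, and with extended regular expressions the “or” is written as a plain | with no backslash.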
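
Here are three ways to write the same search, sketched against an invented myfile.txt. The \| form relies on GNU grep; the other two forms use only standard options.

```shell
workdir=$(mktemp -d)

# Hypothetical contents for myfile.txt.
printf 'call support today\nno match here\nhelp wanted\nopen the windows\n' > "$workdir/myfile.txt"

# GNU grep: \| inside a basic regular expression means "or".
gnu_style=$(grep 'support\|help\|windows' "$workdir/myfile.txt")

# Extended regular expressions (-E): a plain | means "or".
extended=$(grep -E 'support|help|windows' "$workdir/myfile.txt")

# One -e option per pattern: a line matches if any pattern matches.
separate=$(grep -e support -e help -e windows "$workdir/myfile.txt")

printf '%s\n' "$gnu_style"
rm -rf "$workdir"
```

All three commands print the same three lines.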

This or That: matching one of two strings

The grep “or” operator consists of these two characters together:

\|

What’s really happening here is that we are escaping the “regular” meaning of the pipe character, in favor of a “special” meaning. Note that you MUST enclose this inside single or double quotes.

grep "cat\|dog" myfile

matches lines containing the word “cat” or the word “dog.”

grep "I am a \(cat\|dog\)" myfile

matches lines containing the string “I am a cat” or the string “I am a dog”.
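
The escaped parentheses in that last command decide how far the “or” reaches. A quick sketch with invented sample lines (and assuming GNU grep, for the \| operator) shows the difference:

```shell
# Hypothetical sample lines, fed to grep on standard input.
pets=$(printf 'I am a cat\nI am a dog\nI am a fish\nhot dog stand\n')

# Without grouping, the alternatives are "I am a cat" and plain "dog".
ungrouped=$(printf '%s\n' "$pets" | grep 'I am a cat\|dog')

# Escaped parentheses limit the "or" to cat versus dog.
grouped=$(printf '%s\n' "$pets" | grep 'I am a \(cat\|dog\)')

printf '%s\n' "$ungrouped" "$grouped"
```

The ungrouped search also picks up “hot dog stand,” because everything to the right of \| is a complete alternative on its own.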

Matching several characters

To match a selection of characters, use brackets: []

For example:

[HhJ]ello matches lines containing hello or Hello or Jello.

Ranges of characters are also permitted:

[0-3] is the same as [0123]
[a-k] is the same as [abcdefghijk]
[A-C] is the same as [ABC]
[A-Ca-k] is the same as [ABCabcdefghijk]

There are also some alternate forms:

[[:alpha:]] is the same as [a-zA-Z]
[[:upper:]] is the same as [A-Z]
[[:lower:]] is the same as [a-z]
[[:digit:]] is the same as [0-9]
[[:alnum:]] is the same as [0-9a-zA-Z]
[[:space:]] matches any white space
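
A few of these bracket expressions in action, as a sketch on an invented word list:

```shell
# Hypothetical word list, fed to grep on standard input.
words=$(printf 'hello\nHello\nJello\ncello\nroom 42\nROOM\n')

# [HhJ]ello: H, h or J, followed by ello.
hello_like=$(printf '%s\n' "$words" | grep '[HhJ]ello')

# [[:digit:]]: lines containing at least one digit.
with_digit=$(printf '%s\n' "$words" | grep '[[:digit:]]')

# ^[[:upper:]]: lines that begin with an uppercase letter.
starts_upper=$(printf '%s\n' "$words" | grep '^[[:upper:]]')

# [[:space:]]: lines containing white space.
with_space=$(printf '%s\n' "$words" | grep '[[:space:]]')

printf '%s\n' "$hello_like" "$with_digit" "$starts_upper" "$with_space"
```

Note that cello is left out of the first result: c is not one of the characters inside the brackets.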

Just the opposite: NOT matching

Trying to find lines that do NOT contain a string? Use the -v option:

grep -v Denmark Hamlet.txt
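
As a sketch, with three invented lines standing in for Hamlet.txt: every line of the input lands in exactly one of the two groups, the lines that match or the lines that -v returns.

```shell
# Hypothetical stand-in for Hamlet.txt, fed to grep on standard input.
play=$(printf 'Something is rotten in the state of Denmark.\nTo be, or not to be.\nThe rest is silence.\n')

# -v: print the lines that do NOT contain Denmark.
without=$(printf '%s\n' "$play" | grep -v Denmark)

# -c counts the matching lines; -v together with -c counts the rest.
matching=$(printf '%s\n' "$play" | grep -c Denmark)
non_matching=$(printf '%s\n' "$play" | grep -v -c Denmark)

printf '%s\n' "$without" "$matching" "$non_matching"
```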

The \? Operator

An expression consisting of a character followed by an escaped question mark matches one or zero instances of that character.

bugg\?y

matches:

bugy

buggy

but not

bugggy
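
To check this for yourself, here is a sketch that runs the pattern against all three spellings. The \? form is assumed to be running under GNU grep; the second command shows the same search written with -E, where the question mark needs no backslash.

```shell
# The three spellings from the text, fed to grep on standard input.
words=$(printf 'bugy\nbuggy\nbugggy\n')

# GNU grep, basic regular expression: \? makes the preceding g optional.
gnu_style=$(printf '%s\n' "$words" | grep 'bugg\?y')

# Extended regular expression (-E): a plain ? does the same job.
extended=$(printf '%s\n' "$words" | grep -E 'bugg?y')

printf '%s\n' "$gnu_style"
```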

Grouping Expressions

Let’s say we want to find all references to Frederic or Fred, by making the string “eric” following “Fred” optional.

To do this we’ll make grep treat “eric” as a single letter.

An expression surrounded by “escaped” parentheses is treated like a single character:

Fred\(eric\)\? Smith

matches Fred Smith or Frederic Smith.

\(abc\)* matches abc, abcabcabc etc. (i.e., any number of repetitions of the string abc, including the empty string)

Note that we have to be careful when our expressions contain white spaces or stars. When this happens, we need to enclose them in quotes so that the shell does not mis-interpret the command, because the shell will parse whitespace-separated strings as multiple arguments, and will expand an unquoted * to a glob pattern.

So to use our example above, we would need to type:

grep "Fred\(eric\)\? Smith" filename
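
A sketch of grouping at work, on invented names. The \? operator assumes GNU grep, so the -E spelling of the same search is shown alongside it. The last search adds ^ and $ around the group so that the whole line has to be made of repetitions of abc.

```shell
# Hypothetical list of names, fed to grep on standard input.
names=$(printf 'Fred Smith\nFrederic Smith\nFreddy Smith\n')

# GNU grep: the escaped parentheses make "eric" a unit, and \? makes it optional.
gnu_style=$(printf '%s\n' "$names" | grep 'Fred\(eric\)\? Smith')

# Extended regular expressions (-E): the same search without the backslashes.
extended=$(printf '%s\n' "$names" | grep -E 'Fred(eric)? Smith')

# A group followed by *: lines made up entirely of repetitions of abc.
repeated=$(printf 'abc\nabcabc\nabcab\n' | grep '^\(abc\)*$')

printf '%s\n' "$gnu_style" "$repeated"
```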

Matching a Specific Number Of Repetitions of a Pattern

You can use escaped braces \{ \} to indicate a count of matches for the preceding item.

A good example is phone numbers. You could search for a 7 digit phone number like this:

grep "[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}" filename

This matches phone numbers, possibly containing a dash or whitespace in the middle.
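
Here is the phone-number search tried on a few invented contact lines. As before, \? assumes GNU grep, and the -E version is given for comparison.

```shell
# Hypothetical contact list, fed to grep on standard input.
contacts=$(printf 'office 555-1234\nhome 555 9876\nfax 5554321\nzip 90210\n')

# Three digits, an optional space or dash, then four digits.
gnu_style=$(printf '%s\n' "$contacts" | grep '[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}')

# The same search as an extended regular expression.
extended=$(printf '%s\n' "$contacts" | grep -E '[[:digit:]]{3}[ -]?[[:digit:]]{4}')

printf '%s\n' "$gnu_style"
```

The zip code line is skipped because it never supplies seven digits. Keep in mind that grep looks for the pattern anywhere in the line, so a longer run of digits would match as well.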

And all this is only the beginning…

Don’t try to digest all of this at once. Just carry away this:

grep matches strings in lines of text.

All text files consist of lines of text (with line returns at the end).

You can use wildcards, lists, or patterns to match.

If you need something fancy, find and modify an example!

Resources

grep man page: http://linuxcommand.org/man_pages/grep1.html

GNU grep manual: https://www.gnu.org/software/grep/manual/grep.html

GNU grep texinfo: http://www.gnu.org/software/grep/doc/

SS64’s grep page: http://www.ss64.com/bash/grep.html

How do I search binary files?
The strings command
The od command

The strings command searches a binary file for text strings, primarily so you can figure out what it does:

strings /bin/bash | less

If you’re a real propeller-head, doing core-dump debugging for instance, you may use the od command. This command shows the contents of a binary file in octal (base 8) format:

od /bin/bash | less

For a real treat, render the file in hexadecimal (base 16) format:

od -x /bin/bash | less
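
To experiment without wading through all of /bin/bash, you can build a tiny “binary” file of your own. This is only a sketch: the file sample.bin is invented, the od options -A n (no offset column) and -t x1 (one byte per column, in hexadecimal) are standard od options that this page hasn’t covered, and the strings step is skipped if that command isn’t installed.

```shell
workdir=$(mktemp -d)

# Hypothetical "binary" file: readable text surrounded by NUL bytes.
printf '\0\0\0chocolate\0\0\0' > "$workdir/sample.bin"

# od: dump every byte of the file in hexadecimal.
hex=$(od -A n -t x1 "$workdir/sample.bin")
printf '%s\n' "$hex"

# strings pulls out the runs of printable text, which can then be
# piped to grep like any other standard input.
if command -v strings >/dev/null 2>&1; then
    strings "$workdir/sample.bin" | grep chocolate -
fi

rm -rf "$workdir"
```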