One of the most useful commands you will use is grep. Its syntax is:
grep [-b] [-c] [-E] [-F] [-h] [-i] [-l] [-n] [-s] [-v] [-w] string_to_match file_name
Think of this command as “get report.” The key is, report of what?
Option | Meaning
-b | Precede each line by the block number on which it was found. This can be useful in locating a block. (GNU grep's -b prints the 0-based byte offset of each line instead.)
-c | Print only a count of the lines that contain the pattern.
-E | “Extended” grep, using extended regular expressions. Same as the egrep command.
-F | “No regular expression” grep: can return results more quickly because it does NOT interpret regular expressions; every character in the pattern is taken literally. Same as the fgrep command.
-h | Suppress the file name that is normally prefixed to each matching line. Used when searching multiple files.
-i | Ignore case.
-l | List only the names of the files that contain matching lines, not the lines themselves.
-n | Precede each line by its line number in the file (first line is 1).
-s | Suppress error messages about nonexistent or unreadable files.
-v | Invert the match: print only the lines that do NOT contain the text.
-w | Match the expression only as a whole word, as if surrounded by \< and \>.
string_to_match | The text you are searching for (or, with -v, not searching for).
file_name | The file or files (or other input) to search.
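Here is a minimal sketch that tries several of these options on a throwaway file. The file name sample.txt and its contents are made up for illustration. The sketch assumes a POSIX shell and a grep that supports these single-letter options (GNU grep does).

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample file: four lines, mixed case.
printf 'I like chocolate\nChocolate cookies\nvanilla ice cream\nchocolatey fudge\n' > sample.txt

grep -c chocolate sample.txt      # count of matching lines: 2
grep -i -c chocolate sample.txt   # ignore case, so "Chocolate" counts too: 3
grep -n chocolate sample.txt      # 1:I like chocolate and 4:chocolatey fudge
grep -v chocolate sample.txt      # lines WITHOUT "chocolate" (case-sensitive)
grep -w chocolate sample.txt      # whole word only: skips "chocolatey"
grep -l chocolate sample.txt      # file name only: sample.txt
```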
grep uses regular expressions to match characters or strings. The special characters include:
. (period): Matches any single character.
* (asterisk): Matches zero or more instances of the immediately preceding character. Example: C* would match C, CC, or CCC … not to mention an empty string!
[ ] (brackets): Matches any one of the characters contained within the brackets.
^ (caret): Represents the beginning of the line, so if you specified ^T, grep would search for any line starting with a T.
$ (dollar sign): Represents the end of the line, so if you specified \.$, grep would pull up any lines that ended with a period.
\ (backslash): The escape character. It means to take the next character literally, so you can search for characters like * that have special meanings: \* (See my Shell Scripting course page on Operators.)
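Here is a minimal sketch that exercises each of these characters on a made-up file (lines.txt and its contents are hypothetical). Each pattern is wrapped in single quotes so the shell passes it to grep untouched.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical test data, one item per line.
printf 'cat\ncot\ncoat\ncoot\nfact\nThe end.\nThe end\nprice: 5*2\n' > lines.txt

grep 'c.t' lines.txt     # . = any one character: cat, cot
grep 'co*t' lines.txt    # o* = zero or more o's: cot, coot, fact
grep 'c[ao]t' lines.txt  # [ao] = a or o: cat, cot
grep '^The' lines.txt    # ^ = start of line: both "The end" lines
grep '\.$' lines.txt     # literal period at end of line: The end.
grep '5\*2' lines.txt    # \* = literal asterisk: price: 5*2
```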
grep: a caution about special characters
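Note: Be careful using the characters $, *, [, ^, |, (, ), and \ in the pattern list because they are also meaningful to the shell. To be sure of your result, enclose the entire pattern list in single quotes (also called “literal quotes” or “hard quotes”): '…'.
grep '2 for $3' specials.txt
Without the single quotes, the shell would replace $3 with the script's third argument (usually empty) and split the pattern into separate words before grep ever saw it.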
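To try the quoting rule yourself, here is a sketch with a hypothetical specials.txt. The set -- line is an assumption standing in for a script's arguments. It gives $3 the value xyz, so you can see what the shell does to $3 inside double quotes.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical price list containing literal dollar signs.
printf 'Tacos: 2 for $3\nSoda: 2 for $1\n' > specials.txt

# Pretend this runs inside a script whose third argument is "xyz".
set -- a b xyz

# Single quotes: grep receives the text 2 for $3 and finds the taco line.
grep '2 for $3' specials.txt

# Double quotes: the shell first turns $3 into xyz, so grep looks for "2 for xyz".
grep "2 for $3" specials.txt || echo "no match"
```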
grep will begin to make sense if you’re really clear about what goes in and what comes out of this command by default.
The basic syntax of grep is:
grep string_to_match input
grep searches the input, and the output is a printout of the lines that contain matches for string_to_match.
What’s the most relevant difference between text files and binary files?
Text files are composed of lines.
In other words, there is a line-ending marker at the end of each line: a line feed (LF) on Unix and Linux, or a carriage return plus a line feed (CR/LF) on Windows. (Yes, there are other differences.)
Here’s the critical point:
Commands that process files will process them one line at a time.
End it right: Did you know that Unix text-file line endings are different from Windows line endings? This is one really big reason why you shouldn’t use Word as your code editor. Try gedit, kedit, or nedit in Linux, pico on an AIX server, or a Unix-compatible text editor like Arachnophilia in Windows. You should also see if your system includes the dos2unix utility, which converts Windows line endings to Unix ones.
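If dos2unix isn't installed, here is a rough sketch of the same check and conversion. It uses grep to count carriage returns and tr to delete them. The file names win.txt and unix.txt are hypothetical. Unlike dos2unix, tr removes every carriage return, not just the ones at line ends.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical file with Windows (CR/LF) line endings.
printf 'line one\r\nline two\r\n' > win.txt

CR=$(printf '\r')                  # a literal carriage-return character
grep -c "$CR" win.txt              # lines containing a CR: 2

# Strip every CR; a rough stand-in for dos2unix.
tr -d '\r' < win.txt > unix.txt
grep -c "$CR" unix.txt || true     # now 0 (grep exits 1 when nothing matches)
```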
You could simply name a file:
grep chocolate myfile.txt
which will return any lines in the file myfile.txt that contain the string “chocolate.” What you get back is a printout, to the terminal screen, of the matching lines.
You don’t have to just let the matching lines print to the screen, though.
Use the pipe character | to send the results to another command, or use the redirect character > to write the results to a file. Use your imagination to see how useful this could be:
grep Denmark Hamlet.txt > term_paper_quotations.txt
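Here is a small sketch of both ideas, using a made-up myfile.txt. The wc -l command simply stands in for whatever command you want to feed.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical input file.
printf 'I like chocolate\nvanilla\nchocolate cookies\n' > myfile.txt

# Redirect (>): save the matching lines in a file instead of printing them.
grep chocolate myfile.txt > matches.txt

# Pipe (|): hand the matching lines to another command; wc -l counts them (2).
grep chocolate myfile.txt | wc -l
```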
grep will also read standard input instead of a file, provided you don’t name a file. Remember, though, that something has to supply that standard input:
cat myfile.txt | grep chocolate
I like chocolate
chocolate cookies
Leaving the filename argument blank, as in the previous example, will work just fine a lot of the time.
It won’t work well at all if I’m trying to search the output of a cat command for the word Hamlet, for instance, and there also happens to be a file named Hamlet. It’s too easy to lose track of whether Hamlet is the filename argument or the search string.
That’s why we use a dash to indicate standard input, just for clarity:
cat myfile.txt | grep chocolate -
I like chocolate
chocolate cookies
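The following sketch recreates the Hamlet situation with a hypothetical file named Hamlet that contains only the word Denmark. grep always treats its first non-option argument as the pattern, so the confusion is on the human side. The dash just makes it obvious which input is being searched.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical file that happens to be named Hamlet.
printf 'Denmark\n' > Hamlet

# Pattern Hamlet, input from the pipe (the - says so explicitly).
printf 'Hamlet, Prince of Denmark\nOphelia\n' | grep Hamlet -
# prints: Hamlet, Prince of Denmark

# Pattern Hamlet, input from the FILE Hamlet; the piped text is ignored.
printf 'Hamlet, Prince of Denmark\n' | grep Hamlet Hamlet || echo "no match in file Hamlet"
```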
When you think about it, the fact that grep accepts standard input is pretty wonderful. How many ways can we generate standard input?
By reading a file:
cat myfile.txt | grep chocolate -
By listing files:
ls -la | grep txt -
By listing processes:
ps aux | grep tty -
By listing users:
who | grep studenth -
The fact that we can feed grep input from a variety of sources isn’t the only great thing about grep.
What’s even more wonderful is that the matching string doesn’t have to be just a simple string, like “chocolate.”
Now, consider this: $ is a special character within regular expressions; it means “end of line.”
So how do you search for lines containing the dollar sign without grep thinking you’re looking for a line ending?
You have to “escape” the special meaning of $:
\$
Try this test: first, create two files:
touch hello.gif hello1gif
Now do a long listing of your directory and pipe it to grep to look for hello.gif.
Go ahead, actually do it before reading on.
What command did you try?
To search for a line containing the text hello.gif, the correct command is
ls | grep 'hello\.gif' -
since
ls | grep 'hello.gif' -
will also match lines containing hello1gif, helloagif, hello-gif, and so on, because an unescaped . matches any single character.
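If you want to check your answer afterward, this sketch runs both commands in a scratch directory. The directory is created with mktemp, so your real directory is untouched.

```shell
cd "$(mktemp -d)" || exit 1
touch hello.gif hello1gif        # the two files from the exercise

ls | grep 'hello\.gif' -         # escaped dot: hello.gif only
ls | grep 'hello.gif' -          # any character: hello.gif and hello1gif
```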
Now try:
ls | grep hello\.gif
What results do you get? Why?
How do you look for a list of possible matches, not just one?
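grep -e "support\|help\|windows" myfile.txt
This searches myfile.txt for “support,” “help,” or “windows.”
ANY of these strings would produce a match.
Note that grep -e is NOT the same as egrep. The lowercase -e option simply marks the next argument as the pattern. It’s the uppercase -E option that makes grep behave like egrep. That option uses extended regular expressions, in which the “or” operator is a bare | with no backslash: grep -E 'support|help|windows' myfile.txt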
In grep’s basic regular expressions, the “or” operator consists of these two characters together (this is a GNU grep extension):
\|
What’s really happening here is that we are escaping the “regular” meaning of the pipe character, in favor of a “special” meaning. Note that you MUST enclose this inside single or double quotes.
grep "cat\|dog" myfile
matches lines containing the string “cat” or the string “dog” (so “category” matches too).
grep "I am a \(cat\|dog\)" myfile
matches lines containing the string “I am a cat” or the string “I am a dog”.
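Here is a sketch comparing the three forms on a made-up file named myfile. It assumes GNU grep, since \| in basic regular expressions is a GNU extension. grep -E with a bare | is the portable equivalent.

```shell
cd "$(mktemp -d)" || exit 1
printf 'I am a cat\nI am a dog\nI am a bird\nconcatenate\n' > myfile

grep 'cat\|dog' myfile              # GNU basic syntax: cat, dog, concatenate
grep -E 'cat|dog' myfile            # extended (egrep) syntax: same three lines
grep 'I am a \(cat\|dog\)' myfile   # grouped: the cat and dog sentences only
```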
To match a selection of characters, use brackets: []
For example:
[HhJ]ello matches lines containing hello or Hello or Jello.
Ranges of characters are also permitted:
[0-3] is the same as [0123]
[a-k] is the same as [abcdefghijk]
[A-C] is the same as [ABC]
[A-Ca-k] is the same as [ABCabcdefghijk]
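There are also some alternate forms, the POSIX character classes. The equivalents below hold in the default C (POSIX) locale; in other locales a class such as [[:alpha:]] may also match accented letters: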
[[:alpha:]] is the same as [a-zA-Z]
[[:upper:]] is the same as [A-Z]
[[:lower:]] is the same as [a-z]
[[:digit:]] is the same as [0-9]
[[:alnum:]] is the same as [0-9a-zA-Z]
[[:space:]] matches any white space
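Here is a quick sketch of brackets, ranges, and classes on a hypothetical words.txt.

```shell
cd "$(mktemp -d)" || exit 1
printf 'hello\nHello\nJello\nyello\nroom 12\nroom B\n' > words.txt

grep '[HhJ]ello' words.txt          # hello, Hello, Jello (not yello)
grep 'room [0-3]' words.txt         # range: room 12
grep 'room [[:upper:]]' words.txt   # POSIX class: room B
```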
Trying to find lines that do NOT contain a string? Use the -v option:
grep -v Denmark Hamlet.txt
An expression consisting of a character followed by an escaped question mark matches one or zero instances of that character.
bugg\?y
matches:
bugy
buggy
but not
bugggy
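Here is a one-line check of the bugg\?y example, feeding the three test words through a pipe. \? is a GNU grep extension to basic regular expressions; with grep -E the equivalent pattern is bugg?y.

```shell
# One test word per line; only bugy and buggy should come back.
printf 'bugy\nbuggy\nbugggy\n' | grep 'bugg\?y'

# The same search in extended syntax, where ? needs no backslash.
printf 'bugy\nbuggy\nbugggy\n' | grep -E 'bugg?y'
```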
Let’s say we want to find all references to Frederic or Fred, by making the string “eric” following “Fred” optional.
To do this, we’ll make grep treat “eric” as a single unit.
An expression surrounded by “escaped” parentheses is treated as a single unit, just like a single character:
Fred\(eric\)\? Smith
matches Fred Smith or Frederic Smith.
\(abc\)* matches abc, abcabcabc, etc. (that is, any number of repetitions of the string abc, including the empty string).
Note that we have to be careful when our expressions contain whitespace or asterisks. In that case, we need to enclose them in quotes so that the shell does not misinterpret the command. The shell splits unquoted whitespace-separated strings into separate arguments, and it treats an unquoted * as a glob pattern, expanding it to matching file names.
So to use our example above, we would need to type:
grep "Fred\(eric\)\? Smith" filename
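Here is the same kind of one-line check for the Fred example. The names are made up, and \? again assumes GNU grep.

```shell
# Made-up names; expect Fred Smith and Frederic Smith only.
printf 'Fred Smith\nFrederic Smith\nFredric Smith\nFreda Smith\n' | grep 'Fred\(eric\)\? Smith'
```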
You can use escaped braces \{ \} to indicate a count of matches for the preceding item.
A good example is phone numbers. You could search for a seven-digit phone number like this:
grep "[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}" filename
This matches phone numbers, possibly containing a dash or a space in the middle.
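Here is a sketch testing the phone-number pattern against a few made-up strings. The pattern also matches any seven digits inside a longer number, so treat it as a starting point rather than a validator.

```shell
# Made-up test strings; the first three match, the last (six digits) does not.
printf 'call 555-1234\ncall 555 1234\ncall 5551234\ncall 55-1234\n' |
  grep '[[:digit:]]\{3\}[ -]\?[[:digit:]]\{4\}'
```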
And all this is only the beginning…
grep matches strings in lines of text.
All text files consist of lines of text (with line returns at the end).
You can use wildcards, lists, or patterns to match.
If you need something fancy, find and modify an example!
grep man page: http://linuxcommand.org/man_pages/grep1.html
GNU grep manual: https://www.gnu.org/software/grep/manual/grep.html
GNU grep texinfo: http://www.gnu.org/software/grep/doc/
SS64’s grep page: http://www.ss64.com/bash/grep.html
The strings command prints the sequences of printable characters it finds in a binary file, which can give you clues about what the file does:
strings /bin/bash | less
If you’re a real propeller-head, doing core-dump debugging for instance, you may use the od command. This command shows the contents of a binary file in octal (base 8) format:
od /bin/bash | less
For a real treat, render the file in hexadecimal (base 16) format:
od -x /bin/bash | less
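od -x groups bytes into two-byte words in your machine's byte order. On a little-endian PC, each pair of bytes therefore appears swapped relative to the file. As a sketch, the POSIX options -A n (no offsets) and -t x1 (one byte at a time, in hex) show the bytes in file order. The tiny input here stands in for /bin/bash.

```shell
# A tiny stand-in for a binary file: the bytes H, i, newline.
printf 'Hi\n' | od -c              # show each byte as a character
printf 'Hi\n' | od -A n -t x1      # each byte in hex, no offset column: 48 69 0a
```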