String Functions

String Functions: echo, cut, paste, tr, sed, sort, grep and awk

Some programming languages, for instance Perl, have terrific string-handling functions built right in. Bourne/Bash isn’t as comprehensive as Perl, but it does have a basic set of commands for cutting, pasting, translating, sorting and matching strings: echo, cut, paste, tr, sed, sort, grep and awk.

 

echo

The echo command, in its simplest form, just prints a message back to the terminal:

echo "Hello There!"
Hello There!

Actually, echo is capable of handling multiple strings. It will place a space between strings, and a newline character after the last string:

echo Hello there, user.
Hello there, user.

But it’s good practice to place your string inside (at least) weak quotes, like the first example.

echo provides an excellent example of how Bash handles wildcards. Try these two commands:

echo "*"

echo *

What is the reason for the difference? Exactly what is each command displaying? Why?
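One way to investigate is to run both commands in a scratch directory whose contents you already know. The sketch below is only an illustration: it assumes mktemp is available, and the file names a.txt and b.txt are made up.

```shell
# Create an empty scratch directory with two known (hypothetical) files.
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt"

# Run each echo from inside the scratch directory and capture what it prints.
quoted=$(cd "$scratch" && echo "*")    # the quotes hide * from the shell
unquoted=$(cd "$scratch" && echo *)    # the shell expands * before echo runs

echo "quoted:   $quoted"
echo "unquoted: $unquoted"

rm -r "$scratch"                       # clean up
```

Compare the two lines of output with the files you created.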

Further:

echo -n Hello # Outputs "Hello" without a following newline character.

echo -e "Hello, world. \a" # \a sounds the terminal bell.
echo -e "\nHello, world." # \n prints a newline before the message.

See the SS64.com page on echo for further discussion of that -e option, which invokes actions called escape sequences.

 

cut

The basic syntax of cut is:

cut -cposition file

where position is the numerical position of the characters you want to capture, and file is the file from which cut should extract. If no file is given, cut reads standard input, so it can sit at the end of a pipe. cut performs its operation on each line of the input, and its output can be sent to another command or written to a file.

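To cut characters 1-8 of each line of a file (a rough way of getting a list of user names from /etc/passwd, for instance), you would type:
cut -c1-8 /etc/passwd

and get a result something like this:

root:x:0
fred:x:1
barney:x
wilma:x:
betty:x:

Because /etc/passwd is colon-delimited rather than fixed-width, any name shorter than eight characters drags part of the following fields along with it. The delimiter-based form of cut, described below, extracts the name field cleanly.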

To do this with the output of a command, try:

who | cut -c1-8 > whonow

to get a list of current users.

You can cut a single character:

cut -c5 file

Or you can cut to the end of the line with a single number followed by a dash:

cut -c5- file

You can also select multiple ranges:

cut -c1-8,18- file
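The character forms above can be tried on a single known line. This is a minimal sketch; the sample string is made up so the positions are easy to count.

```shell
# A made-up sample line: the alphabet, so position N is the Nth letter.
line="abcdefghijklmnopqrstuvwxyz"

single=$(echo "$line" | cut -c5)         # fifth character only
to_end=$(echo "$line" | cut -c20-)       # character 20 through the end of the line
ranges=$(echo "$line" | cut -c1-3,24-)   # characters 1-3, plus 24 through the end

echo "$single"   # e
echo "$to_end"   # tuvwxyz
echo "$ranges"   # abcxyz
```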

 

Using cut with tab-delimited files

cut -ffield_number file

Tab-delimited files are easy to handle with cut, because the tab is its default field delimiter. Fields are numbered starting with 1. You can just specify which field you want to cut from file, e.g.,
cut -f1 file

 

Using cut with character-delimited files (for instance .csv files): -d and -f

cut -ddelimiter -ffield_number file

To cut specific fields in a comma- or colon-delimited file, just specify the delimiter and the field numbers. /etc/passwd is colon-delimited, so to get its first and third fields (user name and user ID):
cut -d: -f1,3 /etc/passwd
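Here is a small sketch of delimiter-based cutting that does not depend on the contents of your own system files. The two records are made up, but they follow the colon-delimited layout of /etc/passwd.

```shell
# Made-up records in the colon-delimited layout used by /etc/passwd.
records='fred:x:1001:1001:Fred:/home/fred:/bin/bash
wilma:x:1002:1002:Wilma:/home/wilma:/bin/bash'

# Field 1 only: the user names.
names=$(printf '%s\n' "$records" | cut -d: -f1)

# Fields 1 and 3: cut keeps the delimiter between the fields it outputs.
name_uid=$(printf '%s\n' "$records" | cut -d: -f1,3)

echo "$names"
echo "$name_uid"
```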

 

paste

The paste command may not do exactly what you expect. Its basic syntax is:

paste files

But it doesn’t paste multiple files end-to-end (remember, cat does that). Instead it pastes corresponding lines together. If I have one file called names, and another called numbers, and coincidentally (!) they’re in the correct order, I can paste them together.

names contains:

fred
barney
wilma
betty

numbers contains:

243-0777
255-8877
243-0777
255-8877

(assuming both couples are still living together). So paste names numbers would result in:

fred      243-0777
barney    255-8877
wilma     243-0777
betty     255-8877

where each column is separated by a tab.

You can use a different delimiter if the tab isn’t good for your operation. Like the cut command, the paste command has a -ddelimiter option. You could paste the list above together using commas with:
paste -d',' names numbers

You can even paste lines from the same file together, effectively turning all the line endings into tabs:

paste -s names

To do this with the output of another command (that is, from standard input), use something like:

ls | paste -d' ' -s -
# the dash means “accept standard input”
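The paste variations above can be reproduced in a scratch directory. This sketch assumes mktemp is available and uses shortened versions of the names and numbers files.

```shell
# Build two small files in a scratch directory.
scratch=$(mktemp -d)
printf 'fred\nbarney\n' > "$scratch/names"
printf '243-0777\n255-8877\n' > "$scratch/numbers"

# Join corresponding lines, using a comma instead of the default tab.
side_by_side=$(paste -d',' "$scratch/names" "$scratch/numbers")

# -s joins all the lines of one file into a single line.
one_line=$(paste -s -d' ' "$scratch/names")

# The same thing on piped input: the dash stands for standard input.
from_stdin=$(printf 'a\nb\nc\n' | paste -s -d' ' -)

echo "$side_by_side"
echo "$one_line"
echo "$from_stdin"

rm -r "$scratch"   # clean up
```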

 

tr

tr is a filter. It translates one character to another:

tr from-characters to-characters < file_name

It’s surprisingly easy to use. If you’re dealing with a comma-separated file and it would be more convenient to arrange your fields in columns (that is, separated by tabs), just command:

tr ',' '\t' < file_name

and all the commas are now tabs.

You can change all characters from lower-case to upper-case with:

tr 'a-z' 'A-Z' < file_name

(Isn’t it amazing that this even works?) You can reverse the operation by switching the two character ranges. These are ranges of characters, not regular expressions, and it is safest to quote them so the shell passes them to tr untouched.

When you run a tr command, the output is spilled to the screen (standard output). If you want to capture that output to a file, you need to use a redirect. This results in the unusual-looking syntax:

tr 'a-z' 'A-Z' < source_file > target_file
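A quick way to see tr at work is to feed it a short string through a pipe rather than a file. This is a minimal sketch with a made-up sample string.

```shell
# A made-up comma-separated line.
text='fred,barney,wilma'

upper=$(echo "$text" | tr 'a-z' 'A-Z')    # lower case to upper case
lower=$(echo "$upper" | tr 'A-Z' 'a-z')   # and back again
tabbed=$(echo "$text" | tr ',' '\t')      # every comma becomes a tab

echo "$upper"    # FRED,BARNEY,WILMA
echo "$lower"    # fred,barney,wilma
echo "$tabbed"
```

Note that tr works one character at a time: each character in the first set is replaced by the character in the same position in the second set.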

Assignment:

  1. Create a text file with a few lines of text, including at least one misspelled word.
  2. Create a short script called tr.sh. It should fix this misspelling, but should accept a single argument from the user: the text file name.
  3. Call tr.sh, with the text file name as an argument. Make sure your script works.

 

sed

The sed (stream editor) processor deserves, and has, books all its own. The most important thing to know is that it’s very handy for performing substitutions. The syntax is:

sed command file_name

where the command is a function applied to each line, in turn, of the source file file_name. A substitution function looks like this:

s/replace_this/with_this/

The s means “substitute,” and the first string (replace_this) is replaced with the second string (with_this). So a sed command to replace the string Unix with the string UNIX in the file tutorial.txt would look like this:

sed 's/Unix/UNIX/g' tutorial.txt > tempfile
mv tempfile tutorial.txt

Note that trailing /g. It means “apply this substitution to every match on a line,” rather than only to the first match on each line.
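The effect of the trailing g is easiest to see on a line that contains the search string twice. A minimal sketch, with a made-up sample line:

```shell
# A made-up line containing the search string twice.
line='Unix is Unix'

first_only=$(echo "$line" | sed 's/Unix/UNIX/')     # no g: first match on the line
every_match=$(echo "$line" | sed 's/Unix/UNIX/g')   # with g: every match on the line

echo "$first_only"    # UNIX is Unix
echo "$every_match"   # UNIX is UNIX
```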

 

You can also pipe the output of another command into sed like this:

cat Hamlet.txt | sed s/denmark/Denmark/g

This would fix any mis-typed “denmark” to the proper “Denmark.”

 

This syntax can be expanded. You can prefix a search expression:

cat Hamlet.txt | sed /Hamlet/s/Denmark/DENMARK/g

This command would first search for lines containing “Hamlet,” then in those lines substitute “DENMARK” for the existing “Denmark.”

 

Specify the line numbers of the lines you want to modify:

cat Hamlet.txt | sed 5,10s/denmark/Denmark/g

This applies the substitution only to lines 5 through 10.

 

Remove lines of text:

sed /the/d Hamlet.txt

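This will remove (d) any lines containing “the” from the output, including lines where it appears inside a longer word such as “other.” Hamlet.txt itself is not changed.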

 

Finally, be aware that sed can use regular expressions as the search string:

sed 's/[Dd]enmark/Finland/g' Hamlet.txt

The above will replace “denmark” or “Denmark” with “Finland.” The quotes keep the shell from treating the square brackets as a file name wildcard.
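The addressing forms above (line ranges, search prefixes, deletion and regular expressions) can all be tried on a few lines of sample text. This is only a sketch; the three-line text is made up and stands in for Hamlet.txt.

```shell
# Made-up sample text standing in for Hamlet.txt.
text='denmark one
Hamlet of denmark
the end'

# Line range: substitute only on lines 1 through 2.
by_line=$(printf '%s\n' "$text" | sed '1,2s/denmark/Denmark/')

# Search prefix: substitute only on lines containing "Hamlet".
by_pattern=$(printf '%s\n' "$text" | sed '/Hamlet/s/denmark/Denmark/')

# Delete every line containing "the".
deleted=$(printf '%s\n' "$text" | sed '/the/d')

# Regular expression: match either capitalization.
either_case=$(printf '%s\n' "$text" | sed 's/[Dd]enmark/Finland/g')

printf '%s\n---\n' "$by_line" "$by_pattern" "$deleted" "$either_case"
```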

Assignment:

  1. Use the same text file as above. Edit it to include a misspelled word. The misspelled word must occur twice or more.
  2. Create a short script called sed.sh. It should fix this misspelling, but should accept three arguments from the user: the text file name, the misspelled word, and the correct spelling of the word.
  3. Call sed.sh, with the text file name and words as arguments. Make sure your script works.

 

sort

In its simplest usage, sort works like this:

sort file_name

which returns an alphabetized (sorted) list of the lines in file_name:

Barney
Betty
Fred
Wilma

The original file is unchanged, so you need to capture the output of sort if you want to preserve it:

sort file_name > new_file

You can eliminate duplicate lines:

sort -u file_name

Or reverse the sort order:

sort -r file_name

To sort a file right back into itself, you CAN’T use:

sort file_name > file_name
#BAD – DON’T USE

The above will clobber the file and leave it blank, because the shell empties file_name for the redirect before sort ever reads it. Instead, use the -o (output) option:

sort -o file_name file_name
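To try an in-place sort without risking a real file, you can use a throwaway file. This sketch assumes mktemp is available; the file name list is made up.

```shell
# Build a small unsorted file in a scratch directory.
scratch=$(mktemp -d)
printf 'wilma\nfred\nbetty\n' > "$scratch/list"

# -o names the output file; sort reads all of its input before writing,
# so the output file can safely be the same as the input file.
sort -o "$scratch/list" "$scratch/list"

sorted=$(cat "$scratch/list")
echo "$sorted"

rm -r "$scratch"   # clean up
```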

To sort numerically (each line should begin with a number; without -n, 10 sorts before 9):

sort -n file_name
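The difference between the default sort and a numeric sort shows up as soon as the numbers have different lengths. A minimal sketch with made-up values:

```shell
# Three made-up numbers of different lengths.
numbers='10
9
100'

as_text=$(printf '%s\n' "$numbers" | sort)        # compares character by character
as_numbers=$(printf '%s\n' "$numbers" | sort -n)  # compares numeric values

echo "$as_text"
echo "$as_numbers"
```

In the default sort, 10 and 100 land ahead of 9 because the character 1 sorts before the character 9.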

Even better, sort by fields other than the first one. sort will see each whitespace-separated word as a field. To sort by the third field, for instance, use:

sort -k 3,3 file_name

This is really a range specifier: “from field 3 to field 3.” If the field delimiter isn’t white space (a tab or space), tell sort with the -t option:

sort -n -t : -k 3,3 /etc/passwd

which results in a numeric sort on the third field (the user ID) of a colon-delimited (:) list.
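The field options can be tried on a few made-up records before pointing them at a real file. This sketch uses colon-delimited lines with a name, a placeholder and a number, loosely modeled on /etc/passwd.

```shell
# Made-up colon-delimited records: name, placeholder, number.
records='wilma:x:1002
root:x:0
fred:x:1001'

# Default sort: by the whole line, so effectively by name.
by_name=$(printf '%s\n' "$records" | sort)

# Numeric sort on field 3 only, with : as the field delimiter.
by_number=$(printf '%s\n' "$records" | sort -n -t : -k 3,3)

# Reverse order, and duplicate removal on a separate made-up list.
reversed=$(printf '%s\n' "$records" | sort -r)
unique=$(printf 'b\na\nb\n' | sort -u)

printf '%s\n---\n' "$by_name" "$by_number" "$reversed" "$unique"
```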

 

grep

grep is so useful, it deserves a section of its own here on my web site.

 

awk

awk also has its own section here on my web site.