String Functions: echo, cut, paste, tr, sed, sort, grep and awk
Some programming languages, for instance Perl, have terrific string-handling functions built right in. Bourne/Bash isn’t as comprehensive as Perl, but it does have a basic set of commands for cutting, pasting, translating, sorting and matching strings: echo, cut, paste, tr, sed, sort, grep and awk.
The echo command, in its simplest form, just prints a message back to the terminal:
echo "Hello There!"
Hello There!
Actually, echo is capable of handling multiple strings. It will place a space between strings, and a newline character after the last string:
echo Hello there, user.
Hello there, user.
But it’s good practice to place your string inside (at least) weak quotes, like the first example.
echo provides an excellent example of how Bash handles wildcards. Try these two commands:
echo "*"
echo *
What is the reason for the difference? Exactly what is each command displaying? Why?
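One way to see the difference for yourself is to run both commands in a directory whose contents you know. The sketch below assumes a scratch directory made with mktemp and two made-up file names, a.txt and b.txt; none of these come from the text above.

```shell
#!/bin/bash
# Work in a scratch directory so we know exactly which files exist.
# The directory and the file names a.txt and b.txt are made up for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

# Quoted: the shell hands the asterisk to echo unchanged.
quoted=$(cd "$demo_dir" && echo "*")

# Unquoted: the shell replaces the asterisk with the matching file names
# before echo ever runs.
unquoted=$(cd "$demo_dir" && echo *)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

The point to notice is that echo itself does nothing clever here. The shell decides what echo gets to see.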
Further:
echo -n Hello # Outputs "Hello" without a following newline character.
echo -e "Hello, world. \a"
echo -e "\nHello, world."
See the SS64.com page on echo for further discussion of that -e option, which invokes actions called escape sequences.
The basic syntax of cut is:
cut -cposition file
where position is the numerical position of the characters you want to capture, and file is the file from which cut should extract. If you leave file out, cut reads standard input instead. cut will perform its operation on each line of the file, and the output of cut can be sent to another command or written to a file.
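If you want to cut characters 1-8 of each line of a file (handy for fixed-width data, where a field always occupies the same columns), you would type:
cut -c1-8 /etc/passwd
Because /etc/passwd is colon-delimited rather than fixed-width, you get the first eight characters of each line, whatever they happen to be. On a typical system the result looks something like this:
root:x:0
fred:x:1
barney:x
wilma:x:
betty:x: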
To do this with the output of a command, try:
who | cut -c1-8 > whonow
to get a list of current users.
You can cut a single character:
cut -c5 file
Or you can cut to the end of the line with a single number followed by a dash:
cut -c5- file
You can also match multiple ranges:
cut -c1-8,18- file
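To check your understanding of the position syntax, it can help to run cut on a line where every character announces its own position. This is a minimal sketch; the sample line and the variable names are made up for illustration.

```shell
#!/bin/bash
# A made-up 20-character line: each digit shows its own position
# (0 stands for positions 10 and 20).
line="12345678901234567890"

# Characters 1 through 8.
first_eight=$(echo "$line" | cut -c1-8)
# The fifth character only.
fifth=$(echo "$line" | cut -c5)
# Character 5 through the end of the line.
to_end=$(echo "$line" | cut -c5-)
# Two ranges at once: 1 through 8, then 18 to the end.
two_ranges=$(echo "$line" | cut -c1-8,18-)

echo "$first_eight"   # 12345678
echo "$fifth"         # 5
echo "$to_end"        # 5678901234567890
echo "$two_ranges"    # 12345678890
```

Notice that the two ranges are simply joined together in the output, with nothing to mark where one stops and the next begins.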
Using cut with tab-delimited files
cut -ffield_number file
Tab-delimited files are easy to handle with cut, because tab is its default delimiter. Fields are numbered starting with 1. You can just specify which field you want to cut from file, e.g.,
cut -f1 file
Using cut with character-delimited files (for instance .csv files): -d and -f
cut -ddelimiter -ffield_number file
To cut a specific field in a comma- or colon-delimited file, just specify the delimiter and the field number:
cut -d: -f1,3 /etc/passwd
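Here is a small sketch of field-based cutting that you can run without touching any real system file. The sample records are made up, written in the colon-delimited style of /etc/passwd, and the variable names are only for illustration.

```shell
#!/bin/bash
# Two made-up lines in the colon-delimited style of /etc/passwd.
sample='fred:x:1001:1001:Fred:/home/fred:/bin/bash
wilma:x:1002:1002:Wilma:/home/wilma:/bin/bash'

# Field 1 only (the user name).
names=$(echo "$sample" | cut -d: -f1)
# Fields 1 and 3; cut keeps the delimiter between the fields it prints.
names_and_ids=$(echo "$sample" | cut -d: -f1,3)
# Tab is the default delimiter, so tab-separated data needs no -d option.
second_column=$(printf 'fred\t243-0777\nbarney\t255-8877\n' | cut -f2)

echo "$names"
echo "$names_and_ids"
echo "$second_column"
```

With this sample, the second command prints fred:1001 and wilma:1002, each on its own line.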
The paste command may not do exactly what you expect. Its basic syntax is:
paste files
But it doesn’t paste multiple files end-to-end (remember, cat does that). Instead it pastes corresponding lines together. If I have one file called names, and another called numbers, and coincidentally (!) they’re in the correct order, I can paste them together.
names contains:
fred
barney
wilma
betty
numbers contains:
243-0777
255-8877
243-0777
255-8877
(assuming both couples are still living together). So paste names numbers would result in:
fred 243-0777
barney 255-8877
wilma 243-0777
betty 255-8877
where each column is separated by a tab.
You can use a different delimiter if the tab isn’t good for your operation. Like the cut command, the paste command has a -ddelimiter option. You could paste the list above together using commas with:
paste -d',' names numbers
You can even paste lines from the same file together, effectively turning all the line endings into tabs:
paste -s names
To do this with the output of another command, use something like:
ls | paste -d' ' -s -
# the dash means "accept standard input"
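The three forms of paste shown above can be tried side by side. This sketch recreates the names and numbers files in a scratch directory (the directory and the variable names are assumptions made for the demo) so it won't disturb any files you already have.

```shell
#!/bin/bash
# Recreate the two example files in a scratch directory.
demo_dir=$(mktemp -d)
printf 'fred\nbarney\nwilma\nbetty\n' > "$demo_dir/names"
printf '243-0777\n255-8877\n243-0777\n255-8877\n' > "$demo_dir/numbers"

# Line-by-line merge, joined with a comma instead of the default tab.
with_commas=$(paste -d',' "$demo_dir/names" "$demo_dir/numbers")
# Serial mode: all the lines of one file joined into one tab-separated line.
one_line=$(paste -s "$demo_dir/names")
# Serial mode on standard input (the trailing dash), joined with spaces.
from_pipe=$(printf 'fred\nbarney\nwilma\nbetty\n' | paste -d' ' -s -)

echo "$with_commas"
echo "$one_line"
echo "$from_pipe"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

The last command prints fred barney wilma betty on a single line.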
tr is a filter. It translates one character to another:
tr from-characters to-characters < file_name
It’s surprisingly easy to use. If you’re dealing with a comma-separated file and it would be more convenient to arrange your fields in columns (that is, separated by tabs), just command:
tr ',' '\t' < file_name
and all the commas are now tabs.
You can change all characters from lower-case to upper-case with:
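tr 'a-z' 'A-Z' < file_name
(Isn’t it amazing that this even works?) The two arguments are character ranges, not regular expressions, and the quotes keep the shell from interpreting them. You can reverse the operation by switching the two ranges.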
When you run a tr command, the output is spilled to the screen (standard output). If you want to capture that output to a file, you need to use a redirect. This results in the unusual-looking syntax:
tr 'a-z' 'A-Z' < source_file > target_file
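If you’d like to watch tr work before pointing it at a real file, you can feed it a line through a pipe. This is just a sketch: the sample text and variable names are made up.

```shell
#!/bin/bash
# Commas to tabs on a made-up comma-separated line.
tabbed=$(echo 'fred,243-0777,Bedrock' | tr ',' '\t')
# Lower-case to upper-case.
upper=$(echo 'hello there' | tr 'a-z' 'A-Z')
# And back again, by swapping the two character ranges.
lower=$(echo "$upper" | tr 'A-Z' 'a-z')

echo "$tabbed"
echo "$upper"   # HELLO THERE
echo "$lower"   # hello there
```

Remember that tr works one character at a time. It maps each character in the first set to the character at the same position in the second set, and it knows nothing about words.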
Assignment:
- Create a text file with a few lines of text, including at least one misspelled word.
- Create a short script called tr.sh. It should fix this misspelling, but should accept a single argument from the user: the text file name.
- Call tr.sh, with the text file name as an argument. Make sure your script works.
The sed (stream editor) processor deserves, and has, books all its own. The most important thing to know is that it’s very handy for performing substitutions. The syntax is:
sed command file_name
where the command is a function applied to each line, in turn, of the source file file_name. A substitution function looks like this:
s/replace_this/with_this/
The s means “substitute,” and the first string (replace_this) is replaced with the second string (with_this). So a sed command to replace the string Unix with the string UNIX in the file tutorial.txt would look like this:
sed s/Unix/UNIX/g tutorial.txt > tempfile
mv tempfile tutorial.txt
Note that trailing /g: it means “apply this substitution globally,” that is, to every match on a line rather than just the first one.
You can also pipe the output of another command into sed like this:
cat Hamlet.txt | sed s/denmark/Denmark/g
This would fix any mis-typed “denmark” to the proper “Denmark.”
This syntax can be expanded. You can prefix a search expression:
cat Hamlet.txt | sed /Hamlet/s/Denmark/DENMARK/g
This command would first search for lines containing “Hamlet,” then in those lines substitute “DENMARK” for the existing “Denmark.”
Specify the line numbers of the lines you want to modify:
cat Hamlet.txt | sed 5,10s/denmark/Denmark/g
This checks lines 5 through 10 for “denmark.”
Remove lines of text:
sed /the/d Hamlet.txt
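This will remove (d) any lines containing “the,” including lines where it appears inside a longer word such as “other.” The result goes to standard output; Hamlet.txt itself is unchanged.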
Finally, be aware that sed can use regular expressions as the search string:
sed 's/[Dd]enmark/Finland/g' Hamlet.txt
The above will replace “denmark” or “Denmark” with “Finland.” The quotes stop the shell from treating the square brackets as a wildcard.
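The sed variations above are easier to compare when they all run against the same few lines. The sketch below uses four made-up lines in place of Hamlet.txt and narrows the line range to 1,2 so it fits the small sample; the variable names are only for illustration.

```shell
#!/bin/bash
# Four made-up lines standing in for a real file.
text='denmark is a prison
Hamlet left Denmark
something is rotten in the state of denmark
denmark, denmark'

# Global substitution: every "denmark" on every line.
all_fixed=$(echo "$text" | sed 's/denmark/Denmark/g')
# Without the g, only the first match on each line changes.
first_only=$(echo "$text" | sed 's/denmark/Denmark/')
# Substitute only on lines that contain "Hamlet".
hamlet_lines=$(echo "$text" | sed '/Hamlet/s/Denmark/DENMARK/g')
# Substitute only on lines 1 through 2.
range=$(echo "$text" | sed '1,2s/denmark/Denmark/g')
# Delete every line that contains "the".
no_the=$(echo "$text" | sed '/the/d')

echo "$first_only"
echo "$no_the"
```

Compare the last line of first_only (Denmark, denmark) with the last line of all_fixed (Denmark, Denmark) to see what the g flag does.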
Assignment:
- Use the same text file as above. Edit it to include a misspelled word. The misspelled word must occur twice or more.
- Create a short script called sed.sh. It should fix this misspelling, but should accept three arguments from the user: the text file name, the misspelled word, and the correct spelling of the word.
- Call sed.sh, with the text file name and words as arguments. Make sure your script works.
In its simplest usage, sort works like this:
sort file_name
which returns an alphabetized (sorted) list of the lines in file_name:
Barney
Betty
Fred
Wilma
The original file is unchanged, so you need to capture the output of sort if you want to preserve it:
sort file_name > new_file
You can eliminate duplicate lines:
sort -u file_name
Or reverse the sort order:
sort -r file_name
To sort a file right back into itself, you CAN’T use:
sort file_name > file_name
#BAD – DON’T USE
The shell empties file_name for the redirect before sort ever reads it, so the above will clobber the file and leave it blank. Instead, use the -o (output) option:
sort file_name -o file_name
To sort numerically (the first character must be a number):
sort file_name -n
Even better, sort by fields other than the first one. sort will see each whitespace-separated word as a field. To sort by the third field, for instance, use:
sort file_name -k 3,3
This is really a range specifier: “from field 3 to field 3.” If the field delimiter isn’t white space (a tab or space), tell sort with the -t option:
sort /etc/passwd -n -t : -k 3,3
which results in a numeric sort on the third field (the numeric user ID) of a colon-delimited (:) list.
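To see how the sort options change the result, try them on a handful of records where the answer is easy to check by eye. The records below are made up (a name, a placeholder field, and a number, separated by colons), and the variable names are only for illustration.

```shell
#!/bin/bash
# Made-up colon-delimited records: name, placeholder field, number.
records='wilma:x:1002
fred:x:30
betty:x:200
fred:x:30'

# Plain sort with duplicate lines removed.
unique=$(echo "$records" | sort -u)
# Reverse order.
reversed=$(echo "$records" | sort -r)
# Numeric sort on field 3 only, with ":" as the field delimiter.
by_number=$(echo "$records" | sort -n -t : -k 3,3)
# The same key without -n compares the digits as text, so 1002 comes first.
by_text=$(echo "$records" | sort -t : -k 3,3)

echo "$by_number"
echo "$by_text"
```

The numeric sort puts the records in the order 30, 30, 200, 1002, while the text sort gives 1002, 200, 30, 30. That difference is the whole reason for the -n option.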
grep is so useful, it deserves a section of its own here on my web site.
awk also has its own section here on my web site.