CSC161 2011S Imperative Problem Solving

Laboratory: GNU/Linux III

Summary: In this laboratory you will continue exploring features of the Bash shell.

Prerequisites: GNU/Linux I; GNU/Linux II

Preparation

Log in to your Linux workstation and open a terminal window.

Exercises

Exercise 1: Pipes and Filters

a. You may recall that we ended the previous lab with a bit of redirection of input and output. There is one last form of I/O redirection to consider at this point. A Linux construction called a pipe (represented by the vertical bar, |, which on many keyboards shares a key with the backslash) can be used to connect utilities in a pipeline. A pipe causes the data sent to stdout by one utility to be redirected into stdin of another utility. We say that the output from the first utility is piped to the input of the next utility.

Try the following example, which allows you to page through a long directory listing.

ls -l /bin | less

The fact that many Linux utilities can accept their input from one file, multiple files, or stdin makes them very versatile. Pipes take this versatility one step farther: any command or utility that can accept input from stdin can also accept input from a pipe. Any command or utility that sends output to stdout can also send output through a pipe.

Create a pipeline that displays the last ten commands you have used. (Feel free to ask me or one of your colleagues for a hint, on this or any other exercise, if you need one.)

Finally, note that we can string multiple pipes together to create longer pipelines, like so:

command | command | command
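As a small illustration of a longer pipeline, the sketch below counts how often each word occurs in some input. The words are made-up sample data, and printf simply stands in for any command that writes lines to stdout.

```shell
# printf writes one word per line to stdout (made-up sample data).
# sort groups identical lines together, uniq -c prefixes each distinct
# line with the number of times it occurred, and sort -rn orders the
# result numerically with the largest count first.
printf '%s\n' pear apple pear fig apple pear | sort | uniq -c | sort -rn
```

Each stage reads only what the previous stage wrote, so you can build such a pipeline one stage at a time and look at the intermediate output as you go.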

b. Many Linux utilities that accept input from stdin and send output to stdout are known as filters because, in one way or another, they filter their input to produce output. These utilities are especially apt for creating useful pipelines. In a previous laboratory, you worked with some filters already, including less, tail, and cat.

The following table lists some additional filters. All of these filters can accept their input from regular files, stdin, or a pipe. None of them modify their original input; rather, they generate new output that reflects a modified version of the input.

Try each of the examples given, and look up the commands as needed, to be sure you understand them.

Utility   Description                                                  Example usage
-------   -----------                                                  -------------
wc        "word count" - counts characters, words, and lines in input  wc -l ~/.bashrc
sort      sorts lines in input                                         sort -k2 ~rebelsky/share/linux/sciencefac.txt
uniq      "unique" - removes (or reports) duplicate lines in input     uniq ~rebelsky/share/linux/duplicates.txt
grep      searches for a target string in input                        grep li ~rebelsky/share/linux/duplicates.txt
cut       removes parts of lines from input                            cut -d' ' -f2 ~rebelsky/share/linux/sciencefac.txt
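If you would like to experiment with these filters without touching the course files, a sketch along the following lines may help. The file people.txt and its contents are invented for this illustration, and the $(...) notation (which captures a command's output) is used here only to set up a scratch directory and to save a result.

```shell
# Build a small made-up data file in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' 'Ada Lovelace Math' 'Alan Turing CS' \
              'Grace Hopper CS' 'Grace Hopper CS' > "$demo/people.txt"

wc -l < "$demo/people.txt"          # count the lines
sort -k2 "$demo/people.txt"         # sort starting at the second field
uniq "$demo/people.txt"             # collapse adjacent duplicate lines
grep CS "$demo/people.txt"          # keep only lines containing "CS"
cut -d' ' -f2 "$demo/people.txt"    # keep only the second space-separated field

# Filters combine through pipes: count the distinct lines in the file.
distinct=$(sort "$demo/people.txt" | uniq | wc -l)
echo "$distinct"

rm -r "$demo"                       # clean up the scratch directory
```

Note that uniq only compares neighboring lines, which is why it is so often preceded by sort in a pipeline.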

c. Use the filters given above (and other utilities if needed) to perform the following tasks. Note that for some of the tasks you may need to combine filters with pipes.

1. Count the lines of source code in a program you wrote for a previous course.

2. Determine the number of user accounts on the MathLAN. Hint: Each account has a directory in /home.

3. Print a list (in the terminal window, not on a printer) of faculty in Grinnell College's Science Division, sorted by last name. The file ~rebelsky/share/linux/sciencefac.txt contains the data you need.

4. Print a list of faculty in the Biology Department. Your list should not include faculty in any other department.

5. Consider one of the programs you wrote for a previous course. Take a quick look at it, using less to remind yourself of a variable name that is used in several places in the file. Now use grep to print a listing of the lines that include that variable. Get grep to print the line number (in the source file) for each line of output as well.

Note that it can be very useful to use grep in this way when you return to a project after taking a long break from it. For example, you might want to find every instance of a given class -- in any source file in the project -- as part of re-acquainting yourself with your code.

6. Print a list of all Grinnell faculty named David.

Hint: To do this, it would be helpful to create a single list that combines all the entries in the three faculty lists I have provided. But instead of generating a separate combined file, you can do this on the fly using cat as shown below. This is where cat gets its name -- from its ability to concatenate multiple files.

cat  ~rebelsky/share/linux/socialfac.txt ~rebelsky/share/linux/humanfac.txt ~rebelsky/share/linux/sciencefac.txt

7. Print a unique list of departments in the Humanities Division.

Exercise 2: A Few More Utilities

a. Suppose you want to find a file somewhere on the system. If you know its name and an approximate location, you can use the find utility as follows: "find dirname -name filename". This will look for and report all files with the given file name that are located anywhere in or below the given directory. (Note that -name is an option specifier, and it should be used verbatim.)
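The following sketch tries find on a throwaway directory tree, so that you can see what its output looks like before searching real directories. All directory and file names here are made up for the example.

```shell
# Create a scratch tree containing two files with the same name.
tree=$(mktemp -d)
mkdir -p "$tree/a/b" "$tree/c"
touch "$tree/a/b/notes.txt" "$tree/c/notes.txt" "$tree/c/other.txt"

# Report every file named notes.txt anywhere in or below $tree.
find "$tree" -name notes.txt

# Piping the report into wc -l counts the matches.
matches=$(find "$tree" -name notes.txt | wc -l)
echo "$matches"

rm -r "$tree"                       # clean up the scratch tree
```

find prints one path per line, which makes its output easy to feed into the filters from Exercise 1.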

Somewhere below /home/rebelsky/share there are two files named sciencefac.txt. Find those files.

b. Suppose you have two text files that are quite similar to one another, and you want to know exactly how they differ. (Or perhaps you want to confirm that they don't differ.) This can be done with the command "diff fileA fileB".

Let's use diff to locate the differences between two lists of faculty.

Note that the output of diff can seem a bit cryptic at first. Output like the following indicates that to modify fileA to match fileB, we would need to remove line 4 and insert a new line at the same position (line 4). The 'c' in "4c4" stands for "change." The arrows in the first column of the output indicate lines that would have to be removed ('<') or inserted ('>').

  4c4
  < (The text of the line that must be removed is found here.)
  ---
  > (And the text of the line that must be inserted is here.)

Do some experiments of your own devising to determine what the output from diff will look like when fileA has one line more or less than fileB has (i.e., to make fileA look like fileB a line would need to be removed or added, respectively).
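As a starting point for such experiments, the "change" case described above can be reproduced with two tiny files. This is only a sketch; the file contents are made up, and the scratch directory is created just for the example.

```shell
# Two three-line files that differ only in their second line.
work=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$work/fileA"
printf '%s\n' alpha BETA gamma > "$work/fileB"

# diff exits with status 1 when the files differ. That is not an
# error, so record the status instead of treating it as a failure.
status=0
diff "$work/fileA" "$work/fileB" || status=$?
echo "exit status: $status"

rm -r "$work"                       # clean up the scratch directory
```

With these files, diff reports 2c2 followed by the old line (marked with <) and the new line (marked with >). Editing the printf lines so that one file has an extra line is an easy way to explore the other cases.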

c. Did you know you can send mail from the command line? Give this a try. (Note that mail takes its input from stdin, so you can type your message after entering the command. As usual, to signal the end of the message, type ctrl-d.)

mail username@grinnell.edu

In fact, you can read and manage your mail with mail too. This used to be the main way people read mail in the UNIX community, but by now most people prefer other mail readers. Even so, sending a quick mail from the command line can be handy. (You can even send mail using another command and a pipe to generate the body of the message.)

The mail command can provide a particularly easy way to send a text file to someone. For example, if you had stored your homework in the file hw1.txt, you might write

mail -s "CSC 161 HW 1 (YourName)" rebelsky@grinnell.edu < hw1.txt

The -s "..." flag sets the subject and the < hw1.txt reads the information from the given file.

d. Want to know if a machine on the network is up and running? Then ping it.

ping duerer.cs.grinnell.edu

Until you press ctrl-c, this sends messages to duerer repeatedly, and duerer responds by sending messages back. The output reports the size of each reply and how long it took to arrive, but what you usually want to know is whether the recipient is able to respond at all.

This works for hosts that are farther away too, but you may have to wait a bit for the response.

ping www.google.com

Exercise 3: Archiving and Compressing Files

The Unix shell command tar allows you to "bundle" several files into a single file. The resulting file typically has the extension .tar, and it is called a "tar file" or sometimes a "tarball." Tar files are a very convenient way to transmit a group of files (as a single file) via email, ftp, etc.

a. Try the command "tar --help | less". Some shell programs also provide help in this form, and I particularly like the examples provided at the top of the tar help text.

b. Make a tar file called lab1.tar that contains at least two of your source code files from a previous class. (You will find the first example in the tar help text useful for this.)

Use the command "ls -ltr" to confirm that a tar file was created.

Note also that if you tar a directory, the entire subtree of your file structure is included in the tar file, and the directory structure is preserved.
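To see that directory structure is preserved, you might try something like the following sketch. The directory project and the files inside it are invented for this illustration.

```shell
# Build a small made-up directory tree in a scratch location.
work=$(mktemp -d)
mkdir -p "$work/project/src"
echo 'int main(void) { return 0; }' > "$work/project/src/main.c"
echo 'notes' > "$work/project/README"

# -c creates an archive and -f names the archive file.
# -C makes tar change to $work first, so stored paths begin with project/.
tar -cf "$work/project.tar" -C "$work" project

# -t lists the archive's contents without extracting anything.
tar -tf "$work/project.tar"
listing=$(tar -tf "$work/project.tar")

rm -r "$work"                       # clean up the scratch directory
```

The listing shows each file with its path relative to project/, which is how tar knows where to put the files when the archive is extracted.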

c. Create a new directory called something like tartest. Move or copy your tar file into that directory, and then "untar" it (i.e., extract the files from it). The third example in the tar help text will be useful here.

Confirm that your source code files were extracted into your tartest directory.

e. The tar utility has an option that compresses files as well as bundling them together. However, it is more common to use the GNU file compression utility gzip to compress files on a Linux system. Try the command "gzip filename", and then confirm that the original file has been replaced by a new smaller one named filename.gz.

Of course, it is possible to gzip a tar file, and in fact, it is common to do so.

f. At this point, you should have a file that was compressed via gzip. To revert the file to its original state, use "gunzip filename".
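The round trip through gzip and gunzip can be sketched as follows. The file sample.txt is made up for this example, and its contents are deliberately repetitive so that the compression is easy to see.

```shell
# Write 1000 identical lines to a scratch file; repetitive text compresses well.
work=$(mktemp -d)
i=0
while [ "$i" -lt 1000 ]; do
    echo 'the same line again and again'
    i=$((i + 1))
done > "$work/sample.txt"
before=$(wc -c < "$work/sample.txt")

gzip "$work/sample.txt"             # replaces sample.txt with sample.txt.gz
after=$(wc -c < "$work/sample.txt.gz")
echo "before: $before bytes, after: $after bytes"

gunzip "$work/sample.txt.gz"        # restores sample.txt, removes the .gz file
restored=$(wc -c < "$work/sample.txt")
echo "restored: $restored bytes"

rm -r "$work"                       # clean up the scratch directory
```

The same two commands work on a tar file, in which case the compressed archive is conventionally named with the extension .tar.gz.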

g. You will have noticed that gzip has much the same functionality as zip, which is commonly used on Microsoft systems. In fact, Linux can also zip and unzip, which is convenient for sharing files with Microsoft users.

I will probably ask you to create a tar file, gz file, or zip file when you submit homeworks that include multiple source files. That will be more convenient for both of us than having you submit multiple individual files.

Exercise 4: Logging onto a Remote Host

One of the nice features of UNIX and UNIX-like systems is that multiple people can log onto and work on the same machine simultaneously.

Want to know the name of the machine you are working on? Then ask, with hostname.

Want to know who is logged onto your machine? You can ask with who.

Next, you will log onto the machine where one of your colleagues is sitting, so choose a colleague and ask for the name of the machine s/he is working on.

To do this, use the command

ssh username@hostname.domain

but replace username with your own username, hostname with the name of the machine you want to work on, and domain with cs.grinnell.edu.

Note: The first time you use any particular machine to remotely log onto any other particular machine, you will get a message like the following. Answer the question yes. (However, if you ever get this message in a different circumstance, you would be wise to answer the question no.)

  The authenticity of host 'cocke.cs.grinnell.edu (132.161.196.33)' can't be established. 
  RSA key fingerprint is fd:4f:89:7a:4b:51:b0:c6:7e:a9:4b:ab:66:cb:58:f1.
  Are you sure you want to continue connecting (yes/no)?

At this point, you should be logged onto a new host computer. The commands you issue in the terminal window where you invoked ssh will be received by the new host. (Any other windows that you may have open are still associated with your own machine.) Check that this worked correctly by checking the computer's hostname. Then check to see who is logged onto that machine.

Try the following command, but expect to be disappointed. (I'm asking you to do this, so that you might recognize the error message if you get it again another day.)

  
evince ~rebelsky/Web/Courses/CSC161/2010F/Labs/linux-lab-3.pdf &

The issue is that if you log onto a host via ssh without the -X option, you will not be able to view any graphical output generated by the remote host. (This includes any application that has a GUI.) To fix this, first log out of the host by typing exit or ctrl-d. Then log in again with ssh -X username@hostname.domain, noting that the X must be upper-case. Now try evince again, and it should work (though it may be a bit slow).

In practice, however, it is usually preferable to launch graphics-oriented applications on the machine where you are sitting. When you do so on a remote host, the graphics output must be sent to you over the network. This works, but it can be slow.

Exercise 5: Wildcards, Quoting, and Command Substitution

a. You may already be familiar with the idea of wildcards in filenames. For example, you can use the command

ls *.java

to get a listing of all the files with the extension .java.

What allows this to work? The shell parses your input, discovers the asterisk in it, and "expands" the command to include all files that match the given pattern. This ability to expand commands based on special characters in the input is also called globbing.

Further, the asterisk is not the only special character used for globbing. Here are some more.

Special character   Meaning                                                          Example(s)
-----------------   -------                                                          ----------
*                   matches any sequence of characters (including zero characters)   cat ~rebelsky/share/linux/*fac.txt | less
?                   matches any single character                                     ls -ld /usr/bin/gc?
                                                                                     ls /usr/lib/lib?.a
[...]               matches any single character inside the brackets                 ls /usr/lib/lib[xX]*.a
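One way to watch globbing at work is to use echo, which simply prints whatever arguments the shell hands it after expansion. The sketch below does this in a scratch directory; the file names are made up for the example.

```shell
# Work in an empty scratch directory so that only our files can match.
orig_dir=$PWD
demo=$(mktemp -d)
cd "$demo"
touch a.txt b.txt cc.txt notes.md

echo *.txt          # any sequence of characters before .txt
echo ?.txt          # exactly one character before .txt
echo [ab].txt       # one character, either a or b, before .txt

# Save the expansions so they can be inspected afterwards.
star=$(echo *.txt)
one=$(echo ?.txt)
set_match=$(echo [ab].txt)

cd "$orig_dir"
rm -r "$demo"       # clean up the scratch directory
```

Here echo never sees the special characters at all. The shell replaces each pattern with the matching file names before echo runs, so cc.txt appears only in the first line of output.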

b. On occasion, we want to keep the shell from treating special characters specially.

There is a file with the following goofy name in my share directory:

   ~rebelsky/share/linux/goofy file name

What do you expect will happen if we try to list its contents using the following command? Give it a try to be sure.

cat ~rebelsky/share/linux/goofy file name

You may have an idea how to work around this problem. If so, try it to make sure it works.

In fact, there are two ways to deal with it. (The commands that follow assume you have first changed to the directory ~rebelsky/share/linux.) First, you can quote the file name, as follows. The single quotation marks keep the shell from treating characters inside them specially. In this case, it keeps the shell from parsing the line into four separate tokens.

cat 'goofy file name'

Second, you can escape individual characters with a backslash. Again this keeps the shell from interpreting these characters in their usual (specialized) way. Be sure to try this one as well.

cat goofy\ file\ name
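Both techniques can also be tried on a file of your own. In the sketch below, the file is created in a scratch directory and its contents are invented for the example; only its goofy name mirrors the one above.

```shell
# Create a file whose name contains spaces, inside a scratch directory.
orig_dir=$PWD
demo=$(mktemp -d)
cd "$demo"
echo 'hello from a goofy file' > 'goofy file name'

# Single quotes: the whole name reaches cat as one argument.
quoted=$(cat 'goofy file name')
echo "$quoted"

# Backslashes: each escaped space is kept as part of the name.
escaped=$(cat goofy\ file\ name)
echo "$escaped"

cd "$orig_dir"
rm -r "$demo"       # clean up the scratch directory
```

Double quotes, as in "$demo", also protect spaces, but unlike single quotes they still let the shell substitute the value of a variable.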

You may have noticed that, in the files I have provided in ~rebelsky/share/linux/ that list faculty names, the Chair of each department is denoted by an asterisk. Use grep to output a list of Department Chairs in one (or all) of the divisions. It might also be nice to sort your list by last name.

c. Command substitution allows you to embed one command inside another, using "backquotes" to delimit the nested command. (You should be able to find the backquote character in the upper-left of the keyboard, with the tilde.)

When you do this, the shell first executes the backquoted command, then substitutes the result that the command output to stdout in place of the command itself. Finally, the shell interprets and runs the resulting command string.

Try these, and then look up any of the commands you are not familiar with already.

echo There are `ls | wc -l` files in my current working directory.
echo Today is `date +%A`. It is now `date +%r`.

When we begin writing shell scripts, you will find this ability useful for creating informative output messages.
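Bash also accepts a second spelling of command substitution, $(command), which behaves like backquotes but is easier to nest. The sketch below compares the two; the variable names are made up for the example.

```shell
# Both forms run the inner pipeline and substitute its output.
old_style=`echo hello | wc -c`
new_style=$(echo hello | wc -c)
echo "backquotes: $old_style characters, dollar-paren: $new_style characters"

# $(...) nests without any extra escaping: the inner command runs first.
parent=$(basename "$(dirname /usr/local/bin)")
echo "$parent"
```

Here dirname strips the last component from /usr/local/bin and basename keeps only the last component of what remains, so the second echo prints local.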

 

History

January 2008 [Marge M. Coahran]

  • Created. (Some material taken from Jan 2007 version.)

August 2008 [Marge M. Coahran]

  • Moved material on "Pipes and Filters" to this lab from a previous one.

31 August 2010 [Samuel A. Rebelsky]

Wednesday, 26 January 2011 [Samuel A. Rebelsky]

 

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed Jan 26 10:35:39 2011.
The source to the document was last modified on Wed Jan 26 10:35:36 2011.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CSC161/2011S/Labs/linux-lab-3.html.

Samuel A. Rebelsky, rebelsky@grinnell.edu