Lab: Detecting memory leaks

CSC 323: Software design · Spring, 2012

Department of Computer Science · Grinnell College

0: Setting up the source code

Create a new subdirectory for this lab somewhere within your MathLAN home directory and use cd to move into that directory. Put in it a copy of the files line-sorter.c and line-tester from the /home/stone/courses/software-design/code directory.

The line-sorter.c file contains a standalone C program that reads in lines from one or more text files specified on the command line, sorts them into lexicographical order, and writes the sorted lines to standard output. Any of the files can contain any number of lines, and any line can contain any number of characters -- storage is allocated dynamically to accommodate files of any length and lines of any length.

Exercise. Read through the source code for the program to get an idea of its overall structure and approach. Then compile the program and use it to obtain a version of line-tester in which the lines are sorted.

1: Memory allocation and deallocation

Even though this is quite a short program, it allocates and frees memory in several different places, and an allocation and the matching deallocation are not always in the same function. For instance, the push_every_line function contains a call to malloc but no call to free, so none of the storage that it allocates is freed before the function returns. It would be an error to add a statement like free(new_component); anywhere inside the definition of push_every_line. The pointer variable new_component is stored in the activation record for push_every_line on the run-time stack and so is discarded when that function returns, but the storage on the other end of that pointer remains accessible, because push_every_line returns a copy of that pointer, and the caller has a valid use for it.
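The pattern described above, in which one function allocates and a different function frees, can be sketched as follows. This is only an illustration: the struct layout and the names push_line and free_list are hypothetical and are not taken from line-sorter.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list component; the real layout in line-sorter.c may differ. */
struct component {
    char *line;
    struct component *next;
};

/* Allocates a new component holding a copy of text and pushes it onto the
   front of the list. Returns the new head, or NULL if allocation fails
   (the original list is then left untouched). The caller owns the result. */
struct component *push_line(struct component *head, const char *text)
{
    struct component *new_component = malloc(sizeof *new_component);
    if (new_component == NULL)
        return NULL;
    new_component->line = malloc(strlen(text) + 1);
    if (new_component->line == NULL) {
        free(new_component);   /* no other pointer to it exists yet */
        return NULL;
    }
    strcpy(new_component->line, text);
    new_component->next = head;
    /* No free(new_component) here: the caller receives this pointer. */
    return new_component;
}

/* Frees every component and its line. Returns the number of components freed. */
size_t free_list(struct component *head)
{
    size_t count = 0;
    while (head != NULL) {
        struct component *next = head->next;   /* save before freeing head */
        free(head->line);
        free(head);
        head = next;
        count++;
    }
    return count;
}
```

The only free inside push_line is on the failure path, where the new component has not yet been handed to anyone. Everything that is successfully returned must eventually be released by the caller, here through free_list.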

It's often difficult to get the timing right. You don't want to free any dynamically allocated storage while there is still an accessible pointer to it that might still be needed, but you also shouldn't wait around until there are no pointers to it left at all (since then there is no way to refer to it in order to free it). Storage allocated by the call to malloc in push_every_line is freed at the very end of the execution of the main function, in the free(trailer) line.

But is all of the storage allocated by all of the calls to push_every_line freed at that point? It's hard to say, at a glance. To fully justify either an affirmative answer or a negative one, you'd probably have to trace carefully through the operations that occur during the intervening invocations of mergesort, making sure that no storage locations are dropped, discarded, or accidentally made inaccessible as the list components are split up, rearranged, and merged.
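One way to make that kind of tracing less error-prone is to check a simple invariant: an operation that rearranges a list should leave the total number of components unchanged. The sketch below assumes a singly linked list and a slow/fast-pointer split; the names list_length and split_list are hypothetical, and the mergesort in line-sorter.c may be organized quite differently.

```c
#include <stddef.h>

/* Hypothetical list component; the real layout in line-sorter.c may differ. */
struct component {
    char *line;
    struct component *next;
};

/* Counts the components reachable from head. */
size_t list_length(const struct component *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Cuts the list in two. The first half stays reachable from head; the
   second half is returned. No component is allocated, freed, or dropped. */
struct component *split_list(struct component *head)
{
    struct component *slow = head;
    struct component *fast;
    struct component *second;

    if (head == NULL)
        return NULL;
    fast = head->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;          /* advances one component per step */
        fast = fast->next->next;    /* advances two components per step */
    }
    second = slow->next;
    slow->next = NULL;              /* terminate the first half */
    return second;
}
```

If list_length(first) + list_length(second) after the split differs from the length before it, some component has become inaccessible and can no longer be freed.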

Exercise. For every call to the malloc function in line-sorter.c, find the call to the free function that frees the storage that it allocates.

2: Tracing memory allocation and deallocation

The GNU C library, the standard library normally used with the GNU C compiler, makes it possible for the programmer to determine whether all dynamically allocated storage is properly freed. The mtrace function, which takes no arguments and returns no value, replaces the standard functions for memory allocation and deallocation (malloc, calloc, realloc, and free) with instrumented versions that record their operations in an auxiliary log file. The user specifies this file by setting an environment variable, MALLOC_TRACE, in the shell within which the program to be checked will run.

To set this variable, one might give the command

export MALLOC_TRACE="line-sorter.mtrace"

in the shell's terminal window. The part that looks like an assignment statement sets the environment variable; preceding it with the word export directs the shell to pass this variable along to any processes that it starts, including subshells and the program to be traced.
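Because mtrace has no effect when MALLOC_TRACE is unset or does not name a writable file, it can be handy to check the variable before turning tracing on. The following is a minimal sketch of such a check; the helper name start_tracing is hypothetical, and only getenv and mtrace are library functions.

```c
#include <mcheck.h>   /* prototype for mtrace */
#include <stdio.h>
#include <stdlib.h>   /* getenv */

/* Hypothetical helper: turns tracing on if a log file has been named.
   Returns 1 if MALLOC_TRACE is set and nonempty, 0 otherwise. */
int start_tracing(void)
{
    const char *log_name = getenv("MALLOC_TRACE");

    if (log_name == NULL || log_name[0] == '\0') {
        fprintf(stderr, "MALLOC_TRACE is not set; nothing will be logged\n");
        return 0;
    }
    mtrace();   /* later malloc/realloc/free calls are recorded in the log */
    return 1;
}
```

A program would call start_tracing() as the first statement of its main function, so that every later allocation is recorded.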

Exercise. Define the MALLOC_TRACE environment variable, using the file name of your choice for the log, and create an empty file with that name in the directory containing the line-sorter.c program. Edit line-sorter.c, placing a call to mtrace at the beginning of the main function. (You'll also need to #include the header file containing the prototype for mtrace, which is mcheck.h.) Recompile the line-sorter program and run it again to sort the lines of line-tester.

3: The mtrace utility

The log file that the memory-management system constructs is not very readable, so GNU also includes a utility program, also called mtrace, that reformats the results more legibly. You invoke it from the command line, giving it the name of the log file as the command-line argument:

mtrace line-sorter.mtrace

If you do this now, mtrace will produce the report

No memory leaks.

which tends to confirm the conclusion that you may have reached back in section 1, that line-sorter eventually frees all of the storage that it allocates.

Unfortunately, this conclusion is incorrect: The version of line-sorter that I provided to you does leak memory, though not when it is used to sort the line-tester file.

Exercise. To see a case in which line-sorter leaks memory, sort the lines of line-sorter.c itself, then use the mtrace utility to get a readable report of the contents of the log file. You should see a line containing three hexadecimal numbers: the memory address of a block of memory that was allocated and never freed, the number of bytes in that block, and the memory address of the call to malloc (or realloc) that performed the allocation. Using this information, try to find, diagnose, and correct the error that resulted in the leak.

4: Improving mtrace's report

Of the three hexadecimal numbers that mtrace reported, the first and last were not much help, because the programmer usually has no useful information about where blocks of allocated memory are located and how the executable program instructions are arranged. In particular, the caller's address would be much more useful if it were associated with a particular line number in the file containing the source code.

To enable mtrace to include that information in its report, you need to compile the program in which the memory is to be traced with the -g option, the same one that prepares a program for debugging. One of the effects of this option is to embed information about the names of the source files and the line-by-line structure of the source code into the executable file. If you then re-run the program and invoke mtrace, giving it both the name of the executable file and the name of the log file as command-line arguments, mtrace will replace the hexadecimal caller address with a reference to the source-code file and line number.
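To practice reading such a report on something smaller than line-sorter.c, you could trace a function that leaks on purpose. The sketch below is a hypothetical demonstration, unrelated to the actual bug in line-sorter.c; the file name leaky.c and the function name copy_twice_free_once are assumptions made for illustration. On recent versions of the GNU C library (2.34 and later, I believe), the tracing code lives in a separate shared object that must be preloaded before anything is logged, so consult your system's mtrace documentation if the log stays empty.

```c
#include <stdlib.h>
#include <string.h>

/* Build and inspect, assuming this code is saved in leaky.c together with
   a main function that calls mtrace() and then copy_twice_free_once:

       gcc -g -o leaky leaky.c
       MALLOC_TRACE=leaky.mtrace ./leaky
       mtrace leaky leaky.mtrace
*/

/* Makes two copies of text but frees only one of them.
   Returns the length of text, or 0 if an allocation fails. */
size_t copy_twice_free_once(const char *text)
{
    size_t length = strlen(text);
    char *kept = malloc(length + 1);
    char *lost = malloc(length + 1);   /* the report should cite this call */

    if (kept == NULL || lost == NULL) {
        free(kept);                    /* free(NULL) is harmless */
        free(lost);
        return 0;
    }
    strcpy(kept, text);
    strcpy(lost, text);
    free(kept);
    return length;   /* lost is discarded here while its block stays allocated */
}
```

Compiled with -g, the report should name leaky.c and a line number for the unfreed block rather than a bare caller address.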

Exercise. Recompile line-sorter.c and re-run mtrace as described, to determine the exact location of the call that allocated storage that was never deallocated. Using this additional information, find, diagnose, and correct the error that is causing the leak. Confirm that your solution worked (at least in this particular case).

5: Limiting memory tracing

The GNU library that contains mtrace also contains the function muntrace, which also takes no arguments and returns no value. The effect of invoking muntrace is to suspend the tracing of memory allocation and deallocation. Tracing resumes when and if another call to mtrace is executed.

For instance, if you're investigating whether a memory leak occurs within a particular block of code, you can put a call to mtrace at the beginning of the block and a call to muntrace at the end, and the memory-allocation system will track only operations inside the block.
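A minimal sketch of that bracketing, assuming a hypothetical workload function named churn that stands in for the block under suspicion:

```c
#include <mcheck.h>   /* mtrace, muntrace */
#include <stdlib.h>

/* Hypothetical workload: allocates and frees count small blocks.
   Returns the number of blocks successfully allocated and freed. */
static int churn(int count)
{
    int completed = 0;
    for (int i = 0; i < count; i++) {
        void *block = malloc(32);
        if (block == NULL)
            break;
        free(block);
        completed++;
    }
    return completed;
}

/* Traces only the operations performed inside churn. */
int trace_only_suspect_block(int count)
{
    char *setup = malloc(64);    /* before mtrace: not logged */
    int completed;

    mtrace();
    completed = churn(count);    /* only these operations are logged */
    muntrace();

    free(setup);                 /* after muntrace: not logged */
    return completed;
}
```

One consequence to keep in mind: a block that is allocated inside the traced region but freed only after the call to muntrace will appear in the report as never freed, since the log never sees the call to free.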

When a directory contains several files, perhaps even several standalone executables, that contain calls to mtrace, it can be difficult to keep track of which of them the log file currently pertains to. Another way to be more selective about memory tracing is to enclose the invocation of mtrace in compiler directives that make it conditional on the definedness of some identifier that the preprocessor will see -- MTRACING, perhaps:

#ifdef MTRACING
mtrace();
#endif

The gcc compiler leaves the call to mtrace out of the executable that it constructs unless the identifier MTRACING has been defined. To activate it, recompile the code, giving gcc the command-line option -DMTRACING (“define MTRACING”).
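Put together, the conditional call might be packaged as in the sketch below. The helper name maybe_start_tracing is hypothetical, and making the #include conditional as well is a design choice of this example, not something the lab requires.

```c
#ifdef MTRACING
#include <mcheck.h>   /* needed only when tracing is compiled in */
#endif

/* Hypothetical helper: starts tracing only in builds compiled with
   -DMTRACING. Returns 1 if the call to mtrace was compiled in, 0 if not. */
int maybe_start_tracing(void)
{
#ifdef MTRACING
    mtrace();
    return 1;
#else
    return 0;
#endif
}
```

Compiling with gcc -DMTRACING keeps the call to mtrace; compiling without that option produces an executable that contains no reference to mtrace at all.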

Exercise. Conditionalize the call to mtrace that you added to line-sorter.c. Recompile that file without defining MTRACING and run the resulting executable on line-tester. Confirm (by checking timestamps) that the newly compiled version of line-sorter did not create or modify an mtrace log file.