Software Design (CSC-223 97F)

[News] [Basics] [Syllabus] [Outlines] [Assignments] [Studies] [Examples] [Readings] [Projects]

# Greedy Algorithms

Original version by John Stone on October 24, 1995. Some modifications by Samuel A. Rebelsky in September 1997.

## Introduction

Many of the problems that computer programs are designed to solve have involve optimization or finding a "best" solution. That is, some operation can be performed in a number of different ways, all meeting some basic constraints, and the problem is to find the way that is optimal, in the sense that it maximizes the value of some desirable quantity or minimizes the value of an undesirable one.

A greedy algorithm is a method for finding an optimal solution to some problem involving a large, homogeneous data structure (such as an array, a tree, or a graph) by starting from an optimal solution to some component or small part of the data structure and extending it, by considering additional components of the data structure one by one, to an optimal global solution. A greedy algorithm assumes that a local optimimum is part of the global optimimum.

The principal advantage of greedy algorithms is that they are usually straightforward, easy to understand and easy to code. Their principal disadvantage is that for many problems there is no greedy algorithm. More precisely, in many cases there is no guarantee that making locally optimal improvements in a locally optimal solution yields the optimal global solution.

## Psuedocode

In an imperative language, like C, one might write:

```solution = initialize_solution(data);
while (!completed(solution, data)) {
component = select(data, solution);
solution = extend_optimally(component, solution);
}
return solution;
```

In an object-oriented language, like Java, one might express it more like:

```public Solution greedilySolve(Data data, Evaluator eval) {
// The current solution (or partial solution)
Solution solution = new Solution(data);
// The parts of the input data that are currently unused
Data unused;
// The next best piece of datum to extend
Datum component;
// Keep going until our solution is finished
while (solution.incomplete()) {
// Get the parts of the data not yet used in the current
// solution.
unused = solution.unused();
// Find the "best" one to add to the current solution
component = eval.best(unused,solution);
// Extend the solution
solution.extend(component);
} // while
return solution;
} // greedilySolve
```

## Example: Bin packing

Let's consider two closely related problems, only one of which can be effectively solved by a greedy algorithm. We imagine a thief who has broken into a warehouse, surrounded by merchandise of various sorts and equipped with a flexible swag-bag, or knapsack, in which to carry loot. We idealize the problem by assuming that knapsack has a fixed volume but can be stretched into any form that encloses that volume. We assume that the unit price, the unit volume, and the number of units of each kind of merchandise that the warehouse contains are all known. The thief, naturally, wants to carry off goods of the greatest possible value; how should the theif pack a bag in order to achieve this goal?

### Bin packing of "continuous" objects (greedy method works)

If the warehouse contains merchandise that is continuous rather than discrete -- stuff, such as cloth or gold dust, rather than things, such as clock radios and steel ingots -- there is a greedy algorithm that can easily be proven to give the correct answer: Divide the unit price of each kind of stuff by its unit volume to obtain its ``value density.'' Load up the knapsack with the stuff that has the highest value density until either the knapsack is completely full or there is no more of that stuff. Repeat with stuffs of successively lower value densities until either the knapsack is full or the warehouse is empty.

If the warehouse winds up empty, it's obvious that the thief has optimized her take: There's nothing left to steal! And if, instead, the knapsack is full, it's again obvious that the solution is optimal -- replacing anything in the knapsack with anything outside of it would leave one with an equal volume of stuff of equal or lower value density, and hence with the same or less total value in the knapsack.

### Bin packing of "discrete" object (greedy method fails)

The discrete form of the knapsack problem is much harder, because the volume of the knapsack may not be an integer multiple of the unit volume of the item with the highest value density. The greedy policy of taking as many as possible of those items may leave an empty space in the bag, too small to accommodate any of the items left in the warehouse, but large enough that the overall value density is well below that of other items. Is it worthwhile to steal items of lesser value density that happen to fit better into the knapsack? No simple answer fits all cases.

## Example: Minimum spanning tree

Another classical greedy algorithm finds a minimum-weight span for a weighted graph. A graph is a collection of labelled, unstructured objects called vertices (you may want to think of them as points) some pairs of which are connected by edges. (In diagrams edges are typically represented as segments of lines or curves, but the exact shape of the edge is irrelevant when you're considering the graph -- all that is important is which vertices are connected to one another.) In a weighted graph, each edge has a positive real number associated with it; when a graph is used to model some real-world network, such as cities connected by highways or pipelines running between pumping stations, an edge's weight typically represents the cost of constructing, traversing, or consuming whatever it is that the edge represents.

A span for a given graph is a subset E'of the edges of the graph such that you can move from any vertex of the graph to any other by following edges that are in E'. Not every graph has a span (for example, a graph with four vertices and only two edges cannot have a span), but lots of graphs have many spans. The optimization problem is to identify a span in which the sum of the weights of the edges in the span is as small as possible.

A greedy algorithm solves this problem as follows: Initialize E' to be the empty set of edges. Choose any vertex of the graph, and add to E' whichever of the edges connecting the chosen vertex to any other vertex has the least weight. Now think of the two vertices that are connected by that edge as the ones that have already been ``spanned,'' and find the least-weight edge that connects a spanned vertex to an unspanned one. Add it to E', and mark the previously unspanned vertex at its far end as spanned. Repeat until all the vertices are spanned (or until there are no edges connecting the spanned and unspanned vertices, in which case no span exists).

It's not obvious that this method always leads to a span that has the least possible total weight, but in fact one can prove that it does, as follows: Suppose that the minimum-weight span E* that has a lesser total weight than the E' constructed by this algorithm. E' must include at least one edge that is not in E* (otherwise E' would be a subset of E*, and the total weight of E* would have to be greater than that of E'). Walk through the algorithm until it is about to add to E' some edge e' that is not in E*. At that point, there must be some other edge e* connecting the spanned and the unspanned vertices, one that is in E* (otherwise E* would not be a span, because you would not be able to move from any of the spanned vertices to any of the unspanned ones or vice versa).

Consider the result of replacing e* with e' in E'. The result of this replacement is still a span, since any path from one vertex to another in the graph that uses e* can instead move from the spanned end of e* to the spanned end of e', and from there to the unspanned end of e', and from there to the unspanned end of e*. But the weight of e' is less than or equal to that of e*. It cannot be less, since E* is by hypothesis the minimum-weight span. So e' and e* have the same weight, and it makes no difference which one you include in the span.

Repeating this argument for every edge that is in E' but not in E*, one finds that every such difference involves the substitution of one edge for another of equal weight. But then the total weight of E* is the same as that of E', contrary to the hypothesis that is was less. So E' is itself a minimum-weight span.

History of modifications:

• 24 September 97 [rebelsky]: Added OOP psuedocode. Removed pronouns from discussion of thief. Added section headers. Reorganized some discussions.

[News] [Basics] [Syllabus] [Outlines] [Assignments] [Studies] [Examples] [Readings] [Projects]

Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text written by Samuel A. Rebelsky.