GC FAQ -- draft

Text versions of these files (converted with lynx) are available as GC-faq.txt, GC-algorithms.txt, GC-lang.txt, and GC-harder.txt.

There's been some concern that the emphasis here ought to be a bit more evangelical, and a little less academic (perhaps that evangelism ought to be added, rather than technical content subtracted). Concise arguments for the wonderfulness of garbage collection are more than welcome, as are pointers to non-concise arguments.

What is garbage collection?

Garbage collection is a part of a language's runtime system, or an add-on library, perhaps assisted by the compiler, the hardware, the OS, or any combination of the three, that automatically determines what memory a program is no longer using, and recycles it for other use. It is also known as "automatic storage (or memory) reclamation".

Why is it good?

Manual memory management is (programmer-)time consuming, and error prone. Most programs still contain leaks. This is all doubly true with programs using exception-handling and/or threads.

A second benefit of garbage collection, less obvious to people who haven't used it, is that relying on garbage collection to manage memory simplifies the interfaces between components (subroutines, libraries, modules, classes), which no longer need to expose memory management details ("who is responsible for recycling this memory").

You can read more at Object-Orientation FAQ -- 3.9) Why is Garbage Collection A Good Thing?

Is garbage collection slow?

Not necessarily. Modern garbage collectors appear to run as quickly as manual storage allocators (malloc/free or new/delete). Garbage collection probably will not run as quickly as a customized memory allocator designed for use in a specific program. On the other hand, the extra code required to make manual memory management work properly (for example, explicit reference counting) is often more expensive than a garbage collector would be. This is more likely to be true in a multithreaded program, if the specialized allocator is a shared resource (which it usually is).

Since this was first written, memory has become so cheap that garbage collectors have been applied to very large heaps, for example more than a gigabyte. For a sufficiently large live set, pause times are still an issue. On the other hand, for very many applications modern garbage collectors provide pause times that are completely compatible with human interaction. Pause times below 1/10th of a second are often the case, and applications with relatively small live sets (or slowly changing live sets, for a generational collector) can obtain pause times below 1/100th of a second.

Can I use garbage collection with C or C++?

Probably. Modern (well-tested, efficient, non-pausing) garbage collectors are available that work with all but the most pathological C and C++ programs, including legacy code. See GC, C, and C++ for more details.

Does garbage collection cause my program's execution to pause?

Not necessarily. A variety of algorithms allow garbage collection to proceed concurrently, incrementally, and (for some definitions of the term) in "real time". There are incremental garbage collectors that work with C and C++, for instance.

Where can I get a C or C++ garbage collector?

Boehm-Weiser collector
http://www.hpl.hp.com/personal/Hans_Boehm/gc/ or
ftp://parcftp.xerox.com/pub/gc/gc.html

Great Circle from Geodesic Systems
<sales@geodesic.com> or 800-360-8388 or http://www.geodesic.com

Kevin Warne
<warne@direct.ca> or 800-707-7171

Folk myths

GC is necessarily slower than manual memory management.

GC will necessarily make my program pause.

Manual memory management won't cause pauses.

GC is incompatible with C and C++.

Folk truths

Most allocated objects are dynamically referenced by a very small number of pointers. The most important small number is ONE.

Most allocated objects have short lifetimes.

Allocation patterns (size distributions, lifetime distributions) are bursty, not uniform.

VM behavior matters.

Cache behavior matters.

"Optimal" strategies can fail miserably.

Tradeoffs

precise vs. conservative

moving/compacting vs. non-moving

explicit vs. implicit reclamation phase

stopping vs. incremental vs. concurrent

generational vs. non-generational

GC, C, and C++

What do you mean, garbage collection and C?

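In the simplest style, malloc is replaced by the garbage collector's allocator, and free is replaced by a routine that does nothing.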

It is also possible to program in a style where free still reclaims storage, but the garbage collector acts as a backstop, preventing leaks that might otherwise occur. This style has also been tested with many applications, and it works well. The advantage here is that where it is easy for the programmer to manage memory, the programmer manages the memory, but where it is not, the garbage collector does the job. This doesn't necessarily run any faster than free-does-nothing, but it may help keep the heap smaller.

How is this possible?

This doesn't sound very portable. What if I need to port my code and there's no garbage collector on the target platform?

Won't this leave bugs in my program?

If a program is developed using garbage collection, and the collector is taken away, then yes, the result may contain bugs in the form of memory leaks. Similarly, if a program is developed using floating point (or virtual memory) and that is taken away, that program, too, may contain bugs.

Also, in practice, many programs that use malloc and free already leak memory, so use of a garbage collector can actually reduce the number of bugs in a program, and do so much more quickly than if the leaks had to be tracked down and fixed by hand. This is especially true if the memory leak is inherent in a library that cannot be repaired.

Can't a devious C programmer break the collector?

One problem described by a team considering the use of GC is the use of pointer mangling to get "really opaque" pointers. That is, pointers handed out from a package to a client are XORed with a random number chosen at program start time, and thus the client cannot access package data structures without going through defined interfaces. This is simply incompatible with conservative GC, because a block referenced only through mangled pointers looks unreachable to the collector. It is also incompatible with a strict interpretation of the ANSI C standard, and can confuse leak detection tools (which use conservative GC technology to detect leaks), but nonetheless people do it, and it generally does work.
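To see why mangled pointers defeat a conservative collector, here is a toy Java model, not real collector code. Addresses are modeled as plain long values, the key is fixed rather than random so that the run is reproducible, and the names ManglingSketch, mangle, unmangle, and reachable are all made up for this illustration. The scan only recognizes a root word that exactly equals the address of an allocated block, which is a simplification of what real conservative collectors do.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: addresses are plain long values and the "heap" is a set of
// allocated block addresses. All names here are hypothetical.
final class ManglingSketch {
    // Stand-in for the random number chosen at program start (fixed here so
    // the example is reproducible).
    static final long KEY = 0x5A5A5A5A5A5AL;

    static long mangle(long pointer)  { return pointer ^ KEY; }
    static long unmangle(long handle) { return handle ^ KEY; }

    // A conservative scan treats every root word that equals the address of
    // an allocated block as a reference to that block.
    static Set<Long> reachable(long[] rootWords, Set<Long> allocated) {
        Set<Long> found = new HashSet<>();
        for (long word : rootWords) {
            if (allocated.contains(word)) {
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        long block = 0x10040L;
        Set<Long> allocated = new HashSet<>();
        allocated.add(block);

        long[] plainRoots = { block };            // client holds the real pointer
        long[] mangledRoots = { mangle(block) };  // client holds only the XORed handle

        System.out.println(reachable(plainRoots, allocated).contains(block));   // true
        System.out.println(reachable(mangledRoots, allocated).contains(block)); // false
        System.out.println(unmangle(mangle(block)) == block);                   // true
    }
}
```

In the second case the block is still in use through its mangled handle, but the scan finds no word that looks like a pointer to it, so a collector working this way would treat the block as garbage.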

Insert more questions here -- send them to <gclist@iecc.com>

What does a garbage collector do about destructors?

There remains the question of what to do with destructors that do something other than assist in memory management. There are a couple of typical uses.

One use is for objects that have state outside the program itself. The canonical example is an object that refers to a file. When a file object becomes eligible for reclamation, the garbage collector needs to ensure that buffers are flushed, the file is closed, and resources associated with the file are returned to the operating system.

Another use is where a program wants to keep a list of objects that are referenced elsewhere. The program may want to know what objects are in existence for, say, accounting purposes but does not want the mechanism of accounting to prevent objects from otherwise being freed.
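In Java, the accounting list described above could be sketched with weak references, i.e., references the collector ignores when deciding whether an object is still in use. This is only an illustration under that assumption; WeakRegistry and its methods are hypothetical names, not part of any library.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical accounting list: it can enumerate objects that are still in
// use elsewhere without itself keeping them alive.
final class WeakRegistry<T> {
    private final List<WeakReference<T>> entries = new ArrayList<>();

    // Record an object; only a weak reference to it is stored.
    synchronized void register(T obj) {
        entries.add(new WeakReference<>(obj));
    }

    // Return the objects that have not been reclaimed, dropping entries
    // whose referent the collector has already cleared.
    synchronized List<T> live() {
        List<T> result = new ArrayList<>();
        Iterator<WeakReference<T>> it = entries.iterator();
        while (it.hasNext()) {
            T obj = it.next().get();
            if (obj == null) {
                it.remove();
            } else {
                result.add(obj);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WeakRegistry<StringBuilder> registry = new WeakRegistry<>();
        StringBuilder a = new StringBuilder("a");
        StringBuilder b = new StringBuilder("b");
        registry.register(a);
        registry.register(b);
        // Both objects are still strongly referenced by a and b here.
        System.out.println(registry.live().size());
        // Using a and b afterwards keeps them in use up to this point.
        System.out.println(a.length() + b.length());
    }
}
```

When a cleared entry disappears from the list is up to the collector; the sketch makes no promise about timing.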

There are several ways of handling such situations:

In systems where the garbage collector is "built in," it typically has special knowledge of all the cases where outside resources can be referenced and can deal with them appropriately.

Many GC systems have a notion of a "weak pointer." A weak pointer is one that is not considered as a reference by the garbage collector. So if an object is referenced only by weak pointers, it is eligible for reclamation. Weak pointers can be used to implement the object list example. For another example, in Java an external resource R might be protected like this:

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    class ClientR {
        CRWeak wr;
        // delegate all methods to wr
        ClientR() { wr = new CRWeak(this); }
    }

    class CRWeak extends WeakReference<ClientR> {
        static final ReferenceQueue<ClientR> rq = new ReferenceQueue<>();
        // Keeps each CRWeak reachable after its ClientR dies; a reference
        // object that is itself unreachable is never enqueued.
        static final Set<CRWeak> pending = ConcurrentHashMap.newKeySet();
        static {
            Thread th = new CRCleaner(rq);
            th.setDaemon(true);
            th.start();
        }
        ExternalResource r;  // ExternalResource is assumed to be defined elsewhere
        CRWeak(ClientR x) { super(x, rq); pending.add(this); }
        // delegated methods from ClientR
    }

    class CRCleaner extends Thread {
        final ReferenceQueue<ClientR> rq;
        CRCleaner(ReferenceQueue<ClientR> rq) { this.rq = rq; }
        @Override
        public void run() {
            try {
                while (true) {
                    // Blocks until the collector enqueues a cleared reference.
                    CRWeak x = (CRWeak) rq.remove();
                    CRWeak.pending.remove(x);
                    // Release x.r
                }
            } catch (InterruptedException e) {
                // Interrupted: stop cleaning.
            }
        }
    }

When no clients have references to a ClientR, its memory is released, and the weak reference to it is placed on the respective reference queue. The cleaning thread can ensure that the external resource is reclaimed.

Many GC systems have a notion of "finalization." An object may be registered with the GC system so that when it is about to reclaim the object, it runs a function on the object that can perform necessary cleanups. Finalization is tricky. Some of the issues are:

When can an object actually be finalized? This is trickier than it first appears in the presence of some normally-desirable optimizing transformations.

In what thread, resource, or security context does a finalization function run?

What happens when registered objects reference each other?

What happens if a finalization function makes an object not be garbage any more?
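As one concrete form of finalization, Java 9 and later provide java.lang.ref.Cleaner, which runs a registered action on the Cleaner's own daemon thread after the registered object is no longer reachable. The sketch below is only an illustration: TrackedResource, Release, and the released counter are hypothetical, and the counter merely stands in for handing a real resource back to the operating system.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around an external resource. Requires Java 9 or later.
final class TrackedResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Counts releases; stands in for returning a real resource to the OS.
    static final AtomicInteger released = new AtomicInteger();

    // The cleanup action must not refer to the TrackedResource itself;
    // otherwise the object would stay reachable and never be cleaned.
    private static final class Release implements Runnable {
        @Override
        public void run() {
            released.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    TrackedResource() {
        cleanable = CLEANER.register(this, new Release());
    }

    // Explicit release. The action runs at most once, whether it is
    // triggered here or after the object becomes unreachable.
    @Override
    public void close() {
        cleanable.clean();
    }

    public static void main(String[] args) {
        TrackedResource r = new TrackedResource();
        r.close();
        r.close(); // second call does nothing
        System.out.println(released.get()); // prints 1
    }
}
```

Because the action holds no reference to the TrackedResource, it cannot make the object reachable again, which sidesteps one of the issues listed above.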

Further reading

Richard Jones and Rafael Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. John Wiley and Sons, 1996. ISBN 0-471-94148-4.
A good book, recently published.

http://www.memorymanagement.org/
Ravenbrook's (formerly Harlequin's) Memory Management Reference.

http://www.cs.utexas.edu/users/oops/papers.html
This is a collection of various papers on garbage collection. Among them is Paul Wilson's survey paper, which should be required reading for anyone claiming to be a practical computer scientist.

http://www.hpl.hp.com/personal/Hans_Boehm/gc/
ftp://parcftp.xerox.com/pub/gc/gc.html
The Boehm-Weiser collector has been in use since the mid-1980s. It is a widely ported, C- and C++-compatible, conservative garbage collector.

http://www.geodesic.com
Geodesic Systems sells garbage collectors for C and C++, among other things. I think they sell support as well.

Henry Baker's Archive of Research Papers
Many interesting random things, including his paper on real-time garbage collection.

L. Peter Deutsch and Daniel G. Bobrow. An efficient, incremental, automatic garbage collector. Communications of the ACM, 19(9):522-526, September 1976.
Combines heap reference counting with a stack scan to get relatively low reference counting costs and incremental reclamation.

Henry G. Baker, Jr. List processing in real time on a serial computer. Communications of the ACM, 21(4):280-294, April 1978.
A clear exposition of how a real-time copying-compacting collector can work.

W. R. Stoye, T. J. W. Clarke and A. C. Norman. Some Practical Methods for Rapid Combinator Reduction. In SIGPLAN Symposium on LISP and Functional Programming. 1984.
Interesting issues and tricks, among them "one-bit reference counts".

David Ungar. Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments. 1984.
A good paper on generational collection.

John Hughes. A Distributed Garbage Collection Algorithm. In Functional Programming Languages and Computer Architecture. 1985.
See title.

Hans Boehm and Mark Weiser. Garbage Collection in an Uncooperative Environment. Software Practice and Experience. September, 1988.
Conservative garbage collection.

Andrew W. Appel, John R. Ellis and Kai Li. Real-time concurrent garbage collection on stock multiprocessors. In SIGPLAN Symposium on Programming Language Design and Implementation, 1988.
Not really "real time", but a nice adaptation of Baker's algorithm to typical hardware.

Joel F. Bartlett. Compacting Garbage Collection with Ambiguous Roots. DEC WRL Research Report 88/2. February, 1988.
Conservative-compacting garbage collection is in fact possible.

John R. Ellis and David L. Detlefs. Safe, Efficient Garbage Collection for C++. DEC SRC Research Report 102. June, 1993.
If you were dead-set on adding garbage collection to C++, this is what it might look like.