Real-Time Systems Symposium 1999 Paper | |||||||||||
|
Fridtjof Siebert
We show the difficulties that arise for the implementation of a real-time garbage collector (GC) in a multi-threaded system. A mechanism for synchronization between threads is proposed for a single processor system. It is shown how this mechanism can be used to maintain exact information on roots, to do incremental or even constant-time root-scanning and to allow pre-emption of GC activity. 1. Introduction
For incremental scanning of the memory-graph to be correct, most GC
algorithms rely on a write-barrier to inform the collector about changes
made by the mutator [1]. Write-barrier code typically must not be pre-empted
by GC activity. On standard hardware, it cannot be implemented using a
single atomic instruction. Additional locking code is required to ensure
atomicity.
Another difficulty that arises in multi-threaded systems is scanning values on the stacks for root pointers. Current implementations often use conservative scanning techniques [4] that do not require exact information on the location of references. This conservatism makes such implementations unusable for security relevant domains and for applications that require guarantees on the performance of the GC to be able to satisfy their allocation requests or to meet real-time deadlines. We assume an incremental implementation of a mark-and-sweep GC [5], but the proposed mechanisms is just as well applicable to copying collectors [6]. When we talk about native threads, we mean threads where scheduling can occur at any time, like it is usually the case for kernel-level threads. 2. Synchronization points
The solution we propose here is to prohibit thread switches at arbitrary points during execution for mutator threads. Instead, scheduling should occur only at synchronization points that are automatically inserted in the code by a compiler or virtual machine. At such a synchronization point, a test of a global variable indicates if a thread switch is required, and some code is executed if this is the case. Here is the required code for a synchronization point as as pseudo-C code: if (synchronization_required == true) { ... record thread's state ... V(global_thread_semaphore); /* thread switch */ P(global_thread_semaphore); }The thread scheduler has to set the flag synchronization_ required whenever a different thread than the currently running one should become active. These synchronization points have to be inserted frequently enough so that pre-emption of threads is possible with a minimal delay. This is required in real-time applications where high priority tasks have to be able to pre-empt lower priority tasks within a fixed delay. A compiler or virtual machine implementation can generate code in a way that guarantees frequent execution of the synchronization points. This can be achieved by generating this code within all loops, at (potentially recursive) calls and within long sequences of linear code. For a given platform it is possible to guarantee an upper bound for the length of the time interval between two synchronization points. The synchronization points can be realized using native threads and a global semaphore [7] that has to be acquired by a mutator thread to execute. To allow a thread switch at a synchronization point, this semaphore will be released and directly reacquired to allow a different thread to take over the processor, as shown above. When synchronization points are used, the restrictions imposed on the mutator by the GC are relaxed significantly for the code between two synchronization points. There are three main aspects: 1. The invariants required by the GC do not have to hold in between two synchronization points. A typical GC invariant [1] is: There are no pointers from a black object to a white object. It is sufficient to restore the invariant by the time the next synchronization point is reached. This gives freedom to optimizing compilers allowing them to modify all the code in between synchronization points. 2. No locks are required to modify the memory graph or global data used for memory management. Especially write-barrier code and allocation of objects can be performed without the need for locks on the accessed data structures since all other threads are halted at synchronization points and are guaranteed not to modify this data at the same time. 3. No exact reference information is required, since root scanning cannot take place in between synchronization points. 3. Garbage Collector
4. Root scanning
Call points
Incremental root scanning
Mutator-threads can generate and delete stack-frames while they are running, and they are allowed to modify the references of already scanned frames, making rescanning of these frames necessary. This can be done by adding a flag to every frame that indicates the need for rescanning. Constant time root scanning
To solve these problems, we want to take the task of scanning the stack-frames completely away from the GC. Instead, the mutator threads should guarantee that at a synchronization point, any reference that is used locally by a thread is also present on the garbage collected heap. This can be ensured by storing all life reference variables of the current routine into the heap on synchronization points and on calls. Per-thread preallocated memory sub-pools or stacks can be used for this. When the references are stored, GC constraints like write-barriers have to be obeyed. An implementation of this mechanism has to ensure that storing of these references is efficient, not to increase the overhead for calls unreasonably. We have run several large Java applications (a compiler, SwingSet and HotJava) to collect information on the required additional code on calls. The applications had in average between 2.02 and 2.73 life references on a call, adding the overhead of storing 2 to 3 references, which appears to be reasonable. Having all reference values present on the heap, it is sufficient to have a single root pointer in the system that is scanned at the beginning of the GC cycle. It just has to be ensured that all memory used for local reference variables is reachable from this single root pointer. The root scanning time is constant and very small. 5. Conclusion
6. References
|