If you are reading this entry, you probably already know about G1, the new Garbage First concurrent collector currently in development for Java 7.
I asked him some questions about G1 in the comments, and a very interesting discussion started. Tony Printezis, an expert from the HotSpot GC Group, joined the discussion and answered all the questions in great detail.
(I have aggregated the discussion here because I think it is much easier to read when each answer directly follows its question, without the noise in between.)
Tony: Initially, G1 will behave similarly to CMS, i.e., stop-the-world "young GCs" (with every now and then some old regions also being reclaimed during such GCs) and concurrent marking (but no sweeping, as it's not needed). But, with several advantages (compaction, better predictability, faster remarks, etc.). We have many ideas on how to proceed in the future to do even more work concurrently, but nothing is certain yet, so we will not say much else on this at this time.
Tony: Regarding stack allocation. I believe (and I've seen data in papers that support this) that stack allocation can pay off for GCs that (a) do not compact or (b) are not generational (or both, of course).
In the case of (a), a non-compacting GC has an inherently slower allocation mechanism (e.g., free-list look-ups) than a compacting GC (e.g., "bump-the-pointer"). So, stack allocation can allow some objects to be allocated and reclaimed more cheaply (and, maybe, reduce fragmentation given that you cut down on the number of objects allocated / de-allocated from the free lists).
In the case of (b), typically objects that are stack allocated would also be short-lived (not always, but I'd guess this holds for the majority). So, effectively, you add the equivalent of a young generation to a non-generational GC.
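The allocation-cost difference behind case (a) can be sketched with two toy allocators. This is only an illustration under assumptions: the class names BumpAllocator and FreeListAllocator, the integer "addresses" and the first-fit policy are all made up for this example and are not HotSpot code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy allocators over an abstract address range; not HotSpot code.
class BumpAllocator {
    private int top = 0;      // next free address
    private final int end;    // one past the last usable address

    BumpAllocator(int capacity) { this.end = capacity; }

    // Allocation is a bounds check plus an addition.
    int allocate(int size) {
        if (size > end - top) return -1;   // out of space
        int addr = top;
        top += size;
        return addr;
    }
}

class FreeListAllocator {
    // Each entry is {address, length} of a free chunk.
    private final List<int[]> freeChunks = new ArrayList<>();

    FreeListAllocator(int capacity) { freeChunks.add(new int[] {0, capacity}); }

    // First-fit: walk the list until a chunk is large enough.
    int allocate(int size) {
        Iterator<int[]> it = freeChunks.iterator();
        while (it.hasNext()) {
            int[] chunk = it.next();
            if (chunk[1] >= size) {
                int addr = chunk[0];
                chunk[0] += size;
                chunk[1] -= size;
                if (chunk[1] == 0) it.remove();   // chunk fully used up
                return addr;
            }
        }
        return -1;   // no chunk large enough
    }

    // Freeing returns a chunk to the list (no coalescing in this sketch).
    void free(int addr, int size) { freeChunks.add(new int[] {addr, size}); }
}
```

The bump allocator never searches, while the free-list allocator may have to skip chunks that are too small, which is the look-up cost mentioned above.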
For generational GCs, results show that stack allocation might not pay off that much, given that compaction (I assume that most generational GCs would compact the young generation through copying) allows generational GCs to allocate and reclaim short-lived objects very cheaply. In addition, escape analysis (the mechanism that statically discovers which objects do not "escape" a thread and hence can be safely stack allocated, as no other thread will access them) might only prove that a small proportion of the objects allocated by the application can be safely stack allocated, so the benefit would be quite small overall.
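To make "escaping" concrete, here is a small sketch of the two situations. The class EscapeSketch and its methods are hypothetical names for this example; whether a given JVM actually stack-allocates the first object is up to its compiler and is not claimed here.

```java
// Hypothetical example; names are illustrative only.
class EscapeSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point lastPoint;   // globally reachable field

    // p is only used inside this method: it does not escape, so an
    // escape analysis could in principle prove it is safe to stack allocate.
    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // p is stored into a static field: it escapes, so it must live on the heap.
    static int lengthSquaredAndRemember(int x, int y) {
        Point p = new Point(x, y);
        lastPoint = p;
        return p.x * p.x + p.y * p.y;
    }
}
```

Both methods compute the same value; only the second one makes the object visible outside the method (and potentially to other threads).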
(BTW, the shots of your 3D engine in Java on your blog look really cool!)
[ thank you! :) ]
Andrew: Is it worthwhile to try and collect 'cheap' garbage? Or does the cost of tracking it outweigh the benefits? One thought was having a bit on each object that indicated whether it had ever been assigned to a non-stack location, coupled with a list on each stack frame for objects that had been allocated. Then, when unwinding that stack frame you could immediately GC any object that wasn't potentially referenced from elsewhere.
(Note: the flag would just be turned on, no attempt would be made to reference count, etc)
Tony: In practice doing what you're proposing is not really straightforward (even though it sounds good "on paper"!).
The main issue is that for GCs that rely on compaction (or that at least have a copying young generation, which is basically all the GCs in HotSpot), GCing specific objects is just not possible (or at least, it's not very efficient). Compacting GCs assume that, when a GC happens, all live objects will move somewhere (to another space in copying GCs, or to the bottom of the compacting space in sliding compacting GCs) and all available free space will be in one place. This means that such GCs do not keep track of individual free chunks, which makes it impossible to just reclaim specific objects. And there are several good reasons why we like such collectors, one of the most important ones being that they allow for very fast, very scalable bump-the-pointer allocation.
Even if we could GC specific objects, how are we going to find all the objects allocated by a particular stack frame? Are we going to link them at allocation? That's extra overhead.
Performance-wise, copying young generations (like the ones we have in HotSpot) are super efficient in reclaiming young, short-lived objects (they just evacuate the few survivors they come across and never even touch the dead objects; this is why they are so fast). So, in most cases, they should be able to reclaim space at least as efficiently as what you propose. In fact, they might be even more efficient, given that they don't have to iterate over the dead objects: they copy the survivors, the rest are reclaimed, done. Whereas, according to what you propose, we would have to iterate over the dead objects and de-allocate them one-by-one.
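The point that a copying collector only touches survivors can be shown with a toy model. This is a sketch under assumptions: CopyingSketch, Obj and the objectsVisited counter are hypothetical names, objects are plain Java objects standing in for heap cells, and the traversal is a simple Cheney-style scan rather than anything HotSpot-specific.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a copying collection; not HotSpot code.
class CopyingSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj forwarded;            // set once the object has been copied
        Obj(String name) { this.name = name; }
    }

    int objectsVisited = 0;       // counts from-space objects the collector touched

    // Evacuates everything reachable from the roots into a fresh to-space list.
    List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace));
        }
        // Cheney-style scan: to-space doubles as the work queue.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) {
                o.refs.set(i, copy(o.refs.get(i), toSpace));
            }
        }
        return toSpace;
    }

    private Obj copy(Obj from, List<Obj> toSpace) {
        if (from.forwarded != null) return from.forwarded;   // already evacuated
        objectsVisited++;
        Obj to = new Obj(from.name);
        to.refs.addAll(from.refs);   // still from-space refs; fixed during the scan
        from.forwarded = to;
        toSpace.add(to);
        return to;
    }
}
```

If two objects are reachable and a thousand are dead, collect() visits only the two survivors; the dead objects are never looked at, and their space is reclaimed simply by discarding from-space.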
To summarize, your scheme might work for a non-generational, non-compacting GC (where you can de-allocate specific objects). But, I can't see it working for our GCs.
I got slightly carried away in my reply here... Hope it helps!
Aaron: Will the new GC also collect the non-heap (i.e. Code Cache and Perm Gen)? Or will you get rid of those two?
Tony: Right now the G1 heap replaces the young / old generations. I.e., we still have a permanent space + code cache. In the future, we might be able to incorporate the permanent space into the G1 heap (there are many tricky issues that we need to resolve first to do that...). However, I don't think we'll incorporate the code cache as well.
Adam: How large is a region likely to be?
Tony: Right now, regions are 1MB. We allocate a contiguous block of regions for objects that are "humongous", i.e. that are too large to fit in one region.
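As a small worked example of the arithmetic, the number of contiguous regions a humongous object needs is a ceiling division by the region size. RegionMath is a hypothetical helper; the 1MB figure and the "does not fit in one region" rule are taken from the answer above, and object headers or alignment are ignored.

```java
// Hypothetical helper; the 1 MB region size comes from the discussion above.
class RegionMath {
    static final long REGION_BYTES = 1024L * 1024L;

    // An object is treated as humongous here when it does not fit in one region.
    static boolean isHumongous(long objectBytes) {
        return objectBytes > REGION_BYTES;
    }

    // Number of contiguous regions needed to hold the object (ceiling division).
    static long regionsNeeded(long objectBytes) {
        return (objectBytes + REGION_BYTES - 1) / REGION_BYTES;
    }
}
```

For instance, a 2.5MB array would need three contiguous regions in this model.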
Adam: Will TLABs be (partially) replaced by regions, so threads may be allocating into different regions?
Tony: No, regions will not replace TLABs. There's one allocating region and threads will allocate TLABs from it. When that region is full, then it will be "retired" and another one will become the allocating region. So, a single region might hold TLABs from several threads.
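The relationship between the allocating region and TLABs can be sketched roughly as follows. Everything here is an assumption made for illustration: the TlabSketch class, the fixed TLAB size, the abstract size units and the use of a synchronized method are not taken from HotSpot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; sizes are in abstract units, not HotSpot code.
class TlabSketch {
    static final int REGION_SIZE = 1024;
    static final int TLAB_SIZE = 256;

    static final class Region {
        final int id;
        int top = 0;                                   // bump pointer within the region
        final List<String> tlabOwners = new ArrayList<>();
        Region(int id) { this.id = id; }
    }

    static final class Tlab {
        final Region region;
        final int start;
        Tlab(Region region, int start) { this.region = region; this.start = start; }
    }

    final List<Region> retired = new ArrayList<>();
    private int nextId = 0;
    Region allocating = new Region(nextId++);

    // Hands a TLAB to a thread; synchronized because the region is shared.
    synchronized Tlab newTlab(String threadName) {
        if (allocating.top + TLAB_SIZE > REGION_SIZE) {
            retired.add(allocating);                   // region is full: retire it
            allocating = new Region(nextId++);         // and pick a new allocating region
        }
        Tlab tlab = new Tlab(allocating, allocating.top);
        allocating.top += TLAB_SIZE;
        allocating.tlabOwners.add(threadName);
        return tlab;
    }
}
```

With these numbers, four TLABs fit into one region, so a retired region can end up holding TLABs from several different threads, as described above.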
Tony: As I mentioned in an earlier post, we perform a marking phase every now and then to get up-to-date liveness information.
Tony: (you're not missing anything! good question) Collections are done by copying. Basically, we pick the regions we want to GC (we refer to that set of regions as the "collection set") and we evacuate the surviving objects from those regions to another set (the "to-space"). The assumption is that to-space will have fewer regions than the collection set and this is how we reclaim space. Given that we assume that the survival rate in the collection set will be quite low (we chose which regions to GC, remember?), copying is the most efficient way to perform such collections.
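The space accounting behind this can be sketched in a few lines. This is a toy model under assumptions: CollectionSetSketch is a hypothetical class, picking the regions with the least live data is assumed as the selection policy, region sizes are abstract units, and survivors are assumed to pack perfectly into to-space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model; region sizes and the selection policy are assumptions.
class CollectionSetSketch {
    static final int REGION_SIZE = 1024;

    static final class Region {
        final int id;
        final int liveBytes;      // as reported by the last marking phase
        Region(int id, int liveBytes) { this.id = id; this.liveBytes = liveBytes; }
    }

    // Picks the `count` regions with the least live data.
    static List<Region> chooseCollectionSet(List<Region> heap, int count) {
        List<Region> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingInt((Region r) -> r.liveBytes));
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    // To-space regions needed to hold the survivors, assuming perfect packing.
    static int toSpaceRegions(List<Region> collectionSet) {
        long live = 0;
        for (Region r : collectionSet) live += r.liveBytes;
        return (int) ((live + REGION_SIZE - 1) / REGION_SIZE);
    }

    // Net regions freed by evacuating the collection set.
    static int regionsReclaimed(List<Region> collectionSet) {
        return collectionSet.size() - toSpaceRegions(collectionSet);
    }
}
```

With a low survival rate, several collection-set regions collapse into a single to-space region, and the difference is the space that is reclaimed.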