I have had a serious headache thinking about automatic memory management lately. My goal for YAFL is to hide all aspects of memory management, keep the overhead low, support embedded devices, have a small runtime and have consistent performance at all times. Most of that statement rules out a mark sweep garbage collector, and all of its derivatives no matter how short the pause. Sorry Java, but no. I have concentrated my efforts on reference counting, accepting that the cost of atomic locking on SMP systems will be reduced with aggressive optimisation and reference borrowing.
… my head hurts … i haven’t slept in ages …
Slowly, recently, I have started to wonder if there is a way to have my cake and eat it, as they say.
Alright folks, let’s start with a simple premise. When using reference counting, there is no atomic operation on allocation, because you are guaranteed to hold the only reference. On release, if the reference count is 1, you again can avoid the atomic decrement because your reference is the only one. This can be proven, but that’s another discussion. So for now, run with it. If we need to acquire an existing pointer, its count will exceed 1, that’s guaranteed. To what value we don’t know, but it will exceed 1. Short lived objects can be released easily, even if they form graphs, so long as we have good optimisation and transfer ownership when building graphs, so even those pesky complex results of functions that have compound types are simple, avoid atomics and release easily, so long as they aren’t shared.
Could we hybridise. An object is created with a counter… err… flag, let’s call it a flag now, of 1. If a pointer is acquired its flag is set to 2. That’s all. On release, if the flag is 1 we free the memory immediately, otherwise we drop it. We let it leak. Let’s give this flag a meaningful name. Let’s call it the “Generation”.
Now we need to add a mark sweep collector. We use a state machine attached to each thread to drive it, advancing a small amount each time an allocation function is called, scanning all reachable objects in generation 2 only. In a pure functional language this works, because generation 3 can’t be mutated to reference generation 2. If an object is retained after 5 or so scans, we move it into generation 3. I need to think more to get some meat on the bones here, but this feels right so far.
Traditional GCs periodically scan older generations. I propose that we never scan an older generation. Instead, the runtime action already stated above of setting the generation to 2 on acquire will automatically, and regularly in some cases, keep pulling active objects back into generation 2. In order for something in generation 3 to be unreachable, something referencing it in generation 2 must be collected. When it is collected we’ll move all outgoing gen 3 references back to gen 2.
This all seems a bit airy fairy, and frankly it is a bit. It’s still a thought in progress. Somehow this also links in with the fibre library I have added. There is a serious lack of detail around the state machine in my mental model. I don’t know if it’ll keep up with high churn.
What I do know is that this is the first time in this project that I have felt that I might even be close to solving the problem of having memory management that scales from simple single core embedded devices all the way up to massive multicore beasts without degrading performance.
Even if it is slightly less efficient that a state of the art solution, I value consistency much more.
More to come.