Continuation Passing Style

I dropped away from posting here for a bit. After that last post I had a cardio-ablation, and it went well. All is good, and touch wood, will stay that way for a long time to come.

Unfortunately, during recovery, I got out of a few good habits. Programming my pet project, though, wasn't one of them, and so here we are, with something to say.

CPS (Continuation Passing Style) has long been my want and desire for the internal handling of async function calls, and I am soooo happy today, because I have finally got code generation not just working: the result compiles and runs without blowing up!

Ok, so now for some knowledge download.

A normal function call puts a few parameters into registers, and some others are passed on the stack. The called function moves the stack pointer to make a bit of space for its locals and does its work. This can happen recursively, so the stack grows. As functions return the stack pointer is unwound, so the stack shrinks back down. This happens a lot. A HUGE amount.

The thing to note about stack based function calling is that it is mostly single threaded. If you want multi-threaded you have to create more stacks, and then you need to wait for them to complete. That is, if you want to stick with the stack based model.

CPS is slightly different, in that every function call instead takes an additional parameter, the so called continuation, that tells it what to do next. It’s usually a callback function of some kind. And the thing is, a language that reads like it is stack based can generate CPS output.

I’ll get on to the why later. First, let’s play a little. Here’s an example bit of C code that adds 10 to a number and prints the result. Not much going on here.

#include <stdio.h>

int addTen(int number) {
    return number + 10;
}

int main(void) {
    printf("The answer is %d\n", addTen(20));
    return 0;
}

Let’s convert that to CPS. It’s easy enough.

#include <stdio.h>

void addTen(int number, void(*continuation)(int)) {
    continuation(number + 10);
}

void main_2(int result) {
    printf("The answer is %d\n", result);
}

int main(void) {
    addTen(20, main_2);
    return 0;
}

What happens here is that we are calling the function ‘addTen’ with the additional continuation function. We’ve split ‘main’ into two functions, the setup and the continuation.

If only that were all there was to it, but functions actually need to carry some state around with them. Let's try another example.

#include <stdio.h>

int addTen(int number) {
    return number + 10;
}

void main(char* message) {
    printf("The %s is %d\n", message, addTen(20));
}

Now we can’t just point to another C function, we need some state.

#include <stdio.h>
#include <stdlib.h>

void addTen(int number, void(*continuation)(void*,int), void* context) {
    continuation(context, number + 10);
}

struct context {
    char* message;
};

void main_2(void* ctx, int result) {
    struct context* context = ctx;
    printf("The %s is %d\n", context->message, result);
    free(context);
}

void main(char* message) {
    struct context* context = malloc(sizeof(struct context));
    context->message = message;
    addTen(20, main_2, context);
}

You might be looking at this wondering what it's all about, and why there are still function calls. Notice that the calls to 'addTen' and to 'continuation' are the last thing in each function. With optimisation enabled, the C compiler can emit these as tail calls: a jump, not a call. The stack doesn't grow.

Now we’re getting closer. As it stands this isn’t efficient, but replace malloc/free with a bump pointer allocator and a garbage collector and things speed up a bunch. However, it’ll never be quite as fast as the stack calling convention, so why do we do it?

It is often said that async/await as implemented in C# is like an infection, in that once you start using it, it spreads through your code until nearly everything is async. The downside is that any function declared as async has slightly higher overhead, but the upside is that you can do many things at once without the overhead of too many stacks. Stack space is a big overhead in high throughput servers.

My thought was, what if we start at the other end. Everything is async, unless we can prove otherwise. Have an environment that only has the minimum number of threads required to match the physical hardware, and all work is broken into these async + continuation chunks that can be spread over many cores.

Could we make a language and environment where we get parallelism without any coder effort? It’s not a new idea, just one that I wanted to run with. It’s the core idea behind the YAFL language I am writing. Write linear code, generate parallel code, use CPS as the main mechanism to enable this.

I guess my next write-up needs to be how this is actually working, with some real world examples. As it stands the output is thousands of lines, but it compiles down quite small. I rely heavily on the C compiler's dead code removal.

And then there’s the garbage collection. It’s critical that it’s ultra fast and that it has no delay. I have thought a lot about that, and CPS might just enable this as well. Again, for another day.

Arrhythmia is back and gone again

On March 30th 2024 my heart went back into AF. A few days later the rate ramped up to 130, so I was back on Digoxin. On April 19th I had my second Catheter Ablation in hospital at Basildon CTC. As I write this now, I am in rhythm.

Every so often my heart gives me a little kick, just to remind me that it isn’t healthy, but it stays in rhythm.

I probably have to get used to the idea that I will always have this hanging over me, and maybe, one day, I’ll need further treatment. For now I will try to lead a good and healthy life, and fingers crossed, stay in rhythm.

Fibers and Garbage Collection

https://github.com/supersoftcafe/yafl/tree/indev/yaflr

After a lot of experimentation I am satisfied with this. Fibers spread work over a fixed pool of threads and cheaply suspend the caller on IO. Language support will make use of them automatically.

Garbage collection is integrated with the fiber library. Each fiber has its own mini pool that is swept and merged into the caller's pool as the fiber exits. Each fiber runs single threaded, but many run in parallel: the best of both worlds.

I am currently working on migrating the compiler to C++. KDevelop is my IDE of choice now. It’s robust and fully featured.

Thoughts on automatic memory management

I have had a serious headache thinking about automatic memory management lately. My goal for YAFL is to hide all aspects of memory management, keep the overhead low, support embedded devices, have a small runtime and have consistent performance at all times. Most of that statement rules out a mark sweep garbage collector, and all of its derivatives no matter how short the pause. Sorry Java, but no. I have concentrated my efforts on reference counting, accepting that the cost of atomic locking on SMP systems will be reduced with aggressive optimisation and reference borrowing.

… my head hurts … i haven’t slept in ages …

Slowly, recently, I have started to wonder if there is a way to have my cake and eat it, as they say.

Alright folks, let's start with a simple premise. When using reference counting, there is no atomic operation on allocation, because you are guaranteed to hold the only reference. On release, if the reference count is 1, you can again avoid the atomic decrement, because yours is the only reference. This can be proven, but that's another discussion, so for now run with it. If we need to acquire an existing pointer, its count is guaranteed to exceed 1; to what value we don't know, but it will exceed 1. Short lived objects can be released easily, even if they form graphs, so long as we have good optimisation and transfer ownership when building graphs. Even those pesky compound-typed results of functions are simple, avoid atomics and release easily, so long as they aren't shared.

Could we hybridise? An object is created with a counter… err… flag, let's call it a flag now, of 1. If a pointer is acquired, its flag is set to 2. That's all. On release, if the flag is 1 we free the memory immediately; otherwise we drop it. We let it leak. Let's give this flag a meaningful name. Let's call it the "Generation".

Now we need to add a mark sweep collector. We use a state machine attached to each thread to drive it, advancing a small amount each time an allocation function is called, scanning all reachable objects in generation 2 only. In a pure functional language this works, because generation 3 can’t be mutated to reference generation 2. If an object is retained after 5 or so scans, we move it into generation 3. I need to think more to get some meat on the bones here, but this feels right so far.

Traditional GCs periodically scan older generations. I propose that we never scan an older generation. Instead, the runtime action already stated above of setting the generation to 2 on acquire will automatically, and regularly in some cases, keep pulling active objects back into generation 2. In order for something in generation 3 to be unreachable, something referencing it in generation 2 must be collected. When it is collected we’ll move all outgoing gen 3 references back to gen 2.

This all seems a bit airy fairy, and frankly it is a bit. It's still a thought in progress. Somehow this also links in with the fiber library I have added. There is a serious lack of detail around the state machine in my mental model. I don't know if it'll keep up with high churn.

What I do know is that this is the first time in this project that I have felt that I might even be close to solving the problem of having memory management that scales from simple single core embedded devices all the way up to massive multicore beasts without degrading performance.

Even if it is slightly less efficient than a state of the art solution, I value consistency much more.

More to come.

Equivalence principle

This isn’t rocket science, it’s YAFL. In YAFL these two statements are considered to be equivalent:

  • let square = (x) => x * x
  • fun square(x) => x * x

If you look deep inside the AST you’ll see that there is a separate mechanism for representing let and fun objects, but that is really there to make debugging the compiler easier. Semantically, in the YAFL language, they are the same.

YAFL also supports overloading.

Think about it…. The above two statements are equivalent, and overloading is supported. Sooooo

  • let square = (x: Float32) => x * x
  • let square = (x: Float64) => x * x

it is completely valid to have two lets of the same name, and the compiler will deal with it gracefully. We do this by distinguishing loads by the receiver type, i.e. what the recipient of an expression expects can be a factor in finding which ‘square’ to load.

This goes one step further, as it’s not just to do with lambdas and functions. The following is also valid:

  • let anumber = 100i32
  • let anumber = 100i64
  • fun square(x: Int32) => x * x
  • fun main() => square(anumber)

Because of the context in which ‘anumber’ is used, it is clear that we are referring to the first instance. Just like a function call context with parameters can make it clear which lambda type we are looking for when loading ‘square’.

There is a symmetry here that just works. But it does start to mess with my mind when generics come into the picture, and that comes next.

YAFL generics, again

I have been stuck in a thought loop. In Rust traits provide generic constraints and they provide the equivalent of class/interface in other languages. The distinction is in how you use them. Traits are very similar to the Haskell concept of class, which has nothing to do with classes as you or I know them. And class/interfaces as we know them from C++/C#/Java etc have nothing to do with traits.

For my first attempt at formalising the grammar around generics I went for the Rust model of traits and trait implementations, which I kept separate from the OO class/interface model I borrowed from C#/Java. This still left me feeling uneasy. Rust managed to cross over these concepts. Should I try harder, or is it OK that I have two generic domains?

Then I went back to the origins of what a class means, where the original concept in Scheme was as a kind of module. I thought further that really it’s got a lot in common with modules, but with additional parameterised state. But it is still a set of functions in a scope. And I thought about Kotlin’s extension methods, and how they are used for passing in domain APIs to code blocks. Scope, state, modules. Generics. All coming together.

At this point my head was starting to hurt, until something suddenly struck me. The fundamental concept I was seeing here is that the C#/Java concept of class/interface is equivalent to the Rust concept of trait, IFF the class has a zero parameter constructor. In that case, it might as well be a singleton, and it might as well be implicitly in scope.

Suddenly the pieces are falling into place. Let’s play.

# Interface defines some functions that can operate on T
interface Addable<T> {
    fun `+`(a: T, b: T): T
}

# Zero param constructable class implements the functions
class Int32Addable() : Addable<Int32> {
    fun `+`(a: Int32, b: Int32): Int32 = plusFunction(a, b)
}

# This function can add anything if there is an implementation
# of Addable available at the call site
fun addSomething<T>(a: T, b: T): T where Addable<T> = a + b

# Or we could construct an instance of Int32Addable and pass
# it in as an object.
fun addSomething<T>(a: T, b: T, impl: Addable<T>): T = impl.`+`(a, b)

fun main() {
    # These two calls are functionally equivalent, but
    # semantically and in terms of runtime code generation
    # are passing functionality in different ways.
    print(addSomething(1, 2))
    print(addSomething(1, 2, Int32Addable()))
}

This isn't everything, not by a long shot. It looks nice, but I have just re-introduced the diamond inheritance problem that Rust so elegantly avoided. So it's still a work in progress.

VPN to home

I have gigabit internet. It has long been a dream and so when it arrived I pounced at the opportunity. There was unfortunately a drawback, in that the provider uses CGNAT. Carrier Grade NAT is NAT as we all know it, but on an industrial scale. It’s used mostly by the mobile operators to reduce the number of IP addresses they need to own, but fixed line operators are getting in on the act. The big drawback if you run a home VPN server is that it wrecks dynamic DNS solutions.

I pay extra for a static IP address. There are technical solutions that shouldn't require this, but they involve effort and hacking around with software that I just don't want to do. Anyway, it doesn't cost much to have a static IP. After the ISP asked me a few questions to make sure I wasn't just chasing something I didn't understand, it was set up within hours.

My router is a Netgear Nighthawk R7000 with a builtin VPN server. Unfortunately it's a bit old, and not quite as secure as the people that maintain Linux packages would like, and so by default they don't let me connect my Linux laptop to it. Most advice you find will say "use a larger key". Well, I can try petitioning Netgear, but I doubt that they'll update the firmware, so I am stuck trying to convince OpenVPN to let me connect.

Well, here we go again. 100s of forums and comments later, I finally have the magic incantation. Edit the file ‘client2.ovpn’ that you obtained from the router and add the following (replacing the redacted IP of course):

tls-cipher "DEFAULT:@SECLEVEL=0"
tls-version-min 1.0
route-gateway 192.168.xx.1
data-ciphers AES-128-CBC

I’m not giving much away with that IP, after all it’s the same one most of us have. I just randomise the penultimate digits to avoid clash with whatever public network I happen to be going over.

Simply put, this is telling OpenSSL (the underlying library used by OpenVPN for encryption) to allow lower security levels, and telling OpenVPN where to route packets. The last line is the old 'cipher' option under its new name: OpenVPN 2.5 deprecated 'cipher' in favour of 'data-ciphers'. Just do it, it works.

I don’t need to do any of this on any other platform. I guess Linux is that good.

Linux encourages more reading. Reading is good.

Sharing a folder with VMs

I am sure this used to be easier. Oh yes, I used to use a Mac with Parallels for virtualisation, it was easier.

I have my Fedora 38 host using virt-manager to run a Windows 10 guest under QEMU. It's pretty slick, but I can't yet transfer files between the guest and host.

Looking in the virt-manager devices I see that I can add ‘virtiofs’ for sharing a host folder. Perfect, let’s try that. It needs shared memory, OK I can switch that on. Still not working, oh, it doesn’t work with Windows at all.

Next I read about Spice WebDAV. So, I add the new spice channel ‘org.spice-space.webdav.0’ and install the guest tools for spice webdav. I have read that it shares the folder ‘Public’ by default, which is fine by me. But still nothing in Windows.

With frustration starting to creep in I go old school and decide to setup a Samba share on the internal virtual network. Does it work? Does it heck!… I have lost a lot of hair over this, so let’s just accept that there is a magic incantation that works. I read so many posts, and so many rants, and each one had a piece of the puzzle. Frankly this post will be a piece of somebody else’s puzzle because every deployment is different.

Before I do the grand reveal, the key bits of information that finally unblocked it for me were:

  1. You cannot use ‘guest ok = yes’ with ‘valid users’ (note ‘guest ok’ and ‘public’ mean the same)
  2. For SELinux to work, you need to give extra permission to the folder like so ‘chcon -t samba_share_t Shared’.
  3. Restrict Samba to use the internal VM network, which in my case is virbr0.
  4. Punch a hole through the firewall so that Samba can be connected to via the internal VM network like so:
    firewall-cmd --get-active-zones
    sudo firewall-cmd --permanent --zone=libvirt --add-service=samba
    sudo firewall-cmd --reload

And now with no further ado, here is the magic incantation I finally used in ‘/etc/samba/smb.conf’.

[global]
        workgroup = SAMBA
        map to guest = bad user
        bind interfaces only = yes
        interfaces = lo virbr0
        guest account = myuser

[Shared]
        comment = Shared
        path = /home/myuser/Shared
        security = share
        read only = no
        browseable = yes
        writable = yes
        map to guest = bad user

        public = yes
        guest ok = yes
        guest only = yes
        force user = myuser
        force group = myuser
        force create mode = 0644
        force directory mode = 0755

Linux is an education.

Thoughts on generics in YAFL

I have had to take a break from YAFL development for the last month due to work commitments. For my own sanity, I have a clear priority ladder of family, work, hobbies, in that order. But now things are starting to lighten up, the clouds are parting, and I need to start collecting my thoughts again.

Before I go into the generics discussion I had one small thought about constructors. Currently I have gone for an approach similar to Kotlin, where the class is effectively also a function of the same name, which is the constructor. I want to move that to be a static member function named 'new' instead. For me it just provides more clarity, but so far I have avoided supporting static member functions. I think I still will avoid them, but with a caveat that'll become clear as we dive into generics.

The model I want to follow has its roots in Haskell and Rust, which have similar concepts referenced by the ‘class’ and ‘trait’ keywords respectively. The basic idea is that you can declare a set of well defined functions in a named group, and a type is said to be generically compatible with that named function group if a specialised set of implementations is available.

I'll take that concept, but not exactly as defined in these languages. In Rust, for example, the 'trait' concept is conflated with the 'interface' concept found in Java/Kotlin/C#, which makes sense for the Rust memory model, but is harder to fit into languages that hide memory management from the programmer, like Java/Python etc. That aside though, I do take issue with the idea that the same named thing can be thought of both as a type in the strict sense and as a generic constraint. These, to me, are substantially different ideas.

YAFL already has ‘interface’ and ‘class’ keywords that behave in a similar way to Java/Kotlin/C#, and I am happy with that. I don’t want to force anything in there, so the generics idea will be kept separate, at least for now whilst we get this first implementation sorted.

Now it might be best to show some ideas, and explain what I am aiming for. The syntax will borrow from Rust and Kotlin:

trait Numeric<T> {
    let T::MAX: T
    let T::MIN: T
    fun `+`(l: T, r: T): T
    fun `-`(l: T, r: T): T
}

impl Numeric<Int32> {
    let Int32::MAX = 2147483647
    let Int32::MIN = -2147483648
    fun `+`(l: Int32, r: Int32): Int32 => someAddThingy(l, r)
    fun `-`(l: Int32, r: Int32): Int32 => someSubThingy(l, r)
}

fun processNumber<T>(a: T, b: T): T where Numeric<T> => a + T::MAX

‘::’ means scoped by. It’s a syntax used in Rust and C++ that I like, and used for module scoping. Here I propose to use it for type scoping as well, so we have MAX scoped by the type T. In the function we require that the type T complies with the Numeric trait, which it does. The function can be proven to be correct for any T that has an implementation of Numeric.

Anyone familiar with Rust will be looking at that syntax with furrowed brows, wondering why I put the T in angle brackets next to the trait name. It's because this isn't about one generic type. Try this on for size:

trait Convertible<X, Y> {
    fun X::new(r: Y): X
}

impl Convertible<Int32, String> {
    fun Int32::new(r: String): Int32 => parseToInt(r)
}

fun addToString<X>(a: X, b: String): X where Convertible<X, String>, Numeric<X> => a + X::new(b)

That's nice. We've effectively added a new constructor to 'Int32' that takes a String. We've required that X is convertible from 'String' and that it's a numeric type so that we can add. It's all built on the idea of a trait that defines the relationship between multiple types.

Back to my opening statement, I don’t want to support static members on classes. This trait syntax clearly has done something semantically similar, so we’ll utilise that. The primary constructor of a class will be defined as ‘new’ in an anonymous impl block, and all other constructors can be explicitly declared by the programmer in the same way like so:

impl {
    fun MyClass::new(a: Int32, b: Float32): MyClass => MyClass::new(1, 2, 3, a, b)
}

It looks like Rust, and as I said before it borrows a lot, unashamedly so, as Rust is brilliant. But I also borrow much from Kotlin, C# and even a little from Python. It's a chimera.

There is one more thing though. YAFL tries to infer a lot in order to reduce code bloat, so where possible I’d like for it to avoid the need to declare trait constraints on generic parameters. Ideally, it should note the operations used in a function body, and locate trait definitions in the imported namespaces that uniquely match. If found, we can imply the constraints. So:

fun addToString<X>(a: X, b: String) => a + X::new(b)

Implicitly, we can see that X is expected to have a function 'new' that takes a parameter of type String. In our imports there is a trait defined for this, uniquely. The result X is added to 'a', another X. There is another trait defined for maths operations, so we can imply that as the next constraint. Ultimately, I believe we can remove much of the boilerplate, whilst retaining good code clarity.

I’ll stop here, and mull over this a bit. So far though, this feels good, and keeps the concepts simple but powerful.

Virt-Manager running Windows 10

My employer has provided me with a lovely new laptop, a completely blank slate, ready for me to install an OS and set it up. I have gone with Fedora 38 Beta because of the maturity of Fedora in general, and a general masochistic desire for pain.

Some of my work is on Windows 10, for which I have a virtual machine that I copied over from the old laptop. Here’s where my pain begins.

Permissions – Selinux as setup on Fedora won’t let KVM touch the hard disk image I have copied over. Try “chcon -t virt_content_t -R PATH/TO/VMs”.

You want multiple VMs to talk to each other without the IP address changing all the time, and they must be able to access the internet no matter what network you are actually connected to – Use the bridge. It's probably already set up as "virbr0". That's all, just use bridged networking. Each machine will be allocated an IP, on next startup, that will remain with it for a very long time.

Display and fractional scaling issues – Set “Scale Display” to “Never” and enable “Auto resize VM with window”. Set video to QXL and then edit the XML to double each of the RAM values. Make sure you have added a spice channel, target name “com.redhat.spice.0”. Install spice tools on the guest machine. Set DPI to 200% in the guest. Whatever fractional scaling you use on your Linux desktop, the guest will be appropriately scaled.

Sleep mode doesn’t work or isn’t present – This is a pain, because suspend to RAM basically doesn’t work, it crashes on resume. Suspend to disk on the other hand does work, with coaxing, and there’s no button in the UI to initiate it. Edit the XML and ensure “suspend-to-disk” is enabled. Install the virtio client tools in the guest, they can be downloaded from the Proxmox web site. Add a new spice channel for the QEMU guest agent. Restart the guest. You can now hibernate the VM from the host with this command line “virsh -c qemu:///system dompmsuspend win10dev disk”.

Naturally much of this will work differently, or not at all for you, but here it is as future notes for me, and hopefully the start of finding your solutions.

We don’t use Linux because it’s easy.