The Ultimate Breakdown of Kotlin Coroutines. Part 1. State-Machines and Suspension.

Introduction

Throughout 2020 I wrote a document on coroutines codegen, which aimed mostly at my colleagues at JetBrains and Google to help them understand the pitfalls of the codegen and which design decisions we made. These blog posts are a simplified and trimmed down version of the documentation for a larger audience of experienced Kotlin programmers, who just want to know how the compiler turns sequential code into suspendable and become more proficient at their jobs. The blogposts will not cover kotlinx.coroutines internals. However, I think, with the knowledge of coroutines internals, it will be easier to understand, for example, how Flow's backpressure works and why it is essential for Flow operators to be inline with crossinline lambda parameter (spoilers: this is an optimization).

I decided to write the blogposts now, before Kotlin 1.5 release, since Kotlin 1.5 brings new JVM_IR back-end with a lot of improvements in code generations, tail-call optimization of suspend functions and so many bug fixes, that I lost count.

The blogposts are JVM-centric, that means it explains how things work in JVM back-end since this is the area I am most familiar with and since in JVM, there are guaranties of backward compatibility, which the compiler shall obey in both so-called "Old JVM" back-end, and in the new JVM_IR one. The new back-end's naming can differ from the official documentation: the document uses the "IR" suffix, while the official documentation omits it.

The things I omitted in this breakdown, but present in the documentation are compiler internals, and the shape of generated classes, since the last thing I want is to confuse my readers and overwhelm them with the information, which is helpful for the compiler writers, but not necessarily for language users.

Throughout the series term "coroutine" will represent either a suspend lambda or a suspend function, which is different from the usual definition of coroutines - something like a lightweight thread. The document reuses the term since "suspend lambda or function" is wordy, and when it requires the typical definition, I will say explicitly "a coroutine in a broad sense."

The last couple of words before I get to the meat - I will use a lot of code to explain different concepts, most of it actually suspends, and it might get overwhelming really fast. I will try my best to make the example short and digestible, however, coroutines is a complex and intermingled topic, and it is hard to kill the three-headed hydra one head at a time, with its heads, namely suspend, resume and state-machine, supporting each other. I hope, I've done a fine job of explaining the concepts, but if something is unclear, feel free to point to the part, which needs to be reworked.

With all the introductory note out of the way, let me begin by introducing suspend lambdas.

Suspend Lambda

Suspend lambda is an example of a coroutine, and the compiler turns ordinary, sequential code inside the lambda into suspendable. The example shows a simple suspend lambda:

suspend fun dummy() {}

suspend fun main() {
    val lambda: suspend () -> Unit = {
       dummy()
       println(1)
       dummy()
       println(2)
    }
    lambda()
}

which, upon running, will print 1 and 2, as expected.

One can call a suspend function only from other suspend function or suspend lambda, but it can call ordinary, non-suspendable functions. For example, both dummy and println are used only inside the lambda. Because one cannot call suspendable functions from ordinary, we can imagine two worlds: suspendable and ordinary. Alternatively, one can consider them as two different colors, and we color the program by using the "suspend" modifier.

The lambda, creatively named lambda, contains two suspend calls (dummy) and one from the main function to the lambda itself, but there is no suspension. Let us add it:

import kotlin.coroutines.*

var c: Continuation<Unit>? = null

suspend fun suspendMe() = suspendCoroutine<Unit> { continuation ->
    println("Suspended")
    c = continuation
}

suspend fun main() {
    val lambda: suspend () -> Unit = {
        suspendMe()
        println(1)
        suspendMe()
        println(2)
    }
    lambda()
}

Now, when we run the code, it prints Suspended and nothing else; it does not even finish the execution of the program, since lambda is, in fact, suspended, and it suspends suspend fun main as well.

To fix the issue with the suspension of main, we need to cross a boundary between suspendable and ordinary worlds and make main ordinary, so, when it starts a coroutine, and the coroutine suspends, main does not. Since one cannot call a suspendable function from an ordinary one, there are special functions, so-called coroutine builders, whose sole purpose is to create a coroutine, run it, and when it suspends, return execution to the caller. Other than that, they act like other ordinary functions. Let's name ours, I don't know, builder:

fun builder(c: suspend () -> Unit) {
    c.startCoroutine(object: Continuation<Unit> {
        override val context = EmptyCoroutineContext
        override fun resumeWith(result: Result<Unit>) {
            result.getOrThrow()
        }
    })
}

A separate section explains the exact mechanism of starting a coroutine (in a broad sense) and writing their builders. For now, consider builder as a boilerplate to cross the worlds.

Now, when we change main to use the builder and not suspend itself

fun main() {
    val lambda: suspend () -> Unit = {
        suspendMe()
        println(1)
        suspendMe()
        println(2)
    }
    builder {
        lambda()
    }
}

and then run the example, it will print expected Suspended, but this time it will exit the program.

Additionally, when we change main to resume the lambda

import kotlin.coroutines.*

var c: Continuation<Unit>? = null

suspend fun suspendMe() = suspendCoroutine<Unit> { continuation ->
    println("Suspended")
    c = continuation
}

fun builder(c: suspend () -> Unit) {
    c.startCoroutine(object: Continuation<Unit> {
        override val context = EmptyCoroutineContext
        override fun resumeWith(result: Result<Unit>) {
            result.getOrThrow()
        }
    })
}

fun main() {
    val lambda: suspend () -> Unit = {
        suspendMe()
        println(1)
        suspendMe()
        println(2)
    }
    builder {
        lambda()
    }
    c?.resume(Unit)
}

it will print

Suspended
1
Suspended

lambda is resumed and then suspended once more. If we add a couple more c?.resume(Unit)

fun main() {
    val lambda: suspend () -> Unit = {
        suspendMe()
        println(1)
        suspendMe()
        println(2)
    }
    builder {
        lambda()
    }
    c?.resume(Unit)
    c?.resume(Unit)
    c?.resume(Unit)
}

we will get

Suspended
1
Suspended
2
Exception in thread "main" java.lang.IllegalStateException: Already resumed

The last line is what we get when we try to resume a finished continuation.

A lot happens in this little example. Unfortunately, I cannot break everything down in just one blogpost, so, let's stick with state-machines for the rest of it.

State-Machine

The compiler turns sequential code into suspendable by using state-machines. It distributes suspending calls between states in a state-machine. The relationship between the calls and the states is almost one-to-one: each call gets a state, but not each state gets a call. The state ends with the call, and the compiler places all instructions preceding the call in the same state before the call. It places all instructions after the last call before (implicit) return statement in a separate state.

For example, having

dummy()
println(1)
dummy()
println(2)

the compiler splits the code in the following way

==========
dummy()
----------
println(1)
dummy()
----------
println(2)
==========

where function boundaries are represented by ========== and state boundaries are represented by ----------. The compiler after splitting the function generates the following code (simplified for now):

val $result: Any? = null
when(this.label) {
    0 -> {
        this.label = 1
        $result = dummy(this)
        if ($result == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
        goto 1
    }
    1 -> {
        println(1)
        this.label = 2
        $result = dummy(this)
        if ($result == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
        goto 2
    }
    2 -> {
        println(2)
        return Unit
    }
    else -> {
        throw IllegalStateException("call to 'resume' before 'invoke' with coroutine")
    }
}

Then it puts the state-machine inside the invokeSuspend function. Thus, in addition to the usual for lambda captured parameters, <init> and invoke, we have label field and invokeSuspend function.

At the beginning of the function label's value is 0. Before the call, we set it to 1. During the call, two things can happen:

dummy returns a result, in this case, Unit. When this happens, execution continues as if it was sequential code, jumping to the next state.
dummy suspends. When a suspending function suspends, it returns the COROUTINE_SUSPENDED marker. So, if dummy suspends, the caller also suspends by returning the same COROUTINE_SUSPENDED. Furthermore, all the suspend functions in call stack suspend, returning COROUTINE_SUSPEND, until we reach the coroutine builder function, which just returns.

Upon resume, invokeSuspend is called again, but this time label is 1, so the execution jumps directly to the second state, and the caller does not execute the first call to dummy again. This way, the lambda's execution can be suspended and resumed. Thus, the lambda is turned into a coroutine, which is, by definition, is a suspendable unit of code.

That is the reason why we need to turn linear code into a state machine.

On a closing note, the state machine should be flat; in other words, there should be no state-machine inside a state-machine state. Otherwise, inner state-machine states will rewrite label, breaking the whole suspend-resume machinery and leading to weird behavior, ranging from ClassCastException to infinite loops. Similar buggy behavior happens when several suspending calls are in one state: when the first call suspends, and then the execution resumes, skipping all the remaining code in the state. Both these bugs were quite frequent in the early days of coroutines inlining.

It the following post I will explain continuation passing style, which is the reason to use coroutine builders to start a coroutine (in a broad sence, of course), and the resumption process.

CC&M