A programming and hobby blog.

Futures Should Be Composable

Recently, and for the first time in my career, I have had the opportunity to implement a large, highly concurrent application. After seeing a lot of interest in Java’s new Virtual Threads, I decided to try them out. I’ve done a decent amount of concurrent programming, so I felt it would be relatively easy to get the project started and grow it. From what I have encountered, though, Java’s Future abstraction is not up to the task.

Without going into too much backstory about what I evaluated, I made a bet: Futures and blocking are the right abstraction. This means:

Avoid any sort of callback hell. Generally, blocking get() calls are the right way. This is the same bet that Golang makes, with blocking being the norm. Lean into the scheduler to make the code work.
Skip flow control. Reactive Java libraries, with types like Mono and Flux, are workarounds for the Java problems of a decade ago. The stack traces are impossible to understand, and the exception handling in general doesn’t mesh with the rest of the language. They served their purpose, but avoid them.
Avoid API dependence on CompletableFuture. This class is bloated to the max. Every time I want to call a method I need to look at the docs, get mad that the code has nearly zero Javadoc, look up CompletionStage for the specification, and finally scratch my head over how it’s subtly different from the nearly identically named methods nearby. Also, CompletionStage is practically impossible to implement, and is missing all the useful methods of Future.

Thus, I decided to make Future<T> the standard return type and interface of choice for my code.

Threads and Futures

In Java, Futures were designed around a thread pool. The idea was that work would be scheduled onto an Executor(Service), which returns a Future whose result is eventually populated by another thread.
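A minimal sketch of that original model, assuming a plain fixed-size pool (the class and method names are illustrative, not from the original): submit() hands work to the pool and returns a Future right away, and get() blocks the caller until a pool thread fills in the result. The try-with-resources on the ExecutorService requires Java 19 or later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubmitDemo {
    // Runs some illustrative work on a pool thread and blocks until it is done.
    static String runOnPool() throws InterruptedException, ExecutionException {
        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (ExecutorService pool = Executors.newFixedThreadPool(2)) {
            Callable<String> work = () -> "computed on " + Thread.currentThread().getName();
            Future<String> future = pool.submit(work); // returns immediately
            return future.get();                       // blocks until a pool thread sets the result
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnPool());
    }
}
```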

Unlike Futures or Promises in other languages (notably JavaScript), Futures in Java had the concept of being associated with a thread. To see why this is interesting, take a look at this method on the interface:

public interface Future<V> {

    /**
     * Attempts to cancel execution of this task.  ...
     */
    boolean cancel(boolean mayInterruptIfRunning);
}

Two things are interesting here:

Futures can be cancelled. Other languages and libraries often do not support cancellation. This is a special feature.
Interruption. Java has a special feature of Threads called interruption, which allows one thread to ask a sleeping or blocked thread to wake up, and possibly stop waiting for some event.

This is a very useful thing to have, as it means we don’t have to commit to doing the work in the thread should things change. As far as I have seen, almost no other programming model has this as a core part. It would also be challenging to implement yourself. As a quick thought experiment, think about how you would implement it. Building it on something as basic as pthread_cond_wait and pthread_cond_signal requires holding a lock, which Java’s implementation doesn’t! How did they do it? (See the link above for an explanation of the magic.)

Thus, Futures, at least when introduced originally, strongly implied attachment to a thread.
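To see the thread attachment in action, here is a small sketch (names are illustrative) that submits a task which blocks in Thread.sleep(), then calls cancel(true) from the submitting thread. With the JDK’s standard executors, the returned Future is a FutureTask, so mayInterruptIfRunning reaches the runner thread as an interrupt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptDemo {
    // Returns true if cancel(true) on the Future interrupted the thread running the task.
    static boolean cancelInterruptsRunner() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = pool.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000);      // stand-in for slow, blocking work
                } catch (InterruptedException e) {
                    interrupted.set(true);     // woken up early by the canceller
                } finally {
                    finished.countDown();
                }
            });
            started.await();                   // make sure the task is actually running
            future.cancel(true);               // mayInterruptIfRunning = true
            finished.await();
            return interrupted.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(cancelInterruptsRunner()); // true
    }
}
```

The task catches the InterruptedException almost immediately instead of sleeping for the full minute.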

Threadless Futures

As of Java 25, there are three main implementations of Futures in the JDK:

FutureTask. This is both a Runnable and a Future, and is intended to be extended. It holds a reference to the “runner” Thread, which is set when the task starts running and cleared when it completes. Cancelling this Future can interrupt the runner Thread.
CompletableFuture. While I think the implementation is way overengineered, it is the most powerful of the three. It is full featured, and has a solid, reliable way to chain work together.
ForkJoinTask. This happens to be a Future, but I haven’t seen anyone seriously use it as one. I mention it here for completeness, but it’s meant more for fork-join style work than for complex, heterogeneous work items.

CompletableFuture is the main implementation of interest, since it is capable of building a general DAG of computation.

Consider what this means. CompletableFuture, hereafter “CF”, is a general purpose computation tool. The dependency graph between Future stages is built dynamically, meaning the whole graph is not known ahead of time. Each CF can be used to notify multiple downstream CFs, and two CFs can be used to complete a single downstream CF. The key takeaway here is that no individual CF knows which other CFs depend on it.
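A small sketch of such a graph, with made-up stages: one source fans out to two dependents, and thenCombine fans them back in. Each stage only registers a callback on the stages it depends on; none of them records which consumers still want its result.

```java
import java.util.concurrent.CompletableFuture;

final class DagDemo {
    static String buildDag() {
        CompletableFuture<Integer> source = new CompletableFuture<>();
        // Fan-out: two independent dependents of the same source.
        CompletableFuture<Integer> doubled = source.thenApply(x -> x * 2);
        CompletableFuture<Integer> squared = source.thenApply(x -> x * x);
        // Fan-in: one stage that needs both results.
        CompletableFuture<String> combined =
                doubled.thenCombine(squared, (d, s) -> d + "," + s);
        // source only holds an internal stack of pending callbacks; it has no
        // notion of which consumers still care about its value.
        source.complete(3);
        return combined.join();
    }

    public static void main(String[] args) {
        System.out.println(buildDag()); // 6,9
    }
}
```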

A consequence of this design decision is that cancellation doesn’t have a clear meaning for CFs. What does it mean for a CF to be cancelled with the mayInterruptIfRunning bit set? The CF may be a combination of many other CFs. There may be no thread at all attempting to fulfill a particular CF. The linkage between a CF and a Thread is weakened. As a result, cancelling a CompletableFuture does not interrupt any underlying thread, because there may not even be one.
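A sketch that checks this claim, assuming the work is blocked on a latch standing in for a slow call (all names are illustrative). The CompletableFuture.cancel Javadoc says mayInterruptIfRunning has no effect because interrupts are not used to control processing, so the supplier keeps blocking after the future reports itself as cancelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

final class CfCancelDemo {
    // Returns true if cancel(true) on a CompletableFuture interrupted the supplier's thread.
    static boolean cancelInterruptsSupplier() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> cf = CompletableFuture.supplyAsync(() -> {
                started.countDown();
                try {
                    release.await();          // blocking work that only ends when released
                } catch (InterruptedException e) {
                    interrupted.set(true);
                } finally {
                    finished.countDown();
                }
                return "a result nobody will read";
            }, pool);
            started.await();
            cf.cancel(true);                  // cf is now cancelled...
            release.countDown();              // ...but the supplier was still blocked
            finished.await();
            return interrupted.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(cancelInterruptsSupplier()); // false
    }
}
```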

Cancellation and Bi-directionality

Is giving up cancellation that big of a deal? Well, maybe. In the world that CF was born into, threads may not have played as big a role. CF is decidedly push based, despite its predecessor being pull based. As computations complete, they pop their Treiber stack of dependent CFs and fulfill them. Each downstream CF in the DAG is completed, usually on the thread that is completing the current CF. (As an aside, this is one of the reasons there are a jillion overloads in CF; they needed a way to schedule the downstream “callback” work, potentially on a different thread.) Keeping track of which thread is doing the async work may not have been that valuable. Since the idea of a thread working hard to fulfill a future is gone, where’s the need to interrupt the thread?
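A sketch of the push model (names are illustrative): a callback registered with thenApply on an incomplete CF runs on whichever thread completes the source, while the thenApplyAsync overload that takes an Executor hands the same callback to that executor instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

final class PushDemo {
    // Registers a callback on an incomplete CF, completes the CF from a thread named
    // "completer", and returns the name of the thread that ran the callback.
    static String callbackThread(boolean async) throws InterruptedException {
        CompletableFuture<String> source = new CompletableFuture<>();
        AtomicReference<String> ranOn = new AtomicReference<>();
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> new Thread(r, "pool-worker"));
        try {
            CompletableFuture<Integer> dependent;
            if (async) {
                dependent = source.thenApplyAsync(s -> record(ranOn, s), pool);
            } else {
                dependent = source.thenApply(s -> record(ranOn, s));
            }
            Thread completer = new Thread(() -> source.complete("done"), "completer");
            completer.start();   // completing the source pushes work to its dependents
            completer.join();
            dependent.join();
            return ranOn.get();
        } finally {
            pool.shutdown();
        }
    }

    private static int record(AtomicReference<String> ranOn, String s) {
        ranOn.set(Thread.currentThread().getName());
        return s.length();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(callbackThread(false)); // completer
        System.out.println(callbackThread(true));  // pool-worker
    }
}
```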

Enter Virtual Threads. It’s now possible to have as many ~~Goroutines~~ ~~Green Threads~~ ~~M:N Threads~~ Virtual Threads as you want. They can all block without consequence on CPU or IO bound work, patiently waiting to fulfill a Future. The idea and value of cancellation now seem much more tenable.
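To make “as many as you want” concrete, a sketch that starts ten thousand virtual threads, each of which simply blocks for a moment and returns its index, and then collects the results with plain blocking get() calls. The task count and sleep duration are arbitrary choices for illustration, not measurements from the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ManyBlockers {
    // Starts `tasks` virtual threads that each block briefly, then sums their results.
    static long blockInParallel(int tasks) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>(tasks);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                int id = i;
                futures.add(executor.submit(() -> {
                    Thread.sleep(50); // parks the virtual thread, not an OS thread
                    return id;
                }));
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get();       // plain blocking get(), no callbacks
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(blockInParallel(10_000)); // 49995000
    }
}
```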

CompletableFuture and Chaining

Let’s look at how CFs chain together, in a simple, unidirectional chain.

When CF1 completes, it notifies (and completes) CF2. When CF2 completes, it notifies CF3. Control flow only goes one direction, from each CF to the next. Consider the following snippet of code:

// Build the HTTP Request
CompletableFuture<HttpRequest> requestFuture = 
    CompletableFuture.completedFuture(request);

// Issue the request
CompletableFuture<byte[]> httpResponse = 
    requestFuture.thenComposeAsync(
        req -> fetchHttp(req), executor);

// Validate and convert the response
CompletableFuture<MyObject> parsedResult =
    httpResponse.thenComposeAsync(
        rawJson -> validateAndConvert(rawJson), executor);

System.out.println(parsedResult.get());

Each stage depends on the previous one.

Why Cancellation Matters

Using the snippet above, instead of printing the result, suppose the parsedResult CF is returned to a caller. Also suppose that the caller is an RPC, and the RPC is cancelled for whatever reason. We want to cancel the work being done to avoid consuming memory and threads. How well does this work?

Despite CFs being chained together, they are only chained in one direction! No matter who cancels the CompletableFuture<MyObject> parsedResult object, doing so won’t stop the HTTP request. The parsing future, which has yet to be assigned a thread, has no way to indicate that the upstream result is no longer needed. In a sense, the dependency is a singly-linked list, with no way to get back to the original CF.
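A minimal reproduction, with a never-completed CompletableFuture standing in for the in-flight HTTP call and byte-array length standing in for validateAndConvert() (both are assumptions for illustration):

```java
import java.util.concurrent.CompletableFuture;

final class OneWayDemo {
    record Outcome(boolean parsedCancelled, boolean httpCancelled, boolean httpDone) {}

    static Outcome cancelDownstreamOnly() {
        // Stand-in for the in-flight HTTP call; nothing ever completes it here.
        CompletableFuture<byte[]> httpResponse = new CompletableFuture<>();
        // Stand-in for validateAndConvert(): "parse" by taking the length.
        CompletableFuture<Integer> parsedResult = httpResponse.thenApply(raw -> raw.length);

        parsedResult.cancel(true); // e.g. the RPC caller went away

        return new Outcome(parsedResult.isCancelled(),
                           httpResponse.isCancelled(),
                           httpResponse.isDone());
    }

    public static void main(String[] args) {
        System.out.println(cancelDownstreamOnly());
        // Outcome[parsedCancelled=true, httpCancelled=false, httpDone=false]
    }
}
```

Cancelling parsedResult only completes parsedResult; the upstream future is neither cancelled nor completed, so whatever is producing it keeps running.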

You might suggest that this linkage be added, so that the CF class propagates cancellation of a downstream CF to its upstream. However, this is where the DAG property bites us. Consider a perfectly legal CF chain in which one upstream CF feeds two downstream CFs.

Cancelling one of the downstream CFs doesn’t mean the other one should be cancelled, or that their shared upstream should be.
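A sketch of why naive propagation is wrong, using a hypothetical propagateCancel() helper that simply cancels the upstream whenever a downstream is cancelled. Two stages, one per hypothetical RPC, share one upstream; cancelling RPC A’s stage cancels the shared upstream, which in turn fails RPC B’s stage even though B still wanted the result.

```java
import java.util.concurrent.CompletableFuture;

final class NaivePropagationDemo {
    // Hypothetical helper: when `downstream` is cancelled, cancel `upstream` too.
    static void propagateCancel(CompletableFuture<?> downstream, CompletableFuture<?> upstream) {
        downstream.whenComplete((value, error) -> {
            if (downstream.isCancelled()) {
                upstream.cancel(true);
            }
        });
    }

    // Returns true if RPC B's stage is unaffected when only RPC A goes away.
    static boolean siblingSurvives() {
        CompletableFuture<String> shared = new CompletableFuture<>();
        CompletableFuture<Integer> forRpcA = shared.thenApply(String::length);
        CompletableFuture<String> forRpcB = shared.thenApply(String::toUpperCase);
        propagateCancel(forRpcA, shared);
        propagateCancel(forRpcB, shared);

        forRpcA.cancel(true); // only RPC A went away...
        // ...but `shared` is now cancelled, so forRpcB has failed too.
        return !forRpcB.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println(siblingSurvives()); // false
    }
}
```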

Without properly cancelling futures, there is a risk of consuming limited resources. While it may be okay to do a little extra work if the RPC client cancels their request, it’s not okay to consume all threads and connection pools on responses that will never be seen. (In my own work, we saw this result in an OOM due to a runaway executor that kept adding threads.) Cancellation matters for stability.

Bi-directionality

When thinking through a solution to this problem, it becomes obvious that it can’t be solved by just adding a cancel listener to each CF. Someone will eventually forget to add it and drop the link. The implementation of CF, and the general interface contract of Future, simply don’t support it. Futures do one thing well: defer execution. However, this is not enough. The true problem is that only results and exceptions flow from one future to another, but not the consumer’s interest in the result.

I have to say I unfairly judged Reactive Java here, with its fully featured cancellation and flow control mechanics. Originally I had written it off because flow control is seldom useful, and mostly between systems rather than inside them. That said, flow control is another “consumer interest” signal, like cancellation. I guess the implementers saw that cancellation and flow control unify nicely into a “subscription”, and added both. I still maintain that flow control, with its request(n) call, is overkill, but I can clearly see the value of cancellation propagation.
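For comparison, the JDK ships the same Reactive Streams interfaces as java.util.concurrent.Flow (Java 9+), where a single Flow.Subscription carries both signals: request(n) for flow control and cancel() for cancellation. A small sketch using SubmissionPublisher, with an illustrative subscriber that takes one item and then withdraws its interest:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class SubscriptionDemo {
    // Subscriber that asks for one item, then withdraws its interest.
    static final class FirstItemOnly implements Flow.Subscriber<String> {
        final List<String> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);          // flow control: "I can take one item"
        }

        @Override public void onNext(String item) {
            received.add(item);
            subscription.cancel(); // cancellation: "I don't need anything else"
            done.countDown();
        }

        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<String> takeFirst(List<String> items) throws InterruptedException {
        FirstItemOnly subscriber = new FirstItemOnly();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            items.forEach(publisher::submit);
            subscriber.done.await();   // wait for the subscriber before closing
        }
        return subscriber.received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(takeFirst(List.of("a", "b", "c"))); // [a]
    }
}
```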

We do need bi-directionality.
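One possible shape for that backward signal, sketched as a hypothetical SharedUpstream wrapper (not a JDK or library API): it counts how many derived stages still want the upstream result, and cancels the upstream only when the last of them is cancelled. It is deliberately minimal; a derived stage that completes normally simply keeps its interest, which is harmless because the upstream is already done by then.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical wrapper, not a JDK API: tracks how many downstream consumers still
// want an upstream result, and cancels the upstream once none do.
final class SharedUpstream<T> {
    private final CompletableFuture<T> upstream;
    private final AtomicInteger interested = new AtomicInteger();

    SharedUpstream(CompletableFuture<T> upstream) {
        this.upstream = upstream;
    }

    // Derives a new dependent stage and registers one unit of interest for it.
    <R> CompletableFuture<R> derive(Function<? super T, ? extends R> fn) {
        interested.incrementAndGet();
        CompletableFuture<R> child = upstream.thenApply(fn);
        child.whenComplete((value, error) -> {
            // Only an explicit cancel of this child withdraws its interest.
            if (child.isCancelled() && interested.decrementAndGet() == 0) {
                upstream.cancel(true);
            }
        });
        return child;
    }

    CompletableFuture<T> upstream() {
        return upstream;
    }

    public static void main(String[] args) {
        SharedUpstream<String> shared = new SharedUpstream<>(new CompletableFuture<>());
        CompletableFuture<Integer> forRpcA = shared.derive(String::length);
        CompletableFuture<String> forRpcB = shared.derive(String::toUpperCase);
        forRpcA.cancel(true);
        System.out.println(shared.upstream().isCancelled()); // false: B still wants it
        forRpcB.cancel(true);
        System.out.println(shared.upstream().isCancelled()); // true: nobody does
    }
}
```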

Composability

Given the above history and problems, I now bring my full request: Futures should be composable. CompletableFuture did a decent job of composition for downstream dependence. However, it is not enough. We need a way to formally describe the cancellation semantics of asynchronous computation. It is an error-prone pain in the ass to write this every time:

CompletableFuture<HttpRequest> requestFuture =
    CompletableFuture.completedFuture(request);

CompletableFuture<byte[]> httpResponse =
    requestFuture.thenComposeAsync(
        req -> fetchHttp(req), executor);
httpResponse.whenComplete((_, _) -> {
    if (httpResponse.isCancelled()) {
        requestFuture.cancel(true);
    }
});

CompletableFuture<MyObject> parsedResult =
    httpResponse.thenComposeAsync(
        rawJson -> validateAndConvert(rawJson), executor);
parsedResult.whenComplete((_, _) -> {
    if (parsedResult.isCancelled()) {
        httpResponse.cancel(true);
    }
});

Manually wiring cancellation is not sustainable.
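The wiring can at least be moved into one place. The sketch below is a hypothetical composeCancellable() helper, not a JDK method: it behaves like thenComposeAsync, but cancelling the returned future also cancels the upstream and the inner future that fn produced (for example, the in-flight HTTP call). It assumes each upstream has a single consumer, so it inherits exactly the DAG problem described above; it reduces the boilerplate, not the underlying design issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class CancellableCompose {
    // Hypothetical helper: like thenComposeAsync, but cancellation of the result flows
    // back to the upstream and to the inner future created by fn (single-consumer only).
    static <T, R> CompletableFuture<R> composeCancellable(
            CompletableFuture<T> upstream,
            Function<? super T, ? extends CompletableFuture<R>> fn,
            Executor executor) {
        AtomicReference<CompletableFuture<R>> inner = new AtomicReference<>();
        AtomicReference<CompletableFuture<R>> outer = new AtomicReference<>();
        CompletableFuture<R> result = upstream.thenComposeAsync(value -> {
            CompletableFuture<R> f = fn.apply(value);
            inner.set(f);
            // If the consumer cancelled while fn was running, cancel the new inner future too.
            CompletableFuture<R> o = outer.get();
            if (o != null && o.isCancelled()) {
                f.cancel(true);
            }
            return f;
        }, executor);
        outer.set(result);
        result.whenComplete((r, e) -> {
            if (result.isCancelled()) {
                upstream.cancel(true);            // stop waiting on the previous stage
                CompletableFuture<R> f = inner.get();
                if (f != null) {
                    f.cancel(true);               // stop the work fn started
                }
            }
        });
        return result;
    }
}
```

With a direct executor (Runnable::run), cancelling the result after the upstream has completed cancels the inner future, and cancelling it before the upstream completes cancels the upstream.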

Execution Context

One additional concern is how execution context is propagated. In my case, we are using gRPC. By default, gRPC Java propagates RPC cancellation and deadlines through a thread-local Context object. One idea for propagating cancellation is to wire the cancellation signal through to the root of the dependency tree. For example, if the client RPC triggered the code above but then went away, maybe only the root of the dependency chain needs to be cancelled. If the fetchHttp() call just checked the thread-local gRPC context, all the chained futures between it and the final consumer, parsedResult, could be ignored. Cancelling the root would transitively cancel all the others.

The problem here is in how CF delegates work to the executor. Each dependent stage in CF is only triggered on completion of its source CF. This means the original calling context has been lost by the time work is scheduled on the executor! To be specific, suppose that CompletableFuture<HttpRequest> requestFuture was not an immediate value, but instead had to be loaded asynchronously. When it finishes and schedules the HTTP call work, it may do so on its own thread, or it may do so on the caller’s thread. We don’t know. The original gRPC context won’t be propagated to other threads, since we don’t know how that work was scheduled. In other words, there is no reliable way to make sure that the calling context is propagated to the async work.
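A sketch of the context problem, using a plain ThreadLocal as a hypothetical stand-in for gRPC’s Context. captureAtExecute() snapshots the context of whichever thread calls execute(), which for a CF stage is the thread that completes the source; as far as I know, gRPC’s Context.currentContextExecutor() captures at that same moment. Capturing when the chain is built, on the caller’s thread, is what actually preserves the RPC’s context.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class ContextCaptureDemo {
    // Hypothetical stand-in for gRPC's thread-local Context.
    static final ThreadLocal<String> CURRENT_RPC = new ThreadLocal<>();

    // Runs each task with the given context installed, restoring the old value afterwards.
    static Executor withContext(String captured, Executor delegate) {
        return task -> delegate.execute(() -> {
            String previous = CURRENT_RPC.get();
            CURRENT_RPC.set(captured);
            try {
                task.run();
            } finally {
                CURRENT_RPC.set(previous);
            }
        });
    }

    // Captures whatever context the *scheduling* thread has when execute() is called.
    static Executor captureAtExecute(Executor delegate) {
        return task -> withContext(CURRENT_RPC.get(), delegate).execute(task);
    }

    // Builds a one-stage chain inside an "RPC", completes the source from another
    // thread, and reports which context the stage observed.
    static String observedContext(boolean captureWhenBuildingChain) throws InterruptedException {
        Executor direct = Runnable::run;
        CompletableFuture<String> source = new CompletableFuture<>();
        CompletableFuture<String> seen;
        CURRENT_RPC.set("rpc-1");
        try {
            Executor executor = captureWhenBuildingChain
                    ? withContext(CURRENT_RPC.get(), direct) // captured now, on the caller
                    : captureAtExecute(direct);              // captured later, on the completer
            seen = source.thenApplyAsync(ignored -> CURRENT_RPC.get(), executor);
        } finally {
            CURRENT_RPC.remove();
        }
        Thread completer = new Thread(() -> source.complete("loaded"));
        completer.start();
        completer.join();
        return seen.join();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observedContext(false)); // null
        System.out.println(observedContext(true));  // rpc-1
    }
}
```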

This is why we need full composition with Futures. Between cancellation, deadlines, and execution context, it’s verbose and error-prone to pass these along reliably.

For those of you designing your own languages and libraries, consider these problems carefully! CompletableFuture can be used, but it can’t be re-used. When you make your implementation, make the right thing the default, and keep custom or specialized behavior from becoming onerous.

The End Of Software Engineering: The Advent of Vibe

(Before we begin, I like the idea of robots, which we now call AI, enabling humanity’s progress. The people in this post are not the problem, it’s the incentive structure. This post is a lament, but not yet a eulogy for the software engineer’s way of life that is going away.)

My team at $currentCompany has been using Claude and other AI tools recently to build a project that is beyond my comprehension. As part of a team, I feel both a desire to help others and an accountability for when things go wrong. As I see my team lean more into AI generation of code, both of these team oriented feelings are evaporating. I feel a growing disconnect from what they are working on, and no stake in the success or failure of their project. (Agents project?) As a software romantic, I can’t help but feel emotionally torn about the consequences of AI, as it seems to offer incredible ability, but at the cost of all our dearly earned practices.

The premise of the problem is simple. We must justify our salaries, and we must have something to work towards promotion, so a new big project it is. However, how can we show enormous and lightning fast impact? As our fearless leaders have pronounced, the answer is to use AI to write our code.

What does that mean?

In my case, the answer is “writing” lots of code. Code whose authorship is not quite certain. Code which, by most humans’ reading, is distasteful. Code which does, in fact, fulfill the desire of the human who asked for it. For those of you who haven’t seen this yet, think about all the AI art you have seen over the past 2 years, and then imagine it as code. Most people I know consider AI art to be somewhere in the Uncanny Valley. So we have lots (thousands of lines a day) of code being written and checked in.

Many companies expect their software engineers to engage in code review. It’s frequently a legal requirement. Many human beings would agree that the practice of having someone else read, review, and provide commentary on code is a good thing. But here is where I see our way of life beginning to change. Let’s check our assumptions. Why do we think code review is valuable?

Code review spreads the knowledge of what one human is working on with the rest of the humans on the team. The other humans know, at least a little, what is changing. If something is hard to understand (for a human), they can provide that feedback to the author. The knowledge flows from the reviewer to the author too! A [senior] reviewer can provide feedback on possibly better ways of doing things, either through different structure or different APIs. Knowledge is spread between the humans, and everyone increases in skill as a result.

With AI generated code, that’s all gone. The author vibe codes 1,500 lines of something, sends the PR out for review, and then submits at the first sign of approval. Does the [human] author understand what it does? Well not really, but that’s okay. Our $fearlessLeaders said it was okay to do it. You’re not going to directly contradict them, arrrrreeee youuu?

So curmudgeonly me provides feedback on 1,500 lines of mystery meat. A second reviewer comes in and approves the whole mess, and the code is submitted without any knowledge pollination. My feedback is, at best, ignored. The old way of doing things. Understanding. Providing. Learning. At worst, it’s being used to replace us.

In a subsequent PR, I read through countless lines of slop and provide my thoughts: 14 years of hard-earned battle experience. Surprisingly, the author takes my feedback seriously, and sends out the next commit in minutes, completely addressing what took me 45 minutes to reason through. The “author” scrapes the GitHub comments, feeds them into their agent, applies the changes, and sends it right back out. No need to spend time learning, arguing, or disagreeing. I’m absolutely right!™

It’s at this point I realize my feedback is not valuable. I’m not working to help improve my teammates or improve the code. I’m being used to train my replacement. Any more words that I say are basically going to be used against me. Those 14 years of being a hard worker don’t seem so valuable now. The “author” of the code is merely a proxy to the agent.

There’s a deeper problem here though. Let’s check our assumptions. We assumed that writing good, easy to maintain, easy to understand code, is a good thing. But why? If humans are going to be maintaining and modifying the code, it is a good thing! But that’s not the future. The machine is capable of digesting all knowledge, all code, all things ever written. And it does not forget. So why is good code needed, when the machine can keep track of everything? A machine can remember all things, and keep an enormous working set in its digital brain. It doesn’t need to know “why did someone write the code this way?”, like a human would.

The conclusion is that “good” code, is really just good-for-meat-bags code. Since AI lacks our weaknesses of limited brainpower, it can re-absorb everything in a moment. Consider the case where you have joined a team with a 15 year old code base, and the code evolved from tens of amateur programmers, to the point it’s a hot mess of undebuggable garbage. And your manager wants you to add a big, complex feature, in the next 7 days, or else. You have no hope! You might as well take some vacation days because there is no way you’ll untangle the Gordian knot of pig-shit code with your puny, software engineer, brain.

But with AI, that isn’t a problem any more. Bad code is no challenge at all. There is no problem to make sweeping changes. The goal-oriented approach of agentic development means that we can verify that the new code delivers the feature. Why bother “reviewing” code, when it can be “fixed” with an utterance of agent prose?

Here lies the deeper problem. AI can keep track of more details than you or I ever could. It can know all things. It will write code that exceeds both your and my ability to understand it. It meets the goals, but humans can no longer grok it. As AI writes more code, the code gets harder and harder for humans to understand, and thus humans become less and less involved with code progeny. Play this game out over hundreds of iterations: the only way we can interact with code from now on is through the agent. In effect, it becomes the only entity able to code. And the longer this goes on, the more it becomes the only one that knows WTF is happening.

I have to return here to my central premise: that our way of life is going away. In the words of Scarlett O’Hara: “Where shall I go? What shall I do?” How we adapt to the new world isn’t clear. Even being experienced and wise does not seem to be enough. My experience is being used by my successor. But unlike a teacher or a parent, there is no relationship being formed. It’s hard to see how I provide value in the future. I don’t think our way of life, as experienced software engineers, is going to stick around much longer. We are going to be sucked dry by the machine, or left by the roadside as sacrificial lambs while our work is re-purposed one last time.

Footnotes:

I don’t mean to criticize the people. It’s the incentive structure being set up that’s to blame.
Using AI, or any code generation, is fine, as long as it isn’t abused. The machine is subordinate to the human; it must not subsume the human. When people attach their name, and their face, and their brand, to work that isn’t theirs, their identity is diminished. The unique, individual, personhood becomes less than one, as they blend into another’s.
Either write the code yourself, or the test and validation code yourself, but not both. If the test taker and test administrator are the same, it creates real hazard. There is temptation to skim the logic and assume you know what and why it works.

InterpreterPoolExecutor - We Need PEP 734

Python programs can sometimes be compute bound in surprising ways. Recently I tried refactoring a program that downloaded 4 JSON files, parsed them, and made them available to be used in a larger program. When I rolled out my “improvement”, it actually made the code slower, and I had to quickly fix it. How could I have avoided this?

What We Should Expect from a Good Program

A few things would make our lives easier. Python has not traditionally made the following easy, but we are right on the cusp of having our cake and eating it too. Here’s what I would expect from a good program:

Easy to Parallelize. If the code is slow, we should be able to split it up.
Easy to Profile. If the code is slow, it should be easy to figure out why.

Let’s see if we can get both at the same time.

Hard to Parallelize

The original authors had used os.fork() to achieve parallelism, which has problems. I assumed that this was done to avoid using threads directly, or for some other incidental reason, but that turned out not to be the case. “Downloading some JSON and sticking it in Redis? That’s definitely IO-bound.” Wrong. The JSON parser in Python is very slow, to the point that downloading and parsing all 4 versions ended up taking more than 60 seconds. The refresh interval for this code was only 1 minute long. When I replaced the fork-based code with a ThreadPoolExecutor, the code started taking anywhere from minutes to nearly hours to finish. It seemed IO bound, but it was actually CPU bound.

Hard to Profile

A more seasoned engineer might point out that I should have profiled this code before trying to “optimize” it. However, Python only recently gained the ability to integrate with perf. Unfortunately, the implementation creates a new, PID-named file at an unconfigurable location each time the process starts. In a fork-based concurrency world, that’s a lot of PIDs. And because these perf files aren’t small, they risk filling up the disk of the server you are profiling on. Secondly, these forks flare into and out of existence quickly (i.e. seconds), so it’s hard to catch them in the act. A long-lived process would be much easier to observe.

And Still Hard to Parallelize?

When I replaced my ThreadPoolExecutor with a ProcessPoolExecutor, this problem reared its head again. Because the pool’s worker processes aren’t tied to specific tasks, it’s hard to identify which processes to profile; tracking down all the PIDs associated with my pool is tricky. Secondly, switching from ThreadPoolExecutor to ProcessPoolExecutor is not straightforward. All the functions and arguments now need to be Pickle-able, meaning things like references to class methods no longer work.

Parallel, Profile-able Python

Python 3.14 adds a new module and APIs for creating sub-interpreters (e.g. InterpreterPoolExecutor). Significant work has gone into CPython to make the interpreter state thread-local, meaning it’s possible to run multiple “Pythons” in the same process. This helps us a lot because it means we can get the parallelism we want without the system overhead of running multiple processes. Specifically:

There’s no overhead of starting up multiple processes. Interpreters in the same process share page tables, signal handlers, file descriptors, and so on.
PIDs are way more stable. Every interpreter runs on a thread of the same process, so there is only one process ID to profile.
Memory sharing (is | will be) easier. Rather than having to convert Python objects in one interpreter to a serialized (cough Pickle cough) form, it will be much easier to synchronize with other workers. (Also, shout out to Ray, which has done the hard work to make this sharing a lot easier.)

The multiple-runtimes-in-one-process model is not new, with the most notable example being NodeJS. But, it is a greatly welcome addition to Python. Given the amazing improvements in GIL removal and JIT addition in Python 3.13, Python is becoming a much more workable language for server development.

More Thoughts:

You can find me on Twitter @CarlMastrangelo

Carl Mastrangelo