How does a job’s completion affect the other jobs in its hierarchy tree?


One of the main focuses of Kotlin coroutines is to provide a sensible approach to a concept referred to as Structured Concurrency. Essentially, this refers to the idea that every coroutine exists within a scope. This scope is responsible for coordinating the completion of its child coroutines, so that processes are never accidentally leaked and failures are never implicitly ignored. The relationships built between coroutines through these scopes determine when they start and end.

Job States

All coroutines implement the Job interface, which models the life cycle of the coroutine. In this interface, the state of the coroutine is represented by the combination of three Boolean properties: isActive, isCompleted, and isCancelled. The full list of possible states is available in the documentation.

Coroutine Job States

The execution of a coroutine leads to transitions between these states. It can finish in one of two completed states, either due to finishing successfully (Completed state), or due to being cancelled by either a cancellation or a failure caused by an exception (Cancelled state).

Coroutine Job State Transitions

There are therefore three different events which can lead to a terminal state:

- Succeed
- Fail
- Cancel

A job counts as completed after any of these events: its isCompleted property becomes true in every case.
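As a minimal sketch (assuming kotlinx.coroutines is on the classpath), the three terminal events can be triggered directly on jobs created with the Job factory method, which is discussed later in this article. The helper flags and the function terminalStates are hypothetical names; they only read the three Boolean properties.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: snapshot of (isActive, isCompleted, isCancelled).
fun Job.flags() = Triple(isActive, isCompleted, isCancelled)

fun terminalStates(): Map<String, Triple<Boolean, Boolean, Boolean>> {
    val active = Job()                                                              // not finished yet
    val succeeded = Job().apply { complete() }                                      // Succeed
    val failed = Job().apply { completeExceptionally(IllegalStateException("boom")) } // Fail
    val cancelled = Job().apply { cancel() }                                        // Cancel
    val result = mapOf(
        "active" to active.flags(),
        "succeeded" to succeeded.flags(),
        "failed" to failed.flags(),
        "cancelled" to cancelled.flags(),
    )
    active.cancel() // clean up the job that was left active
    return result
}

fun main() {
    terminalStates().forEach { (name, flags) -> println("$name: $flags") }
}
```

isCompleted is true for all three terminal events, while isCancelled distinguishes a success from a failure or a cancellation.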

Parent & Child Job Relationship

Coroutine life cycle events are propagated throughout the child-parent job relationship tree. The documentation of the Job interface explains how this relationship works.

Jobs can be arranged into parent-child hierarchies where cancellation of parent lead to an immediate cancellation of all its children. Failure or cancellation of a child with an exception other than CancellationException immediately cancels its parent.

Different life cycle events can therefore be conceptually seen as events propagating in different directions.

Cancellation is propagated from parent to child. A job’s cancellation causes its children to cancel, while a child’s cancellation (when it isn’t due to a failure) won’t affect its parent.

Completion, whether successful or due to a failure, is propagated from child to parent. A parent’s successful completion is conditioned on the successful completion of all its children, and a child’s failure is reported to its parent, which may cancel depending on its policy.

Parent-Child Interactions

Job Creation

In the API of the kotlinx.coroutines library, there is no method available to obtain a job’s parent; however, there is a children property which can be used to obtain a job’s children. The documentation of this property explains how jobs are created with a relationship to their parent.

A job becomes a child of this job when it is constructed with this job in its CoroutineContext or using an explicit parent parameter.

Jobs are created using one of three kinds of methods, each of which offers a way to declare the parent.

- Coroutine builders such as launch or async. The coroutine created by the builder will inherit as its parent the Job in the CoroutineContext of the CoroutineScope that the builder was called on, unless it is overridden by a Job in the CoroutineContext passed in the context argument.
- Factory methods for jobs, Job and SupervisorJob. The parent job is explicitly provided as a parameter.
- Scoping functions coroutineScope, supervisorScope, and withContext. These suspend functions create a new scope, and propagate the completion result of this scope through the return of the function, rather than propagating it to a parent scope.
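The three creation methods can be checked against the children property with a small sketch. It assumes a recent kotlinx.coroutines release that provides the CoroutineContext.job extension; creationDemo is a hypothetical name.

```kotlin
import kotlinx.coroutines.*

// Returns, for each creation style, whether the job is attached where expected.
fun creationDemo(): List<Boolean> = runBlocking {
    val root = coroutineContext.job

    // 1. Builder: inherits the Job of the scope it is called on...
    val inherited = launch { delay(100) }
    // ...unless a Job is passed in the context argument.
    val explicitParent = Job()
    val overridden = launch(explicitParent) { delay(100) }

    // 2. Factory methods: the parent is passed as a parameter.
    val factoryChild = Job(explicitParent)
    val supervisorChild = SupervisorJob(explicitParent)

    // 3. Scoping function: the scope's job is attached to the calling coroutine's job.
    val scopeIsChild = coroutineScope { coroutineContext.job in root.children }

    val result = listOf(
        inherited in root.children,
        overridden in explicitParent.children,
        overridden !in root.children,
        factoryChild in explicitParent.children,
        supervisorChild in explicitParent.children,
        scopeIsChild,
    )
    explicitParent.cancel() // clean up everything attached to explicitParent
    result
}

fun main() = println(creationDemo())
```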

The documentation of the children property also details the consequences of a parent-child coroutine relationship.

A parent-child relation has the following effect:

- Cancellation of parent with cancel or its exceptional completion (failure) immediately cancels all its children.

- Parent cannot complete until all its children are complete. Parent waits for all its children to complete in completing or cancelling state.

- Uncaught exception in a child, by default, cancels parent. In particular, this applies to children created with launch coroutine builder. Note, that async and other future-like coroutine builders do not have uncaught exceptions by definition, since all their exceptions are caught and are encapsulated in their result.

For failures, the behavior described above is the default one. A different behavior is observed when using a supervisor job created using the SupervisorJob factory method or the supervisorScope scope builder.

Job Success

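The join method is used to wait for a job to complete, regardless of whether it completes successfully or not. Unlike await, it doesn’t rethrow the exception if the job failed, although it does throw a CancellationException if the calling coroutine is itself cancelled.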

The join statement suspends until the coroutine completes, which happens after a delay of 1s has passed.
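A minimal sketch of this scenario, assuming a plain JVM program that blocks on runBlocking; joinDemo is a hypothetical name, and measureTimeMillis from the Kotlin standard library is used only to show how long join suspends.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Returns how long join() suspended while waiting for the job.
fun joinDemo(): Long = runBlocking {
    val job = launch { delay(1000) }
    measureTimeMillis { job.join() } // suspends until the job completes
}

fun main() = println("join waited ${joinDemo()} ms")
```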

If a child coroutine is added, the parent will wait for it to complete before itself being able to complete.

The join statement still needs to wait for the 1s delay in the child coroutine to elapse before continuing. If more children are launched within the child coroutines to increase the depth of the parent-child tree, the root coroutine waits for all of its descendants to complete before completing.
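The same measurement with a deeper tree, as a sketch under the same assumptions (nestedJoinDemo is hypothetical): the root’s own body returns immediately, yet join only resumes once the grandchild’s delay has elapsed.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

fun nestedJoinDemo(): Long = runBlocking {
    val root = launch {
        // The root's own body ends right away, but it has a child...
        launch {
            // ...whose own child keeps the whole tree busy for 1s.
            launch { delay(1000) }
        }
    }
    measureTimeMillis { root.join() }
}

fun main() = println("root completed after ${nestedJoinDemo()} ms")
```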

Coroutine scope builders like coroutineScope provide a more explicit way to do essentially the same thing. The scope builder function suspends until its scope completes: if child coroutines are launched within the scope, then coroutineScope suspends until they all complete.

The coroutineScope statement suspends until the delay of 1s has passed.
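A sketch of the coroutineScope variant, assuming the same setup (scopeDemo is a hypothetical name). Two children are launched so that the result also shows that coroutineScope waits for all of them, which here run concurrently.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Suspends until every coroutine launched in the scope has completed.
suspend fun scopeDemo(): Long = measureTimeMillis {
    coroutineScope {
        launch { delay(1000) }
        launch { delay(1000) }
    }
}

fun main() = runBlocking { println("coroutineScope returned after ${scopeDemo()} ms") }
```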

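Jobs created using the Job factory method can also complete successfully, because the factory method returns a CompletableJob, on which the complete function can be invoked. Such a job doesn’t complete on its own: once complete is called, it still waits for all of its children to complete before reaching the Completed state.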
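A sketch of a factory-created job being completed explicitly, assuming kotlinx.coroutines on the JVM (completableJobDemo is a hypothetical name). The snapshots show that complete only requests completion while a child is still running.

```kotlin
import kotlinx.coroutines.*

fun completableJobDemo(): List<Boolean> = runBlocking {
    val job = Job()                        // returns a CompletableJob
    launch(job) { delay(500) }             // child of the factory-created job
    val beforeComplete = job.isCompleted   // false: nothing has completed it yet
    job.complete()                         // request successful completion
    val afterComplete = job.isCompleted    // false: still waiting for its child
    job.join()                             // resumes once the child is done
    listOf(beforeComplete, afterComplete, job.isCompleted, job.isCancelled)
}

fun main() = println(completableJobDemo())
```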

Job Cancellation

Calling cancel on a job, or cancelAndJoin to also wait for its completion, triggers its cancellation.

After 1s, the cancellation signal is sent to the coroutine, causing the delay method to throw a CancellationException. A cancellation is just a specific type of exception which is treated differently from a failure.
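A minimal sketch of this cancellation, assuming kotlinx.coroutines on the JVM (cancelDemo is a hypothetical name). The coroutine intercepts the CancellationException thrown by delay only to record it, then rethrows it so the cancellation proceeds normally.

```kotlin
import kotlinx.coroutines.*

fun cancelDemo(): Boolean = runBlocking {
    var caught: Throwable? = null
    val job = launch {
        try {
            delay(10_000)
        } catch (e: CancellationException) {
            caught = e // delay throws when the job is cancelled
            throw e    // rethrow so the cancellation isn't swallowed
        }
    }
    delay(1000)
    job.cancelAndJoin()
    caught is CancellationException && job.isCancelled
}

fun main() = println("cancelled via CancellationException: ${cancelDemo()}")
```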

This cancellation is immediately propagated to children coroutines, if there are any.

After 1s, the cancellation signal is propagated to the child coroutine, causing the delay method to throw a CancellationException.

Even after a cancellation, the parent coroutine still hasn’t completed until all of its children are completed. The call to cancelAndJoin therefore suspends until the cancellation of all the children has finished.
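Cancelling the parent can be sketched as follows (cancelParentDemo is a hypothetical name). Once cancelAndJoin returns, the child has been cancelled too and has already reached its final state.

```kotlin
import kotlinx.coroutines.*

fun cancelParentDemo(): List<Boolean> = runBlocking {
    lateinit var child: Job
    val parent = launch {
        child = launch { delay(10_000) }
        delay(10_000)
    }
    delay(1000)
    parent.cancelAndJoin() // returns only once the child has finished cancelling too
    listOf(parent.isCancelled, child.isCancelled, child.isCompleted)
}

fun main() = println(cancelParentDemo())
```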

If a child coroutine cancels, it has no effect on the parent other than allowing it to complete successfully once all of its other children are completed.
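The opposite direction, as a sketch (cancelChildDemo is a hypothetical name): cancelling a child leaves the parent running, and the parent still ends in the Completed rather than the Cancelled state.

```kotlin
import kotlinx.coroutines.*

fun cancelChildDemo(): List<Boolean> = runBlocking {
    val parent = launch {
        val child = launch { delay(10_000) }
        delay(1000)
        child.cancel() // cancelling a child...
        delay(1000)    // ...doesn't cancel the parent, which keeps running
    }
    parent.join()
    listOf(parent.isCompleted, parent.isCancelled)
}

fun main() = println(cancelChildDemo())
```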

Job Failure

If a job without a parent fails, the result depends on the type of builder that was used to create it. The section of the Kotlin coroutines guide on Exception Handling explains that coroutines can be divided into two types with regard to their behavior on failure.

Coroutine builders come in two flavors: propagating exceptions automatically (launch and actor) or exposing them to users (async and produce). The former treat exceptions as unhandled, similar to Java’s Thread.uncaughtExceptionHandler, while the latter are relying on the user to consume the final exception, for example via await or receive […].

How the failure is handled therefore depends on the type of builder used to create the coroutine.

Coroutine Builder Failure

When a coroutine started using launch fails without a parent job, the CoroutineExceptionHandler is invoked, using the platform default if none is provided in the context.

Launching the coroutine from GlobalScope uses an empty context, without a parent job and using the default dispatcher and exception handler. On Android, the default exception handler causes the application to crash. On other JVM implementations, the default behavior is to log the exception to the console.
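A sketch of a root launch failure, assuming a recent kotlinx.coroutines release where GlobalScope is marked @DelicateCoroutinesApi and therefore needs an opt-in. A CoroutineExceptionHandler is passed explicitly so the failure can be observed; without it, the platform default described above would apply. launchFailureDemo is a hypothetical name.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicReference

@OptIn(DelicateCoroutinesApi::class) // GlobalScope requires an explicit opt-in
fun launchFailureDemo(): Throwable? = runBlocking {
    val reported = AtomicReference<Throwable?>(null)
    val handler = CoroutineExceptionHandler { _, e -> reported.set(e) }
    // No parent job: the failure goes to the handler found in the context.
    val job = GlobalScope.launch(handler) { throw IllegalStateException("boom") }
    job.join() // join doesn't rethrow the failure
    reported.get()
}

fun main() = println("handler received: ${launchFailureDemo()}")
```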

However, when a coroutine started with async fails without a parent job, the exception isn’t reported to any handler: it is stored in the resulting Deferred, and it is expected to be handled later when calling await.

The exception can be caught and handled without leaking to the application.
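The async counterpart, as a sketch under the same assumptions (asyncFailureDemo is a hypothetical name). The handler in the context is never called; the exception only surfaces at await, where it is caught.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

@OptIn(DelicateCoroutinesApi::class)
fun asyncFailureDemo(): Pair<Boolean, String?> = runBlocking {
    val handlerCalled = AtomicBoolean(false)
    val handler = CoroutineExceptionHandler { _, _ -> handlerCalled.set(true) }
    // No parent job: the failure is stored in the Deferred instead of being reported.
    val deferred = GlobalScope.async<Int>(handler) { throw IllegalStateException("boom") }
    val message = try {
        deferred.await().toString()
    } catch (e: IllegalStateException) {
        e.message // the failure surfaces here, where it can be handled
    }
    handlerCalled.get() to message
}

fun main() = println(asyncFailureDemo())
```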

All coroutine builders like launch and async create a new coroutine scope, given as receiver to the block passed as argument. The Job within this scope inherits as parent the job of the scope that the builder was called on, if it exists. These scopes implement the default behavior, therefore they will propagate any errors to their parent.

The failure of a child coroutine started with launch is therefore always propagated to its parent.

The failure of the child coroutine causes its parent to fail as well.

The behavior is identical when the child of launch is created using async .
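A sketch of an async child failing inside a launch parent, assuming the same GlobalScope opt-in (childFailureDemo is a hypothetical name). Both jobs end up cancelled, and the failure is reported once, through the handler of the root launch.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicReference

@OptIn(DelicateCoroutinesApi::class)
fun childFailureDemo(): List<Boolean> = runBlocking {
    val reported = AtomicReference<Throwable?>(null)
    val handler = CoroutineExceptionHandler { _, e -> reported.set(e) }
    lateinit var child: Job
    val parent = GlobalScope.launch(handler) {
        child = async { throw IllegalStateException("boom") } // async child of launch
        delay(10_000)                                          // cancelled by the child's failure
    }
    parent.join()
    listOf(child.isCancelled, parent.isCancelled, reported.get() is IllegalStateException)
}

fun main() = println(childFailureDemo())
```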

When a child coroutine fails, it propagates the failure to its parent. The default behavior of the parent is to treat this as a failure of its own. Whether or not the uncaught exception handler is invoked therefore depends only on the root coroutine, not on the child which failed.

The uncaught exception handler in the current context is only invoked when there is no parent, so adding an uncaught exception handler on a child is useless.
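A sketch comparing a handler on the child with a handler on the root (handlerPlacementDemo is a hypothetical name; GlobalScope opt-in assumed as before). Only the root’s handler is invoked.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

@OptIn(DelicateCoroutinesApi::class)
fun handlerPlacementDemo(): Pair<Boolean, Boolean> = runBlocking {
    val childHandlerCalled = AtomicBoolean(false)
    val rootHandlerCalled = AtomicBoolean(false)
    val rootHandler = CoroutineExceptionHandler { _, _ -> rootHandlerCalled.set(true) }
    val childHandler = CoroutineExceptionHandler { _, _ -> childHandlerCalled.set(true) }
    val root = GlobalScope.launch(rootHandler) {
        // This handler is ignored: the child has a parent, which takes the failure.
        launch(childHandler) { throw IllegalStateException("boom") }
    }
    root.join()
    childHandlerCalled.get() to rootHandlerCalled.get()
}

fun main() = println(handlerPlacementDemo())
```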

This report shows tests which fail when the uncaught exception handler is invoked.

A parent’s failure is a specific case of a cancellation, therefore it leads to the cancellation of all its children. As a result, a failing coroutine causes all of its siblings to be cancelled.

When a coroutine fails, its siblings are cancelled by the parent, then the parent itself completes with a failure.
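A sketch of a failure cancelling a sibling (siblingDemo is a hypothetical name; the empty handler only keeps the demo from printing a stack trace).

```kotlin
import kotlinx.coroutines.*

@OptIn(DelicateCoroutinesApi::class)
fun siblingDemo(): List<Boolean> = runBlocking {
    // Swallows the report so the demo doesn't print a stack trace.
    val handler = CoroutineExceptionHandler { _, _ -> }
    lateinit var sibling: Job
    val parent = GlobalScope.launch(handler) {
        sibling = launch { delay(10_000) }                           // long-running sibling
        launch { delay(100); throw IllegalStateException("boom") }  // failing child
    }
    parent.join()
    listOf(sibling.isCancelled, parent.isCancelled)
}

fun main() = println(siblingDemo())
```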

Factory Job Failure

A job created using the factory method Job fails if one of its children fails. In that case, the failure behavior depends on the type of child which failed. A child created using launch will cause the uncaught exception handler to be invoked, while a child created using async will fail silently, similarly to the case when these children are launched without a parent.
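A sketch of both cases with a Job factory parent, assuming scopes built with the CoroutineScope factory function and a recording handler in each (factoryJobDemo is a hypothetical name). Both parents fail, but only the launch child reaches the handler.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

fun factoryJobDemo(): List<Boolean> = runBlocking {
    val launchReported = AtomicBoolean(false)
    val asyncReported = AtomicBoolean(false)

    // launch child of a factory job: the handler is invoked.
    val launchParent = Job()
    CoroutineScope(launchParent + CoroutineExceptionHandler { _, _ -> launchReported.set(true) })
        .launch { throw IllegalStateException("boom") }
        .join()

    // async child of a factory job: the failure isn't reported to the handler.
    val asyncParent = Job()
    CoroutineScope(asyncParent + CoroutineExceptionHandler { _, _ -> asyncReported.set(true) })
        .async { throw IllegalStateException("boom") }
        .join()

    listOf(launchParent.isCancelled, launchReported.get(), asyncParent.isCancelled, asyncReported.get())
}

fun main() = println(factoryJobDemo())
```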

This report shows tests which fail when the uncaught exception handler is invoked. The supervisor behavior is identical in this regard to that of the regular job.

However, a failing child doesn’t only cause the other children to be cancelled: it also makes the parent job fail, making it impossible to launch any new coroutines using that job as parent.

The second coroutine isn’t started, because the first one caused the job to fail.
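A sketch of this scenario (failedJobDemo is a hypothetical name). The second launch attaches to an already failed job, so it is cancelled before its body gets a chance to run.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

fun failedJobDemo(): Pair<Boolean, Boolean> = runBlocking {
    val job = Job()
    // The handler only swallows the report of the launch failure for this demo.
    val scope = CoroutineScope(job + CoroutineExceptionHandler { _, _ -> })
    scope.launch { throw IllegalStateException("boom") }.join()
    val secondRan = AtomicBoolean(false)
    scope.launch { secondRan.set(true) }.join() // parent already failed
    job.isCancelled to secondRan.get()
}

fun main() = println(failedJobDemo())
```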

As explained in the documentation of the SupervisorJob factory method, it implements its own custom policy for handling child failures. The core idea of a supervisor is that it doesn’t fail when one of its children fails.

- A failure of a child job that was created using launch can be handled via CoroutineExceptionHandler in the context.
- A failure of a child job that was created using async can be handled via Deferred.await on the resulting deferred value.

This report shows tests which fail when the job has failed. The failure or cancellation state of a job is examined using the isCancelled property.

In that regard, the behavior only depends on the type of job. A job created using SupervisorJob won’t fail when one of its children fails.
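The same experiment with a SupervisorJob, as a sketch (supervisorJobDemo is a hypothetical name). The handler is still invoked for the failing launch child, but the supervisor stays active, the sibling completes normally, and new children can still be launched.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

fun supervisorJobDemo(): List<Boolean> = runBlocking {
    val reported = AtomicBoolean(false)
    val supervisor = SupervisorJob()
    val scope = CoroutineScope(supervisor + CoroutineExceptionHandler { _, _ -> reported.set(true) })
    val sibling = scope.launch { delay(500) }                  // unaffected by the failure
    scope.launch { throw IllegalStateException("boom") }.join() // reported to the handler
    val secondRan = AtomicBoolean(false)
    scope.launch { secondRan.set(true) }.join()                // supervisor still accepts children
    sibling.join()
    listOf(supervisor.isCancelled, reported.get(), secondRan.get(), sibling.isCancelled)
}

fun main() = println(supervisorJobDemo())
```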

Coroutine Scope Builder Failure

The purpose of scope builders such as coroutineScope is to create a new parent job in which to launch coroutines, in order to avoid failing the current context’s job. This way, the failure of a coroutine launched in the newly created scope causes the scoping method to throw an exception instead, which can be caught.

This report shows tests which fail when the scope builder throws an exception.
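A sketch of catching a child failure at the coroutineScope call site (scopeFailureDemo is a hypothetical name). The exception is rethrown by coroutineScope, while the calling coroutine stays active.

```kotlin
import kotlinx.coroutines.*

// A child failure makes coroutineScope throw instead of failing the caller's job.
suspend fun scopeFailureDemo(): String? = try {
    coroutineScope {
        launch { delay(10_000) }                       // sibling, cancelled by the failure
        launch { throw IllegalStateException("boom") }
    }
    null
} catch (e: IllegalStateException) {
    e.message
}

fun main() = runBlocking {
    val message = scopeFailureDemo()
    // The caller caught the exception and is still active.
    println("caught: $message, caller active: ${coroutineContext.job.isActive}")
}
```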

Scopes created using supervisorScope don’t throw an exception when one of their children fails, however they do throw an exception when the coroutine created within this scope fails directly.

There is, however, an oddity with the supervisor scope. This report shows tests which fail when the uncaught exception handler is invoked.

The supervisor scope is the only scope builder which invokes the uncaught exception handler when a child started with launch fails. This is particularly dangerous on Android, where the default CoroutineExceptionHandler implementation causes the application to crash.
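Both supervisorScope behaviors in one sketch, assuming the handler is installed on the runBlocking context so that children inherit it (supervisorScopeDemo is a hypothetical name). The child failure reaches the handler without making the scope throw, while a failure of the scope’s own block is rethrown.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean

fun supervisorScopeDemo(): Triple<Boolean, Boolean, String?> {
    val reported = AtomicBoolean(false)
    val handler = CoroutineExceptionHandler { _, _ -> reported.set(true) }
    return runBlocking(handler) {
        // A failing child doesn't make supervisorScope throw, and its sibling still completes...
        val sibling = supervisorScope {
            launch { throw IllegalStateException("child") } // reported to the handler
            launch { delay(200) }
        }
        // ...but a failure of the scope's own block is rethrown to the caller.
        val message = try {
            supervisorScope { throw IllegalStateException("scope") }
        } catch (e: IllegalStateException) {
            e.message
        }
        Triple(reported.get(), sibling.isCancelled, message)
    }
}

fun main() = println(supervisorScopeDemo())
```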

Job Completion & Structured Concurrency

Arguably the most important aspect to understand in structured concurrency is that coroutine failures are propagated to their parent. When there isn’t a parent, the behavior depends on how the root coroutine was created, using either a launch-style builder or an async-style builder. Failures within launch are dangerous, and can cause application crashes on Android.

For this reason, the style of declaring coroutine functions is important in order to establish the contract for failures. When using an extension on CoroutineScope, the implicit contract is that the newly launched coroutine can cause the scope’s job to fail. Suspend functions, however, may throw an exception, but they shouldn’t cause the job of the current coroutine context to fail.
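The two contracts can be contrasted in a short sketch. launchRefresh and refresh are hypothetical functions that simply fail; the suspend version uses coroutineScope so that its failure reaches the caller as a thrown exception.

```kotlin
import kotlinx.coroutines.*

// Hypothetical extension style: the new coroutine may fail the receiver's job.
fun CoroutineScope.launchRefresh(): Job =
    launch { throw IllegalStateException("refresh failed") }

// Hypothetical suspend style: failures are thrown to the caller instead.
suspend fun refresh() {
    coroutineScope {
        launch { throw IllegalStateException("refresh failed") }
    }
}

fun contractDemo(): Pair<Boolean, Boolean> = runBlocking {
    // Suspend style: the caller catches the exception and keeps running.
    val caught = try {
        refresh()
        false
    } catch (e: IllegalStateException) {
        true
    }
    // Extension style: the receiver scope's job fails.
    val scopeJob = Job()
    CoroutineScope(scopeJob + CoroutineExceptionHandler { _, _ -> }).launchRefresh().join()
    caught to scopeJob.isCancelled
}

fun main() = println(contractDemo())
```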

Supervisor jobs are also an important tool to manage coroutine failures. Although using a supervisor doesn’t prevent the uncaught exception handler from being invoked, it does prevent a parent job from failing due to a child failure, allowing other children to continue uninterrupted.