Not my area historically, and I likely missed some conversations, but I was motivated to start compiling the chaotic, widespread discussion, and to surface some things I liked about other languages but couldn’t find in the discussions. Credit to Dean for the CT “motivational” tweet, which got more people looking at solving problems. Try it yourself some time, it’s fun ;).

Atomic cross-shard transactions

Problem: shards are isolated, but transactions are meant to either execute fully, or not at all.

Also referred to as the “train and hotel problem”.

Problem extension: If you take an approach where you “reserve” (lock) some resource, how do you:

Speed up the unlocking. Instant case = synchronous cross-shard transactions

Avoid blocking other actions that could have proceeded. Design of locking and/or nonces

Allow out-of-order transactions, fundamental to a fee market (nothing to outbid otherwise)

Existing approaches

In no particular order:

Step back, simple approach

This is a re-hash of all of the above ideas, simplified for the fast crosslinking we have:

We have fast cross-shard 1-slot state roots now (in the healthy-shards case).

This improves the locking: timeouts are less scary, and single-slot cross-shard atomic transactions are easily achieved with locks, with no strain on the beacon chain other than regular shard linking.

Steps:

Slot N-1: Start tx. Lock the resource on shard A, with conditions: commit that a section of the tx data must be successfully processed on shard B in the next slot. Stage the change; the resource cannot be read/written while staged.

Slot N: whoever continues the transaction on shard B can simply make it check that shard A had the necessary resource locked in slot N-1, with the right modification to the resource in the state (thanks to fast crosslinking), and that there are no locked resources on shard B.

Shard A’s staged change is:

Persistable by anyone who can prove that the tx data was successfully processed in time.

Undoable by anyone who can prove it was not.

If it times out, does not continue on B, or continues on B without A being linked into view of B, it will atomically fail.

The remaining staged change on A can be fixed by anyone at any slot after N-1 who needs the resource.
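The following Python sketch illustrates the single-slot lock and is not a spec. It makes several assumptions:

- Shards are in-memory objects.
- Shard B receives the crosslinked view of A’s staged changes directly.
- The “proof” that B processed the tx data is a dictionary lookup rather than a Merkle proof.

`ShardA`, `ShardB`, `tx_id` and the method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StagedChange:
    new_value: int
    lock_slot: int  # slot N-1, when shard A took the lock
    tx_id: str      # hypothetical id of the tx data section shard B must process


@dataclass
class ShardA:
    state: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)  # resource -> StagedChange

    def lock(self, slot: int, resource: str, new_value: int, tx_id: str) -> None:
        # Slot N-1: stage the change; the resource is unreadable/unwritable meanwhile.
        if resource in self.staged:
            raise RuntimeError(f"{resource} is staged")
        self.staged[resource] = StagedChange(new_value, slot, tx_id)

    def resolve(self, resource: str, b_processed: dict) -> str:
        # Anyone may call this later: persist if B processed the tx data in
        # slot N (= lock_slot + 1), otherwise undo the staged change.
        change = self.staged.pop(resource)
        if b_processed.get(change.tx_id) == change.lock_slot + 1:
            self.state[resource] = change.new_value
            return "persisted"
        return "undone"


@dataclass
class ShardB:
    processed: dict = field(default_factory=dict)  # tx_id -> slot it was processed in
    locked: set = field(default_factory=set)       # resources currently locked on B

    def continue_tx(self, slot: int, tx_id: str, a_view: dict) -> bool:
        # Slot N: a_view is A's staged changes as of slot N-1 (via fast crosslinking).
        ok = not self.locked and any(
            c.tx_id == tx_id and c.lock_slot == slot - 1 for c in a_view.values()
        )
        if ok:
            self.processed[tx_id] = slot
        return ok
```

For example, `a.lock(9, "room", 1, "tx1")` followed by `b.continue_tx(10, "tx1", dict(a.staged))` lets `a.resolve("room", b.processed)` persist. If B only continues at slot 11, the same resolve call undoes the change.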

Now for multiple shards / more complex transactions, you abstract everything into staged changes:

On every touched shard, stage the change and lock the resources for some special finalizing tx (on any shard)

One slot later, the locks are all checked and the finalizing tx is completed successfully or not.

Staged changes are either all committed or all dropped, based on the outcome of the finalizing tx.
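A minimal sketch of the multi-shard variant, under these assumptions:

- All staged changes share a hypothetical finalizer id.
- The finalizing step reads every shard’s staged changes directly, standing in for the crosslinked view one slot later.

`Shard`, `finalize` and `settle` are illustrative names, not a proposed API.

```python
class Shard:
    """Toy shard holding committed state plus changes staged for a finalizer."""

    def __init__(self, name: str):
        self.name = name
        self.state = {}
        self.staged = {}  # resource -> (new_value, finalizer_id)

    def stage(self, resource, value, finalizer_id):
        # Stage the change and lock the resource for the finalizing tx.
        if resource in self.staged:
            raise RuntimeError(f"{resource} already locked on {self.name}")
        self.staged[resource] = (value, finalizer_id)

    def locked_for(self, finalizer_id):
        return {r for r, (_, f) in self.staged.items() if f == finalizer_id}

    def settle(self, finalizer_id, success: bool):
        # Commit or drop every change staged for this finalizer.
        for r in self.locked_for(finalizer_id):
            value, _ = self.staged.pop(r)
            if success:
                self.state[r] = value


def finalize(finalizer_id, required):
    """One slot later: succeed only if every (shard, resource) lock is in place."""
    return all(res in shard.locked_for(finalizer_id) for shard, res in required)
```

If one shard never stages its part, `finalize` returns `False` and every shard drops its staged change. That is the all-or-nothing behaviour described above.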

I think this is a step in the right direction to:

Not focus on storage: EEs are all very different, and resource locking can have much more useful meanings.

Add a simple time element to easily avoid long locking. It’s up to the EE to decide on the right time default (or requirement) here for its users.

New proposal: capabilities

TLDR: Yes, lots of similarities with the receipts approach, but a minimal extract from existing non-receipt concurrency approaches, to have it translate better into locking/async programming, and drop the storage/balance thinking that is holding us back from making it work for general EEs.

Intro

First of all, “capabilities” are a fun but maybe somewhat obscure pattern for creating “unforgeable” objects. In some contexts, they can be revoked by the creator. And then there are more variations.

The key here is that this is a very tiny concurrency building block designed for “isolates”: systems running completely separately from each other, with no shared memory, only learning about the others through user inputs / message passing, with messages built from object snapshots. Sounds familiar, eh?

And best of all, it’s (albeit not that extensively) implemented in one of my favorite programming languages, Dart: dart:isolate. Isolates are similar to Elixir processes or JavaScript web workers: an isolate is single-threaded and has its own memory.

Dart is very minimal here: it just provides a factory for capabilities, and attaches no properties to them. This is unlike reference capabilities, such as those in the Pony programming language. Pony is quite an obscure language, but type-safe, memory-safe, exception-safe, data-race-free and deadlock-free (there is a paper with proofs).

Although Pony’s properties are impressive, and conceptually very interesting, they are not as easily ported as something as minimal as Dart’s object capabilities, and probably too opinionated. However, reference capabilities could be fun for a safe but highly concurrent EE later on in phase 2.

Also note that the minimal capabilities are not only globally unique and unforgeable, they are also unknowable except when passed to an isolate through a message. In a blockchain context it makes more sense to minimize message passing by just querying some protected state, but the unforgeable and unique properties can be preserved.

For more safe-concurrency conceptual gold, this article has a nice comparison

Capability definition

Now, let’s define our own eth2 flavor capability:

(shard, EE) pairs are the actors in the system.

A capability is owned and maintained by an actor.

A capability either exists as part of some sparse Merkle set (experimental SSZ definition) maintained by the actor, or not. Checking only requires a root per EE, embedded in the shard data each slot.

Unforgeable by other actors. A capability is allocated with a special function. This hashes some desired capability seed v with the creator to define the capability identifier: H(v, (shard, EE)). Repeated allocation calls for v just return the existing capability, unchanged.

Revocable by the owner actor. Revocation only takes effect after the slot completes: a commitment made with a capability to other shards must hold while those shards can’t see what is happening.

Has an implicit timestamp: it is allocated in a shard block at slot t, and its non-existence at any prior slot x (x < t) can be proven by looking at the accumulated capabilities at x.

Compositional: on creation, the seed can be another capability. The result is deterministically identified by H(c, H(dest_shard, dest_EE)), where c is the previously created capability used as seed (H(v, (shard, EE))). Capabilities are never moved; actors only commit to those of other actors.

Publicly viewable: any actor can check whether capability H(x, H(target_shard, target_EE)) exists at some (target_shard, target_EE) at the previous slot (thanks to fast crosslinking). Historic capability tracking can come for free: simply refer to the accumulating root at the given slot instead of the latest root.

So it is neither an object capability nor a reference capability; let’s dub it the “commit capability”.

Required EE host functions: allocate_capability(seed), revoke_capability(id), check_capability(id)
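To make the definition concrete, here is a toy Python model of one actor’s capability set and the three host functions. None of the following comes from the post:

- A Python set plus a list of per-slot frozen snapshots stand in for the sparse Merkle set and its accumulating root.
- `H` is SHA-256 over length-prefixed parts.
- `check_capability` takes an explicit slot, because other actors can only see completed slots.

The class, helper names and parameters are hypothetical.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # Hypothetical hash combinator: SHA-256 over length-prefixed parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


def actor_id(shard: int, ee: str) -> bytes:
    # H(shard, EE): identity of a (shard, EE) actor.
    return H(str(shard).encode(), ee.encode())


class CapabilityRegistry:
    """Toy model of the capability set maintained by one (shard, EE) actor."""

    def __init__(self, shard: int, ee: str):
        self.actor = actor_id(shard, ee)
        self.live = set()
        self.pending_revocations = set()
        self.history = []  # history[slot]: capabilities at the end of that slot

    def allocate_capability(self, seed: bytes) -> bytes:
        cap = H(seed, self.actor)  # bound to the creator: other actors cannot forge it
        self.live.add(cap)         # re-allocating the same seed changes nothing
        return cap

    def revoke_capability(self, cap: bytes) -> None:
        if cap in self.live:
            self.pending_revocations.add(cap)  # only effective after the slot completes

    def end_slot(self) -> None:
        # The completed slot still contains capabilities revoked during it.
        self.history.append(frozenset(self.live))
        self.live -= self.pending_revocations
        self.pending_revocations.clear()

    def check_capability(self, cap: bytes, slot: int) -> bool:
        # Other actors query a completed slot, e.g. the previous one.
        return 0 <= slot < len(self.history) and cap in self.history[slot]

    def allocated_at(self, cap: bytes):
        # Implicit timestamp: first completed slot whose snapshot contains cap.
        return next((s for s, snap in enumerate(self.history) if cap in snap), None)
```

Composition falls out of the same function. If the actor on shard 2 calls `allocate_capability(c)` with another actor’s capability `c` as seed, the result is `H(c, H(2, EE))`, which matches the compositional rule.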

One slot locks

Now, upgrade the locking idea to use capabilities:

Slot N-1: Start tx. The EE marks a resource as locked by some capability lock = H(H(res ID, nonce), H(shard A, EE)), and commits that a capability H(lock, H(shard B, EE)) will exist at slot N. Stage the change; the resource cannot be read/written while staged.

Slot N: whoever continues the transaction on shard B can simply make it check that shard A has capability lock at slot N-1, with the right modification to the resource in the state (thanks to fast crosslinking), and that the current slot is N. The EE on shard B produces a capability unlock = H(lock, H(shard B, EE)) to declare success.

Shard A’s staged change is:

Persistable by anyone who can prove that capability unlock exists at slot N.

Undoable by anyone who can prove it does not exist.

If it times out, does not continue on B, or continues on B without A being linked into view of B (i.e. B cannot see the lock), it will atomically fail: A will not be able to persist the staged change, and B aborts.

The remaining staged change on A can be fixed by anyone at any slot after N who needs the resource, since checking the existence of the unlock capability is public. (A smart EE does not require state changes to deal with expired capabilities.)
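A toy simulation of the capability-based one-slot lock, under these assumptions:

- Each actor keeps per-slot frozen capability sets in place of the accumulating Merkle root.
- `H` is SHA-256 over length-prefixed parts.
- The “current slot is N” check is done via the implicit timestamp: the lock is visible at N-1 but not at N-2.
- The check on the resource modification itself is left out.

`Actor`, `start_tx`, `continue_on_b` and `resolve_on_a` are hypothetical names.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # Hypothetical hash combinator: SHA-256 over length-prefixed parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


class Actor:
    """Toy (shard, EE) actor: per-slot capability snapshots plus staged changes."""

    def __init__(self, shard: str, ee: str = "EE"):
        self.id = H(shard.encode(), ee.encode())  # H(shard, EE)
        self.live, self.history = set(), []       # history[slot] = caps after that slot
        self.state, self.staged = {}, {}

    def allocate(self, seed: bytes) -> bytes:
        cap = H(seed, self.id)
        self.live.add(cap)
        return cap

    def end_slot(self) -> None:
        self.history.append(frozenset(self.live))

    def has(self, cap: bytes, slot: int) -> bool:
        return 0 <= slot < len(self.history) and cap in self.history[slot]


def start_tx(a: Actor, slot: int, res_id: bytes, nonce: bytes, new_value) -> bytes:
    # Slot N-1 on shard A: lock = H(H(res ID, nonce), H(shard A, EE)); stage the change.
    if res_id in a.staged:
        raise RuntimeError("resource is staged: no reads/writes")
    lock = a.allocate(H(res_id, nonce))
    a.staged[res_id] = (new_value, lock, slot)
    return lock


def continue_on_b(b: Actor, a: Actor, slot: int, lock: bytes):
    # Slot N on shard B: the lock must exist at N-1 but not at N-2, so the
    # current slot really is N. Returns unlock = H(lock, H(shard B, EE)).
    if a.has(lock, slot - 1) and not a.has(lock, slot - 2):
        return b.allocate(lock)
    return None  # B aborts


def resolve_on_a(a: Actor, b: Actor, res_id: bytes) -> bool:
    # Anyone, any slot after N: persist if unlock exists at slot N, else undo.
    new_value, lock, lock_slot = a.staged.pop(res_id)
    if b.has(H(lock, b.id), lock_slot + 1):
        a.state[res_id] = new_value
        return True
    return False
```

Both actors must call `end_slot` once per slot so that history indices line up with slot numbers.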

Synchronous but deferred

Essentially the same as deferred receipts, but abstracting away state and Merkle proofs: the EE can design those. The EE is just provided functions to register and check capabilities.

Semantically synchronous, but deferred change:

Slot N: the EE on shard A stages a change with an unregistered capability syn = H(H(res ID A, nonce), H(shard A, EE)), to be persisted if synack can be found; when it is found, the EE also registers ack.

Slot N: the EE on shard B stages a change by registering capability synack = H(H(res ID B, syn), H(shard B, EE)), to be persisted if ack can be found.

A simple syn-ack does the job here. (The first syn does not have to be registered; using it as part of synack is good enough.)

This can all be done in the same slot, as the execution is deferred to a later slot where the shards can learn about the capabilities published by each other.
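A rough sketch of the deferred syn-ack flow. The post does not define `ack`; this sketch assumes `ack = H(synack, H(shard A, EE))`, i.e. shard A allocates a capability seeded with `synack`. Other assumptions:

- Per-slot capability snapshots stand in for the accumulating roots.
- `H` is SHA-256 over length-prefixed parts.

All function names are hypothetical.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # Hypothetical hash combinator: SHA-256 over length-prefixed parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


class Actor:
    """Toy (shard, EE) actor with per-slot capability snapshots."""

    def __init__(self, shard: str, ee: str = "EE"):
        self.id = H(shard.encode(), ee.encode())
        self.live, self.history = set(), []
        self.state, self.staged = {}, {}

    def allocate(self, seed: bytes) -> bytes:
        cap = H(seed, self.id)
        self.live.add(cap)
        return cap

    def end_slot(self) -> None:
        self.history.append(frozenset(self.live))

    def has(self, cap: bytes, slot: int) -> bool:
        return 0 <= slot < len(self.history) and cap in self.history[slot]


def stage_a(a: Actor, slot: int, res_a: bytes, nonce: bytes, value) -> bytes:
    # Slot N on A: syn = H(H(res ID A, nonce), H(shard A, EE)) is NOT registered.
    syn = H(H(res_a, nonce), a.id)
    a.staged[res_a] = (value, syn, slot)
    return syn


def stage_b(b: Actor, res_b: bytes, syn: bytes, value) -> bytes:
    # Same slot N on B: register synack = H(H(res ID B, syn), H(shard B, EE)).
    synack = b.allocate(H(res_b, syn))
    b.staged[res_b] = (value, synack)
    return synack


def resolve_a(a: Actor, b: Actor, res_a: bytes, res_b: bytes) -> bool:
    # Any later slot: persist if synack existed at slot N, and register ack.
    value, syn, slot = a.staged.pop(res_a)
    synack = H(H(res_b, syn), b.id)
    if b.has(synack, slot):
        a.state[res_a] = value
        a.allocate(synack)  # ack: ASSUMED to be H(synack, H(shard A, EE))
        return True
    return False


def resolve_b(b: Actor, a: Actor, res_b: bytes, visible_slot: int) -> bool:
    # Persist once ack is visible; until then the change simply stays staged.
    value, synack = b.staged[res_b]
    if a.has(H(synack, a.id), visible_slot):
        del b.staged[res_b]
        b.state[res_b] = value
        return True
    return False
```

Both sides stage in the same slot. A can only resolve once B’s slot-N snapshot is visible, and B persists one step later, once `ack` shows up.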

Chaining changes in the same slot

Simply make the EE add additional persistence conditions when working on unconfirmed resources: last_cap (the ack of the previous transaction modifying the resource of interest) needs to be registered too. Nothing is blocking: it essentially just optimistically runs the stacked transactions and defers evaluation for a slot, to be lazily persisted afterwards. And best of all, the EE can program the chaining however it likes, and separate resources will not affect each other unless used together in an atomic transaction.
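A small sketch of chaining on one resource, under these assumptions:

- Capabilities are opaque byte labels.
- `settle` is called once the relevant deferred slot is visible.
- The set passed to `settle` is the set of registered capabilities.

As a design choice, each change inherits all conditions of the change it builds on. That way, a change stacked on a failed one can never persist. `Resource`, `stage` and `settle` are hypothetical names.

```python
class Resource:
    """One resource with optimistically stacked, not-yet-persisted changes."""

    def __init__(self, value):
        self.value = value
        self.pending = []  # [(value after this change, capabilities required to persist it)]

    def stage(self, fn, ack_cap):
        # Build on the latest optimistic value; inherit the previous change's
        # conditions, so last_cap (its ack) must be registered too.
        if self.pending:
            base, prev_required = self.pending[-1]
        else:
            base, prev_required = self.value, frozenset()
        self.pending.append((fn(base), prev_required | {ack_cap}))

    def settle(self, registered):
        # Lazy evaluation: persist changes in order; the first unconfirmed
        # change and everything stacked on it are dropped.
        for value, required in self.pending:
            if not required <= registered:
                break
            self.value = value
        self.pending.clear()
```

Two independent `Resource` objects never look at each other’s conditions. This mirrors the point that separate resources do not interfere unless used in the same atomic transaction.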

Similarities/differences

A lot of the space has been explored; the real challenge is to iterate well, not make it opinionated, and keep it bare-minimum but powerful at the protocol level.

The ability to do an existence check is similar to receipts: just provide a Merkle proof for cross-shard data. What matters here, however, is that we should only be looking to standardize the keys, not the values: a capability can be a hash or other compression of any data. It’s not about the receipt contents, it’s about the boolean property of existence of a given key. The rest follows however the EE likes.

So now we have translated the receipts/locks/timeout/logging ideas into simple unforgeable objects that are completely agnostic to EE style or state approach, and that can build many other patterns too.

Building EE concurrency patterns

Simple Asynchronous Calls (async await, callback, etc.)

Define a TX as a function invocation chain with a {shardX, EE_Y}.then(() => ...) (for success), {shardX, EE_Y}.onError(() => ...) (for failure), {shardX, EE_Y}.finally(() => ...) (after either success or failure completes) structure. Each chain call can be in a different shard/EE.

The modification of every part P in the chain is only persisted if a capability with the decision path input that describes the follow-up path is registered:

The path result capability is recursively defined as:

success: x = H(cap_then, 0) when then completes with success.

error: x = H(0, cap_error) when then does not complete with success.

cap_then and cap_error are the result capabilities of the respective execution paths.

Persisting modifications:

then is persisted when H(x, 0) is registered.

onError is persisted if defined and when H(0, H(x, 0)) is registered (the error handler must be successful).

finally is persisted if defined, for any registered outcome, i.e. any of the H(a, b) options.

The tx fails (no-op) if the onError branch was not covered but was hit (H(0, H(0, x))), and no finally was declared.

async / await / async-try are just syntax sugar for then and onError.
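A deliberately simplified reading of the path-result rules. It flattens the recursive definition to a single then/onError/finally step:

- `cap_then` stands for the result capability of a successful `then` path.
- `cap_error` stands for that of a successful error handler.
- `ZERO` is a 32-byte placeholder for the `0` in `H(cap_then, 0)`.

`path_result` and `persisted_parts` are hypothetical names. The nested `H(x, 0)` forms for longer chains are not modelled.

```python
import hashlib

ZERO = bytes(32)  # the "0" placeholder in H(cap_then, 0) / H(0, cap_error)


def H(*parts: bytes) -> bytes:
    # Hypothetical hash combinator: SHA-256 over length-prefixed parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


def path_result(then_ok: bool, cap_then: bytes, cap_error: bytes) -> bytes:
    # success: x = H(cap_then, 0); error: x = H(0, cap_error)
    return H(cap_then, ZERO) if then_ok else H(ZERO, cap_error)


def persisted_parts(registered, cap_then, cap_error, has_on_error, has_finally):
    """Which parts of a single then/onError/finally step get persisted."""
    ok = H(cap_then, ZERO) in registered
    failed = H(ZERO, cap_error) in registered
    parts = []
    if ok:
        parts.append("then")
    if failed and has_on_error:
        parts.append("onError")
    if has_finally and (ok or failed):
        parts.append("finally")
    return parts  # an empty list means the tx is a no-op
```

With the error path hit, no `onError` and no `finally`, the result is empty. That is the no-op case from the rules above.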

Message Driven Approach (Actor model)

Lots of options here, but capabilities are essentially the unforgeable objects to authorize messages between EEs at low cost. That could mean a capability-based memory safety model like the one the Pony language has. Or simply focus on messages that claim the existence of capabilities unique to the message, to authorize async changes.

Two Phase Commit (Wide Atomicity)

Commit request phase: one transaction to all EEs to stage a change, that will be locked in and persistable if a certain capability is registered in the future (and optionally with a timeout to free resources when no action is taken).

Commit phase: notify all participants that everyone is set (has staged the changes), and publish the capability to force participants to eventually persist the staged change everywhere (or keep it staged until someone actually needs the resource, but never drop the change).
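A toy two phase commit driven by a single commit capability, under these assumptions:

- The `published` set stands in for “this capability exists and is visible via crosslinks”.
- Deadlines are slot numbers.
- A participant votes no if the key is already locked by another pending commit.

`Participant`, `coordinate` and the method names are hypothetical.

```python
class Participant:
    """Toy EE on one shard taking part in a capability-driven two phase commit."""

    def __init__(self):
        self.state = {}
        self.staged = {}  # commit_cap -> (key, value, deadline_slot)

    def prepare(self, commit_cap, key, value, deadline):
        # Commit request phase: stage and lock in the change, unless the key
        # is already locked by another pending commit.
        if any(k == key for k, _, _ in self.staged.values()):
            return False
        self.staged[commit_cap] = (key, value, deadline)
        return True

    def persist_if_published(self, commit_cap, published):
        # Anyone may call this, at any later time: once the capability exists,
        # the staged change is never dropped.
        if commit_cap in published and commit_cap in self.staged:
            key, value, _ = self.staged.pop(commit_cap)
            self.state[key] = value
            return True
        return False

    def free_if_expired(self, commit_cap, published, slot):
        # Optional timeout: free the resource if nothing was published in time.
        entry = self.staged.get(commit_cap)
        if entry and commit_cap not in published and slot > entry[2]:
            del self.staged[commit_cap]
            return True
        return False


def coordinate(plans, commit_cap, deadline, published):
    """plans: [(participant, key, value)]. Publish commit_cap only if all staged."""
    votes = [p.prepare(commit_cap, key, value, deadline) for p, key, value in plans]
    if all(votes):
        published.add(commit_cap)  # commit phase
    return all(votes)
```

All `prepare` calls run before the vote is evaluated. A single “no” therefore leaves the other participants’ changes staged until their timeout frees them.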

Locking — Read/Write

Disable reads and/or writes for a certain resource until a capability is published

Re-enable locks when the capability is revoked.

Optionally the inverse: temporarily enable things when a capability is revoked.
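A minimal sketch of capability-gated read/write locks, including the inverse mode. `visible_caps` is assumed to be the set of capabilities that are currently published and not revoked, as seen by the EE. `GuardedResource` and its parameters are hypothetical.

```python
class GuardedResource:
    """Resource whose reads and/or writes are gated on a capability's existence."""

    def __init__(self, value, cap, locked_ops=("read", "write"), inverse=False):
        self.value = value
        self.cap = cap
        self.locked_ops = set(locked_ops)
        self.inverse = inverse

    def _locked(self, op, visible_caps):
        present = self.cap in visible_caps
        # Normal: locked until the capability is published, and again once revoked.
        # Inverse: locked while the capability exists, open once it is revoked.
        return op in self.locked_ops and present == self.inverse

    def read(self, visible_caps):
        if self._locked("read", visible_caps):
            raise PermissionError("reads are locked")
        return self.value

    def write(self, visible_caps, value):
        if self._locked("write", visible_caps):
            raise PermissionError("writes are locked")
        self.value = value
```

Revocation needs no extra code here: once the capability drops out of `visible_caps`, the normal mode is locked again and the inverse mode opens up.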

Contract Yanking

Yank a contract by:

Registering it as yanked in its old shard, and declaring your foreign (it will live on the yank destination shard) promised but unpublished capability to unyank it, possibly with a deadline to unyank (the contract is free to specify other conditions too, e.g. an incentive to be yanked to certain shards). Others can queue their planned changes by building on your promised unyank capability.

Making changes to a copy on your preferred shard.

Publishing the unyank. The EE may choose whether this is a passive option (anyone can unyank once you are done, after a certain action), or whether you must do it yourself.

The original contract is required to load the state of the new contract when the unyank frees it.
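A rough sketch of yanking from the old shard’s point of view. It leaves out deadlines, incentives and the passive-versus-active unyank choice, and makes two assumptions:

- The destination’s visible capabilities and the new contract state are passed in directly.
- `H` is SHA-256 over length-prefixed parts.

`YankableContract` and its methods are hypothetical.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # Hypothetical hash combinator: SHA-256 over length-prefixed parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


class YankableContract:
    """Toy contract on its old shard, frozen while yanked to another shard."""

    def __init__(self, state):
        self.state = dict(state)
        self.unyank_cap = None  # promised (unpublished) capability at the destination
        self.queue = []         # changes others built on the promised unyank

    def yank(self, dest_actor: bytes, seed: bytes):
        # Register as yanked and declare the foreign unyank capability H(seed, dest).
        if self.unyank_cap is not None:
            raise RuntimeError("already yanked")
        self.unyank_cap = H(seed, dest_actor)
        return self.unyank_cap, dict(self.state)  # a copy to work on at the destination

    def queue_change(self, fn):
        if self.unyank_cap is None:
            raise RuntimeError("not yanked")
        self.queue.append(fn)

    def try_unyank(self, dest_caps, dest_state) -> bool:
        # Once the unyank capability is visible at the destination, load the new
        # contract's state and apply the queued changes on top.
        if self.unyank_cap is None or self.unyank_cap not in dest_caps:
            return False
        self.state = dict(dest_state)
        for fn in self.queue:
            self.state = fn(self.state)
        self.queue.clear()
        self.unyank_cap = None
        return True

    def read(self):
        if self.unyank_cap is not None:
            raise RuntimeError("yanked: frozen until the unyank is published")
        return self.state
```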



Building more patterns

Capabilities are nice for minimal concurrency legos, but also for other use-cases.

Permission systems

Instead of having to make every single EE compatible with reading each other’s state (lots of duplicate EE code!), capabilities could be the “permission lego” we need: a simple “commit <X>” system to run with. Not conceptually new, but simple yet powerful.

Classic owner permission, but cross-shard: H("address 0xabc... is owner of resource X", H(shard, EE)), where (shard, EE) is the host of the address.

Now visit another shard and say “here’s this address you need to trust as owner, check it”, and the EE checks the existence of the capability.

Time-out permission: H("can do X until block N", H(shard, EE))

Permissions based on commitments are really that easy, why not?
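A minimal sketch of both permission checks, under these assumptions:

- The claim strings are encoded as UTF-8.
- `host_caps` is the set of capabilities visible for the host (shard, EE).
- “until block N” is treated as inclusive.
- `H` is SHA-256 over length-prefixed parts.

The helper names and the example address `0x1234` are hypothetical.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # Hypothetical hash combinator: SHA-256 over length-prefixed parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()


def host(shard: str, ee: str) -> bytes:
    return H(shard.encode(), ee.encode())  # H(shard, EE)


def owner_cap(address: str, resource: str, host_id: bytes) -> bytes:
    return H(f"address {address} is owner of resource {resource}".encode(), host_id)


def timeout_cap(action: str, block: int, host_id: bytes) -> bytes:
    return H(f"can do {action} until block {block}".encode(), host_id)


def check_owner(host_caps, address, resource, host_id) -> bool:
    # Another shard only needs an existence check on the host's capabilities.
    return owner_cap(address, resource, host_id) in host_caps


def check_timed(host_caps, action, until_block, current_block, host_id) -> bool:
    # Valid while the deadline has not passed and the capability exists.
    return current_block <= until_block and timeout_cap(action, until_block, host_id) in host_caps
```

Because the host `(shard, EE)` is part of the hash, the same claim committed by a different host is a different capability.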

To be explored