Scenarios

Existing imbalance in data stored / gas spent across shards

Migration of data from one shard to another for other technical/business reasons

New data from off-chain sources / Eth1

For the purpose of solving this problem we will use dType (Decentralized Type System on Ethereum 2.0) and the fact that it now has a count function for stored items. The count function is not an ideal metric, because it does not measure the actual storage cost, but it gives a good enough approximation for the current purpose.
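To make the approximation concrete, a per-contract load estimate could be derived from the item count by assuming an average storage footprint per item. The following is a hypothetical sketch, not part of dType: `AVG_ITEM_SLOTS` and `approximate_dtype_load` are assumed names, and the constant is an illustrative guess rather than a measured value.

```python
# Hypothetical sketch: approximate a dtype contract's storage load from its
# item count. AVG_ITEM_SLOTS is an assumed average number of 32-byte storage
# slots per stored item; the real per-item cost in dType varies.
AVG_ITEM_SLOTS = 3

def approximate_dtype_load(item_count, avg_item_slots=AVG_ITEM_SLOTS):
    """Rough load estimate: item count times average slots used per item."""
    return item_count * avg_item_slots

# e.g. a dtype contract storing 150 items
load = approximate_dtype_load(150)
print(load)  # -> 450
```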

Problem Statement

Given a number of shards shard_count, each loaded with a certain shard_load, and a number of dType storage contracts dtype_count, each with a certain dtype_load, we need to find a way to balance the loads across shards.

The result is a list of shards, each with a list of dtype IDs that should be added to that shard and the final data load of that shard.
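The shapes of the inputs and of the expected result can be sketched as follows. The concrete values below are purely illustrative and do not come from the notebook; only the (index, load) pair structure mirrors the code in the next section.

```python
# Inputs: (index, load) pairs, mirroring the enumerate() pairs used later.
shard_loads = [(0, 120), (1, 300), (2, 80)]           # shard_count = 3
dtype_loads = [(0, 500), (1, 40), (2, 260), (3, 90)]  # dtype_count = 4

# Expected result shape: one entry per shard, holding
# (shard_index, final_data_load, [dtype IDs assigned to that shard]).
# The assignments shown here are illustrative, not computed.
result = [
    (2, 580, [0]),
    (0, 470, [2, 3]),
    (1, 340, [1]),
]
```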

Solution

The Python code for this solution follows. It can also be read at https://github.com/pipeos-one/dType/blob/f28bc63377f0565b1809c4dd4842242ce25dbd73/docs/research/Data_Load_Balancing_of_Shards.ipynb

import random

shard_count = 20
dtype_count = 50

# Increase average_coef if there are not enough shards for all dtypes
average_coef = 1.3
max_shard_load = 400
max_dtype_load = 2000

# Initialize random load values for shards and dtypes
shard_loads_initial = list(enumerate([random.randrange(i, max_shard_load) for i in range(shard_count)]))
dtype_loads_initial = list(enumerate([random.randrange(i, max_dtype_load) for i in range(dtype_count)]))

shards = [[] for i in range(shard_count)]
next_index_s = 0
next_index_dt = 0
last_index_dt = len(dtype_loads_initial) - 1
last_index_s = len(shard_loads_initial) - 1

# Sort loads: ascending for shards, descending for dtypes
shard_loads = sorted(shard_loads_initial, key=lambda tup: tup[1])
dtype_loads = sorted(dtype_loads_initial, key=lambda tup: tup[1], reverse=True)

# Calculate the average load per shard
average_load_shard = (sum(i[1] for i in dtype_loads) + sum(i[1] for i in shard_loads)) / shard_count
average_load_shard *= average_coef
print('average_load_shard', average_load_shard)

# Move heavier-than-average dtypes onto the lightest shards
for i, dload in dtype_loads:
    if dload >= average_load_shard:
        shards[next_index_s].append(i)
        next_index_s += 1
        next_index_dt += 1

# Pair the heaviest dtypes with the lightest shards
# and add as many light dtypes on top as possible
for i, dload in dtype_loads[next_index_dt:]:
    if last_index_s < next_index_s:
        print('Needs more shards. Increase average_coef')
        break
    # Add the next heaviest dtype to the next lightest shard
    shards[next_index_s].append(i)
    # Add as many light dtypes as average_load_shard permits
    load = shard_loads[next_index_s][1] + dload + dtype_loads[last_index_dt][1]
    while last_index_dt > next_index_dt and load <= average_load_shard:
        shards[next_index_s].append(dtype_loads[last_index_dt][0])
        last_index_dt -= 1
        load += dtype_loads[last_index_dt][1]
    next_index_s += 1
    next_index_dt += 1
    if next_index_dt > last_index_dt:
        break

print('(shard_index, shard_load, dtype_indexes)')
final_shards = [
    (shard_loads[x][0],
     sum([dtype_loads_initial[dtype_index][1] for dtype_index in shards[x]]),
     shards[x])
    for x, _ in enumerate(shards)
]
print('final_shards', final_shards)
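The core greedy step above (assign the heaviest remaining dtype to the lightest shard, then top up with the lightest dtypes while under a capacity bound) can be isolated into a small deterministic example. The function name `balance` and the `capacity` parameter are illustrative, not part of the notebook:

```python
def balance(shard_loads, dtype_loads, capacity):
    """Greedy sketch: give each lightest shard the heaviest remaining dtype,
    then top up with the lightest dtypes while the capacity permits."""
    shards = sorted(shard_loads, key=lambda t: t[1])                # ascending
    dtypes = sorted(dtype_loads, key=lambda t: t[1], reverse=True)  # descending
    result = []
    lo, hi = 0, len(dtypes) - 1   # lo walks the heavy end, hi the light end
    for shard_id, shard_load in shards:
        if lo > hi:  # no dtypes left to assign
            result.append((shard_id, shard_load, []))
            continue
        assigned = [dtypes[lo][0]]
        load = shard_load + dtypes[lo][1]
        lo += 1
        # Top up with the lightest dtypes while capacity permits
        while lo <= hi and load + dtypes[hi][1] <= capacity:
            assigned.append(dtypes[hi][0])
            load += dtypes[hi][1]
            hi -= 1
        result.append((shard_id, load, assigned))
    return result

shards = [(0, 100), (1, 20)]
dtypes = [(0, 300), (1, 50), (2, 10)]
print(balance(shards, dtypes, capacity=340))
# -> [(1, 330, [0, 2]), (0, 150, [1])]
```

Shard 1 (the lightest, load 20) receives the heaviest dtype (0, load 300) plus the lightest one that still fits (2, load 10); shard 0 receives the remaining dtype 1.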

Conclusions

We may need more data about gas/storage costs for this algorithm to be effective. Extended usage of dType will also improve the outcome.