Task Scheduling

by Jenny Ryoo

Introduction

Task Scheduling is an important technical factor that has a big impact on system performance.

In a blockchain platform, transactions are grouped into batches, and batches are grouped into blocks. To execute a transaction, the platform therefore has to check the batches in a block and then read and execute the transactions in each batch one by one. Because all software ultimately executes pre-programmed tasks, Task Scheduling plays a role similar to a skeleton in the overall system architecture: it determines how much work the system can complete in the shortest amount of time.

In EdenChain, Tasks are classified as Blocks, Batches, and Transactions. Since the Transaction is the final unit of work that the platform must perform, the key technical question in Task Management is how to execute a Transaction.

Accumulated transactions should be executed as soon as possible. While they are being executed, there should be no data conflicts or deadlocks, and system resources should be used efficiently. Because Task Scheduling is responsible for all of this, it should be designed with scalability and performance in mind.

3 Types of Scheduling

The EdenChain platform supports three types of Task Scheduling.

Serial Scheduling

Parallel Scheduling

Namespace Scheduling

Serial Scheduling places transactions in a queue and executes them one at a time. This approach is intuitive, very easy to implement, and free of data conflicts. However, it is a poor choice in terms of scalability and speed. Common blockchain platforms use it because it avoids data collision problems.
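
As a minimal sketch of the idea, and not EdenChain's actual implementation, serial scheduling can be modeled as a FIFO queue that is drained one transaction at a time. The names run_serial and deposit, and the representation of a transaction as a callable that mutates a state dictionary, are assumptions made purely for illustration.

```python
from collections import deque


def run_serial(transactions, state):
    """Execute transactions strictly one at a time, in arrival order."""
    queue = deque(transactions)
    while queue:
        tx = queue.popleft()
        tx(state)  # only one transaction ever touches the state at a time
    return state


def deposit(account, amount):
    """Hypothetical transaction factory: adds `amount` to `account`."""
    def tx(state):
        state[account] = state.get(account, 0) + amount
    return tx


final_state = run_serial(
    [deposit("alice", 10), deposit("bob", 5), deposit("alice", -3)],
    {},
)
```

Because nothing runs concurrently, no locking or conflict detection is needed, which is exactly why the approach is simple and also why it cannot use more than one core.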

Parallel Scheduling executes several transactions simultaneously. It is faster than Serial Scheduling, which processes one transaction at a time, and it scales well: if more performance is required, it can be gained relatively easily by increasing the number of threads used for parallel execution. However, because parallel execution can cause data conflicts, the scheduler cannot simply run transactions as they arrive. It has to adjust the order of execution so that transactions running at the same time do not conflict over data.
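
One way to picture "adjusting the order of execution" is to place transactions into waves so that no two transactions in the same wave touch the same data, then run each wave on a thread pool. The sketch below assumes that every transaction declares up front the state keys it reads or writes. The Tx class, plan_waves, and run_parallel are hypothetical names, not part of any EdenChain API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Tx:
    name: str
    keys: FrozenSet[str]  # state keys this transaction touches (assumed declared)
    apply: Callable[[Dict[str, int]], None]


def plan_waves(txs):
    """Place each transaction one wave after the last wave that touched its keys."""
    last_wave = {}  # key -> index of the latest wave using that key
    waves = []
    for tx in txs:
        idx = 1 + max((last_wave.get(k, -1) for k in tx.keys), default=-1)
        if idx == len(waves):
            waves.append([])
        waves[idx].append(tx)
        for k in tx.keys:
            last_wave[k] = idx
    return waves


def run_parallel(txs, state, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in plan_waves(txs):
            # list() waits for the whole wave before the next one starts
            list(pool.map(lambda tx: tx.apply(state), wave))
    return state


def add(key, amount):
    def apply(state):
        state[key] = state[key] + amount
    return apply


txs = [
    Tx("t1", frozenset({"alice"}), add("alice", 10)),
    Tx("t2", frozenset({"bob"}), add("bob", 5)),
    Tx("t3", frozenset({"alice"}), add("alice", -3)),
]
waves = [[tx.name for tx in wave] for wave in plan_waves(txs)]
result = run_parallel(txs, {"alice": 0, "bob": 0})
```

Here t1 and t2 share no keys and land in the same wave, while t3 conflicts with t1 on "alice" and is deferred to the next wave, so conflicting transactions still run in their original order.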

Namespace Scheduling divides transactions by Namespace and applies a completely independent scheduling method to each one. It is fundamentally different from the two methods above, which apply serial or parallel scheduling to all transactions without distinguishing between Namespaces. The scheduler reads the Namespace value contained in the Transaction Header and transfers the transaction itself to the corresponding Namespace Computing Zone, so each Namespace can be thought of as a completely separate blockchain network.
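
The routing step can be sketched as follows. The dictionary layout of a transaction, the "header" and "namespace" field names, and the function route_by_namespace are assumptions for illustration only. The actual Transaction Header format is not described here.

```python
from collections import defaultdict, deque


def route_by_namespace(transactions):
    """Hand each transaction to the queue of its own namespace zone."""
    zones = defaultdict(deque)  # namespace -> independent transaction queue
    for tx in transactions:
        namespace = tx["header"]["namespace"]  # hypothetical header field
        zones[namespace].append(tx)
    return zones


incoming = [
    {"header": {"namespace": "payments"}, "payload": "p1"},
    {"header": {"namespace": "games"}, "payload": "g1"},
    {"header": {"namespace": "payments"}, "payload": "p2"},
]
zones = route_by_namespace(incoming)
payloads = {ns: [tx["payload"] for tx in queue] for ns, queue in zones.items()}
```

Each queue can then be drained by its own serial or parallel scheduler, since transactions in different namespaces never meet in the same queue.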

These three types of Task Scheduling have their own advantages and disadvantages. However, when considering the scalability and performance requirements of the system, we conclude that Parallel Scheduling and Namespace Scheduling should be adopted.

This is quite intuitive: both methods can execute multiple transactions at the same time, and can therefore achieve both performance and scalability.

Technical Issues in Scheduling

There are two major technical issues that need to be addressed in order to achieve the desired performance and scalability of Task Scheduling. Both are directly related to Parallel Scheduling.

The first problem is data concurrency.

While a transaction is manipulating a piece of data, that data should not be altered by other programs or by other transactions. Changing a data value may seem harmless, but if the value is changed elsewhere, or if a value the transaction depends on changes while the transaction is running, serious data problems can result.

The basic idea for solving data concurrency is to process the data associated with a transaction in one place, that is, to process one transaction at a time. For this reason, Serial Scheduling is widely used.

However, as mentioned earlier, this approach cannot be used here because of its performance and scalability problems. To prevent data concurrency problems in Task Scheduling, the architecture must handle related tasks serially and unrelated tasks in parallel. One way to separate tasks is to use a notion such as Namespace. Another is to identify related transactions and have the Task Scheduler isolate them from one another appropriately.
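
A rough sketch of "related tasks serial, other tasks parallel" is to group transactions that share any data into one group, run each group serially, and run the groups in parallel. The example below uses a small union-find structure for the grouping. The tuple layout (name, keys, apply) and the functions group_related and run_grouped are hypothetical, and the sketch assumes that each transaction's keys are known before it runs.

```python
from concurrent.futures import ThreadPoolExecutor


def group_related(txs):
    """Group transactions that share any key; arrival order is kept per group."""
    parent = list(range(len(txs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # key -> index of the first transaction that used it
    for i, (_, keys, _) in enumerate(txs):
        for k in keys:
            if k in owner:
                parent[find(i)] = find(owner[k])
            else:
                owner[k] = i

    groups = {}
    for i, tx in enumerate(txs):
        groups.setdefault(find(i), []).append(tx)
    return list(groups.values())


def run_grouped(txs, state, workers=4):
    def run_group(group):
        for _, _, apply in group:  # serial inside a group
            apply(state)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_group, group_related(txs)))  # parallel across groups
    return state


def add(key, amount):
    def apply(state):
        state[key] += amount
    return apply


txs = [
    ("t1", {"a"}, add("a", 1)),
    ("t2", {"b"}, add("b", 2)),
    ("t3", {"a", "c"}, add("c", 3)),
    ("t4", {"c"}, add("c", 4)),
    ("t5", {"d"}, add("d", 5)),
]
group_names = [[name for name, _, _ in g] for g in group_related(txs)]
final = run_grouped(txs, {"a": 0, "b": 0, "c": 0, "d": 0})
```

Transactions t1, t3, and t4 are linked through keys "a" and "c" and so run one after another, while t2 and t5 are independent and can run at the same time on other threads.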

The second problem is data locking.

You might think that Parallel Scheduling is free of data concurrency problems once tasks are split into unrelated transactions and executed in parallel.

However, from a system point of view, locking is required due to the presence of functions other than transaction processing, i.e. block creation, consensus, and state change. These locks cannot be taken lightly because they affect the entire system, not just the transaction.

For example, if you need to change a state in a transaction, you need to lock it so that its value cannot be changed from any other part of the system. Even with Parallel Scheduling, all Task Schedulers must stop and wait for that task to finish. Otherwise, the state data may become inconsistent and the entire data set may be corrupted.

This problem is more complicated than data concurrency, and it is often encountered when developing real applications. Developing blockchain applications on the EdenChain platform requires writing Transaction Processors with the EdenChain API, and developers may need to implement locking according to the logic contained in the Transaction Processor. If a lock is held for a long execution time, or is never released because an exception occurs, a deadlock may result and the whole system may be brought to a halt.
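
Two common defensive habits reduce this risk: always release the lock in a finally block so that an exception cannot leave it held, and acquire it with a timeout so that a stuck holder produces an error instead of an indefinite wait. The sketch below uses only Python's standard threading module. The helper held and the LockTimeout exception are hypothetical names and are not part of the EdenChain API.

```python
import threading
from contextlib import contextmanager


class LockTimeout(Exception):
    """Raised when the lock cannot be acquired in time."""


@contextmanager
def held(lock, timeout):
    # Give up after `timeout` seconds instead of waiting forever.
    if not lock.acquire(timeout=timeout):
        raise LockTimeout(f"could not acquire lock within {timeout}s")
    try:
        yield
    finally:
        lock.release()  # runs even if the body raises


state_lock = threading.Lock()


def failing_update():
    with held(state_lock, timeout=1.0):
        raise ValueError("transaction logic failed")


try:
    failing_update()
except ValueError:
    pass
released_after_error = not state_lock.locked()

# Simulate a holder that never finishes: a second acquire times out.
state_lock.acquire()
try:
    try:
        with held(state_lock, timeout=0.05):
            timed_out = False
    except LockTimeout:
        timed_out = True
finally:
    state_lock.release()
```

A timeout does not remove the underlying contention, but it turns a silent system-wide stall into a failure that the Transaction Processor can report and retry.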

Currently, EdenChain supports two types, Serial Scheduling and Parallel Scheduling, and we are planning to develop and support Namespace Scheduling in the future.

Namespace scheduling separates transactions according to their Namespace, so there is no problem of data concurrency, and serial scheduling and parallel scheduling can be selectively applied as needed.

A key technical decision when implementing Namespace Scheduling is whether or not to separate State by Namespace, because separating State by Namespace yields an independent blockchain network for each Namespace. If a separate blockchain network is run for each Namespace, operating the EdenChain platform infrastructure also becomes costly and maintenance-intensive.

Since there are always pros and cons for any choice, we plan to establish the theory and then choose an appropriate method by conducting experiments.