Spring Batch is a great tool for dealing with large data volumes. The project introduces chunk-oriented processing, which divides data into batches, and each batch runs in its own independent transaction. In this post, I will show you how to work with Spring Batch and distributed transactions.

A distributed transaction is involved in this example because the ItemWriter updates records in two independent MySQL databases, and the changes on these records must be committed or rolled back at once. On the other hand, the Spring Batch library stores execution history and meta-data in another MySQL database, which is isolated from the first two - the execution history should be saved even if the distributed transaction needs to be rolled back due to an exception.

All three MySQL databases mentioned above, plus the Spring Batch program, are packed and available as a docker-compose stack, and all the source code for this example can be found on GitHub.

Highlighted techniques in this post

Deploy the ChainedTransactionManager from the Spring Data project to manage the distributed transaction across two MySQL databases; other alternatives include Atomikos, Bitronix, or Narayana.

Use a CompositeItemWriter to call a series of JdbcBatchItemWriters: since a chunk step allows only one ItemWriter, the CompositeItemWriter is used in order to update multiple databases.

Generate random testing records with a MySQL stored procedure.

Divide the testing data set into multiple chunks, and test the transaction handling when a planned error occurs in one of the chunks. For the chunks that have already been processed, the records should be updated and committed; for the chunk that hits the exception, all updates should be rolled back.

Set up a multiple data source configuration and use it in Spring Boot.

Use a GitHub Actions pipeline to build and pack the Spring Boot project into a Docker image.
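The chunk-level transaction behavior described above can be simulated in plain Java with no Spring involved. This sketch is purely illustrative (the `ChunkTxDemo` class and its poison-item trick are my own invention, not part of the example project): chunks committed before the failure survive, while the failing chunk is discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of chunk-oriented commit/rollback: each chunk is
// written under its own transaction, so chunks committed before a failure
// stay committed, while the failing chunk is rolled back.
public class ChunkTxDemo {

    // Process items in chunks of chunkSize; an item equal to poison
    // triggers a failure inside its chunk. Returns committed items only.
    public static List<Integer> process(List<Integer> items, int chunkSize, int poison) {
        List<Integer> committed = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            List<Integer> buffer = new ArrayList<>(); // per-chunk "transaction"
            try {
                for (int i = start; i < Math.min(start + chunkSize, items.size()); i++) {
                    if (items.get(i) == poison) {
                        throw new IllegalStateException("planned error");
                    }
                    buffer.add(items.get(i));
                }
                committed.addAll(buffer); // commit the whole chunk at once
            } catch (IllegalStateException e) {
                break; // rollback: discard the buffered chunk, stop the job
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) items.add(i);
        // chunks {0,1,2} and {3,4,5} commit; chunk {6,7,8} rolls back at 7
        System.out.println(process(items, 3, 7)); // [0, 1, 2, 3, 4, 5]
    }
}
```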

Test scenario

To keep it simple, this example has only one Spring Batch job, and the job has only one chunk step.

First, the job reads card transactions from the Card database, then it posts these records and updates the corresponding account balances in the Account database. Finally, each transaction record is time-stamped to mark that it has been posted.

Each account is assigned a credit limit; the total loan amount of an account cannot exceed this limit, otherwise posting the transaction record violates a constraint on the account table and the job is aborted.
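The credit-limit rule can be sketched as a plain-Java check (the names below are hypothetical; in the example project the rule is actually enforced by a constraint on the account table):

```java
// Minimal sketch of the posting rule: a transaction may only be posted if
// the new balance stays within the account's credit limit. Throwing here
// mirrors the constraint violation that aborts the job.
public class CreditLimitRule {

    public static long post(long balance, long txnAmount, long creditLimit) {
        long newBalance = balance + txnAmount;
        if (newBalance > creditLimit) {
            throw new IllegalStateException("credit limit exceeded");
        }
        return newBalance;
    }

    public static void main(String[] args) {
        System.out.println(post(800, 150, 1000)); // 950 - within the limit
        try {
            post(950, 100, 1000); // 1050 - violates the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // credit limit exceeded
        }
    }
}
```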

Since the job updates records in two independent MySQL databases, these updates must be committed or rolled back at once, so the changes are managed under the same distributed transaction.

Spring Batch's meta-data and the chunk execution history are stored in the spring batch database, and the corresponding transaction should be isolated from the Card and Account databases.

1. Define multiple data sources

I walk through the example starting from the DatabaseConfig configuration class:

The class groups all database-related settings, including three MySQL data sources; the data source for Spring Batch meta-data is marked with the @Primary annotation.

For each data source, I had to specify HikariDataSource as its implementation class; without doing this, the @ConfigurationProperties annotation fails to inject the properties into the data source.

```java
@Configuration
public class DatabaseConfig {

    ...

    @Bean
    @Primary
    @ConfigurationProperties("spring-batch-db.datasource")
    DataSource springBatchDb() {
        DataSourceBuilder builder = DataSourceBuilder.create();
        builder.type(HikariDataSource.class);
        return builder.build();
    }

    @Bean
    @ConfigurationProperties("card-db.datasource")
    DataSource cardDb() {
        DataSourceBuilder builder = DataSourceBuilder.create();
        builder.type(HikariDataSource.class);
        return builder.build();
    }

    //skip the accountDb() here...

    ...
}
```

The @ConfigurationProperties annotations above read the data source properties from application.yml:

The auto-commit property of both the card and account data sources has to be set to false to pass commit control to the distributed transaction. However, I had to reverse this setting for the Spring Batch meta-data data source; otherwise it messes up the Spring Batch database initialization script, because the insert statements in that script do not commit by themselves.

The Spring profile shown below is for running this example in docker-compose; the MySQL hostnames in the jdbc-url properties are actually the names of the MySQL services defined in the same docker-compose.yml file.

It also expects three brand-new, empty MySQL databases to be created each time the docker-compose stack starts; therefore the Spring Boot program calls the database initialization scripts whenever it starts.

```yaml
spring:
  profiles: docker-compose
  batch:
    job:
      enabled: false
    initialize-schema: always

spring-batch-db:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    jdbc-url: jdbc:mysql://spring-batch-db:3306/db
    username: batch
    password: testing
    auto-commit: true

card-db:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    jdbc-url: jdbc:mysql://card-db:3306/db
    username: card
    password: testing
    auto-commit: false
    init-script: classpath:schema-mysql-card-db.sql

#skip account-db properties here...
...
```

2. The ChainedTransactionManager

The next step is to define the transaction managers; they are also defined in the DatabaseConfig configuration class:

The card and account databases each have their own DataSourceTransactionManager; on top of both, a ChainedTransactionManager is defined to control the distributed transaction across these two databases.

```java
...
@Bean
PlatformTransactionManager cardDbTxManager() {
    return new DataSourceTransactionManager(cardDb());
}

...
@Bean
PlatformTransactionManager accountDbTxManager() {
    return new DataSourceTransactionManager(accountDb());
}

...
@Bean
PlatformTransactionManager chainTxManager() {
    ChainedTransactionManager txManager =
        new ChainedTransactionManager(
            cardDbTxManager(), accountDbTxManager()
        );
    return txManager;
}
```
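Note that ChainedTransactionManager is a best-effort alternative to a full XA/two-phase commit: it starts the transactions in the declared order and commits (or rolls back) them in the reverse order. A tiny plain-Java sketch of that ordering (the `Tx` class here is a hypothetical stand-in, not a Spring type):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the ChainedTransactionManager ordering (best-effort,
// not two-phase commit): transactions begin in the declared order and are
// committed in reverse order. Tx stands in for a DataSourceTransactionManager
// and only records the calls in a log.
public class ChainedTxDemo {

    static final List<String> log = new ArrayList<>();

    static class Tx {
        final String name;
        Tx(String name) { this.name = name; }
        void begin()  { log.add("begin "  + name); }
        void commit() { log.add("commit " + name); }
    }

    public static void runChained(Tx... managers) {
        for (Tx tx : managers) tx.begin();               // start in order
        for (int i = managers.length - 1; i >= 0; i--) { // commit in reverse
            managers[i].commit();
        }
    }

    public static void main(String[] args) {
        runChained(new Tx("cardDb"), new Tx("accountDb"));
        System.out.println(log);
        // [begin cardDb, begin accountDb, commit accountDb, commit cardDb]
    }
}
```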

3. The JdbcCursorItemReader

With the database configuration completed, I move on to the Spring Batch job and step; the JobConfig class groups all job and step related configuration:

The chunk step starts with the JdbcCursorItemReader; it reads card transaction records from the card database and transforms each record into an AccountTxn POJO.

The JdbcCursorItemReaderBuilder class that comes with the Spring Batch library provides a builder pattern to construct the JdbcCursorItemReader; it takes a query SQL and a rowMapper function to build the instance. Most of the ItemReaders and ItemWriters in the Spring Batch library (listed here) come with their own builder classes.

The @Qualifier annotation indicates that the cardDb data source should be injected into the bean definition function.

```java
@Bean
JdbcCursorItemReader<AccountTxn> txnReader(
    @Qualifier("cardDb") DataSource datasource
) {
    return new JdbcCursorItemReaderBuilder<AccountTxn>()
        .name("account-txn-reader")
        .dataSource(datasource)
        .sql("select t.*, c.credit_limit " +
             "from CARD_TXN t join CARD c " +
             "on t.card_no = c.card_no " +
             "where t.post = 'N' order by t.tx_datetime")
        .<AccountTxn>rowMapper((rs, rownum) -> {
            AccountTxn bean = new AccountTxn();
            bean.setTxRef(rs.getLong("tx_ref"));
            Timestamp txTs = rs.getTimestamp("tx_datetime");
            bean.setTxDatetime(txTs == null ? null : txTs.toLocalDateTime());
            ...
            return bean;
        })
        .build();
}
```

4. The JdbcBatchItemWriter

The first JdbcBatchItemWriter accumulates the net amount read from the AccountTxn POJO and updates the corresponding account balance in the account database.

With a MySQL database, I can put both the insert and update statements into a single SQL statement; for other databases, the alternative is to use a stored procedure, or to split the SQL into another JdbcBatchItemWriter instance.
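The MySQL "insert ... on duplicate key update" behaves like a map merge: insert the row if the account is new, otherwise add the net amount to the existing balance. The same shape in plain Java (the account numbers are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the upsert accumulation: the first transaction for an account
// inserts the row, later ones add to the existing balance - exactly what
// Map.merge does with Long::sum.
public class UpsertDemo {

    public static void post(Map<String, Long> balances, String accountNo, long netAmt) {
        // insert when absent, otherwise BALANCE = BALANCE + :netAmt
        balances.merge(accountNo, netAmt, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> balances = new HashMap<>();
        post(balances, "A-001", 100);
        post(balances, "A-001", 50);
        post(balances, "A-002", 30);
        System.out.println(balances.get("A-001")); // 150
    }
}
```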

The beanMapped() call indicates that the properties of the AccountTxn POJO are bound to the named parameters in the SQL.

```java
@Bean
JdbcBatchItemWriter<AccountTxn> balanceWriter(
    @Qualifier("accountDb") DataSource datasource
) {
    return new JdbcBatchItemWriterBuilder<AccountTxn>()
        .dataSource(datasource)
        .assertUpdates(true)
        .sql(
            "insert into ACCOUNT (" +
            "  account_no, credit_limit, balance, " +
            "  last_tx_ref, last_tx_datetime " +
            ") values ( " +
            "  :cardNo, :creditLimit, :netAmt, :txRef, :txDatetime" +
            ") " +
            "on duplicate key update " +
            "  BALANCE = BALANCE + :netAmt, " +
            "  LAST_TX_REF = :txRef, " +
            "  LAST_TX_DATETIME = :txDatetime")
        .beanMapped()
        .build();
}
```

The second JdbcBatchItemWriter time-stamps the card transaction records after they have been posted.

```java
@Bean
JdbcBatchItemWriter<AccountTxn> postTxnWriter(
    @Qualifier("cardDb") DataSource datasource
) {
    return new JdbcBatchItemWriterBuilder<AccountTxn>()
        .dataSource(datasource)
        .assertUpdates(true)
        .sql(
            "update CARD_TXN set " +
            "  POST = 'Y', " +
            "  POST_DATETIME = :postDatetime " +
            "where TX_REF = :txRef")
        .beanMapped()
        .build();
}
```

5. Define CompositeItemWriter

Since a chunk step allows only one ItemWriter, I need to define a CompositeItemWriter that calls the above two JdbcBatchItemWriters one by one.

```java
@Bean
CompositeItemWriter<AccountTxn> txnWriter() {
    return new CompositeItemWriterBuilder<AccountTxn>()
        .delegates(balanceWriter(null), postTxnWriter(null))
        .build();
}
```
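Conceptually, the composite writer just hands the same chunk of items to each delegate in order. A minimal plain-Java sketch of that delegation (the `Writer` interface here is a hypothetical stand-in for Spring Batch's ItemWriter):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what CompositeItemWriter does: forward each chunk of items to
// every delegate writer, one by one, in the declared order.
public class CompositeWriterDemo {

    interface Writer<T> { void write(List<? extends T> items); }

    static class Composite<T> implements Writer<T> {
        private final List<Writer<T>> delegates;
        Composite(List<Writer<T>> delegates) { this.delegates = delegates; }
        public void write(List<? extends T> items) {
            for (Writer<T> w : delegates) w.write(items); // one-by-one
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Writer<String> balanceWriter = items -> log.add("balance:" + items);
        Writer<String> postTxnWriter = items -> log.add("post:" + items);
        new Composite<>(List.of(balanceWriter, postTxnWriter))
            .write(List.of("txn1", "txn2"));
        System.out.println(log); // [balance:[txn1, txn2], post:[txn1, txn2]]
    }
}
```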

6. Build the Job and Step

Finally, I have collected all the ingredients to build the Spring Batch job and step:

The initialization script for the card database generates a total of 10,000 card transaction records, and here the chunk size is set to 1,000, so the job has to process 10 chunks to complete the task.

```java
@Configuration
public class JobConfig {

    @Autowired
    private JobBuilderFactory jobFactory;

    @Autowired
    private StepBuilderFactory stepFactory;

    ...

    @Bean
    Step runBalanceStep(
        @Qualifier("chainTxManager")
        PlatformTransactionManager transactionManager
    ) {
        ...
        return stepFactory.get("run-balance-step")
            .transactionManager(transactionManager)
            .<AccountTxn, AccountTxn>chunk(1000)
            .reader(txnReader(null))
            .processor(txnProcessor())
            .writer(txnWriter())
            .transactionAttribute(txAttribute)
            .listener(new MyListener())
            .build();
    }

    @Bean
    Job runBalanceJob() {
        return jobFactory.get("run-balance-job")
            .start(runBalanceStep(null))
            .build();
    }
}
```

If you have questions about this example, or any question about Spring and its whole bunch of libraries, please feel free to contact me through the following channels.

email: kwonghung.yip@gmail.com

linkedin: linkedin.com/in/yipkwonghung

Twitter: @YipKwongHung

github: https://github.com/kwonghung-YIP