In this article, we will discover how Alibaba optimizes flash sales for its annual Double Eleven online shopping festival using advisory locks on PostgreSQL.

11.11 The Biggest Deals of the Year. 40% OFF on selected cloud servers with a free 100 GB data transfer! Click here to learn more.

Flash sales has always been indispensable for e-commerce as well as for other online marketing efforts. For Alibaba, flash sales during the Double Eleven online shopping festival (Singles' Day) on November 11 is a necessity. However, the scale of Alibaba's Double Eleven poses significant challenges towards the servers and databases that support Alibaba's e-commerce platform.

Due to the competitive nature of flash sales, it's not unusual to see customers coming up with innovative but unfair ways to gain an upper hand over other customers. For example, on every November 11, Alibaba's transaction platforms receive heavy stress from bots, scripts, and external plug-ins for automated rush purchasing systems.

To cope with these challenges, the team at Alibaba has come up with many ways to optimize flash sales. For example, databases may adopt the queuing mechanism, asynchronous messages, or transaction merging to optimize transactions. In this article, I will introduce a flash sales optimization measure called advisory locks (AD LOCK) for PostgreSQL.

Flash Sales Typical Scenario

Although you may already be very familiar with flash sales, I still want to introduce some relevant ideas to make this article complete.

For example, let's imagine that there is one set of iPhone priced USD $1 for sale. This iPhone can be regarded as a record. When the flash sales starts, the customer who successfully buys this iPhone (updates the lock of this record) can win.

The challenge to the database is that the database must concurrently handle many update requests on only one record. Only one or a few requests can be successful, and the others fail or cannot update the record.

For example, if there are 100 iPhones for flash sales and 1 million customers participate in the flash sales, the minimum granularity for the database is a row lock. When one customer is updating this record, the other 999,999 customers are waiting. But ideally, the database should only process the 100 lucky winners in the flash sales. It is meaningless to process the requests from the customers who will not be receiving the iPhone as this will only waste resources.

Traditionally, a flag bit is used to indicate whether this record has been updated or how many times (how many iPhones) the record has been updated.

update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; -- Assume that five iPhones are for flash sales.

However, this method has the following defect:

The customer obtaining the lock may be successful or fail in handling this record, or may handle the record for a long time (because the database responds slowly). Before the transaction is finished, other sessions have to wait. Waiting is unnecessary and a waste of time for customers who did not obtain the lock.

Commonly Used Methods to Coping with Flash Sales

Generally, for update, nowait can be used to avoid waiting. That is, if a customer does not obtain the lock, the customer does not need to wait.

begin; select 1 from tbl where id=pk for update nowait; -- If the customer does not obtain the lock, an error is returned, and the transaction is rolled back. update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; end;

This method shortens the waiting time, because an error is returned when the lock is not obtained.

Another method is request merging. That is, multiple update requests are merged into one update request. This method requires a change to kernel and will damage ACID, because if the merged request fails, all the merged update requests fail. (This is different from group update. Group update will not damage the ACID.)

Next, let's learn about ad lock.

What Is Ad Lock?

Ad lock is a user-oriented lightweight lock. Its object is an integer. The lock types include transaction, session, shared, and exclusive.

In a single database, a lock can be obtained with a unique integer value. If the integer value is repeated, a new lock can be added using TRY. FALSE is returned immediately when no lock is obtained.

https://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS

Name Return Type Description pg_advisory_lock(key bigint) void Obtain exclusive session level advisory lock pg_advisory_lock(key1 int, key2 int) void Obtain exclusive session level advisory lock pg_advisory_lock_shared(key bigint) void Obtain shared session level advisory lock pg_advisory_lock_shared(key1 int, key2 int) void Obtain shared session level advisory lock pg_advisory_unlock(key bigint) boolean Release an exclusive session level advisory lock pg_advisory_unlock(key1 int, key2 int) boolean Release an exclusive session level advisory lock pg_advisory_unlock_all() void Release all session level advisory locks held by the current session pg_advisory_unlock_shared(key bigint) boolean Release a shared session level advisory lock pg_advisory_unlock_shared(key1 int, key2 int) boolean Release a shared session level advisory lock pg_advisory_xact_lock(key bigint) void Obtain exclusive transaction level advisory lock pg_advisory_xact_lock(key1 int, key2 int) void Obtain exclusive transaction level advisory lock pg_advisory_xact_lock_shared(key bigint) void Obtain shared transaction level advisory lock pg_advisory_xact_lock_shared(key1 int, key2 int) void Obtain shared transaction level advisory lock pg_try_advisory_lock(key bigint) boolean Obtain exclusive session level advisory lock if available pg_try_advisory_lock(key1 int, key2 int) boolean Obtain exclusive session level advisory lock if available pg_try_advisory_lock_shared(key bigint) boolean Obtain shared session level advisory lock if available pg_try_advisory_lock_shared(key1 int, key2 int) boolean Obtain shared session level advisory lock if available pg_try_advisory_xact_lock(key bigint) boolean Obtain exclusive transaction level advisory lock if available pg_try_advisory_xact_lock(key1 int, key2 int) boolean Obtain exclusive transaction level advisory lock if available pg_try_advisory_xact_lock_shared(key bigint) boolean Obtain shared transaction level advisory lock if available pg_try_advisory_xact_lock_shared(key1 int, key2 int) boolean Obtain shared transaction level advisory lock if available

Generally, the minimum lock (opened to customers) supported by database is row lock. Compared with LWLOCK and SPINLOCK, row lock is heavy. Therefore, row lock is often a bottleneck in flash sales, including the lock waiting time.

Purpose of Ad Lock

In addition to supporting flash sales, ad lock has the following functions:



Concurrent security check UPSERT in recursive call Ensuring atomic operation in service logic design

Performance of Ad Lock

Ad lock is lightweight. It does not need to access data or execute much code, so it has a high efficiency.

A 32-core, 64-thread computer can handle 1.31 million requests per second.

vi test.sql \set id random(1,100000000) select pg_try_advisory_xact_lock(:id); pgbench -M prepared -n -r -P 1 -f ./test.sql -c 96 -j 96 -T 100 transaction type: ./test.sql scaling factor: 1 query mode: prepared number of clients: 96 number of threads: 96 duration: 100 s number of transactions actually processed: 131516823 latency average = 0.072 ms latency stddev = 0.070 ms tps = 1314529.211060 (including connections establishing) tps = 1315395.309707 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 0.001 \set id random(1,100000000) 0.074 select pg_try_advisory_xact_lock(:id);

Example of Using Ad Lock in Flash Sales

A product has a unique ID in the database. We can lock the ID (if the ID is repeated in different tables, add offset or use other measures to avoid conflict).

The row is locked and updated only when the ID is successfully locked. This avoids ineffective row lock waiting and redundant query code.

The QPS of ad lock in handling concurrent update requests for one record can reach 391,000/s. The state of product will be changed to sold out quickly, avoiding a waste of database resources.

create table test(id int primary key, crt_time timestamp); insert into test values (1); vi test.sql update test set crt_time=now() where id=1 and pg_try_advisory_xact_lock(1); pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 transaction type: ./test.sql scaling factor: 1 query mode: prepared number of clients: 64 number of threads: 64 duration: 100 s number of transactions actually processed: 39104368 latency average = 0.163 ms latency stddev = 0.216 ms tps = 391012.743072 (including connections establishing) tps = 391175.983419 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 0.163 update test set crt_time=now() where id=1 and pg_try_advisory_xact_lock(1);

Now, the database host has 66.2% available CPU resource.

top - 13:12:43 up 51 days, 18:41, 2 users, load average: 1.12, 0.97, 0.78 Tasks: 1463 total, 28 running, 1435 sleeping, 0 stopped, 0 zombie Cpu(s): 24.5%us, 9.3%sy, 0.0%ni, 66.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 529321832k total, 235226420k used, 294095412k free, 903076k buffers Swap: 0k total, 0k used, 0k free, 62067636k cached

Comparison with Traditional Method

Typically, select for update nowait is used to avoid waiting.

begin; select 1 from tbl where id=pk for update nowait; -- If the customer does not obtain the lock, an error is returned, and the transaction is rolled back. update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; end;

PG can use the do clauses to merge the preceding commands into one block.

The traditional method can handle 86,000 requests per second.

vi test.sql do language plpgsql $$ declare begin with t as (select * from test where id=1 for update nowait) update test set crt_time=now() from t where t.id=test.id; exception when others then return; end; $$ ; pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 transaction type: ./test.sql scaling factor: 1 query mode: prepared number of clients: 64 number of threads: 64 duration: 100 s number of transactions actually processed: 8591222 latency average = 0.744 ms latency stddev = 0.713 ms tps = 85888.823884 (including connections establishing) tps = 85924.666940 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 0.744 do language plpgsql $$ declare begin with t as (select * from test where id=1 for update nowait) update test set crt_time=now() from t where t.id=test.id; exception when others then return; end; $$ ;

There is 54.5% available CPU resource.

top - 13:13:48 up 51 days, 18:42, 2 users, load average: 8.14, 2.69, 1.37 Tasks: 1464 total, 21 running, 1442 sleeping, 0 stopped, 1 zombie Cpu(s): 41.7%us, 3.8%sy, 0.0%ni, 54.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 529321832k total, 235256052k used, 294065780k free, 903176k buffers Swap: 0k total, 0k used, 0k free, 62068308k cached

Real Deduction Throughput of a Single Product

When testing the deduction throughput of a product, you can find that the QPS is high. However, the actual deduction throughput is lower because an error is returned if no lock is obtained.

postgres=# create table upd(id int primary key, cnt int8); postgres=# insert into upd values(1,0); vi t0.sql update upd set cnt=cnt-1 where id=1 and pg_try_advisory_xact_lock(1); \sleep 10 us .... vi t7.sql update upd set cnt=cnt-1 where id=1 and pg_try_advisory_xact_lock(1); \sleep 80 us pgbench -M prepared -n -r -P 1 -f t0.sql -f t1.sql -f t2.sql -f t3.sql -f t4.sql -f t5.sql -f t6.sql -f t7.sql -c 64 -j 64 -T 100 postgres=# select * from upd; id | cnt ----+--------- 1 | -611249 (1 row)

The inventory of a single product is deducted 6112.49 times per second. Generally, the inventory of the products for flash sales is within 1000; otherwise, customers do not need to scramble for the products.

Therefore, this value can meet requirement.

Real Deduction Throughput of All Products on the Platform

Assume 10 million products are available on the platform. Test the total deduction throughput using this method.

postgres=# create table upd(id int primary key, cnt int8); postgres=# insert into upd select generate_series(1,10000000), 0; vi test.sql \set id random(1,10000000) update upd set cnt=cnt-1 where id=:id and pg_try_advisory_xact_lock(:id); pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 postgres=# select sum(cnt) from upd; sum ----------- -27233112 (1 row)

Check the cnt for the real deduction statistics. On the entire platform, 272331.12 products are deducted per second. That is, 270,000 products are sold every second.

Advantages of Ad Lock Compared with Other Flash Sales Optimization Methods

Ad lock reduces CPU utilization and shortens waiting time to the minimum. According to the test case in this article, a record can be updated 391,000 times per second. However, the traditional method can reach only 86,000/s.

Furthermore, ad lock does not damage the ACID. A request affects a single transaction, and does not affect other transactions. In contrast, the merging method damages the ACID. If the merging fails, all merged request will fail.

To learn more about PostgreSQL on Alibaba Cloud, visit www.alibabacloud.com/product/apsaradb-for-rds-postgresql