In MySQL 5.7 we introduced JSON functionality into the MySQL Server. This work included the introduction of a JSON data type, virtual columns and a set of approximately 20 SQL functions that allow you to manipulate and search JSON data on the server side.

The JSON functionality has been well received, and in MySQL 8.0 we have plans to improve it in a number of ways. This post outlines improvements to the SQL functions with the addition of aggregate functions.

TL;DR

Starting with MySQL 8.0 (lab release)* two new aggregation functions were added and can be used to combine data into JSON arrays/objects:

JSON_ARRAYAGG()

JSON_OBJECTAGG()

*(To download MySQL 8.0-labs release go to this link and choose “MySQL Server 8.0.0 Optimizer)

mysql> CREATE TABLE `t1` ( -> `key` varchar(8) DEFAULT NULL, -> `grp` varchar(8) DEFAULT NULL, -> `val` varchar(8) -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0,01 sec) mysql> mysql> INSERT INTO t1(`key`, `grp`, `val`) VALUES -> ("key1", "g1", "v1"), -> ("key2", "g1", "v2"), -> ("key3", "g2", "v3"); Query OK, 3 rows affected (0,01 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> mysql> SELECT JSON_ARRAYAGG(`key`) AS `keys` FROM t1; +--------------------------+ | keys | +--------------------------+ | [ "key1", "key2", "key3" ] | +--------------------------+ 1 row in set (0,00 sec) mysql> SELECT grp, JSON_ARRAYAGG(`key`) AS `keys_grouped` FROM t1 GROUP BY grp; +------+------------------+ | grp | keys_grouped | +------+------------------+ | g1 | ["key1", "key2"] | | g2 | ["key3"] | +------+------------------+ 2 rows in set (0,00 sec) mysql> SELECT JSON_OBJECTAGG(`key`, val) AS `key_val` FROM t1; +--------------------------------------------+ | key_val | +--------------------------------------------+ | { "key1": "v1", "key2": "v2", "key3": "v3" } | +--------------------------------------------+ 1 row in set (0,00 sec) mysql> SELECT grp, JSON_OBJECTAGG(`key`, val) AS `key_val_grouped` FROM t1 GROUP BY grp; +------+------------------------------+ | grp | key_val_grouped | +------+------------------------------+ | g1 | {"key1": "v1", "key2": "v2"} | | g2 | {"key3": "v3"} | +------+------------------------------+ 2 rows in set (0,00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 mysql > CREATE TABLE ` t1 ` ( -> ` key ` varchar ( 8 ) DEFAULT NULL , -> ` grp ` varchar ( 8 ) DEFAULT NULL , -> ` val ` varchar ( 8 ) -> ) ENGINE = InnoDB DEFAULT CHARSET = latin1 ; Query OK , 0 rows affected ( 0 , 01 sec ) mysql > mysql > INSERT INTO t1 ( ` key ` , ` grp ` , ` val ` ) VALUES -> ( "key1" , "g1" , "v1" ) , -> ( "key2" , "g1" , "v2" ) , -> ( "key3" , "g2" , "v3" ) ; Query OK , 3 rows affected ( 0 , 01 sec ) Records : 3 Duplicates : 0 Warnings : 0 mysql > mysql > SELECT JSON_ARRAYAGG ( ` key ` ) AS ` keys ` FROM t1 ; + -- -- -- -- -- -- -- -- -- -- -- -- -- + | keys | + -- -- -- -- -- -- -- -- -- -- -- -- -- + | [ "key1" , "key2" , "key3" ] | + -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in set ( 0 , 00 sec ) mysql > SELECT grp , JSON_ARRAYAGG ( ` key ` ) AS ` keys_grouped ` FROM t1 GROUP BY grp ; + -- -- -- + -- -- -- -- -- -- -- -- -- + | grp | keys_grouped | + -- -- -- + -- -- -- -- -- -- -- -- -- + | g1 | [ "key1" , "key2" ] | | g2 | [ "key3" ] | + -- -- -- + -- -- -- -- -- -- -- -- -- + 2 rows in set ( 0 , 00 sec ) mysql > SELECT JSON_OBJECTAGG ( ` key ` , val ) AS ` key_val ` FROM t1 ; + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | key_val | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | { "key1" : "v1" , "key2" : "v2" , "key3" : "v3" } | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in set ( 0 , 00 sec ) mysql > SELECT grp , JSON_OBJECTAGG ( ` key ` , val ) AS ` key_val_grouped ` FROM t1 GROUP BY grp ; + -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | grp | key_val_grouped | + -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | g1 | { "key1" : "v1" , "key2" : "v2" } | | g2 | { "key3" : "v3" } | + -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 2 rows in set ( 0 , 00 sec )

Now for the less impatient:

Let’s think about this scenario: you have a database which contains both structured and semi-structured data and you’ve decided to adopt the EAV model (Entity–Attribute–Value). The tables will look more or less like this:

You have a product table which contains the common attributes:

CREATE TABLE `product` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(120) DEFAULT NULL, `manufacturer` varchar(120) DEFAULT NULL, `price` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; 1 2 3 4 5 6 7 CREATE TABLE ` product ` ( ` id ` int ( 11 ) NOT NULL AUTO_INCREMENT , ` name ` varchar ( 120 ) DEFAULT NULL , ` manufacturer ` varchar ( 120 ) DEFAULT NULL , ` price ` int ( 11 ) DEFAULT NULL , PRIMARY KEY ( ` id ` ) ) ENGINE = InnoDB DEFAULT CHARSET = latin1 ;

Then you have the attribute table containing all the non-common attributes that a product might have:

CREATE TABLE `attribute` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(120) DEFAULT NULL, `description` varchar(256) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; 1 2 3 4 5 6 CREATE TABLE ` attribute ` ( ` id ` int ( 11 ) NOT NULL AUTO_INCREMENT , ` name ` varchar ( 120 ) DEFAULT NULL , ` description ` varchar ( 256 ) DEFAULT NULL , PRIMARY KEY ( ` id ` ) ) ENGINE = InnoDB DEFAULT CHARSET = latin1 ;

With some possible entries:

INSERT INTO attribute(id, name) VALUES (1, "color"), (2, "material"), (3, "style"), (4, "bulb_type"), (5, "usage"), (6, "cpu_type"), (7, "cpu_speed"), (8, "weight"), (9, "battery_life"), (10, "fuel_type"); 1 2 3 4 5 6 7 8 9 10 11 INSERT INTO attribute ( id , name ) VALUES ( 1 , "color" ) , ( 2 , "material" ) , ( 3 , "style" ) , ( 4 , "bulb_type" ) , ( 5 , "usage" ) , ( 6 , "cpu_type" ) , ( 7 , "cpu_speed" ) , ( 8 , "weight" ) , ( 9 , "battery_life" ) , ( 10 , "fuel_type" ) ;

And finally the value table which combines the product key, the attribute key with the actual value.

CREATE TABLE `value` ( `prod_id` int(11) NOT NULL, `attribute_id` int(11) NOT NULL, `value` text, PRIMARY KEY (`prod_id`,`attribute_id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; 1 2 3 4 5 6 CREATE TABLE ` value ` ( ` prod_id ` int ( 11 ) NOT NULL , ` attribute_id ` int ( 11 ) NOT NULL , ` value ` text , PRIMARY KEY ( ` prod_id ` , ` attribute_id ` ) ) ENGINE = InnoDB DEFAULT CHARSET = latin1 ;

Now let’s insert a few products and their attributes:

INSERT INTO product(id, name, manufacturer, price) VALUES (1, "LED Desk Lamp", "X", 26); INSERT INTO value VALUES (1, 1, "black"), (1, 2, "plastic"), (1, 3, "classic"), (1, 4, "LED"), (1, 5, "Indoor use only"); INSERT INTO product(id, name, manufacturer, price) VALUES (2, "Laptop", "Y", 800); INSERT INTO value VALUES (2, 1, "blue"), (2, 6, "quad core"), (2, 7, "3400 mhz"), (2, 8, "2,1 kg"), (2, 9, "9h"); INSERT INTO product(id, name, manufacturer, price) VALUES (3, "Grill", "Z", 300); INSERT INTO value VALUES (3, 1, "black"), (3, 8, "5 kg"), (3, 10, "gas"); 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 INSERT INTO product ( id , name , manufacturer , price ) VALUES ( 1 , "LED Desk Lamp" , "X" , 26 ) ; INSERT INTO value VALUES ( 1 , 1 , "black" ) , ( 1 , 2 , "plastic" ) , ( 1 , 3 , "classic" ) , ( 1 , 4 , "LED" ) , ( 1 , 5 , "Indoor use only" ) ; INSERT INTO product ( id , name , manufacturer , price ) VALUES ( 2 , "Laptop" , "Y" , 800 ) ; INSERT INTO value VALUES ( 2 , 1 , "blue" ) , ( 2 , 6 , "quad core" ) , ( 2 , 7 , "3400 mhz" ) , ( 2 , 8 , "2,1 kg" ) , ( 2 , 9 , "9h" ) ; INSERT INTO product ( id , name , manufacturer , price ) VALUES ( 3 , "Grill" , "Z" , 300 ) ; INSERT INTO value VALUES ( 3 , 1 , "black" ) , ( 3 , 8 , "5 kg" ) , ( 3 , 10 , "gas" ) ;

If you need to select complete products that combines all the attribute keys and values as JSON object, though combining the structured data (in product) and the semi-structured data (in attribute and in value): you can use the JSON_OBJECTAGG aggregation function:

SELECT JSON_OBJECT("key", p.id, "title", p.name, "manufacturer", p.manufacturer, "price", p.price, "specifications", JSON_OBJECTAGG(a.name, v.value)) as product FROM product as p JOIN value as v ON p.id=v.prod_id JOIN attribute as a ON a.id=v.attribute_id GROUP BY v.prod_id; +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | product | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | { "key": 1, "price": 26, "title": "LED Desk Lamp", "manufacturer": "X", "specifications": { "color": "black", "style": "classic", "usage": "Indoor use only", "material": "plastic", "bulb_type": "LED" } } | | { "key": 2, "price": 800, "title": "Laptop", "manufacturer": "Y", "specifications": { "color": "blue", "weight": "2,1 kg", "cpu_type": "quad core", "cpu_speed": "3400 mhz", "battery_life": "9h" } }| |{ "key": 3, "price": 300, "title": "Grill", "manufacturer": "Z", "specifications": { "color": "black", "weight": "5 kg", "fuel_type": "gas" } }| +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set (0,01 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 SELECT JSON_OBJECT( "key" , p.id, "title" , p.name, "manufacturer" , p.manufacturer, "price" , p.price, "specifications" , JSON_OBJECTAGG(a.name, v. value )) as product FROM product as p JOIN value as v ON p.id = v.prod_id JOIN attribute as a ON a.id = v.attribute_id GROUP BY v.prod_id; +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | product | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | { "key" : 1, "price" : 26, "title" : "LED Desk Lamp" , "manufacturer" : "X" , "specifications" : { "color" : "black" , "style" : "classic" , "usage" : "Indoor use only" , "material" : "plastic" , "bulb_type" : "LED" } } | | { "key" : 2, "price" : 800, "title" : "Laptop" , "manufacturer" : "Y" , "specifications" : { "color" : "blue" , "weight" : "2,1 kg" , "cpu_type" : "quad core" , "cpu_speed" : "3400 mhz" , "battery_life" : "9h" } }| |{ "key" : 3, "price" : 300, "title" : "Grill" , "manufacturer" : "Z" , "specifications" : { "color" : "black" , "weight" : "5 kg" , "fuel_type" : "gas" } }| +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set (0,01 sec)

To select an array containing all the possible attribute keys for each product JSON_ARRAYAGG can be used:

SELECT p.id, JSON_ARRAYAGG(a.name) as product_attributes FROM product as p JOIN value as v ON p.id=v.prod_id JOIN attribute as a ON a.id=v.attribute_id GROUP BY v.prod_id; +----+--------------------------------------------------------------+ | id | product_attributes | +----+--------------------------------------------------------------+ | 1 | ["color", "style", "usage", "material", "bulb_type"] | | 2 | ["cpu_type", "weight", "color", "cpu_speed", "battery_life"] | | 3 | ["color", "fuel_type", "weight"] | +----+--------------------------------------------------------------+ 3 rows in set (0,01 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 SELECT p.id, JSON_ARRAYAGG(a.name) as product_attributes FROM product as p JOIN value as v ON p.id = v.prod_id JOIN attribute as a ON a.id = v.attribute_id GROUP BY v.prod_id; +----+--------------------------------------------------------------+ | id | product_attributes | +----+--------------------------------------------------------------+ | 1 | [ "color" , "style" , "usage" , "material" , "bulb_type" ] | | 2 | [ "cpu_type" , "weight" , "color" , "cpu_speed" , "battery_life" ] | | 3 | [ "color" , "fuel_type" , "weight" ] | +----+--------------------------------------------------------------+ 3 rows in set (0,01 sec)

The functions can also be used to help you migrate to a new schema having the semi-structured data stored in JSON columns:

CREATE TABLE complete_product AS (SELECT p.id, p.name, p.manufacturer, p.price, JSON_OBJECTAGG(a.name, v.value) as semi_structured_data FROM product as p JOIN value as v ON p.id=v.prod_id JOIN attribute as a ON a.id=v.attribute_id GROUP BY v.prod_id); SELECT * FROM complete_product; +----+---------------+--------------+-------+---------------------------------------------------------------------------------------------------------------+ | id | name | manufacturer | price | semi_structured_data | +----+---------------+--------------+-------+---------------------------------------------------------------------------------------------------------------+ | 1 | LED Desk Lamp | X | 26 | {"color": "black", "style": "classic", "usage": "Indoor use only", "material": "plastic", "bulb_type": "LED"} | | 2 | Laptop | Y | 800 | {"color": "blue", "weight": "2,1 kg", "cpu_type": "quad core", "cpu_speed": "3400 mhz", "battery_life": "9h"} | | 3 | Grill | Z | 300 | {"color": "black", "weight": "5 kg", "fuel_type": "gas"} | +----+---------------+--------------+-------+---------------------------------------------------------------------------------------------------------------+ 3 rows in set (0,00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 CREATE TABLE complete_product AS ( SELECT p . id , p . name , p . manufacturer , p . price , JSON_OBJECTAGG ( a . name , v . value ) as semi_structured_data FROM product as p JOIN value as v ON p . id = v . prod_id JOIN attribute as a ON a . id = v . attribute_id GROUP BY v . prod_id ) ; SELECT * FROM complete_product ; + -- -- + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- + -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + | id | name | manufacturer | price | semi_structured_data | + -- -- + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- + -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + | 1 | LED Desk Lamp | X | 26 | { "color" : "black" , "style" : "classic" , "usage" : "Indoor use only" , "material" : "plastic" , "bulb_type" : "LED" } | | 2 | Laptop | Y | 800 | { "color" : "blue" , "weight" : "2,1 kg" , "cpu_type" : "quad core" , "cpu_speed" : "3400 mhz" , "battery_life" : "9h" } | | 3 | Grill | Z | 300 | { "color" : "black" , "weight" : "5 kg" , "fuel_type" : "gas" } | + -- -- + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- + -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + 3 rows in set ( 0 , 00 sec )

Can this be achieved in 5.7 too? Yes, but it’s more complicated.

SELECT p.id, CAST(CONCAT('[', GROUP_CONCAT(JSON_QUOTE(a.name)) , ']') AS JSON) AS product_attributes FROM product as p JOIN value AS v ON p.id=v.prod_id JOIN attribute as a ON a.id=v.attribute_id GROUP BY v.prod_id; SELECT JSON_OBJECT("key", p.id, "title", p.name, "manufacturer", p.manufacturer, "price", p.price, "specifications", CAST(CONCAT('{', GROUP_CONCAT(CONCAT(JSON_QUOTE(a.name), ':', JSON_QUOTE(v.value))), '}') AS JSON) ) AS product FROM product AS p JOIN value AS v ON p.id=v.prod_id JOIN attribute AS a ON a.id=v.attribute_id GROUP BY v.prod_id; 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 SELECT p . id , CAST ( CONCAT ( '[' , GROUP_CONCAT ( JSON_QUOTE ( a . name ) ) , ']' ) AS JSON ) AS product_attributes FROM product as p JOIN value AS v ON p . id = v . prod_id JOIN attribute as a ON a . id = v . attribute_id GROUP BY v . prod_id ; SELECT JSON_OBJECT ( "key" , p . id , "title" , p . name , "manufacturer" , p . manufacturer , "price" , p . price , "specifications" , CAST ( CONCAT ( '{' , GROUP_CONCAT ( CONCAT ( JSON_QUOTE ( a . name ) , ':' , JSON_QUOTE ( v . value ) ) ) , '}' ) AS JSON ) ) AS product FROM product AS p JOIN value AS v ON p . id = v . prod_id JOIN attribute AS a ON a . id = v . attribute_id GROUP BY v . prod_id ;

The addition of aggregate functions makes it easier to aggregate data into JSON. Please try out these new features, and let us know your feedback!

Thank you for using MySQL.