Relational Databases: PostgreSQL, MySQL, SQLite

databases: architecture | client | select | where | dates | join | aggregate | sort and limit | insert, update, and delete | schema | sequences | indices | import and export | script | function | query tuning | user | python | ruby | help | admin

sql: types | casts | literals | dates | identifiers | operators | functions | distinct | qualified * | regular expressions | sequences | group by | aggregation functions | window functions | where clause subqueries | join | from clause subquery | as | limit and offset | with | null | sets and multisets | session objects | scripts | reflection | sargable expressions | transactions | idempotent sql

The version used to test the examples in this sheet.

How to determine the version of a database engine.

mysql:

MySQL supports different storage engines. Each storage engine has its own size limits, and features such as indexes, transactions, locking and foreign key support aren't available for all storage engines.

Here is how to determine the storage engine used for a table:

select engine from information_schema.tables where table_schema = 'test' and table_name = 'foo';

The location of the database engine.

How data is organized in a database.

How data is stored in files on the file system.

postgresql:

Tables are split into multiple files when they exceed 2G; large attributes are stored in separate TOAST files.

What durability guarantee is made and how this is accomplished.

Are indices available and what can be indexed.

Are transactions available and what can participate in a transaction.

Available security features.

Whether a server side programming language is available.

How to invoke the command line client.

postgresql:

If the database user is not specified, it will default to the operating system user. If the database is not specified, it will default to the operating system user. If the host is not specified, psql will attempt to connect to a server on the local host using a Unix domain socket.

How to get a list of commands available at the command line client prompt.

The default port used by the client to connect to the server.

The default ports used by PostgreSQL and MySQL are defined in /etc/services .

List the available databases on a server.

How to switch between databases when using the command line SQL prompt.

The name of the startup file used by the client.

List the tables in the current database.

Show the columns for a table and their types.

How to run a SQL script at the command line.

List the stored functions in the current database.

How to create a database.

postgresql:

The user must have the CREATEDB privilege. When creating the database from the command line using createdb , the PostgreSQL user can be specified using the -U option.

How to drop a database.

Writing SELECT queries for open-source databases.

The reader is assumed to have written SELECT queries with FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses.

When we say that something is standard, we mean it conforms to the most recent SQL standard.

When we say that something is portable, we mean works on PostgreSQL, MySQL, and SQLite.

SQL

A list of portable types:

BOOLEAN

INTEGER or INT

INT REAL

DOUBLE PRECISION

NUMERIC( total digits , fractional digits )

, ) NUMERIC( total digits )

) CHARACTER( len ) or CHAR( len )

) CHAR( ) CHARACTER VARYING( len ) or VARCHAR( len )

) VARCHAR( ) TIMESTAMP

DATE

TIME

Note that NUMERIC(len) defines an integer type.

mysql:

MySQL maps BOOLEAN to TINYINT(1); REAL and DOUBLE PRECISION to DOUBLE; NUMERIC to DECIMAL.

This is the standard and portable way to cast:

SELECT cast('7' AS INTEGER) + 3;

The standard calls for implicit casts between numeric types.

The standard also calls for implicit casts between character types. In particular, character types can be concatenated, and the length of the concatenation type is the sum of the length of the argument types.

postgresql:

Other ways to cast:

> SELECT '7'::INTEGER + 3; > SELECT INTEGER '7' + 3;

The type of string operations is TEXT, which is a character type of unlimited length.

It is an error to attempt to insert a string that is too long into a column with fixed or maximum length.

mysql:

When concatenating character types, the length of the type of the concatenation is the sum of the length of the type of the arguments.

MySQL silently truncates strings that are too long on insert.

sqlite:

SQLite does not enforce character type length limits.

The standard reserves these keywords: NULL, TRUE, and FALSE.

Numeric literals work like you would expect.

SQL strings are admirably simple. Single quote delimited, double the single quote to escape, double pipe for concatenation.

postgresql:

This code results in a type mismatch error:

SELECT TRUE = 1;

Converting a string containing a hex digit to an integer:

> select x'3bb'::int; int4 ------ 955

The chr() function takes an integer representing a Unicode point as an argument:

> SELECT 'one' || chr(10) || 'two' || chr(10) || 'three'; ?column? ---------- one + two + three > SELECT chr(x'3bb'::int); chr ----- λ

There is syntax for strings with C-style backslash escapes:

select E'one

two

three';

mysql:

TRUE and FALSE are synonyms for 1 and 0.

The || operator is used for logical disjunction. Use the concat() function for string concatenation.

SELECT concat('one', char(10), 'two');

+--------------------------------+ | concat('one', char(10), 'two') | +--------------------------------+ | one two | +--------------------------------+

sqlite:

SQLite does not have TRUE and FALSE literals. Use 1 and 0 instead.

Strings can be single quote or double quote delimited.

The standard provides the keywords CURRENT_TIME, CURRENT_DATE, and CURRENT_TIMESTAMP for observing the clock.

There are no date literals; strings are used instead. Inserting a string into a TIME, DATE, or TIMESTAMP column will work if ISO 8601 format is used.

> CREATE TABLE foo (a DATE, b TIME, c TIMESTAMP); > INSERT INTO foo VALUES ('2012-10-19', '18:00:00', '2012-10-19 18:00:00');

This works in both MySQL and PostgreSQL:

> SELECT date('2012-10-19'), time('18:00:00'), timestamp('2012-10-19 18:00:00');

EXTRACT, TIMESTAMP from DATE and TIME, STRFTIME, STRPTIME

INTERVAL type and date arithmetic

TIMESTAMP - TIMESTAMP

TIMESTAMP + INTERVAL and INTERVAL + TIMESTAMP

INTERVAL + INTERVAL and INTERVAL - INTERVAL

INTERVAL * NUMERIC and NUMERIC * INTERVAL

INTERVAL / NUMERIC

mysql:

MySQL does not have an INTERVAL type. Subtracting two TIMESTAMPs yields a NUMERIC(20, 0) and subtracting two DATEs yields a NUMERIC(11, 0).

According to the standard, identifiers with unusual characters should be double quoted. A literal double quote is represented by two double quotes.

mysql:

MySQL uses backticks `` instead of double quotes "" to quote identifiers.

AND OR NOT

< > <= >= = != (<>)

BETWEEN IN

+ - * / % ^

& | # ~ << >>

|| LIKE (ESCAPE)

-- select rows where foo.x ends with percent sign -- SELECT * FROM foo WHERE x LIKE '%\%' ESCAPE '\';

check mysql and sqlite

how to get a list of functions

MySQL 5.5 Functions and Operators

MySQL 5.5 Function Index

SQLite Core Functions

SQLite Aggregate Functions

SQLite Date and Time Functions

Some of the places DISTINCT can be used:

SELECT DISTINCT expr ...

SELECT count(DISTINCT expr )

) UNION DISTINCT

INTERSECT DISTINCT

EXCEPT DISTINCT

UNION ALL, INTERSECT ALL, and EXCEPT ALL can be used to indicate multiset operations. UNION DISTINCT, INTERSECT DISTINCT, and EXCEPT DISTINCT indicate set operations. Since this is the default the use of DISTINCT is superfluous.

-- Example of a qualified *: only -- return rows from foo: -- SELECT foo.* FROM foo JOIN bar ON foo.x = bar.x

SIMILAR TO Postgres

Here is the SQL standard syntax for external sequences:

CREATE SEQUENCE foo; CREATE SEQUENCE bar START WITH 1000 INCREMENT BY 10 MAXVALUE 2000; SELECT NEXT VALUE FOR foo; ALTER SEQUENCE foo RESTART WITH 10;

Here is the SQL standard syntax for internal sequences. None of the open source databases support this syntax, however.

CREATE TABLE foo ( foo_id INTEGER GENERATED ALWAYS AS IDENTITY ( START WITH 1 INCREMENT BY 1 MAXVALUE 10000) )

postgresql:

PostgreSQL lacks SQL standard syntax for reading external sequences:

CREATE SEQUENCE foo; CREATE SEQUENCE bar START WITH 1000 INCREMENT BY 10 MAXVALUE 2000; SELECT nextval('foo'); ALTER SEQUENCE foo RESTART WITH 10; SELECT setval('foo', 10);

The keywords WITH and BY are optional.

How to create an internal sequence:

CREATE TABLE foo foo_id SERIAL, foo_desc TEXT );

mysql:

MySQL does not have external sequences.

sqlite:

SQLite does not have external sequences.

CREATE TABLE foo ( foo_id INTEGER PRIMARY KEY AUTOINCREMENT, foo_desc TEXT );

GROUP BY, HAVING, and ORDER BY clauses can refer to SELECT list items by ordinal number. I don't think this is in the standard, but the feature is in PostgreSQL, MySQL, and SQLite.

postgresql mysql sqlite count, sum, min, max, avg count, sum, min, max, avg count, sum, min, max, avg count(distinct *)

count(distinct expr )

sum(distinct expr ) count(distinct *)

count(distinct expr )

sum(distinct expr ) count(distinct *)

count(distinct expr )

sum(distinct expr ) bit_and

bit_or bit_and

bit_or bool_and

bool_or string_agg( expr , delim ) group_concat( expr )

group_concat( expr separator delim ) array_agg stddev_samp

stddev_pop

var_samp

var_pop stddev_samp

stddev_pop

var_samp

var_pop cor( X , Y )

cov_samp( X , Y )

cor_pop( X , Y )

regr_intercept( X , Y )

regr_slope( X , Y )

limits on use

can they be used with group by

window functions in WHERE

different window functions in SELECT

usefulness

pct of total

pct of category

cumsum

rank

> SELECT state, fruit, avg(x) FROM produce GROUP BY state; ERROR: column "produce.fruit" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT state, fruit, avg(x) FROM produce GROUP BY state; ^ > SELECT state, fruit, avg(x) OVER (PARTITION BY state) FROM produce; state | fruit | avg ------------+------------+------------------------ arizona | banana | 1.00000000000000000000 california | orange | 8.6000000000000000 california | banana | 8.6000000000000000 california | apple | 8.6000000000000000 california | banana | 8.6000000000000000 california | orange | 8.6000000000000000 nevada | banana | 6.3333333333333333 nevada | apple | 6.3333333333333333 nevada | orange | 6.3333333333333333 oregon | grapefruit | 2.5000000000000000 oregon | grapefruit | 2.5000000000000000 washington | grapefruit | 2.5000000000000000 washington | apple | 2.5000000000000000

Subqueries can be used in a WHERE clause with EXISTS, IN, and the comparison operators: = < > <= >= != (<>).

The following two queries are equivalent.

SELECT * FROM a, b WHERE a.x = b.x AND a.y > 0;

SELECT * FROM a JOIN b ON a.x = b.x WHERE a.y > 0;

The latter form is perhaps preferred. The latter separates the join condition from the expression, keeping the expression simpler. Each JOIN clause must have an ON clause, reducing the chance of writing a Cartesian join by accident.

To perform an outer join—LEFT, RIGHT, or FULL—one must use a JOIN clause.

JOINs can be used to replace (NOT) EXISTS with a subquery:

SELECT * FROM customers c WHERE NOT EXISTS ( SELECT customer_id FROM orders o WHERE c.id = o.customer_id );

SELECT c.* FROM customers c LEFT JOIN orders o ON c.id = o.customer_id WHERE o.customer_id is NULL;

Subqueries inside parens can appear in FROM and JOIN clauses. They must be given an alias.

Select list items and tables in FROM and JOIN clauses can be given an alias using AS. If the aliased item is a table or column its previous name is hidden. Use of the AS keyword is optional and can be omitted.

The standard is:

OFFSET start { ROW | ROWS } FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY

Of these databases, only PostgreSQL provides a mechanism for selecting the row number, and it requires using a window function: row_number() OVER () .

CREATE TABLE directed_edge ( start_node TEXT, end_node TEXT ); INSERT INTO directed_edge VALUES ( 'a', 'b'); INSERT INTO directed_edge VALUES ( 'b', 'c'); INSERT INTO directed_edge VALUES ( 'c', 'd'); INSERT INTO directed_edge VALUES ( 'x', 'y'); WITH RECURSIVE directed_path(start_node, end_node) AS ( SELECT start_node, end_node FROM directed_edge UNION SELECT dp.start_node, de.end_node FROM directed_path AS dp JOIN directed_edge de ON dp.end_node = de.start_node ) SELECT * FROM directed_path;

start_node | end_node ------------+---------- a | b b | c c | d x | y a | c b | d a | d

NULL Handling in SQLite Versus Other Database Engines

NULL propagates in arithmetic: NULL + 0 is NULL, NULL * 0 is NULL.

NULLs distinct in UNIQUE

NULLs not distinct according to DISTINCT

NULL is FALSE in CASE: "CASE WHEN null THEN 1 ELSE 0 END"

THREE VALUE LOGIC: NULL OR TRUE is TRUE, NULL AND FALSE is FALSE.

-- return 'bar' if foo is NULL coalesce(foo, 'bar') -- return NULL if foo is 'bar' nullif(foo, 'bar')

SELECT 'foo', 3 UNION SELECT 'bar', 7; SELECT 'foo', 3 INTERSECT SELECT 'bar', 7; SELECT 'foo', 3 EXCEPT SELECT 'bar', 7;

ALL and DISTINCT can be used after UNION, INTERSECT, and EXCEPT to indicate multiset or set operations. Set operations (i.e. DISTINCT) are the default.

Temporary tables and variables.

The standard calls for a schema called INFORMATION_SCHEMA . The starting point for learning about a database is:

SELECT * FROM INFORMATION_SCHEMA.TABLES;

The standard also provides these:

> SELECT CURRENT_USER; > SELECT CURRENT_ROLE; > SELECT CURRENT_SCHEMA; > SELECT CURRENT_CATALOG;

sqlite:

SQLite does not have INFORMATION_SCHEMA . Use the .schema command to get a list of tables and their DDL.

Sargable (en.wikipedia.org)

Idempotent DDL scripts are desirable. CREATE TABLE statements fail if the table already exists. Both PostgreSQL and MySQL support DROP TABLE foo IF EXISTS; which is not part of the standard.

MERGE (MySQL REPLACE)

TEMP tables and WITH.

Query information_schema. This requires a language which can branch.