As I mentioned in an earlier post, PostgreSQL finally supports some very basic materialized views as of version 9.3, and this has me more excited than it should! I finally got a chance to give it a try.





Overview of Materialized Views

But first, what is a materialized view? Very briefly, it’s a view whose query results are stored. If a view is a stored query that spits results at you on demand, a materialized view is a view that stores the query results and spits those same results out at you on demand. The ordinary view works hard to get your results every time you query it, while the materialized view has already done the work for you in the past. View=slower, zero disk space; materialized view=faster; potentially lots of disk space.





If the query results change over time, then at some point the materialized view results will be “stale”. You must tell Postgres to refresh your materialized view from time to time. In more advanced databases, materialized views may refresh automatically, either when a dependent table is updated, on a regular schedule, or on demand. Postgres only supplies “on demand”.





A Simple Example: Baseball Stats

Let’s get right into an example. I like baseball, so the goal of this example will be to come up with team batting averages based on the players’ individual plate appearances. My data is very simple, and a plate appearance can either be a hit or an out.





So, we have these tables:

/* each row represents a player on a team */ CREATE TABLE player (id SERIAL PRIMARY KEY, name VARCHAR, num SMALLINT, team VARCHAR(3)); /* each row represents a plate appearance where a player attempts to hit the ball. In our simplistic model, there are two possible outcomes: he either gets a hit or he is out */ CREATE TABLE pa (pa_id SERIAL PRIMARY KEY, player_id INT REFERENCES player(id), gamedate DATE, inning SMALLINT, result VARCHAR(1));





I have populated them with dummy data. The teams I used are the MLB American League East teams: New York Yankees (NYY), Boston Red Sox (BOS), Tampa Bay Rays (TB), Toronto Blue Jays (TOR) and Baltimore Orioles (BAL). 25 players for each team, getting 4 plate appearances per day over 180 days. These are just numbers I made up. I’m not attempting total realism here.





Now I will write a view to get the batting averages per team:

CREATE OR REPLACE VIEW team_avg_v AS SELECT team, SUM(CASE WHEN result='H' THEN 1 ELSE 0 END)::float/COUNT(*)::float AS avg FROM pa JOIN player p ON pa.player_id=p.id GROUP BY team





The result:

mwrynn=# select * from team_avg_v; team | avg ------+------------------- BAL | 0.252296296296296 NYY | 0.244592592592593 TB | 0.246666666666667 BOS | 0.250444444444444 TOR | 0.248444444444444 (5 rows) Time: 248.679





249 ms is good time, but imagine if we had data for *all* players for *all* teams, even for *all* years baseball was played! This simple view would possibly take hours to run, at least on my wimpy little laptop.





Enter the materialized view. All we have to do is the following to materialize this view:





mwrynn=# CREATE MATERIALIZED VIEW team_avg_mv AS SELECT * FROM team_avg_v; SELECT 5 Time: 190.997 ms





And now let’s try querying it:

mwrynn=# select * from team_avg_mv; team | avg ------+------------------- BAL | 0.252296296296296 NYY | 0.244592592592593 TB | 0.246666666666667 BOS | 0.250444444444444 TOR | 0.248444444444444 (5 rows) Time: 0.755 ms



