[This article was first published on Christophe Ladroue » R, and kindly contributed to R-bloggers.]

If you use MySQL, there’s a default schema called ‘information_schema’ which contains lots of information about your schemas and tables, among other things. Recently I wanted to know whether a table I use for storing the results of a large number of experiments was anywhere near maxing out. To cut a brief story even shorter, the answer was “not even close” and could be found in ‘information_schema.TABLES’. Not being one to avoid any opportunity to procrastinate, I went on to write a short script to produce a global overview of the entire database.

information_schema.TABLES contains the following fields: TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH and MAX_DATA_LENGTH (and a few others). We can first have a look at the relative sizes of the schemas with the MySQL query “SELECT TABLE_SCHEMA, SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA != 'information_schema' GROUP BY TABLE_SCHEMA”.

    library("ggplot2") # You'll need ggplot2 0.9 for this.
    library("reshape2")
    library("RMySQL")
    connection
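A minimal sketch of how the schema-size query could be run from R and plotted. This is not the original script: the helper names `schema_sizes`, `add_share` and `plot_schema_sizes` are hypothetical, and the connection details in the usage comment are placeholders.

```r
# Hypothetical helper: runs the schema-size query from the text on an open
# DBI connection and returns one row per schema.
schema_sizes <- function(connection) {
  query <- paste(
    "SELECT TABLE_SCHEMA, SUM(DATA_LENGTH) SCHEMA_LENGTH",
    "FROM information_schema.TABLES",
    "WHERE TABLE_SCHEMA != 'information_schema'",
    "GROUP BY TABLE_SCHEMA"
  )
  result <- DBI::dbGetQuery(connection, query)
  # Coerce defensively in case the driver returns the SUM() as text.
  result$SCHEMA_LENGTH <- as.numeric(result$SCHEMA_LENGTH)
  result
}

# Hypothetical helper: adds each schema's share of the total data size
# and sorts the largest schema first.
add_share <- function(sizes) {
  sizes$SHARE <- sizes$SCHEMA_LENGTH / sum(sizes$SCHEMA_LENGTH)
  sizes[order(sizes$SHARE, decreasing = TRUE), ]
}

# Hypothetical helper: horizontal bar chart of schema sizes in megabytes.
plot_schema_sizes <- function(sizes) {
  ggplot2::ggplot(
    sizes,
    ggplot2::aes(x = TABLE_SCHEMA, y = SCHEMA_LENGTH / 1024^2)
  ) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::coord_flip() +
    ggplot2::labs(x = "Schema", y = "Data size (MB)")
}

# Usage, assuming a reachable server and placeholder credentials:
# connection <- DBI::dbConnect(RMySQL::MySQL(), username = "me",
#                              password = "secret", host = "localhost")
# print(plot_schema_sizes(add_share(schema_sizes(connection))))
# DBI::dbDisconnect(connection)
```

Keeping the database call separate from the arithmetic and the plotting means the last two helpers can be tried on any data frame with the same two columns.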

And for the whole overview, let’s break each schema down by tables:

    query
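One way the per-table breakdown could be sketched, assuming the same information_schema.TABLES fields as before. The query text and the helpers `add_table_share` and `plot_table_sizes` are illustrative assumptions, not the original code.

```r
# Assumed query: one row per table, with its data size in bytes.
table_sizes_query <- paste(
  "SELECT TABLE_SCHEMA, TABLE_NAME, DATA_LENGTH",
  "FROM information_schema.TABLES",
  "WHERE TABLE_SCHEMA != 'information_schema'"
)

# Hypothetical helper: share of each table within its own schema.
# DATA_LENGTH can be missing (for views, for instance), so sums skip NAs.
add_table_share <- function(tables) {
  tables$DATA_LENGTH <- as.numeric(tables$DATA_LENGTH)
  schema_total <- ave(
    tables$DATA_LENGTH, tables$TABLE_SCHEMA,
    FUN = function(x) sum(x, na.rm = TRUE)
  )
  tables$SHARE_IN_SCHEMA <- ifelse(
    schema_total > 0, tables$DATA_LENGTH / schema_total, 0
  )
  tables
}

# Hypothetical helper: one panel per schema, one bar per table.
plot_table_sizes <- function(tables) {
  ggplot2::ggplot(
    tables,
    ggplot2::aes(x = TABLE_NAME, y = DATA_LENGTH / 1024^2)
  ) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::facet_wrap(~ TABLE_SCHEMA, scales = "free") +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, hjust = 1)) +
    ggplot2::labs(x = "Table", y = "Data size (MB)")
}

# Usage, given an open connection:
# tables <- add_table_share(DBI::dbGetQuery(connection, table_sizes_query))
# print(plot_table_sizes(tables))
```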



Also, using AVG_ROW_LENGTH and MAX_DATA_LENGTH and assuming a relatively constant row length, we can derive the maximum number of rows that a table can hold (roughly MAX_DATA_LENGTH divided by AVG_ROW_LENGTH). Comparing that with TABLE_ROWS gives us an estimate of how much space there is left:

    query
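The estimate itself is simple arithmetic and can be sketched on a plain data frame. The helper `estimate_capacity` and the column names `MAX_ROWS` and `FRACTION_USED` are hypothetical; the input is assumed to hold the TABLE_ROWS, AVG_ROW_LENGTH and MAX_DATA_LENGTH fields. Tables reporting a zero or missing AVG_ROW_LENGTH or MAX_DATA_LENGTH (InnoDB tables, for example, do not use MAX_DATA_LENGTH) get NA rather than a misleading figure.

```r
# Hypothetical helper: estimates how many rows each table could hold and
# what fraction of that capacity is already in use.
estimate_capacity <- function(tables) {
  avg_row  <- as.numeric(tables$AVG_ROW_LENGTH)
  max_data <- as.numeric(tables$MAX_DATA_LENGTH)
  rows     <- as.numeric(tables$TABLE_ROWS)

  # Only tables with a positive row length and size limit can be estimated.
  usable <- !is.na(avg_row) & !is.na(max_data) & avg_row > 0 & max_data > 0

  tables$MAX_ROWS <- ifelse(usable, floor(max_data / avg_row), NA_real_)
  tables$FRACTION_USED <- rows / tables$MAX_ROWS
  tables
}

# Made-up numbers, purely to illustrate the arithmetic.
example <- data.frame(
  TABLE_NAME      = c("results", "no_limit_reported"),
  TABLE_ROWS      = c(5, 7),
  AVG_ROW_LENGTH  = c(100, 0),
  MAX_DATA_LENGTH = c(1000, 0)
)
print(estimate_capacity(example))
```

To check a colour scale without a nearly full table, the computed fraction can be overwritten with random values, e.g. `example$FRACTION_USED <- runif(nrow(example))`.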

Unless you are using very large tables, those last two graphs should come out pretty much all gray. You can check that the colouring works by using the commented-out queries instead, which use random values for the estimates.

About dbConnect(): I left it here to make things easier to replicate, but I normally call a simple function which is just a wrapper for it, with my username and password in it. This way my credentials are in one single place instead of all over my scripts.
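A minimal sketch of such a wrapper. The name `connect_to_mysql` is hypothetical, and this version swaps in environment variables (`MYSQL_USER` and `MYSQL_PASSWORD`, both assumed names) rather than writing the credentials into the function body, so the file can be shared safely.

```r
# Hypothetical wrapper around dbConnect(): credentials live in one place.
# MYSQL_USER and MYSQL_PASSWORD are assumed environment variable names.
connect_to_mysql <- function(dbname = NULL, host = "localhost") {
  DBI::dbConnect(
    RMySQL::MySQL(),
    username = Sys.getenv("MYSQL_USER"),
    password = Sys.getenv("MYSQL_PASSWORD"),
    dbname   = dbname,
    host     = host
  )
}

# Usage in any script:
# connection <- connect_to_mysql()
# ... run queries ...
# DBI::dbDisconnect(connection)
```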

PS: This is my first anniveRsary! I’ve been using R for a year now. And I’m certainly planning to carry on.