Massive is the data layer I wish I had written. It’s small enough to just drop into any project and sensible enough that you won’t have a massive learning overhead. This is exactly what I needed when I got handed a legacy project recently and needed to make some page loads faster.

SQL Profiler reported an insane number of database calls for what looked like a relatively simple (but frequently visited) page in this application. Delving into the code left me feeling giddy as I navigated through layers of object oriented obscurity down to a custom data layer, finally discovering the root of the problem.

Recreating a cut down version of this application will afford us the pleasure of bypassing a whole heap of code with Massive. Rob Conery has my thanks and an open offer for some beers on my dime if he’s ever in London.

The Scenario

For argument’s sake, we will say we are writing a Sales system for a large organisation with a central office and several regional locations. Central needs an application to collect data from the regions. The spec calls for us to hold one row per item sold per region per quarter. A quick and dirty data diagram might look like this:

Sales are, as mentioned, line items; one item sold in one region in one quarter = one row in the Sales table. Let’s not argue too much about the data structure. It’s good enough for now.

The Code

Stored Procedure

In keeping with the wisdom of the time when the app was written, stored procedures are used – here’s our “get” proc for sales:

CREATE PROCEDURE [dbo].[GetSales] @Id uniqueidentifier = null AS BEGIN SET NOCOUNT ON; IF @Id is null BEGIN SELECT * FROM Sales END ELSE BEGIN SELECT * FROM Sales WHERE Id = @Id END END

We’ve only one parameter in this procedure. In reality, the app I’m working on has ten or twelve with logic in the stored procedure to match. The “if” statements, as you can imagine, multiply enough to confuse the most battle hardened SQL Guru. Ok, maybe not Joe Celko, but certainly the rest of us.

Data Layer

We need an interface – interfaces are good.

namespace DataLayerCatastrophe.DataLayer { public interface IDataAccess { IDataReader GetById(Guid id); IDataReader GetByParameters(DbParameter[] parameters); } }

Every class in our data layer will implement our interface. It’s pretty scanty up there, more messy in real life.

Now, of course, our implementation in the form of a Sales class:

namespace DataLayerCatastrophe.DataLayer { class Sales : IDataAccess { public IDataReader GetById(Guid id) { SqlParameter[] p = new SqlParameter[] { new SqlParameter() { ParameterName = "@Id", Value = id } }; return GetByParameters(p); } public IDataReader GetByParameters(System.Data.Common.DbParameter[] parameters) { return Common.Common.GetDataReader(parameters, "GetSales"); } } }

The actual Sales class fleshes things out; GetById will return a single row from the database based on the Id passed in (Guids all round). GetByParameters calls out a common function that creates a connection, command, executes a stored procedure and returns the result. A simple (and probably badly coded) implementation could look like this:

namespace DataLayerCatastrophe.Common { class Common { public static IDataReader GetDataReader(DbParameter[] p, String procName) { SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["production"].ConnectionString); SqlCommand cmd = new SqlCommand(procName,conn); cmd.CommandType = CommandType.StoredProcedure; foreach (SqlParameter parm in p) { cmd.Parameters.AddWithValue(parm.ParameterName, parm.Value); } conn.Open(); SqlDataReader reader = cmd.ExecuteReader(); return reader; } } }

Business Layer

So far so good. Now we need a business layer. No interface this time, I’m not sure why.

namespace DataLayerCatastrophe.BusinessLayer { public abstract class BusinessObject { protected IDataReader GetDataReader(Guid guid, IDataAccess dataAccess) { return dataAccess.GetById(guid); } } }

namespace DataLayerCatastrophe.BusinessLayer { class Sales : BusinessObject { public Guid Id { get; set; } public Guid RegionId { get; set; } public Guid ItemId { get; set; } public Guid QuarterId { get; set; } public decimal PriceSoldAt { get; set; } protected void Populate(Guid id) { IDataReader dataReader = GetDataReader(id, new DataLayer.Sales()); while (dataReader.Read()) { Id = dataReader.GetGuid(dataReader.GetOrdinal("Id")); RegionId = dataReader.GetGuid(dataReader.GetOrdinal("RegionId")); ItemId = dataReader.GetGuid(dataReader.GetOrdinal("ItemId")); QuarterId = dataReader.GetGuid(dataReader.GetOrdinal("QuarterId")); PriceSoldAt = dataReader.GetDecimal(dataReader.GetOrdinal("PriceSoldAt")); } dataReader.Close(); } public Sales(Guid id) { Populate(id); } } }

And hey presto – we now have class that can inflate itself to represent a single row of Sales data. The constructor takes a guid, this then calls the Populate method which in turn calls the GetDataReader method on the base class, passing in the guid and an instance of the Datalayer.Sales class. GetDataReader makes use of our IDataAccess interface reducing our coupling.

All is peaceful in the land of sales until we want to return a list of all sales data. At which point, for some unknown reason, those who designed the system figured this would be the most logical implementation:

namespace DataLayerCatastrophe.DataLayer { class ListCatastrophe { private static List GetIdList(SqlParameter[] p, IDataAccess dataAccess) { List guidList = new List (); IDataReader reader = dataAccess.GetByParameters(p); while (reader.Read()) { guidList.Add(reader.GetGuid(0)); } return guidList; } public static List GetSalesList() { List sales = new List (); SqlParameter[] p = new SqlParameter[0]; List ids = GetIdList(p, new DataLayer.Sales()); foreach (Guid id in ids) { sales.Add(new BusinessLayer.Sales(id)); } return sales; } } }

What we have here is a class that will, eventually, implement one (or more) methods for each business object, returning lists of objects as and when we need them. The lovely method GetSalesList first uses the GetIdList to query the database and return a list of each and every Id in the Sales table by calling the GetByParameters in our Sales class in the Data Layer. The Sales class, if you remember, calls out to our Common GetDataReader function which actually calls the stored proc, passing in the parameters. In this case, there are none and so, the lovely if statement in the stored procedure falls into a ‘Select *’.

Back in GetSalesList, we loop through the list of guids and construct a list of Business Layer Sales objects, instantiating each one by passing in its Id field. Our business object, of course, inflates itself through the hierarchy of objects, calling eventually, the stored procedure with a parameter.

In case this wasn’t clear, to return a list of Sales objects, we first hit the database for a list of guids, then hit the database once for each row of sales data (selecting a single row based on it’s guid), inflating the business objects one by one. As you can appreciate, this pattern makes for a gradual slow down as the dataset increases. The application literally collapses under it’s own weight.

The kicker

As if this wasn’t enough, the consuming code that rendered the page was doing this:

int counter; counter = ListCatastrophe.GetSalesList().Count;

Having gone to all the trouble of bringing back sales objects one by one, all we use the list for is a count.

Endemic

This pattern is endemic in this application. It’s the reason I decided to implement a new datalayer rather than try and patch what is currently there. Adding to the weight of the layers of code seemed to me, counter productive when what I really want is to redesign the data and business logic layer. I considered bolting on some methods in ListCatastrophe to return various aggregates. I also considered changing some of the base classes, either in the data layer or business layer, but it felt like I was just bloating an already over complicated situation.

Clean up

After I decided to side-step the current application anatomy, I thought about writing my own minimal data layer. Then, I stumbled on Massive and thought I would give it crack first.

Massive relies on the .NET DynamicObject and ExpandoObject. Dropping the single file in our application allows us to (among other things) execute a quick piece of SQL like this:

var db = Massive.DynamicModel.Open("production"); int counter = Convert.ToInt32(db.Scalar("Select count(*) from Sales"));

One database hit no matter how many rows. Much better. It feels good to regain control. Although I’m perhaps now more tightly coupled, I think stripping all the way back to this is a good thing. Backfilling the application a little at a time will allow me to refactor code incrementally rather than try and second guess a massive (hehehe) upfront design that will be just as bad as the one I’m replacing. I will most probably end up with a new structure that will have model classes that inherit from Massive’s DynamicModel (see Rob’s doc’s), into which I’ll move the various business layer and stored procedure logic (since in fact the logic encoded into the stored procs is pretty much the business logic of the app).

Most importantly, my users now have a super fast page load and hopefully, a restored faith in the power of code to help them do their jobs more efficiently.