There’s one thing that you can do terribly wrong when working with RDBMS. And that thing is not running your calculations in the database, when you should.

We’re not advocating to blindly move all business logic into the database, but when I see a Stack Overflow question like this, I feel the urge to gently remind you of the second item in our popular 10 Common Mistakes Java Developers Make When Writing SQL.

The Stack Overflow question essentially boils down to this (liberally quoted):

From the following medium-sized table, I wish to count the number of documents with status 0 or 1 per application ID: AppID | DocID | DocStatus ------+-------+---------- 1 | 100 | 0 1 | 101 | 1 2 | 200 | 0 2 | 300 | 1 ... | ... | ... Should I use Hibernate for that?

And the answer: NO! Don’t use Hibernate for that (unless you mean native querying). You should use SQL for that. Es-Queue-El! You have so many trivial options to make your SQL Server help you run this query in a fraction of the time it would take if you loaded all that data into Java memory before aggregating!

For instance (using SQL Server):

Using GROUP BY

This is the most trivial one, but it might not return result in exactly the way you wanted, i.e. different aggregation results are in different rows:

SELECT [AppID], [DocStatus], count(*) FROM [MyTable] GROUP BY [AppID], [DocStatus]

Example on SQLFiddle, returning something like

| APPID | DOCSTATUS | COLUMN_2 | |-------|-----------|----------| | 1 | 0 | 2 | | 2 | 0 | 3 | | 1 | 1 | 3 | | 2 | 1 | 2 |

Using nested selects

This is probably the solution that this particular user was looking for. They probably want each aggregation in a separate column, and one very generic way to achieve this is by using nested selects. Note that this solution might prove to be a bit slow in some databases that have a hard time optimising these things

SELECT [AppID], (SELECT count(*) FROM [MyTable] [t2] WHERE [t1].[AppID] = [t2].[AppID] AND [DocStatus] = 0) [Status_0], (SELECT count(*) FROM [MyTable] [t2] WHERE [t1].[AppID] = [t2].[AppID] AND [DocStatus] = 1) [Status_1] FROM [MyTable] [t1] GROUP BY [AppID]

Example on SQLFiddle, returning something like

| APPID | STATUS_0 | STATUS_1 | |-------|----------|----------| | 1 | 2 | 3 | | 2 | 3 | 2 |

Using SUM()

This solution is probably the optimal one. It is equivalent to the previous one with nested selects, although it only works for simple queries, whereas the nested selects version is more versatile.

SELECT [AppID], SUM(IIF([DocStatus] = 0, 1, 0)) [Status_0], SUM(IIF([DocStatus] = 1, 1, 0)) [Status_1] FROM [MyTable] [t1] GROUP BY [AppID]

Example on SQLFiddle, same result as before

Using PIVOT

This solution is for the SQL Aficionados among yourselves. It uses the T-SQL PIVOT clause!

SELECT [AppID], [0], [1] FROM ( SELECT [AppID], [DocStatus] FROM [MyTable] ) [t] PIVOT ( count([DocStatus]) FOR [DocStatus] IN ([0], [1]) ) [pvt]

SQL aficionados use PIVOT

Example on SQLFiddle, same result as before

Conclusion

You may freely choose your weapon among the above suggestions, and I’m sure there are more alternatives. All of them will outperform any Java-based aggregation implementation by orders of magnitude, even for trivially small data sets for sure. We’ll say this time and again, and we’ll quote Gavin King time and again for the same thing:

Just because you’re using Hibernate, doesn’t mean you have to use it for everything. A point I’ve been making for about ten years now.

And in our words: