Introduction

Building web sites/apps that serve millions of visits per day is a real challenge, especially when it comes to keeping response time as low as possible. News websites are a perfect example of high-load web sites/apps. Building a few of them and keeping them in good shape required us to redesign and rewrite our data access layer from scratch.

“Rahawan” is a revamped design for the data access layer that helped us improve our website response time from 250 milliseconds down to just 10 milliseconds, while reducing database load from 800 batch requests/second down to just 150 batch requests/second.

Long Story Short

In its simplest form, “Rahawan” is nothing more than a data access layer powered by a data caching engine. Its main purpose is to instantly provide the fully loaded “Data Model” required to respond to a web request (like browsing some webpage).

Providing a “Data Model” is done, most of the time, as a direct key-value retrieval operation. Any change in database records triggers background tasks inside Rahawan to rebuild the “Data Models” related to the modified records.

Note: before talking about Rahawan itself, I will flash back to show how the solution evolved. If you wish, you can skip iterations one and two.

Iteration I: Messy, Slow Code :)

Like many developers, I have worked on projects where all lights were “green”: any team member could write any code, anywhere, in any way, as long as it resulted in a working website.

The client was in a rush to launch their website. Usually this generates a lot of pressure on the development team; at this stage, doing anything other than building a working website is a lower priority. There is no talk about code quality or better coordination between team members, which leads to fat Controllers communicating with the database directly. This is a simplified example:
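A fat Controller of this kind might look roughly like the following sketch; all type names (NewsContext, NewsItem, Author) are hypothetical stand-ins, not the project's actual code:

```csharp
using System.Linq;
using System.Web.Mvc;

// A sketch of a "fat" Controller; NewsContext, NewsItems and Authors are
// hypothetical stand-ins for the real types.
public class HomeController : Controller
{
    public ActionResult Index()
    {
        // The Controller talks to the database directly...
        using (var db = new NewsContext())
        {
            // ...and data reaches the View through two different channels:
            ViewBag.LatestNews = db.NewsItems
                .OrderByDescending(n => n.PublishDate)
                .Take(10)
                .ToList();                       // channel 1: ViewBag

            var model = db.Authors.ToList();     // channel 2: ViewModel
            return View(model);
        }
    }
}
```

Inside the matching View, one partial would be rendered with Html.RenderAction and another with Html.Partial, completing the spaghetti.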

As shown, this is the perfect start for “Spaghetti Code”. The Controller communicates with the database directly. Data is transferred from the Controller to the View in two different ways (ViewModel and ViewBag). For partial rendering inside the View, “Html.RenderAction” is used in one place and “Html.Partial” in another.

Such code can hardly survive the continuous change requests and bug fixing done by many team members. When a developer fixes a bug here, a new bug appears there. Even if the team manages to shoot down all the bugs, no one can put an end to the bad performance. All this consumes a great amount of time and effort just to reach a near-stable website.

At the end of this iteration, the whole team agreed that they needed to catch their breath and think about better solutions to these problems. Which drives us to:

Iteration II: Cleaned Up Code

After facing many problems in the previous iteration, the team got it: not all green lights are good. Some red flags have to be placed on the roads leading to problems, and some unified design guidelines have to be followed.

The following two points are real examples of red flags our team agreed on:

Do not use ViewBag or ViewData unless it is really necessary

Do not use Html.RenderAction unless it is really necessary

This is our simplified example after the second iteration:
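The cleaned-up Controller might look roughly like this sketch; IUnitOfWork and HomeViewModel are hypothetical names standing in for the project's actual types:

```csharp
using System.Web.Mvc;

// Sketch of the cleaned-up Controller (hypothetical type names): the Unit of
// Work is injected, data reaches the View only through a typed ViewModel,
// and output caching keeps the action from running more than once per minute.
public class HomeController : Controller
{
    private readonly IUnitOfWork _unitOfWork;

    // The Unit of Work is provided by Dependency Injection.
    public HomeController(IUnitOfWork unitOfWork)
    {
        _unitOfWork = unitOfWork;
    }

    [OutputCache(Duration = 60)] // serve from the ASP.NET cache for 60 seconds
    public ActionResult Index()
    {
        var model = new HomeViewModel
        {
            LatestNews = _unitOfWork.News.GetLatest(10),
            Authors = _unitOfWork.Authors.GetAll()
        };
        return View(model); // the ViewModel is the ONLY data path to the View
    }
}
```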

The code has been cleaned up by:

Use the Unit of Work pattern to encapsulate data-retrieval logic, which unlocks the ability to optimize how data is retrieved without touching Controllers.

Use Dependency Injection to provide the Unit of Work required by the Controller.

Use a “ViewModel” as the only way to transfer data from the Controller to the View; having strict model types for Views helps developers make changes with more confidence and fewer unexpected behaviors.

Eliminate unneeded calls to “RenderAction”, as it creates a new Controller instance and takes a longer, slower path than “RenderPartial”. Unless there is a real need to do otherwise, “RenderPartial” is our first choice.

Use output caching. This instructs the ASP.NET process NOT to call the “Index” action unless 60 seconds have elapsed since the last call to the same action, and to serve web requests arriving within those 60 seconds from the ASP.NET cache.

Iteration III: Rahawan

To better understand Rahawan, let’s interview it.

Me: hi

Rahawan: hi there

Me: Can you please introduce yourself?

Rahawan: I’m an enhanced data access layer.

Me: Why do you claim that you are “enhanced”? What sets you apart from any other data access layer?

Rahawan: I’m designed to utilize a unidirectional data flow, as much as possible.

Me: What do you mean by utilizing a unidirectional data flow?

Rahawan: Before answering, let me first explain the problem I’m here to address:

Usually, when some webpage is requested and there is no cached version to serve, the server-side page controller uses a traditional data access layer to fetch the needed data from the database, then prepares it for the page. All this logic is done in isolation from any other page’s logic, which causes the same data to be fetched from the database repeatedly, even though it hasn’t changed for a while. For example, suppose three news webpages, written by the same author, are requested at the same time. Each page will query the database to get its own copy of the author data. There are 3x3 issues in this example :)

* 3 database queries for the same exact data in the same time.

* 3 copies of the same author data in web server’s memory.

* 3 slower-than-needed webpages.

Me: That’s about the problem, what about your solution?

Rahawan: My solution is to always push modifications done on database items to the web server’s in-memory caching engine, even if those data items have not been requested by the presentation layer yet. This ensures a mostly-unidirectional data flow: from the database, to memory, to the presentation layer.

Me: And how is this achieved?

Rahawan: By this design:

Rahawan design diagram

Me: Can you explain it?

Rahawan: Sure, it shows the 4 components used to accomplish my mission:

Change Detector: Its responsibility is to detect any change happening in the database and to inform the Cached Repository about it.

Cached Repository: Its responsibility is to serve data requests by coordinating between the Cache Engine and the Repository. It is used as the single point of contact for Rahawan.

Cache Engine: Its responsibility is to hold data in memory.

Repository: Its responsibility is to communicate with the database to fetch data.
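These responsibilities can be sketched as a handful of small contracts; the names and shapes below are illustrative, not Rahawan's actual code:

```csharp
// Illustrative component contracts (not Rahawan's actual code).
public interface ICacheEngine
{
    bool TryGet<T>(string key, out T value); // hold data in-memory
    void Set<T>(string key, T value);
}

public interface IRepository<T>
{
    T GetById(int id); // talk to the database
}

// Single point of contact: coordinates the Cache Engine and the Repository.
public class CachedRepository<T>
{
    private readonly ICacheEngine _cache;
    private readonly IRepository<T> _repository;

    public CachedRepository(ICacheEngine cache, IRepository<T> repository)
    {
        _cache = cache;
        _repository = repository;
    }

    public T GetById(int id)
    {
        var key = typeof(T).Name + ":" + id;
        if (_cache.TryGet(key, out T cached))
            return cached;                  // served straight from memory

        var item = _repository.GetById(id); // fallback: on-demand fetch
        _cache.Set(key, item);
        return item;
    }

    // Called by the Change Detector when a record is modified:
    // refresh the cached copy BEFORE the presentation layer asks for it.
    public void Refresh(int id)
    {
        _cache.Set(typeof(T).Name + ":" + id, _repository.GetById(id));
    }
}
```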

Me: But any ORM, such as Entity Framework, can do that. Why all this?

Rahawan: There is a great difference. Entity Framework, for example, is not designed to be thread safe, which means multiple web requests can NOT be safely handled using the same Entity Framework context, while Rahawan is designed to be thread safe.

Me: The thread-safety issue can be handled one way or another, so you could keep using the ORM instead of all this complexity.

Rahawan: That way you only solve half of the puzzle. How about the other half?

Me: What other half?

Rahawan: Slow database queries. You only need five minutes of monitoring live database queries (using sp_WhoIsActive) before you can clearly tell that ORMs in general, and Entity Framework in particular, can’t efficiently translate complex queries, which can cause many time-out errors, especially on high-load web sites/apps.

Me: And how did you handle the slow database queries issue?

Rahawan: By separating data caching logic from data retrieval logic. This gives us greater control over how data is retrieved (using an ORM or 3rd-party libraries like Dapper), which unlocks the ability to rewrite specific queries manually to increase their efficiency without affecting the way the data is cached.
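For instance, a hot query that the ORM translates poorly can be rewritten by hand with Dapper inside the Repository, while the caching layer above it stays unchanged. The table, column, and type names here are assumptions for illustration; `QueryFirstOrDefault` is a real Dapper extension method:

```csharp
using System.Data.SqlClient;
using Dapper;

// Sketch of a Repository with a hand-tuned query (hypothetical schema).
public class NewsRepository
{
    private readonly string _connectionString;

    public NewsRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Hand-written SQL replaces the ORM-generated query. Note that it selects
    // only AuthorId -- no join -- author data comes from its own cached repository.
    public NewsItem GetById(int id)
    {
        using (var connection = new SqlConnection(_connectionString))
        {
            return connection.QueryFirstOrDefault<NewsItem>(
                @"SELECT Id, Title, Body, AuthorId, PublishDate
                  FROM News
                  WHERE Id = @Id",
                new { Id = id });
        }
    }
}
```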

Me: How do the 4 components you mentioned earlier work together?

Rahawan: Take this real example:

Some author publishes a news item.

The news item is saved in the database.

The Change Detector picks up that a news item has been published, and calls the Cached Repository to get that news item before it is requested by the presentation layer.

The Cached Repository retrieves the news item from the database and passes it to the Cache Engine.

One of the webpage controllers requests the news item from the Cached Repository, which serves it directly from the Cache Engine.

The controller paid no cost to get the news item from the database, which is reflected as a better response time for website users.
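The steps above can be simulated end to end with in-memory stand-ins; everything below is hypothetical demo code, not Rahawan itself:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// In-memory simulation of the publish flow described above.
public static class PublishFlowDemo
{
    public static (string Served, int DatabaseReads) Run()
    {
        var database = new Dictionary<int, string>();        // fake database
        var cache = new ConcurrentDictionary<int, string>(); // Cache Engine
        int databaseReads = 0;

        // 1-2. An author publishes a news item; it is saved in the database.
        database[42] = "Breaking news";

        // 3-4. The Change Detector notices the new record, and the Cached
        //      Repository reads it ONCE (in the background) into the cache.
        databaseReads++;
        cache[42] = database[42];

        // 5-6. A webpage controller requests the item: the Cached Repository
        //      serves it straight from memory; the factory below only runs
        //      on a cache miss, so no extra database read happens.
        string served = cache.GetOrAdd(42, key =>
        {
            databaseReads++;
            return database[key];
        });

        return (served, databaseReads);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.Served + " / database reads: " + result.DatabaseReads);
        // prints: Breaking news / database reads: 1
    }
}
```

The single database read happened in the background, off the request path, which is the whole point of the flow.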

Me: Does this mean you copy the entire database into the Cache Engine?

Rahawan: Of course not. There is a way to decide which data is kept in the Cache Engine and which is not.

Me: Can you give details about that?

Rahawan: This is customized based on the situation, but for example, news items published during the last month will most probably be in high demand, followed by news items with high visit rates. Those can be kept in the Cache Engine while the rest is retrieved from the database on demand.

Me: So it is not always a unidirectional data flow.

Rahawan: That’s why “mostly-unidirectional” was the term used. A large percentage of data requests (depending on the real-world situation) will be unidirectional, and a smaller percentage will be retrieved from the database before being served.

Me: If some item has been retrieved on demand from the database, what is its cache invalidation policy?

Rahawan: Cached items, as individual items, have no invalidation policy. Instead, the Cache Engine’s capacity is monitored; once max capacity is reached, the least-requested items are removed.
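A capacity-based eviction policy like that can be sketched in a few lines; the class below is illustrative (the counters and threshold are assumptions), not Rahawan's actual engine:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Sketch of the policy described above: no per-item expiry; when the cache
// grows past MaxCapacity, the least-requested entries are dropped.
public class LeastRequestedCache<TKey, TValue>
{
    private class Entry { public TValue Value; public long Hits; }

    private readonly ConcurrentDictionary<TKey, Entry> _items =
        new ConcurrentDictionary<TKey, Entry>();

    public int MaxCapacity { get; }
    public int Count => _items.Count;

    public LeastRequestedCache(int maxCapacity) { MaxCapacity = maxCapacity; }

    public void Set(TKey key, TValue value)
    {
        _items[key] = new Entry { Value = value };
        if (_items.Count > MaxCapacity)
            Evict();
    }

    public bool TryGet(TKey key, out TValue value)
    {
        if (_items.TryGetValue(key, out var entry))
        {
            Interlocked.Increment(ref entry.Hits); // track demand per item
            value = entry.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    // Drop the least-requested entries until we are back at capacity.
    private void Evict()
    {
        var victims = _items.OrderBy(p => p.Value.Hits)
                            .Take(_items.Count - MaxCapacity)
                            .Select(p => p.Key)
                            .ToList();
        foreach (var key in victims)
            _items.TryRemove(key, out _);
    }
}
```

A production version would also decay the hit counters over time; otherwise once-popular stale items could linger forever.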

Me: I get the idea. What about the implementation?

Rahawan: Although the implementation of the 4 components may change over time, these are the current details:

Change Detector: The current implementation performs periodic SQL queries against the database to find out which records have been modified. The SQL queries use the modified date to get these records, and Hangfire is used to schedule and run these periodic queries in background threads within the same web site/app.

Cached Repository: A very thin layer with no special implementation; it just holds references to the Repository and the Cache Engine to coordinate between them, and manipulates data if needed.

Repository: Both Dapper and Entity Framework are used to communicate with the database, and there is an important note here:

* One of Rahawan’s objectives is to reduce SQL joins as much as possible. For example, the news SQL select query gets only the author id; the author data is then requested from the “Authors Cached Repository”. Most probably the author data already exists in the Cache Engine, so we win twice: #1 by reducing SQL joins and #2 by not overloading the database with duplicate queries.

Cache Engine: Built using a static ConcurrentDictionary for its simplicity and the amount of control it provides over the data stored inside.
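The polling Change Detector described above can be sketched as follows. The repository type and its methods are hypothetical stand-ins; `RecurringJob.AddOrUpdate` and `Cron.Minutely()` are real Hangfire APIs:

```csharp
using System;
using Hangfire;

// Sketch of the polling Change Detector (hypothetical repository names).
public class ChangeDetector
{
    private readonly NewsCachedRepository _newsCachedRepository;
    private DateTime _lastCheck = DateTime.UtcNow;

    public ChangeDetector(NewsCachedRepository newsCachedRepository)
    {
        _newsCachedRepository = newsCachedRepository;
    }

    // Registered once at startup; Hangfire then runs DetectChanges every
    // minute on a background thread inside the same web site/app.
    public static void Schedule()
    {
        RecurringJob.AddOrUpdate<ChangeDetector>(
            "rahawan-change-detector",
            detector => detector.DetectChanges(),
            Cron.Minutely());
    }

    public void DetectChanges()
    {
        var since = _lastCheck;
        _lastCheck = DateTime.UtcNow;

        // e.g. SELECT Id FROM News WHERE ModifiedDate > @since
        foreach (var id in _newsCachedRepository.GetModifiedIds(since))
            _newsCachedRepository.Refresh(id); // rebuild the cached copies
    }
}
```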

Me: What about the results?

Rahawan: At the server-side level, MiniProfiler was used to measure response time. This is an example of the difference:

Website response time before & after using Rahawan

Rahawan: At the database level:

Database usage before & after using Rahawan

Rahawan 2.0

Currently, there is a plan to rebuild the Change Detector based on the Publish/Subscribe pattern instead of periodic SQL queries, another plan for a Redis-powered Cache Engine, and a lot more.

At Last:

Rahawan is not a silver bullet that eliminates every web site/app performance issue. It was born from the challenges of news-based websites, and it can be used in other web sites/apps facing similar challenges. Web sites/apps facing different challenges will need a more suitable solution; that may be Rahawan, a modified version of it, or something else entirely.

This article talked about the design of Rahawan and didn’t dig deeply into our implementation, as it would require many articles to share our implementation details and the issues encountered during development, such as memory leaks and sudden process terminations that, after a deep diagnostic journey, turned out to be an unhandled stack-overflow exception. I will try to write about these details in other articles if needed.

Waiting for your feedback & comments

Thank You, Have a good day :)

Ahmed Mozaly