Well, the whole process is pretty complicated, but I'll try and answer you without writing a post that's as many pages (see what I did there?) as the last Game of Thrones book.

Assumptions

For the sake of discussion, we'll all agree that pagination is basically a function of pageNumber * pageSize . That is, to get the current set of questions in a sorted list of n questions, you can multiply pageNumber by pageSize to offset the number of pages, and then add pageSize to return the current results. In our case, it's really (pageNumber - 1) * pageSize since page 1 is index 0.

In terms of sorting, you never have to completely sort the whole set to return the data. You can effectively only completely sort pageNumber * pageSize data to return the current page correctly sorted, with the rest of the data being partially sorted (such as in a Quick Sort 3 bucket implementation). Rather than sort everything and return the first n results of a set, you can fully sort only the first n results of a set and return those. Make sense? Good.

Also worth noting: the most expensive queries are always the middle pages. To get the last n pages is as cheap as getting the first n pages in a well-built system: just invert the sorting criteria. Getting pageNumber 1 by creation date descending is as easy as getting pageNumber n - 1 where n is the total page count by creation date ascending. This is an optimization that many sorting engines (database, search, etc.) employ, as do we.

Also, for the sake of this discussion, assume that a question is a post and vice versa since I'll use them interchangeably. Good? Good. OK, on to the fun stuff then.

Step 1: Tag Engine

We have a custom built .NET application called the Tag Engine which holds post IDs as well as metadata. Think of it as an inverted index that you can use to look up a post ID by its data (such as creation date, tags, score, etc).

Without overly simplifying the description, the Tag Engine is a .NET application that basically does set theory based on predicates. It takes sets of post IDs and intersects, unions, etc. with other sets of post IDs to get to a final result, which it can also sort in-memory based on meta data.

We query the Tag Engine with page number and page size and any predicates that limit the data (such as Site ID since the Tag Engine handles all network sites). It does in-memory set operations (like union and intersection), and then sorts the results, returning the relevant subset of post IDs.

The Tag Engine also caches the results (the larger set, not just the page you're asking for) and can short-circuit based on a cache key derived from a hash of the query (page number, page size, sorting, etc.) to quickly select a page from that particular cached result set. This helps immensely with performance.

Step 2: The Database

The Tag Engine does not contain the actual question data, just the ID and metadata. So, we take the result set of post IDs and query the SQL database for them. The query looks something like this:

Select p.*, pm.ViewCount, u.Id, u.ProfileImageUrl, ... From Posts p Join PostMetadata pm On p.Id = pm.PostId Left Join Users u On p.LastActivityUserId = u.Id Where p.Id In @Ids";

@Ids here is the list of IDs from the Tag Engine. A query like this brings back the actual post data for us to display. But we're not done yet.

Step 3: Semi-Redundant In-Memory Sort

As discussed above, the Tag Engine might return cached data (an optimization that helps it have awesome performance). However, by its nature, cached data is never guaranteed to be accurate (since it's a snapshot of the past state of things). By contrast, the database always has up-to-date, authoritative data. These sometimes conflict.

To solve this, we sort the resulting page of posts again in memory. This final step resolves the problem where the details of the posts on the given page (their metadata) may have changed in a way that the database can see but the Tag Engine cannot yet see due to caching.

This part isn't very exciting: it's basically an in-memory List<T>.Sort call passing in a function to determine equality. The equality function differs based on the page you're looking at: for Newest tab it compares post creation dates, but for Votes it compares score and answer votes, etc.

If we did not do this final step, posts might sometimes appear out of order on the page since they'd be sorted by the Tag Engine in a way that reflects their past values on things like score or last activity data, as opposed to their current values which the database returns.

Finally, we show you the list of questions!