$results = Cache :: get ( 'item_search' , 'all' ) ; if ( ! $results ) { /* A stored procedure's the only sql way to do this, so instead we'll just grab the attribute types right here */ $akPrice = CollectionAttributeKey :: getByHandle ( 'price' ) ; $akCategory = CollectionAttributeKey :: getByHandle ( 'category' ) ; $akStatus = CollectionAttributeKey :: getByHandle ( 'status' ) ; $akEra = CollectionAttributeKey :: getByHandle ( 'era' ) ; $akImages = CollectionAttributeKey :: getByHandle ( 'images' ) ; $dbQuery = <<< DBQ SELECT c . cID , cv . cvName AS title , PagePaths . cPath AS path , MAX ( atMultipleFiles . value ) AS images , MAX ( atCurrency . amount ) AS price , MAX ( atCurrency . code ) AS currency , Viewing 15 lines of 47 lines. View entire code block.

$results = Cache :: get ( 'item_search' , 'all' ) ; if ( ! $results ) { /* A stored procedure's the only sql way to do this, so instead we'll just grab the attribute types right here */ $akPrice = CollectionAttributeKey :: getByHandle ( 'price' ) ; $akCategory = CollectionAttributeKey :: getByHandle ( 'category' ) ; $akStatus = CollectionAttributeKey :: getByHandle ( 'status' ) ; $akEra = CollectionAttributeKey :: getByHandle ( 'era' ) ; $akImages = CollectionAttributeKey :: getByHandle ( 'images' ) ; $dbQuery = <<<DBQ SELECT c.cID, cv.cvName AS title, PagePaths.cPath AS path, MAX(atMultipleFiles.value) AS images, MAX(atCurrency.amount) AS price, MAX(atCurrency.code) AS currency, MAX(atEra.value) AS era, MAX(atCategory.value) AS category, MAX(atStatus.value) AS status FROM Pages p1 LEFT JOIN Pages p2 ON (p1.cPointerID = p2.cID) LEFT JOIN PagePaths ON (PagePaths.cID = p1.cID AND PagePaths.ppIsCanonical = 1) LEFT JOIN PageSearchIndex psi ON (psi.cID = if(p2.cID IS NULL, p1.cID, p2.cID)) INNER JOIN CollectionVersions cv ON (cv.cID = if(p2.cID IS NULL, p1.cID, p2.cID) AND cvID = (SELECT cvID FROM CollectionVersions WHERE cvIsApproved = 1 AND cID = cv.cID)) LEFT JOIN PageTypes pt ON pt.ctID = cv.ctID INNER JOIN Collections c ON (c.cID = if(p2.cID IS NULL, p1.cID, p2.cID)) LEFT JOIN CollectionAttributeValues cav ON (cav.cID = c.cID AND cav.cvID = cv.cvID) LEFT JOIN atMultipleFiles ON (atMultipleFiles.avID = cav.avID AND cav.akID = {$akImages->getAttributeKeyID()}) LEFT JOIN atCurrency ON (atCurrency.avID = cav.avID AND cav.akID = {$akPrice->getAttributeKeyID()}) LEFT JOIN (atSelectOptions atEra INNER JOIN atSelectOptionsSelected atEraSelected ON atEra.ID = atEraSelected.atSelectOptionID) ON (atEra.akID = {$akEra->getAttributeKeyID()} AND atEraSelected.avID = cav.avID) LEFT JOIN (atSelectOptions atCategory INNER JOIN atSelectOptionsSelected atCategorySelected ON atCategory.ID = atCategorySelected.atSelectOptionID) ON (atCategory.akID = {$akCategory->getAttributeKeyID()} AND atCategorySelected.avID = cav.avID) LEFT JOIN (atSelectOptions atStatus INNER JOIN atSelectOptionsSelected atStatusSelected ON atStatus.ID = atStatusSelected.atSelectOptionID) ON (atStatus.akID = {$akStatus->getAttributeKeyID()} AND atStatusSelected.avID = cav.avID) DBQ ; $dil = new DatabaseItemList ( ) ; $dil -> addToQuery ( $dbQuery ) ; $dil -> sortBy ( 'cvDatePublic' , 'desc' ) ; $dil -> groupBy ( 'c.cID' ) ; $dil -> filter ( 'pt.ctHandle' , 'item' ) ; $dil -> setItemsPerPage ( 5000 ) ; $results = $dil -> get ( 5000 ) ; Cache :: set ( 'item_search' , 'all' , $results , 300 ) ; }

I'm currently working on a site which required me to write a migration script to scrape an old drupal4 site (which I did in node.js, and could be the topic of another thread), and a concrete5 ENVIRONMENT_ONLY script to load all the pages I screen-scraped from the old site into c. My site was up and running well before the migration, but after moving over 2,300 new pages (a number which wouldn't make your typical performance expert bat an eye), one page with a big pagelist (the class, not the block) on it took so long to render, it would time out. Removing all time limits, it could take upwards of 4 minutes to run.This page would build a big grid, with preview images, of the full set of pages. I wanted the users to be able to sort, filter, and do basic searches inside this grid, and have it all update dynamically right in the browser. I coded all this up in pure JavaScript (e.g. not relying on jQuery), and could do all this filtering so fast there was no visible delay.While I was initially worried that listing so many pages all together would be slow on the client-side, that part ended up working very well. I had a new problem now: the server was taking a very, very long time to return my pagelist. I turned on mysql query logging ( http://dev.mysql.com/doc/refman/5.1/en/query-log.html... ) and then the problem became abundantly clear: the log of just the queries (not including any actual results) of loading the page 1 time, was over 6.4MB. Unsurprisingly, running that many queries took a long, long time and put huge overhead against my server.Normally I'd turn to caching first to solve a problem like this, but to cache you need to at least be able to render your results when your cache is empty, and that just couldn't happen. Instead, I looked into the query log and saw a massive amount of optimization that could be done. The PageList was essentially selecting a list of cIDs, turning each of those cIDs into a Page object, looking up each Page, then looking up each AttributeValue that I needed. When I was listing ~50 pages, this would run in far less time than all the basic image loading/page downloading, so to the end-user the perceived delay from this was negligible. At 2300+ pages, it took longer than anyone would wait, and strained the server + DB so much that it would certainly crash the whole site.My solution? Roll everything up, as much as possible, into one big query that returns all the attributes.First thing, I switched my PageList for a DatabaseItemList. I looked into the PageList source, copied out the query it was building, and set that in my DatabaseItemList::addToQuery(). I then added Database::setDebug(true); Database::setDebug(false); around the code which fetched each attribute in order to grab its SQL. I rewrote these as LEFT JOINs in my SQL as best as I could.Getting the attribute key IDs turned out to be extremely tricky, so I eventually gave up on trying to do that in pure SQL, and just fetched them from the CollectionAttributeKey objects. The akIDs only require one lookup at the start of the query, so it shouldn't matter if I do a few extra lookups to get them.After adding the attribute names to the front my my select, I updated the code I used to have running $page->getAttribute() to instead grab the attribute my query returned.I put it all together, and the 4 minute page dropped to a clunky, but manageable 5 seconds. I threw a few Cache::set()s and get()s around the query results (timing out every 5 minutes), and around the rendered results of each page in the list. The page load time uncached climbed to ~10 seconds, but cache it and voila - whole thing loads in 40ms. Almost nothing.My SQL skills aren't as sharp as they were 5 years ago, so please feel free to recommend any updates I can do to this. My big next-steps are:- How can I generalize this some more? It would be awesome to take out all the LEFT JOINs I'm using to load each attribute, and include them as a standard function in the Attributes themselves (similar to the load() and getValue() functions' select statements in the AttributeTypeControllers, but one to add to the list of JOINs)- Am I missing some obvious, or at least more c5-standard, way to do something similar?- Please review my code and make suggestions for any updates, so I can expose this work as a howto