I just got back from a month abroad, and I wonder if I’ve forgotten how to code. I ask around the office, how has it been? Did anything break while I was away?

Not only did nothing break, my boss was so bored that he’s been bothering the office manager and taking extra days off. I should probably investigate and find out what his deal is, but first I have a burning question.

What cars sell quickly on the used car market in Victoria, and what do they sell for?

This info is available through UsedVictoria – kind of. They have an RSS feed. So you can enter some search terms and they’ll show you the last 25 items posted with those criteria. You can’t page through the results, so that doesn’t give me any historical data. Any info I gather will start from today. But I can scrape the feed and save it in my own database, and graph it later once I have a body of data. Ok, here we go.

<?php $url = "http://www.usedvictoria.com/index.rss?category=cars&seller_type=private&pricefrom=400&priceto=6000"; if($xml = simplexml_load_file($url)) { $counter = 0; foreach ($xml->channel->item as $k => $v) { $select = 'select `title` from item where `title` = "' . $v->title . '"'; $selected = $mysqli->query($select); $v->pubDate = date("Y-m-d H:i:s", strtotime($v->pubDate)) . " <br>"; $selected = $selected->fetch_assoc(); if(empty($selected)){ extract((array)$v); $description = $mysqli->real_escape_string($description); $link = $mysqli->real_escape_string($link); $insert = 'insert into item (`title`, `details`, `entry_date`, `search_url`, `price`) values ("' . $title . '", "' . $description . '", "' . $pubDate . '", "' . $link . '", ' . substr($title, strrpos($title, '$') + 1) . ')'; if(!$mysqli->query($insert)) { printf("Error: %s<br>", $mysqli->error); } $counter ++; } } echo $counter. "new items."; } ?> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 & lt ; ? php $ url = "http://www.usedvictoria.com/index.rss?category=cars&seller_type=private&pricefrom=400&priceto=6000" ; if ( $ xml = simplexml_load_file ( $ url ) ) { $ counter = 0 ; foreach ( $ xml - & gt ; channel - & gt ; item as $ k =& gt ; $ v ) { $ select = 'select `title` from item where `title` = "' . $ v - & gt ; title . '"' ; $ selected = $ mysqli - & gt ; query ( $ select ) ; $ v - & gt ; pubDate = date ( "Y-m-d H:i:s" , strtotime ( $ v - & gt ; pubDate ) ) . " <br>" ; $ selected = $ selected - & gt ; fetch_assoc ( ) ; if ( empty ( $ selected ) ) { extract ( ( array ) $ v ) ; $ description = $ mysqli - & gt ; real_escape_string ( $ description ) ; $ link = $ mysqli - & gt ; real_escape_string ( $ link ) ; $ insert = 'insert into item (`title`, `details`, `entry_date`, `search_url`, `price`) values ("' . $ title . '", "' . $ description . '", "' . $ pubDate . '", "' . $ link . '", ' . substr ( $ title , strrpos ( $ title , '$' ) + 1 ) . ')' ; if ( ! $ mysqli - & gt ; query ( $ insert ) ) { printf ( "Error: %s<br>" , $ mysqli - & gt ; error ) ; } $ counter ++ ; } } echo $ counter . "new items." ; } ? & gt ;

What’s going on: we ask UsedVic’s server for all the car ads they have with the following criteria: private sales only, priced between $400 and $6000. This returns a few hundred results, but the RSS feed will only give us 25. Oh well. I save each item in my database – the title, description, search criteria, date it was posted, and price.

The price, mind you, is in no way accurate. When I buy a car on UsedVic, I haggle. People who buy from me usually try to haggle as well. So the actual selling price is likely a couple hundred lower than what’s listed, in most cases. However, it does give you a little information. The listed price is the starting point of negotiation, so people generally won’t go see a car unless the listed price is at least in the realm of reality and their budget. If someone has set the price of a vehicle unrealistically high, the car will sit on the market for a long time.

This happened in the case of a rather spectacular Suzuki DRZ400SM that was posted last winter. It had several fancy racing upgrades, combined with an expensive but utterly tasteless metallic pink paint job. If I could have had that bike for $2000, I would have done so in a second even it was covered in swastikas and cocks. Paint is cheap. But the guy wanted $6500 for it, and the ad stayed up for something like 3-4 months. (Ask me how I know. Yes, I was checking every week. I have a problem. Don’t worry about me, mow your own lawn.)

So how to deal with the fact that we only get 25 items returned? Easy – check back every 2 hours. Usually 2-6 items are posted per hour. So I add a check to see if each item is already in my database, and if it isn’t, add it. This code is kind of inefficient. If any of the real programming nerds get hold of it, I will be embarrassed. But it’s good enough for my purposes, for now, probably.

So I wrote a cronjob to run this every 2 hours and report back to me when it’s done. Okay, cool.

Now how do we track items that are sold? It’s tough to count something that isn’t there. Not too tough though. More code:

<?php $select_unsold = "select * from item where `sold` = 0"; $unsold = array(); $marked_sold = array(); if($results = $mysqli->query($select_unsold)) { while($row = $results->fetch_assoc()) { $unsold[] = $row['title']; } foreach($unsold as $k => $v) { $unaltered_title = $v; $v = substr($v, 0, strrpos($v, "- $")); $strip_chars = array('$','!', '?'); foreach($strip_chars as $char) { $v = str_replace($char, '', $v); } $search = 'http://www.usedvictoria.com/index.rss?category=cars&description=' . urlencode($v); if($xml = simplexml_load_file($search)) { if(empty($xml->channel->item)) { //echo "Sold: $unaltered_title <br> $v<br><br>"; $mark_sold = 'update item set `sold` = 1, `sold_date` = NOW() where `title` = "' . $unaltered_title . '"'; $marked_sold[] = $unaltered_title; if(!$mysqli->query($mark_sold)) { printf("Error: %s<br>", $mysqli->error); } } } } } else { printf("Error: %s<br>", $mysqli->error); } echo strlen($mark_sold) . "Items marked sold." ?> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 & lt ; ? php $ select_unsold = "select * from item where `sold` = 0" ; $ unsold = array ( ) ; $ marked_sold = array ( ) ; if ( $ results = $ mysqli - & gt ; query ( $ select_unsold ) ) { while ( $ row = $ results - & gt ; fetch_assoc ( ) ) { $ unsold [ ] = $ row [ 'title' ] ; } foreach ( $ unsold as $ k =& gt ; $ v ) { $ unaltered_title = $ v ; $ v = substr ( $ v , 0 , strrpos ( $ v , "- $" ) ) ; $ strip_chars = array ( '$' , '!' , '?' ) ; foreach ( $ strip_chars as $ char ) { $ v = str_replace ( $ char , '' , $ v ) ; } $ search = 'http://www.usedvictoria.com/index.rss?category=cars&description=' . urlencode ( $ v ) ; if ( $ xml = simplexml_load_file ( $ search ) ) { if ( empty ( $ xml - & gt ; channel - & gt ; item ) ) { //echo "Sold: $unaltered_title <br> $v<br><br>"; $ mark_sold = 'update item set `sold` = 1, `sold_date` = NOW() where `title` = "' . $ unaltered _ title . '"' ; $ marked_sold [ ] = $ unaltered_title ; if ( ! $ mysqli - & gt ; query ( $ mark_sold ) ) { printf ( "Error: %s<br>" , $ mysqli - & gt ; error ) ; } } } } } else { printf ( "Error: %s<br>" , $ mysqli - & gt ; error ) ; } echo strlen ( $ mark_sold ) . "Items marked sold." ? & gt ;

You can also search for an ad using the exact title, and you’ll generally get back only that one exact result. Since I have the titles saved, I can do that. In my database, I have a column labelled “sold”. When the item is entered, that column is set to false.

This script gets a list of all items in my database that have “sold” set as false, and searches for them. If an item is not found, “sold” gets set to true, and the current date is recorded as well.

Listings expire after thirty days, which should help keep this script from getting out of control as it hits the UsedVictoria servers over and over again. After a few weeks, I’ll have hundreds of unsold listings in my database, and once a day, the script will request every single one of them. I may end up blocked by UsedVic pretty soon if the numbers get too high. But no worries for now.

Finally, I want the data in a manageable, bite-sized format. You can see the results at rocketships.ca/srs/scraper.

The table lists all cars that were posted and sold within the last thirty days, the number of days each was on the market, and the price requested. It’s still not as fine grained as I would like, but I think this will give me at least a vague answer to my question – which cars are hot in Victoria?

My theory: Miatas. I await hard data.

Next step is to decide what to do with the data. I had an idea about buying cars from the mainland, where they’re a bit cheaper, and flipping them locally. I feel like I could make a small profit doing this. However, to make it worth my time, I need a profit of at least $300 – that’s what I pay myself for working on the weekends.

So let’s say I bought a 1990 Miata for $1800 on the mainland and sold it for $2300 here. Those numbers are realistic, based on my experience to date.

I have to go the mainland ($18 there, $74 back), get my pedestrian ass to the seller’s house somehow (unless they’re kind enough to meet me at the ferry), check the car to make sure it isn’t shit (Honestly, what do I know? Not much, man. I’m a writer, not a mechanic.), and then the really fun part – figuring out insurance.

If I transfer my own car insurance to the new car, I will have to pay GST on the car. 12% of 1800 is $216. Big chunk of my profit gone right there. I can maybe get a temp permit, which is only $30 or so, but then I really have to get home smartly on the next boat, I don’t get to joyride the car while I’m waiting for a seller, and I can’t allow buyers to take a test drive. The insurance costs some money, so does registration, so does the plate if I haven’t organized that properly, and you have to pay a fee if you insure for less than a full year or if you want to pay monthly. Costs maybe $1200 for the full year? depending on the car and whether my points have expired yet.

There are ways around these issues, namely, lyin’ and breakin’ the law. I’ve bought and sold something like 20 vehicles over the years, and I will not claim perfect observance of the rules. But any business plan that relies on illegal behaviour is a bad idea, and out of the question according to my principles.

So out of $500 profit, I might get to keep $300, legally. But that’s not the main issue. You can only get away with buying and selling a certain number of cars per year before the government starts to get suspicious. I think it might be around 6. After that you have to get a dealer’s license, which is ex$pen$$ive. I’m not really prepared to go down that road – messing around with cars is alright for a hobby, but used car salesmen are considered the scum of the earth by most humans, and for good reasons. So I would have to find a way to respect myself for doing it, first. My friend Dylan does it by specializing in high end racecars. My friend Ben doesn’t do used cars – he’s an honest to god legit new car salesman who provides warranties. But there are others who are pretty slimy.

With that problem unsolved, I have made the data and the code publicly available. Enjoy.