Learning Structured Query Language, abbreviated as SQL, will be the most technically difficult part of this course. But for good reason: if you can read and write SQL, then not only have you learned a whole new language, you have also inadvertently become a programmer, one who is (clumsily) on the path of telling computers what to do rather than the other way around.

I've written a series of tutorials for this class on SQL. The good news is that if you've mastered spreadsheets and pivot tables, then you're already clever enough to know all the concepts that underlie our use of SQL. The bad news is, you certainly won't feel very clever as you diagnose one obtuse syntax error after another. But that can be overcome with the kind of repetition and practice you need to master any new language.

SQL in the wild

We've looked at important data that was scarce and unimportant data that was plentiful. But sometimes data that is both important and plentiful is just sitting there unexamined because no one knows quite knows what to do with it. In their investigation, "Unfit for Duty", Sarasota Herald-Tribune reporters Matt Doig's and Anthony Cormier's first step was to analyze Florida's police misconduct database:

This was a case where the government had this wonderful, informative dataset and they weren’t using it at all except to compile the information. I remember talking to one person at an office and saying: “How could you guys not know some of this? In five minutes of (SQL) queries you know everything about these officers?” They basically said it wasn't their job. That left a huge opportunity for us. – Matthew Doig

SQL Hell

Why is SQL so frustrating to learn? It being a language, learning it involves a fair amount of tedious memorization and practice. The level of existential frustration you may experience will be similar to what it feels like to spend years learning French only to be able to say things less intelligently than a 5-year-old.

Why learn a whole new language? Sometimes, feelings are just more succinctly expressed in a particular language, such as Sacré bleu or J'adore le fromage dans la bibliothèque.

SQL lets us express our data desires in a structured way:

SELECT Descript, Date , Time , Location FROM sfpd_incidents WHERE Category= " ASSAULT " AND Date > " 2012 " ;

Which results in a data table like this:

Descript Date Time Location BATTERY 2012-01-01 00:25 200.0 Block of MARKET ST AGGRAVATED ASSAULT WITH A KNIFE 2012-01-01 02:07 600.0 Block of EDDY ST AGGRAVATED ASSAULT WITH BODILY FORCE 2012-01-01 01:20 300.0 Block of MASON ST INFLICT INJURY ON COHABITEE 2012-01-01 01:20 300.0 Block of MASON ST INFLICT INJURY ON COHABITEE 2012-01-01 02:00 DRUMM ST / SACRAMENTO ST BATTERY 2012-01-01 02:05 200.0 Block of KING ST BATTERY 2012-01-01 22:22 400.0 Block of GEARY ST BATTERY 2012-01-01 03:35 800.0 Block of MARKET ST BATTERY 2012-01-01 12:50 1500.0 Block of OAKDALE AV BATTERY 2012-01-01 08:38 800.0 Block of BRYANT ST

Whereas with user-friendly spreadsheet software, we have to digest a vague instruction set, which assumes you've memorized the interface of a program such as Excel:

Open up the spreadsheet named sfpd_incidents.xls Open the Filter menu and filter for rows in which the Category column is equal to "ASSAULT" and in which the Date column contains a value greater than "2012" Hide all of the columns except Descript , Date , Time , and Location

Bye-bye spreadsheets?

No, not at all. Part of the reason that SQL is very good at working with massive datasets is because it doesn't try to do the fancy kind of things that spreadsheets do. For example, you won't be writing SQL to (directly) generate a chart. Instead, you have to export from the database into a spreadsheet (or other visualization software).

Spreadsheets are great for manipulating, re-organizing, and visualizing data – but all those features make them inefficient for pure data-querying. A database, on the other hand, won't go out of its way to make things easy for you. But what it does, it does very well and very quickly, whether you're working with a thousand or a billion rows. http://www.nytimes.com/2013/04/19/opinion/krugman-the-excel-depression.html

Exploring SFPD data with SQL and spreadsheets

Below is a demonstration of both the thought process in exploring a database and the actual queries. I'm using the Sequel Pro GUI for Mac and a MySQL dump of the incident reports from the San Francisco Police Department's CABLE system.

I won't be using just SQL here; the standard spreadsheet techniques are still important for analysis and visualization.

Simple queries

After the SQL dump has been imported, we can retrieve and display all 1.49 million records with this query:

SELECT * FROM sfpd_incidents

The result is not much different than what we get when opening a spreadsheet:

Since we're not interested in mapping the records (just yet), we can be selective in which columns to retrieve. In this query, I ignore the location and coordinate fields:

SELECT Category, Descript, Date , Time FROM sfpd_incidents

The result:

Category Descript Date Time FRAUD FORGERY, CREDIT CARD 2003-02-18 16:30 WARRANTS WARRANT ARREST 2003-04-17 22:45 LARCENY/THEFT GRAND THEFT PICKPOCKET 2003-02-18 16:05 DRUG/NARCOTIC SALE OF BASE/ROCK COCAINE 2003-02-18 17:00 OTHER OFFENSES CONSPIRACY 2003-02-18 17:00 OTHER OFFENSES PROBATION VIOLATION 2003-02-18 17:00 DRUNKENNESS UNDER INFLUENCE OF ALCOHOL IN A PUBLIC PLACE 2003-03-31 22:08 WARRANTS WARRANT ARREST 2003-03-31 22:08 WARRANTS WARRANT ARREST 2003-02-22 17:35 WARRANTS WARRANT ARREST 2003-04-01 15:00

To filter for a specific category of crime, we use the WHERE clause:

SELECT Category, Descript, Date , Time FROM sfpd_incidents WHERE Category = " VANDALISM "

Category Descript Date Time VANDALISM MALICIOUS MISCHIEF, VANDALISM 2003-05-10 08:18 VANDALISM MALICIOUS MISCHIEF, GRAFFITI 2003-01-13 23:44 VANDALISM MALICIOUS MISCHIEF, BREAKING WINDOWS 2003-02-18 17:00 VANDALISM MALICIOUS MISCHIEF, BREAKING WINDOWS 2003-02-18 20:16 VANDALISM MALICIOUS MISCHIEF, BREAKING WINDOWS 2003-02-18 20:30 VANDALISM MALICIOUS MISCHIEF, VANDALISM 2003-02-18 16:45 VANDALISM MALICIOUS MISCHIEF, GRAFFITI 2003-02-17 07:00 VANDALISM MALICIOUS MISCHIEF, GRAFFITI 2003-02-18 22:55 VANDALISM MALICIOUS MISCHIEF, GRAFFITI 2003-02-18 23:58 VANDALISM MALICIOUS MISCHIEF, VANDALISM 2003-02-19 00:15

Aggregate queries

A common inquiry of crime statistics is: just how many of each kind of crime was reported?

In SQL, we use a GROUP BY clause to aggregate the records by Category , which is similar to a simple pivot table

SELECT Category, COUNT (*) FROM sfpd_incidents GROUP BY Category

Category COUNT(*) ARSON 2622 ASSAULT 130097 BAD CHECKS 810 BRIBERY 329 BURGLARY 64890 DISORDERLY CONDUCT 8016 DRIVING UNDER THE INFLUENCE 4052 DRUG/NARCOTIC 99876 DRUNKENNESS 7774 EMBEZZLEMENT 2144 etc.

We can order this list by count of reports in descending order and show only the top 5 with the ORDER BY and LIMIT clauses, respectively:

SELECT Category, COUNT (*) AS ct FROM sfpd_incidents GROUP BY Category ORDER BY ct DESC LIMIT 5

So from 2003 to 2013, these were the five most reported categories of incidents to the SFPD:

Category ct LARCENY/THEFT 296585 OTHER OFFENSES 212411 NON-CRIMINAL 151558 ASSAULT 130097 DRUG/NARCOTIC 99876

Change over time

Aggregate counts over long periods of time aren't particularly helpful. We can group by year to get the count of reports per year:

SELECT YEAR ( Date ) AS yr, COUNT (*) FROM sfpd_incidents GROUP BY yr ORDER BY yr

The count of reports by year:

yr COUNT(*) 2003 152901 2004 150349 2005 144395 2006 136145 2007 135813 2008 139400 2009 137519 2010 121525 2011 117644 2012 123878 2013 132195

Again, not particularly interesting. We don't know if one kind of crime went up while another went down. So we can make our grouping more granular: the count of incident reports, by year, and by category:

SELECT Category, YEAR ( Date ) as yr, COUNT (*) as count FROM sfpd_incidents GROUP BY yr, Category ORDER BY yr, Category

The result contains 403 rows, one for each combination of year and category of report:

Category yr count ARSON 2003 294 ASSAULT 2003 13782 BAD CHECKS 2003 144 BRIBERY 2003 28 BURGLARY 2003 6088 DISORDERLY CONDUCT 2003 906 DRIVING UNDER THE INFLUENCE 2003 315 DRUG/NARCOTIC 2003 10176 DRUNKENNESS 2003 682 EMBEZZLEMENT 2003 246 …etc. VEHICLE THEFT 2013 6209 WARRANTS 2013 7059 WEAPON LAWS 2013 1166

Reduce the clutter

Let's step back a bit. Instead of looking at every year, let's look at the 5-year change in reports, i.e. 2008 and 2013:

SELECT Category, YEAR ( Date ) as yr, COUNT (*) as count FROM sfpd_incidents GROUP BY yr, Category HAVING yr = 2008 OR yr = 2013 ORDER BY yr, Category

The result is 74 rows:

Category yr count ARSON 2008 248 ASSAULT 2008 12670 BAD CHECKS 2008 78 BRIBERY 2008 49 BURGLARY 2008 5679 etc. SEX OFFENSES, NON FORCIBLE 2013 13 STOLEN PROPERTY 2013 12 SUICIDE 2013 73 SUSPICIOUS OCC 2013 3358 TRESPASS 2013 991 VANDALISM 2013 6654 VEHICLE THEFT 2013 6209 WARRANTS 2013 7059 WEAPON LAWS 2013 1166

A time for pivoting

While this is a more manageable list, it's still not easy to discern which categories have changed the most from 2008 to 2013.

At this point, it's worth remembering why we turned to SQL in the first place: because our spreadsheet software couldn't handle an import of 1.5 million rows (Excel caps out at 1,048,576 rows). By using the GROUP BY aggregate clause, we could reduce the data rows enough to then export into a spreadsheet right off the bat. In fact, the kind of grouping and filtering we've just accomplished may have been easier in pivot tables.

So again, there is no one perfect data tool: if you feel more comfortable organizing data in a spreadsheet, then it's fine to use a database simply to open a large dataset and export the parts you need.

Data reshaping

Now that we have 74 rows of aggregate counts (for the year 2008 and 2013), we can simply copy and paste them into a spreadsheet, and arrange them so that we can do the "subtract one cell from the other" formulas that we're familiar with.

Pasting the list directly into a spreadsheet, and then doing a pivot table with Category as Rows and year as Columns, and then a SUM of the count , will generate something that looks like this:

Category 2008 2013 ARSON 248 228 ASSAULT 12670 10380 BAD CHECKS 78 26 BRIBERY 49 4 BURGLARY 5679 5847 DISORDERLY CONDUCT 789 460 DRIVING UNDER THE INFLUENCE 408 429 DRUG/NARCOTIC 11462 6682

Then we add two columns to calculate the change:

Change : the 2013 column minus the 2008 column Change pct : the Change column, divided by the 2008 column. Then multiplied by 100 and rounded to make for a nice number.

When sorting the list by Change Pct , here are the top 10 categories based on drop in incidents from 2008 to 2013

Category 2008 2013 Change Change pct STOLEN PROPERTY 517 12 -505 -98 BRIBERY 49 4 -45 -92 LOITERING 441 52 -389 -88 PROSTITUTION 1663 269 -1394 -84 FORGERY/COUNTERFEITING 2460 710 -1750 -71 BAD CHECKS 78 26 -52 -67 LIQUOR LAWS 499 227 -272 -55 FAMILY OFFENSES 72 35 -37 -51 EXTORTION 52 30 -22 -42 DRUG/NARCOTIC 11462 6682 -4780 -42

Reversing that sort, we can get the top 10 categories according to increase in incidents from 2008 to 2013:

Category 2008 2013 Change Change pct KIDNAPPING 345 514 169 49 LARCENY/THEFT 25797 35661 9864 38 DRUNKENNESS 710 954 244 34 NON-CRIMINAL 12293 15343 3050 25 WARRANTS 5798 7059 1261 22 DRIVING UNDER THE INFLUENCE 408 429 21 5 VEHICLE THEFT 6053 6209 156 3 MISSING PERSON 4335 4458 123 3 BURGLARY 5679 5847 168 3 FRAUD 2556 2579 23 1

Video

Watch me write out the query, execute it, and paste the results into a Google Spreadsheet to find the change from 2008 to 2013 in incident reports. In this demo, I don't use Pivot Tables, but just do some extra cutting-and-pasting as needed:

Preview of JOINs

Note: This is just a technical aside, for those of you who wonder if there's a way to avoid spreadsheets altogether.

We can actually do this all in a SQL query with a JOIN clause and subqueries, which are two intermediate concepts we haven't yet learned. But here's a preview of the fun that will be:

SELECT ty.Category, COUNT ( 1 ) AS count_2008, ty.count_2013, (ty.count_2013 - COUNT ( 1 )) AS ` Change ` , /* the difference in count */ ((ty.count_2013 - COUNT ( 1 )) / COUNT ( 1 )) AS ` Change pct ` /* the difference expressed as a ratio of year 2008 incidents*/ FROM sfpd_incidents AS tx INNER JOIN /* create a new table of just 2013 incidents, name it `ty` */ ( SELECT Category, COUNT ( 1 ) AS count_2013 FROM sfpd_incidents WHERE YEAR ( Date ) = 2013 GROUP BY Category ) AS ty ON tx.Category = ty.Category WHERE YEAR (tx.Date) = 2008 GROUP BY ty.Category ORDER BY ` Change pct `

Which results in:

Category count_2008 count_2013 Change Change pct STOLEN PROPERTY 517 12 -505 -0.9768 BRIBERY 49 4 -45 -0.9184 LOITERING 441 52 -389 -0.8821 PROSTITUTION 1663 269 -1394 -0.8382 FORGERY/COUNTERFEITING 2460 710 -1750 -0.7114 BAD CHECKS 78 26 -52 -0.6667 LIQUOR LAWS 499 227 -272 -0.5451 FAMILY OFFENSES 72 35 -37 -0.5139 EXTORTION 52 30 -22 -0.4231 DISORDERLY CONDUCT 789 460 -329 -0.4170 etc. etc.

Pondering the numbers

Looking at the change in incident reporting over 2008 to 2013 brings up a few potential angles of investigation.

An increase in larceny

Among all the categories, "LARCENY/THEFT" has the biggest jump in numbers of reports: 25,797 to 35,661, in 2008 and 2013, respectively. Let's run a query on the database to see that jump compared to the years pre-2008:

SELECT YEAR ( Date ) as yr, COUNT (*) as count FROM sfpd_incidents WHERE Category = ' LARCENY/THEFT ' GROUP BY yr ORDER BY yr

yr count 2003 26853 2004 24822 2005 25623 2006 27349 2007 25767 2008 25797 2009 25545 2010 23888 2011 25161 2012 30119 2013 35661

Graphing that table results in:

My first guess would have been to attribute the drastic rise in thefts to the widespread adoption of smartphones. New York Mayor Michael Bloomberg blamed an apparent spike in 2012 crime to Apple lovers:

Major crime in New York City inched up this year, and Mayor Michael R. Bloomberg on Friday fingered the culprit: too many iPhones and iPads were being swiped… “If you just took away the jump in Apple, we’d be down for the year,” said Marc La Vorgna, the mayor’s press secretary. On the radio, Mr. Bloomberg said that Apple products appeared to be the preference for many thieves, noting that he was not including thefts of competing devices, like the Samsung Galaxy, in his count.

From 2011 to 2012, theft reports jumped from 25,161 to 30,119, so San Francisco may have experienced the same trend, though a closer look at the subcategories of theft (i.e. theft from auto, pickpocketing, grand theft, etc.) is necessary.

A decrease in narcotics

Let's turn our attention to categories of reports that have decreased and give credit where credit is due. According to our query, DRUG/NARCOTIC reports have dropped by 4,780 (42%).

Of course, a lot has happened since 2008, including a relaxation of laws and attitudes towards cannabis, in California and across the nation. Is this the main cause of the drop in DRUG/NARCOTIC reports from 2008 to 2013 in San Francisco?

The easiest way to confirm this is to group the data reports by subcategory, or the Descript column:

SELECT Descript, YEAR ( Date ) as yr, COUNT (*) as count FROM sfpd_incidents WHERE Category = ' DRUG/NARCOTIC ' GROUP BY yr, Descript HAVING yr = 2008 OR yr = 2013 ORDER BY yr, Descript

Using the previous strategy of pasting the results into a pivot table and creating a spreadsheet of the calculated changes, here are the top subcategories of narcotics reports, sorted by change in count:

Descript 2008 2013 Change POSSESSION OF BASE/ROCK COCAINE 1617 393 -1224 SALE OF BASE/ROCK COCAINE 1270 245 -1025 POSSESSION OF NARCOTICS PARAPHERNALIA 2359 1353 -1006 POSSESSION OF BASE/ROCK COCAINE FOR SALE 914 284 -630 POSSESSION OF MARIJUANA 1108 764 -344 SALE OF MARIJUANA 391 133 -258 POSSESSION OF MARIJUANA FOR SALES 571 350 -221 SALE OF CONTROLLED SUBSTANCE 233 78 -155 SALE OF HEROIN 174 43 -131 POSSESSION OF CONTROLLED SUBSTANCE 404 296 -108 POSSESSION OF HEROIN FOR SALES 199 120 -79 POSSESSION OF CONTROLLED SUBSTANCE FOR SALE 251 173 -78 POSSESSION OF COCAINE 267 214 -53 SALE OF COCAINE 70 20 -50

Marijuana-related reports do drop, but not as much as cocaine-related reports.

As always, our strategy should involve flipping our perspective: if cocaine-related reports have dropped, has any other kind of report increased?

Here is the table sorted in reverse-order of change:

Descript 2008 2013 Change POSSESSION OF METH-AMPHETAMINE 383 956 573 POSSESSION OF METH-AMPHETAMINE FOR SALE 129 266 137 TRANSPORTATION OF METH-AMPHETAMINE 24 49 25 POSSESSION OF OPIATES 19 40 21 UNDER INFLUENCE OF DRUGS IN A PUBLIC PLACE 40 53 13

Meth-related reports have increased by nearly 140%, though the change in terms of actual number of reports (~700) is significantly smaller than the change in cocaine-related reports (~3,000).

Investigating a drop in prostitution reports

One more category of reports is worth looking at: while DRUG/NARCOTICS reports dropped by 42 percent from 2008 to 2013, PROSTITUTION reports dropped by 84 percent. Like drug-related offenses, prostitution is a category of crime in which attitudes have seemingly shifted. In 2008, more than 40 percent of San Francisco residents voted in favor of decriminalizing prostitution. Though the measure ultimately failed (59 percent opposed), the political debate was heated, thanks to the measure's surprising popularity:

SAN FRANCISCO — When Proposition K was added to Tuesday’s ballot, many people likely snickered at the possibility that San Francisco might take its place alongside such prostitute-friendly havens as Amsterdam and a few rural counties in nearby Nevada. But this week, it became readily apparent that city officials are not laughing anymore about the measure, which would effectively decriminalize the world’s oldest profession in San Francisco. At a news conference on Wednesday, Mayor Gavin Newsom and other opponents seemed genuinely worried that Proposition K might pass. “This is not cute. This is not fanciful,” Mr. Newsom said, standing in front of the pink-on-pink facade of a closed massage parlor in the Tenderloin district. “This is a big mistake.”

Since 2008, laws on prostitution haven't changed as significantly as they have for marijuana usage. But technology has most certainly affected all parts of sex and society, both legal and illegal activity, and maybe this has impacted the prostitution business? Or perhaps this drop in reports is simply a result of a coding policy. Let's query the dataset for possible explanations.

For reference's sake, I've posted a spreadsheet of the more than 12,000 incident reports categorized as "PROSTITUTION" from 2003 to 2013.

Visualizing location of incidents

Even in cities where prostitution is illegal, the term "red light district" is used to refer where a concentration of such activity may be found. So instead of looking at bar graphs, let's use the location data in the SFPD incident reports to get an assessment of how the geography of activity has changed, if at all.

The SQL query here is simple, because we're only concerned with getting data in a form usable by a mapping application.

SELECT YEAR ( Date ) as yr, Descript, Date , Time , Y AS latitude, X AS longitude FROM sfpd_incidents WHERE Category = " PROSTITUTION " HAVING yr = 2008 OR yr = 2013

Using TileMill, we can generate a quick map. Here, the incidents in 2008 are orange, the incidents in 2013 are blue:

Here are the two years side-by-side:

Ignoring my crude mapping skills, we can at least tell that the number of reports may have decreased – the visual confusion is partly because records will be geocoded to a specific corner, so many of them will sit atop the exact same coordinate – but the general location of prostitution reports hasn't changed significantly.

Per subcategory?

As in the drug reports, it's worth taking a look at the Descript field:

SELECT Descript, Year ( Date ) as yr, COUNT ( 1 ) as c FROM sfpd_incidents WHERE Category = ' PROSTITUTION ' GROUP BY yr, Descript HAVING yr = 2008 OR yr = 2013 ORDER BY yr, Descript

Descript 2008 2013 Change Change pct INMATE/KEEPER OF HOUSE OF PROSTITUTION 8 0 -8 -100 SOLICITS TO VISIT HOUSE OF PROSTITUTION 987 1 -986 -99.9 SOLICITS LEWD ACT 12 1 -11 -91.7 ENGAGING IN LEWD CONDUCT - PROSTITUTION RELATED 21 5 -16 -76.2 SOLICITS FOR ACT OF PROSTITUTION 602 171 -431 -71.6 PLACING WIFE IN HOUSE OF PROSTITUTION 3 1 -2 -66.7 PIMPING 22 37 15 68.2 PANDERING 7 22 15 214.3 INDECENT EXPOSURE - PROSTITUTION RELATED 1 31 30 3000

The majority of PROSTITUTION reports fall under SOLICITS FOR ACT OF PROSTITUTION and SOLICITS TO VISIT HOUSE OF PROSTITUTION . Both have fallen drastically in a five-year period, though the latter subcategory has all been eliminated: going from 987 reports in 2008 to just one report in 2013.

Let's make a chart

Let's visually quantify the drop in the main two subcategories of prostitution reports, across all the years from 2003 to 2013:

SELECT Descript, YEAR ( Date ) as yr, COUNT ( 1 ) as x FROM sfpd_incidents WHERE Category = " PROSTITUTION " AND Descript = " SOLICITS FOR ACT OF PROSTITUTION " OR Descript = " SOLICITS TO VISIT HOUSE OF PROSTITUTION " GROUP BY Descript, yr ORDER BY Descript, yr

The following results table can be pasted into a spreadsheet and pivoted (with yr creating the rows and Descript creating the columns).

Descript yr x SOLICITS FOR ACT OF PROSTITUTION 2003 1111 SOLICITS FOR ACT OF PROSTITUTION 2004 825 SOLICITS FOR ACT OF PROSTITUTION 2005 732 SOLICITS FOR ACT OF PROSTITUTION 2006 645 SOLICITS FOR ACT OF PROSTITUTION 2007 899 SOLICITS FOR ACT OF PROSTITUTION 2008 602 SOLICITS FOR ACT OF PROSTITUTION 2009 521 SOLICITS FOR ACT OF PROSTITUTION 2010 476 SOLICITS FOR ACT OF PROSTITUTION 2011 385 SOLICITS FOR ACT OF PROSTITUTION 2012 224 SOLICITS FOR ACT OF PROSTITUTION 2013 171 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2003 843 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2004 627 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2005 291 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2006 563 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2007 876 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2008 987 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2009 855 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2010 117 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2011 7 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2012 2 SOLICITS TO VISIT HOUSE OF PROSTITUTION 2013 1

The resulting area chart:

Researching the data

Never take data at face value. This interesting drop in prostitution reports could be a result of a data glitch or ignorance on my part.

Look for previously-reported facts: On April 19, 2011, the San Francisco Examiner published an article titled, San Francisco police crack down on Polk Street prostitution. One of the numerical details:

More cops are on the streets making arrests, Mannix said. During a 24-hour period last week, at least four prostitutes were arrested, including a 16-year-old runaway.

Translating the above statement into a query:

SELECT IncidntNum, Descript, Date , Time , PdDistrict, Resolution, Location FROM sfpd_incidents WHERE Category = ' PROSTITUTION ' AND Date BETWEEN ' 2011-04-11 ' AND ' 2011-04-19 ' AND Location LIKE ' %POLK% '

IncidntNum Descript Date Time PdDistrict Resolution Location 110296727 PANDERING 2011-04-12 01:38 NORTHERN ARREST, BOOKED PINE ST / POLK ST 110296727 PIMPING 2011-04-12 01:38 NORTHERN ARREST, BOOKED PINE ST / POLK ST 110302437 SOLICITS FOR ACT OF PROSTITUTION 2011-04-13 23:00 NORTHERN ARREST, CITED CALIFORNIA ST / POLK ST 110302540 SOLICITS FOR ACT OF PROSTITUTION 2011-04-14 00:05 NORTHERN ARREST, CITED POLK ST / CALIFORNIA ST 110302653 SOLICITS FOR ACT OF PROSTITUTION 2011-04-14 01:05 NORTHERN ARREST, CITED POLK ST / CALIFORNIA ST 110308758 SOLICITS FOR ACT OF PROSTITUTION 2011-04-16 00:35 NORTHERN ARREST, CITED POLK ST / CALIFORNIA ST 110308952 SOLICITS FOR ACT OF PROSTITUTION 2011-04-16 02:15 NORTHERN ARREST, CITED POLK ST / CALIFORNIA ST 110309160 SOLICITS FOR ACT OF PROSTITUTION 2011-04-16 03:40 NORTHERN ARREST, CITED POLK ST / CALIFORNIA ST 110308827 SOLICITS FOR ACT OF PROSTITUTION 2011-04-16 01:29 NORTHERN ARREST, CITED POLK ST / CALIFORNIA ST

There were four prostitution-related arrests on April 16, but frankly, that could just be a coincidence. Capt. Mannix doesn't specify if those four arrests all happened on Polk Street, for example.

Another story of a sweep, published Jan. 20, 2010: Prostitution sting nets 51 so far

A crackdown on street prostitution in response to complaints from Polk Street and Russian Hill residents has netted 51 arrests, but officials say the crackdown can’t last forever…Of the 51 people arrested so far in the current operation, 36 were women accused of prostitution offenses, according to Lt. Tom Cleary of the vice crimes unit. Fifteen were men paying for sex. Cleary emphasized that a majority of those arrested came from outside The City.

Using this aggregate SQL:

SELECT SUBSTRING ( Date , 1 , 7 ) as yearmth, COUNT ( 1 ) AS count FROM sfpd_incidents WHERE Category = ' PROSTITUTION ' AND pddistrict = ' NORTHERN ' AND YEAR ( Date ) BETWEEN 2008 AND 2013 GROUP BY yearmth order by count DESC

– we observe that the month of January 2010 seems to be a high-water mark in prostitution reports from the Northern district (which covers Polk St. and Russian Hill):

yearmth count 2010-01 57 2008-07 47 2008-02 46 2008-06 40 2008-10 38 2008-01 36

Sociological reasons?

Should an 84% drop in prosecution reports from 2008 to 2013 be a surprise, given voter support for decriminalizing prostitution? Whatever the ambivalence of the average resident, I didn't find any direct reports of the SFPD deciding to go lenient on prostitution.

On June 16, 2010, the SF Weekly took an in-depth look at the SFPD's arguably aggressive anti-prostitution stings:

In San Francisco, where 41 percent of voters approved a measure that would have essentially decriminalized prostitution and where a sex worker is currently running for the Board of Supervisors, men are regularly arrested just for talking to undercover cops posing as streetwalkers. If a man avoids eye contact and walks on by, he's fine. But if he shows any interest, the decoy will approach him. By the time he knows she is selling sex, he's headed for trouble. After that, any expression of interest — even a sarcastic "Oops, don't have enough cash, guess I'll come back later" — is enough to land him in handcuffs… A decade ago, almost all the men arrested in the program's stings had shown more than interest; they had agreed to sex acts for money, according to statistics from the District Attorney's office. But in recent years, as would-be johns have gotten savvy to police tactics, roughly a third of those arrested in the stings — between 100 and 200 a year — were cited simply for demonstrating, in the eyes of the police, that they had "an intent to commit prostitution." According to the California penal code, there's no "single circumstance or combination of circumstances" that proves this intent; it has to be determined case by case.

The article cites a figure from a 2009 audit of the city's budget: previously, the sting program was funded by the fees paid by johns. However, "as prostitution has shifted to the Internet, it has become more difficult for police officers to make arrests in street-level stings. Since 2005, the number of johns arrested has dropped steadily."

In 2011, the San Francisco Public Press reported on the controversial use of anti-trafficking resources to do street-level sweeps.

This past summer, San Francisco prostitution made the news again when business owners complained about activity in Union Square.

This article quotes police as being more interested the stemming the bigger issues surrounding prostitution, rather than focusing on street-level busts:

Police said they plan to clean up Union Square, but also want to focus on human trafficking and helping women who have been forced into prostitution.

So what's the explanation for the 84% drop in SFPD prostitution reports? I sent a request to SFPD's public information officer but haven't heard back. I'm still open to the possibility of a mis-query on my part, or a coding change in the SFPD CABLE system that classifies the reports differently. Either way, I hope this has given you some ideas on how to query data along a variety of dimensions and time frames to find leads for what may be interesting stories.

Now all you have to do is grind in the syntax. Good luck!