If you thought pre-crime technology was the stuff of Tom Cruise fantasy movies, you might just be eating your words soon. According to The Register, researchers from the University of Cardiff have been awarded more than $800,000 by the US Department of Justice to develop a pre-crime detection system that uses social media. Unlike in Steven Spielberg’s Tom Cruise-starring Minority Report film, there are no precognitive or psychic beings in sight. Instead, the system draws on big data from social media to identify potential crimes before they happen.

The project builds on existing work conducted by Professor Matthew Williams, Director of the Social Data Science Lab at the Data Innovation Research Institute and Professor of Criminology at Cardiff University, and Dr Peter Burnap, senior lecturer in Computer Science & Informatics. That work looks for 'signatures' of crime and disorder in open source communications, not crimes themselves.

“In our statistical models based on London Met Police crime records and Twitter data, we found social media mentions of the breakdown of the local social and environmental order – mentions of littering, antisocial behaviour, drinking disorderly, tipping – were highly correlated with certain types of crime, such as burglary,” says Williams. “These correlations were stronger than those found between crime records and conventional data, such as census data on educational attainment, unemployment etc. This preliminary evidence indicates social media data might be useful in providing near-real-time insights into crime patterns.” The next step, then, is to apply this experimental approach to hate crimes.

The new project will collect Twitter posts containing terms that have been labelled as hate speech by human annotators over a 12-month period. The team’s original hate speech detection algorithm was developed in the UK following the Rigby murder and the new project will use similar machine learning techniques to build new hate speech algorithms based on US data. The team will also gain access to 12 months’ LAPD recorded hate crime data. These two measures will then be entered into statistical models to identify if there is a correlation, that is, whether an increase in hate speech in a given area is also statistically linked to an increase in recorded hate crimes on the streets in the same area.
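The correlation check described above can be illustrated with a minimal sketch. It assumes the simplest possible measure, a Pearson correlation between monthly counts for a single area; the article does not say which statistical models the team will actually use. The function name `pearson` and all the numbers are hypothetical and made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up monthly counts for one hypothetical area (NOT real data).
hate_speech_posts = [12, 15, 9, 22, 30, 18]
recorded_hate_crimes = [2, 3, 1, 4, 6, 3]

r = pearson(hate_speech_posts, recorded_hate_crimes)
print(f"r = {r:.2f}")
```

A value of r close to 1 would mean the two series rise and fall together in that area. Even then, a strong correlation would not by itself show that one causes the other, and a real analysis would also need to test whether the relationship is statistically significant.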

“If the model shows such a relationship, then this social media data may be used in conjunction with conventional data sources to improve predictions of hate crimes offline. These new forms of data are also attractive as they can provide new information on changing risks in near real time, unlike conventional data that is often weeks or months out of date,” he says, adding that since the project is experimental, the absence of a significant correlation will not be a surprise.

Significantly, the particular award for the University of Cardiff was to fight hate crime, and this, says Peter Wang, CTO and Co-Founder of Continuum Analytics, is an important distinction. Wang was core to the DARPA-funded Memex project to fight human trafficking. Using Anaconda and Continuum Analytics open source software projects, DARPA is able to scale the Memex solution with the ever-increasing web, indexing and cross-referencing interactive and social media, text, images and video. These deep searches, combined with rich visualisations, identify patterns and connect the dots about typically elusive movements across locations. Reports by NYDA’s office credit Memex with over 20 active sex trafficking investigations and nine open indictments.

Wang says that it makes sense that social media is being used to identify hate crimes before they happen. “Taking a data-driven ‘predictive policing’ approach to fighting general crime is very difficult because crime itself is so varied, and the dimensions of each type of crime are so complex,” he says. “However, for hate crimes in particular, social media could be a particularly useful data stream, because it yields insight into a variable that is otherwise extremely difficult to assess: human sentiment.”

He explains that the general mechanism of the system would be to look for patterns and correlations between all the dimensions of social media: text in posts and tweets, captions on images, the images themselves, even which people, topics, organisations someone subscribes to. “Metadata on these would also feed into the data modelling: the timestamps and locations of their posts and social media activity can be used to infer where they live, their income and level of education, and so on,” he says.

All of this data and metadata could then be used to create baseline models of "normal" human sentiment towards certain racial groups and topics, specifically tuned and segmented by geography and demographic factors. “Once such a baseline model is created, it could then be used to look for ‘outliers’ across multiple dimensions,” Wang adds. “Ideally, the data may reveal subtle differences in behaviour between someone who has committed a hate crime and someone who is merely racist but has no criminal inclination: perhaps the latter expresses negative sentiment and uses racist terminology only in response to certain news events, but the former shows a consistent pattern of egregious behaviour.”
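The baseline-and-outlier idea Wang describes could be sketched as follows. This is only an illustration under strong assumptions: it treats sentiment as a single number per post, fits the baseline as a mean and standard deviation for one segment, and flags anything more than a chosen number of standard deviations away. The names `fit_baseline` and `is_outlier`, the threshold and the scores are all hypothetical; a real system would work across many dimensions at once.

```python
from statistics import mean, stdev


def fit_baseline(scores):
    """Summarise 'normal' sentiment for one segment as (mean, std dev)."""
    return mean(scores), stdev(scores)


def is_outlier(value, baseline, threshold=3.0):
    """Flag a score lying more than `threshold` std devs from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all counts as unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Made-up sentiment scores in [-1, 1] for one hypothetical segment (NOT real data).
segment_scores = [0.1, 0.0, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1]
baseline = fit_baseline(segment_scores)

print(is_outlier(0.05, baseline))  # a score well within the usual range
print(is_outlier(-0.9, baseline))  # a score far more negative than the baseline
```

Segmenting by geography and demographics, as the article describes, would amount to fitting a separate baseline per segment, so that a score is compared only with what is typical for its own group.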

Paul Briault, Director of Digital Security and API Management at CA Technologies, emphasises that we are all constantly connected with social media and other databases, providing continuous updates to our online footprint which is a possible route for pre-crime detection systems to use in the future. “By using online information and processing the data through analytics software, patterns of online behaviour can be identified and predicted in advance to essentially detect crimes before they happen,” he says, adding that the system could track online protests and highlight warning indicators that physical riots were being planned.

There will be challenges, though. Zulfikar Ramzan, CTO at RSA, believes that the key challenge is how to separate the wheat in tweets from the chaff. “There is a treasure trove of data out there that can be leveraged. Only some of it will be useful. Much of it will be unstructured, meaning there is no ironclad way that a computer can interpret it easily without doing some careful work.”

Ramzan adds that crime, for the most part, is not a common event and any automated approach for predicting crime will have to connect numerous dots. “The challenge is that almost anything can look like a dot and it’s easy to think you’re seeing something where it doesn’t exist. Automated approaches for predicting crime have to sidestep numerous landmines on their way to one correct prediction.”
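One way to see why rarity makes prediction so hard is a base-rate calculation. The sketch below uses Bayes' rule to work out what share of a detector's alerts would be correct. The function name `alert_precision` and the figures (a 1-in-10,000 base rate and a detector that is right 99% of the time in both directions) are hypothetical, chosen only to illustrate the point, and do not come from any system mentioned in this article.

```python
def alert_precision(base_rate, sensitivity, specificity):
    """Share of alerts that are true positives, via Bayes' rule.

    base_rate:   fraction of cases that are genuinely positive
    sensitivity: P(alert | genuinely positive)
    specificity: P(no alert | genuinely negative)
    """
    true_alerts = base_rate * sensitivity
    false_alerts = (1 - base_rate) * (1 - specificity)
    if true_alerts + false_alerts == 0:
        raise ValueError("the detector never raises an alert")
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical figures: 1 genuine case per 10,000, detector 99% accurate both ways.
precision = alert_precision(base_rate=0.0001, sensitivity=0.99, specificity=0.99)
print(f"{precision:.1%} of alerts would be genuine")
```

Under these assumed figures, only about 1% of alerts point at a genuine case. The other 99% are dots that look meaningful but are not.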

Although most believe that access to the kind of data available through social media could be a useful preventative tool across a range of criminal activities, there are legitimate concerns about the privacy and freedom of expression issues that such technologies raise. Claire Stead, Online Safety Ambassador at Smoothwall, says that the technology for a pre-crime detection system already exists and is being used in regulated settings such as schools, where it can detect issues such as radicalisation, online grooming and abuse. The key, she says, will be who is monitoring it, as it will effectively be monitoring the general public.

“This is where the issue of freedom of speech comes into play, as who should determine what is a crime and what is harmless? People have the right to share what they wish on social media, however, as soon as we start considering law breaking, that is when there should be the justification for censorship and intervention and so it therefore needs the right body to regulate and monitor it,” she says.

Dr Tod Burke, Professor of Criminal Justice at Radford University, VA and a former Maryland police officer, says that one concern is the right to privacy. “This is ‘Big Brother’ in action. It is also not too far off from the Minority Report. Will we soon arrest people for crimes they have not yet committed in the expectation that computer statistical probability indicates that they will engage in future criminal activity?”

Burke also highlights the concern that predictive policing will encourage discrimination, in that certain members of society may become targets of police investigations based upon predictive models revealing potential criminal acts or patterns. “Law enforcement’s response thus far is that what people expose to public view, such as social media posts, are fair game. If I could coin a phrase here: Predictive Plain View. Police will need to make sure that they adhere to constitutional safeguards and closely follow court rulings and departmental policy on data mining activity. I guess the real issue is: just because the police can use predictive policing, should they?”

However, Darrin Lipscomb, CTO of Public Safety and Smart Cities at Hitachi Insight Group, which has created its own commercially available crime prediction algorithm that leverages social media, believes that the use of technology will actually limit discrimination. Hitachi Insight Group’s system uses natural language processing to model Twitter topics by intensity. The most intense topics, measured by density and frequency, are fed into its spatial-temporal model and used to augment the prediction.

“The key concerns over past law enforcement tactics usually revolve around profiling e.g., for stop and frisk. By leveraging a statistical model such as ours, we can effectively remove human bias,” Lipscomb says. He adds that by leveraging IoT sensors such as gunshot location, stolen license plates and other demographic data, there is no shortage of big data available as input to the model.

Neil Giles, Director of the Centre for Intelligence-Led Prevention at charity STOP THE TRAFFIK, says that there may be a concern that enforcement officials could become dependent on predictive data, and may potentially focus too deeply on communities that are thought to be hotspots at the expense of new and ‘uncovered’ areas.

However, he believes that this concern is somewhat based on the notion that predictive data will replace existing intelligence work that law enforcement officials carry out. “The reality is that predictive helps statutory bodies use their current resources more effectively and helps the most vulnerable communities become more sensitised to the threat they face,” he says.

The Centre for Intelligence-Led Prevention is a global centre that collects, analyses and shares data on human trafficking. It takes data from the charity’s specially developed STOP APP – which empowers individuals from anywhere in the world to share their stories on any potential human trafficking activity – and matches it with open source and partner data to identify any patterns, using state-of-the-art analytical tools. The intelligence uncovers trends and hotspots around the world. It can highlight specific business areas and communities that are vulnerable and, in some cases, identify specific human traffickers or victims. It is not predictive in a Minority Report sense, but useful for identifying potential sites of trafficking.

Dr. Stephen Coggeshall, Chief Analytics and Science Officer at ID Analytics, suggests that such predictive policing using data from various sources will definitely be a positive step and provide improved direction to help position officers in high risk areas. “The technology is absolutely viable and will help, but it only provides auxiliary input among many factors. We still need a human in the loop to synthesise all the incoming signals. The accuracy remains to be seen and will depend on the quality of the data (breadth, depth, accuracy, etc.) and the relationship to these signals and subsequent human behaviour,” he says.

“People are predictable to some extent. That’s the basis for billions of dollars being spent in marketing analytics and risk management, such as credit scores. People have some irrational behaviour that will never be predictable, but we can predict certain kinds of behaviours quite well (credit repayment) and others somewhat well (purchase behaviour),” he says, adding that more accurate predictions are possible about larger populations of people than individuals.

Currently, the use of data from social media to tackle crime seems to work better for specific types of crime, such as hate crime or human trafficking, than for crimes of all types. Wang says that while it is hard to tell whether such technologies could have broader applications at this time, it seems that most common sorts of physical crime are deeply correlated to socioeconomic, geographic, and demographic factors. He cautions that while these are things on which we have reasonably large amounts of data, many of those datasets are government data, stored in arcane locations and formats across a variety of different bureaucracies, and difficult to synthesise. “However, past evidence shows if you simply integrate the data that governments already possess, you can get some incredible insights. For instance, Jeff Chen's work with the Fire Department of New York shows that they can predict which areas have buildings that are more likely to catch on fire, and take preventative actions.”