Google has become very interested in artificial intelligence in recent years, particularly in its applications for everyday users. For example, it is running a range of public experiments involving machine learning.

Now, however, researchers at the Texas Advanced Computing Center have shown how artificial intelligence techniques can also deliver better search engine results. They've combined AI, crowdsourcing and supercomputers to develop a better system for information extraction and classification.

At the 2017 Annual Meeting of the Association for Computational Linguistics in Vancouver this week, associate professor Matthew Lease led a team presenting two papers that described a new kind of information retrieval system.

Intelligent systems

"An important challenge in natural language processing is accurately finding important information contained in free-text, which lets us extract it into databases and combine it with other data in order to make more intelligent decisions and new discoveries," Lease said.

"We've been using crowdsourcing to annotate medical and news articles at scale so that our intelligent systems will be able to more accurately find the key information contained in each article."

They used that crowdsourced data to train a neural network to identify named entities, and to extract useful information from texts that have not been annotated at all.
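To give a rough sense of how crowdsourced annotations might be turned into training labels, here is a minimal sketch that combines several workers' label sequences by simple majority vote. This is an illustrative assumption, not the method from the papers: the function name `majority_vote`, the label names and the example sentence are all hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Combine per-token labels from several workers by majority vote.

    `annotations` is a list of label sequences, one per worker, all
    covering the same tokens. Hypothetical helper for illustration only.
    """
    if not annotations:
        return []
    length = len(annotations[0])
    if any(len(labels) != length for labels in annotations):
        raise ValueError("every worker must label the same number of tokens")
    # zip(*annotations) groups the labels that each worker gave to one token.
    return [Counter(column).most_common(1)[0][0] for column in zip(*annotations)]


# Three hypothetical workers label the tokens "Aspirin reduces fever".
workers = [
    ["DRUG", "O", "O"],
    ["DRUG", "O", "SYMPTOM"],
    ["O", "O", "SYMPTOM"],
]
consensus = majority_vote(workers)
print(consensus)
```

The consensus labels could then serve as training targets for a sequence model. A real system would likely also need to account for differences in worker reliability, which plain voting ignores.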

In the second paper, they showed how to weight different linguistic resources so that the automatic text classification is better. "Neural network models have tons of parameters and need lots of data to fit them," said Lease.

Consistently better results

In testing on both biomedical searches and movie reviews, the system delivered consistently better results than methods that didn't involve weighting the data.

"We had this idea that if you could somehow reason about some words being related to other words a priori, then instead of having to have a parameter for each one of those words separately, you could tie together the parameters across multiple words and in that way need less data to learn the model," said Lease.
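The idea of tying parameters across related words can be illustrated with a toy example. The sketch below is not the team's neural model. It assumes a hand-written word grouping (`WORD_GROUPS`, hypothetical) and a simple perceptron classifier, and shows how sharing one weight per group lets the model handle a word it never saw in training.

```python
# Hypothetical grouping of related words; a real system would derive
# this from a linguistic resource rather than a hand-written table.
WORD_GROUPS = {
    "good": "positive", "great": "positive", "excellent": "positive",
    "bad": "negative", "awful": "negative", "terrible": "negative",
}


def count_parameters(vocab, groups):
    """Number of distinct weights: one per group, or per ungrouped word."""
    return len({groups.get(word, word) for word in vocab})


def score(tokens, weights, groups):
    """Sum the weights of the tokens, looking each one up via its group."""
    return sum(weights.get(groups.get(t, t), 0.0) for t in tokens)


def train_perceptron(examples, groups, epochs=5):
    """Perceptron over bag-of-words features with tied weights.

    `examples` is a list of (tokens, label) pairs with label +1 or -1.
    """
    weights = {}
    for _ in range(epochs):
        for tokens, label in examples:
            predicted = 1 if score(tokens, weights, groups) > 0 else -1
            if predicted != label:
                for t in tokens:
                    key = groups.get(t, t)  # related words share one weight
                    weights[key] = weights.get(key, 0.0) + label
    return weights


train = [(["good", "movie"], 1), (["bad", "movie"], -1)]
vocab = list(WORD_GROUPS) + ["movie"]

tied = train_perceptron(train, WORD_GROUPS)
untied = train_perceptron(train, {})  # empty mapping: one weight per word

print(count_parameters(vocab, {}), count_parameters(vocab, WORD_GROUPS))
print(score(["excellent", "movie"], untied, {}))
print(score(["excellent", "movie"], tied, WORD_GROUPS))
```

In this toy setup the tied model has fewer weights to learn, and because "excellent" shares a weight with "good", it scores the unseen word as positive while the untied model has no information about it.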

He added: "Industry is great at looking at near-term things, but they don't have the same freedom as academic researchers to pursue research ideas that are higher risk but could be more transformative in the long-term."