
Since its founding, Kaggle has held many interesting data science and machine learning competitions. But its most interesting project might have just begun. It’s called the “Abstraction and Reasoning Challenge” and it addresses one of the most fundamental problems of current artificial intelligence techniques.

Current AI systems have proven to be very efficient at solving specific problems but poor at general problem-solving. This is why measuring the real “intelligence” of AI systems is very tricky. Current evaluation methods mostly consist of testing the accuracy of an AI model against carefully curated datasets and very narrow benchmarks.

To address the problem of evaluating AI systems, François Chollet, AI researcher and the creator of the Keras deep learning library, published a very important paper in November 2019 titled “On the Measure of Intelligence.” Chollet’s work discusses the problems with current approaches to testing natural and artificial intelligence systems.

Chollet also presented the Abstraction and Reasoning Corpus (ARC), a set of problems that can test the general problem-solving capabilities of humans and AI systems.

Last week, Chollet launched the ARC challenge on Kaggle with a three-month timeline and a $20,000 prize. Although it’s very unlikely that anyone will be able to solve the challenge in time, it will be an interesting test of how far we’ve come toward solving artificial general intelligence.

Measuring intelligence

In the early years of artificial intelligence research, scientists thought that creating thinking machines was just around the corner. But as the challenges of replicating the simplest functions of the human mind became obvious, the field split into narrow AI, algorithms that can tackle very specific problems, and artificial general intelligence, the original vision of creating general problem-solving AI.

Narrow AI is well defined and there are plenty of ways to measure it. For instance, ImageNet is a good benchmark for testing computer vision systems that classify images into a predefined set of categories. Machine learning engineers try to optimize their algorithms (and sometimes they cheat) to score higher on ImageNet and other similar datasets.

But as Chollet argues in his paper, such benchmarks don’t test the intelligence of the AI algorithm. Instead, they reflect the intelligence of the developers who built a system to solve that specific problem. Case in point: A deep learning algorithm that scores better than the smartest human on ImageNet or defeats the StarCraft 2 world champion can’t perform most tasks that the least intelligent human can perform without a second thought.

Current deep learning systems also suffer from distinct problems, including their reliance on large amounts of data and their brittleness. Many of these AI systems break when they face the messiness and uncertainty of the real world. There are efforts to develop new datasets that better represent that messiness.

While these new efforts can help create more robust algorithms, they will not help develop AI systems that are truly intelligent.

The ultimate solution would be to develop machines that, instead of solving a preprogrammed problem, explore a problem and create a program that can solve it. For instance, an AI that can play Quake 3 should develop the high-level skills that will allow it to start playing any other first-person shooter decently instead of starting from scratch. This is the kind of generalization power that has eluded AI scientists for decades.

But such general intelligence will also need its own specialized benchmark and test dataset, which brings us to ARC.

The ARC challenge

The Abstraction and Reasoning Corpus is a set of tasks that require general problem-solving skills. There are a few things about ARC that make it especially interesting. First, there aren’t a ton of training examples. A system that wants to solve a task has to learn its rules from a few examples, like the following.

Any human looking at the previous examples will know that the first problem set involves cohesion and the second is a denoising task. But current AI techniques can’t perform such reasoning and abstraction with so few examples. ARC is filled with such examples and is designed in a way that prevents developers from optimizing for the evaluation set.
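
To make “learning the rules from a few examples” concrete: in the public ARC repository, each task is a JSON file with a `train` list and a `test` list of input/output pairs, and each grid is a list of rows of integers from 0 to 9 that stand for colors. The sketch below uses a made-up toy task in that layout (not one from the dataset) and a hypothetical hand-written rule, `remove_isolated_pixels`, that treats any colored cell with no same-colored neighbor as noise. It checks the rule against every training pair before applying it to the test input; real ARC denoising tasks are usually harder than this.

```python
import json

# A toy task in ARC's JSON layout: "train" and "test" lists of pairs, where a
# grid is a list of rows of integers 0-9 (0 is the background color here).
# These grids are invented for illustration, not taken from the dataset.
TOY_TASK_JSON = """
{
  "train": [
    {"input":  [[0, 0, 0, 5], [0, 2, 2, 0], [0, 2, 2, 0], [3, 0, 0, 0]],
     "output": [[0, 0, 0, 0], [0, 2, 2, 0], [0, 2, 2, 0], [0, 0, 0, 0]]},
    {"input":  [[4, 0, 0], [0, 0, 0], [0, 1, 1]],
     "output": [[0, 0, 0], [0, 0, 0], [0, 1, 1]]}
  ],
  "test": [
    {"input": [[0, 7, 0], [0, 0, 0], [6, 6, 0]]}
  ]
}
"""


def remove_isolated_pixels(grid):
    """Hypothetical denoising rule: blank out every colored cell that has no
    same-colored neighbor above, below, left, or right."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input grid is not mutated
    for r in range(rows):
        for c in range(cols):
            color = grid[r][c]
            if color == 0:
                continue  # background cells are left alone
            neighbors = [
                grid[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if color not in neighbors:
                out[r][c] = 0
    return out


def fits_all_examples(rule, task):
    """True if the rule maps every training input to its training output."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


task = json.loads(TOY_TASK_JSON)
if fits_all_examples(remove_isolated_pixels, task):
    print(remove_isolated_pixels(task["test"][0]["input"]))
```

Running it prints `[[0, 0, 0], [0, 0, 0], [6, 6, 0]]`. The hard part of ARC is the step this sketch skips: a solver has to come up with a rule like `remove_isolated_pixels` by itself, for tasks it has never seen.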

Another important aspect of ARC is that it levels the ground between AI and humans. The current artificial intelligence landscape is composed of many challenging fields such as computer vision and natural language processing.

Comparing AI performance to humans in those fields is very difficult because we humans have a lot of prior knowledge about the world and can easily take on new challenges. There still isn’t an AI system that incorporates that kind of knowledge. Therefore, any challenge built around image classification and natural language would put AI algorithms at a disadvantage.

But ARC is based on simple visual elements that are easy to parse and require no prior knowledge about the world. This strips away the advantage that humans have and makes it fairer for AI systems to compete. Humans can easily solve most of the problems proposed in ARC not because of their vast knowledge of the world but thanks to their abstraction and reasoning capabilities. (But even the brightest people admit that some ARC problems are cognitively challenging.)

I am irrationally excited about this. @fchollet has asked a set of fundamentally interesting questions with ARC and I'll be playing for fun! It's like Sudoku for AI addicts 😉

(n.b. I personally got many examples wrong – huzzah, a dataset where you question your own cognition 😅) https://t.co/Nz9sWbZbKa — Smerity (@Smerity) February 14, 2020

As for current AI systems, they perform very poorly on ARC problems. “To the best of our knowledge, ARC does not appear to be approachable by any existing machine learning technique (including Deep Learning), due to its focus on broad generalization and few-shot learning,” Chollet notes in his paper.

Solving ARC will require “program synthesis,” the subfield of AI that involves generating programs that satisfy high-level specifications.
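
Program synthesis covers many techniques. One of the simplest, shown here purely as an illustration, is enumerative search: compose primitives from a small domain-specific language (DSL) until some program reproduces every training pair. The three primitives, the `max_depth=3` limit, and the “rotate clockwise” examples below are assumptions made for the sketch, not Chollet’s DSL or any competitor’s approach.

```python
from itertools import product

# A deliberately tiny, hypothetical DSL of grid transformations.
# Grids are lists of rows of integers, as in ARC's JSON files.
def flip_horizontal(grid):
    return [row[::-1] for row in grid]  # mirror left-right


def flip_vertical(grid):
    return grid[::-1]  # mirror top-bottom


def transpose(grid):
    return [list(col) for col in zip(*grid)]  # swap rows and columns


PRIMITIVES = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def run(program, grid):
    """Apply a program (a sequence of primitive names) from left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def synthesize(train_pairs, max_depth=3):
    """Enumerate programs from shortest to longest and return the first one
    that maps every training input to its output, or None if none fits."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in train_pairs):
                return program
    return None


# Two made-up examples of a "rotate 90 degrees clockwise" task.
TRAIN_PAIRS = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[1, 2, 3], [4, 5, 6]], [[4, 1], [5, 2], [6, 3]]),
]

program = synthesize(TRAIN_PAIRS)
print(program)                         # ('flip_vertical', 'transpose')
print(run(program, [[7, 8], [9, 0]]))  # [[9, 7], [0, 8]]
```

The search discovers that flipping top-to-bottom and then transposing is equivalent to a clockwise rotation. With k primitives there are k^d programs of depth d, so this kind of brute force grows exponentially and only hints at what a serious ARC solver would need.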

How long before it’s solved?

Kaggle has set May 27 as the final submission deadline for the ARC challenge. The timing somehow reminds me of the “2-month, 10-man study” proposed in 1955 that was supposed to solve the AI problem. Given the limits of today’s AI technology, I doubt that anyone will be able to solve the challenge by the end of May.

Neither does Chollet have any illusions about ARC-ready AI being right around the corner.

Regarding the ARC challenge feasibility. It is meant to be difficult, and I don't expect it will be fully solved in 10 years. But I know for a fact that it is partially approachable today using program synthesis. I will be very disappointed if no one scores better than 1 🙂 pic.twitter.com/QxbwajReVg — François Chollet (@fchollet) February 14, 2020
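
Chollet’s “better than 1” refers to the competition’s error score, where lower is better. A minimal sketch, assuming the 2020 Kaggle metric is the top-3 error rate described on the competition page: each test output may receive up to three predicted grids, it counts as an error of 0 if any of them matches the ground truth exactly and 1 otherwise, and the score is the average over all outputs. Under that assumption, a score of 1 means no task was solved. The function name `top3_error` and the grids are hypothetical.

```python
def top3_error(predictions, ground_truth):
    """Top-3 error rate (assumed metric, see lead-in).

    predictions: one list of candidate grids per test output; only the
                 first three candidates are considered.
    ground_truth: the correct grid for each test output, in the same order.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("need one list of candidates per test output")
    errors = [
        0 if any(guess == truth for guess in candidates[:3]) else 1
        for candidates, truth in zip(predictions, ground_truth)
    ]
    return sum(errors) / len(errors)


# Four made-up 1x1 test outputs and the guesses submitted for each.
truth = [[[1]], [[2]], [[3]], [[4]]]
preds = [
    [[[0]], [[1]]],                # second guess is right -> 0
    [[[0]]],                       # only guess is wrong -> 1
    [[[0]], [[0]], [[0]], [[3]]],  # right answer is the 4th guess, ignored -> 1
    [],                            # no guess at all -> 1
]
print(top3_error(preds, truth))  # 0.75
```

Here one of four outputs is solved, so the score is 0.75, already “better than 1” in Chollet’s sense.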

Ben Hamner, Kaggle’s CTO, called the ARC challenge “the toughest Kaggle competition in a long time.”

We just launched the toughest @kaggle competition in a long time with @fchollet. Can software learn to generalize complex, abstract tasks from a tiny number of examples? Easy to get started on, and a good result would mean a substantial leap forward in AI https://t.co/2r9zJSb7fh — Ben Hamner (@benhamner) February 13, 2020

Chollet provides some high-level guidance on where to start, including finding general ways to solve the problems presented in the ARC dataset. But he also notes that there is currently no AI textbook or tutorial that will guide you on creating the general intelligence required to solve the ARC challenge.

If you're not sure where to start on the ARC competition, here's my advice. It's a very difficult challenge, but I strongly believe (with some evidence) that someone smart & motivated can develop in a few weeks an approach that solves ~5-10% of the tasks in the hidden test set. pic.twitter.com/1hvZrXugnt — François Chollet (@fchollet) February 13, 2020

Despite the enormity (and near-impossibility) of the task, at the time of this writing more than 190 teams have signed up for the ARC challenge to test their skills. It will be interesting to see how the competition develops. And perhaps more importantly, it will be exciting to see what new discoveries we make in the interim.