Recent developments in machine learning have resulted in various narrow artificial intelligence (AI) applications, but progress towards artificial general intelligence (AGI) remains limited. By returning to a strong interdisciplinary approach, the Mindfire Foundation aims to understand the principles of human intelligence and apply them to the development of artificially intelligent organisms.

But how can we measure progress towards artificial human-level intelligence? This question, among many others, was brought up last May at Mindfire Mission-1. The two main authors of this post, an evolutionary biologist and a neuropsychology scientist, were inspired by methodologies established for testing natural cognition (in both humans and animals). We initiated a project in which we aim to define benchmarks of cognitive function to evaluate gradual progress towards an artificial intelligent organism/AGI. However, in conceptualising such an AI evaluation framework, we need to take into consideration what we are aiming to build and what we want to test. In this post we provide an overview of ideas, inspired by natural cognition, on how the development of an artificial intelligent organism could be approached, along with our suggestions for testing its progress empirically.

Definitions of intelligence vary across, and even within, disciplines. It is therefore challenging to agree upon its meaning. In this article, we broadly define intelligence as a form of higher cognition, which is the ability to flexibly adjust a large variety of goals and behaviours in a dynamic environment and to learn from experience. A large body of research suggests that human intelligence may differ from that of other animals in that it uses highly abstract concepts and language (symbolic reasoning). However, more and more animal cognition studies show that the differences in intelligence between humans and animals are mainly quantitative and not qualitative. Traditionally, it has been human-level intelligence, or human-level cognitive abilities, that AI research aims to achieve.

The first wave of AI research, which began in the 1950s and is also called “good old fashioned AI” (GOFAI), aimed to develop AI and AGI systems by focusing on symbolic representation and manipulation of knowledge. The systems were hand-programmed, custom-engineered algorithms and were therefore based purely on human intelligence. The main problem of GOFAI was that it lacked grounding in the real world through perception. This led to low flexibility and adaptability in real environments and to no considerable progress towards an autonomous human-level AI.

The second wave of AI, which emerged in the 1980s and includes the present hype, moved away from the main goal of developing AGI systems and focused on more practical and manageable specialised “smart” systems. Today’s AI systems are mainly based on statistical machine learning and show astonishing capabilities in their narrow, specialised application areas. The considerable progress in machine learning, and especially deep learning, has provided a wide range of applications in which machines can perform specific data-interpretation tasks, often with superhuman abilities. However, the research and funding focus on narrow AI systems has not considerably advanced progress towards AGI.

Traditionally, AI research has been closely connected with research fields studying natural intelligence, particularly human intelligence, like neuroscience and psychology. After all, human intelligence has been the basic inspiration for the creation of intelligent machines for centuries. However, there has since been a drift away from this. In a recent scientific article, DeepMind’s Demis Hassabis and co-authors discussed the loss of this essential exchange and collaboration between artificial and natural intelligence research. They argue for the critical and ongoing importance of natural intelligence research for the creation of AGI.

Natural and artificial intelligent systems share basic characteristics: both can be described as agents, which process sensory inputs from the environment and integrate the current/new information with their existing knowledge base and goals to decide upon the next action.
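The agent description above can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than part of any existing architecture: the `Agent` class, its method names and the three toy actions are hypothetical, and the decision rule is deliberately trivial.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent (hypothetical): perceive, integrate with knowledge, then act."""

    goal: str
    knowledge: dict = field(default_factory=dict)

    def integrate(self, observation: dict) -> None:
        # Merge current sensory information into the existing knowledge base.
        self.knowledge.update(observation)

    def decide(self) -> str:
        # Pick the next action from the knowledge base and the goal.
        if self.knowledge.get("threat"):
            return "flee"
        if self.knowledge.get(self.goal):
            return "approach"
        return "explore"

    def step(self, observation: dict) -> str:
        # One pass of the loop: sensory input -> integration -> action.
        self.integrate(observation)
        return self.decide()


agent = Agent(goal="food")
observations = [{}, {"food": True}, {"threat": True}]
actions = [agent.step(obs) for obs in observations]
print(actions)
```

Because knowledge persists between steps, the last action depends on both the remembered food and the newly perceived threat, which is the integration of new and existing information described above.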

Both the first and second waves of AI research produced progress in the field, and many different technologies were developed that address specific sub-components of intelligence like symbolic representation, perception, learning and reasoning.

However, all these problems were approached and solved individually, creating different technology modules which function well for their specific task but which have not been combined into a unified, collaborative architecture. As with natural intelligence, creating AGI requires an integrative cognitive architecture (CA) describing the whole agent. The single modules should be integrated into one coherent architecture in which they support each other synergistically. This integrative system needs to be grounded in reality through perception and be able to process the ambiguous information from real environments dynamically with limited time and computational resources. The architecture must allow for real-time learning and be able to generalise and adapt its knowledge and skill system. The agent should be fully autonomous, able to learn, develop and adapt (i.e., improve its performance through experience). The main challenge for CAs, however, remains the design of a complex knowledge base and representation structure, which integrates current state knowledge (environment, goals) with predictions about near future states to make decisions for the next action. Several CAs have been developed, but so far none of them fulfils all requirements; the representation problem in particular remains unsolved.

Cognitive Skills

The best way to advance AGI seems to be to base it on an artificial cognitive architecture. Artificial cognitive architectures, and most AI systems, are mainly inspired by findings from human psychology and cognition. However, human intelligence is a result of evolution, and evolution does not “invent” new mechanisms, but rather builds upon existing structures and processes by implementing small changes. These changes can lead to extension, deletion, recombination, restructuring and redeployment of aspects of the original mechanism, while the changed mechanism might still produce the same, a similar, or a different output. Therefore, we believe it is crucial to build upon the knowledge and test methodologies from all natural cognition (as opposed to only human psychology), including animal cognition, comparative psychology, developmental psychology and the evolution of cognition, which are combined in the young field of comparative cognition.

We suggest a thorough review of the natural cognition literature to identify different cognitive skills (also referred to as abilities) found in animals and humans. These skills should be classified according to their functions and dependencies. Analysis and synthesis of these skills should lead to the definition and a catalogue of necessary and additional cognitive skills for an artificial organism to function in accordance with its specific environment and goals.

Based on an initial review of the literature (e.g., Sara J. Shettleworth, “Cognition, Evolution, and Behaviour”), the cognitive skill catalogue and classification could, for example, include three levels of cognitive processes. The first level consists of the fundamental mechanisms, including perception, attention, basic learning mechanisms like conditioning and habituation, discrimination, classification, concepts, memory and knowledge representation (e.g., symbolic). The second level consists of the three physical domains (spatial, numeric and timing cognition) as well as the social domain, including social relationship recognition and classification, theory of mind, empathy, social learning and communication. The third level could include skills like planning, causality, reasoning, problem solving and language.

Fig1: Preliminary cognitive skill catalogue and classification, including three levels of cognitive processes

The three levels might reflect the order of operation, where the second level processes operate on the first level and the third level operates on both the first and second level. The first and third level could function in a more domain general way, whereas the second level skills seem more specialised and domain specific.
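As a rough sketch, the preliminary catalogue and the suggested order of operation could be written down as a simple data structure. The names `SKILL_CATALOGUE`, `level_of` and `operates_on` are hypothetical, and the skill lists only repeat the examples given in the text; they are not a complete or validated classification.

```python
# Preliminary three-level catalogue, copied from the examples in the text.
SKILL_CATALOGUE = {
    1: ["perception", "attention", "conditioning", "habituation",
        "discrimination", "classification", "concepts", "memory",
        "knowledge representation"],
    2: ["spatial cognition", "numeric cognition", "timing cognition",
        "social relationship recognition and classification",
        "theory of mind", "empathy", "social learning", "communication"],
    3: ["planning", "causality", "reasoning", "problem solving", "language"],
}


def level_of(skill: str) -> int:
    """Return the catalogue level a skill is listed under."""
    for level, skills in SKILL_CATALOGUE.items():
        if skill in skills:
            return level
    raise KeyError(f"unknown skill: {skill}")


def operates_on(level: int) -> list[int]:
    """Levels that a given level may build upon (all lower levels)."""
    return [lower for lower in sorted(SKILL_CATALOGUE) if lower < level]


print(level_of("theory of mind"), operates_on(3))
```

A flat mapping like this is enough to express level membership; finer dependencies between individual skills would need a graph, which the text leaves open.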

We will first focus on the fundamental mechanisms identified in the previous section. Comparative cognition studies show that the first level cognitive skills mentioned above seem to be present in most animals. They are also the basic mechanisms needed to process sensory input and create a basic “idea” of the environment. According to Moravec’s paradox, this step is likely the most technologically challenging. The development of a CA including the fundamental cognitive skills would already represent considerable progress in AI research. Inspired by the process of evolution, the second level skills should be implemented by building upon the first level skills through extension, recombination and so on. Another important insight that the evolution of cognition can provide is the need for real-time and economical processes. An organism that must spend hours calculating its reaction to a threat or consuming a food item would simply not survive. This is especially important for the fundamental cognitive processes: perception processing, attention and classification must happen in real time or the animal would become dinner itself. The artificial cognitive architecture should therefore, from the very first implemented skills, be designed to process information in real time and in an economical way.

Evaluation of AGI

The evaluation of AGI systems is clearly important, both to assess whether the desired abilities have been achieved (binary output) and to identify performance, strengths and deficiencies. Furthermore, testing a system several times is crucial to determine progress during the design, training and development (ontogeny) of the system. Most existing evaluation methods are human-centric, rely strongly on human language, and provide only a binary output that does not allow such progress assessment (e.g., the Turing test). Currently, progress in AGI is mainly evaluated case-by-case with random, ad hoc test setups (as an example see this video of an iCub categorising boxes, bottles and cars). No standard, progressive, or independent evaluation framework for AGI exists to date, mainly because of: 1) the focus on narrow AI, which can be tested task-specifically, and 2) the inherent complexity of AGI systems, which have to deal with a wide range of unexpected environments, goals and tasks. Because of the generality of AGI systems, a testing framework should be based on the standardised assessment of cognitive abilities and not on the performance of specific tasks (read more here).

In comparative cognition, cognitive skills are inferred indirectly by observing behaviour in response to carefully designed experimental setups. The test methodology of comparative cognition differs from traditional psychological tests (except tests in developmental psychology and for aphasic humans) in that it does not rely on human language but uses rewards and focuses on observable behavioural output. The cross application of the natural cognitive testing methodology to artificial cognition could provide an empirically comparable framework for the stepwise development of an artificial CA. The basic concept of comparative cognition tests (see a video example of a fish object recognition task or a complex problem solving task for crows) is to present a situation in which the subject (e.g., a human infant or an animal) either has the choice between different behavioural options or has its reactions observed (e.g., gaze direction). Each cognitive test is specific in that it assesses only one or a few subcomponents of cognition. For each cognitive test, either the test is scored as passed or failed (for dichotomous skills), or a specific set of behavioural variables of the subject’s reaction is measured (e.g., reaction time, gaze following, percentage of correct choices compared to random choice). Reversal learning, for example, is a cognitive skill that is repeatedly tested in humans and animals. The tested individual is first trained on a specific rule for gaining a reward, such as the reward being under the blue and not under the red cup, until the subject consistently makes the right choice. In the testing setup, the rule is then switched so that the reward can now be found under the red cup and no longer under the blue cup. The test measures the number of trials the subject needs to reverse the previously learned rule and consistently use the newly correct one.
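The reversal learning procedure described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the `ValueLearner` is a hypothetical stand-in for the tested subject (a greedy learner with a simple value update), and the criterion of five consecutive correct choices is an arbitrary choice, not a standard taken from the literature.

```python
class ValueLearner:
    """Hypothetical subject: keeps one value per cup and picks the higher one."""

    def __init__(self, learning_rate: float = 0.5):
        # Both cups start equally attractive; ties resolve to "blue".
        self.values = {"blue": 0.5, "red": 0.5}
        self.learning_rate = learning_rate

    def choose(self) -> str:
        return max(self.values, key=self.values.get)

    def update(self, choice: str, rewarded: bool) -> None:
        # Move the chosen cup's value towards 1 (reward) or 0 (no reward).
        target = 1.0 if rewarded else 0.0
        self.values[choice] += self.learning_rate * (target - self.values[choice])


def trials_to_criterion(subject, rewarded_cup, criterion=5, max_trials=200):
    """Count trials until `criterion` consecutive correct choices occur."""
    streak = 0
    for trial in range(1, max_trials + 1):
        choice = subject.choose()
        correct = choice == rewarded_cup
        subject.update(choice, correct)
        streak = streak + 1 if correct else 0
        if streak == criterion:
            return trial
    return None  # criterion not reached within max_trials


subject = ValueLearner()
acquisition = trials_to_criterion(subject, rewarded_cup="blue")  # training
reversal = trials_to_criterion(subject, rewarded_cup="red")      # test phase
print(acquisition, reversal)
```

The measured variable is `reversal`, the number of trials needed after the switch. The same `trials_to_criterion` function could in principle score any subject that exposes `choose` and `update`, whether simulated or wrapped around a physical test.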

The testing methodology must fulfil several requirements: the basic concept should be clearly defined; it should be applicable in different forms (physical test, simulation); and it should come in different versions, testing different stages of modularity/generality while also allowing several testing modes.

We would like to introduce an initial idea of a standardised goal and testing framework for artificial cognitive systems. Artificial cognitive systems can be characterised by defining their environment, goals, behaviours/actions and their corresponding complexity (referred to below as artificial system characters). Based on this characterisation, a system requires a particular set of cognitive skills and, depending on the complexity, the skills might differ in their modularity or generality. The identification and formulation of cognitive skills found in natural systems, and their categorisation into different levels and dependencies, can result in the formulation of an artificial cognitive skill profile, allowing the definition of goals and testing methodology for different characters of artificial cognitive systems. The system-specific desired or achieved test results could be integrated into the artificial cognitive skill profile, which can then provide clear development and performance goals for the desired artificial cognitive system as well as some form of end “certification” including the final results of the tests.
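One possible shape for such a skill profile is sketched below, purely as an assumption about how desired and achieved results might be stored together. The `SkillProfile` class, its fields, the 0 to 1 score scale and the example system name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SkillProfile:
    """Hypothetical profile: desired scores per skill plus achieved results."""

    system_name: str
    targets: dict[str, float]  # skill -> desired score in [0, 1] (assumed scale)
    results: dict[str, float] = field(default_factory=dict)

    def record(self, skill: str, score: float) -> None:
        # Only skills that are part of the profile can receive a result.
        if skill not in self.targets:
            raise KeyError(f"{skill!r} is not part of this profile")
        self.results[skill] = score

    def unmet(self) -> list[str]:
        # Skills that are untested or scored below their target.
        return sorted(
            skill for skill, target in self.targets.items()
            if self.results.get(skill, 0.0) < target
        )

    def certified(self) -> bool:
        # "Certification" here simply means every target has been reached.
        return not self.unmet()


profile = SkillProfile(
    system_name="example-system",
    targets={"perception": 0.9, "reversal learning": 0.8},
)
profile.record("perception", 0.95)
print(profile.unmet(), profile.certified())
```

Keeping targets and results in one object lets the same profile serve as a development goal during training and as a summary of final test results afterwards.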

Conclusion

To create human-level AI, we believe that more focus should be placed on the desired cognitive abilities combined in an integrative CA. A clear concept, goals and evaluation framework should be created to significantly advance the development of artificial cognitive systems. Comparative cognition has identified numerous cognitive skills underlying human and animal intelligence and has established widespread testing methodologies to assess those skills. We therefore suggest the development of a practically applicable framework and assessment for the development of artificial cognitive systems based upon the findings and methodology from human and animal natural cognitive studies.

A standardised artificial cognition goal and testing framework could assist research and economic projects in the definition of a desired cognition profile, providing basic cognitive test setups for development, and providing independent testing and scoring of artificial cognition systems. This can be of economic and public benefit by providing performance metrics of the cognitive skills of an artificial system that could also be comparable between different artificial and/or natural systems. Furthermore, this could influence decisions about production, product release, and safety.

In establishing this framework, we suggest, as a first step, the realisation of an interdisciplinary workshop bringing together experts from all relevant fields. In this way, we aim to bring the natural and artificial intelligence research fields closer together.

About the authors:

Corinne Y. Ackermann holds an MSc in Biology/Anthropology from the University of Zurich, Switzerland, and is currently working on her PhD thesis at the Department of Comparative Cognition, University of Neuchâtel, Switzerland. Her research focuses on questions related to the evolution of social behaviour, social learning, culture, and cognition in species that show a high level of social and cognitive complexity, like primates and cetaceans. She is convinced that, to understand human intelligence and create artificial intelligence, inspiration should be taken from natural cognition both in its present form and in its evolutionary history.

Nevicia Case is a Ph.D. student in Psychiatry at McGill University in Montréal, Canada. Her neuropsychology research experience began at the Hotchkiss Brain Institute, University of Calgary, while completing a B.A. in Psychology and an M.Sc. in Medical Science. Her Ph.D. research is on decision-making processes in humans, with a focus on the critical role of emotion in these processes. As emotion is a fundamental building block of human intelligence, she supports the view that it is vital to the development of authentic AI.

Dan Lovy is a software engineer (B.S. Computer Engineering, University of Michigan). Dan has over 35 years of experience both in creating technology and in product management. Projects and industries have ranged from creating computer languages for game designers to producing educational software and building multimedia devices. Creating human-level artificial intelligence is the next frontier. Dan hopes to aid in the journey by contributing engineering structure to the effort. Measurement is always key, and using animal and human examples to measure progress is an exciting approach.