In a paper published this week on the preprint server Arxiv.org, a team of researchers from Google Brain, Google X, and the University of California at Berkeley describes an extension to existing AI methods that enables an agent (for instance, a robot) to decide which action to take next while it is still performing the previous one. The idea is that modeling an agent’s behavior after that of a person or animal will lead to more robust, less failure-prone systems in the future.

The researchers point out that while AI algorithms have achieved success in video games, robotic grasping, and manipulation tasks, most use a blocking observe-think-act paradigm — an agent assumes that its environment will remain static while it “thinks” so its actions will be executed on the same states from which they were computed. This holds true in simulation but not in the real world, where the environment state evolves as the agent processes observations and plans its next actions.
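The cost of the blocking assumption can be illustrated with a small, hypothetical example that is not drawn from the paper. Assume a target that moves at a constant velocity while the agent "thinks"; the function names, the velocity, and the timing values below are all illustrative assumptions.

```python
def target_position(t, velocity=2.0):
    """Position of a hypothetical target moving at constant velocity (assumed)."""
    return velocity * t


def stale_action_error(think_time, velocity=2.0, observe_at=0.0):
    """Gap between where the agent saw the target and where it is once thinking ends.

    A blocking agent acts on the observed position as if the world had
    paused; in a moving world the target has drifted by the time it acts.
    """
    observed = target_position(observe_at, velocity)
    actual = target_position(observe_at + think_time, velocity)
    return abs(actual - observed)


if __name__ == "__main__":
    for think_time in (0.0, 0.1, 0.5):
        print(think_time, stale_action_error(think_time))
```

In a simulator that pauses while the agent computes, the think time is effectively zero and the error vanishes. On a real robot, the error in this toy setup grows with both the computation latency and the speed of the scene.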

The team’s solution is a framework that can handle concurrent environments in the context of machine learning. It builds on standard reinforcement learning formulations, which drive an agent toward goals via rewards. In these formulations, an agent receives a state from a set of possible states and selects an action from a set of possible actions according to a policy. The environment then returns a reward and the next state, which is sampled from a transition distribution, and the agent learns to maximize the expected return from each state.
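The standard loop described above can be sketched in a few lines. This is a toy illustration rather than the researchers' setup: the five-position chain, the fixed policy, the slip probability, and the discount factor are all assumptions made for the example.

```python
import random

ACTIONS = (-1, +1)  # hypothetical action set: step left or step right
GOAL = 4            # hypothetical state set: positions 0..4, with 4 as the goal


def policy(state, rng):
    """A fixed stochastic policy (assumed): step right 80% of the time."""
    return +1 if rng.random() < 0.8 else -1


def transition(state, action, rng):
    """Sample the next state and reward from a noisy transition distribution."""
    if rng.random() < 0.1:  # 10% of the time the action is reversed
        action = -action
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per step into the future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def run_episode(seed=0, max_steps=50):
    """Roll out one episode and return the list of rewards received."""
    rng = random.Random(seed)
    state, rewards = 0, []
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward = transition(state, action, rng)
        rewards.append(reward)
        if state == GOAL:
            break
    return rewards


if __name__ == "__main__":
    rewards = run_episode()
    print(len(rewards), discounted_return(rewards))
```

A learning algorithm would adjust the policy so that the expected value of `discounted_return` from each state increases. Here the policy is fixed to keep the sketch short.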

In addition to the previous action, two further features, action selection time and vector-to-go (VTG), help to encapsulate concurrent knowledge. (The researchers define VTG as the last action to be executed at the instant the state of the environment is measured.) Concurrent action environments capture the state while the previous action is still being executed. Once the state is captured, the policy selects an action and executes it regardless of whether the previous action has been completed, even if that necessitates interrupting the previous action.
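One way to picture these extra inputs is as fields appended to the state before it reaches the policy. The sketch below is an assumption-laden illustration, not the paper's implementation: it treats the previous action as a displacement executed at a constant rate and reads VTG as the portion of that action still unexecuted when the state is captured. The names `ConcurrentObservation`, `remaining_action`, and `policy_input` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class ConcurrentObservation:
    """State plus the concurrent features named in the article (layout assumed)."""
    state: Vector
    previous_action: Vector
    action_selection_time: float  # time spent choosing the action (assumed: seconds)
    vector_to_go: Vector


def remaining_action(previous_action: Vector, duration: float, elapsed: float) -> Vector:
    """Unexecuted part of the previous action, assuming constant-rate execution."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    fraction_done = min(max(elapsed / duration, 0.0), 1.0)
    return tuple(a * (1.0 - fraction_done) for a in previous_action)


def policy_input(obs: ConcurrentObservation) -> Vector:
    """Flatten the observation into a single vector a policy could consume."""
    return (obs.state + obs.previous_action
            + (obs.action_selection_time,) + obs.vector_to_go)


if __name__ == "__main__":
    prev = (1.0, -2.0)
    vtg = remaining_action(prev, duration=2.0, elapsed=0.5)
    obs = ConcurrentObservation(state=(0.3, 0.7), previous_action=prev,
                                action_selection_time=0.05, vector_to_go=vtg)
    print(policy_input(obs))
```

Under these assumptions, a policy that sees the remaining motion and its own decision latency has the information it needs to anticipate where the arm will be when the new action takes effect.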

The researchers conducted experiments on a real-world robot arm, which they tasked with grasping and moving various objects from a bin. They say their framework achieved grasp success comparable to a baseline blocking model but that it was 49% faster than the blocking model in terms of policy duration, which measures the total execution time of the policy. Moreover, the concurrent model was able to execute “smoother” and swifter trajectories than the baseline.

“Concurrent methods may allow robotic control in dynamic environments where it is not possible for the robot to stop the environment before computing the action,” wrote the coauthors. “In these scenarios, robots must truly think and act at the same time.”

The work follows a Google-led study describing an AI system that learned from the motions of animals to give robots greater agility. The coauthors of that study believe their approach could bolster the development of robots that can complete tasks in the real world, such as transporting materials between multilevel warehouses and fulfillment centers.