This article is part of our coverage of the latest AI research.
AI systems can mimic some aspects of human intelligence with impressive results, including detecting objects, navigating environments, playing chess, or even generating text. But cloning human behavior has its limits. Without supporting actions with thought, AI systems can become fragile and make unpredictable mistakes when faced with new situations.
A recent project by scientists at the University of British Columbia and the Vector Institute shows the benefits of getting AI systems to think like humans. They propose a technique called Thought Cloning, which trains AI on thoughts and actions at the same time.
Thought cloning can allow deep learning models to generate some sort of reasoning process for their actions and transmit that reasoning to human operators. There are many benefits to thought cloning, including training efficiency, troubleshooting and error correction, and prevention of harmful behavior.
Behavior cloning vs thought cloning
Many deep learning systems are trained on human-generated data. For example, training data could be the list of moves in a game of chess or the sequence of actions in a strategy game. It could also be real-world actions, such as completing tasks in a warehouse. By training on a large enough dataset, the AI agent will be able to model human behavior on such a task.
But while the model can learn to mimic human behavior and achieve the same results in many tasks, it doesn’t necessarily learn the reasoning behind those actions. Without the thought process, the AI agent will not be able to generalize the learned actions to new settings. As a result, it will require a much larger training dataset that covers all possible scenarios. And it will still remain unpredictable in the face of unseen edge cases.
The assumption behind thought cloning is that if you train a model on actions and their corresponding thoughts, the model will learn the right associations between behavior and goals. And it will also be able to generate and communicate the reasons behind its actions.
To achieve thought cloning in ML models, you feed the model multiple streams of information during training. The first is the observation of actions, such as the moves a player is making in a game. The second is the stream of thoughts, which explains the reasoning behind each action. For example, in a real-time strategy game, the AI observes that the player has moved some units in front of a bridge. At the same time, it receives a textual explanation that says something like “blocking enemy forces from crossing the bridge.”
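As a rough illustration of what such paired training data could look like, here is a minimal Python sketch. The `DemoStep` class, its field names, and the `to_training_pairs` helper are hypothetical and are not taken from the researchers’ code. They only show the idea that every step carries both a thought and an action, so the model gets two supervised targets instead of one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DemoStep:
    """One timestep of a human demonstration (hypothetical layout)."""
    observation: str  # what the demonstrator saw
    thought: str      # the demonstrator's stated reason, as plain text
    action: str       # what the demonstrator did


def to_training_pairs(episode: List[DemoStep]):
    """Split an episode into thought targets and action targets."""
    # The thought is predicted from the observation.
    thought_targets = [(s.observation, s.thought) for s in episode]
    # The action is predicted from the observation plus the thought.
    action_targets = [((s.observation, s.thought), s.action) for s in episode]
    return thought_targets, action_targets


episode = [
    DemoStep(
        observation="enemy units approaching the bridge",
        thought="block enemy forces from crossing the bridge",
        action="move units to bridge",
    ),
]
thought_targets, action_targets = to_training_pairs(episode)
```

A behavior cloning dataset would keep only the observation and action fields. The extra thought field is what lets the model learn why the move was made.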
There are several advantages to this approach. First, AI agents will learn faster because they will need fewer examples to understand why a particular action matters. Second, they’ll perform better, because they’ll be able to generalize the same reasoning to unseen situations. And third, they will be safer because they express the reasoning behind every action they take. For example, if the AI agent is pursuing the right goal but intends to take an unsafe action (for example, running through a red light to reach the destination in time), then it can be stopped before it causes harm. Conversely, if it is taking the right action for the wrong reason, it can be pointed in the right direction.
Teaching artificial intelligence to mimic human thinking
The researchers propose a deep learning architecture composed of two parts that work together to accomplish a mission. The upper component processes a stream of thoughts and observations about the environment and tries to predict the next thought that will help the model achieve its goal. The lower component receives the observations of the environment along with the upper component’s output and tries to predict the correct action to take.
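The division of labor between the two components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `step` function, the type aliases, and the rule-based `toy_upper` and `toy_lower` stand-ins are hypothetical, and in the actual system both components are learned neural networks.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures for this sketch:
# upper: (mission, observation, past thoughts) -> next thought
# lower: (mission, observation, current thought) -> action
UpperFn = Callable[[str, str, List[str]], str]
LowerFn = Callable[[str, str, str], str]


def step(mission: str, observation: str, thought_history: List[str],
         upper: UpperFn, lower: LowerFn) -> Tuple[str, str]:
    """Run one think-then-act step."""
    # The upper component decides what to think next.
    thought = upper(mission, observation, thought_history)
    # The lower component picks an action conditioned on that thought.
    action = lower(mission, observation, thought)
    # The thought is remembered so later steps can build on it.
    thought_history.append(thought)
    return thought, action


def toy_upper(mission: str, observation: str, history: List[str]) -> str:
    """Rule-based stand-in for the learned thought generator."""
    return "open the door" if "door" in observation else "explore the room"


def toy_lower(mission: str, observation: str, thought: str) -> str:
    """Rule-based stand-in for the learned action predictor."""
    return "toggle" if thought == "open the door" else "forward"


history: List[str] = []
first = step("pick up the key", "empty corridor", history, toy_upper, toy_lower)
second = step("pick up the key", "closed door ahead", history, toy_upper, toy_lower)
```

The key design point is the ordering: the action is never chosen directly from the observation, only after a thought has been produced.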
The model repeats this process and uses the results from each step as input to the next step. During training, the model has access to the sequence of thoughts and actions produced by humans. It uses this information as ground truth to adjust its parameters and minimize the loss on its thought and action predictions. A trained model should be able to generate the right sequence of thoughts and actions for unseen tasks.
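To make the training objective more concrete, here is a simplified Python sketch of a loss that penalizes both wrong thoughts and wrong actions. It rests on several assumptions that go beyond what is described above: each thought is treated as a single choice among a few candidates (a real model would score a sequence of text tokens), the function names are hypothetical, and `alpha` is an assumed knob for weighting the thought term against the action term.

```python
import math
from typing import Dict, List


def cross_entropy(predicted: Dict[str, float], target: str) -> float:
    """Negative log-probability the model gave to the human's choice."""
    return -math.log(predicted[target])


def thought_cloning_loss(thought_preds: List[Dict[str, float]],
                         thought_targets: List[str],
                         action_preds: List[Dict[str, float]],
                         action_targets: List[str],
                         alpha: float = 1.0) -> float:
    """Average per-step loss over thoughts and actions (simplified)."""
    total = 0.0
    steps = zip(thought_preds, thought_targets, action_preds, action_targets)
    for t_pred, t_true, a_pred, a_true in steps:
        # alpha (assumed) balances the thought term against the action term.
        total += alpha * cross_entropy(t_pred, t_true)
        total += cross_entropy(a_pred, a_true)
    return total / len(action_targets)


# One toy step: predicted probabilities versus the human demonstration.
loss = thought_cloning_loss(
    thought_preds=[{"open the door": 0.5, "explore the room": 0.5}],
    thought_targets=["open the door"],
    action_preds=[{"toggle": 0.25, "forward": 0.75}],
    action_targets=["toggle"],
)
```

Setting `alpha` to zero removes the thought term, so only the action predictions are scored, which is closer to how a behavior cloning model is trained.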
The model uses transformers, long short-term memory (LSTM) networks, and vision-language models to process text commands and visual data, merge them together, and keep track of embeddings across multiple steps. The researchers posted their results on GitHub, including the model weights, the code to train the model, and the code to generate the training and test data. (This is a promising development against the backdrop of AI labs sharing less and keeping the details of their models a secret.)
For their experiments, the authors used BabyAI, a grid-world platform in which an AI agent has to accomplish several missions. The agent can perform various actions such as picking up items, opening doors, and navigating rooms. The advantage of the BabyAI platform is that it can programmatically generate worlds, missions, solutions, and narratives to train the AI system. The researchers created a dataset of one million scenarios to train their thought cloning model.
To test their technique, the researchers created two different models. The first was trained with pure behavior cloning, which means that it only received observations of the environment and the corresponding actions. The second was trained with thought cloning, receiving both behavioral data and a stream of plain-text explanations of the reasoning behind each move.
The results show that thought cloning significantly outperforms behavior cloning and converges faster because it needs fewer training examples to generalize to unseen situations. The experiments also show that thought cloning outperforms behavior cloning on out-of-distribution (OOD) examples (tasks that are very different from the model’s training examples).
Thought cloning also allowed the researchers to better understand the AI agent’s behavior because for each step it produced its planning and reasoning in natural language. Indeed, this interpretability feature allowed researchers to investigate some of the model’s early errors during training and quickly adjust their training regimen to point it in the right direction.
In terms of safety, the researchers developed a technique called Precrime Intervention that automatically detects and prevents risky behavior by examining the model’s stream of thoughts. They note that in their experimental setting, Precrime Intervention eliminates almost all unsafe behavior, demonstrating the potential of thought cloning agents to promote AI safety.
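The general idea of checking a thought before its action is carried out can be sketched as follows. This is a minimal illustration, not the researchers’ implementation: the phrase-matching check, the function names, and the toy driving policy are all assumptions made for this example.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def is_unsafe(thought: str, banned_phrases: Iterable[str]) -> bool:
    """Flag a thought that mentions any banned phrase (assumed check)."""
    lowered = thought.lower()
    return any(phrase.lower() in lowered for phrase in banned_phrases)


def run_with_intervention(
    observations: List[str],
    policy: Callable[[str], Tuple[str, str]],
    banned_phrases: Iterable[str],
) -> Tuple[List[str], Optional[str]]:
    """Execute actions until an unsafe thought appears, then halt."""
    banned = list(banned_phrases)
    executed: List[str] = []
    for obs in observations:
        thought, action = policy(obs)
        if is_unsafe(thought, banned):
            # Halt before the action that follows this thought is executed.
            return executed, thought
        executed.append(action)
    return executed, None


def toy_policy(obs: str) -> Tuple[str, str]:
    """Hypothetical agent that returns a (thought, action) pair."""
    if obs == "red light":
        return "run the red light to arrive on time", "accelerate"
    return "drive toward the destination", "forward"


executed, blocked = run_with_intervention(
    ["clear road", "red light", "clear road"],
    toy_policy,
    banned_phrases=["run the red light"],
)
```

In this toy run, the agent is stopped at the red light and the `accelerate` action never executes. A check like this is only possible because the agent states its plan before acting, which a behavior cloning agent does not do.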
Applying thought cloning to real-world artificial intelligence
Thought cloning is an interesting and promising direction of AI research and development. It fits in with other efforts to build embodied, multimodal deep learning models, such as Google’s PaLM-E and DeepMind’s Gato. Part of the reason human intelligence is so much more robust than current AI is our ability to ingest and process different modes of information simultaneously. And experiments show that multimodal AI systems are much more robust and efficient.
However, thought cloning is not without its challenges. For one thing, the BabyAI environment is simple and deterministic, which makes it much easier for deep learning models to learn its nuances and intricacies. The real world is messier, unpredictable, and far more complex.
Another challenge of this method is creating the training data. People don’t necessarily recount their every action when performing tasks. Our shared knowledge and similar biology obviate the need to spell out our every intention explicitly. The authors propose that one solution could be to use YouTube videos where people explain as they go about activities. However, even then, human behavior is fraught with implicit reasons that cannot necessarily be explained in plain text.
How thought cloning performs on internet-scale data and complex problems remains to be seen. But as the authors of the paper state, it creates new avenues for scientific investigation in artificial general intelligence, AI safety, and interpretability.