Artificial Intelligence Will Do What We Ask. That's a Problem. – Quanta Magazine
Uncertainty about our preferences may be key, as demonstrated by the off-switch game, a formal model of the problem involving Harriet the human and Robbie the robot. Robbie is deciding whether to act on Harriet’s behalf — whether to book her a nice but expensive hotel room, say — but is uncertain about what she’ll prefer. Robbie estimates that the payoff for Harriet could be anywhere in the range of −40 to +60, with an average of +10 (Robbie thinks she’ll probably like the fancy room but isn’t sure). Doing nothing has a payoff of 0. But there’s a third option: Robbie can query Harriet about whether she wants it to proceed or prefers to “switch it off” — that is, take Robbie out of the hotel-booking decision. If she lets the robot proceed, the average expected payoff to Harriet becomes greater than +10. So Robbie will decide to consult Harriet and, if she so desires, let her switch it off.
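The arithmetic behind Robbie's choice can be checked with a short sketch. It assumes, purely for illustration, that Robbie's belief is spread uniformly over the range −40 to +60 (the article gives only the range and the +10 average) and that Harriet lets the booking proceed exactly when her true payoff is positive. The function name and dictionary keys are hypothetical.

```python
def off_switch_payoffs(low=-40.0, high=60.0):
    """Expected payoffs for Robbie's three options.

    Assumption (not from the article): Robbie's belief about Harriet's
    payoff is uniform on [low, high], and Harriet approves the action
    only when her true payoff is positive.
    """
    if not low < 0 < high:
        raise ValueError("the sketch needs a range that straddles zero")

    width = high - low
    act = (low + high) / 2            # book the room without asking
    do_nothing = 0.0                  # leave things as they are
    p_approve = high / width          # chance the true payoff is positive
    mean_if_approved = high / 2       # average payoff given her approval
    # Deferring earns the payoff when she approves and 0 when she says no.
    defer = p_approve * mean_if_approved

    return {
        "act": act,
        "do_nothing": do_nothing,
        "defer": defer,
        "mean_if_approved": mean_if_approved,
    }


payoffs = off_switch_payoffs()
best_option = max(("act", "do_nothing", "defer"), key=payoffs.get)
```

Under these assumptions, acting alone is worth +10 on average, while asking first is worth about +18, because Harriet's veto removes every outcome below zero. If she does approve, the average payoff is about +30, which is the sense in which it "becomes greater than +10."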
Russell and his collaborators proved that in general, unless Robbie is completely certain about what Harriet herself would do, it will prefer to let her decide. “It turns out that uncertainty about the objective is essential for ensuring that we can switch the machine off,” Russell wrote in Human Compatible, “even when it’s more intelligent than us.”
These and other partial-knowledge scenarios were developed as abstract games, but Scott Niekum’s lab at the University of Texas, Austin, is running preference-learning algorithms on actual robots. When Gemini, the lab’s two-armed robot, watches a human place a fork to the left of a plate in a table-setting demonstration, it initially can’t tell whether forks always go to the left of plates or always go on that particular spot on the table; new algorithms allow Gemini to learn the pattern after a few demonstrations. Niekum focuses on getting AI systems to quantify their own uncertainty about a human’s preferences, enabling the robot to gauge when it knows enough to safely act. “We are reasoning very directly about distributions of goals in the person’s head that could be true,” he said. “And we’re reasoning about risk with respect to that distribution.”
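The fork ambiguity can be pictured as a belief over two competing goals that sharpens with each demonstration. The sketch below is a toy illustration, not the lab's algorithm: positions are single numbers along the table edge, the two hypotheses (a fixed offset from the plate versus a fixed spot on the table) are hard-coded, placement noise is Gaussian, and the 0.95 confidence threshold is an arbitrary stand-in for "knows enough to act." All names and numbers are hypothetical.

```python
import math


def gaussian(x, mean, sigma=2.0):
    """Density of a normal distribution, used as a placement-noise model."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def belief_in_left_of_plate(demos, offset=-15.0, fixed_spot=35.0, prior=0.5):
    """Track P(fork goes left of the plate) after each demonstration.

    Each demo is (plate_position, fork_position) in centimetres.
    The rival hypothesis says the fork always goes at `fixed_spot`.
    """
    belief = prior
    history = []
    for plate_x, fork_x in demos:
        like_relative = gaussian(fork_x, plate_x + offset)
        like_absolute = gaussian(fork_x, fixed_spot)
        evidence = belief * like_relative + (1 - belief) * like_absolute
        belief = belief * like_relative / evidence   # Bayes' rule
        history.append(belief)
    return history


# The first demo fits both hypotheses equally well; later demos move the plate.
demos = [(50.0, 35.0), (80.0, 65.0), (20.0, 5.0)]
history = belief_in_left_of_plate(demos)
ready_to_act = [b >= 0.95 for b in history]
```

After the first demonstration the belief stays at 50-50, so a cautious robot would hold off. Once the plate moves and the fork moves with it, the fixed-spot hypothesis becomes very unlikely and the threshold is crossed.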
Recently, Niekum and his collaborators found an efficient algorithm that allows robots to learn to perform tasks far better than their human demonstrators. It can be computationally demanding for a robotic vehicle to learn driving maneuvers simply by watching demonstrations by human drivers. But Niekum and his colleagues found that they could improve and dramatically speed up learning by showing a robot demonstrations that have been ranked according to how well the human performed. “The agent can look at that ranking, and say, ‘If that’s the ranking, what explains the ranking?’” Niekum said. “What’s happening more often as the demonstrations get better, what happens less often?” The latest version of the learning algorithm, called Bayesian T-REX (for “trajectory-ranked reward extrapolation”), finds patterns in the ranked demos that reveal possible reward functions that humans might be optimizing for. The algorithm also gauges the relative likelihood of different reward functions. A robot running Bayesian T-REX can efficiently infer the most likely rules of place settings, or the objective of an Atari game, Niekum said, “even if it never saw the perfect demonstration.”
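The idea of asking "what explains the ranking?" can be sketched in miniature. The example below is not the Bayesian T-REX implementation: it assumes each demonstration has already been boiled down to two hand-picked features (collisions and distance travelled), considers only three candidate reward functions, and scores them with a pairwise-ranking likelihood of the Bradley-Terry kind, a common choice for learning from rankings. Every name and number is hypothetical.

```python
import math


def total_reward(weights, features):
    """Linear reward: a weighted sum of a demonstration's features."""
    return sum(w * f for w, f in zip(weights, features))


def log_likelihood(weights, ranked_demos, beta=1.0):
    """Log-probability that this reward function produces the given ranking.

    `ranked_demos` is ordered worst to best. For each pair, the better demo
    should earn more reward; the pair's probability is a logistic function
    of the reward gap.
    """
    total = 0.0
    for i in range(len(ranked_demos)):
        for j in range(i + 1, len(ranked_demos)):
            gap = beta * (total_reward(weights, ranked_demos[j])
                          - total_reward(weights, ranked_demos[i]))
            # Numerically stable log(sigmoid(gap)).
            if gap >= 0:
                total += -math.log1p(math.exp(-gap))
            else:
                total += gap - math.log1p(math.exp(gap))
    return total


def posterior(candidates, ranked_demos, beta=1.0):
    """Relative likelihood of each candidate reward, with a uniform prior."""
    logs = {name: log_likelihood(w, ranked_demos, beta)
            for name, w in candidates.items()}
    top = max(logs.values())
    unnormalized = {name: math.exp(v - top) for name, v in logs.items()}
    z = sum(unnormalized.values())
    return {name: v / z for name, v in unnormalized.items()}


# Features per demo: (collisions, distance travelled), ranked worst to best.
ranked_demos = [(3, 4.0), (2, 1.0), (1, 3.0), (0, 2.0)]
candidates = {
    "avoid_collisions": (-1.0, 0.0),
    "go_far": (0.0, 1.0),
    "seek_collisions": (1.0, 0.0),
}
beliefs = posterior(candidates, ranked_demos)
most_likely = max(beliefs, key=beliefs.get)
```

Here collisions fall steadily as the demonstrations improve while distance does not, so the "avoid_collisions" reward ends up with most of the probability. Keeping the full set of beliefs, rather than only the winner, is what lets the relative likelihood of different reward functions be compared.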
Our Imperfect Choices
Russell’s ideas are “making their way into the minds of the AI community,” said Yoshua Bengio, the scientific director of Mila, a top AI research institute in Montreal. He said Russell’s approach, where AI systems aim to reduce their own uncertainty about human preferences, can be achieved with deep learning — the powerful method behind the recent revolution in artificial intelligence, where the system sifts data through layers of an artificial neural network to find its patterns. “Of course more research work is needed to make that a reality,” he said.
Russell sees two major challenges. “One is the fact that our behavior is so far from being rational that it could be very hard to reconstruct our true underlying preferences,” he said. AI systems will need to reason about the hierarchy of long-term, medium-term and short-term goals — the myriad preferences and commitments we’re each locked into. If robots are going to help us (and avoid making grave errors), they will need to know their way around the nebulous webs of our subconscious beliefs and unarticulated desires.
The second challenge is that human preferences change. Our minds change over the course of our lives, and they also change on a dime, depending on our mood or on altered circumstances that a robot might struggle to pick up on.
In addition, our actions don’t always live up to our ideals. People can hold conflicting values simultaneously. Which should a robot optimize for? To avoid catering to our worst impulses (or worse still, amplifying those impulses, thereby making them easier to satisfy, as the YouTube algorithm did), robots could learn what Russell calls our meta-preferences: “preferences about what kinds of preference-change processes might be acceptable or unacceptable.” How do we feel about our changes in feeling? It’s all rather a lot for a poor robot to grasp.
Like the robots, we’re also trying to figure out our preferences, both what they are and what we want them to be, and how to handle the ambiguities and contradictions. Like the best possible AI, we’re also striving — at least some of us, some of the time — to understand the form of the good, as Plato called the object of knowledge. Like us, AI systems may be stuck forever asking questions — or waiting in the off position, too uncertain to help.
“I don’t expect us to have a great understanding of what the good is anytime soon,” said Christiano, “or ideal answers to any of the empirical questions we face. But I hope the AI systems we build can answer those questions as well as a human and be engaged in the same kinds of iterative process to improve those answers that humans are — at least on good days.”
However, there’s a third major issue that didn’t make Russell’s short list of concerns: What about the preferences of bad people? What’s to stop a robot from working to satisfy its evil owner’s nefarious ends? AI systems tend to find ways around prohibitions just as wealthy people find loopholes in tax laws, so simply forbidding them from committing crimes probably won’t be successful.
Or, to get even darker: What if we all are kind of bad? YouTube has struggled to fix its recommendation algorithm, which is, after all, picking up on ubiquitous human impulses.
Still, Russell feels optimistic. Although more algorithms and game theory research are needed, he said his gut feeling is that harmful preferences could be successfully down-weighted by programmers — and that the same approach could even be useful “in the way we bring up children and educate people and so on.” In other words, in teaching robots to be good, we might find a way to teach ourselves. He added, “I feel like this is an opportunity, perhaps, to lead things in the right direction.”