What it's like to watch an IBM AI successfully debate humans
At a small event in San Francisco last night, IBM hosted two debate club-style discussions between two humans and an AI called "Project Debater." The goal was for the AI to engage in a series of reasoned arguments according to some pretty standard rules of debate: no awareness of the debate topic ahead of time, no pre-canned responses. Each side gave a four-minute introductory speech, a four-minute rebuttal to the other's arguments, and a two-minute closing statement.
Project Debater held its own.
It looks like a huge leap beyond that other splashy demonstration we all remember from IBM, when Watson mopped the floor with its competition at Jeopardy. IBM's AI demonstration today was built on that foundation: it had many corpora of data it could draw from, just like Watson did back in the day. And like Watson, it was able to analyze the contents of all that data to come up with the relevant answer, but this time the "answer" was cogent points related to subsidizing space exploration and telemedicine, laid out in a four-minute speech defending each.
Project Debater cited sources, pandered to the audience's affinity for children and veterans, and did a passable job of cracking a relevant joke or two in the process.
That's pretty impressive: it essentially created a freshman-level term paper kind of argument in a couple of minutes flat when presented with a debating topic it had no specific preparation for. The system has "several hundred million articles" that it assumes are accurate in its data banks, spanning about 100 areas of knowledge. When it gets a debate topic, it takes a couple of minutes to spelunk through them, decides what would make the best arguments in favor of the topic, and then creates a little speech describing those points.
Some of the points it made were pretty facile, some quoted sources, and some were pretty clearly cribbed from articles. Still, though, it was able to move from the "present information" mode we usually think of when we hear AI to a "make an argument" mode. But what impressed me more was that it attempted to directly argue with points that its human opponents made, in nearly real time (the system needed a couple of minutes to analyze the human's four-minute speech before it could respond).
It frankly made me feel a little unsettled, but not because of the usual worries like "robots are going to become self-aware and take over" or "AI is coming for our jobs." It was something subtler and harder to put my finger on. For maybe the first time, I felt like an AI was trying to dissemble. I didn't see it lie, nor do I think it tried to trick us, but it did engage in a debating tactic that, if you saw a human try it, would make you trust that human a little bit less.
Here was the scene: a human debater was arguing against the motion that the government should subsidize space exploration. She set up a framework for understanding the world, a pretty common debating tactic. Subsidies, she argued, should fit one of two specific criteria: fulfilling basic human needs and creating things that could only be done by the government. Space exploration didn't fit the bill. Fair enough.
Project Debater, whose job was to respond directly to those points, didn't quite rebut them directly. It certainly talked in the same zone: it claimed that "subsidizing space exploration usually returns the investment" in the form of economic boosts from scientific discovery, and it said that for a nation like the United States, "having a space exploration program is a critical part of being a great power."
What Project Debater didn't do is directly engage the criteria set forth by its human opponent. And here's the thing: if I were in that debate, I wouldn't have done so either. It's a strong debating tactic to set the framework of a debate, and accepting your opponent's framework is often a recipe for losing.
The question, then: did Project Debater simply not understand the criteria, or did it understand and choose not to engage on those terms? Watching the debate, I figured the answer was that it didn't quite get it, but I wasn't positive. I couldn't tell the difference between an AI not being as smart as it could be and an AI being way smarter than I've seen an AI be before. It was a pretty cognitively dissonant moment. Like I said: unsettling.
Jeff Welser, the VP and lab director for IBM Research at Almaden, put my mind at ease. Project Debater didn't get it. But it didn't get it in a really interesting and important way. "There's been no effort to actually have it play tricky or dissembling games," he tells me (phew). "But it does actually do ... exactly what a human does, but it does it within its limitations."
Essentially, Project Debater assigns a confidence score to every piece of information it understands. As in: how confident is the system that it actually understands the content of what's being discussed? "If it's confident that it got that point right, if it really believes it understands what that opponent was saying, it's going to try to make a very strong argument against that point specifically," Welser explains.
"If it's less confident," he says, "it'll do its best to make an argument that'll be convincing as an argument even if it doesn't exactly answer that point. Which is exactly what a human does too, sometimes."
So: the human says that government should have specific criteria surrounding basic human needs to justify subsidization. Project Debater responds that space is awesome and good for the economy. A human might choose that tactic as a sneaky way to avoid debating on the wrong terms. Project Debater had different motivations in its algorithms, but not that different.
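Welser's description boils down to a simple threshold rule: rebut the specific point when the system trusts its own parse of it, and otherwise pivot to a strong general argument. Here's a minimal Python sketch of that logic; the function name, the 0.7 threshold, and the strategy labels are all my own illustrative assumptions, not anything IBM has published:

```python
# A minimal sketch of the confidence-based fallback Welser describes.
# Everything here (names, the 0.7 threshold) is an illustrative
# assumption, not IBM's actual implementation.

def choose_tactic(understanding_confidence: float, threshold: float = 0.7) -> str:
    """Pick a rebuttal strategy based on how well the system
    believes it parsed the opponent's point."""
    if understanding_confidence >= threshold:
        # Confident it understood the point: rebut it head-on.
        return "direct_rebuttal"
    # Not confident: fall back to a generally convincing argument
    # for its side that doesn't engage the opponent's framework.
    return "general_argument"

# The space-exploration exchange, in these terms: the system likely
# had low confidence that it understood the "two criteria" framework,
# so it argued that space exploration pays for itself instead.
print(choose_tactic(0.4))  # general_argument
print(choose_tactic(0.9))  # direct_rebuttal
```

The unsettling part is that, from the audience's seat, both branches can look like deliberate rhetorical strategy.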
The point of this experiment wasn't to make me think that I couldn't trust that a computer is arguing in good faith, though it very much did that. No, the point is that IBM is showing off that it can train AI in new areas of research that could eventually be useful in real, practical contexts.
The first is parsing a lot of information in a decision-making context. The same technology that can read a corpus of data and come up with a bunch of pros and cons for a debate could be (and has been) used to decide whether or not a stock might be worth investing in. IBM's system didn't make the value judgment, but it did provide a bunch of information to the bank showing both sides of a debate about the stock.
As for the debating part, Welser says that it "helps us understand how language is used," by teaching a system to work in a rhetorical context that's more nuanced than the usual "Hey Google, give me this piece of information and turn off my lights." Perhaps it might someday help a lawyer structure their arguments, "not that Project Debater would make a very good lawyer," he joked. Another IBM researcher suggested that this technology could help judge fake news.
How close is this to being something IBM turns into a product? "This is still a research-level project," Welser says, though "the technologies underneath it right now" are already beginning to be used in IBM projects.
In the second debate, about telemedicine, Project Debater once again had a difficult time parsing the exact nuance of its human opponent's point about how important the human touch is in diagnosis. Rather than discuss that, it fell back to a broader argument, suggesting that maybe the human was just afraid of new innovations.
"I am a true believer in the power of technology," quipped the AI, "as I should be."