Is a Picture Worth 1,000 Polls?
Bloomberg View, June 25, 2014
President Barack Obama’s poll numbers have been terrible lately, hitting a low of 41 percent approval in the most recent Gallup and Wall Street Journal-NBC polls. That’s hardly surprising, given the state of his foreign policy. But there’s hope.
Whoever selected the photos for Monday’s big New York Times story on the “achingly ephemeral” end of U.S. involvement in Iraq may have done the president a big favor. Presumably intended as irony, the images of Obama from December 2011, including one of him amid an enthusiastic crowd of soldiers, all make him look good. Although undercut by a more up-to-date photo of a downcast president, the upbeat images also outnumber it three-to-one.
New research suggests that positive images in the New York Times portend better poll numbers to come. The same is true of photos in the Washington Post, whose recent choices portray the president looking authoritative or family-friendly. Unlike presidential photos carried by Reuters and the Guardian, where positive images hew closely to current public opinion, the images in the two U.S. papers seem to predict -- or to influence -- future attitudes.
That’s one conclusion of work presented by four researchers from the University of California at Los Angeles at this week’s IEEE Computer Vision and Pattern Recognition conference. By wedding computer vision to media analysis, the team of three computer scientists and a communications professor created a way of analyzing huge numbers of political images to determine their likely effects on audiences.
Images, they note, are especially persuasive. “Because we believe our own eyes, but know well that people are manipulative, we tend to be verbally skeptical and visually gullible,” write Jungseock Joo, Weixin Li, Francis F. Steen, and Song-Chun Zhu. There’s ample research, much of it about television, suggesting that viewers recall images more than verbal messages.
Up till now, analyzing media images and their likely effects has required tedious hand coding, limiting researchers to small numbers even as the volume of images seen by audiences has exploded. “The contribution of our paper is to quantify this emotional signal so that it can become machine-interpretable, and to scale up to deal with dozens of thousands of photographs or even videos,” Joo says in an e-mail.
Here’s how they did it.
The researchers first identified nine common, objective characteristics of politician photos, which they call “syntactical attributes.” These include facial expressions, such as a smile; gestures, such as waving; and the scene’s context, such as a crowd. To see how these objective features might affect a photo’s subjective meaning, they collected 1,124 images of eight U.S. politicians from various news sites and recorded their syntactical attributes.
From these coded images, they generated about 4,000 random pairs of photos, each showing the same person. They then asked 10 student volunteers to evaluate the pairs, indicating which of the two photos ranked higher on a given quality, or “intent”: happy, energetic, powerful, trustworthy, angry, fearful, competent, comforting and favorable.
So, for instance, a volunteer might see a photo of Hillary Clinton whispering in the ear of Senator Charles Schumer and another of her waving and be asked, “In which image does Hillary Clinton look more COMPETENT?” Comparing different photos of the same person isolated the images’ composition from both the volunteers’ political views and the unchanging characteristics of the person, such as gender or big lips. Each photo showed up in multiple pairs, allowing researchers to see how it stacked up generally on each characteristic, including overall favorability.
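For readers who want to see the idea concretely, here is a minimal sketch of how pairwise judgments can be turned into a per-photo score. It assumes the simplest possible aggregation, a win rate, which is not necessarily what the researchers used; the photo labels and judgments are hypothetical.

```python
from collections import defaultdict

def win_rates(judgments):
    """Turn (winner, loser) judgments for one intent into a score per photo.

    The score is the fraction of comparisons a photo won. This is an
    illustrative aggregation, not the method described in the paper.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {photo: wins[photo] / appearances[photo] for photo in appearances}

# Hypothetical answers to "In which image does she look more COMPETENT?"
judgments = [
    ("waving", "whispering"),
    ("waving", "frowning"),
    ("whispering", "frowning"),
]
scores = win_rates(judgments)
for photo, rate in sorted(scores.items(), key=lambda item: -item[1]):
    print(photo, rate)
```

Because every pair shows the same person, the scores compare photos, not politicians.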
Using this information, the researchers created a mathematical model mapping the relationship between the photos’ syntactical attributes and their inferred intents. “One can say, for example,” they write, “perception of competence arises from combination of facial displays (smile), gestures (hand wave, hand shake), and scene context (large-crowd).” With that information, they could give the computer a new photo of a politician to analyze, looking only at its syntactical attributes, and rank it for its likely effect.
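A rough sketch of what such a mapping could look like: each photo is reduced to a few yes-or-no attributes, and a set of weights is learned so that the photo volunteers preferred scores higher than the one they passed over. The three attributes, the training pairs and the simple perceptron-style update are all assumptions made for illustration, not the researchers' actual model.

```python
# Hypothetical subset of the syntactical attributes, coded as 0 or 1.
ATTRIBUTES = ["smile", "hand_wave", "large_crowd"]

def score(weights, features):
    """Weighted sum of a photo's attributes for one intent."""
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Learn weights so each preferred photo outscores its rival.

    Whenever a pair is ranked wrongly (or tied), nudge the weights
    toward the preferred photo's attributes.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for winner, loser in pairs:
            if score(weights, winner) <= score(weights, loser):
                weights = [w + lr * (a - b)
                           for w, a, b in zip(weights, winner, loser)]
    return weights

# Hypothetical (preferred, rejected) pairs for the intent "competent".
pairs = [
    ([1, 1, 0], [0, 0, 0]),
    ([1, 0, 1], [0, 0, 1]),
    ([0, 1, 1], [0, 1, 0]),
]
weights = train_pairwise(pairs, len(ATTRIBUTES))

# Rank photos the volunteers never saw, using only their attributes.
new_photo = [1, 0, 1]    # smiling in front of a large crowd
plain_photo = [0, 0, 0]  # none of the three attributes present
print(score(weights, new_photo) > score(weights, plain_photo))
```

The point of the design is that once the weights exist, no human has to look at the new photo.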
That’s how they found the relation between news photos and poll results. They crawled news articles that mentioned Obama from January 2008 through September 2013 in the four sources -- two domestic and two British -- and collected about 50,000 photos of him. They then analyzed the photos for their syntactical attributes and the implied “favorable” effect and compared the patterns against aggregate poll results.
They found that the foreign sources appeared to mirror public opinion while the U.S. newspapers anticipated fluctuations. Before the 2012 election, they note, the Times and Post “appear to take the lead in showing less favorable images of the President, foreshadowing the drop in popularity that follows some weeks after the election.”
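One common way to tell "mirroring" from "anticipating" is to correlate the two series at different time offsets and see where the match is strongest. The sketch below assumes that approach and uses invented numbers; the article does not say which statistic the researchers applied.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_lag(photo_scores, poll_numbers, max_lag):
    """Correlate photo scores with polls shifted 0..max_lag periods later.

    A best lag of 0 suggests photos mirror opinion; a positive best lag
    suggests photos move first.
    """
    results = {}
    for lag in range(max_lag + 1):
        xs = photo_scores[: len(photo_scores) - lag]
        ys = poll_numbers[lag:]
        results[lag] = pearson(xs, ys)
    return max(results, key=results.get), results

# Invented weekly data: polls follow the photo scores two weeks later.
photo = [0.2, 0.5, 0.9, 0.4, 0.1, 0.6, 0.8, 0.3, 0.7, 0.5]
polls = [45, 44] + [40 + 10 * p for p in photo[:-2]]

lag, results = best_lag(photo, polls, max_lag=3)
print(lag)
```

A strong correlation at a positive lag shows only that the photos moved first. It cannot say whether they predicted the shift or helped cause it.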
The system should work for any politician seen in similar contexts, even if the person wasn’t one of the initial eight, but the initial application was to Obama. (It might not work for nonpoliticians, where the same gestures could have different meanings. “Politicians’ hugs may have different meanings from those of businesspeople like CEOs,” Joo notes.)
It’s not perfect, of course. When I asked what the system might make of Tuesday’s viral photo of Obama reaching over a sneeze guard at Chipotle, Joo admitted it would be stumped. “This is quite context sensitive,” he said, noting that “it would be trivial but pointless to train the model with an enforced prior knowledge to state ‘It is a bad behavior to reach out your arms over glasses of food bars,’ because the society has thousands of such rules and it’s almost impossible to pre-specify all of them.”