Oct 10, 2013 9:30 AM

These Guys Are Teaching Computers How to Think Like People



Each day, millions of people use Twitter, Facebook, and other social networks to air their opinions on everything from the government shutdown to the latest version of Apple's iPhone software.

For the web's biggest companies -- including not only Twitter and Facebook but Amazon and Google -- this ever-expanding online discourse is a treasure trove, a collection of personal information that can help them better understand who you are and, ultimately, get you in front of stuff you want to buy. But this is easier said than done. Their ability to mine all that data hinges on how well their computer algorithms can understand what you’re saying. And let’s face it, machines aren’t too good at that.

But a new algorithm developed at Stanford University could help change this reality, giving computers the power to more reliably interpret language. Called Neural Analysis of Sentiment -- or NaSent for short -- the algorithm seeks to improve on current methods of written language analysis by drawing inspiration from the human brain.

NaSent is part of a movement in computer science known as deep learning, a relatively new field that builds programs loosely inspired by the way the brain processes information. The movement began in the academic world, but it has since spread to web giants such as Google and Facebook.

"We see deep learning as a way to push sentiment understanding closer to human-level ability -- whereas previous models have leveled off in terms of performance," says Richard Socher, the Stanford University graduate student who developed NaSent together with artificial-intelligence researchers Chris Manning and Andrew Ng, one of the engineers behind Google's deep learning project.

The aim, Socher says, is to develop algorithms that can operate without continued help from humans. "In the past, sentiment analysis has largely focused on models that ignore word order or rely on human experts," he says. "While this works for really simple examples, it will never reach human-level understanding because word meaning changes in context and even experts cannot accurately define all the subtleties of how sentiment works. Our deep learning model solves both problems."


Currently, the most widely used methods of sentiment analysis are so-called “bag of words” models, which don’t take word order into account. They simply scan a collection of words, mark each one as positive or negative, and use the tally to estimate whether a sentence or paragraph is positive or negative overall.
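To make the limitation concrete, here is a minimal bag-of-words scorer. It is a sketch, not any company's production system: the tiny `LEXICON` and the helper names `bag_of_words_score` and `label` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {
    "good": 1, "great": 1, "funny": 1, "love": 1,
    "bad": -1, "boring": -1, "dull": -1, "hate": -1,
}

def bag_of_words_score(sentence: str) -> int:
    """Sum per-word polarities, ignoring word order entirely."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(LEXICON.get(w, 0) for w in words)

def label(score: int) -> str:
    """Turn the tally into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for s in ["The plot was good.", "The plot was not good.",
          "good, not bad", "bad, not good"]:
    print(f"{s!r}: {label(bag_of_words_score(s))}")
```

Because "not" carries no score of its own, "The plot was not good." comes out positive, and "good, not bad" and "bad, not good" contain the same words, so they get the same (neutral) score even though they mean opposite things.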

NaSent is different. It can track how the polarity of each word changes as it interacts with the words around it. That’s important because to really decipher a statement’s meaning, "you can't just look at each word on its own," says Elliot Turner, CEO of AlchemyAPI, a company that uses deep learning for sentiment analysis. "You have to meaningfully put words together into larger and larger structures."
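The idea of building meaning up from smaller structures can be sketched by scoring a sentence's parse tree from the bottom up. NaSent learns how words combine from labeled data; the sketch below instead uses two hand-written rules (a negator flips its neighbor's polarity, other pairs add up), and its lexicon, negator list, and tree format (nested pairs of words) are all hypothetical. It only shows why structure lets "not" change what "good" means.

```python
# Hypothetical toy lexicon and negator list, for illustration only.
LEXICON = {"good": 1, "funny": 1, "charming": 1, "bad": -1, "boring": -1, "dull": -1}
NEGATORS = {"not", "never"}

def sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def polarity(tree) -> int:
    """Score a binary parse tree given as nested 2-tuples of words.

    Leaves take their lexicon polarity (0 if unknown). A node whose left
    child is a negator flips the polarity of its right child; any other
    node takes the sign of its two children's combined polarity.
    """
    if isinstance(tree, str):
        return LEXICON.get(tree, 0)
    left, right = tree
    if isinstance(left, str) and left in NEGATORS:
        return -polarity(right)
    return sign(polarity(left) + polarity(right))

examples = {
    "good, not bad": ("good", ("not", "bad")),
    "bad, not good": ("bad", ("not", "good")),
    "the plot was not good": (("the", "plot"), ("was", ("not", "good"))),
}
for text, tree in examples.items():
    print(text, "->", polarity(tree))
```

Unlike the bag-of-words tally, this prints 1 for "good, not bad" and -1 for both "bad, not good" and "the plot was not good", because each phrase is scored in the context of the phrase it belongs to.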

To build NaSent, Socher and his team used 12,000 sentences taken from the movie review website Rotten Tomatoes. They split these sentences into roughly 214,000 phrases, each labeled as very negative, negative, neutral, positive, or very positive, and then trained the system on this labeled data so that NaSent could predict on its own whether sentences were positive, neutral, or negative.
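How do a few thousand sentences turn into hundreds of thousands of phrases? One plausible way, assumed here rather than taken from the article, is to treat every node of a sentence's binary parse tree as a phrase: an n-word sentence then yields 2n - 1 phrases (n single words plus n - 1 larger groupings). The sketch below enumerates those phrases and collapses the five-point scale to three classes. The function names and the tree format (nested pairs of words) are hypothetical.

```python
def leaves(tree):
    """Return the words of a binary parse tree (nested 2-tuples) in order."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def phrases(tree):
    """Return the text of every node in the tree, children before parents."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return phrases(left) + phrases(right) + [" ".join(leaves(tree))]

# Assumed mapping from the five-point labels to three coarse classes.
FIVE_TO_THREE = {
    "very negative": "negative", "negative": "negative",
    "neutral": "neutral",
    "positive": "positive", "very positive": "positive",
}

def to_three_class(label: str) -> str:
    """Collapse a five-point sentiment label to negative/neutral/positive."""
    return FIVE_TO_THREE[label]

print(phrases(("not", ("very", "funny"))))
print(to_three_class("very positive"))
```

For the three-word tree above, this yields five phrases: "not", "very", "funny", "very funny", and "not very funny". Each of them could carry its own label, which is what lets a model learn how phrases combine.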

NaSent, the researchers say, was about 85 percent accurate, an improvement over the 80 percent accuracy of previous models. The system isn't yet licensed to outside organizations, but the team has been contacted by "a few startups" that are interested in using it, according to Socher.

Despite those promising early tests, the algorithm still has a ways to go. It gets tripped up, for instance, when it sees words and phrases it has never encountered before. To make the system more robust, Socher and his team have started feeding it more data from Twitter and the Internet Movie Database. They’ve also set up a live demo where people can type in their own sentences. The demo builds a tree structure for each sentence and assigns a polarity label to each word and phrase in it. If users think NaSent is misinterpreting a particular word or phrase, they can relabel it. In just a few weeks, the demo has received 14,000 unique visitors.
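The two fixes described above can be sketched in a few lines: mapping never-before-seen words to a shared placeholder, and keeping user relabelings so they can be folded into the next round of training. This is a hypothetical illustration, not the demo's actual code. The `<unk>` token, the `CorrectionLog` class, and the other names are assumptions, and the label set simply reuses the five-point scale from the training data.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words never seen in training
LABELS = {"very negative", "negative", "neutral", "positive", "very positive"}

def build_vocab(sentences, min_count=1):
    """Collect every word seen at least min_count times in the training sentences."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return {w for w, c in counts.items() if c >= min_count}

def map_unknown(words, vocab):
    """Lowercase each word and replace out-of-vocabulary words with UNK."""
    return [w.lower() if w.lower() in vocab else UNK for w in words]

class CorrectionLog:
    """Keep user relabelings so they can be added to the next round of training."""

    def __init__(self):
        self._labels = {}

    def relabel(self, phrase, label):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        # A later correction for the same phrase replaces an earlier one.
        self._labels[phrase.lower()] = label

    def training_examples(self):
        """Return (phrase, label) pairs sorted by phrase."""
        return sorted(self._labels.items())

vocab = build_vocab(["a funny film", "a dull film"])
print(map_unknown("a zany film".split(), vocab))

log = CorrectionLog()
log.relabel("not bad", "negative")
log.relabel("not bad", "positive")
print(log.training_examples())
```

Here "zany" never appeared in the training sentences, so it becomes `<unk>`, and the second correction for "not bad" overrides the first. A real system would need a smarter way to handle unknown words than a single placeholder, which is part of why more training data helps.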

"People are nice enough to teach it new things, to tell it when it’s incorrect or not," Socher says. "The beauty of giving a live demo is that people are trying to break it. They’re pushing the limits on this and giving us new training data. That helps the model."