Voice Control Will Force an Overhaul of the Whole Internet

The data centers of today, the infrastructure that forms the physical foundation of the "cloud," aren't equipped to process the kind of data demanded by voice control.

Jason Mars built his own Siri and then he gave it away.

Mars is a professor of computer science at the University of Michigan. Working alongside several other university researchers, he recently built a digital assistant that could instantly respond to voice commands---much like Siri, the talking assistant offered on the Apple iPhone. Then he open sourced the thing, freely sharing the underlying code with the world at large.

Known as Sirius, the project is a way for all sorts of other software coders to explore the complexities of modern speech recognition, and perhaps even add speech recognition to their own mobile apps. This, Jason Mars realizes, is where the world is moving.

But the project has another aim. Mars also realizes that the massive computing centers that underpin today's internet are ill-equipped for the coming voice revolution, and with his project, he hopes to show how these facilities must change. "We want to understand how future data centers should be built," he says.

You see, digital assistants like Siri and Google Now and Microsoft Cortana don't just run on your phone. They run across thousands of machines packed into these enormous computing centers, and as we extend such services to more and more people across the globe, we can't just run them on ordinary machines. That would take up far too much space and burn far too much energy. We need hardware that's significantly more efficient.

With their open source project, Mars and his colleagues, including a Michigan PhD student named Yunqi Zhang, can show how a tool like Siri operates inside the data center, and ultimately, they aim to identify the hardware best suited to running this kind of voice service---not to mention all the other artificially intelligent tools poised to remake the internet, from face recognition tools to self-driving cars.

Dwarfing Google Search

In testing Sirius, Mars has already shown that if you run the service on traditional hardware, it requires about 168 times more machines, space, and power than a text-based search engine a la Google Search. When you consider that voice recognition is the future not only of mobile phones but of the ever-growing array of wearable devices, from the Apple Watch on down, that's completely impractical. "We're going to hit a wall," Mars says. Data centers don't just take up space. They don't just cost enormous amounts of money to build. They burn enormous amounts of energy---and that costs even more money.
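To get a feel for the scale, here's a back-of-the-envelope sketch in Python. Only the 168x ratio comes from the Michigan tests; the baseline fleet size is a made-up number for illustration.

    # Back-of-the-envelope scaling. Only the ~168x ratio comes from the
    # Sirius tests; the 1,000-server text-search fleet is hypothetical.
    TEXT_SEARCH_SERVERS = 1_000
    VOICE_MULTIPLIER = 168

    voice_servers = TEXT_SEARCH_SERVERS * VOICE_MULTIPLIER
    print(f"Machines needed for voice at the same query volume: {voice_servers:,}")
    # -> 168,000 machines, with space and power bills to match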

The big question is: What hardware will replace the traditional gear? It's a question that will affect not only the Apples and the Googles and the Microsofts and so many other app makers, but also the companies that sell data center hardware, most notably big-name chip makers like Intel and AMD. "We're all over this," says Mark Papermaster, AMD's chief technology officer. "It's huge for us and our future."

Ultimately, that's why Mars is running his Sirius project. The Apples and the Googles and the Microsofts know how this new breed of service operates, but the rest of the world doesn't. And it needs to.

A Parallel Universe

Most web services, from Google's web search engine to Facebook's social network, run on basic server chips from Intel and AMD (mostly Intel). The problem is: these CPUs (central processing units) are ill-suited to voice-recognizing services like Siri, which tend to run lots and lots of tiny calculations in parallel.

As companies like Google, Microsoft, and Chinese search giant Baidu have said, these calculations work better on simpler, less-power-hungry processors, such as the GPU (graphics processing unit) chips originally built for processing complex digital images, or on the FPGA (field-programmable gate array) chips that can be programmed for specific tasks. Google is already using GPUs to power the brain-like "neural networks" that help drive its Siri-like service, Google Now. And Microsoft is using FPGAs to drive at least part of its Bing search engine.

No, Bing doesn't do voice. But like GPUs, FPGAs improve efficiency across all sorts of sweeping web services, mainly because they don't burn as much power or take up as much space.

Basically, with GPUs and FPGAs, you can pack more chips into each machine. Though each one alone is not as powerful as a CPU, you can divide larger calculations into small pieces and spread those pieces across all of those chips. This becomes even more attractive with applications like voice recognition, which are so well suited to parallel processing. "A number of these emerging workloads require you to sift through a massive amount of information very quickly," Papermaster says. "Those, by their very nature, can be accelerated [with GPUs or FPGAs] because of the repetitive nature of the work you're doing."
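The principle is easy to sketch in plain Python, using a pool of worker processes as a stand-in for the many simple cores on a GPU or FPGA. The workload here is invented; the point is the shape of the computation: many small, independent, repetitive pieces.

    # A minimal sketch of data-parallel work: split one big, repetitive
    # calculation into small independent chunks and fan them out across
    # many workers, the way a GPU spreads work across its cores.
    from multiprocessing import Pool

    def score_chunk(chunk):
        # Stand-in for a small repetitive step, such as scoring one
        # slice of audio features against an acoustic model.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        samples = list(range(1_000_000))
        chunks = [samples[i:i + 10_000] for i in range(0, len(samples), 10_000)]
        with Pool() as pool:  # one worker per available core
            partials = pool.map(score_chunk, chunks)
        print(sum(partials))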

GPUs are now the chips of choice not only for voice recognition, but for all sorts of other services based on neural networks. These "deep learning" tools drive everything from the face recognition services on Google+ and Facebook to the ad targeting tech on the Baidu search engine, and eventually, they'll help power self-driving cars and other robotics. Jeff Dean, who helps oversee much of the deep learning work at Google, says the company uses a blend of GPUs and CPUs to run the neural networks that now help power about 50 different Google web services.

But as Microsoft has shown, FPGAs provide another option. With his open source digital assistant, Jason Mars---who has long explored modern data center architecture, at Michigan as well as the University of California, San Diego---seeks to determine which is the best option for our future internet services.

Beyond Apple and Google

The answer is still unclear. But with Sirius, Mars has at least shown that GPUs and FPGAs are much better options than what a company like Intel offers today. "It's going to be absolutely critical that future data center designs include GPUs or FPGAs," Mars says. "You can get at least an order-of-magnitude improvement."

Because you can program them to do whatever you want, he says, FPGAs are potentially much more efficient than GPUs (according to the University of Michigan tests, FPGAs can provide about 16 times the performance of a standard CPU, versus about ten times for GPUs). But they require more design work. Companies like Google and Apple and Microsoft must hire engineers who can program them.
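Plugging those figures into the earlier arithmetic shows what's at stake. Again, only the speedups come from the Michigan tests; the CPU fleet size is the hypothetical one from the sketch above.

    # The ~16x (FPGA) and ~10x (GPU) speedups are the Michigan figures;
    # the 168,000-machine CPU fleet is the hypothetical one from above.
    CPU_ONLY_SERVERS = 168_000
    FPGA_SPEEDUP = 16
    GPU_SPEEDUP = 10

    print(f"With FPGAs: ~{CPU_ONLY_SERVERS // FPGA_SPEEDUP:,} machines")
    print(f"With GPUs:  ~{CPU_ONLY_SERVERS // GPU_SPEEDUP:,} machines")
    # -> roughly 10,500 vs. 16,800: still big, but an order of
    #    magnitude smaller than the CPU-only fleet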

GPUs also require some extra work. As with FPGAs, you must tailor your software to these particular chips. But you needn't program the chips themselves. And for that reason, GPUs could be the more viable option, especially when you consider that voice recognition tools will eventually move beyond the Apples and the Googles and the Microsofts, into companies far less willing to hire their own chip engineers.
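To see the difference in practice, consider a hedged sketch: with a library like CuPy (our illustrative choice; the article doesn't name one), GPU code looks almost exactly like ordinary NumPy code, while the chip itself stays untouched. An FPGA, by contrast, would require describing the circuit itself in a hardware language.

    # Tailoring software to the chip without designing hardware. CuPy is
    # our illustrative choice: it mirrors NumPy's API but runs on a GPU.
    import numpy as np
    import cupy as cp

    features = np.random.rand(4096, 512).astype(np.float32)
    weights = np.random.rand(512, 256).astype(np.float32)

    gpu_result = cp.asarray(features) @ cp.asarray(weights)  # computed on the GPU
    host_result = cp.asnumpy(gpu_result)                     # copied back to the CPU
    print(host_result.shape)  # (4096, 256)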

"Siri and Cortana and Google Now---and even more advanced applications that deal with data analytics and process video in real time and give you personalized suggestions---is where our technology is going, where industry is going," Mars says.

However all this plays out, it will reshape the world of computer processors. Intel is already exploring FPGAs. GPU maker Nvidia is riding the deep learning wave to new heights. And AMD, which bought GPU maker ATI years ago, is pushing even further into the field. As Papermaster says, the company is now working with others across the industry to build tools that will let coders more easily write software for GPUs.

When you consider that many of the big internet companies, including Facebook and Microsoft, are also exploring the use of low-powered ARM chips inside their data centers, the chip market is poised for an enormous shift over the next several years. Jason Mars and his Sirius project aim to show what this shift will look like. But Sirius could also feed it. After all, if everyone runs their own Siri, they'll need their own chips.