The $11M Tool That Could Help Computers Write Their Own Code

A group of computer scientists backed by DARPA wants to create an autocomplete and autocorrect system for writing code.

Nowadays, if you start typing something into Google, it tries to guess what you're looking for. Type "Wi," and it might suggest "Wikipedia." Key in "Bra," and it'll guess "Brad Pitt." Yes, these "autocomplete" suggestions are sometimes hilariously off the mark, but more often than not, they're rather accurate, providing a handy shortcut to what you want.

Now, a government-backed research team wants to provide similar suggestions to the world's programmers as they're writing computer code. That's right: the aim is to guess what programmers are coding before they code it.

This week, Rice University said that Darpa, the Pentagon's mad science division, has invested $11 million in this autocomplete programming project, dubbed PLINY after the ancient Roman author of the first encyclopedia. "Text search prediction is the best analogy," says Vivek Sarkar, the chair of the computer science department at Rice and the principal investigator on the project. "People will be able to pick from a list of possible solutions."

The project involves researchers from Rice, the University of Texas-Austin, the University of Wisconsin-Madison, and the developer tools company GrammaTech. PLINY will index massive amounts of open source code gathered from the web to power a prediction engine that the researchers hope will be able to predict what coders are about to type. It could also, in theory, spot bugs or security vulnerabilities.
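To make the idea concrete, here is a toy sketch of prediction driven by an index of existing code. It is not how PLINY works; it is the simplest possible stand-in, which counts which token most often follows another in a tiny made-up corpus. The corpus and the names `tokenize`, `build_index`, and `suggest` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for indexed open source code.
CORPUS = [
    "for i in range(n):",
    "for i in range(len(items)):",
    "for item in items:",
    "if x in items:",
]

def tokenize(line):
    # Split a line into identifiers, numbers, and single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", line)

def build_index(lines):
    # Map each token to a count of the tokens seen immediately after it.
    index = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for current, following in zip(tokens, tokens[1:]):
            index[current][following] += 1
    return index

def suggest(index, token, k=3):
    # Return up to k of the most frequent followers of `token`.
    followers = index.get(token, Counter())
    return [candidate for candidate, _ in followers.most_common(k)]

INDEX = build_index(CORPUS)
print(suggest(INDEX, "for"))
```

A system like the one Sarkar describes would have to go well beyond this kind of raw text counting, which is exactly the limitation he attributes to existing tools.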

If successful, PLINY could be a boon to companies struggling to find enough qualified programmers to work on increasingly complex software projects. It's a problem a growing number of startups are trying to solve, ranging from code education companies like Codecademy to tools like Light Table that aim to make programming more intuitive.

Microsoft and Beyond

PLINY isn't the first attempt to build an autocomplete system for coders. Microsoft is working on something similar with its Bing Developer Assistant, which was released last summer. But Sarkar says PLINY is an even more ambitious project. "Most others are just text analysis with some knowledge of code structure," he says.

Sarkar's team is trying to develop software that analyzes not only text, but also the concepts expressed in code, regardless of the programming language it's written in. Sarkar hopes this will enable PLINY to suggest even large chunks of code that can seamlessly integrate with what a developer has already written. Better still, it might correct security vulnerabilities and other mistakes.

The rub is that this isn't exactly easy. If you've ever struggled with Microsoft Office's old Clippy tool, or paged through the Damn You Autocorrect blog, you know how difficult it can be to get these sorts of predictive systems right. And while Google is able to predict your searches in part by looking at what the most common search terms are, the programming world is a little different. The most common solutions might not be the best solutions.

Sarkar admits the team will face big challenges, particularly in ensuring high-quality code and in usability. But he thinks his team is uniquely suited to the challenge, thanks to its background doing big data analysis for other applications in the energy sector and health care. He says Rice has been wanting to apply some of its machine learning algorithms to software development for years. Darpa has now given it the means to do so.

Pooling Open Source

The PLINY team will begin by analyzing open source code from around the web, drawing on code hosting services like GitHub and Sourceforge, along with various major open source projects, such as those managed by the Apache Foundation. Eventually, though, Sarkar envisions a corporate version that will index all of a company's own proprietary software projects.

The team is also building a custom database system designed specifically for storing and analyzing code. The new database will give the team ways to structure and prioritize the code it indexes. This could help with the code quality issue. Projects known for exceptionally good code could be prioritized, or code written by specific people could be given preference.
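As a rough illustration of what prioritizing might mean, the sketch below ranks candidate suggestions by a weighted vote rather than by raw popularity. The snippets, project names, and weights are invented for the example, and nothing here reflects the actual design of PLINY's database.

```python
from collections import defaultdict

# Hypothetical indexed snippets: (suggestion, project it came from).
SNIPPETS = [
    ("data = open(path).read()", "hobby_repo_a"),
    ("data = open(path).read()", "hobby_repo_b"),
    ("data = open(path).read()", "hobby_repo_c"),
    ("with open(path) as f: data = f.read()", "well_reviewed_project"),
]

# Hypothetical quality weights; projects not listed count as 1.0.
PROJECT_WEIGHT = {"well_reviewed_project": 5.0}

def rank(snippets, weights, default=1.0):
    # Sum the weight of every project in which each suggestion appears,
    # then order suggestions from highest to lowest total.
    scores = defaultdict(float)
    for suggestion, project in snippets:
        scores[suggestion] += weights.get(project, default)
    return sorted(scores, key=lambda s: scores[s], reverse=True)

by_popularity = rank(SNIPPETS, {})            # every project counts equally
by_quality = rank(SNIPPETS, PROJECT_WEIGHT)   # trusted project counts more
print(by_popularity[0])
print(by_quality[0])
```

With equal weights the most common pattern wins; once the trusted project is weighted more heavily, its less common pattern rises to the top. Deciding who assigns those weights, and on what evidence, is the hard part the sketch leaves out.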

The end result could be something that looks an awful lot like Google's autocomplete, only more useful.