The Best Infographic for Comparing Rapper Vocabularies, From Wu-Tang to Yeezy

In partnership with RapGenius, Matt Daniels analyzed the vocabularies of noted wordsmiths William Shakespeare, Herman Melville, and Lil Wayne.

When most people get into a new musical artist, they pop on a pair of headphones and bob their heads. Matt Daniels, a digital strategist at marketing firm Undercurrent, busts out natural language processing software, interrogates the artist's lyrics like a forensic linguist, and creates eye-catching infographics to display the results.

In partnership with RapGenius, Daniels analyzed the vocabularies of noted wordsmiths William Shakespeare, Herman Melville, and Lil Wayne. All told, Daniels analyzed 100 hip hop stars, from Aesop Rock to Jay-Z, taking a sample of 35,000 words from each artist's first five to seven albums, a decision made to level the field between OGs and breakout stars. As a control, he analyzed the first 5,000 words from each of seven Shakespeare plays, 35,000 words in total, and the first 35,000 words of Moby Dick. Daniels used a process called token analysis and the Natural Language Toolkit (NLTK), a software library, to tally the number of unique words each artist used.
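For readers curious about what that tally involves, here is a minimal sketch of a unique word count over a fixed-size sample. It is an illustration only, not Daniels' actual code: the function name `unique_word_count`, the lowercasing step, and the simple regular-expression tokenizer are all assumptions, since the article does not say how the lyrics were split or normalized. An NLTK tokenizer could be swapped in for the regular expression.

```python
import re


def unique_word_count(text: str, sample_size: int = 35_000) -> int:
    """Count the distinct tokens among the first `sample_size` tokens of `text`.

    Hypothetical helper: the tokenizing rule below is an assumption,
    not the method used in the original analysis.
    """
    # Lowercase the text, then treat each run of letters, digits, or
    # apostrophes as one token.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Keep only the first `sample_size` tokens so every artist is
    # measured over the same number of words.
    sample = tokens[:sample_size]
    # A set discards repeats, leaving one entry per distinct word.
    return len(set(sample))


if __name__ == "__main__":
    line = "the cat sat on the mat, the end"
    print(unique_word_count(line))                 # all eight tokens
    print(unique_word_count(line, sample_size=5))  # first five tokens only
```

Capping every artist at the same sample size is what makes the counts comparable: without it, anyone with a longer catalog would rack up more unique words simply by having said more.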

Daniels originally published the results of this hip hop inquiry on his website, which armed nerds with a treasure trove of data to share and debate on social media. Pop Chart Lab reached out to Daniels and overhauled the data set with color-coded headshots of each star, transforming a humble meme into a print suitable for hanging over Russell Simmons' 16-foot-tall fireplace.

Judging Epic Rap Battles With Data

Aesop Rock is a star on various subreddits, but the under-the-radar rapper is the undisputed champion of verbosity. "Aesop Rock uses so many unique words in his music that it's hard to design an x-axis that will fit his data-point," says Daniels.

The Wu-Tang Clan benefits greatly from economies of agglomeration. The group is in the top 10, but solo works from Ghostface, Raekwon, and Method Man are also in the top 20, with GZA ranked second overall. Daniels suggests this might have something to do with the time the group spent in the studio together sharing and expanding their vocabularies.

Some artists were able to pad their numbers by working regional vernacular into their lyrics, as with OutKast's use of "Nahmsayin" and "Ery'day." The Atlanta-based duo is also particularly inventive in its use of portmanteaus like "ATLiens" and made-up phrases such as "flawsky-wawsky."

E-40 may not be part of the hip hop pantheon like Run-DMC, Tupac, or Snoop Dogg, but his contributions to vernacular vocabulary are unmatched: he coined the phrases "all good," "pop ya collar," "you feel me," and "shizzle."

Despite his superheroic self-importance, creative genius Kanye West was bested, miraculously, by noted simpletons the Insane Clown Posse. In fact, many of the best-known artists ranked in the bottom 20%.

The fact that Jay-Z, Snoop, and Tupac rank in the bottom fifth of Daniels' rankings shows the limitations of quantifying creative endeavors. "This is not sabermetrics. It's impossible to quantify quality in hip hop, music, or art," he says. "The vocab data is just a data-point that hip hop fans can geek out about."

Still, companies like Next Big Sound are trying to create predictive algorithms that will help identify hits based on SoundCloud plays, Facebook likes, and other leading indicators. And while the vocabulary data may have limited predictive value, analyzing it retrospectively does help settle the East Coast/West Coast rivalry once and for all: the most scholarly MCs come from the Northeast.