Chief Keef is a famous rapper out of Chicago who started off making paranoia-driven gangster raps and quickly transitioned to molly-induced happy party songs after he got paid. Keef’s early stuff – BANG & Back From the Dead – is especially great as a look into the paranoia that drives divisions between these kids. These divisions are partially geographical and partially ‘merit-based’ {If you ain’t poppin pistols, I ain’t rocking wit ya}.

After a lengthy Gawker profile and two excellent tapes, Keef became thinkpiece fodder. The central thesis by outraged black people with rap/culture opinions was that “white people shouldn’t/aren’t equipped to discuss violent rap”.

“Brian “B.Dot” Miller, who is black, and an editor at Rap Radar, took Sargent to task directly, tweeting at him to “please stop writing about MY culture,” bemoaning “cultural tourists writing about the music of MY culture” and “outsiders like yourself in hipster media that get a hard-on by overanalyzing black music.”

Whatever. In the widely circulated New Republic article, a (black) blogger raised similar concerns re: white hipsters writing about Chief Keef:

“Motherfuckers see us as ONE fucking unit and THAT is what we want ‘white bloggers’ to understand. Someone sees Waka and then kills Treyvon … Y’all don’t know that fuckin’ struggle of being judged based on someone else’s actions and you NEVER will … You will never understand. Never feel the pain, shame, guilt … You get to be just you. But in America no matter how hard I try someone is ALWAYS judgin based on my skin and when the Chief Keefs appear, people are thinking OMG look at what years of oppression and demoralization have done to a group. They think: niggers.”

The guy who killed Treyvon[sic] probably hated black people way before Flockaveli dropped. But, whatever. Both of these people are implicitly assuming that (white) bloggers *made* Chief Keef. That the sustained interest in Keef’s work was due to white people that write on the internet. Without the interest of these *bloggers*, Keef would not matter. The fear being, I assume, that the *power* held by these White Bloggers could create an ecosystem of similar rappers “without artistic merit” (representing the worst of the worst of black culture) being given a platform. The truth is that Keef built his fanbase in a wholly organic way, by attracting listeners similar to him, and that white bloggers DO NOT matter in terms of creating a “street rapper” like Chief Keef.

I believe that people who comment on YouTube videos, good or bad, provide a solid, measurable way to understand content. Social media data mining may or may not be a bullshit thing to study but I think if the results (especially ones at either extreme) make sense and pass the ‘eye test’, it’s probably something worth exploring further. One thing that I think makes sense to look at, especially for videos from ‘street’ rappers, is to see if the people commenting type in some sort of unique, measurable way.

Ebonics is a rule-governed language that can sometimes be studied on paper. For example, something like G-Dropping can be looked at in text. Another rule of ‘Ebonics’ has to do with word-initial fricatives. This is a fancy way to say that words like {This} get pronounced {dis} in spoken language. And sometimes, because this is a case where the spelling follows the sound, people will actually write {dis}.

Writing a little script to extract comments from a YouTube video, we can find how often users use words like [da, dis, dat] instead of [the, this, that]. It ends up working really well to distinguish artists, stylistically. The chart below shows rappers that have a high ‘da’ score (street rappers, generally), medium ‘da’ scores (mixed fan bases) and low ‘da’ scores (generally backpackers or ‘barely rap’ bros):

| High /da/ | Medium /da/ | Low /da/ |
| --- | --- | --- |
| Soulja Slim | GZA | Ras Kass |
| Guerilla Maab | Esham | Atmosphere |
| Beanie Sigel | Missy Elliott | Beastie Boys |
| Eightball & MJG | RZA | Brother Ali |
| Pastor Troy | Ghostface | Lupe Fiasco |

Stylistically, the High /da/ guys seem to be similar and might be classified under the umbrella term ‘street rapper’, although maybe unfairly. It turns out that as a quick separator of styles this metric works really well. It relies on a phonological rule of AAVE. It probably is ‘wrong’ to call the [th] -> [d] phenomenon a “spelling mistake”. This is a systematic rule of AAVE (just like any other phonological rule in any language) that in this instance shows up in writing, which makes it possible to data mine.
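The post doesn't spell out the exact formula for the /da/ score, so here is a minimal sketch of one way it could be computed. The assumption is that the score is the share of [da, dis, dat] among all occurrences of those words plus [the, this, that]. The function name `da_score` and the tokenizing regex are made up for illustration, and the comments are assumed to already be pulled down as plain strings.

```python
import re

# Assumed word lists: the d-spellings and their standard th- counterparts.
D_FORMS = {"da", "dis", "dat"}
TH_FORMS = {"the", "this", "that"}

def da_score(comments):
    """Share of d-spellings among all d- and th-spellings (assumed definition)."""
    d_count = 0
    th_count = 0
    for comment in comments:
        # Lowercase and split into simple word tokens.
        for token in re.findall(r"[a-z']+", comment.lower()):
            if token in D_FORMS:
                d_count += 1
            elif token in TH_FORMS:
                th_count += 1
    total = d_count + th_count
    # No relevant words at all: report 0.0 rather than dividing by zero.
    return d_count / total if total else 0.0
```

Calling `da_score(["dat was da best", "this is the one"])` counts two d-spellings and two th-spellings. A real run would probably also want to filter out non-English comments, since "da" is an ordinary word in several other languages.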

If the ‘hipster media theory’ holds up and Keef’s fan base was cultivated through white people blogosphere link sharing, his initial work should NOT have a High /da/ score. However, this is not the case. The BANG mixtape has consistently High /da/ scores which indicates that it was probably kids similar to Keef that listened to him first. That while the people writing about him online now may be mostly white nerds, the people that fell in love with him initially were black kids like Chief Keef.

The High /da/ guys have a median score of ~0.15 and Low /da/ guys have a median score of ~0.01. Keef’s BANG mixtape has a weighted average /da/ score of ~0.18. This allows us to classify him as a ‘street rapper’. His song Setz Up has a /da/ score of 0.13. Looking through the responses to this song, we find this particular comment below which has an instance of /da/ AND /dat/. It also specifically explains in detail one of the gang references in the song:

A song riddled with gang stuffs appealing to kids that are hyper-aware of these references. The ‘hipster media’ didn’t make these kids care about Keef or understand these references. Most likely, Keef’s music initially represented a reality to the kids in his city.

I think it is important to relate these High /da/ scores to actual lyrical content from songs. We see that the fans’ response for High /da/ rappers seems to follow a general trend. High /da/ score rappers are all generally ‘street rappers’. We need to find a way to link the lyrical content from these songs to the particular responses. Ideally, we should find that High /da/ scores in YouTube commentary are correlated with some sort of particular word usage in songs.

There have to be certain trends in word usage that can be measured? For example, I’m sure the word {‘nigger‘} is almost exclusively limited to songs by black guys. Not sure if there are any exclusive ‘white’ words, since white artists probably don’t own any kind of similar exclusivity to lexical items.

No reason we can’t look at this scientifically. All you really have to do is get good enough datasets for ‘white’ raps and ‘black’ raps. Mathematically, of course, {Black} ∩ {White} = ∅ ⇔ One-drop Rule. So, once we have these two datasets we can run some cool machine learning algorithms to train a computer to identify specific ‘white’ and ‘black’ characteristics.
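The post doesn't say which algorithm or tool was used, so treat the following as a stand-in: a bare-bones naive Bayes text classifier with add-one smoothing, which is one common way to train a computer on two labeled piles of lyrics. The function names `train` and `classify` are hypothetical, and the labels are just whatever strings the two datasets are tagged with.

```python
import math
from collections import Counter

def train(labeled_lyrics):
    """labeled_lyrics: list of (label, text) pairs. Returns word and document counts per label."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in labeled_lyrics:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the highest naive Bayes log-probability for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the log prior for this label.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

This sketch only returns the winning label. Getting per-class scores would mean returning the per-label values instead of just the best one.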

We know from the earlier chart that Pastor Troy is a High /da/ score guy and that Atmosphere is a Low /da/ score guy. Ideally, using the text classification tool, Pastor Troy should score as more ‘black’ and less ‘white’ than Atmosphere. It turns out he does. Considerably.

With average scores:

| Artist | Black | White |
| --- | --- | --- |
| Atmosphere | 7.62 | 27.45 |
| Pastor Troy | 47.56 | 1.81 |

The data supports our intuition with regard to Pastor Troy. It seems that Pastor Troy, a High /da/ guy, also has High ‘Black’ scores and Low ‘White’ scores. Does this data extend to other ‘street’ rappers? If we use an arbitrary /da/ cutoff of 0.05 (about 25% of the songs we mined in a 1500+ song dataset score above it) we see that High /da/ scores generally correlate with Low ‘White’ scores. That is, how the fans are talking about an artist is correlated with the actual lyrical content. A pretty sweet discovery.

We see that Low ‘White’ scores (0-15) correlate with High /da/ scores (>0.05). That is, there is a 92% chance that a song with a /da/ score greater than 0.05 will have a White Score less than 15. Pretty great evidence that the way fan bases discuss a street artist is a predictor of the kind of lyrical content an artist has. Without even listening to a song, we can know what kind of song we are dealing with just by how the fans are interacting with the work.
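For anyone who wants to check a number like that 92% on their own data, here is a rough sketch of the calculation: take the songs with a /da/ score above the cutoff and see what fraction of them have a White score below 15. The function name `share_below` and the (da, white) pair layout are assumptions for illustration, not the original script.

```python
def share_below(songs, da_cutoff=0.05, white_cutoff=15.0):
    """Among songs with da > da_cutoff, the fraction whose White score is < white_cutoff.

    songs: list of (da_score, white_score) pairs, one per song.
    Returns None when no song clears the /da/ cutoff.
    """
    high_da = [white for da, white in songs if da > da_cutoff]
    if not high_da:
        return None
    low_white = sum(1 for white in high_da if white < white_cutoff)
    return low_white / len(high_da)
```

With made-up pairs like `[(0.10, 3.0), (0.20, 12.0), (0.07, 30.0), (0.01, 40.0)]`, three songs clear the /da/ cutoff and two of those have a White score under 15.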

The blogosphere simply cannot ‘break’ a street artist. Any shit-talk to the contrary is without merit.