In this post, I’m going to analyze the frequency with which 2NE1 as a group uses English and how that compares to SNSD’s English.

I’m going to be looking at the fraction of total words in English and the fraction of unique words in English. I considered 2NE1’s 2 major album releases (To Anyone from 2010 and Crush from 2014) and their 2 EPs (both titled 2NE1, from 2009 and 2011). Lyrics were taken from colorcodedlyrics.com.
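To make the two measures concrete, here is a minimal sketch of how they could be computed for one song. It assumes that a word counts as English when it is written only in ASCII letters, and that the unique rate is the number of distinct English words divided by the number of distinct words overall. Both are my assumptions for illustration, and `english_fractions` is a hypothetical helper, not the script used for this post.

```python
import re

# Assumption: a token is any run of letters (apostrophes allowed inside),
# and a token is "English" if it uses only ASCII letters. Hangul and any
# other script count as non-English.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")
ENGLISH_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def english_fractions(lyrics):
    """Return (total English fraction, unique English fraction) for one song."""
    tokens = [t.lower() for t in TOKEN_RE.findall(lyrics)]
    if not tokens:
        return 0.0, 0.0
    english = [t for t in tokens if ENGLISH_RE.fullmatch(t)]
    total_fraction = len(english) / len(tokens)
    # Unique rate: distinct English words over distinct words of any language.
    unique_fraction = len(set(english)) / len(set(tokens))
    return total_fraction, unique_fraction


# Toy line, not a real lyric: 3 of 5 words are English, 1 of 3 distinct words.
total, unique = english_fractions("yeah yeah yeah 사랑 해")
print(total, unique)
```

A repeated hook like "yeah yeah yeah" pushes the total rate up without moving the unique rate much, which is why the two measures are tracked separately.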

Here are the major results:

2NE1’s total use of English doesn’t change significantly from album to album (p-value = 0.5789), with a mean of 38.3% ± 5.47% (95% CI).

2NE1’s unique English word rate also doesn’t change from album to album (p-value = 0.1341), with a mean of 23.7% ± 4.06% (95% CI).

There isn’t a significant difference between 2NE1’s total English word rate and that of SNSD’s 4 later albums (p-value = 0.1093).

2NE1’s unique English word rate is significantly greater than the unique English word rate in SNSD’s 4 later albums (p-value = 0.0324). The absolute difference is 5.51% ± 5.03% and the relative difference is ~ 37%, between 3.4% and 80%.

| Album | Total Average | Standard Error (Total) | Unique Average | Standard Error (Unique) |
|---|---|---|---|---|
| 2NE1 (2009) | 0.354 | 0.0505 | 0.271 | 0.0478 |
| To Anyone (2010) | 0.407 | 0.0527 | 0.284 | 0.0396 |
| 2NE1 (2011) | 0.385 | 0.0814 | 0.150 | 0.0252 |
| Crush (2014) | 0.399 | 0.0360 | 0.218 | 0.0275 |

Here, error bars represent standard error of the mean. Across all 4 records, starting from the first, English makes up 38.3% ± 5.47% (95% CI) of 2NE1’s total words. A 1-way ANOVA test on the total English data fails to detect a significant difference between the albums’ means (p-value = 0.5789 > 0.05).

| Total ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | Probability |
|---|---|---|---|---|---|
| Albums | 3 | 0.04867 | 0.016224 | 0.6674 | 0.5789 |
| Residuals | 29 | 0.70492 | 0.024308 | | |
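As a quick sanity check on how the columns of an ANOVA table relate, the sketch below recomputes the mean squares and the F-value from the sums of squares and degrees of freedom in the total English table. `f_statistic` is a hypothetical helper written for this illustration. Turning F into a p-value needs the F distribution, which the Python standard library doesn't provide, so that step is left out.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) for a 1-way ANOVA."""
    ms_between = ss_between / df_between  # variation between album means
    ms_within = ss_within / df_within     # residual variation within albums
    return ms_between, ms_within, ms_between / ms_within


# Values copied from the total English ANOVA table for the 4 2NE1 records.
ms_albums, ms_resid, f_value = f_statistic(0.04867, 3, 0.70492, 29)
print(round(ms_albums, 6), round(ms_resid, 6), round(f_value, 4))
```

An F-value below 1, as here, means the album means vary less than the song-to-song scatter within albums would predict, which is why the test finds nothing.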

Turning now to 2NE1’s unique English word rate, the mean across their entire discography is 23.7% ± 4.06% (95% CI). It appears as if 2NE1 from 2011 uses sharply less unique English than the other records, but again, a 1-way ANOVA fails to support that hypothesis (p-value = 0.1341 > 0.05):

| Unique ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | Probability |
|---|---|---|---|---|---|
| Albums | 3 | 0.07128 | 0.023761 | 2.0126 | 0.1341 |
| Residuals | 29 | 0.34238 | 0.011806 | | |

So 2NE1’s English use is fairly constant. From their debut, 2NE1 has used a large fraction of English, and they achieved a reputation as one of the more international, hip-hop influenced K-pop groups. Let’s check if that reputation is justified by comparing 2NE1’s English use to that of another K-pop group that is hugely popular internationally, Girls’ Generation.

Let’s look first at total English words. Before directly comparing the 2 data sets, we need to perform Bartlett’s test to check that no album has a significantly different variance than any other. Considering the 5 Girls’ Generation full albums (Girls’ Generation, Oh!, The Boys, I Got A Boy, and Lion Heart) together with the 4 2NE1 albums, we have 8 degrees of freedom, a K-squared value of 10.649, and a p-value = 0.2224. We can’t reject the hypothesis that all 9 albums have comparable variances, so we can proceed with a 1-way ANOVA test on the 9 albums:

| 2NE1+GG Total ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | Probability |
|---|---|---|---|---|---|
| Albums | 8 | 0.66852 | 0.083566 | 3.6164 | 0.001344 |
| Residuals | 74 | 1.70993 | 0.023107 | | |
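For readers who want to see what the Bartlett check that precedes this ANOVA actually computes, here is a minimal sketch of the K-squared statistic using only the standard library. The per-song fractions in the example are made up for illustration and are not the data behind this post. The statistic would then be compared against a chi-squared distribution with (number of groups - 1) degrees of freedom, a step not shown here.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (K-squared, degrees of freedom) for Bartlett's test.

    `groups` is a list of lists, one list of per-song values per album.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    # Pooled variance: per-album variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical English fractions for three short albums (illustration only).
toy_albums = [
    [0.30, 0.42, 0.35, 0.51],
    [0.12, 0.18, 0.09, 0.15, 0.20],
    [0.40, 0.25, 0.60, 0.33],
]
k_squared, dof = bartlett_statistic(toy_albums)
print(k_squared, dof)
```

With 9 albums the degrees of freedom come out as 8, matching the value quoted above. The statistic is 0 when every album has exactly the same variance and grows as the variances diverge.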

The p-value of 0.001344 is much less than 0.05, so it’s likely that at least some of the albums’ means differ. This is to be expected, considering that in a previous post, I found that the first Girls’ Generation album has significantly less total English than later Girls’ Generation albums. The ANOVA test on all 9 albums detected the difference that we already know exists within the SNSD set, so let’s compare the later 4 SNSD albums to the 4 2NE1 albums, excluding the outlying first SNSD album.

| 2NE1+GG Total ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | Probability |
|---|---|---|---|---|---|
| Albums | 7 | 0.31132 | 0.044475 | 1.7653 | 0.1093 |
| Residuals | 66 | 1.66281 | 0.025194 | | |

Interestingly, this test fails to detect a difference within the set of 4 later SNSD albums and 4 2NE1 albums (p-value = 0.1093 > 0.05). Once the album Girls’ Generation is excluded, we can’t distinguish between 2NE1’s total use of English and SNSD’s. Let’s see if there’s a significant difference between the 2NE1 albums, which have total English fractions of 35%-40%, and the album Girls’ Generation, which has a total English fraction of 12.8% ± 3.8%.

Using the following linear combination of album means, we can compare the first Girls’ Generation album to the set of 2NE1 albums:

| Girls’ Generation | Oh! | The Boys | I Got A Boy | Lion Heart | 2NE1 (2009) | To Anyone | 2NE1 (2011) | Crush |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | -0.25 | -0.25 | -0.25 | -0.25 |

We obtain the following results (the p-value was still printed as 0 at 22 digits of precision):

| Contrast | Standard Error | Lower Bound | Upper Bound | t-value | Degrees of Freedom | Probability |
|---|---|---|---|---|---|---|
| -0.2549 | 0.05475 | -0.3694 | -0.1403 | -4.43 | 74 | 0 |

So the fraction of total English words in 2NE1’s discography is 25.5 ± 11.5 percentage points greater than in the SNSD album Girls’ Generation, a relative difference of ~200%. This is consistent with the earlier results: 2NE1 uses total English words at about the same rate as SNSD’s later albums, and those later albums use more total English than Girls’ Generation does.

Now we turn to unique English words. Again, performing Bartlett’s test on the 5 SNSD albums plus 4 2NE1 albums, we have 8 degrees of freedom, a K-squared value of 10.087, and a p-value of 0.259. We can’t reject the hypothesis that the albums have comparable variances, so a 1-way ANOVA test can be done:

| Unique SNSD+2NE1 ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | Probability |
|---|---|---|---|---|---|
| Albums | 8 | 0.27874 | 0.034842 | 3.0907 | 0.004615 |
| Residuals | 74 | 0.83421 | 0.011273 | | |

The p-value of 0.004615 is less than 0.05, so there are likely significant differences between the albums’ unique English means. Recall from before that there was no significant difference in unique English fraction between any of SNSD’s albums.

I performed pairwise t-tests to check whether there are significant differences between the means of any individual albums, and used Holm’s method to adjust the p-values for multiple comparisons. The results weren’t too enlightening: the only difference found was between the 2NE1 album that used the most unique English (To Anyone from 2010) and the SNSD album that used the least unique English (Girls’ Generation from 2007), with an adjusted p-value of 0.0026.
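Holm’s adjustment itself is simple enough to sketch in a few lines. The version below is my own minimal implementation for illustration, not the routine used for the results above, and the p-values fed to it are made up. The idea: sort the raw p-values, multiply the smallest by the number of tests, the next by one fewer, and so on, never letting an adjusted value fall below the one before it.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With 9 albums there are 36 pairwise comparisons, so the smallest raw p-value gets multiplied by 36. That is a large part of why only the most extreme pair survives the adjustment.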

To compare the overall discographies, I’m going to exclude the first SNSD album, Girls’ Generation. This makes the contrast technically easier to calculate and excludes the one album that didn’t perform internationally. The exclusion is also conservative: Girls’ Generation has the lowest rate of unique English, so leaving it out raises SNSD’s mean and shrinks the gap to 2NE1. If anything, I’m decreasing the statistical significance of the difference, if there is one.

Using the following linear combination of album means, we can compare the mean of the 4 later SNSD albums to the mean of the 4 2NE1 albums:

| Girls’ Generation | Oh! | The Boys | I Got A Boy | Lion Heart | 2NE1 (2009) | To Anyone | 2NE1 (2011) | Crush |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.25 | 0.25 | 0.25 | 0.25 | -0.25 | -0.25 | -0.25 | -0.25 |

This gives the following contrast table:

| Contrast | Standard Error | Lower Bound | Upper Bound | t-value | Degrees of Freedom | Probability |
|---|---|---|---|---|---|---|
| -0.0551 | 0.02524 | -0.1054 | -0.00475 | -2.18 | 74 | 0.0324 |
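The columns of this contrast table are tied together by two small formulas: the t-value is the contrast divided by its standard error, and the 95% bounds are the contrast plus or minus a critical t-value times the standard error. The sketch below checks that for the row above. The critical value 1.9925 (two-sided 95%, 74 degrees of freedom) is an approximation I’m hard-coding as an assumption, since the standard library has no t distribution, and `contrast_summary` is a hypothetical helper.

```python
def contrast_summary(estimate, std_error, t_critical):
    """Return (t-value, lower bound, upper bound) for a contrast estimate."""
    t_value = estimate / std_error
    half_width = t_critical * std_error
    return t_value, estimate - half_width, estimate + half_width


# Assumption: approximate two-sided 95% critical t-value for 74 degrees of freedom.
T_CRIT_74 = 1.9925

# Contrast and standard error copied from the unique English contrast table.
t_value, lower, upper = contrast_summary(-0.0551, 0.02524, T_CRIT_74)
print(round(t_value, 2), round(lower, 4), round(upper, 4))
```

The recomputed bounds agree with the table to about 3 decimal places; the small remaining gap comes from the rounded inputs. Since the whole interval sits below 0, the SNSD mean minus the 2NE1 mean is negative at the 95% level.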

There is a significant difference (p-value = 0.0324), and we can conclude that 2NE1’s fraction of unique English words is 5.51 ± 5.03 percentage points greater than SNSD’s. If we compare their unique English means (23.7% ± 4.06% for 2NE1 and 17.3% ± 1.7% for SNSD’s 4 later albums), the relative difference is ~37%, or more strictly, between 3.4% and 80%.
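The relative-difference range can be reproduced by pushing the two confidence intervals against each other: 2NE1’s low end over SNSD’s high end for the minimum, and 2NE1’s high end over SNSD’s low end for the maximum. This is a deliberately conservative bound, and treating it as the way the 3.4% and 80% figures arise is my reading of the numbers rather than a formal interval for a ratio. `relative_difference` is a hypothetical helper.

```python
def relative_difference(mean_a, ci_a, mean_b, ci_b):
    """Return (point, low, high) relative excess of A over B, as fractions."""
    point = mean_a / mean_b - 1
    low = (mean_a - ci_a) / (mean_b + ci_b) - 1   # A at its lowest, B at its highest
    high = (mean_a + ci_a) / (mean_b - ci_b) - 1  # A at its highest, B at its lowest
    return point, low, high


# Unique English means and 95% CI half-widths, in percent, from the text above.
point, low, high = relative_difference(23.7, 4.06, 17.3, 1.7)
print(round(point * 100, 1), round(low * 100, 1), round(high * 100, 1))
```

The point estimate comes out at about 37%, the lower bound at about 3.4%, and the upper bound just under 78%, which rounds to the ~80% quoted above.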

This result reveals some of the differences between 2NE1 and SNSD, without considering their musical differences. Even though English comprises about the same percentage of each group’s total lyrics, 2NE1’s greater use of unique English words reflects their more international orientation.

That’s it for this update! Next post is going to be zooming in on 2NE1, now that their discography is final :(, analyzing how the lyrics of each member of 2NE1 compare.