In this post, I’ll be analyzing how often English appears in Girls’ Generation lyrics, and how that rate has changed across their major albums. I won’t go too deeply into how English is used, a topic on which considerable scholarship exists [1], [2], [3]. Instead, I’ll discuss how much English is used.

I’ll be using two metrics: the fraction of total words, calculated by dividing the number of English words by the total number of words, and the fraction of unique words, calculated by dividing the number of unique English words by the number of unique words. Words like “yeah”, “wow”, “London”, “oh”, “la-la”, and “oops” were excluded from analysis. I also excluded songs that were entirely in English or entirely in Korean in order to focus on hybridization. Including those songs doesn’t change the overall conclusions, however. Song lyrics were taken from https://colorcodedlyrics.com/.
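
As a rough illustration, the two fractions could be computed along the following lines. This is a minimal sketch, not the script used for this post: the tokenizer, the rule that any word written in Latin letters counts as English, and the `EXCLUDED` set (built only from the examples named above) are all assumptions made for the example. It is applied to the “Baby Baby” chorus quoted later in this post.

```python
import re

# Assumption: a "word" is a run of Hangul syllables, or a run of Latin letters
# that may contain internal apostrophes or hyphens.
WORD_RE = re.compile(r"[가-힣]+|[A-Za-z]+(?:['’-][A-Za-z]+)*")

# Assumption: illustrative exclusion list taken from the examples in the text.
EXCLUDED = {"yeah", "wow", "london", "oh", "la-la", "oops"}


def is_english(word):
    """Treat any token that starts with an ASCII letter as English (a simplification)."""
    return word[0].isascii()


def english_fractions(lyrics):
    """Return (fraction of total words, fraction of unique words) that are English."""
    words = [w.lower() for w in WORD_RE.findall(lyrics)]
    words = [w for w in words if w not in EXCLUDED]
    if not words:
        return 0.0, 0.0
    english = [w for w in words if is_english(w)]
    total_fraction = len(english) / len(words)
    unique_fraction = len(set(english)) / len(set(words))
    return total_fraction, unique_fraction


chorus = """Baby baby baby 살며시 다가가
작은 목소리로 가까이
너만 들리게 말해 줄게"""

total, unique = english_fractions(chorus)
print(f"total: {total:.2f}, unique: {unique:.2f}")
```

The three repetitions of “baby” count three times toward the total fraction but only once toward the unique fraction, which is why a repeated English hook raises the first metric much more than the second.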

Introduction to SNSD

The first Girls’ Generation (or SNSD) album, also called Girls’ Generation, was released in November of 2007 and received little attention outside of Korea, where it eventually sold over 125,000 copies [4]. Already, SNSD was using significant amounts of English in their music. The album was 12.9% English in terms of total word count and 8.39% English in terms of unique word count.

This pattern of the total English word percentage being higher than the unique English word percentage persisted in every album that was studied. In most songs, the verses tend to be almost entirely Korean, telling a story with a more varied vocabulary. The choruses often have English words or phrases that are repeated multiple times while contributing few unique words, such as the chorus of “Baby Baby”:

Baby baby baby 살며시 다가가

작은 목소리로 가까이

너만 들리게 말해 줄게

Unlike the albums that SNSD would release later on, “Girls’ Generation” had no songs that were more than ~20% English by total word count or by unique word count. This changed quickly in later releases.

The next album released by GG, “Oh!”, from January 2010, propelled Girls’ Generation to stardom. The album peaked at number 1 on the South Korean Gaon Music Chart and eventually sold over 400,000 copies [5]. “Oh!” also sold 36,000 copies in Japan, marking the expanding global reach of K-pop. Girls’ Generation went on to release several albums and singles in Japanese, which sold terrifically well, but were not analyzed in this study.

“Oh!” had markedly more English lyrics than “Girls’ Generation”, with a total English word percentage of 25.4% (vs. 12.9% for “Girls’ Generation”) and a unique English word percentage of 17.8% (vs. 8.39% for “Girls’ Generation”). As discussed in the Statistics section, these increases, although large, are not statistically significant on their own. However, every subsequent SNSD album follows this pattern: more total English than “Girls’ Generation”, but not much more unique English.

While “Oh!” still had 1 Korean-only song (영원히 너와 꿈꾸고 싶다), the song “Show! Show! Show!” was 60% English by total word count and 36% English by unique word count. It even had a stanza of spoken English over a techno beat. “웃자 (Be Happy)” was 44% English by total word count and 44% English by unique word count. It opens with the verse:

(Hey boys) No more worries

Put on a smile for me

(Hey girls) No pain no gain

Is what they say, right

(Ok) No need to stress

(Ok) Brush it off your chest

Through the rain there’s a brighter day



This is a sharp break from the short repeated English phrases of Girls’ Generation.

The October 2011 album, The Boys, achieved global attention, including in the U.S. market. It sold more than 450,000 copies in Korea and over 36,000 copies in Japan [5]. Despite a heavy American promotional campaign, including a performance of the single “The Boys” on the Late Show with David Letterman beside a stunned Regis Philbin [6], only ~1,000 albums sold in the U.S., and Girls’ Generation failed to enter the American pop consciousness (Kim p. 45 [3]).

The Boys continued the heavy use of English, at approximately the same rate as Oh!. The total English word percentage increased from 25.4% to 26.1%, while the unique English word percentage decreased from 17.8% to 15.2%. Total English was still going up, but not rapidly.

It seems that SM Entertainment, the managers and creators of SNSD, decided the amount of English on Oh! was the proper blend to propel the group to success. The next 2 albums, 2013’s I Got A Boy and 2015’s Lion Heart, did not have significantly different rates of English use compared to Oh! and The Boys, and all 4 had much more English than Girls’ Generation in terms of total English word count.

Statistics

The following data was obtained by scraping lyrics off colorcodedlyrics.com and classifying each word as English, Korean, or excluded.

| Album | Total Average | Standard Error (Total) | Unique Average | Standard Error (Unique) |
|---|---|---|---|---|
| Girls’ Generation (2007) | 0.129 | 0.0282 | 0.0839 | 0.0199 |
| Oh! (2010) | 0.254 | 0.0532 | 0.178 | 0.0421 |
| The Boys (2011) | 0.261 | 0.0396 | 0.152 | 0.0312 |
| I Got A Boy (2013) | 0.296 | 0.0406 | 0.149 | 0.0371 |
| Lion Heart (2015) | 0.401 | 0.059 | 0.207 | 0.0273 |

In the bar chart of these values, error bars represent the standard error of the mean, which shows the estimated accuracy of each mean rather than the variation within each album. Looking at the blue columns, representing total English words, the latter 4 albums have much greater values than Girls’ Generation. For the orange columns, representing unique English words, values do not increase significantly from album to album, but by the time Lion Heart is released in 2015, the unique English fraction seems to have increased notably.
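
For reference, the standard error of the mean is the sample standard deviation divided by the square root of the number of songs. The sketch below shows that calculation; the per-song fractions in `example_album` are made-up numbers for illustration, not values from any SNSD album.

```python
import math
import statistics


def mean_and_standard_error(fractions):
    """Return the mean and the standard error of the mean for per-song fractions."""
    mean = statistics.mean(fractions)
    # Sample standard deviation (n - 1 in the denominator) over sqrt(n).
    sem = statistics.stdev(fractions) / math.sqrt(len(fractions))
    return mean, sem


# Hypothetical per-song English fractions for one album (illustration only).
example_album = [0.10, 0.20, 0.30, 0.40]

mean, sem = mean_and_standard_error(example_album)
print(f"mean = {mean:.3f}, standard error = {sem:.4f}")
```

Because the standard error shrinks as more songs are averaged, it describes how well the album mean is pinned down, not how much individual songs vary.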

At this point, a 1-way ANOVA test can indicate if there are significant differences between the means of each album’s English usage. First, for the unique words:

| Unique Words ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F Value | Probability |
|---|---|---|---|---|---|
| Albums | 4 | 0.08085 | 0.020213 | 1.8494 | 0.136 |
| Residual | 45 | 0.49184 | 0.01093 | | |

In this case, the p-value (0.136) is greater than 0.05, so we cannot conclude that any of the albums differ in their mean fraction of unique English words. In other words, the data give no evidence that SNSD’s rate of unique English words changed from album to album.

Looking at the 1-way ANOVA test for total words, the story is different:

| Total Words ANOVA | Degrees of Freedom | Sum of Squares | Mean Square | F Value | Probability |
|---|---|---|---|---|---|
| Albums | 4 | 0.37689 | 0.094221 | 4.2188 | 0.005516 |
| Residual | 45 | 1.00501 | 0.022333 | | |

For total words, the p-value is 0.005516, which is less than 0.05, so we reject the null hypothesis that every album has the same mean fraction of total English words. Therefore, it makes sense to continue the analysis for total words with further comparisons.
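
As a quick consistency check on the two ANOVA tables above, the F value in each is the mean square for albums divided by the residual mean square, where each mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes both F values from the reported sums of squares. It does not recompute the p-values, which would need the F distribution.

```python
def f_statistic(ss_between, df_between, ss_residual, df_residual):
    """F = (SS_between / df_between) / (SS_residual / df_residual)."""
    ms_between = ss_between / df_between
    ms_residual = ss_residual / df_residual
    return ms_between / ms_residual


# Sums of squares and degrees of freedom copied from the tables above.
f_unique = f_statistic(0.08085, 4, 0.49184, 45)
f_total = f_statistic(0.37689, 4, 1.00501, 45)

print(f"unique words: F = {f_unique:.4f}")
print(f"total words:  F = {f_total:.4f}")
```

Both results agree with the tabulated F values (1.8494 and 4.2188) up to rounding.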

The Tukey method compares all pairs of means and controls for the increased rate of false positives that comes with multiple comparisons. It allows us to find out whether any pair of SNSD albums differs significantly in total use of English.

| Total Tukey Test | P-value (adjusted) |
|---|---|
| Oh! vs. Girls’ Generation | 0.3481 |
| The Boys vs. Girls’ Generation | 0.3162 |
| I Got A Boy vs. Girls’ Generation | 0.1399 |
| Lion Heart vs. Girls’ Generation | 0.0018 |
| The Boys vs. Oh! | 0.999 |
| I Got A Boy vs. Oh! | 0.9702 |
| Lion Heart vs. Oh! | 0.1645 |
| I Got A Boy vs. The Boys | 0.986 |
| Lion Heart vs. The Boys | 0.2236 |
| Lion Heart vs. I Got A Boy | 0.5336 |

The only comparison that achieves statistical significance is the difference between Lion Heart, the most recent SNSD album, and Girls’ Generation, the first album. The difference in total English use between the two is approximately 27.2 ± 19.1 percentage points.

If we want to capture the way in which Girls’ Generation differs from the rest of the discography, we need to use contrasts. These allow us to compare Girls’ Generation to an average of other albums. I’m going to use the following contrasts, each of which compares Girls’ Generation against a cumulative average of the later albums:

| Girls’ Generation | Oh! | The Boys | I Got A Boy | Lion Heart |
|---|---|---|---|---|
| 1 | -0.5 | -0.5 | 0 | 0 |
| 1 | -0.333 | -0.333 | -0.333 | 0 |
| 1 | -0.25 | -0.25 | -0.25 | -0.25 |

The following statistics are obtained:

| | Cumulative Albums | Contrast | Standard Error | Lower Value | Upper Value | t-value | Degrees of Freedom | Probability |
|---|---|---|---|---|---|---|---|---|
| Contrast #1 | Oh! and The Boys | 0.1291 | 0.05956 | 0.009134 | 0.2491 | 2.17 | 45 | 0.0355 |
| Contrast #2 | Oh!, The Boys, and I Got A Boy | 0.1419 | 0.05684 | 0.02743 | 0.2564 | 2.5 | 45 | 0.0163 |
| Contrast #3 | Oh!, The Boys, I Got A Boy, Lion Heart | 0.1744 | 0.05505 | 0.06358 | 0.2853 | 3.17 | 45 | 0.0027 |
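
The contrast estimates can be roughly reproduced from the album means in the first table of this section. A contrast is a weighted sum of the means, and the t-value is the estimate divided by its standard error. The sketch below assumes that the reported estimates are the later-album average minus Girls’ Generation, which is the opposite sign to the weights as written. That is my reading of the positive values in the table, not something stated explicitly above. Small differences from the reported numbers come from using the rounded means.

```python
# Album means for total English words, copied from the summary table.
album_means = {
    "Girls' Generation": 0.129,
    "Oh!": 0.254,
    "The Boys": 0.261,
    "I Got A Boy": 0.296,
    "Lion Heart": 0.401,
}

# Contrast weights, in the same album order as above.
contrasts = [
    [1, -1 / 2, -1 / 2, 0, 0],
    [1, -1 / 3, -1 / 3, -1 / 3, 0],
    [1, -1 / 4, -1 / 4, -1 / 4, -1 / 4],
]


def contrast_estimate(weights, means):
    """Weighted sum of group means."""
    return sum(w * m for w, m in zip(weights, means))


means = list(album_means.values())

# Assumption: sign flipped so each value is (average of later albums) - (first album).
differences = [-contrast_estimate(w, means) for w in contrasts]

# Reported estimates and standard errors, copied from the contrast table.
reported = [0.1291, 0.1419, 0.1744]
standard_errors = [0.05956, 0.05684, 0.05505]
t_values = [est / se for est, se in zip(reported, standard_errors)]

for d, r, t in zip(differences, reported, t_values):
    print(f"recomputed = {d:.4f}, reported = {r:.4f}, t = {t:.2f}")
```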

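To correct for multiple comparisons, the Holm method with α = 0.05 is used. The p-values are sorted from smallest to largest, and the k-th smallest is compared against α / (m + 1 − k), where m = 3 is the number of contrasts: 0.0027 ≤ 0.0167, 0.0163 ≤ 0.025, and 0.0355 ≤ 0.05. Following the notation of the linked article, no k exists at which a p-value exceeds its threshold, so we reject all three null hypotheses. There is a significant difference between the total use of English words in Girls’ Generation versus the average of later albums, and the difference becomes greater with each additional record.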
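
The Holm step-down procedure is short enough to write out directly. The sketch below is a plain-Python version applied to the three contrast p-values reported above. The function name `holm_reject` is my own, and this is an illustration of the method, not the code used to produce the results in this post.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's method rejects the null hypothesis."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        threshold = alpha / (m - rank)
        if p_values[i] > threshold:
            break  # stop at the first failure; all larger p-values are kept
        reject[i] = True
    return reject


# P-values for Contrast #1, #2, and #3, copied from the table above.
contrast_p_values = [0.0355, 0.0163, 0.0027]

decisions = holm_reject(contrast_p_values)
print(decisions)
```

The loop stops at the first p-value that exceeds its threshold, so one borderline result can block every larger p-value behind it. Here that never happens, and all three contrasts are rejected.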

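To sum up, Girls’ Generation uses some English in their first album and a lot more total English words in later albums. In pairwise comparisons, only Lion Heart uses significantly more total English than the first album. The rate of unique English words does not differ significantly between albums.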

Big thanks to colorcodedlyrics.com, where all the lyrics were taken from.

Bibliography:

[1] Lee, Jamie Shinhee. “I’m the illest fucka.” English Today 23.2 (2007): 54. http://search.proquest.com/openview/bd270e6ffa8d7a115b95883688b6cfa3/1?pq-origsite=gscholar&cbl=37468

[2] Baratta, Alex. “The use of English in Korean TV drama to signal a modern identity.” English Today 30.03 (2014): 54-60. https://www.cambridge.org/core/services/aop-cambridge-core/content/view/S0266078414000297

[3] Kim, Daisy. “Reappropriating Desires in Neoliberal Societies through KPop.” (2012). http://escholarship.org/uc/item/6p04h9tf

[4] Sales of Girls’ Generation. https://web.archive.org/web/20081203032946/http://www.miak.or.kr/stat/kpop_200809.htm

[5] Top 100 albums 2010-2015. https://web.archive.org/web/20160610110640/http://theqoo.net/square/199544391

[6] Girls’ Generation on Letterman. https://www.youtube.com/watch?v=exa5-P0_uKo