The University of Waterloo is well known for its lack of social life and for how hard it is to find a romantic relationship there. Like many other Waterloo CS majors, I couldn’t find a girlfriend even if my life depended on it.

Some people feel love is unquantifiable, and that you should “just be yourself”. Well, I’m a UW Data Scientist, so I respectfully disagree. Why not learn how to find a girlfriend with…😎 machine learning?

I made an app to estimate your probability of finding a girlfriend! #innovation #sideproject #wow

Methodology

The question for this study is: which attributes tend to correlate with having a girlfriend among male Waterloo students? It is commonly assumed that having a high-paying job will make you more attractive. Physical characteristics like height and muscle may also play a role. We try to identify which attributes are the most predictive, and which are mere assumptions not supported by data.

Off the top of my head, I came up with the following attributes:

Dating (target variable): person has a girlfriend, or had one for at least 6 months over the last 5 years

International: person is an international student

CS: person majors in CS, SE, or ECE

Career: person is successful in academics and finds “good” jobs for internships

Interesting: person has interesting things to talk about

Social: person is outgoing and tries to meet new people

Confident: person appears confident

Tall: person is taller than me (>175cm)

Glasses: person wears glasses

Gym: person regularly works out at the gym or plays sports

Fashion: person cares about wearing nice clothes

Canada: person mostly lived and worked in Canada for the last 5 years

Asian: person is of East Asian ethnicity

You might notice that some of these are quite subjective: what qualifies a person as interesting? In these cases, I tried to assign 1 to about half the population, and 0 to the other half. Therefore, we’re measuring the relation between my own (biased) perception of other people’s interestingness and their ability to find a girlfriend.

Yeah, if you expected a statistically rigorous study, you can stop reading now.

To collect data, I tabulated every person I could think of and rated them either 1 or 0 on each of these attributes. The resulting dataset has N=70 rows. If you’re a guy, go to Waterloo, and have talked to me in the last 2 years, then you’re probably included.
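For readers who want to try this on their own friends, here is a minimal sketch of how such a 0/1 table could be stored and cross-tabulated against the target variable. The rows below are made-up placeholders, not the real dataset, and the helper name `contingency` is my own invention for illustration.

```python
# Placeholder rows for illustration only; these are NOT the actual data.
rows = [
    {"dating": 1, "gym": 1, "glasses": 0},
    {"dating": 0, "gym": 0, "glasses": 1},
    {"dating": 1, "gym": 0, "glasses": 0},
    {"dating": 0, "gym": 1, "glasses": 1},
    {"dating": 0, "gym": 0, "glasses": 1},
]

def contingency(rows, feature, target="dating"):
    """Count a 2x2 table as (a, b, c, d):
    a = feature and target, b = feature and not target,
    c = not feature and target, d = neither."""
    a = sum(1 for r in rows if r[feature] == 1 and r[target] == 1)
    b = sum(1 for r in rows if r[feature] == 1 and r[target] == 0)
    c = sum(1 for r in rows if r[feature] == 0 and r[target] == 1)
    d = sum(1 for r in rows if r[feature] == 0 and r[target] == 0)
    return a, b, c, d

gym_table = contingency(rows, "gym")
glasses_table = contingency(rows, "glasses")
print("gym:", gym_table)
print("glasses:", glasses_table)
```

Each explanatory variable gets its own 2x2 table like this, which is all the per-variable analysis needs.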

Analysis

First, we perform Fisher’s exact test on the target dating variable against each explanatory variable. The three variables that are the most significant are:

Gym: guys who go to the gym or play sports regularly are more than twice as likely to have a girlfriend (p-value = 0.02).

Glasses: guys who don’t wear glasses are about 70% more likely to have a girlfriend than guys who do (p-value = 0.08).

Confidence: guys who appear confident are about 70% more likely to have a girlfriend than guys who don’t (p-value = 0.09).

Muscular and confident guys are attractive, as expected. I was quite surprised by the large effect of glasses, and wondered if it was an indication of something else, like general nerdiness. So I looked for more careful studies and confirmed that indeed, the majority of people consider glasses to be unattractive for both genders.
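To make the test concrete, here is a rough sketch of a two-sided Fisher’s exact test written from scratch for the gym variable. The integer counts (14 of 22 gym-goers dating, 15 of 48 non-gym-goers dating) are my reconstruction from the rounded proportions reported in this post, so treat them as an assumption. The function name is hypothetical, and the two-sided rule used here (sum every table that is no more likely than the observed one) is one common convention, not necessarily the exact one behind the numbers above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probability of every table that is
    at most as likely as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so ties are not lost to floating-point noise.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Assumed counts: rows = gym / no gym, columns = dating / not dating.
p_gym = fisher_exact_two_sided(14, 8, 15, 33)
print(f"gym p-value: {p_gym:.3f}")
```

If SciPy is available, `scipy.stats.fisher_exact` does the same job and is the safer choice for anything beyond a toy check.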

Some variables may be slightly predictive of dating success, but it’s hard to say for sure due to small sample size:

International students have better success with dating than domestic students

Asian men have worse chances with dating than other races

Controlling for other factors, guys in CS seem not to be at a disadvantage, despite the lack of women

The rest of the variables (height, career/academics, interestingness, sociability, fashion, Canada/US) show little correlation with dating. Sorry, but even if you go to Facebook in Menlo Park in 4A, you will still not have a girlfriend.

Full results of this experiment:

Variable: international

N(international)=10, N(~international)=60

p(dating|international)=0.60, p(dating|~international)=0.38

p-value=0.299

Variable: cs

N(cs)=56, N(~cs)=14

p(dating|cs)=0.45, p(dating|~cs)=0.29

p-value=0.368

Variable: career

N(career)=46, N(~career)=24

p(dating|career)=0.43, p(dating|~career)=0.38

p-value=0.799

Variable: interesting

N(interesting)=34, N(~interesting)=36

p(dating|interesting)=0.47, p(dating|~interesting)=0.36

p-value=0.467

Variable: social

N(social)=29, N(~social)=41

p(dating|social)=0.45, p(dating|~social)=0.39

p-value=0.806

Variable: confident

N(confident)=37, N(~confident)=33

p(dating|confident)=0.51, p(dating|~confident)=0.30

p-value=0.092

Variable: tall

N(tall)=26, N(~tall)=44

p(dating|tall)=0.46, p(dating|~tall)=0.39

p-value=0.619

Variable: glasses

N(glasses)=41, N(~glasses)=29

p(dating|glasses)=0.32, p(dating|~glasses)=0.55

p-value=0.084

Variable: gym

N(gym)=22, N(~gym)=48

p(dating|gym)=0.64, p(dating|~gym)=0.31

p-value=0.018

Variable: fashion

N(fashion)=17, N(~fashion)=53

p(dating|fashion)=0.41, p(dating|~fashion)=0.42

p-value=1.000

Variable: canada

N(canada)=31, N(~canada)=39

p(dating|canada)=0.42, p(dating|~canada)=0.41

p-value=1.000

Variable: asian

N(asian)=59, N(~asian)=11

p(dating|asian)=0.37, p(dating|~asian)=0.64

p-value=0.181
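As a sanity check on the figures listed above, the sketch below turns each reported group size and rounded proportion back into whole-person counts, and computes how many times more likely each group is to be dating. Rounding `N * p` to the nearest integer is my assumption about how to invert the rounding; the variable names are mine. If the numbers are consistent, every variable should split the same total number of daters.

```python
# (N with attribute, N without, p(dating | attribute), p(dating | no attribute)),
# copied from the results listed above.
reported = {
    "international": (10, 60, 0.60, 0.38),
    "cs":            (56, 14, 0.45, 0.29),
    "career":        (46, 24, 0.43, 0.38),
    "interesting":   (34, 36, 0.47, 0.36),
    "social":        (29, 41, 0.45, 0.39),
    "confident":     (37, 33, 0.51, 0.30),
    "tall":          (26, 44, 0.46, 0.39),
    "glasses":       (41, 29, 0.32, 0.55),
    "gym":           (22, 48, 0.64, 0.31),
    "fashion":       (17, 53, 0.41, 0.42),
    "canada":        (31, 39, 0.42, 0.41),
    "asian":         (59, 11, 0.37, 0.64),
}

def reconstruct(n_yes, n_no, p_yes, p_no):
    """Assumed inverse of the rounding: nearest whole number of daters per group."""
    return round(n_yes * p_yes), round(n_no * p_no)

totals = {}
ratios = {}
for name, (n_yes, n_no, p_yes, p_no) in reported.items():
    dating_yes, dating_no = reconstruct(n_yes, n_no, p_yes, p_no)
    totals[name] = dating_yes + dating_no
    ratios[name] = p_yes / p_no  # how many times more likely with the attribute
    print(f"{name:14s} dating: {dating_yes:2d}/{n_yes:2d} vs {dating_no:2d}/{n_no:2d}"
          f"  ratio={ratios[name]:.2f}")

print("distinct totals of daters:", set(totals.values()))
```

The ratio for gym comes out just above 2, and the inverse ratio for glasses is about 1.7, which is where the “more than twice” and “about 70% more likely” statements come from.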

Next, we examine the correlations between the variates; this can help identify incorrect model assumptions. Red means positive correlation, blue means negative correlation. We only show correlations with a p-value below 0.1, so most pairs of variates are blank.

It appears that {having girlfriend, appearing confident, going to the gym, not wearing glasses} are all mutually correlated.
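For two 0/1 variables, the usual correlation coefficient reduces to the phi coefficient, which can be read straight off a 2x2 table. The sketch below assumes that this is the kind of correlation being shown (the post does not say which measure was used), and reuses counts reconstructed from the rounded proportions reported earlier. Only pairs involving the dating variable can be checked this way, since the raw rows are not published.

```python
from math import sqrt

def phi(a, b, c, d):
    """Phi coefficient (Pearson correlation of two binary variables)
    for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Assumed counts: rows = has attribute / does not, columns = dating / not dating.
phi_gym = phi(14, 8, 15, 33)       # gym vs dating
phi_glasses = phi(13, 28, 16, 13)  # glasses vs dating

print(f"gym vs dating:     {phi_gym:+.3f}")
print(f"glasses vs dating: {phi_glasses:+.3f}")
```

The signs agree with the picture described above: gym is positively correlated with dating, glasses negatively.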