All the lonely people

Each day, countless Americans pass one another by, and share a moment. Occasionally, they take a chance and post a missed connection. Here is the sum of 10,000 such stories.

New York subway ride, Derek Key

As much as I've always wanted to be the subject of someone's missed connection, I doubt I'm suited for it. Aboard public transport, my go-to facial expression is a stern, furrowed brow. "Solving the crisis in the Middle East, no doubt," my fellow passengers probably sympathize, contemplating the task of fixing the world's thorniest geopolitical dilemma from aboard the subway, and letting me be. I'm also painfully bad at eye contact with strangers — I have no issue with it myself, but having inherited a strong aversion to making others feel uncomfortable, I can't shake the apprehension that a long, hard stare will make a stranger uneasy. The chances of a missed connection occurring on buses or subways, where so many seem to kindle, are therefore slim. Not that any of this has stopped me from posting missed connections myself. I can't recall what I wrote, but after some virtual digging I found the subject lines of my posts buried years deep within my inbox. First, there was



"Girl in blue/ white floral dress on N train headed uptown at 9pm Monday" - m4w (missed connection)



Perhaps I was emboldened by the fact that I was on a trip to New York, or because she'd been looking at me for so long that a failure to reciprocate would border on discourtesy. We'd exchanged obvious glances, after which I got off at 57th Street, and resolved to "put myself out there"; "out there" was the internet, and it ended up going nowhere at all. The second,



"Cute brunette girl reading who shared my table at The Bean on Monday" m4w (missed connection)



had a wide, easy smile that left me flustered after she asked if I'd mind her sitting at my table. We'd both picked up paperbacks from The Strand (hers, I think, was a collection of Camus' plays), the much-loved bookstore two blocks south of New York's Union Square, and sat across from each other, sneaking glances and diligently re-reading the same paragraphs over and over. In the words of that great romantic, Holden Caulfield, this potent combination of coffee, good book, and attractive table-mate just about killed me, and later that evening I posted a missed connection. This, too, failed to lead to romance — either both of these girls had missed my posts, or I'd mistaken prolonged eye contact for heady, window-to-the-soul type stuff. That is, after all, how missed connections happen: strangers pass one another by, and, on the rare occasions that one feels a tinge of romance, or mistakes the other's studying a subway map behind their head for a charged and meaningful gaze, a posting goes up on Craigslist. That there exists a digital town square where lonely hearts can declare their feelings without fear of public rejection is both lucky and improbable, but the hit rate, by all accounts, is low. I've yet to hear any first-hand stories of missed connections that have resulted in anything more than a date or two before the romance petered out. Still, if it seems strange that a quirky section of a website which prides itself on an aggressively dialup-era design has gained such traction in popular culture — all in spite of the scarce likelihood of finding love — look no further than the motivations of gold miners or oil prospectors. Each successive romantic relationship is a failure until it isn't, and the lousy odds of forging a real connection don't have much impact on our inborn optimism. It may have been my own failures to connect which spurred me to take a closer look at the habits and behaviours of other posters.
So too did the odd voyeuristic appeal of the whole missed connections section, putting those private and vulnerable declarations of affection into a ruthlessly public, yet anonymous, context. Finally, I suspect that it was also rooted in my old psych grad school mentality, making the promise of a large dataset, untainted by the spectre of observer effects, too tempting to ignore. Who were these people that posted hundreds of messages each day? To see, I gathered the missed connection postings from the largest U.S. cities, and got to work. Over the course of January, I collected over 10,000 missed connections from New York, LA, Chicago, Houston, Philadelphia, Phoenix, San Antonio, San Diego, and Dallas. I analyzed the language; the people who looked, and whom they looked for; the days they posted, the words they used, their ages, and a dozen other points of comparison. And so, without further ado: the missed connections.

New York seems to be a city manufactured for missed opportunities to meet strangers. There is the pervasive reach of the subway; the relatively scarce number of drivers; the neverending throngs of people. There is, furthermore, New York's take on big-city loneliness that so many newcomers discover: long days at work, bookended by sizeable commutes that take New Yorkers to the far reaches of the vast metropolis. You won't be heading to drinks in Brooklyn if you finish work at six in midtown and catch the homeward-bound 1 train to Washington Heights, whether or not you're savvy enough to change from the express to the local at 96th Street. Indeed, of all the cities, New York had the highest number of missed connections, and initially, it seemed that more people meant more missed opportunities for romance. L.A., however, dispels that notion (mouse over the circles in the map below to see each city's population and missed connections count). The second largest city in the U.S., L.A. sits at the bottom of the tally, with only a few hundred missed connections posted during January. I should note here that, if you're anything like me, your internal sleuth rears up, and you begin to postulate that this is because Angelenos drive everywhere, and are never within a 10 ft radius of any prospective strangers. "Big if true", as the media types like to say on Twitter, but the number of drivers doesn't relate to the number of missed connections; in fact, nothing singular really seems to.

Posts and population for nine U.S. cities in January, 2015

To get a sense of the true likelihood of landing a missed connection in the cities I've mentioned, I worked out the number of postings for every 10,000 inhabitants. The news, for lonely New Yorkers, isn't especially heartening: NYC sits on the tail end of the rankings, with just over 3 missed connections per 10,000 inhabitants; Los Angeles comes dead last. The cities where you're most likely to hear from that alluring stranger are Dallas, Phoenix, and San Diego, but even then, with Dallas' first place ranking of some 12 missed connections for every ten thousand people, things are pretty bleak. Still, things could be worse: the Whitlams, an Australian indie-rock band, once pronounced, "She was one in a million, so there's five more just in New South Wales." Against that backdrop, your chances of a missed connection are orders of magnitude more promising.
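For anyone who wants to run the same per-capita arithmetic, the calculation is simply posts divided by population, scaled to 10,000 residents. What follows is a minimal Python sketch; the function name `posts_per_10k` and the two sample cities are hypothetical placeholders, not the counts behind the chart above.

```python
def posts_per_10k(posts: int, population: int) -> float:
    """Return the number of posts per 10,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return posts * 10_000 / population


# Hypothetical counts for illustration only (not the article's data).
sample_cities = {
    "City A": {"posts": 2_500, "population": 8_000_000},
    "City B": {"posts": 1_500, "population": 1_250_000},
}

# Rank cities from most to fewest posts per 10,000 residents.
ranking = sorted(
    (
        (name, posts_per_10k(city["posts"], city["population"]))
        for name, city in sample_cities.items()
    ),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, rate in ranking:
    print(f"{name}: {rate:.1f} posts per 10,000 inhabitants")
```

In this made-up example the smaller city ranks first despite having fewer posts in absolute terms, which is the same reversal the rankings above show.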

Numbers occupy an unusual position in our lives. Beneath their patina of impartiality lies the fact that the language of integers and equations can be skewed and corrupted as easily as that of nouns and adjectives — arguably more so, since so many of us claim to have no heads for math, and withdraw from critically engaging with numbers. Words, meanwhile, like tennis balls walloped by groundstrokes, are exchanged with ferocity and abandon. There is, too, the commonly-held assumption that the ability to give a mathematical description to some event or phenomenon is, by its very nature, good and necessary, and that whatever mathematical catch of the day emerges after fishing for data is worth serving: figured out some average or another? Shout it from the rooftops! Unlike thoughtless writing, which is rightly met with swift opprobrium, such thoughtless numbers often go uncontested.

Thus, I hesitated to delve into posting times. Did they mean anything? I'd had doubts. I first assumed that people would be tripping over themselves to post, raising the chances that the object of their affections would see their ad, but the more missed connections I read, the more doubt crept in. Some people, with evident exhilaration, posted minutes after the fact; others waited a day or two before giving in and throwing caution to the wind; others, years; others still, wistful decades. These latter tended to be the most affecting: a self-aware note by an English professor whose memory of a chance encounter with a woman he'd never see again remained undimmed for 40 years, or a teacher who shared an indelible kiss on a mournful day at the Coney Island aquarium many years past (Sophie Blackall has written the book on, or rather of, missed connections, illustrating dozens in sharp and charming style, and the latter post may be found therein). Eventually, I came to the conclusion that at the very least, the times should reveal whether or not people took time off from their workdays to indulge in a bit of romantic daydreaming.





Times of missed connections postings

The times and days when people post, depicted in the heat map above, suggest that they do (hover over the squares to see the average number of missed connections at that time). Throughout the U.S., the most lovelorn days seem to be Mondays, from early to late evening. There is, nevertheless, a good deal of variation from one city to another: Angelenos hardly post, and the few relative spikes in postings occur almost exclusively towards the start of the week; Houstonians, meanwhile, try their hand at romance on early Tuesday afternoons; Dallas, with the highest concentration of missed connections, has an impressive spread from Monday to Friday, with its inhabitants posting throughout the workday and late into the evening. Those solitary nighttime yearnings strike me as the most genuine and unadorned, bringing to mind the words of Philip Larkin, that eminent English chronicler of death and loneliness, who writes of waking up "in soundless dark" and thinking: "Most things may never happen: this one will, and realisation of it rages out in furnace-fear when we are caught without people or drink." Who better to reach out to, in those desperate moments, than an idealized stranger representing the sole bulwark against the hereafter?

A recent study replication suggested that while women could detect flirtation with relative accuracy, men tended to label all interactions with women as "she wants me". At the risk of making unwanted advances, I've always leaned in the opposite direction, and I had to admit that this degree of amour propre left me somewhat envious. I began to wonder whether this unwavering image of oneself as Casanova would be reflected in the posting times of men and women. As far as I could tell, it was: women tend to start slowly, leaving their posts until they clock off work (with a responsible peak around lunchtime). Men, meanwhile, seem to have little interest in workplace propriety, and begin their lovelorn postings in earnest soon after lunch is over.

Posting habits of men and women, by hour

And what of the sheer quantity of posts? Did men outnumber women as radically as my original numbers first suggested? Yes. A resounding, unequivocal, yes. Below, I've visualized the ratio of men to women in each city; I've also shown the number of posts by men against those by women. The degree to which men outnumber women on Craigslist is staggering, particularly on the west coast; maybe L.A. men are especially sensitive to the possibility of a chance encounter, or perhaps they're just generally more optimistic about all those hours in the gym paying off, but in any case Los Angeles men post 5.3 missed connections for every post made by a woman. New Yorkers, for their part, are much more egalitarian, with men's posts outnumbering those of women by a relatively modest ratio of 3:1.

Male : female ratios

In an exceptionally insightful New Yorker piece, Nick Paumgarten, discussing online dating profiles, remarks that "demonstrating the ability, and the inclination, to write well is a rough equivalent to showing up in a black Mercedes" — partly, he says, because "males know that the best way to get laid is to send messages to as many females as possible. To be efficient, they put very little work into each message and therefore pay scant attention to each woman’s profile." By this logic, I expected people wanting to impress a potential crush (only one, in this case, means that what I've dubbed Paumgarten's Lay-Maximization strategy should no longer apply) to put some hard yards into their messages; no Great American Novels, to be sure, but something witty, teasing, playful; in other words, a verbal equivalent of a black Mercedes. If you've had a chance to sample a few of the selections on offer in the Missed Connections section, you are, by now, smirking at my naïveté.
Much like real life, the board is populated by a mixture of occasional gems filled with earnest feeling and self-reflection, a mass of posts whose allure ranges from … to … on the colour scale, and a small but impassioned band of people who seek to reconnect with former neighbours from years back in hopes of finding a foot mistress, or offering themselves into indentured sexual labour. Star-crossed lovers, these are not. Below, you can find the phrases most commonly used by each group (keep in mind that in a category like Women Seeking Women, which had few posts to begin with, even an infrequently occurring set of words may get flagged if used, by chance, on a couple of occasions). The larger, light blue circles represent the men and women who post the connections; each contains two darker circles that represent whom they're looking for. The white, innermost circles indicate the most commonly used phrases.

Most commonly used phrases in missed connection posts

Spend even a brief moment on social media, and you'll quickly notice that many of the differences between men and women that have stood as established beliefs in North American society are fast eroding. Much of the time this occurs as a result of a critical reconsideration of their roots; on occasion, it is a consequence of the sheer momentum that the progressive movement has gained in recent years. That sex-based disparities in language exist — whether the result of nature or nurture — is clear enough, and I grew curious about the manner in which any such differences manifested. So, too, I wondered about the ages of the posters. Sophie Blackall, the illustrator of the missed connections book I mentioned earlier, noted that posting such ads is largely an under-35 game, and I began to wonder whether this was, indeed, the case. In the chart below, I've used the axes to depict the length of posts and the average ages of the posters.
Each of the four groups — men seeking men, men seeking women, women seeking men, and women seeking women — is represented by circles of a different colour, and can be toggled on or off by clicking on the legend in order to get a clearer picture of the spread. Hovering over any of the circles will show you all other groups in the same city. The size of each circle, as in the first chart on this page, represents the number of missed connections posted.

Lengths of posts and ages of posters across cities

While women tend to post missed connections less frequently, the posts they write are often longer than those of men. From the scatterplot above, you can see that women's posts, regardless of whom they're after or the city they're in, are more wordy. Men looking for men write the briefest messages, with straight men writing slightly more verbose ones; straight women write more still. The most, it seems, is written by women looking for other women. It's key to note, however, that whereas men are fairly consistent, regardless of their location, women in different cities can differ by vast amounts, and are much less uniform in the amount that they write. Women looking to connect with strangers also tend to be younger than their male counterparts, with mean ages in their mid- to late-20s, while men posting missed connections tend to be aged between 33 and 37. I struggle to pinpoint the reason for this discrepancy, but can offer a measure of optimism: Jonathan Soma, the esteemed head of Columbia's Data Journalism program, mapped out the number of singles in U.S. cities, and found that all through youth, until about 35, single men overwhelmingly outnumber single women. By the mid-30s, however, the balance begins to tip in men's favour, and single women begin to outnumber single men (there is, unfortunately, no data on the proportion of singles by sexual orientation, so the following qualifier applies uniquely to straight men and women). Those thirty-something men looking for women are, by the time they see that striking stranger, more likely to find her single than they would have previously. Twenty-something women looking to cement that missed connection, too, are more likely to find them available; considering the research on men's overperceptions of sexual advances, they're also unlikely to have to do much apart from looking across a crowded subway platform for this hypothetical Adonis to hop on Craigslist after he gets off the subway and check his luck. 
Thus end the numbers. Now that the means and medians have been plotted and graphed, where does a detailed analysis leave us? More confident in our cartographical knowledge of the boundless romantic landscape? I'm not sure. I do know that cynicism deepens as the years wear on, and the idyllic promise of simple, unencumbered romance begins to ebb. Numbers, as numbers in a vacuum so often do, provide cold comfort. In the end, we are altogether solipsistic accountants — whatever the romantic records of friends, or family, or other lonely souls, the sole tally that counts is our own.

Adam Phillips, a psychoanalyst and author, touched on these private balance sheets in his 2013 book, Missing out: In praise of the unlived life. "What we fantasize about, what we long for," he writes, is "...the parallel life (or lives) that never actually happened, that we lived in our minds, the wished-for life (or lives): the risks untaken and the opportunities avoided and unprovided." Little wonder that we treasure those moments of potential connection that held immeasurable promise, preserved in the amber of memory. Little wonder, too, that seeing the unlived lives of others makes our own encounters all the more vivid.

And these other lonely people, who are they? The swarms of on-their-way men and trickles of young women, all reaching for that stranger whose promise is yet unmarred by experience. They seek them out in the solitary evenings and dreary afternoons, the shy office temp and the giddy straphanger, writing of the eye contact they'd made with you last night, and doing their best to mask their bashfulness, offering an "I know it's a long shot, but..." before scraping up enough shaky courage to ask to see you over a coffee, or a drink, or something more visceral and immediate. They — the cautious, incidental, and maybe-just-this-once group of optimists, writing to their nurse, their mechanic, their childhood friend. All, like Camus's Sisyphus, offering the distinct hope that perhaps this last time will be enough.

New York subway, Kyrre Gjerstad

About me

Greetings, internet friend! My name is Ilia. I like words and data and interactive ways of telling stories. Before journalism, I used to do psych stuff (we were going pretty steady for a bit). Want to work with me? Have a fun idea for a project? Shoot the breeze about books/movies/whatever you're into over a coffee? I love all of those things! Hit me up on Twitter or email (ilia.blinderman@gmail.com)!

Methodology








There are, I think, three separate threads necessary to weave together this kind of piece. The first is getting the data; the second is depicting it in some visual form; the third is building a narrative. In this case, this also happened to be the order in which this piece progressed. To source the data, I'd collected some 12,000 missed connections over the course of January 2015 for the nine largest cities in the U.S. I'd have done the top 10, but the Bay Area wasn't aggregated in the same way the other cities were, so I omitted it for fear of screwing up its borders. I'd begun to tinker with Python's BeautifulSoup to source the posts, but soon came across Brian Abelson's terrific post on the topic (shoulders of giants, etc.). I'd considered automating this procedure, but eventually settled on running this script manually each day, because I was worried about odd errors creeping in (these did, a couple of times, and I either had to tinker with the code or deal with connection issues).

I'd collected the posts in a SQL database, which I then cracked open using R. Having spent some time in academia, I'd dealt with SAS and SPSS, and after working with R, am amazed by how intuitive and user-friendly it is. If you're to get your hands dirty with data, I can't recommend using R enough (if you're working with a dataset that requires less bush-bashing, I'd also suggest getting a hold of OpenRefine as an alternative; most of my friends prefer using the pandas Python library, but it's never felt quite as wieldy as the RStudio IDE; YMMV). I began by organizing the database into a dataframe, and cleaning the hell out of my data. There are a few great packages that I'd recommend: whenever you're dealing with multiple levels of variables interacting with each other, dplyr shows itself to be invaluable. stringr is terrific when it comes to working with text. RJSONIO comes in handy when converting data frames to .json files. Additionally, I'll note that R provided a painless way to run some exploratory analyses, both visual and numerical. All of the linguistic analyses I included were done using Python's Natural Language Toolkit, and I highly recommend the (free!) book, available here.

It is important to add a number of caveats regarding the integrity of the data before we proceed. First, it is impossible to say whether or not some people are serial missed-connections posters, thereby boosting the numbers of posts but not the number of people looking. It is also certain that a number of the posts I included in the analysis were jokes, spam, or otherwise unreflective of people's tendencies to seek genuine missed connections. Nevertheless, I would make the assumption that these formed a small portion of the corpus, and are unlikely to influence the findings. Readers would also be wise to keep in mind that these results may be influenced by the time of year, since all posts originated in January, and may not be as representative of other cities as would be ideal.

Having cleaned the data, it was time to dig into the visualization. I'd decided that I wanted to keep things as simple as possible, so settled on using D3/CSS/HTML (I'd resolved to stay away from jQuery for all the visualization work, but wanted to use tooltips on the main site, and couldn't stay mobile-friendly without it). This, for the most part, worked because I'd found some great button CSS code, which allowed me to remain lightweight. Considerations for each of the pieces of viz occurred on a case-by-case basis:

Map of posts and population - I had two goals for this map. The first was to give the audience a sense of the fact that although city populations showed noticeable variations, differences between the quantities of missed connections across cities were much less pronounced. More important, however, was to give people a sense of how little population or the number of posts actually had to do with the more important statistic — the posts per city resident. I thought that transitioning from the absolute quantity view, displayed on the map, to the ranked view coupled with the scales, would make that point easier to digest.

Heat map of posting times - Visualizing data is an exercise in judgment calls. Often, these decisions run to simple colours and sizes, but occasionally decisions at a more fundamental level arise. In this case, I was concerned with balancing the details of the visualization with the broader point I was trying to communicate (i.e., should I include a heat map of posts across every day during the month, or is this too granular a portrayal for readers to come away with a broader message?). I had settled on a time period of a week, which seemed to strike a good balance between detail, providing a close look at what happens, on average, during each hour of the day, and a broader message.

Male and female posting habits, per hour - Bar graphs are one of the most effective ways to communicate information visually, and this seemed to be a succinct way to demonstrate a difference between two broad sets of habits: that of men, to post quite steadily during waking hours, and that of women, which consisted of a short burst of activity during the day, and a steady increase in posts until bedtime.

Male to female ratio - Divergent bar graphs are underrated when it comes to contrasting differences between data points across a single variable type. In this case, it seemed to be a good method of showing the tremendous disparities in ratios.

Zoom-circles - If there's an ideal way to visualize words, I've yet to come across it. Mike Bostock, the almighty yahweh of D3, has created what seems to me to be one of the better ways, using concentric circles. In this case, my data fit the format, with the lowest levels containing the most commonly used phrases, and the top forming the groups. There is, admittedly, an issue with logical consistency: the top layers represent categories, and the lowermost level allows readers to view the phrases within each category. Although in the original the circle size is representative of some other variable, I didn't include it here; I may add this a little later.

Scatterplot - Scatterplots are, perhaps, the best way to represent large numbers of characteristics for data points simultaneously. In this case, I used the axes to depict the mean ages of the posters (no real outliers here, so mean seemed appropriate) and the median lengths of their posts (some posts were exceedingly long, and I used the median because I didn't want outliers to skew the figures); I'd used colour to depict the various groups of posters and whom they sought; circle size to demonstrate the absolute number of posts; and finally, on-hover highlights to allow readers to compare different groups in the same city. The fact that this was an interactive graphic also allowed me to communicate both the broader point that most posts were clustered around certain intervals, and the noteworthy differences between them, avoiding the dreaded "non-zero axis" issue.
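The mean-versus-median choice described for the scatterplot can be sketched in a few lines. The analysis itself was done in R; the version below swaps in plain Python (standard library only) purely for illustration, and the records, the city name, and the field layout are hypothetical, not drawn from the collected posts.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records for illustration only:
# (city, group, poster age, number of words in the post).
posts = [
    ("City A", "m4w", 34, 40),
    ("City A", "m4w", 36, 55),
    ("City A", "m4w", 35, 900),  # one exceedingly long post
    ("City A", "w4m", 26, 80),
    ("City A", "w4m", 28, 100),
]

# Collect ages and post lengths for every (city, group) pair.
grouped = defaultdict(lambda: {"ages": [], "lengths": []})
for city, group, age, length in posts:
    grouped[(city, group)]["ages"].append(age)
    grouped[(city, group)]["lengths"].append(length)

# One summary row per circle: mean age, median length, and post count.
summary = {
    key: {
        "mean_age": mean(values["ages"]),
        "median_length": median(values["lengths"]),
        "mean_length": mean(values["lengths"]),  # kept only for comparison
        "n_posts": len(values["ages"]),
    }
    for key, values in grouped.items()
}

for key, row in summary.items():
    print(key, row)
```

In this toy data, the single 900-word post drags the mean length for the first group to roughly 332 words, while the median stays at 55. That gap is the reason for plotting the median.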

Credits