There’s a lot of interest in data-related businesses and products everywhere these days, but it’s been particularly fun to see things accelerating in New York (where I’m based). Some purely anecdotal evidence: We had 50 very qualified data scientists show up at the recent hackathon we organized (as part of Big Data Week), despite the ungodly start time of 8am on a Saturday. The Data Meetup I host monthly went from 0 to almost 1,300 members in barely 5 months. General Assembly is starting a 10-week intensive program in data science. Microsoft just announced that it chose NYC as the location for its new research lab, which includes plenty of data science brainpower (including machine learning specialist John Langford and Jake Hofman, formerly of Yahoo Research).

NYC is becoming a real “hub” for data startups. In fact, in my opinion data startups are becoming the next “layer” of the NYC tech scene — the way content and advertising startups (24/7, Doubleclick, Silicon Alley Reporter, etc.) were the foundational layer of “Silicon Alley” from 1995 to 2005, and the way social and e-commerce startups (Tumblr, Gilt, Foursquare, Etsy, Warby Parker, Rent the Runway, etc.) became the next building block that led to where we are today.

Due to their often intensely technical nature, data startups represent an interesting opportunity for NYC to develop more of a scientific and engineering-focused startup culture.

NYC has the key components of a thriving data startup ecosystem, including:

1) Customer demand: For those startups that sell to enterprises rather than consumers, NYC is where many of the key buyers are located – specifically, Wall Street and Madison Avenue, which have been among the most voracious and sophisticated users of data. It’s no accident that some of the key conferences in the space, such as GigaOm’s Structure:Data or Strata, take place in NYC (or have an NYC event in addition to their CA event) – there’s no better place for emerging vendors to show off what they’ve built to potential purchasers.

2) A relevant talent pool: in addition to solid engineering talent, data-driven startups need data scientists, who come in various flavors: statisticians, mathematicians, machine learning experts, programmers, etc. In part because there has been demand for this type of profile for a while in financial services, there’s a fair concentration of them in NYC, and I’m seeing an increasing number of them making the jump to startup land. NYC has a number of prominent data scientists, including (but certainly not limited to) Drew Conway and Jake Porway (both of whom are co-founders of Datakind, f/k/a Data without Borders), Max Shron, Cathy O’Neil (who left D.E. Shaw for a startup, Intent Media), Gilad Lotan, etc. And of course, we have our very own emerging media star (deservedly so) in the person of Hilary Mason, most recently profiled here.

3) A data community: Whether it’s Data Drinks or meetups, there’s clearly an appetite among data nerds to get together and geek out. Both the NYC Predictive Analytics meetup (organized by Alex Lin) and the NYC Machine Learning meetup (organized by Paul Dix and Max Khesin) have over 2,000 members, while the New York Open Statistical Programming Meetup has 1,700 members.

4) Investors with a deep interest in the space: As far as I know, IA Ventures is the only VC firm in the country that has an exclusive focus on data as an investment thesis (Accel’s big data fund is a little different, in that it’s a dedicated pool out of a much larger fund). Roger Ehrenberg and his talented team (Brad Gillespie, Ben Siscovick, Jesse Beyroutey) are having a tremendous impact on the data world in general, and in NYC in particular (about half of their portfolio is NYC-based). RTP Ventures is a new but very promising NYC investor in the space, with a focus on the infrastructure part of the big data world. Many of the main NYC investors are also “data friendly” and have interesting data plays in their portfolios as part of a broader focus: Union Square Ventures, Betaworks (see John Borthwick’s “data is the new plastic”), RRE, Lerer Ventures, Thrive Capital, and kbs+ Ventures, and I’m sure I’m forgetting a number of others.

5) Universities that are willing to get involved: The key machine learning centers in the country may be Carnegie Mellon, MIT and Stanford, but Columbia is strong as well, and most importantly, there are some terrific professors who are both academically prominent and deeply involved in the NYC tech scene – in particular Chris Wiggins (in addition to being a prominent machine-learning expert, Chris is also the co-founder of HackNY and has mentored many of the data scientists currently employed in NYC startups) and Tony Jebara (who runs the Columbia Machine Learning Laboratory and has also founded and advised several startups including Sense Networks and Bookt). NYU has some leading authorities in the data-intensive fields of physical computing and the Internet of Things: Tom Igoe and Dan O’Sullivan. Medium term, Cornell may be able to bring some additional academic expertise to NYC (for example, it is home to Thorsten Joachims, who is arguably one of the top SVM researchers).

6) A crop of promising data startups:

A growing number of NYC-based startups offer data and predictive analytics solutions – starting perhaps with Opera Solutions, which very few people in the NYC tech scene had heard about until it raised a whopping $84 million in September 2011 from Silver Lake and Accel KKR (Opera Solutions employs some 150 data scientists, out of 400 employees). In addition, NYC startups have been building all sorts of interesting data and analytics products for social media (Bitly, SocialFlow, Kno.des), news (Visual Revenue), finance (Dataminr), music (NextBigSound, which is moving to NYC), sports (Numberfire, and our own Bloomberg Sports) and of course advertising and marketing (Sailthru, Collective[i], Custora, PlaceIQ, YieldBot, Mediamath, m6d, 33across, Clickable, Buddy Media, etc.).

While we’re nowhere near Silicon Valley on this front, it’s great to see more big data infrastructure companies in NYC. Some, like 1010Data, largely predate the whole big data craze; others have appeared more recently, including FluidInfo, CrowdControl, Mortar Data (which is moving to NYC), Datadog, and of course 10Gen, whose MongoDB NoSQL database is quickly becoming a must-have for a number of data-driven companies.

Finally, several exciting NYC startups are focused on the application of data to create disruptive products in various industries, such as education (Knewton) or consumer finance (Billguard, Bundle).

The fact that NYC recently saw a couple of acquisitions of data startups – Chris Dixon’s Hunch and Jordan Cooper’s Hyperpublic – doesn’t hurt either.

7) A data-centric business culture: perhaps it is because some of the key historical entrepreneurial successes in NYC were data companies (Bloomberg LP, Nielsen); or perhaps it is a reflection of the demands of East Coast investors, who arguably tend to be very focused on metrics and business models (as opposed to pure vision)… but somehow, as far as I can tell, there’s always been a real culture around data and analytics in NYC. Now, increasingly, I hear CEOs of NYC startups present their companies as data companies, even those you wouldn’t necessarily suspect (recent examples include Dennis Crowley of Foursquare and Yaron Galai of Outbrain). In addition, NYC startups have been quick to build data science teams, including many that don’t explicitly position “data” as a key part of their value proposition: Etsy, Gilt, The Ladders, GetGlue, Foursquare, and Tumblr all have data scientists on board.

All of this is just a start, and I’m excited to see how it all progresses in the next few months and years.