This September, 51,000 4-year-olds filed through the doors of 1,655 preschool programs across New York City. For many of them, everything may have been new: new teachers, friends, cubbies, and a strange phrase, “crisscross applesauce, hands in your lap.” The children may not have sensed it, but for the city, lots of things were new, too. These children are a part of a broad-scale experiment with historic implications, unfolding in real time. They are part of Year 1 of universal pre-kindergarten in New York.

Public preschool programs come with very small chairs and massive expectations. Mayor Bill de Blasio has staked a great deal on universal pre-K as a tool for reducing economic inequality. He made the initiative a key element of his campaign. In his first month in office, the mayor engaged in a closely watched standoff with the governor about funding, eventually securing $300 million from the state as the first installment in a five-year commitment. After a hectic summer of health and safety inspections, the program launched.

De Blasio has said he supports universal pre-K because he thinks it is the right thing to do, and he thinks it is the right thing to do because the data says so. In drumming up support for the initiative, he leaned on phrases such as “study after study” and “decades of academic research,” armoring what’s perceived as a warm and fuzzy cause — preschool — in cold, hard facts.

These are the facts: The economic returns on each dollar invested in pre-K can reach 7 to 10 percent, because pre-K participation has been shown to boost lifetime earnings and decrease participation in welfare programs. The pressure is on for universal pre-K in New York to meet, or exceed, these figures.

Some have opined that educating tens of thousands of 4-year-olds is a greater challenge than orchestrating a moon landing, and that’s before considering the difficulty of evaluating universal pre-K’s impact. The de Blasio administration wants to begin gathering and publishing findings right away. The city sees a political imperative for data, with elections and funding battles perpetually looming. To collect that data, it will have to look beyond traditional measures. Third-grade test scores won’t come soon enough, let alone data on high school graduation rates for the class of 2028. It will be a brave new world by then: The toddler stars of “You Poked My Heart” will be as old as Rebecca Black!

Yet mention data in a conversation about preschool, and temperatures begin to rise. Since 2001’s No Child Left Behind law, educational data has become synonymous with standardized test scores, and standardized test scores have become synonymous with high-stakes accountability. Even those policymakers who advocate for standardized testing agree it has no role for children younger than 8. Because of the ways children’s development unfolds, and because of the many factors that can affect a child on a day-to-day basis, assessment is a delicate operation.

The studies that de Blasio referenced throughout the campaign for universal pre-K were, on the whole, small-scale and longitudinal. The Perry Preschool Project is perhaps the most famous and influential research on the effects of high-quality pre-K. Psychologist David Weikart and a team of colleagues designed a randomized trial in which one set of 3- and 4-year-olds from Ypsilanti, Michigan, attended a preschool program and another set didn’t. Tracking these 123 children, Nobel Prize-winning economist James Heckman and his team have found that pre-K alumni are more likely to hold a job and less likely to have a criminal record. Their earnings outpace those of the control group. A study of a more recent program, Abbott preschools in New Jersey, shows greater fifth-grade academic achievement among pre-K participants.

Evaluating pre-K in New York falls to the Center for Economic Opportunity (CEO), a branch of the mayor’s office that pilots and tracks programs with an anti-poverty focus. Wiley Norvell, deputy press secretary at City Hall, said the administration won’t be able to rely on long-term measures to evaluate the universal pre-K program. For one thing, the city is interested in knowing how the rollout is going and what policy changes can improve it. The programs that compose universal pre-K vary drastically: Some serve mostly English-language learners, some have community organizations as hosts, some offer full immersion in Mandarin, Spanish or Yiddish. The city wants to know as soon as possible which practices generate the best outcomes for students.

Teachers in New York’s pre-K classrooms will collect data on their students in coordination with the New York Prekindergarten Foundation for the Common Core, mostly through observation. The city Department of Education plans to also use rating scales to gauge things like classroom set-up and quality of teacher-student interaction. Norvell said that kind of data, along with data on building safety, legal standing and pedagogical capacity, is part of the city’s quality-control apparatus.

The city is also contracting out to collect and analyze further data on the effects of universal pre-K on children and families. This data collection will be a team effort: The CEO has hired the research firm Westat, which is partnering with a local team from Metis Associates. Researchers in applied psychology at New York University will provide technical assistance on the project and also assess the assessments, in a kind of infinite regress.

Jennifer Hamilton, a senior study director at Westat, outlined the evaluation design: Westat, working with Metis, will recruit a representative sample of 200 pre-K centers out of the city’s more than 1,600, obtaining consent from site directors and parents for one year. At all 200 sites, teachers will receive an online survey about program implementation. At a smaller number of sites, a team of assessors from Metis will conduct assessments of students. At a different subset of sites, Metis will run teacher and parent focus groups.

Because there will be so much data collected, certain indicators might make the program look like a smashing success, and other indicators might make it look like a flop. Within the data haul, pre-K spectators will find information on the development of the cognitive skills people most associate with preschool: the ABCs and 123s. Legislators, taxpayers and funders may be tempted to lean on that short-term data in the absence of data on longer-term educational attainment. But these aspects of the study, the old-fashioned “reading, ’riting, ’rithmetic,” might turn out to be the least valuable.

If results of the Westat study show that a child who participates does not master a skill like counting to 20 in her pre-K year, the child and her family may nonetheless receive various lifelong benefits from participation, as the Perry study indicates. Furthermore, children’s development is uneven and unpredictable, and many studies show that a child who learns to read at age 6 or 7 may match or surpass a child who learned at 5.

Looking beyond the three R’s, other short-term data points might be illuminating. For example, the city hopes to announce hours of support services delivered to English-language learners or students with developmental delays, identified as a result of their participation in pre-K.

Researchers will gather useful information about children’s self-regulation, impulse control, inhibitory response and executive function: the skills that allow children to tune out distractions and follow through with meaningful work. Cybele Raver and Pamela Morris, both of the NYU team, have selected various tests of these skills. The tests look and feel like games such as Simon Says, though, of course, they occur with a stranger. In “pencil tapping,” when an assessor taps a pencil two times, the child taps once, and vice versa. The game challenges children to override their natural inclination to copy, Morris said.

Parent focus groups will also provide a set of data about universal pre-K’s impact on workforce participation and financial stability. A 2012 analysis by Child Care Aware of America, an organization that works to promote access to child care nationwide, found that in New York, as in 19 other states, child care for a 4-year-old can cost more than in-state college tuition. So, while long-term economic benefits of pre-K won’t be known for a while, short-term economic effects are all but certain.

Metrics on financial benefits to families, or the hours of early intervention children receive, or how well the children do at Simon Says, will do a great deal to fill in the picture about universal pre-K. But there are some questions, like whether pre-K will promote long-term economic equality, that only the passage of time can answer.

The public can soon expect dispatches about the success of universal pre-K: instant gratification. Meanwhile, 4-year-olds across the city will receive lessons in delayed gratification: learning to walk in line to the playground without pushing ahead, and to raise their hands and wait to be called on. And in some ways, to see the impact of universal pre-K, the grown-ups in charge may also have to learn to wait.

CORRECTION (5:40 p.m., Dec. 8): An earlier version of this story incorrectly said that James Heckman designed the randomized trial conducted as part of the Perry Preschool Project. Heckman did not design the study, but has played a role in analyzing its data.