In November 2010, I visited Harrison District 2, a low-income, largely Latino school district in Colorado Springs. As part of a plan to evaluate and pay all teachers according to how well they “grow” student achievement, the district had just rolled out its first-ever testing program in the visual arts, music, and physical education—a program that has since become a national model.

On the first-grade art exam, students were asked to write a paragraph about a Matisse painting. In second-grade gym class, a pencil-and-paper test required students to “Draw a picture of how your hands look while they are catching a ball that is thrown above your head.”

The program, launched by crusading superintendent Mike Miles (who has since been appointed to a much more high-profile job leading the Dallas public schools), was not immediately embraced. Some Harrison art teachers complained about being assessed on their students’ writing skills, and gym teachers balked at being expected to teach drawing. This past school year, Harrison administrators responded to those concerns by showing teachers exam questions ahead of time and allowing them to give feedback on whether the reading level and content expectations were appropriate for their students. (Administrators say complaints from teachers subsequently fell.)

Harrison supplements its paper exams with what testing experts call “performance-based assessments”: In elementary grades, phys-ed students are asked to show they can dribble a basketball and juggle two scarves; high school music students perform three songs; art students must demonstrate the difference between a one- and two-point perspective drawing. In all these courses, tests require students to write about their learning in full sentences and paragraphs, using subject-specific vocabulary.

Assessments like these are controversial. Many parents don’t like the idea of their already over-tested children taking even more exams, particularly in subjects like art and gym, which are usually thought of as relaxing breaks in an otherwise stressful school day. Bob Schaeffer of FairTest, a watchdog group, calls state standardized assessment in the arts “fundamentally ludicrous. Testing knowledge of terms used in artistic disciplines, as some have suggested, is not assessing the arts, but rather how well students memorize and regurgitate specialized language.” American Federation of Teachers president Randi Weingarten often advocates for holistic assessment systems, such as portfolios of students’ work collected over the course of an entire semester or year, instead of a drawing or musical performance done in a single sitting. Paraphrasing the (perhaps apocryphal) Albert Einstein, Weingarten likes to say, “Not everything that counts can be counted, and not everything that can be counted counts.”

But with the Obama administration’s Race to the Top program providing billions of dollars to states and school districts that agree to evaluate all teachers according to student achievement data, attempts to “count” learning in nontraditional subjects are proliferating. In response to this increased interest, the National Endowment for the Arts recently published a paper reporting that it could find only 30 high-quality arts assessments in use across the country, with far too many schools relying on pencil-and-paper exams to measure students’ art skills. The report recommends the creation of a national, online database of high-quality arts assessments. Under Secretary Arne Duncan, the Department of Education, too, is encouraging states to think beyond fill-in-the-bubble for these nontraditional subjects, but so far it has not released any formal guidelines.

Despite the lack of consensus, states are forging ahead. South Carolina’s fourth-grade music exam, administered via computer, asks: “When singing a melody together with a friend, what dynamic level should you sing? A) Louder than your friend B) Not too loud and not too soft C) Softer than your friend or D) the same as your friend.” (The correct answer is D.) Students are then shown a measure of sheet music and asked to identify which of four electronic recordings matches the notation. The multiple-choice section of the state’s fourth-grade arts exam shows students a picture, such as one of a vase and a bowl of fruit placed on a chair, and asks them to identify the drawing as either a “landscape,” “portrait,” “non-objective,” or “still-life.” The question is: Does a student’s ability to answer such queries correctly actually indicate arts proficiency? Can such a test measure creativity, or is creativity not the point?

Florida has launched a statewide Performance and Fine Arts Assessment Project to develop a “bank” of test questions and performance-based assessment scenarios for dance, music, and theater, from which local schools can borrow. The most potentially controversial idea Florida is exploring is whether artificial intelligence software might be able to score at least some portion of students’ musical performances, recorded and submitted electronically. AI musical assessment already exists, and is used to help students learn whether the notes they sing or play are on-pitch—which is, of course, just one of many elements that make up a competent musical performance, some of which, such as emotional engagement with the music, might be impossible to quantify.

As I’ve reported in Slate, the push toward computer grading of student essays has a lot to do with saving money; while the technology can assess grammar, spelling, and structure, it cannot yet tell whether students have real knowledge of facts from the curriculum. So as the education establishment moves forward with arts assessment, will states and test-makers follow recognized best practices or go with less rigorous, cheaper, easier-to-administer exams and grading systems? There is ample precedent for shortcut-taking in public school testing: When No Child Left Behind required schools to assess all students in math and reading, many states made tests easier in order to inflate proficiency numbers. Last month, Florida rejiggered the grading of its writing exam in order to avoid embarrassingly high failure rates.

So, when it comes to arts assessment, what are the acknowledged best practices? The International Baccalaureate program, a college-preparatory curriculum available to schools around the world, provides one sophisticated assessment model. I talked to Tara Brancato, a fifth-year music teacher at the Knowledge and Power Preparatory Academy International High School (KAPPA), a Bronx public school with a high-poverty student population. As part of their end-of-course IB exam, Brancato’s 11th- and 12th-grade students listen to recordings and identify their composers, time periods, and musical features; compose original pieces of music; perform on their instruments; and write a research paper comparing musical cultures from around the world. Brancato has fair warning about what will be on her students’ tests, which are created and graded in Cardiff, Wales, by IB administrators. She might know in advance, for example, that her students will hear recordings by Mozart and Aaron Copland, but she won’t know which specific pieces they will hear or what they will be asked about them. Brancato’s teacher evaluation score is partially based on how well her students do on these tests from year to year, and so she gives a lot of practice assessments. She doesn’t mind, because she thinks both the IB curriculum and the assessments attached to it are high quality.

Another model comes from the nonprofit College Board’s Advanced Placement studio art course, which requires students to electronically submit portfolios of artwork created over the course of a year—the kind of assessment program many teachers’ unions support. The College Board’s website provides free examples of high-scoring student drawings, sculptures, and other works of art. Even Bob Schaeffer of FairTest hails this assessment, which he says is “not perfect” but is at least “based on a real body of work.”

In 2007, the Theatre Communications Group, a trade association, launched the TEAM project to help theaters across the country work with schoolchildren whose learning outcomes need to be measured, often to maintain funding for theater programs in a time of budget cuts. One of the biggest challenges in using student achievement data to evaluate teachers is getting a snapshot of what students know (or don’t know) when they enter a classroom, so their “growth” can be tracked over the course of a semester or year. TEAM recommends surveying students at the beginning of a theater program with questions like: “Have you seen a play before?” “Have you read a play before?” “Are you confident performing on stage?” The students are asked similar questions at the end of the program, in order to measure their progress. It’s easy to imagine how data from such surveys could be included as part of a teacher’s evaluation grade.

Measuring teachers according to how well their students perform a monologue or create an oil painting (or even how much they learn to enjoy art) will never be exactly like tallying up test scores in algebra, and shouldn’t be. But if schools assess students fairly in the arts, and ideally involve teachers in creating these assessments, they’re sending an important message: The arts matter. After a decade of No Child Left Behind, in which the arts, social studies, and science were often scaled back as schools obsessed over math and reading scores, this could be a real upgrade. The challenge will be in training teachers, improving the curriculum, and communicating with students and parents about what nontraditional assessment is all about. In most districts, this work has barely even begun.