This post is coming to you from Austin, TX, where the American Astronomical Society meeting is taking place, and where you can buy a t-shirt that says "I was drunk on 6th Street," go running on a river trail before dawn and make friends with the people who are hanging out under bridges, see >100 antlers hanging from the same restaurant ceiling, and get your shoes really dusty. And, perhaps, learn some astronomy.







I learned some really interesting stuff yesterday about using the cosmic microwave background to constrain the mass of the neutrino. And then I typed up a blog post about it. And then the blog post disappeared, eaten by the internet, and I, tired from the kind of relentless scheduling--both scientific and social--inherent to conferences, had not saved it anywhere else. So then I took out my legally concealed weapon (yay, Texas) and shot myself in the elbow.





Today, I have not yet attended any talks, because I have been talking to people about the poster on which I'm an author, as are Sue Ann Heatherly, Maura McLaughlin, and Duncan Lorimer. And since I have no new information and I used up a lot of cognitive energy sorting out my manic neutrino notes, I'm going to talk about our research.





Our research is into the effectiveness of involving high school students and teachers in scientific research, and into what the definition of "effectiveness" is, in this context.





What program's effectiveness were you measuring?

NRAO-Green Bank, in conjunction with West Virginia University, runs the Pulsar Search Collaboratory program (which you can read more about here). Every summer, we bring high school teachers and a few of their students to the observatory for an intense and intensive workshop about pulsars and how to find them and why a person would want to. Then, once we're satisfied that they could answer the questions "What is a pulsar?" "How do you find one?" and "Why would you want to find a pulsar?" we set them loose on terabytes and terabytes and terabytes of never-before-seen-grade-A-shiny data, and they search for the diamonds in the rough (of which there have been a few). These students and teachers then go back to their home institutions and recruit and initiate others into their data-mining cult.





How do you measure a program's effects?

When students and teachers are in Green Bank, we subject them to a series of tests (only one of which involves electroshocking). Upon arrival, they take tests measuring their knowledge of the nature of science (NOS) and their scientific self-efficacy, or their belief that they are competent researchers who can solve problems, and students are asked to rank their interest in science, technology, engineering, and math (STEM) careers.





After the one- to two-week training, during which they have learned how to analyze pulsar data and done inquiry-based activities and research, we administer these same tests again, to see how their answers have changed. Do they have more scientific self-confidence? Are they more interested in pursuing STEM careers? Do they understand more about the nature of science?





So, were the students more interested in STEM careers?

Students were more interested in being astronomers, software developers, and electrical engineers, but not more interested in being mechanical engineers (I promise that we said no disparaging words about mechanical engineers, though!).





So, do they have more scientific self-confidence?

In short, yes. Both students and teachers were more comfortable doing research, using scientific tools, tackling problems, collaborating, and thinking of themselves as smart and capable (a pre- to post-test change in attitude was considered statistically significant at P < 0.05). Female students made statistically significant gains on more measures of self-efficacy than male students did.
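For readers curious what that P < 0.05 comparison looks like in practice, here is a minimal sketch of one common way to compare paired pre- and post-test ratings: a paired t-test on each participant's change in score. The ratings below are invented for illustration; the actual PSC survey instruments, sample sizes, and statistical procedure may differ.

```python
# Hypothetical sketch of a pre/post significance test like the one
# described above. The ratings are made up; only the method (a paired
# t-test at the P < 0.05 threshold) is illustrated.
import math
from statistics import mean, stdev

pre  = [2, 3, 2, 4, 3, 2, 3, 2, 3, 2]   # pre-workshop ratings (1-5 scale)
post = [4, 4, 3, 5, 4, 3, 4, 4, 4, 3]   # post-workshop ratings, same people

# Each participant's change; the paired test looks at these differences.
diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)

# Paired t statistic: mean change divided by its standard error.
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Two-tailed critical value of the t distribution for df = n - 1 = 9
# at alpha = 0.05 (from standard t tables).
t_crit = 2.262
print(f"t = {t:.2f}, significant: {abs(t) > t_crit}")
# prints "t = 9.00, significant: True"
```

With these invented numbers every participant's rating went up, so the test comes out significant; a real analysis would run one such comparison per survey statement, which is how "more statistically significant changes" for one group than another can be counted.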





Interestingly, when the teachers' results are separated by gender, the same is true: the female teachers had many more statistically significant attitude changes than the male teachers did.







A self-assessment, asking teachers to rank their identification with statements, was administered before and after the workshop. Teachers' opinions changed significantly for the statements above. Each shadowed box represents a statistically significant pre-post-test change (in the positive direction).





As you can see from the figure above, while teachers as a whole showed positive changes in opinion on many measures of self-efficacy, the female teachers' gains were markedly more frequent than the male teachers'.





Why is that?

Well, the female teachers started out, on the pre-test, by ranking themselves lower--more scared, less knowledgeable, less confident, more overwhelmed, etc--than the male teachers did. But while their post-test rankings were markedly different from their pre-test rankings (which is how statistical significance was measured), that change didn't necessarily mean their perceptions of their abilities exceeded the men's. In fact, the women's post-test rankings of their abilities were often lower than the men's post-test rankings, and, on some measures, lower than the men's pre-test rankings.





In short, female teachers began by thinking of themselves as less scientifically competent than male teachers did. The summer workshop was effective in boosting the women's confidence, and less effective in boosting the men's (possibly because it was pretty high to begin with), but that boost rarely made male and female teachers equally confident in doing science.





There's a lot of sociocultural stuff going on here, which I'll have to discuss another time, as this post is already too long and a literature search should probably be done.





So was the program effective at teaching the nature of science?

On the Student Understanding of Scientific Inquiry test (SUSSI, explained here: http://www.ihpst2005.leeds.ac.uk/papers/Liang.pdf ), neither teachers nor students showed great improvement. These results were surprising, since the students had spent many hours doing scientific inquiry. So we did something strange: we asked the observatory scientists to take the SUSSI, to see how well they, according to the test, understood scientific inquiry.





And you know what? They didn't do too much "better" than the post-test students.





There are a few ways to interpret that (and probably more, but these are the ones I thought of):

1. The test does not really measure understanding of scientific inquiry.

2. The parts of scientific inquiry that the testmakers believe are important are either not the important parts of scientific inquiry, or are not interpreted by test-takers as intended by test-makers.

3. Understanding scientific inquiry is not necessary for doing scientific inquiry.





I have a feeling that the answer is a mishmash of all of the above.





So what does this mean for evaluating an educational program's effectiveness?

Well, we have to ask ourselves what is the most important thing we want teachers and students to get out of doing scientific research.





Do we want them to like science more? Do we want them to be more excited about science? Do we want participants' self-efficacy to change, or do we want to have male and female participants end up in the same place?





Do we want them to be more interested in growing up to be STEM superstars? Or do we want them to understand what science is (i.e., what they're doing)? And, if the latter, do we need to reconsider how we measure scientific literacy?



