Stefan Hanenberg, An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time, OOPSLA, Reno/Tahoe, Nevada, pp. 22–35, October 2010.

If programming language design is to become a science, we need more experiments like this one.

The author measured the time 49 subjects took to build a simple parser in Purity, a Smalltalk-like language implemented for this experiment in two variants. Twenty-five subjects implemented the parser in the dynamically-typed variant, and 24 used the statically-typed variant. Two measurements were taken: the time at which the lexical scanner passed all its tests, and the percentage of tests passed by the parser after 27 hours. The tests were equally divided between accept and reject, so a program that guessed at random would pass about 50% of them.

Subjects using the dynamically-typed language completed the scanner in an average of 5.2 hours, and those using the statically-typed language in an average of 7.7 hours; the difference is statistically significant (p = 0.04).

Fourteen subjects using the dynamically-typed language failed to complete the parser (that is, their parser passed only 50% of the tests, no better than chance); 11 subjects using the statically-typed language failed.

Subjects using the dynamically-typed language passed an average of 60.2% of the tests, and subjects using the statically-typed language an average of 64.5%; the difference is not statistically significant (p = 0.40).

There are many potential objections to these results. Subjects were not told to complete the scanner before working on the parser. It is not clear why the Mann-Whitney U test was used to assess significance rather than Student's t-test. (I didn't check whether the t-test would yield a different significance.)
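To make the question about the choice of test concrete, here is a minimal sketch in Python (standard library only) that computes both statistics on two samples of completion times. The function names and the sample numbers in the usage block are hypothetical illustrations, not Hanenberg's data or analysis code. The Mann-Whitney p-value uses the usual normal approximation without a tie correction; the Student t statistic uses a pooled variance, and turning it into a p-value needs the t distribution with n1 + n2 - 2 degrees of freedom, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def mann_whitney_u(xs, ys):
    """U statistic for xs: the number of pairs (x, y) with x > y, ties counting 1/2."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def mann_whitney_p(xs, ys):
    """Two-sided p-value from the normal approximation to U (no tie correction)."""
    n1, n2 = len(xs), len(ys)
    u = mann_whitney_u(xs, ys)
    mu = n1 * n2 / 2                                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # std. dev. of U under the null
    z = (u - mu) / sigma
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


def student_t(xs, ys):
    """Student's two-sample t statistic with pooled (equal) variance."""
    n1, n2 = len(xs), len(ys)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)
    return (mean(xs) - mean(ys)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    # Hypothetical completion times in hours, invented for illustration only.
    dynamic = [3.1, 4.0, 4.5, 5.0, 5.2, 6.1, 8.5]
    static = [4.8, 5.9, 6.6, 7.0, 7.7, 9.0, 12.0]
    print("U =", mann_whitney_u(dynamic, static))
    print("Mann-Whitney p (normal approx.) =", round(mann_whitney_p(dynamic, static), 3))
    df = len(dynamic) + len(static) - 2
    print("Student t =", round(student_t(dynamic, static), 3), "with df =", df)
```

The design difference is that U depends only on how the observations are ordered, so an extreme completion time has the same effect as a merely large one, whereas the t statistic is built from means and variances and assumes roughly normal data; skewed timing data is a common reason to prefer a rank test. For real analyses, `scipy.stats.mannwhitneyu` and `scipy.stats.ttest_ind` compute p-values for both tests.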