That very first study, ANSI C vs. K&R C, seems like exactly the kind of test that's needed.
Unfortunately, most languages are now heavily shaped by the typechecker. That is, not many languages can flip the typechecker on or off with a switch, or have parallel typechecked and non-typechecked implementations.
That was one golden chance to get people familiar with the language to try both variations.
And the effect of that? Slightly in favor of typechecked (he typed smugly). But that's not really enough. It's not clear there's even a 5% bump in productivity.
I like compile-time typechecking. I like static guarantees. But there's just not enough evidence to say typechecking is a must-have. And really, I'm not sure it's possible to construct an environment to test that.
Clojure can also be used both with and without typechecking, since it has an optional Typed module [1].