It is available as a .doc here, or as a .pdf here (if you have JSTOR access).
When I say 'weak form' efficiency I'm using the definition popularised by Fama in the early '70s as part of the EMH. To the best of my understanding, this means making a 'model' from price data alone (although I suspect my use of the word 'model' is where I'm failing to explain myself).
For example, the simplest test in sports betting (which has been performed repeatedly) is to group the outcomes in a dataset by price level and test whether betting at a particular price level would have produced a profit. I can't see how this sort of efficiency test (or any other based on some sort of parameter-less 'model', such as the HZR system, famous in horse racing circles at least) would require out-of-sample testing.
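As a rough sketch of that price-level test: the function name, the band edges and the sample bets below are all made up for illustration, and it assumes decimal odds with a one-unit stake on every bet.

```python
from collections import defaultdict


def profit_by_price_band(bets, band_edges):
    """Group settled bets by price band and report profit per band.

    bets       -- iterable of (decimal_odds, won) pairs (hypothetical format)
    band_edges -- ascending list of band boundaries, e.g. [1.0, 2.0, 4.0]
                  gives the bands [1.0, 2.0) and [2.0, 4.0)
    """
    totals = defaultdict(lambda: [0, 0.0])  # band -> [number of bets, net profit]
    for odds, won in bets:
        for lo, hi in zip(band_edges, band_edges[1:]):
            if lo <= odds < hi:
                stats = totals[(lo, hi)]
                stats[0] += 1
                # One-unit stake: a winner returns odds - 1, a loser costs 1.
                stats[1] += (odds - 1.0) if won else -1.0
                break  # bets outside every band are simply ignored
    return {
        band: {"n": n, "profit": profit, "roi": profit / n}
        for band, (n, profit) in totals.items()
    }


# Invented sample data, purely to show the shape of the calculation.
sample_bets = [
    (1.5, True), (1.8, False),
    (2.5, True), (3.0, False), (3.5, False),
    (6.0, True),
]
results = profit_by_price_band(sample_bets, [1.0, 2.0, 4.0, 10.0])
for band, stats in sorted(results.items()):
    print(band, stats)
```

There is nothing fitted here: the only choices are the band edges, which is why this kind of rule sits awkwardly under the word 'model'. With a handful of bets per band, as in the sample, the figures mean nothing; the point is only the grouping and the per-band return.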
I think this points at the discrepancy between the systematic, rule-based 'model' I was getting at and the probability prediction model you mean.
I'm not trying to claim anything you have written is wrong. Regardless, I'm sure you would point out that your definition of 'model' did not cover this (and I think I would agree that it is a tenuous use of the word).
I'm just trying to add that there are (conceivably) profitable 'models' (or maybe a better term would be 'systems') that don't require out-of-sample testing.