Quote:
Originally Posted by Ganchrow
So this is either a "good" result or (when taken in conjunction with the above) is suggestive of a programming error. Question: if you split up your successful seasonal data using the other method, do you find that one of your thirds drastically outperforms the other two?
|
When I split up the data, I actually put the segments on different spreadsheets to avoid any possible contamination. All of my assumptions, including initial hypotheses, are drawn from the initial set only. To answer your question: very often one of the segments drastically outperforms the other two, and it is not always the initial sample.
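For anyone who wants to run this kind of comparison outside a spreadsheet, here is a minimal Python sketch of a chronological split into thirds that reports the ROI of each segment. It assumes each bet is stored as a hypothetical `(stake, profit)` pair already sorted by date; the function names are illustrative only and are not taken from either poster's work.

```python
from typing import List, Sequence, Tuple

# Hypothetical record format: (stake, profit) per bet, already sorted by date.
Bet = Tuple[float, float]


def split_chronologically(bets: Sequence[Bet], k: int = 3) -> List[Sequence[Bet]]:
    """Split date-ordered bets into k contiguous blocks of near-equal size."""
    n = len(bets)
    bounds = [i * n // k for i in range(k + 1)]
    return [bets[bounds[i]:bounds[i + 1]] for i in range(k)]


def roi(bets: Sequence[Bet]) -> float:
    """Total profit divided by total amount staked (0.0 for an empty block)."""
    staked = sum(stake for stake, _ in bets)
    return sum(profit for _, profit in bets) / staked if staked else 0.0


def roi_by_segment(bets: Sequence[Bet], k: int = 3) -> List[float]:
    """ROI of each contiguous block, in chronological order."""
    return [roi(block) for block in split_chronologically(bets, k)]


if __name__ == "__main__":
    # Toy data: three wins paying 0.9, three losses, three wins paying 0.5 (unit stakes).
    toy = [(1.0, 0.9)] * 3 + [(1.0, -1.0)] * 3 + [(1.0, 0.5)] * 3
    print([round(r, 3) for r in roi_by_segment(toy)])  # [0.9, -1.0, 0.5]
```

Keeping the blocks contiguous in time matters here: a random shuffle would mix seasons together and answer a different question from the one being asked.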
Quote:
Originally Posted by Ganchrow
Another possibility is that there exists autocorrelation within your time series (so results on days t-1 ... t-n would need to be used in formulating day t forecasts). If this were the case, then segmenting your data set so that it cut across relevant time horizons would be a bad idea.
|
That is the first time I have ever heard this possibility mentioned. Is there a quick and easy test to determine whether this is the case? When I first started seeing the results, I began fooling around with moving averages, trying to develop some stochastic triggers, but I haven't gotten too deep into that yet.
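One quick check, offered as a sketch rather than as the test Ganchrow necessarily had in mind, is to compute the sample autocorrelation of a daily results series (for example, daily profit) at lags 1 through k and compare each value with the approximate 95% band of ±1.96/√n expected when there is no autocorrelation. The Ljung-Box Q statistic rolls those lags into one number, which can be compared against a chi-square critical value with k degrees of freedom (about 18.31 for k = 10 at the 5% level). The daily-profit input and all function names below are assumptions for illustration; statsmodels also provides `acorr_ljungbox` in `statsmodels.stats.diagnostic` if you would rather not roll your own.

```python
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0  # a constant series has no measurable autocorrelation
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelations at lags 1..max_lag."""
    return [autocorrelation(x, k) for k in range(1, max_lag + 1)]


def ljung_box_q(x: Sequence[float], max_lag: int) -> float:
    """Ljung-Box Q = n(n+2) * sum over k of r_k^2 / (n - k), for k = 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(acf(x, max_lag), start=1)
    )


def significant_lags(x: Sequence[float], max_lag: int) -> List[int]:
    """Lags whose autocorrelation falls outside the approximate 95% no-autocorrelation band."""
    band = 1.96 / math.sqrt(len(x))
    return [k for k, r in enumerate(acf(x, max_lag), start=1) if abs(r) > band]


if __name__ == "__main__":
    # Hypothetical daily profit series that strictly alternates winning and losing days.
    daily_profit = [1.0, -1.0] * 50
    print(significant_lags(daily_profit, 2))  # [1, 2]
```

If real data showed clearly significant lags, that would support the concern that season-based segments cut across related observations.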
Quote:
Originally Posted by Ganchrow
If you make this claim with prior knowledge that the entire 10,000 game population had "average" results, then this would follow directly from conditional probability.
Without this precondition, you're either falling prey to the Gambler's Fallacy or have uncovered what promises to be a very lucrative sports-based autoregressive moving average model.
|
In a way I do make this claim with prior knowledge, but that knowledge is more a priori: I know that the books would not leave something as blatant as favourites winning at an ROI of +50% in place for any length of time, and would correct the situation even if the sharps and the public didn't. So, even though I do not know what sport it is or where the sample is from, I strongly believe that if you gave me any 10,000 consecutive games from any sport and divided them up as I mentioned, the other half would have to show a significant positive ROI for the dogs. I know what you mean about the Gambler's Fallacy, and I am not suggesting that things just have to even out. They don't. However, from what I have seen in sports betting, the distribution of results is often skewed or has such a high level of kurtosis that the probability of it being random is very low (p <= .0001, etc.). So if the distribution of results is being "controlled", then I do not think it is a big leap to suggest that the results for the entire population would be average. What do you think of my assumptions above?
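The distinction Ganchrow draws can be made concrete. If the ROI of the whole population is known in advance, the second half's ROI is pinned down by simple arithmetic; if it is not known and the games are independent, the first half says essentially nothing about the second. The sketch below shows both cases, assuming flat unit stakes and, for the simulation, hypothetical fair even-money bets. It illustrates the logic only and is not a model of any real betting market.

```python
import random
from typing import List, Tuple


def implied_second_half_roi(total_roi: float, first_roi: float,
                            first_staked: float, second_staked: float) -> float:
    """ROI the second half must have, given the population ROI and the first half's ROI.

    total_profit  = total_roi * (first_staked + second_staked)
    second_profit = total_profit - first_roi * first_staked
    """
    total_profit = total_roi * (first_staked + second_staked)
    second_profit = total_profit - first_roi * first_staked
    return second_profit / second_staked


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_half_rois(trials: int, games_per_half: int,
                       seed: int = 1) -> Tuple[List[float], List[float]]:
    """Simulate independent fair even-money unit bets; return first- and second-half ROIs."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(trials):
        halves = [
            sum(1.0 if rng.random() < 0.5 else -1.0 for _ in range(games_per_half)) / games_per_half
            for _ in range(2)
        ]
        first.append(halves[0])
        second.append(halves[1])
    return first, second


if __name__ == "__main__":
    # Population known to break even: a +50% first half forces a -50% second half.
    print(implied_second_half_roi(0.0, 0.5, 5000, 5000))  # -0.5
    # No such precondition: independent halves are essentially uncorrelated.
    a, b = simulate_half_rois(trials=1000, games_per_half=200)
    print(round(pearson(a, b), 2))
```

The first function corresponds to the "follows directly from conditional probability" case; the simulation corresponds to the case where expecting the second half to compensate would be the Gambler's Fallacy.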
Quote:
Originally Posted by Ganchrow
If this works for you, then that's certainly great news. If you've done everything correctly, then I'd suggest you start moving ahead.
That said, the fact that it only works when you divide your data in such a manner does not inspire a whole lot of confidence within me. I'd strongly suggest double-, triple-, quadruple-, quintuple-, and sextuple-checking your work (you can stop there; septuple-checking is just plain silly) to make absolutely certain that no out-of-sample data at all somehow crept into your in-sample modeling.
But let me make this very clear ... there's nothing necessarily "wrong" with segmenting data by season, and if you get good results then by all means go for it. But based upon what you've written above, I'm just going to warn you again to make sure your programming and modeling are sound.
It is generally to be expected that meaningful results will be particularly difficult to come by when using proper sampling methodology.
|
Thanks again for the fair warning. I have done everything I can to avoid programming errors. I am working in Excel, so I can see the results line by line; everything appears correct, and the sample and test data are kept separate.
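Beyond checking the spreadsheet line by line, a mechanical check can help catch in-sample/out-of-sample leakage. The sketch below assumes each game has a unique identifier and a date (the `Game` record and the toy dates are hypothetical), and optionally assumes that the test period should come strictly after the period used to form hypotheses; turn that check off if the segments were deliberately interleaved.

```python
from datetime import date
from typing import List, NamedTuple, Sequence


class Game(NamedTuple):
    """Hypothetical minimal record for one game."""
    game_id: str
    played_on: date


def leakage_problems(in_sample: Sequence[Game], out_of_sample: Sequence[Game],
                     require_chronological: bool = True) -> List[str]:
    """Return descriptions of any leakage found (an empty list means none detected)."""
    problems: List[str] = []
    # The same game must never appear in both the modeling set and the test set.
    shared = {g.game_id for g in in_sample} & {g.game_id for g in out_of_sample}
    if shared:
        problems.append(f"{len(shared)} game(s) appear in both sets: {sorted(shared)}")
    # Optional: every in-sample game should predate every out-of-sample game.
    if require_chronological and in_sample and out_of_sample:
        last_in = max(g.played_on for g in in_sample)
        first_out = min(g.played_on for g in out_of_sample)
        if last_in >= first_out:
            problems.append(
                f"in-sample data runs to {last_in}, but out-of-sample data starts {first_out}"
            )
    return problems


if __name__ == "__main__":
    # Toy example: game "g2" has leaked into the test set.
    train = [Game("g1", date(2006, 9, 10)), Game("g2", date(2006, 9, 17))]
    test = [Game("g2", date(2006, 9, 17)), Game("g3", date(2007, 9, 9))]
    for problem in leakage_problems(train, test):
        print(problem)
```

A clean split returns an empty list. Note that this catches duplicated or overlapping rows only; it cannot detect a parameter that was tuned after looking at the test results.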