Forgive me, but I'm not sure I see where you're heading with this.
You have a model that you believe performs well out-of-sample. Why not just run with it? What's really your concern?
Quote:
Originally Posted by VideoReview
When I split up the data I actually put the segments on different spreadsheets to avoid any possible contamination. All of my assumptions, including initial hypotheses, are then drawn from the initial set only. To answer your question, it is very often the case that one of the segments drastically outperforms the other two, and it is not always the initial sample.
But you're saying that you've tried segmenting your data set in two different ways.
If you segment your data set based on seasons do you still find that three-day pattern?
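To make the segment comparison concrete, here is a minimal Python sketch that splits a chronologically ordered list of bets into consecutive segments and reports flat-stake ROI for each one. The bet format (decimal odds plus a won/lost flag, one unit staked per bet) and the function names are hypothetical. The same function could be run on each season's bets separately to see whether a pattern survives that split.

```python
from typing import List, Tuple

# Hypothetical bet format: (decimal_odds, won), one unit staked per bet.
Bet = Tuple[float, bool]


def roi(bets: List[Bet]) -> float:
    """Flat-stake ROI: total profit divided by total units staked."""
    if not bets:
        raise ValueError("empty segment")
    profit = sum((odds - 1.0) if won else -1.0 for odds, won in bets)
    return profit / len(bets)


def segment_rois(bets: List[Bet], k: int = 3) -> List[float]:
    """Split bets into k consecutive, near-equal chunks (order preserved)
    and return the ROI of each chunk."""
    n = len(bets)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [roi(bets[bounds[i]:bounds[i + 1]]) for i in range(k)]


if __name__ == "__main__":
    sample = [(2.0, True), (2.0, False), (1.5, True),
              (3.0, False), (2.5, True), (1.8, False)]
    print(segment_rois(sample, k=3))  # [0.0, -0.25, 0.25]
```

Keeping the chunks consecutive, rather than shuffling bets between them, preserves any time structure, which is what makes a segment that "drastically outperforms" the others worth investigating.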
Quote:
Originally Posted by VideoReview
In a way I do make this with prior knowledge, but that knowledge is more a priori: I know that the books would not let something as blatant as favourites winning at an ROI of +50% persist for any length of time, and would correct the situation even if the sharps and the public don't. So, even though I do not know what sport it is or where the sample is from, I strongly believe that if you gave me any 10,000 consecutive games from any sport and divided them up as I mentioned, the other half would have to show a significant positive ROI for the dogs. I know what you mean about the Gambler's Fallacy, and I am not suggesting that things just have to even out. They don't. However, from what I have seen in sports betting, the distribution of results is often so skewed, or has such a high level of kurtosis, that the probability of it being random is very low (p <= .0001, etc.). So if the distribution of results is being "controlled", then I do not think it is a big leap to suggest that the results for the entire population would be average. What do you think of my assumptions above?
I'm not sure I see a testable hypothesis here.
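The skewness and kurtosis claims can at least be measured. Below is a minimal sketch, assuming per-bet returns expressed in units staked (e.g. +2 for a win at decimal odds of 3.0, -1 for a loss). It uses the simple moment estimators, reports excess kurtosis (0 for a normal distribution), and does not by itself produce a p-value. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def skew_kurtosis(xs: Sequence[float]) -> Tuple[float, float]:
    """Return (skewness, excess kurtosis) using the simple moment
    estimators g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("zero variance")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical returns: one win at decimal odds 3.0, two losses.
    print(skew_kurtosis([2.0, -1.0, -1.0]))
```

Per-bet returns on underdogs are skewed by construction (rare large wins, frequent unit losses), so a nonzero skewness on its own does not show that the results are non-random; it would need to be compared against what the quoted odds imply.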
...
If you want to test for autocorrelation, your very first step would be looking at the Durbin-Watson statistic. Try regressing your residuals on your lagged residuals (so if it's a 3-day cycle you're seeing, regress $e_t$ on $e_{t-1}$ and $e_{t-2}$). I suspect this will prove a waste of time.
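Both checks can be sketched in a few lines of plain Python. The function names are hypothetical, the residuals are assumed to be in time order, and the lag regression omits the intercept on the assumption that the residuals already average to roughly zero. It returns point estimates only, with no standard errors or p-values.

```python
from typing import Sequence, Tuple


def durbin_watson(e: Sequence[float]) -> float:
    """DW = sum_{t>=1} (e_t - e_{t-1})^2 / sum_t e_t^2.
    Near 2: little first-order autocorrelation; toward 0: positive;
    toward 4: negative."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den


def lag_regression(e: Sequence[float]) -> Tuple[float, float]:
    """OLS of e_t on e_{t-1} and e_{t-2} with no intercept.
    Solves the 2x2 normal equations by Cramer's rule; returns (b1, b2)."""
    y = e[2:]      # e_t for t = 2 .. n-1
    x1 = e[1:-1]   # e_{t-1}
    x2 = e[:-2]    # e_{t-2}
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("lagged residuals are collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2
```

A Durbin-Watson value near 2, together with lag coefficients close to zero, would be consistent with the suspicion that there is no real cycle in the residuals.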
It's very difficult for me to guess what's going on here, but to be perfectly blunt, my sense is that you're either on to something really, really big or there's some sort of systematic error in your model.
I know you're doing this for the NHL. Why not give it a try with MLB, a sport for which considerably more data exists?