Help With Kelly
Everyone: sorry about posting this in the public forum but I could not find a way to PM Ganchrow so I thought I would reply to a seldom looked at post that had no replies anyway.
Hello Ganchrow.
Although I have not posted before, I have been actively lurking on SBR since about March and am a bit of a fan of your posts.
I am sure you must get many emails from people like myself who know just enough about math to make them dangerous. Here is another one.
I have read your posts on Kelly. I have also read the popular book and the original paper and several websites on the topic.
I keep coming back to the same problem. How do I calculate the sampling error and factor this into the Kelly formula of (bp-q)/b?
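For reference, a minimal sketch of the (bp-q)/b formula, assuming b is the net decimal payout per unit staked (so +130 means b=1.30), p is the win probability and q=1-p. The function name is made up for illustration, and the sketch does nothing about sampling error in p, which is the open question here.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full-Kelly stake as a fraction of bankroll: (b*p - q) / b, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if b <= 0.0:
        raise ValueError("b must be a positive net payout per unit staked")
    q = 1.0 - p
    return (b * p - q) / b


# Hypothetical inputs: a 45% win probability on a +130 line.
stake = kelly_fraction(0.45, 1.30)
print(f"Kelly stake: {stake:.4f} of bankroll")
```

A negative return value means the bet has negative expectation at that price, so the Kelly stake would be zero.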
I have been assuming that .98/sqrt(n) will give me the percentage I need to subtract. What I have been doing is subtracting .98/sqrt(n) from sqrt(R^2) in my Excel regression analysis. For example, if R^2=.05 and I have 100 samples then p=sqrt(.05)-.98/sqrt(n)=.1256068.
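The arithmetic of that adjustment can be sketched as below. It only reproduces the calculation described; whether sqrt(R^2) can be treated as a probability at all is part of the question. The function names are made up for illustration. Note that .98/sqrt(n) equals 1.96*0.5/sqrt(n), the half-width of a 95% normal-approximation interval for a proportion near 0.5.

```python
import math


def margin_95(n: int) -> float:
    """0.98/sqrt(n): 1.96 standard errors of a proportion at its worst case p = 0.5."""
    return 1.96 * 0.5 / math.sqrt(n)


def shrunk_estimate(r_squared: float, n: int) -> float:
    """The heuristic described in the post: sqrt(R^2) minus 0.98/sqrt(n)."""
    return math.sqrt(r_squared) - margin_95(n)


# The worked example from the post: R^2 = .05 with 100 samples.
print(f"{shrunk_estimate(0.05, 100):.7f}")
```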
Here is a real life scenario. I have 2 variables that predict 1 result. The result (shown as RESULT below) is the fair odds (no vig) NHL moneyline from the closing price from Pinnacle. I have 144 samples representing 72 different games (home and away are each given a separate line). Here are the regression results:
Regression of variable RESULTS:
Goodness of fit statistics:
Observations 144.000
Sum of weights 144.000
DF 141.000
R² 0.088
Adjusted R² 0.075
MSE 1.040
RMSE 1.020
MAPE 92.490
DW 1.808
Cp 3.000
AIC 8.589
SBC 17.498
PC 0.951
Analysis of variance:
Source            DF    Sum of squares   Mean squares   F       Pr > F
Model             2     14.089           7.044          6.775   0.002
Error             141   146.612          1.040
Corrected Total   143   160.701
Computed against model Y=Mean(Y)
Model parameters:
Source      Value     Standard error   t        Pr > |t|   Lower bound (95%)   Upper bound (95%)
Intercept   5.872     2.688            2.185    0.031      0.559               11.185
AH          -11.700   3.186            -3.673   0.000      -17.998             -5.402
N           5.851     2.519            2.322    0.022      0.870               10.831
Equation of the model:
RESULTS = 5.87156995163331-11.7001702939112*AH+5.85070213587917*N
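As a sanity check, the goodness-of-fit figures above can be recomputed from the sums of squares in the analysis-of-variance table using the standard definitions. This is only a sketch: the function name is made up, and the inputs are the rounded values as printed, so the last digit can differ slightly from what XLStat reports.

```python
import math
from typing import Dict


def fit_stats(ss_model: float, ss_error: float, n: int, k: int) -> Dict[str, float]:
    """Standard OLS summary statistics for n observations and k predictors."""
    ss_total = ss_model + ss_error
    df_error = n - k - 1
    r2 = ss_model / ss_total
    mse = ss_error / df_error
    return {
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_error,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "f": (ss_model / k) / mse,
    }


# Values copied from the ANOVA table: Model 14.089, Error 146.612, 144 rows, 2 predictors.
for name, value in fit_stats(14.089, 146.612, n=144, k=2).items():
    print(f"{name}: {value:.3f}")
```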
My first question, which is really just curiosity, is do you think the quality of these results justifies continuing to develop this model?
My second question is how can I determine the correct maximum Kelly for a desired confidence level (I have set the regression to 95% and have also done this in .98/sqrt(n) I believe) using the above numbers?
Just in case, I have pasted below 2 columns of numbers you may need. I have sorted the list from the highest level of prediction to the lowest. These predictions are NOT the predictions from the initial regression.
Using a program called XLStat, I ran the regression analysis using the model on all of the samples but one (actually two - both home and away for the single game were removed). I then used the regression to predict the results of the one sample that was not included. In my mind, I was removing any bias because the event that was removed was entirely independent from the events which were used to predict it. I then proceeded to do the same for every one of the 72 events. In this way, I was able to gain 144 independent predictions.
I do not know if this is good statistical practice but it was the best I could come up with on a small sample. Here are the results:
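The hold-one-game-out procedure described above could be sketched roughly as follows. This is not the XLStat workflow itself, just a plain-Python illustration of the same idea. The row layout (game id, [AH, N], result) and all function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, Sequence[float], float]  # (game_id, [AH, N], result); hypothetical layout


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def leave_one_game_out(rows: Sequence[Row]) -> List[float]:
    """For each game, fit on every other game and predict that game's rows."""
    preds = [0.0] * len(rows)
    for game in {g for g, _, _ in rows}:
        train = [(x, y) for g, x, y in rows if g != game]
        coef = fit_ols([x for x, _ in train], [y for _, y in train])
        for i, (g, x, _) in enumerate(rows):
            if g == game:
                preds[i] = predict(coef, x)
    return preds
```

Both rows of a game are dropped together before fitting, matching the description of removing home and away for the held-out game.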
Obs Predict Fair Result Actual Pin Close (Not Fair) Odds
Obs168 0.787 -1 195 0.338983
Obs134 0.725 1.5483142 151 0.398406
Obs83 0.645 0.8303939 -126 0.557522
Obs174 0.628 1.2434783 120 0.454545
Obs71 0.579 1.3416667 130 0.434783
Obs32 0.547 0.8041958 -130 0.565217
Obs79 0.527 0.911983 -115 0.534884
Obs69 0.499 -1 -140 0.583333
Obs109 0.495 2.2391716 218 0.314465
Obs13 0.482 0.5761317 -180 0.642857
Obs78 0.446 -1 108 0.480769
Obs169 0.434 1.1062963 106 0.485437
Obs128 0.421 -1 158 0.387597
Obs136 0.42 -1 172 0.367647
Obs4 0.412 1.174843 113 0.469484
Obs34 0.387 -1 -170 0.62963
Obs9 0.376 1.4104858 137 0.421941
Obs74 0.374 0.7917008 -132 0.568966
Obs186 0.361 -1 122 0.45045
Obs73 0.345 0.9815846 -107 0.516908
Obs70 0.345 -1 -153 0.604743
Obs64 0.342 -1 103 0.492611
Obs77 0.338 -1 108 0.480769
Obs59 0.337 -1 132 0.431034
Obs105 0.316 1.4104858 137 0.421941
Obs55 0.312 0.75642 -138 0.579832
Obs1 0.306 -1 -142 0.586777
Obs135 0.304 1.0965116 105 0.487805
Obs104 0.285 1.3416667 130 0.434783
Obs130 0.281 1.2925532 125 0.444444
Obs53 0.262 1.4104858 137 0.421941
Obs35 0.257 0.6183093 -168 0.626866
Obs7 0.256 0.6670836 -156 0.609375
Obs60 0.255 -1 -151 0.601594
Obs163 0.249 1.4695257 143 0.411523
Obs57 0.246 0.6108597 -170 0.62963
Obs91 0.244 0.7620824 -137 0.578059
Obs124 0.235 1.5089105 147 0.404858
Obs147 0.227 0.7345799 -142 0.586777
Obs98 0.224 -1 186 0.34965
Obs156 0.216 -1 130 0.434783
Obs68 0.215 0.8959908 -117 0.539171
Obs95 0.213 1.5483142 151 0.398406
Obs127 0.209 1.637037 160 0.384615
Obs100 0.201 -1 146 0.406504
Obs137 0.192 -1 125 0.444444
Obs149 0.185 -1 105 0.487805
Obs181 0.183 1.3023729 126 0.442478
Obs131 0.182 -1 250 0.285714
Obs99 0.17 1.1062963 106 0.485437
Obs138 0.168 2.0821118 202 0.331126
Obs84 0.165 -1 -134 0.57265
Obs67 0.164 -1 -160 0.615385
Obs42 0.162 -1 -115 0.534884
Obs90 0.161 0.4628331 -230 0.69697
Obs58 0.159 0.7620824 -137 0.578059
Obs61 0.143 -1 -105 0.512195
Obs180 0.134 1.331841 129 0.436681
Obs108 0.127 -1 175 0.363636
Obs185 0.109 1.1258716 108 0.480769
Obs8 0.1 0.7917008 -132 0.568966
Obs40 0.096 0.7855963 -133 0.570815
Obs43 0.096 0.5696509 -182 0.64539
Obs14 0.096 0.8372093 -125 0.555556
Obs103 0.093 1.4104858 137 0.421941
Obs126 0.092 1.3613223 132 0.431034
Obs65 0.088 0.8583359 -122 0.54955
Obs86 0.085 0.7040654 -148 0.596774
Obs85 0.085 -1 -174 0.635036
Obs132 0.068 -1 125 0.444444
Obs107 0.06 -1 115 0.465116
Obs82 0.044 -1 -155 0.607843
Obs145 0.043 -1 110 0.47619
Obs16 0.042 -1 -238 0.704142
Obs148 0.036 -1 128 0.438596
Obs150 0.024 -1 160 0.384615
Obs45 0.009 -1 -222 0.689441
Obs11 0.005 -1 -140 0.583333
Obs10 0.001 -1 -147 0.595142
Obs63 -0.002 0.7453416 -140 0.583333
Obs178 -0.007 1.6764964 164 0.378788
Obs184 -0.02 -1 127 0.440529
Obs38 -0.033 0.3915344 -270 0.72973
Obs5 -0.038 0.5193835 -206 0.673203
Obs170 -0.06 0.8882008 -118 0.541284
Obs157 -0.064 0.9285496 -113 0.530516
Obs36 -0.066 -1 -139 0.58159
Obs183 -0.067 -1 210 0.322581
Obs6 -0.067 -1 -116 0.537037
Obs96 -0.069 0.9724972 -108 0.519231
Obs3 -0.076 -1 -102 0.50495
Obs33 -0.084 -1 -142 0.586777
Obs39 -0.098 0.7736626 -135 0.574468
Obs179 -0.111 -1 138 0.420168
Obs177 -0.114 1.282735 124 0.446429
Obs165 -0.134 -1 169 0.371747
Obs160 -0.16 1.5384615 150 0.4
Obs97 -0.16 -1 -123 0.55157
Obs161 -0.17 -1 107 0.483092
Obs166 -0.175 -1 -103 0.507389
Obs173 -0.177 0.65 -160 0.615385
Obs94 -0.19 1.3613223 132 0.431034
Obs125 -0.201 -1 120 0.454545
Obs153 -0.202 1.4498406 141 0.414938
Obs93 -0.209 0.7917008 -132 0.568966
Obs15 -0.228 0.5601966 -185 0.649123
Obs72 -0.229 0.5794272 -179 0.641577
Obs129 -0.23 1.331841 129 0.436681
Obs162 -0.235 1.3416667 130 0.434783
Obs31 -0.243 -1 -157 0.610895
Obs155 -0.243 1 -105 0.512195
Obs62 -0.245 -1 -105 0.512195
Obs154 -0.246 1 -105 0.512195
Obs87 -0.256 -1 -139 0.58159
Obs152 -0.259 0.7345799 -142 0.586777
Obs175 -0.261 1.4892157 145 0.408163
Obs151 -0.262 -1 127 0.440529
Obs88 -0.268 -1 -136 0.576271
Obs159 -0.271 0.903917 -116 0.537037
Obs101 -0.273 -1 122 0.45045
Obs102 -0.276 -1 -147 0.595142
Obs56 -0.277 0.911983 -115 0.534884
Obs37 -0.28 -1 -135 0.574468
Obs52 -0.316 0.8730159 -120 0.545455
Obs92 -0.321 -1 -118 0.541284
Obs106 -0.321 -1 170 0.37037
Obs80 -0.325 -1 150 0.4
Obs76 -0.328 -1 -116 0.537037
Obs2 -0.337 -1 -161 0.616858
Obs172 -0.353 -1 105 0.487805
Obs133 -0.371 -1 123 0.44843
Obs146 -0.389 -1 -147 0.595142
Obs167 -0.395 -1 122 0.45045
Obs75 -0.412 0.4966496 -215 0.68254
Obs158 -0.42 -1 112 0.471698
Obs54 -0.436 -1 132 0.431034
Obs66 -0.439 -1 106 0.485437
Obs176 -0.443 -1 116 0.462963
Obs12 -0.452 -1 -147 0.595142
Obs44 -0.484 0.7736626 -135 0.574468
Obs41 -0.543 -1 -161 0.616858
Obs81 -0.617 -1 -130 0.565217
Obs164 -0.755 -1 -140 0.583333
Obs171 -1.165 0.8882008 -118 0.541284
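For anyone checking the last column: it matches the break-even probability implied by the actual (with vig) closing moneyline in the column before it. A small sketch of that conversion, with made-up function names:

```python
def implied_probability(american: float) -> float:
    """Break-even win probability implied by an American moneyline."""
    if american > 0:
        return 100.0 / (american + 100.0)
    return -american / (-american + 100.0)


def net_payout(american: float) -> float:
    """Net win per unit staked at an American moneyline (the b in Kelly)."""
    if american > 0:
        return american / 100.0
    return 100.0 / -american


# Spot checks against rows of the table above.
print(round(implied_probability(195), 6))   # Obs168
print(round(implied_probability(-126), 6))  # Obs83
```

The Fair Result column for winning rows is not reproduced here, since the no-vig payout also depends on the price of the opposing side.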
Regarding the above results, it appears to me that there is a good degree of correlation. BTW, I have noticed that the results did not seem to be randomly distributed. For example, having 9 losses in a row and then another 11 losses in a row later on seems highly improbable in a list as short as this. I noticed as well that if I grouped the prediction column into weighted quartiles (I sum up the column from top to bottom until I reach 1/4 of the total value of the column and this gives me my top quartile, and then go on to the next 25% etc.), it seems to predict the exact spot where the results start to change dramatically. I even did this for eighths and it also worked almost perfectly. In fact, the average results by eighth (from the top down) are:
81.14%
20.89%
23.83%
16.18%
00.62%
12.03%
-88.48%
-47.68%
I do not know if this is a coincidence.
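The weighted grouping described above could be sketched like this. It is only an illustration: the description does not say how negative predictions enter the running total, so the sketch takes the weights as a separate non-negative argument and leaves that choice to the caller. The function name is made up.

```python
from typing import List, Sequence


def weighted_bin_means(
    weights: Sequence[float], results: Sequence[float], n_bins: int
) -> List[float]:
    """Walk the rows top-down, starting a new bin each time the running weight
    reaches the next 1/n_bins share of the total, and return each bin's mean result."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    groups: List[List[float]] = [[] for _ in range(n_bins)]
    running = 0.0
    for w, r in zip(weights, results):
        # The bin is decided by the weight accumulated before this row, so the
        # row that carries the total up to a threshold stays in the earlier bin.
        idx = min(int(running / total * n_bins), n_bins - 1)
        groups[idx].append(r)
        running += w
    return [sum(g) / len(g) if g else float("nan") for g in groups]


# Toy example with equal weights (not the data from the table above).
print(weighted_bin_means([1, 1, 1, 1], [1.0, -1.0, 1.0, 1.0], n_bins=2))
```

With weights that shrink down the list, the top bins hold fewer rows than the bottom ones, so their averages rest on smaller samples.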
If you want me to email you an Excel spreadsheet with the data so you can work with it more easily, just let me know.
Thanks for looking at this and for all your advice in the forum.
VideoReview
PS If you think this discussion is better done as a PM, let me know how and maybe delete this irrelevant (to the thread) post. Thanks.