|
11-04-2007, 08:32 AM
|
#1 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
reduced kelly bet sizing (received via PM)
Quote:
|
Originally Posted by 8lrr8
Ganchrow,
i was playing around w/ your kelly calc, and noticed something strange:
suppose my winrate is 59%, all at -113 odds. at full kelly, the median bankroll (if i start w/ $10k) after 600 bets is $731k.
i've heard that if one bets half kelly, one gets (in theory) 75% of the full kelly's return (based on median bankroll, and not average BR). similarly, if one bets 70% of full kelly, one's (median) return is ~90% of full kelly. but when i input 0.7 and 0.5 for 70% and half kelly (respectively) into the calc, the results are very different.
instead of expecting a median BR (after 600 bets) of 658k and 548k for 70% and 50% kelly betting (respectively), the calculator gives me a median BR of 498k (for 70% kelly) and a median bankroll of 251k (for half-kelly).
what's the explanation for this? have i been misinformed about the expected median return for half-kelly?
|
As a general rule it is indeed true that with the edges and odds one is likely to encounter in sports betting, the expected growth rates of half-Kelly and 70%-Kelly correspond to approximately 75% and 90% of the full-Kelly growth rate (respectively). You should note that these approximations will break down drastically at the extremes.
So why the discrepancy after 600 bets?
The two approximations you've noted apply to (geometric) average growth rates, while the median bankroll is a compounded growth figure. As Albert Einstein allegedly quipped, “The most powerful force in the universe is compound interest.”
Now while Einstein's authorship of the above statement is dubious, there's no question that the effect of compound interest over a large number of trials can be substantial.
So let's look at your example above:
- US Odds: -113
- Win Prob: 59%
- Bankroll: $10,000.000
- Trials: 600
At full-Kelly:
- Stake: $1,267.000
- Expected Growth: $71.807 (71.807/10,000 = 0.71807%)
- Median bankroll after 600 trials: $731,869.998 ≈ $10,000 × (1+0.71807%)^600 (slight difference due to rounding)
At 70%-Kelly:
- Stake: $891.031
- Expected Growth: $65.375 (65.375/10,000 = 0.65375%)
- Median bankroll after 600 trials: $498,855.372 ≈ $10,000 × (1+0.65375%)^600 (slight difference due to rounding)
At 50%-Kelly:
- Stake: $638.125
- Expected Growth: $53.905 (53.905/10,000 = 0.53905%)
- Median bankroll after 600 trials: $251,696.177 ≈ $10,000 × (1+0.53905%)^600 (slight difference due to rounding)
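So indeed what we see is that 50%-Kelly expected growth is about 75% of full Kelly growth (0.53905% / 0.71807% = 75.069%), and that 70%-Kelly growth is about 90% of full Kelly growth (0.65375% / 0.71807% = 91.043%). The median bankrolls differ by far more because each per-bet growth rate is compounded over 600 bets: the 50%-Kelly median is only about 34% of the full-Kelly median ($251,696 / $731,870), and the 70%-Kelly median is about 68% of it ($498,855 / $731,870).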
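The figures above can be reproduced with a short script. This is only a minimal sketch: the function names are mine, and the stakes are used exactly as the calculator reported them. Note that the 70% and 50% stakes are not exactly 0.7 and 0.5 times the full-Kelly stake, which presumably reflects how the calculator defines its Kelly multiplier.

```python
import math

def growth_per_bet(stake_fraction, win_prob, payout):
    """Expected (geometric) growth per bet: exp(E[ln(wealth ratio)]) - 1."""
    log_growth = (win_prob * math.log(1 + payout * stake_fraction)
                  + (1 - win_prob) * math.log(1 - stake_fraction))
    return math.exp(log_growth) - 1

def median_bankroll(start, growth, trials):
    """Median bankroll as used in the post: per-bet growth compounded over all trials."""
    return start * (1 + growth) ** trials

WIN_PROB = 0.59
PAYOUT = 100 / 113   # net payout per unit risked at US odds of -113
BANKROLL = 10_000
TRIALS = 600

# Stakes as reported in the post (assumption: used as given, not recomputed).
stakes = {"full": 1267.000, "70%": 891.031, "50%": 638.125}

results = {}
for label, stake in stakes.items():
    g = growth_per_bet(stake / BANKROLL, WIN_PROB, PAYOUT)
    results[label] = (g, median_bankroll(BANKROLL, g, TRIALS))

for label, (g, med) in results.items():
    ratio = g / results["full"][0]
    print(f"{label:>4}: growth {g:.5%}, {ratio:.1%} of full Kelly, median ${med:,.0f}")
```

The growth ratios stay close to the 75% and 90% rules of thumb, while the compounded medians drift much further apart.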
__________________
|
|
|
|
12-14-2007, 02:35 PM
|
#2 (permalink)
|
|
SBR High Roller
Join Date: 12-14-07
Location: Canada
Posts: 105
|
Help With Kelly
Everyone: sorry about posting this in the public forum but I could not find a way to PM Ganchrow so I thought I would reply to a seldom looked at post that had no replies anyway.
Hello Ganchrow.
Although I have never posted before, I have been actively lurking on SBR since about March and am a bit of a fan of your posts.
I am sure you must get many emails from people like myself who know just enough about math to make them dangerous. Here is another one.
I have read your posts on Kelly. I have also read the popular book and the original paper and several websites on the topic.
I keep coming back to the same problem. How do I calculate the sampling error and factor this into the Kelly formula of (bp-q)/b?
I have been assuming that .98/sqrt(n) will give me the percentage I need to subtract. What I have been doing is subtracting .98/sqrt(n) from sqrt(R^2) in my Excel regression analysis. For example, if R^2=.05 and I have 100 samples then p=sqrt(.05)-.98/sqrt(100)=.1256068.
Here is a real life scenario. I have 2 variables that predict 1 result. The result (shown as RESULT below) is the fair odds (no vig) NHL moneyline from the closing price from Pinnacle. I have 144 samples representing 72 different games (home and away are each given a separate line). Here are the regression results:
Regression of variable RESULTS:
Goodness of fit statistics:
Observations 144.000
Sum of weights 144.000
DF 141.000
R² 0.088
Adjusted R² 0.075
MSE 1.040
RMSE 1.020
MAPE 92.490
DW 1.808
Cp 3.000
AIC 8.589
SBC 17.498
PC 0.951
Analysis of variance:
Source            DF    Sum of squares    Mean squares    F        Pr > F
Model             2     14.089            7.044           6.775    0.002
Error             141   146.612           1.040
Corrected Total   143   160.701
Computed against model Y=Mean(Y)
Model parameters:
Source      Value      Standard error   t        Pr > |t|   Lower bound (95%)   Upper bound (95%)
Intercept   5.872      2.688            2.185    0.031      0.559               11.185
AH          -11.700    3.186            -3.673   0.000      -17.998             -5.402
N           5.851      2.519            2.322    0.022      0.870               10.831
Equation of the model:
RESULTS = 5.87156995163331-11.7001702939112*AH+5.85070213587917*N
My first question, which is really just curiosity, is do you think the quality of these results justify continuing to develop this model?
My second question is how can I determine the correct maximum Kelly for a desired confidence level (I have set the regression to 95% and have also done this in .98/sqrt(n) I believe) using the above numbers? Just in case, I have pasted below 2 columns of numbers you may need. I have sorted the list from the highest level of prediction to the lowest. These predictions are NOT the predictions from the initial regression. Using a program called XLStat, I ran the regression analysis using the model on all of the samples but one (actually two - both home and away for the single game were removed). I then used the regression to predict the results of the one sample that was not included. In my mind, I was removing any bias because the event that was removed was entirely independent from the events which were used to predict it. I then proceeded to do the same for every one of the 72 events. In this way, I was able to gain 144 independent predictions. I do not know if this is good statistical practice but it was the best I could come up with on a small sample. Here are the results:
Obs | Predict | Fair Result | Actual (Not Fair) Pin Close | Odds
Obs168 0.787 -1 195 0.338983
Obs134 0.725 1.5483142 151 0.398406
Obs83 0.645 0.8303939 -126 0.557522
Obs174 0.628 1.2434783 120 0.454545
Obs71 0.579 1.3416667 130 0.434783
Obs32 0.547 0.8041958 -130 0.565217
Obs79 0.527 0.911983 -115 0.534884
Obs69 0.499 -1 -140 0.583333
Obs109 0.495 2.2391716 218 0.314465
Obs13 0.482 0.5761317 -180 0.642857
Obs78 0.446 -1 108 0.480769
Obs169 0.434 1.1062963 106 0.485437
Obs128 0.421 -1 158 0.387597
Obs136 0.42 -1 172 0.367647
Obs4 0.412 1.174843 113 0.469484
Obs34 0.387 -1 -170 0.62963
Obs9 0.376 1.4104858 137 0.421941
Obs74 0.374 0.7917008 -132 0.568966
Obs186 0.361 -1 122 0.45045
Obs73 0.345 0.9815846 -107 0.516908
Obs70 0.345 -1 -153 0.604743
Obs64 0.342 -1 103 0.492611
Obs77 0.338 -1 108 0.480769
Obs59 0.337 -1 132 0.431034
Obs105 0.316 1.4104858 137 0.421941
Obs55 0.312 0.75642 -138 0.579832
Obs1 0.306 -1 -142 0.586777
Obs135 0.304 1.0965116 105 0.487805
Obs104 0.285 1.3416667 130 0.434783
Obs130 0.281 1.2925532 125 0.444444
Obs53 0.262 1.4104858 137 0.421941
Obs35 0.257 0.6183093 -168 0.626866
Obs7 0.256 0.6670836 -156 0.609375
Obs60 0.255 -1 -151 0.601594
Obs163 0.249 1.4695257 143 0.411523
Obs57 0.246 0.6108597 -170 0.62963
Obs91 0.244 0.7620824 -137 0.578059
Obs124 0.235 1.5089105 147 0.404858
Obs147 0.227 0.7345799 -142 0.586777
Obs98 0.224 -1 186 0.34965
Obs156 0.216 -1 130 0.434783
Obs68 0.215 0.8959908 -117 0.539171
Obs95 0.213 1.5483142 151 0.398406
Obs127 0.209 1.637037 160 0.384615
Obs100 0.201 -1 146 0.406504
Obs137 0.192 -1 125 0.444444
Obs149 0.185 -1 105 0.487805
Obs181 0.183 1.3023729 126 0.442478
Obs131 0.182 -1 250 0.285714
Obs99 0.17 1.1062963 106 0.485437
Obs138 0.168 2.0821118 202 0.331126
Obs84 0.165 -1 -134 0.57265
Obs67 0.164 -1 -160 0.615385
Obs42 0.162 -1 -115 0.534884
Obs90 0.161 0.4628331 -230 0.69697
Obs58 0.159 0.7620824 -137 0.578059
Obs61 0.143 -1 -105 0.512195
Obs180 0.134 1.331841 129 0.436681
Obs108 0.127 -1 175 0.363636
Obs185 0.109 1.1258716 108 0.480769
Obs8 0.1 0.7917008 -132 0.568966
Obs40 0.096 0.7855963 -133 0.570815
Obs43 0.096 0.5696509 -182 0.64539
Obs14 0.096 0.8372093 -125 0.555556
Obs103 0.093 1.4104858 137 0.421941
Obs126 0.092 1.3613223 132 0.431034
Obs65 0.088 0.8583359 -122 0.54955
Obs86 0.085 0.7040654 -148 0.596774
Obs85 0.085 -1 -174 0.635036
Obs132 0.068 -1 125 0.444444
Obs107 0.06 -1 115 0.465116
Obs82 0.044 -1 -155 0.607843
Obs145 0.043 -1 110 0.47619
Obs16 0.042 -1 -238 0.704142
Obs148 0.036 -1 128 0.438596
Obs150 0.024 -1 160 0.384615
Obs45 0.009 -1 -222 0.689441
Obs11 0.005 -1 -140 0.583333
Obs10 0.001 -1 -147 0.595142
Obs63 -0.002 0.7453416 -140 0.583333
Obs178 -0.007 1.6764964 164 0.378788
Obs184 -0.02 -1 127 0.440529
Obs38 -0.033 0.3915344 -270 0.72973
Obs5 -0.038 0.5193835 -206 0.673203
Obs170 -0.06 0.8882008 -118 0.541284
Obs157 -0.064 0.9285496 -113 0.530516
Obs36 -0.066 -1 -139 0.58159
Obs183 -0.067 -1 210 0.322581
Obs6 -0.067 -1 -116 0.537037
Obs96 -0.069 0.9724972 -108 0.519231
Obs3 -0.076 -1 -102 0.50495
Obs33 -0.084 -1 -142 0.586777
Obs39 -0.098 0.7736626 -135 0.574468
Obs179 -0.111 -1 138 0.420168
Obs177 -0.114 1.282735 124 0.446429
Obs165 -0.134 -1 169 0.371747
Obs160 -0.16 1.5384615 150 0.4
Obs97 -0.16 -1 -123 0.55157
Obs161 -0.17 -1 107 0.483092
Obs166 -0.175 -1 -103 0.507389
Obs173 -0.177 0.65 -160 0.615385
Obs94 -0.19 1.3613223 132 0.431034
Obs125 -0.201 -1 120 0.454545
Obs153 -0.202 1.4498406 141 0.414938
Obs93 -0.209 0.7917008 -132 0.568966
Obs15 -0.228 0.5601966 -185 0.649123
Obs72 -0.229 0.5794272 -179 0.641577
Obs129 -0.23 1.331841 129 0.436681
Obs162 -0.235 1.3416667 130 0.434783
Obs31 -0.243 -1 -157 0.610895
Obs155 -0.243 1 -105 0.512195
Obs62 -0.245 -1 -105 0.512195
Obs154 -0.246 1 -105 0.512195
Obs87 -0.256 -1 -139 0.58159
Obs152 -0.259 0.7345799 -142 0.586777
Obs175 -0.261 1.4892157 145 0.408163
Obs151 -0.262 -1 127 0.440529
Obs88 -0.268 -1 -136 0.576271
Obs159 -0.271 0.903917 -116 0.537037
Obs101 -0.273 -1 122 0.45045
Obs102 -0.276 -1 -147 0.595142
Obs56 -0.277 0.911983 -115 0.534884
Obs37 -0.28 -1 -135 0.574468
Obs52 -0.316 0.8730159 -120 0.545455
Obs92 -0.321 -1 -118 0.541284
Obs106 -0.321 -1 170 0.37037
Obs80 -0.325 -1 150 0.4
Obs76 -0.328 -1 -116 0.537037
Obs2 -0.337 -1 -161 0.616858
Obs172 -0.353 -1 105 0.487805
Obs133 -0.371 -1 123 0.44843
Obs146 -0.389 -1 -147 0.595142
Obs167 -0.395 -1 122 0.45045
Obs75 -0.412 0.4966496 -215 0.68254
Obs158 -0.42 -1 112 0.471698
Obs54 -0.436 -1 132 0.431034
Obs66 -0.439 -1 106 0.485437
Obs176 -0.443 -1 116 0.462963
Obs12 -0.452 -1 -147 0.595142
Obs44 -0.484 0.7736626 -135 0.574468
Obs41 -0.543 -1 -161 0.616858
Obs81 -0.617 -1 -130 0.565217
Obs164 -0.755 -1 -140 0.583333
Obs171 -1.165 0.8882008 -118 0.541284
Regarding the above results, it appears to me that there is a good degree of correlation. BTW, I have noticed that the results did not seem to be randomly distributed. For example, having 9 losses in a row and then another 11 losses in a row later on seems highly improbable in a list as short as this. I noticed as well that if I grouped the prediction column into weighted quartiles (I sum up the column from top to bottom until I reach 1/4 of the total value of the column and this gives me my top quartile, and then go on to the next 25% etc.), it seems to predict the exact spot where the results start to change dramatically. I even did this for 1/8's and it also worked almost perfectly. In fact, the 1/8 average results (from the top down) are:
81.14%
20.89%
23.83%
16.18%
00.62%
12.03%
-88.48%
-47.68%
I do not know if this is a coincidence.
If you want me to email you an excel spreadsheet with the data so you can work with it easier just let me know.
Thanks for looking at this and for all your advice in the forum.
VideoReview
PS If you think this discussion is better done as a PM, let me know how and maybe delete this irrelevant (to the thread) post. Thanks.
|
|
|
|
12-14-2007, 04:11 PM
|
#3 (permalink)
|
|
SBR MVP
Join Date: 01-31-06
Posts: 1,870
|
Minuscule sample.
Back-fitting.
Trying to beat the market with widely available information.
|
|
|
|
12-15-2007, 04:02 AM
|
#4 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
Quote:
Originally Posted by VideoReview
-snipped message-
|
Your question actually has little to do with Kelly per se. Given stated payout odds, then the mapping of probability to Kelly stake will be injective (i.e., one-to-one) for all positive expectation. As such, once you determine a confidence interval for your forecast probability, converting that to a confidence interval for Kelly is trivial.
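As a rough illustration of that conversion (a minimal sketch; the function names and the example numbers are hypothetical, not taken from your data): because the Kelly stake rises monotonically with win probability at fixed payout odds, the endpoints of a probability interval map directly onto the endpoints of a Kelly interval.

```python
def kelly_fraction(win_prob, payout):
    """Full-Kelly stake (b*p - q) / b, floored at zero for non-positive edges."""
    edge = payout * win_prob - (1 - win_prob)
    return max(edge / payout, 0.0)

def kelly_interval(prob_low, prob_high, payout):
    """Map a confidence interval on win probability to one on the Kelly stake."""
    return kelly_fraction(prob_low, payout), kelly_fraction(prob_high, payout)

# Hypothetical inputs: even-money payout, win probability between 52% and 58%.
low, high = kelly_interval(0.52, 0.58, payout=1.0)
print(f"Kelly stake between {low:.2%} and {high:.2%} of bankroll")
```

The floor at zero reflects the positive-expectation caveat: below break-even every probability maps to a stake of zero, so the mapping is no longer one-to-one there.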
I'm a little unclear as to why you're using R² as an estimator of probability. Loosely speaking, the R² of a model corresponds to the percent of the variability of your data set that's explained by that model. Unless I've misunderstood the nature of your regression, that's going to be very different from the win probability you're attempting to estimate.
Typically when trying to estimate a probability (which is obviously only defined on the interval [0,1]) using regression analysis one uses the logarithm of the inverse of "fair" payout odds (i.e., "fair" decimal odds - 1, or b in your Kelly equation given an edge of 0) as the dependent variable, which is defined across the entire set of real numbers. This is known as a "logistic regression".
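To make the transformation concrete, here is a small sketch (the helper names are my own): the log of the inverse of fair payout odds is just the log-odds, or logit, of the win probability. It ranges over all real numbers and can be mapped back to a probability.

```python
import math

def fair_payout_odds(win_prob):
    """Fair net payout b per unit risked (fair decimal odds - 1) at zero edge."""
    return (1 - win_prob) / win_prob

def log_inverse_odds(win_prob):
    """log(1 / b), which equals the logit log(p / (1 - p))."""
    return math.log(1 / fair_payout_odds(win_prob))

def prob_from_log_inverse_odds(value):
    """Invert the transform: map any real number back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-value))

for p in (0.25, 0.50, 0.75):
    z = log_inverse_odds(p)
    print(f"p = {p:.2f}, fair b = {fair_payout_odds(p):.4f}, "
          f"log(1/b) = {z:+.4f}, back to p = {prob_from_log_inverse_odds(z):.2f}")
```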
It's also not strictly correct to use ±0.98/sqrt(n) as your interval. While this does correspond to a 95% confidence interval, that would only be strictly true were your data set drawn from a single binomial distribution with p=50% (.98 = 1.96*sqrt(50%*(1-50%))). In the context of your particular problem the confidence interval would be better expressed using the standard error of the regression.
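For comparison, a quick sketch of the general binomial interval under the normal approximation (the function name and the sample probabilities are illustrative only; the regression's standard error remains the better tool for your problem). The half-width is 1.96*sqrt(p*(1-p)/n), which reduces to .98/sqrt(n) only at p = 50% and is narrower elsewhere.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile

def binomial_half_width(win_prob, n, z=Z_95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(win_prob * (1 - win_prob) / n)

n = 144  # number of observations in the regression output
for p in (0.50, 0.40, 0.30):
    print(f"p = {p:.2f}: +/- {binomial_half_width(p, n):.4f} "
          f"(0.98/sqrt(n) = {0.98 / math.sqrt(n):.4f})")
```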
Specific mechanics aside I suspect I'd probably tend to agree with RickySteve in his analysis. That said, why don't you e-mail me your spreadsheet along with a description of each of the columns and we can take it from there.
__________________
|
|
|
|
12-15-2007, 12:59 PM
|
#5 (permalink)
|
|
SBR Sharp
Join Date: 11-09-07
Posts: 274
|
Quote:
Originally Posted by RickySteve
Minuscule sample.
Back-fitting.
Trying to beat the market with widely available information.
|
yeah but unless you have some sort of inside info then you'll always be trying to beat the market with widely available information. and as far as back-fitting, couldn't any analysis of past results be accused of the same thing?
|
|
|
|
12-15-2007, 01:04 PM
|
#6 (permalink)
|
|
SBR Wise Guy
Join Date: 11-27-07
Location: U.S.S. Enterprise NCC-1701-E
Posts: 939
|
Quote:
Originally Posted by VideoReview
Regression of variable RESULTS:
Goodness of fit statistics:
Observations 144.000
Sum of weights 144.000
DF 141.000
R² 0.088
Adjusted R² 0.075
|
Quote:
|
My first question, which is really just curiosity, is do you think the quality of these results justify continuing to develop this model?
|
This seems like a first step of a long journey.
Quote:
|
Regarding the above results, it appears to me that there is a good degree of correlation.
|
You do not need to guess here. The correlation between the actual results and variable RESULTS is 0.297, as it is just the positive square root of R-squared. While 0.297 is certainly noticeable and can be used for forecasting, I would venture a guess that the linesmakers' model produces much better results. Therefore, while you can make predictions that are much better than random, you are currently unable to make predictions better than the linesmakers do and therefore cannot beat the line with your model in its current state.
|
|
|
|
12-15-2007, 02:48 PM
|
#7 (permalink)
|
|
SBR High Roller
Join Date: 12-14-07
Location: Canada
Posts: 105
|
How do I email you Ganchrow?
|
|
|
|
12-15-2007, 03:33 PM
|
#8 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
E-mail address is in my profile.
__________________
|
|
|
|
12-15-2007, 03:43 PM
|
#9 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
Quote:
Originally Posted by Data
You do not need to guess here. The correlation between the actual results and variable RESULTS is 0.297, as it is just the positive square root of R-squared. While 0.297 is certainly noticeable and can be used for forecasting, I would venture a guess that the linesmakers' model produces much better results. Therefore, while you can make predictions that are much better than random, you are currently unable to make predictions better than the linesmakers do and therefore cannot beat the line with your model in its current state.
|
I wouldn't even go that far at this point, because as it stands the model most likely isn't properly specified and so I suspect the R² might be less meaningful than it appears.
I think he's probably going to need to redo this as a logistic regression and then take it from there.
__________________
|
|
|
|
01-24-2008, 08:34 AM
|
#10 (permalink)
|
|
SBR High Roller
Join Date: 12-14-07
Location: Canada
Posts: 105
|
Quote:
Originally Posted by Ganchrow
Your question actually has little to do with Kelly per se. Given stated payout odds, then the mapping of probability to Kelly stake will be injective (i.e., one-to-one) for all positive expectation. As such, once you determine a confidence interval for your forecast probability, converting that to a confidence interval for Kelly is trivial.
|
I am having trouble understanding what you mean. Perhaps you could explain using the following simple example.
Assume I have the following random sample (That is, it was hypothesized beforehand without looking at the data, and these are the results):
2500 games win at +140
2500 games win at +120
5000 games lose at -100
My win probability is .50, my lose probability is .50, my average odds payout is 1.30. My ROI is .15, for interest's sake.
Therefore, Kelly = ((.50*1.30)-.50)/1.3 = .1153846
I have 2 questions.
How do I factor in the 10,000 event sample size to calculate for various confidence levels (e.g. 95%)?
Is a logistic regression of the inverse of fair payout odds required when I am not trying to determine a probability between 0 and 1 but a return (or maybe I shouldn't be thinking along those lines at all)?
If I had a similar ROI of .15 but with a 50/50 outcome such as:
5000 games win at +130
5000 games lose at -100
The way I would calculate Kelly at 95% confidence would be:
((.50-.98/sqrt(10000))*1.3-(.5+.98/sqrt(10000)))/1.3 = .098046
Is the above calculation correct?
VideoReview
|
|
|
|
01-24-2008, 09:43 AM
|
#11 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
Quote:
Originally Posted by VideoReview
Assume I have the following random sample (That is, it was hypothesized beforehand without looking at the data, and these are the results):
2500 games win at +140
2500 games win at +120
5000 games lose at -100
My win probability is .50, my lose probability is .50, my average odds payout is 1.30. My ROI is .15, for interest's sake.
Therefore, Kelly = ((.50*1.30)-.50)/1.3 = .1153846
|
If you don't mind making a bunch of simplifying assumptions here's a simple way of approximating this:
If we take the 15% return as an unbiased indicator of true population edge (which we assume doesn't vary with payout odds) and further assume that half the sample was at +120 and the other half at +140, then the sample variance for unit-risk bets for each of the two odds classes would be (5,000 * (1+edge) * (payout_odds - edge)).
Hence the total std. dev. of our edge estimate (assuming betting to win equal quantities) for unit-win bets would then be: SQRT(5000*1.15*((1.4-0.15)/1.4^2+(1.2-0.15)/1.2^2))/(5000/1.4+5000/1.2) ≈ 1.146%.
Appealing to the central limit theorem and assuming edge doesn't vary with payout odds, your 95% confidence interval for edge would then be about 15% ± 2.246% (if you wanted to be a bit more accurate you'd probably want to use either a Weibull or lognormal distribution).
So at +120 your full-Kelly 95% confidence interval would be about (10.629%, 14.371%).
And at +140 your full-Kelly 95% confidence interval would be about (9.110%, 12.318%).
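These numbers can be checked with a few lines of Python. This is only a sketch of the approximation above under the same simplifying assumptions; the variable names are mine.

```python
import math

EDGE = 0.15            # observed return, taken as the true edge
BETS_PER_CLASS = 5000  # bets assumed at each of the two payout odds
PAYOUTS = (1.4, 1.2)   # +140 and +120 as net payouts per unit risked
Z_95 = 1.96

# Variance of one unit-risk bet is (1 + edge) * (payout - edge); dividing by
# payout^2 rescales it to a unit-win bet, whose amount risked is 1 / payout.
total_variance = sum(
    BETS_PER_CLASS * (1 + EDGE) * (b - EDGE) / b**2 for b in PAYOUTS
)
total_risked = sum(BETS_PER_CLASS / b for b in PAYOUTS)

edge_sd = math.sqrt(total_variance) / total_risked
half_width = Z_95 * edge_sd
print(f"std. dev. of edge estimate: {edge_sd:.3%}")
print(f"95% interval for edge: {EDGE:.0%} +/- {half_width:.3%}")

kelly_intervals = {}
for b in PAYOUTS:
    # Full Kelly is edge / payout, so the edge interval maps straight across.
    kelly_intervals[b] = ((EDGE - half_width) / b, (EDGE + half_width) / b)
    low, high = kelly_intervals[b]
    print(f"payout {b}: full-Kelly interval ({low:.3%}, {high:.3%})")
```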
Quote:
Originally Posted by VideoReview
If I had a similar ROI of .15 but with a 50/50 outcome such as:
5000 games win at +130
5000 games lose at -100
The way I would calculate Kelly at 95% confidence would be:
((.50-.98/sqrt(10000))*1.3-(.5+.98/sqrt(10000)))/1.3 = .098046
Is the above calculation correct?
|
As long as you're comfortable appealing to the Central Limit Theorem, then yes.
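A quick check of that arithmetic (a sketch only; the variable names are mine, and the 0.98 is the p = 50% binomial factor discussed earlier in the thread):

```python
import math

sample_size = 10_000
payout = 1.3       # +130 as net payout per unit risked
win_rate = 0.50

# 95% half-width for a win rate of 50% under the normal approximation.
shrink = 0.98 / math.sqrt(sample_size)

p_low = win_rate - shrink   # pessimistic win probability
q_high = 1 - p_low          # matching loss probability
kelly_low = (p_low * payout - q_high) / payout
print(f"lower-bound Kelly stake: {kelly_low:.6f}")
```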
__________________
Last edited by Ganchrow : 01-25-2008 at 07:31 AM.
Reason: corrected error in formula
|
|
|
|
01-24-2008, 11:08 AM
|
#12 (permalink)
|
|
SBR High Roller
Join Date: 12-14-07
Location: Canada
Posts: 105
|
| |