If someone wins at a 60% clip for 1000 plays, is it safe to assume that he will likely win no less than 55% of his next 1000 plays?
If he did it--subscribe!!
SBR Founder Join Date: 8/10/2005
SBR Founder Join Date: 11/16/2005
This is a traditional exercise in what’s known as “Bayesian inference”. You have a number of observations of a random variable (1,000), which you use to infer knowledge about a statistical parameter (the handicapper’s “true” success rate).
As the question has been phrased, however, the answer is not well defined. This is because we still need to make an assumption regarding the distribution of the parameter within the population. To put it in simple terms, we need to know how likely it would be that a randomly selected handicapper would have a true success rate of X (this is known as the “prior distribution” of X as it reflects our knowledge of X prior to any observations of the particular handicapper).
The reason for this requirement should be clear on reflection. Imagine, for example, if we knew with 100% certainty that because of the randomness inherent in sports betting no handicapper could ever be a > 58% true picker. If this were the case then we could say with certainty that at least some of the handicapper’s observed success would be attributable purely to luck.
One common simplifying assumption is that the prior distribution of our parameter is “uniform”, meaning that all values are equally likely within the population. Understand of course that we know this to be untrue. We know, for example, that 50% pickers are vastly more common within the population than 100% pickers. But that said, assuming a uniform distribution does provide for computational ease. First, I’m going to demonstrate how to use Excel to calculate a solution to this problem under the working assumption of a uniform distribution of pickers within the population, then I’m going to demonstrate the approximate solution to the same problem using a slightly more realistic prior distribution.
The problem: Over 1,000 trials a handicapper has demonstrated a pick rate of 60%. We wish to determine the posterior distribution (i.e., after considering the available evidence) of the handicapper’s true pick rate. We’ll then use the posterior distribution to estimate the handicapper’s probability of picking 55% or greater over his next 1,000 picks.
Case 1: A uniform prior distribution of true pick rates within the population.
Bayes’ theorem states:
P(H|E) = P(E|H) * P(H) / P(E)
where P() is the probability operator, H is a given hypothesis, and E is the observed evidence. In this case E corresponds to the observed 600 wins over 1,000 trials, and H could correspond to any given hypothesis regarding the handicapper’s success rate (for example, the hypothesis that the handicapper is actually a 60% picker, or the hypothesis that the handicapper is actually a 50% picker). What we’re seeking is the probability distribution of H given the observation E.
Because P(E) represents the probability of observing E under all hypotheses, P(E) can be rewritten as follows:
P(E) = ∫ P(E|Hx) * P(Hx) dx
However, because we’re assuming a uniform prior distribution of all hypotheses we have P(Hx) = P(Hy) for all x, y. This allows us to bring the term P(Hx) outside the integral and then cancel it out with the P(H) term in the numerator.
So this gives us:
P(H|E) = P(E|H) / ∫ P(E|Hx) dx
We can approximate the above integral with a discrete sum over some finite set of possible hypotheses regarding the handicapper's true pick rate. With equal prior weight on every hypothesis, this gives:
P(Hi|E) ≈ P(E|Hi) / Σj P(E|Hj)
If you open up the attached spreadsheet you'll see in column A the set of different hypotheses considered. Each cell corresponds to the hypothesis that "the picker's true pick rate is within 0.05% of the value within the cell".
In column B we have P(E|H), the conditional probability of observing the evidence (600 winners out of 1,000) given the hypothesis in the corresponding cell in column A. This is simply the binomial probability of exactly 600 wins in 1,000 trials at the hypothesized pick rate, i.e. the likelihood rather than a p-value (caveat: we're using a linear interpolation here which is only approximately correct).
Taking the sum of values in column B gives us Σ P(E|Hi), which in column C we use to divide each value in column B. This gives us the normalized likelihood of the stated hypothesis being true given the observed evidence.
Column D is the (linearly interpolated) probability of winning at least 550 of the next 1,000 picks assuming the given hypothesis is true.
Multiplying together columns C and D (results in column E) and then taking the sum (cell E1002) gives us the (approximate) probability of the handicapper winning at least 550 of his next 1,000 picks, 98.9%.
This means that if the distribution of pick rates within the population were uniform there would be about a 98.9% probability of the handicapper going at least 55% over his next 1,000 plays.
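For anyone without the spreadsheet handy, here is a rough Python sketch of the same column-by-column calculation for Case 1. It is not the spreadsheet itself: the grid of 1,001 hypotheses spaced 0.1% apart, the exact (non-interpolated) binomial probabilities, and all of the function and variable names are my own assumptions.

```python
from math import exp, lgamma, log

N_PAST, WINS_PAST = 1000, 600   # observed record (from the text)
N_NEXT, WINS_NEXT = 1000, 550   # target: at least 550 of the next 1,000


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n picks at true rate p (via logs)."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1.0 - p))


def binom_tail(k_min, n, p):
    """Probability of at least k_min wins in n picks at true rate p."""
    return min(1.0, sum(binom_pmf(k, n, p) for k in range(k_min, n + 1)))


# Column A: hypothesised true pick rates (assumed grid, 0.1% apart).
rates = [i / 1000 for i in range(1001)]

# Column B: likelihood of the observed 600/1,000 under each hypothesis.
likelihood = [binom_pmf(WINS_PAST, N_PAST, p) for p in rates]

# Column C: with a uniform prior, the posterior is the normalised likelihood.
total = sum(likelihood)
posterior = [lk / total for lk in likelihood]

# Column D: chance of at least 550/1,000 going forward under each hypothesis.
tail = [binom_tail(WINS_NEXT, N_NEXT, p) for p in rates]

# Column E and its sum: posterior-weighted chance of hitting the target.
prob_uniform = sum(w * t for w, t in zip(posterior, tail))
print(f"P(at least 55% over next 1,000 | uniform prior) = {prob_uniform:.3f}")
```

The result should land close to the 98.9% figure above, though small differences from the spreadsheet are possible because this sketch does not use the linear interpolation.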
Case 2: A Gaussian prior distribution of true pick rates within the population. We'll assume a mean of 50% and a standard deviation of 1.5% (this implies that 0.0000000013% of the population is truly a 60% handicapper or better, while 0.0429% of the population is a 55% handicapper or better).
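The two population fractions quoted for this prior can be sanity-checked with a few lines of Python. This is only a sketch; the helper name `fraction_at_or_above` is mine, and the only inputs are the mean and standard deviation stated above.

```python
from math import erfc, sqrt

MEAN, SD = 0.50, 0.015  # prior parameters from the text


def fraction_at_or_above(rate):
    """Share of a Normal(MEAN, SD) population whose true pick rate is >= rate."""
    z = (rate - MEAN) / SD
    # Upper-tail probability of the standard normal, computed with erfc
    # so that very small tails keep their precision.
    return 0.5 * erfc(z / sqrt(2.0))


frac_55 = fraction_at_or_above(0.55)
frac_60 = fraction_at_or_above(0.60)
print(f"55% or better: {frac_55:.4%}")
print(f"60% or better: {frac_60:.10%}")
```

The 55% cutoff sits about 3.33 standard deviations above the mean and the 60% cutoff about 6.67, which is why the second fraction is so tiny.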
We start again with Bayes' Theorem:
P(H|E) = P(E|H) * P(H) / P(E)
and the discrete sum approximation of P(E):
P(E) ≈ Σ P(E|Hi) * P(Hi)
(Note that because the values of P(Hi) are in general not the same for different hypotheses we can no longer cancel P(H) in the numerator and denominator.)
In Excel columns F & G we calculate the standard Z-scores for the edges of each hypothesis. Zl is the Z-score of the lower edge (0.05% below the hypothesized value in column A), so N(Zl) is the prior probability that the true pick rate falls below that edge. Zu is the Z-score of the upper edge (0.05% above the hypothesized value in column A), so N(Zu) is the prior probability that the true pick rate falls below the upper edge.
Taking N(Zu) - N(Zl) (column H) gives us the prior probability of randomly selecting a handicapper from the population with a true pick rate within 0.05% of column A. This corresponds to P(Hi), the prior distribution of Hi (which we've assumed to be normal).
Multiplying the prior probability in column H by the conditional probability of observing 600 wins out of 1,000 picks (column B) gives us column I, which is the numerator of Bayes' theorem. Taking the sum (cell I1002) and dividing each value in column I by it gives us column J, the posterior probability of the stated hypothesis given the evidence.
Multiplying column J by the probability of hitting at least 550/1000 given the hypothesis (column D) yields column K.
Taking the sum of column K gives us the probability of the observed handicapper going at least 55% over his next 1,000 plays, which is about 46.1% (cell K1002).
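The Case 2 columns can be sketched in Python along the same lines. Again this is an approximation of the spreadsheet rather than a copy of it: the 0.1% grid, the exact binomial probabilities in place of the linear interpolation, and every name below are my assumptions.

```python
from math import erf, exp, lgamma, log, sqrt

PRIOR_MEAN, PRIOR_SD = 0.50, 0.015  # Gaussian prior from the text
HALF_BIN = 0.0005                   # each hypothesis covers +/- 0.05%


def normal_cdf(z):
    """Standard normal cumulative distribution, N(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n picks at true rate p (via logs)."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1.0 - p))


def binom_tail(k_min, n, p):
    """Probability of at least k_min wins in n picks at true rate p."""
    return min(1.0, sum(binom_pmf(k, n, p) for k in range(k_min, n + 1)))


# Column A: hypothesised true pick rates (assumed grid, 0.1% apart).
rates = [i / 1000 for i in range(1001)]

# Columns F-H: prior mass of each bin, N(Zu) - N(Zl).
prior = [
    normal_cdf((p + HALF_BIN - PRIOR_MEAN) / PRIOR_SD)
    - normal_cdf((p - HALF_BIN - PRIOR_MEAN) / PRIOR_SD)
    for p in rates
]

# Column I: prior times likelihood of the observed 600/1,000.
joint = [pr * binom_pmf(600, 1000, p) for pr, p in zip(prior, rates)]

# Column J: divide by the sum of column I to get the posterior.
evidence = sum(joint)
posterior_gauss = [j / evidence for j in joint]

# Column K and its sum: posterior-weighted chance of at least 550/1,000.
# Rows with zero posterior weight are skipped purely to save time.
prob_gauss = sum(
    w * binom_tail(550, 1000, p)
    for w, p in zip(posterior_gauss, rates)
    if w > 0.0
)
print(f"P(at least 55% over next 1,000 | Gaussian prior) = {prob_gauss:.3f}")
```

Expect a value in the neighborhood of the 46.1% quoted above, not an exact match. Changing `PRIOR_SD` is a quick way to see how strongly the answer depends on the assumed prior.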
Conclusion:
What we see here is that our estimates of the handicapper's posterior probability are highly dependent on our assumptions about the prior probability distribution. This is quite common in problems of Bayesian inference.
Bayesian inference allows us to refine our beliefs regarding the truth of a specific hypothesis based on the availability of additional information. Specifically, Bayes' Theorem tells us the degree to which our beliefs should change as new information comes to light. In the two examples above we had certain prior beliefs regarding the probability of a randomly selected handicapper winning at least 550 out of his next 1,000 plays (it would work out to a 45.1% probability in the case of the uniform distribution and a 1.152% probability in the case of the Gaussian distribution). However, based on the observed evidence (winning 600 out of his previous 1,000 plays), in each situation our estimate of his future probability increased.
SBR Founder Join Date: 8/28/2005