Is there anyone who can show how to calculate the cumulative probability of something occurring when multiple probabilities are involved?
I am aware of how to calculate the binomial distribution for probabilities that are constant but not when they vary. For example, if a fair coin was flipped 4 times and I was making a guess each time, the probability of me guessing wrong at least:
4 times is 6.25% or 1 in 16 experiments
3 or more times is 31.25% or 5 in 16 experiments
2 or more times is 68.75% or 11 in 16 experiments
1 or more times is 93.75% or 15 in 16 experiments
0 or more times is 100% or 16 in 16 experiments
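The constant-probability case above can be checked with a short sketch. This is only an illustration in Python (the function name `prob_at_least` is made up for this example), summing the binomial terms from k up to n:

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Four guesses on a fair coin, each wrong with probability 0.5.
    for k in range(4, -1, -1):
        print(f"{k} or more wrong: {prob_at_least(k, 4, 0.5):.2%}")
```

In Excel the same tail for k of 1 or more should be obtainable with =1-BINOMDIST(k-1, n, p, TRUE).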
So, here are some relevant probabilities (actually they are real NHL ML numbers) and their outcomes (1=win, 0=lose):
Assuming a bet to win an equal percentage of bankroll for each of the above events, my results would be:
Win 48.332811 units, Bet 41.766906 units for a net ROI of 15.7%.
2 questions:
1) What was the probability that I would win "at least" the 15.7% that was obtained during this experiment and how is that calculated?
2) How do you calculate the probability of winning or losing various ROI amounts? (e.g. win 2%, 5%, 10%, etc.)
There are bonus smiley faces etc. if Excel-type formulas are included alongside the math notation in the answers.
I should add that up until now what I do is run simulations in Excel based on the odds. For example, I have my list of bets and their odds (e.g. +100 = .50, -150 = .60, etc.) and run at least 10,000 random simulations with these odds (e.g. if a random number is less than the odds, it is considered a win otherwise it is a loss). What this does for me is to determine how likely I would have been to win various ROI amounts purely by chance.
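The odds-to-probability step described here (+100 = .50, -150 = .60) and the "random number less than the odds" rule can be sketched as follows. This is a rough Python illustration, not the Excel sheet itself; `implied_prob` and `simulate_wins` are hypothetical names, and the conversion does not remove the vig:

```python
import random


def implied_prob(us_odds: float) -> float:
    """Break-even win probability implied by American odds (vig not removed)."""
    if us_odds >= 0:
        return 100.0 / (us_odds + 100.0)
    return -us_odds / (-us_odds + 100.0)


def simulate_wins(us_odds_list, rng):
    """One random trial: a bet counts as a win (1) when a uniform draw
    falls below its implied probability, otherwise as a loss (0)."""
    return [1 if rng.random() < implied_prob(o) else 0 for o in us_odds_list]


if __name__ == "__main__":
    rng = random.Random()  # Python's generator is a Mersenne Twister
    print(implied_prob(100), implied_prob(-150))
    print(simulate_wins([100, -150, 150], rng))
```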
For example, I ran 10,000 random events on the entire population of 55 odds shown in the first post. Here were the results:
To win at least 15.7% (which is what I won), there is a 15.54% chance that it occurred totally randomly. I interpret this to mean that my 15.7% ROI is significant at the 15.54% level (notwithstanding my sample size of 10,000).
Here are other numbers and levels using the odds in the original post:
And finally, to have confidence at the 5% level (95% sure that my results are not by chance notwithstanding my sample size of 10,000)
Win 25.255% ROI.....5% Chance
Therefore, I could say that the null hypothesis (that I was lucky) cannot be rejected because there is a 15.54% chance that the results were random, so the alternative hypothesis (that the algorithm that produced the choices was profitable) is not supported at the 5% level.
What I am looking for is a mathematical calculation that will save me the time and inaccuracy of these simulations.
Last edited by VideoReview : 02-14-2008 at 05:42 PM.
Reason: Clarification
I know nothing of Excel. In this case a reasonable null hypothesis is that the no-vig ML is the true probability. Computing the exact probability of M or more successes in N trials of this sort is straightforward but laborious. You could reasonably use the normal approximation and the fact that the variance of the sum of independent events is the sum of the variances of the events. The variance of a single binomial trial is p*(1-p).
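The "straightforward but laborious" exact calculation for the number of wins can be done by folding in one bet at a time. The sketch below is one possible way to do it in Python, under the assumption that the bets are independent; the function names are invented for illustration, and it covers only the count of wins, not the ROI, since payouts differ from bet to bet:

```python
def win_count_distribution(probs):
    """Exact distribution of the number of wins for independent bets
    with differing win probabilities; dist[k] is P(exactly k wins)."""
    dist = [1.0]  # with no bets placed, zero wins is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this bet loses
            nxt[k + 1] += mass * p     # this bet wins
        dist = nxt
    return dist


def prob_at_least_m_wins(probs, m):
    """P(M or more successes) from the exact distribution."""
    return sum(win_count_distribution(probs)[m:])


def sum_of_variances(probs):
    """Variance of the win count: the sum of p*(1-p) over the bets."""
    return sum(p * (1 - p) for p in probs)
```

With every probability set to 0.5 this reduces to the ordinary binomial case, which makes a convenient sanity check.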
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,586
There's nothing wrong with estimating p-values using Monte Carlo simulations. There is, however, something wrong with estimating p-values using Monte Carlo simulations in Excel. It's hard on the soul.
Here's a very simple Monte Carlo script coded in Perl. (You can download a free copy of Perl from http://www.activeperl.com/.) On a decent machine you should easily be able to run 2,000,000 55-bet trials in a minute or less. You should modify the EDGE, TRIALS, and BOGEY constants to suit your needs.
Code:
#!perl
# Author: ganchrow@sbrforum.com
# a very simple implementation of the
# Monte Carlo method in fixed odds
# sports betting
use strict;
use warnings;

### edit from here ###
use constant EDGE   => 0;
use constant TRIALS => 3_000_000;
use constant BOGEY  => 0.1572035; # as a fraction of the total risk amount
### don't edit below this line unless you know what you're doing ###

my @odds_ra = ();
my $total_risk = 0;

# read one US-style price per line; every bet is sized to win 1 unit
while(<>) {
    chomp;
    my ($us,) = split;
    next unless $us;
    my $dec = &us2dec($us);
    my $prob = (1+EDGE)/$dec;
    my $risk = 1/($dec-1);
    push @odds_ra, [$prob, $dec, $risk, 1];
    $total_risk += $risk;
}

my ($sum,$sumsq,$qualifiers,) = (0.0, 0.0, 0,);
my $pct_bogey = BOGEY * $total_risk ;

foreach my $i ( 1 .. TRIALS) {
    my $this_trial_result = 0;
    print STDERR "Trial $i\n" if $i%10_000 == 0;
    foreach my $j (0 .. $#odds_ra) {
        my ($prob, $dec, $risk, $win,) = @{$odds_ra[$j]};
        my $r = rand();
        my $this_bet_result;
        if ($r < $prob) {
            # win
            $this_bet_result = $win;
        } else {
            # loss
            $this_bet_result = -$risk;
        }
        $this_trial_result += $this_bet_result;
    }
    print "$this_trial_result\n";
    $qualifiers++ if $this_trial_result >= $pct_bogey;
    $sum += $this_trial_result;
    $sumsq += $this_trial_result*$this_trial_result;
}

my $mean = $sum / TRIALS;
my $stddev = sqrt($sumsq / TRIALS - $mean*$mean);
my $frequency = $qualifiers / TRIALS;
print STDERR "Mean \t$mean\n";
print STDERR "Std. Dev.\t$stddev\n";
print STDERR "Qual \t$frequency\n";

# convert US-style odds to decimal odds
sub us2dec {
    my $us = shift;
    return (
        $us >= 0 ? 1+$us/100 : 1-100/$us
    );
}
The script takes a text file of newline-separated US-style odds and outputs to STDOUT the results of each of the trials (so you'll want to redirect STDOUT to a file), and to STDERR the mean, standard deviation, and frequency with which the specified bogey (about 15.7% in your example) is reached.
I'll just note that the script uses the Perl built-in rand() function, which has a fairly short period. There's a Perl module available from CPAN that implements the Mersenne twister pseudorandom number generation algorithm, which can be used as a drop-in replacement for rand(). If you're going to be doing any moderately serious Monte Carlo sims and don't feel like coding in C, you should definitely hook that up (although it will slow down your sim). You can even seed it with data from http://www.random.org. It's about 10 extra lines of code. Let me know if you want it.
The only way to calculate an exact solution would be to enumerate each of the 2^55 different outcomes, which would of course be completely impractical.
Another possibility would be to break up the 55 bets into manageable tranches of (let's say) 11 bets apiece, enumerate the results for each tranche, and then determine exact p-values using the binomial distribution. You could then use Fisher's chi-square method to determine a joint significance for the entire data set.
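As a rough illustration of the combining step, Fisher's method takes k independent p-values, forms X = -2 * sum(ln p_i), and compares X against a chi-square distribution with 2k degrees of freedom. The Python sketch below (function name made up for this example) uses the closed form of the chi-square tail for even degrees of freedom, so no statistics library is assumed:

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under
    the null. For even degrees of freedom the upper tail is
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


if __name__ == "__main__":
    # Five made-up tranche p-values, purely for illustration.
    print(fisher_combined_p([0.30, 0.12, 0.45, 0.20, 0.08]))
```

A single p-value passes through unchanged, which is a quick check that the tail formula is wired up correctly.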
Assuming all bets are independent of one another, then the simplest method would just be to appeal to the Central Limit Theorem. Take the sum of the variances of each bet (betting to win n units at decimal odds d and edge E, variance would be (1+E)*(d-E-1) * n^2 / (d-1)^2) and then take the square root of that sum to obtain the standard deviation and a z-score. (So for zero-edge and betting to win 1 unit, variance would just be 1/(d-1), which is the amount risked.)
In your example, assuming no edge, we get a standard deviation of about 15.47%. This means your result of ~15.72% is about 1.016 standard devs from breakeven for a p-value of about 15.48% (=1-NORMSDIST(1.016)).
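The arithmetic can be sketched in a few lines. This is a Python illustration rather than anything from the thread's Perl script; the function names are invented, and the z-score is taken against breakeven, which assumes the zero-edge null hypothesis:

```python
from math import erfc, sqrt


def us_to_decimal(us: float) -> float:
    """US-style odds to decimal odds."""
    return 1 + us / 100 if us >= 0 else 1 - 100 / us


def bet_variance(dec: float, edge: float = 0.0, to_win: float = 1.0) -> float:
    """Variance of one bet to win `to_win` units at decimal odds `dec`."""
    return (1 + edge) * (dec - edge - 1) * to_win**2 / (dec - 1) ** 2


def z_and_p_value(net_units: float, total_variance: float):
    """z-score against breakeven and the one-sided normal p-value."""
    z = net_units / sqrt(total_variance)
    p = 0.5 * erfc(z / sqrt(2))  # same quantity as 1-NORMSDIST(z) in Excel
    return z, p


if __name__ == "__main__":
    # Totals quoted in the thread: +6.5659 units won, summed variance 41.7669.
    z, p = z_and_p_value(6.5659, 41.7669)
    print(f"z = {z:.4f}, p = {p:.4%}")
```

To use it on a list of prices, sum `bet_variance(us_to_decimal(o))` over the bets and pass that sum as `total_variance`.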
There's nothing wrong with estimating p-values using Monte Carlo simulations. There is, however, something wrong with estimating p-values using Monte Carlo simulations in Excel.
Here's a very simple Monte Carlo script coded in Perl.
-snip-
Ganchrow...whats your educational background?
Monte Carlo method eh? Well, I feel pretty good that there is an official name for what I was doing. I thought I had invented it. You do have the correct picture of me sitting for 15 minutes waiting for Excel to chug through 10,000 simulations. Thank you for the link to the program code. I haven't downloaded it yet as I am just reading this post now but I would also be interested in the randomizing code you wrote about. Seems to me if I was running 100,000,000 trials to test a model that I would want to stay away from any repeating pattern.
Regarding your simplest suggestion, there seems to be one parenthesis missing and I can not make the equation work.
If I have:
E=0 (Edge)
d=.6 (American Odds of +150)
n=1 (Units to win)
Then I get:
(1+0)*(.6-0-1) * 1^2 / (.6-1)^2)
From my understanding, I assume I would calculate the results of the above equation for each of the 55 independent samples and take the square root which will give me the standard deviation. I believe the above example of +150 would evaluate to 3.75 if I ignore the last parenthesis. The square root of many such large numbers will not come close to 15.48%, so I am lost. Also, do the z-test and standard deviation both evaluate to 15.48% in your example and this is why you are able to get both numbers from one?
I do have a question for you about adhering to the Central Limit Theorem. Not so much from this post but from other posts you have made. I get the feeling that when you say things like "as long as you're comfortable appealing to the Central Limit Theorem" --> (not a direct quote as I am going from memory on this) that maybe I shouldn't be appealing to it and maybe I should be thinking along Bayesian lines. Here are 2 simple questions that I have often wondered about. Relevant to sports betting, do you personally appeal to the Central Limit Theorem for calculating probabilities and significance? Are there situations in which you do not?
Also, if it were you and you had the choice of running 100,000,000 Monte Carlo simulations (with the better randomizer in place of course) or using the Central Limit Theorem example you proposed, which would you choose?
Finally, the following equation did not work in Excel as it is missing parameters for the function. I am sure that they are probably assumed numbers like 1's and 0's but it would be helpful if you could fill them in for me.
15.48% (=1-NORMSDIST(1.016))
I promised lots of smiley faces for the Excel type formulas included with the answers so here they are. Thanks for coming through, again, for me Ganchrow.
Thanks Chemist. Is p the no vig ML decimal odds?
So, if I have +150 odds, the variance would be .6*(1-.6)=.24?
If so, what do I do with several such numbers (i.e. .24, .25, .1, .6, etc.)?
Quote:
Originally Posted by VideoReview
I would also be interested in the randomizing code you wrote about.
After the line "use warnings;" add the following code:
Code:
use Math::Random::MT;
my $rand_gen;
BEGIN {
    warn "Seeding random number generator.\n";
    require LWP::Simple;
    use constant RAND_URL => 'http://random.org/integers/?num=1248&min=0&max=65535&col=2&base=10&format=plain&rnd=new';
    my (@seed);
    my $page = LWP::Simple::get(RAND_URL);
    die "Could not fetch seed data from random.org\n" unless defined $page;
    foreach (split(/\n/, $page)) {
        # each line holds two 16-bit integers; combine them into one 32-bit seed word
        next unless m/^([0-9]+)\s+([0-9]+)$/;
        push @seed, $1 + $2*2**16;
    }
    $rand_gen = Math::Random::MT->new(@seed);
    warn "Random number generator seeded.\n";
}
and then replace the line "my $r = rand();" with:
Code:
my $r = $rand_gen->rand();
This utilizes the Mersenne Twister algorithm with a 19,968-bit truly-random seed obtained from random.org. It's not cryptographically secure but it's more than adequate for Monte Carlo purposes.
Quote:
Originally Posted by VideoReview
Regarding your simplest suggestion, there seems to be one parenthesis missing and I can not make the equation work.
The entire clause "betting to win n units at decimal odds d and edge E, variance would be (1+E)*(d-E-1) * n^2 / (d-1)^2" is in parentheses, so you can ignore the final paren mathematics-wise.
Quote:
Originally Posted by VideoReview
If I have:
d=.6 (American Odds of +150)
+150 in decimal odds would be 2.5. See my odds converter. This yields a variance of (1+0)*(2.5-0-1) * 1^2 / (2.5-1)^2 = 2/3.
Quote:
Originally Posted by VideoReview
From my understanding, I assume I would calculate the results of the above equation for each of the 55 independent samples and take the square root which will give me the standard deviation. I believe the above example of +150 would evaluate to 3.75 if I ignore the last parenthesis. The square root of many such large numbers will not come close to 15.48%, so I am lost.
The sum of the variances evaluates to 41.7669 which, not coincidentally, is also the total amount wagered. (As I noted previously, when betting to win 1 unit at 0 expectation, the variance of a bet equals the units risked.) The square root of the variance (6.4627 units) is the standard deviation. Since your final results were +6.5659 units, your z-score is 6.5659 / 6.4627 ≈ 1.0160, implying a p-value of about 15.482%. That the p-value is so close to the standard deviation is mere coincidence.
Quote:
Originally Posted by VideoReview
I do have a question for you about adhering to the Central Limit Theorem. Not so much from this post but from other posts you have made. I get the feeling that when you say things like "as long as you're comfortable appealing to the Central Limit Theorem" --> (not a direct quote as I am going from memory on this) that maybe I shouldn't be appealing to it and maybe I should be thinking along Bayesian lines. Here are 2 simple questions that I have often wondered about. Relevant to sports betting, do you personally appeal to the Central Limit Theorem for calculating probabilities and significance? Are there situations in which you do not?
I'm just covering my ass -- I don't want some smart-alec screaming the distribution of outcomes isn't actually normal. In your example the CLT provides a very decent approximate answer. If you had many fewer data points or had a number of big dogs or favorites then your skewed distribution of possible outcomes would not be so well served by asserting normality.
For example, if you were to change the odds on the 1st bet to +1000 (but keep it as a win) then your units won wouldn't change but your CLT p-value would be 11.768%. A 10,000,000-trial Monte Carlo simulation of the same yields a p-value of about 15.95%.
Quote:
Originally Posted by VideoReview
Also, if it were you and you had the choice of running 100,000,000 Monte Carlo simulations (with the better randomizer in place of course) or using the Central Limit Theorem example you proposed, which would you choose?
It really depends upon the data you're analyzing. As long as you have, let's say, 30 or more data points with odds fairly close to even, you'll get acceptably close results using the CLT. If you're concerned you can of course always verify your results with a quick Monte Carlo sim (a couple million trials or so). If the results are comparable you can rest easy.
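The "verify with a quick sim" check can be sketched as below. This is a small Python stand-in for the Perl script, not a port of it: the function name, the default trial count, and the fixed seed are all choices made for this example, every bet is sized to win 1 unit, and a zero edge is assumed (so each bet's variance equals its risk):

```python
import random
from math import erfc, sqrt


def us_to_decimal(us: float) -> float:
    """US-style odds to decimal odds."""
    return 1 + us / 100 if us >= 0 else 1 - 100 / us


def compare_clt_and_monte_carlo(us_odds_list, roi_bogey, trials=20_000, seed=12345):
    """Return (clt_p, mc_p): the chance of reaching `roi_bogey` (a fraction
    of total risk) by luck alone, by normal approximation and by simulation."""
    rng = random.Random(seed)  # Python's generator is a Mersenne Twister
    bets = []
    for us in us_odds_list:
        dec = us_to_decimal(us)
        bets.append((1.0 / dec, 1.0 / (dec - 1)))  # (win prob, units risked)
    total_risk = sum(risk for _, risk in bets)
    target = roi_bogey * total_risk

    hits = 0
    for _ in range(trials):
        result = sum(1.0 if rng.random() < p else -risk for p, risk in bets)
        if result >= target:
            hits += 1
    mc_p = hits / trials

    # At zero edge the summed variance equals the total amount risked.
    clt_p = 0.5 * erfc(target / sqrt(total_risk) / sqrt(2))
    return clt_p, mc_p


if __name__ == "__main__":
    print(compare_clt_and_monte_carlo([100] * 55, 0.157))
```

Because outcomes are discrete, the two figures need not match closely even for even-money bets; the comparison is meant as a sanity check, not as proof that either number is exact.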
Quote:
Originally Posted by VideoReview
Finally, the following equation did not work in Excel as it is missing parameters for the function. I am sure that they are probably assumed numbers like 1's and 0's but it would be helpful if you could fill them in for me.
15.48% (=1-NORMSDIST(1.016))
Are you sure you entered it as =1-NORMSDIST(1.016) and didn't omit the 'S'? That would give you the "too few arguments for this function" error message in Excel.
Quote:
Originally Posted by VideoReview
Thanks Chemist. Is p the no vig ML decimal odds?
So, if I have +150 odds, the variance would be .6*(1-.6)=.24?
If so, what do I do with several such numbers (i.e. .24, .25, .1, .6, etc.)?
p represents the expected win probability and could be calculated as (1+edge)/(decimal odds).
p*(1-p) represents the variance of a single binomial trial with success rate of p. Multiply that by the decimal payout odds squared and you'll get the variance on a 1-unit risked bet. This is of course equivalent to the σ^2 = (1+edge)*(decimal odds-edge-1) formulation.
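For anyone wanting to see why the two formulations agree, here is a short derivation, written with E for the edge, d for the decimal odds, and n for the units to win, and assuming nothing beyond the definitions already given in the thread:

```latex
% Win probability at edge E and decimal odds d
p = \frac{1+E}{d}, \qquad 1-p = \frac{d-E-1}{d}

% Risking 1 unit: the result is +(d-1) on a win and -1 on a loss,
% so the two outcomes are d units apart
\sigma^2_{\text{risk }1} = p\,(1-p)\,d^2 = (1+E)(d-E-1)

% Betting to win n units means risking n/(d-1) units,
% which scales the variance by n^2/(d-1)^2
\sigma^2_{\text{win }n} = (1+E)(d-E-1)\,\frac{n^2}{(d-1)^2}

% Zero edge, betting to win 1 unit
\sigma^2 = \frac{d-1}{(d-1)^2} = \frac{1}{d-1}
```

The last line is the amount risked, which is why the summed variance in the zero-edge case matches the total amount wagered.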