01-04-2008, 09:59 PM #1
maxpower79
SBR Rookie
Join Date: 02-01-07
Posts: 9
Math/Statistics question

This is partly intended for Ganchrow, but anyone can respond:

Let's say I want to use statistics to evaluate a certain angle, or capper, or tout, or whatever.

I know if I have some observations, I can use the binomial distribution, or normal approx. for large samples, to calculate a p-value, given a null hypothesis. So if I observe an ATS record of 9-2, and I make the null hypothesis that each of this system/capper/etc.'s picks will hit with probability q,

p = (11 choose 2) * q^9 * (1-q)^2
+ (11 choose 1) * q^10 * (1-q)
+ (11 choose 0) * q^11

And I can choose some significance level p*, and reject the null if p < p*. I hope that's right, anyway.
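For reference, the p-value above is just the upper tail of a binomial distribution. Here is a minimal Python sketch of that calculation. It assumes a coin-flip null of q = 0.5 and a cutoff of p* = 0.05 purely for illustration, and the helper names `tail_prob` and `explicit_9_2` are hypothetical, not from any library:

```python
from math import comb


def tail_prob(wins: int, games: int, q: float) -> float:
    """One-tailed p-value: P(X >= wins) for X ~ Binomial(games, q)."""
    return sum(comb(games, k) * q**k * (1 - q) ** (games - k)
               for k in range(wins, games + 1))


def explicit_9_2(q: float) -> float:
    """The three-term sum for a 9-2 record, written out term by term."""
    return (comb(11, 2) * q**9 * (1 - q) ** 2
            + comb(11, 1) * q**10 * (1 - q)
            + comb(11, 0) * q**11)


q = 0.5          # hypothetical null: each pick hits with probability 50%
p_star = 0.05    # hypothetical significance level p*
p = tail_prob(9, 11, q)
print(f"p = {p:.5f}, reject null: {p < p_star}")
```

With q = 0.5 the tail works out to (55 + 11 + 1)/2048 = 67/2048, about 0.0327, and `explicit_9_2(q)` gives the same number as the general tail sum.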

What I'd like to know is how things change if I have lots of sets of observations - say I'm data-mining, and I look at 100 different angles at once. Or I'm evaluating lots of handicappers - maybe the BTP contest - and I want to know, quantitatively, "Can any of these guys really cap?" Obviously, I can calculate p-values for each individual set of picks. But intuitively, they don't appear to have the same meaning. If the smallest p-value out of 700 is less than some p*, well, you would probably expect somebody to do that well just from sheer luck, given how many sets of picks are being tested.

I did get that if you take the null to be "all of these guys' picks hit with the same probability q" then you can add all the results together and get one p (for each q) for the entire sample. But if you rejected that null all you would be able to say is "at least one of these cappers wins with probability greater than q". But that's not that useful. What I'm wondering is if there is a way to come up with an "adjusted" p for each observation, that takes into account that it was one of many.

I hope that's clear. Thanks in advance for any input.

~ Max

01-04-2008, 10:02 PM #2
CrazyL
VIP Moderator
Join Date: 08-02-07
Posts: 17,082

Paging Ganchrow...

01-04-2008, 11:11 PM #3
calm
SBR Rookie
Join Date: 01-03-08
Posts: 16

I'm pretty sure your intuition is wrong. I'm too lazy/tired to go through the math, but whether you're looking at one handicapper or a thousand, I don't think it would change anything.

01-04-2008, 11:20 PM #4
pokernut9999
SBR Hall of Famer
Join Date: 07-25-07
Location: South Carolina
Posts: 5,479

lost me after the 3rd paragraph

01-04-2008, 11:30 PM #5
picoman
SBR Hall of Famer
Join Date: 04-05-07
Location: the moon
Posts: 10,269

Quote:
Originally Posted by pokernut9999
lost me after the 3rd paragraph
you're better than me. i stop reading after the first sentence.
__________________
"But you can't have your dream without laying something on the line. The key is not to risk what you can't afford to lose. You might think you're different, but someday you gonna want more too. The question is what are you willing to lay on the line."
钱 錢 argent Geld soldi お金 돈 dinheiro деньги dinero เงิน כסף, ממון raha λεφτά pengar danh từ money

01-04-2008, 11:33 PM #6
mofome
SBR Hall of Famer
Join Date: 12-19-07
Posts: 13,248

ganch, don, remp? who's here?
__________________
RIP #21

01-04-2008, 11:34 PM #7
pokernut9999
SBR Hall of Famer
Join Date: 07-25-07
Location: South Carolina
Posts: 5,479

excuse me, I got lost in the 3rd paragraph

01-04-2008, 11:54 PM #8
CHALKbreaker
SBR High Roller
Join Date: 12-27-07
Location: Wisconsin
Posts: 115

Seems like paralysis by analysis to me.

01-05-2008, 12:03 AM #9
DrunkenLullaby
SBR MVP
Join Date: 03-30-07
Posts: 1,650

I can't give an answer, but my gut tells me that when Ganch arrives there may be a chi-square distribution in our future.
__________________
Quote:
Originally Posted by WE EAT FISH
If most people were 500 percent every day they would be HAPPY but I AM NOT.

01-05-2008, 03:45 AM #10
Ganchrow
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587

Quote:
Originally Posted by maxpower79
What I'd like to know is how things change if I have lots of sets of observations - say I'm data-mining, and I look at 100 different angles at once. Or I'm evaluating lots of handicappers - maybe the BTP contest - and I want to know, quantitatively, "Can any of these guys really cap?" Obviously, I can calculate p-values for each individual set of picks. But intuitively, they don't appear to have the same meaning. If the smallest p-value out of 700 is less than some p*, well, you would probably expect somebody to do that well just from sheer luck, given how many sets of picks are being tested.

I did get that if you take the null to be "all of these guys' picks hit with the same probability q" then you can add all the results together and get one p (for each q) for the entire sample. But if you rejected that null all you would be able to say is "at least one of these cappers wins with probability greater than q". But that's not that useful. What I'm wondering is if there is a way to come up with an "adjusted" p for each observation, that takes into account that it was one of many.
Data mining is a very dangerous business. Unless you really know what you're doing, do yourself a favor and stay far, far away.

With that caveat firmly in place, here are a couple of quick and dirty mathematical approaches. In my opinion, a clear understanding of the following would be a necessary, although by no means sufficient, precondition for embarking on any form of profit-oriented data mining sports betting project. The first approach considers the likelihood of the single best observed outcome, while the second considers the likelihood of a complete data set at least as extreme as the one observed. Both tests are inherently one-tailed.
Let's say you're looking at a single handicapper making 100 straight-up picks at unbiased lines. If he were a 50% handicapper, his probability of picking N or more games correctly would be given by =1-BINOMDIST(N-1,100,50%,1), where BINOMDIST() is the Excel binomial distribution function. Hence, a 50% picker would have only a 4.4313% probability of picking 59 or more games correctly. (This is referred to as either the "p-value" or the "significance level".)
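Outside of Excel, the same tail probability can be computed exactly when the null is a 50% picker, since every outcome then has probability 1/2^games. A minimal Python sketch; `binom_sf_half` is a hypothetical helper name meant to mirror =1-BINOMDIST(N-1,games,50%,1):

```python
from math import comb


def binom_sf_half(n_correct: int, games: int = 100) -> float:
    """P(X >= n_correct) for X ~ Binomial(games, 0.5).

    Hypothetical helper mirroring =1-BINOMDIST(n_correct-1, games, 50%, 1).
    The binomial coefficients are summed as exact integers, so the only
    rounding happens in the final division by 2**games.
    """
    favourable = sum(comb(games, k) for k in range(n_correct, games + 1))
    return favourable / 2**games


print(f"{binom_sf_half(59):.4%}")
```

For 59 or more correct out of 100 this should come out at about 4.4313%, in line with the Excel figure.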

If you were looking at M handicappers, each making sets of 100 independent picks, then assuming they were all 50% pickers:
  1. The probability of at least one handicapper picking N or more correctly would be =1-BINOMDIST(N-1,100,50%,1)^M.
  2. The joint probability of results at least as extreme as the entire outcome, where the ith handicapper has made N_i correct picks, could be approximated by =CHIDIST(-2*Σ ln(1-BINOMDIST(N_i-1,100,50%,1)), 2*M), with the sum running over i = 1, ..., M. Here CHIDIST() is the Excel chi-squared distribution function, which in this case is called with 2*M degrees of freedom. This is known as Fisher's method. A more accurate way of explaining it would be that the result of the chi-squared test is the probability that the product of the individual significance levels would have the observed value or lower, assuming all pickers were 50/50.
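Both methods can be sketched in a few lines of Python, again assuming 50% pickers making 100 independent picks each; all helper names are hypothetical. Rather than calling a statistics library, the chi-squared upper tail uses the closed form that holds for even degrees of freedom (and 2*M is always even): P(χ² > x) = e^(-x/2) · Σ_{j=0}^{M-1} (x/2)^j / j!, which is the quantity CHIDIST(x, 2*M) returns. The usage lines apply it to the five pickers from the example that follows.

```python
from math import comb, exp, log


def binom_sf_half(n_correct: int, games: int = 100) -> float:
    """Hypothetical helper: P(X >= n_correct), X ~ Binomial(games, 0.5)."""
    return sum(comb(games, k) for k in range(n_correct, games + 1)) / 2**games


def chi2_sf_even(x: float, dof: int) -> float:
    """Upper tail P(chi2 > x) for even dof, like Excel's CHIDIST(x, dof)."""
    if dof % 2:
        raise ValueError("closed form only holds for even degrees of freedom")
    half = x / 2.0
    term = total = 1.0                 # j = 0 term of the series
    for j in range(1, dof // 2):       # remaining terms j = 1 .. dof/2 - 1
        term *= half / j
        total += term
    return exp(-half) * total


def best_picker_p(n_correct: int, m: int, games: int = 100) -> float:
    """Method 1: P(at least one of m 50% pickers gets n_correct or more)."""
    return 1 - (1 - binom_sf_half(n_correct, games)) ** m


def fisher_p(wins: list[int], games: int = 100) -> float:
    """Method 2 (Fisher's method): combine the one-tailed p-values."""
    statistic = -2 * sum(log(binom_sf_half(w, games)) for w in wins)
    return chi2_sf_even(statistic, 2 * len(wins))


wins = [60, 57, 52, 48, 47]  # the five pickers from the worked example
print(f"method 1: {best_picker_p(max(wins), len(wins)):.2%}")
print(f"method 2: {fisher_p(wins):.2%}")
```

Using exact rather than rounded significance levels, the two results should land at roughly 13.44% and 13.17%.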

So let's say you're looking at 5 pickers, each making 100 picks on unrelated games, with 60, 57, 52, 48, and 47 correct picks respectively. If all pickers were 50% pickers:
  1. We'd expect to see at least one of the five picking 60 or more correctly with probability =1-BINOMDIST(60-1,100,50%,1)^5 ≈ 13.44%.
  2. The joint probability of the entire outcome or better would be =CHIDIST(-2*ln(2.84%*9.67%*38.22%*69.14%*75.79%),10) ≈ 13.17%. (It might pay to give an example of what would be considered "the same outcome or better". This phrase refers to the product of the significance levels over independent trials, which in this case works out to be about 0.05507%, the fifth root of which is 22.29%. This happens to be the approximate significance of picking a bit more than 512 out of 1,000 correctly (1-BINOMDIST(512-1,1000,50%,1) ≈ 23.352%). Hence, by the standards of Fisher's method, the above outcome would be about as "extreme" as 5 people each picking about 512 out of 1,000 correctly.)
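The "equivalent outcome" arithmetic in that parenthetical can be checked the same way. This sketch (hypothetical helper name, 50% null assumed) multiplies the five significance levels, takes the fifth root, and then locates where that per-picker level falls among the tails for 1,000 picks:

```python
from math import comb, prod


def binom_sf_half(n_correct: int, games: int) -> float:
    """Hypothetical helper: P(X >= n_correct), X ~ Binomial(games, 0.5)."""
    return sum(comb(games, k) for k in range(n_correct, games + 1)) / 2**games


wins = [60, 57, 52, 48, 47]
levels = [binom_sf_half(w, 100) for w in wins]  # each picker's significance level
joint = prod(levels)                            # the product Fisher's method uses
per_picker = joint ** (1 / len(wins))           # fifth root of the product

# Walk up from 500/1,000 to the largest k whose tail P(X >= k) is still
# at least as large as the per-picker significance level.
k = 500
while binom_sf_half(k + 1, 1000) >= per_picker:
    k += 1

print(f"product = {joint:.5%}, fifth root = {per_picker:.2%}")
print(f"P(X >= {k}) = {binom_sf_half(k, 1000):.3%}, "
      f"P(X >= {k + 1}) = {binom_sf_half(k + 1, 1000):.3%}")
```

The fifth root lands between the 1,000-pick tails for 512 and 513 correct, which is where the "about 512 out of 1,000" comparison comes from.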
I'll note that, when used to analyze contemporaneous contest results, the above methods would need to be adjusted to take into account the underlying contest structure, possibly including correlation between contestants' picks and the impact of stale lines on winning percentages.

Another issue with the first method is that, holding the desired significance level constant, as you increase the number of sample sets considered (i.e., the number of contest participants or the number of alternative betting strategies considered), the incidence of Type-II errors ("false negatives") would also increase, decreasing the statistical power of the test and rendering this form of analysis effectively useless.
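To put a number on that loss of power, here is a hypothetical illustration (the 55% true rate and the 5% family-wide level are assumptions, not figures from the post). For M pickers it finds the win total the first method requires out of 100 picks, then computes the chance that one genuinely skilled picker's own record clears that bar:

```python
from math import comb


def binom_sf(n_correct: int, games: int, p: float) -> float:
    """P(X >= n_correct) for X ~ Binomial(games, p)."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(n_correct, games + 1))


def threshold(m: int, alpha: float = 0.05, games: int = 100) -> int:
    """Smallest win total at which method 1 rejects 'all m are 50% pickers'."""
    for n in range(games + 1):
        if 1 - (1 - binom_sf(n, games, 0.5)) ** m <= alpha:
            return n
    return games + 1  # no attainable win total would be significant


def power(m: int, true_rate: float = 0.55, games: int = 100) -> float:
    """Chance that a picker with the assumed true_rate reaches the threshold."""
    return binom_sf(threshold(m, games=games), games, true_rate)


for m in (1, 10, 100, 1000):
    print(f"M = {m:>4}: need {threshold(m)} wins, power = {power(m):.1%}")
```

As M grows, the required win total climbs (59 wins when M = 1) and the skilled picker's chance of being detected shrinks steadily.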

This can also be an issue with the second method, especially to the extent that only a relatively small number of truly talented pickers (or successful strategies) exist within a large population. If this becomes an issue there are certainly other (more complicated) testing methodologies featured in commercial statistical packages that you might consider.

01-05-2008, 03:44 PM #11
maxpower79
SBR Rookie
Join Date: 02-01-07
Posts: 9

Thanks Ganch.

So with the first method, you are essentially creating an adjusted cdf for the max of N observations, and testing using that. With the second method, you may have gone a bit deeper than I can follow, but I got the gist of it.

FWIW, I'm not planning on doing any data-mining, as you are right, I would be in over my head. It was really more just curiosity.

~ Max

01-06-2008, 12:27 AM #12
DrunkenLullaby
SBR MVP
Join Date: 03-30-07
Posts: 1,650

Quote:
Originally Posted by Ganchrow
where CHIDIST() is the Excel chi-squared distribution function,

01-06-2008, 01:06 AM #13
Data
SBR Wise Guy
Join Date: 11-27-07
Location: U.S.S. Enterprise NCC-1701-E
Posts: 939

Ganchrow, excellent article, as always.

Quote:
Originally Posted by Ganchrow
This can also be an issue with the second method, especially to the extent that only a relatively small number of truly talented pickers (or successful strategies) exist within a large population.
Could you explain why and how that small number becomes an issue?

Quote:
If this becomes an issue there are certainly other (more complicated) testing methodologies featured in commercial statistical packages that you might consider.
I am thinking about getting SPSS or maybe even SAS. Can you elaborate on how those packages can help here and their overall value for an analytical sports bettor?

01-06-2008, 11:28 AM #14
Ganchrow
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587

Quote:
Originally Posted by Data
Ganchrow, excellent article, as always.
Thanks.

Quote:
Originally Posted by Data
Could you explain why and how that small number becomes an issue?
It's apparent just by examining the test: χ²[-2×Σ ln(α_i); 2M], with the sum running over i = 1, ..., M, where α_i refers to the significance level of the ith capper and the integer M is the number of cappers being tested (i.e., half the degrees of freedom). If we have a large number of talented cappers within the population, then their α's will be low and the χ² will in turn show significance. As we increase the number of "average" cappers within the population we'd start seeing many more α's of around 50%. Even if these cappers were better than average, with α's of e^(-1) ≈ 36.79%, each would contribute exactly -2×ln(e^(-1)) = 2 to the test statistic, so the value of the test statistic would approach the degrees of freedom (twice the number of cappers). As the degrees of freedom approach infinity, the χ² distribution approaches normality with a mean equal to the degrees of freedom, and so the significance of the test would approach 50%. Mind you, this occurs when filling in with cappers better than average.
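The dilution effect is easy to reproduce numerically. In this hypothetical sketch, five strong cappers each have α = 1% (an assumed figure), and we add F "filler" cappers with α = e^(-1), each of which adds exactly 2 to the statistic and 2 to the degrees of freedom. The chi-squared tail uses the identity P(χ² with 2M d.o.f. > x) = P(Poisson(x/2) ≤ M - 1), evaluated term by term in log space so that large M neither overflows nor underflows:

```python
from math import exp, lgamma, log


def chi2_sf_even(x: float, dof: int) -> float:
    """P(chi2 > x) for even dof, computed as P(Poisson(x/2) <= dof/2 - 1).

    Each Poisson term is evaluated in log space so that very large
    degrees of freedom do not overflow or underflow the whole sum.
    """
    half = x / 2.0
    if half == 0.0:
        return 1.0
    return sum(exp(-half + j * log(half) - lgamma(j + 1))
               for j in range(dof // 2))


def fisher_with_fillers(n_strong: int, strong_alpha: float, n_filler: int) -> float:
    """Fisher's method for n_strong cappers at strong_alpha plus fillers at e^-1."""
    log_alpha_sum = n_strong * log(strong_alpha) + n_filler * (-1.0)  # ln(e^-1) = -1
    statistic = -2 * log_alpha_sum
    return chi2_sf_even(statistic, 2 * (n_strong + n_filler))


for fillers in (0, 10, 100, 1000, 10000):
    print(f"{fillers:>5} fillers: combined significance = "
          f"{fisher_with_fillers(5, 0.01, fillers):.4g}")
```

With no fillers the five strong cappers are overwhelmingly significant, but as better-than-average fillers are added the combined significance creeps up toward (without reaching) 50%.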

Quote:
Originally Posted by Data
I am thinking about getting SPSS or maybe even SAS. Can you elaborate on how those packages can help here and their overall value for an analytical sports bettor?
In the past I've used both Mathematica and S+ professionally. I haven't used either SAS or SPSS since grad school. Nowadays, for no particularly good reason, I primarily use self-programmed, purpose-written libraries.

Really what needs to be done here is some form of categorical analysis. We aren't looking to determine how good the single best capper is, or how good the population is as a whole, but rather how good a particular unspecified category of capper is. To this end I believe a test known as Mantel-Haenszel may be applicable. To be perfectly honest, I don't quite remember the details other than that it's a chi-squared test.

01-06-2008, 02:18 PM #15