|
01-04-2008, 09:59 PM
|
#1 (permalink)
|
|
SBR Rookie
Join Date: 02-01-07
Posts: 9
|
Math/Statistics question
This is partly intended for Ganchrow, but anyone can respond:
Let's say I want to use statistics to evaluate a certain angle, or capper, or tout, or whatever.
I know if I have some observations, I can use the binomial distribution, or normal approx. for large samples, to calculate a p-value, given a null hypothesis. So if I observe an ATS record of 9-2, and I make the null hypothesis that each of this system/capper/etc.'s picks will hit with probability q,
p = (11 choose 2) * q^9 * (1-q)^2
+ (11 choose 1) * q^10 * (1-q)
+ (11 choose 0) * q^11
And I can choose some significance level p*, and reject the null if p < p*. I hope that's right, anyway.
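For anyone who wants to check the arithmetic, here is a minimal Python sketch of the sum above. The function name binomial_tail and the choice of q = 0.5 are assumptions made only for illustration; any null value of q can be plugged in.

```python
from math import comb

def binomial_tail(wins: int, picks: int, q: float) -> float:
    """One-tailed p-value: P(X >= wins) when X ~ Binomial(picks, q)."""
    return sum(comb(picks, k) * q**k * (1 - q)**(picks - k)
               for k in range(wins, picks + 1))

# A 9-2 record tested against a null win probability of q = 0.5 (assumed).
p_value = binomial_tail(9, 11, 0.5)
print(f"p-value for 9-2 at q = 0.5: {p_value:.4f}")
```

With q = 0.5 the three terms are 55, 11 and 1 over 2^11, so the p-value is 67/2048, or about 3.27%.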
What I'd like to know is how things change if I have lots of sets of observations - say I'm data-mining, and I look at 100 different angles at once. Or I'm evaluating lots of handicappers - maybe the BTP contest - and I want to know, quantitatively, "Can any of these guys really cap?" Obviously, I can calculate p-values for each individual set of picks. But intuitively, they don't appear to have the same meaning. If the smallest p-value out of 700 is less than some p*, well, you would probably expect somebody to do that well just from sheer luck and the high sample size.
I did get that if you take the null to be "all of these guys' picks hit with the same probability q" then you can add all the results together and get one p (for each q) for the entire sample. But if you rejected that null all you would be able to say is "at least one of these cappers wins with probability greater than q". But that's not that useful. What I'm wondering is if there is a way to come up with an "adjusted" p for each observation, that takes into account that it was one of many.
I hope that's clear. Thanks in advance for any input.
~ Max
|
|
|
|
01-04-2008, 10:02 PM
|
#2 (permalink)
|
|
VIP Moderator
Join Date: 08-02-07
Posts: 17,082
|
Paging Ganchrow...
|
|
|
|
01-04-2008, 11:11 PM
|
#3 (permalink)
|
|
SBR Rookie
Join Date: 01-03-08
Posts: 16
|
I'm pretty sure your intuition is wrong. I'm too lazy/tired to go through the math, but whether you're looking at one handicapper or a thousand, I don't think it would change anything.
|
|
|
|
01-04-2008, 11:20 PM
|
#4 (permalink)
|
|
SBR Hall of Famer
Join Date: 07-25-07
Location: South Carolina
Posts: 5,479
|
lost me after the 3rd paragraph
|
|
|
|
01-04-2008, 11:30 PM
|
#5 (permalink)
|
|
SBR Hall of Famer
Join Date: 04-05-07
Location: the moon
Posts: 10,269
|
Quote:
Originally Posted by pokernut9999
lost me after the 3rd paragraph
|
you're better than me. i stop reading after the first sentence. 
__________________
"But you can't have your dream without laying something on the line. The key is not to risk what you can't afford to lose. You might think you're different, but someday you gonna want more too. The quesiton is what are you willing to lay on the line."
钱 錢 argent Geld soldi お金 돈 dinheiro деньги dinero เงิน כסף, ממון raha λεφτά pengar danh từ money
|
|
|
|
01-04-2008, 11:33 PM
|
#6 (permalink)
|
|
SBR Hall of Famer
Join Date: 12-19-07
Posts: 13,248
|
ganch, don, remp? whos here?
__________________
RIP #21
|
|
|
|
01-04-2008, 11:34 PM
|
#7 (permalink)
|
|
SBR Hall of Famer
Join Date: 07-25-07
Location: South Carolina
Posts: 5,479
|
excuse me, I got lost in the 3rd paragraph
|
|
|
|
01-04-2008, 11:54 PM
|
#8 (permalink)
|
|
SBR High Roller
Join Date: 12-27-07
Location: Wisconsin
Posts: 115
|
Seems like paralysis by analysis to me.
|
|
|
|
01-05-2008, 12:03 AM
|
#9 (permalink)
|
|
SBR MVP
Join Date: 03-30-07
Posts: 1,650
|
I can't give an answer, but my gut tells me that when Ganch arrives that there may be a Chi-square distribution in our future.
__________________
Quote:
Originally Posted by WE EAT FISH
If most people were 500 percent every day they would be HAPPY but I AM NOT.
|
|
|
|
|
01-05-2008, 03:45 AM
|
#10 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
Quote:
Originally Posted by maxpower79
What I'd like to know is how things change if I have lots of sets of observations - say I'm data-mining, and I look at 100 different angles at once. Or I'm evaluating lots of handicappers - maybe the BTP contest - and I want to know, quantitatively, "Can any of these guys really cap?" Obviously, I can calculate p-values for each individual set of picks. But intuitively, they don't appear to have the same meaning. If the smallest p-value out of 700 is less than some p*, well, you would probably expect somebody to do that well just from sheer luck and the high sample size.
I did get that if you take the null to be "all of these guys' picks hit with the same probability q" then you can add all the results together and get one p (for each q) for the entire sample. But if you rejected that null all you would be able to say is "at least one of these cappers wins with probability greater than q". But that's not that useful. What I'm wondering is if there is a way to come up with an "adjusted" p for each observation, that takes into account that it was one of many.
|
Data mining is a very dangerous business. Unless you really know what you're doing, do yourself a favor and stay far, far away.
With that caveat firmly in place, here are a couple of quick and dirty mathematical approaches. In my opinion, a clear understanding of the following would be a necessary, although by no means sufficient, precondition for embarking on any form of profit-oriented data-mining sports betting project. The first approach considers the likelihood of the single best observed outcome, while the second considers the likelihood of a complete data set as extreme as or better than that observed. Both tests are inherently one-tailed.
Let's say you're looking at a single handicapper making 100 straight-up picks at unbiased lines. If he were a 50% handicapper, his probability of picking N or more games correctly would be given by =1-BINOMDIST(N-1,100,50%,1), where BINOMDIST() is the Excel binomial distribution function. Hence, a 50% picker would have only a 4.4313% probability of picking 59 or more games correctly. (This is referred to as either the "p-value" or the "significance level".)
If you were looking at M handicappers, each making sets of 100 independent picks, then assuming they were all 50% pickers:
- The probability of at least one handicapper picking N or more correctly would be =1-BINOMDIST(N-1,100,50%,1)^M. (Excel evaluates the ^ before the subtraction, so this is 1 minus the Mth power of the cumulative probability.)
- The joint probability of results as extreme as the entire outcome or better, where the i-th handicapper has made N_i correct picks, could be approximated by =CHIDIST(-2*Σ[i=1..M] ln(1-BINOMDIST(N_i-1,100,50%,1)), 2*M), where CHIDIST() is the Excel chi-squared distribution function, which in this case is called with 2*M degrees of freedom. This is known as Fisher's method. A more accurate way of explaining it would be that the result of the chi-squared function is the probability that the product of the individual significance levels would have the observed value or lower, assuming all pickers were 50/50.
So let's say you're looking at 5 pickers, each making 100 picks on unrelated games, with 60, 57, 52, 48, and 47 correct picks respectively. If all pickers were 50% pickers:
- We'd expect to see at least one of the five picking 60 or more correctly with probability =1-BINOMDIST(60-1,100,50%,1)^5 ≈ 13.44%.
- The joint probability of the entire outcome or better would be =CHIDIST(-2*ln(2.84%*9.67%*38.22%*69.14%*75.79%),10) ≈ 13.17%. (It might pay to give an example of what would be considered "the same outcome or better". This phrase refers to the product of the significance levels over independent trials, which in this case works out to about 0.05507%, the fifth root of which is 22.29%. This happens to be the approximate significance of picking a bit more than 512 out of 1,000 correctly (1-BINOMDIST(512-1,1000,50%,1) ≈ 23.352%). Hence the above outcome would be, by the standards of Fisher's method, about as "extreme" as 5 people all picking 512 out of 1,000 correctly.)
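For those without Excel handy, the five-picker example can be reproduced with a short Python sketch. The helper names binom_tail and chi2_upper_tail_even_df are made up for this illustration and stand in for 1-BINOMDIST(N-1,...) and CHIDIST() respectively. The chi-squared helper uses a closed form that only holds for even degrees of freedom, which is always the case with Fisher's method (2*M).

```python
from math import comb, exp, lgamma, log

def binom_tail(n_correct, picks=100, q=0.5):
    """P(X >= n_correct) for X ~ Binomial(picks, q), i.e. 1-BINOMDIST(N-1,picks,q,1)."""
    return sum(comb(picks, k) * q**k * (1 - q)**(picks - k)
               for k in range(n_correct, picks + 1))

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-squared probability for x > 0 and even df (Poisson-sum closed form)."""
    if df % 2 != 0 or x <= 0:
        raise ValueError("this sketch only handles x > 0 and even df")
    half = x / 2.0
    # Each term is computed in log space to avoid overflow when df is large.
    return sum(exp(-half + k * log(half) - lgamma(k + 1)) for k in range(df // 2))

correct = [60, 57, 52, 48, 47]              # the five pickers from the example
alphas = [binom_tail(n) for n in correct]   # individual significance levels

# Method 1: chance that the best of M fifty-percent pickers does this well or better.
p_best = 1 - (1 - binom_tail(max(correct))) ** len(correct)

# Method 2 (Fisher's method): -2 * sum(ln(alpha_i)) against chi-squared with 2*M d.o.f.
fisher_stat = -2 * sum(log(a) for a in alphas)
p_fisher = chi2_upper_tail_even_df(fisher_stat, 2 * len(correct))

print(f"best-of-{len(correct)} p-value: {p_best:.4f}")
print(f"Fisher's method p-value: {p_fisher:.4f}")
```

Both results should land close to the 13.44% and 13.17% figures quoted above.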
I'll note that, when used to analyze contemporaneous contest results, the above methods would need to be adjusted to take into account the underlying contest structure, possibly including correlation between contestants' picks and the impact of stale lines on winning percentages.
Another issue with the first method is that, holding the desired significance level constant, as you increase the number of sample sets considered (i.e., the number of contest participants or the number of alternative betting strategies considered), the incidence of Type-II errors ("false negatives") would also increase, decreasing the statistical power of these tests and rendering this form of analysis effectively useless.
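To make the loss of power concrete, here is a rough Python sketch. It assumes a 5% significance level, sets of 100 picks, and one hypothetical picker whose true win rate is 55%; none of those numbers come from a real contest, and the function names are invented for this example. It finds the record the best picker would need in order to be significant under the first method, then the chance that the 55% picker actually reaches it.

```python
from math import comb

def binom_tail(n_correct, picks=100, q=0.5):
    """P(X >= n_correct) for X ~ Binomial(picks, q)."""
    return sum(comb(picks, k) * q**k * (1 - q)**(picks - k)
               for k in range(n_correct, picks + 1))

def min_wins_needed(num_cappers, level=0.05, picks=100):
    """Smallest N with P(at least one of num_cappers 50% pickers gets >= N) <= level."""
    for n in range(picks + 1):
        p_any = 1 - (1 - binom_tail(n, picks)) ** num_cappers
        if p_any <= level:
            return n
    return picks + 1  # no attainable record would be significant

TRUE_SKILL = 0.55  # hypothetical win rate of one genuinely skilled picker

results = {}
for m in (1, 10, 100, 700):
    threshold = min_wins_needed(m)
    # Chance that this one skilled picker reaches the threshold.
    power = binom_tail(threshold, q=TRUE_SKILL)
    results[m] = (threshold, power)
    print(f"M = {m:>3}: need {threshold}+ correct, 55% picker gets there {power:.2%} of the time")
```

With a single picker the bar is 59 of 100; as M grows the bar rises and the skilled picker's chance of clearing it shrinks.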
This can also be an issue with the second method, especially to the extent that only a relatively small number of truly talented pickers (or successful strategies) exist within a large population. If this becomes an issue there are certainly other (more complicated) testing methodologies featured in commercial statistical packages that you might consider.
__________________
|
|
|
|
01-05-2008, 03:44 PM
|
#11 (permalink)
|
|
SBR Rookie
Join Date: 02-01-07
Posts: 9
|
Thanks Ganch.
So with the first method, you are essentially creating an adjusted cdf for the max of N observations, and testing using that. With the second method, you may have gone a bit deeper than I can follow, but I got the gist of it.
FWIW, I'm not planning on doing any data-mining, as you are right, I would be in over my head. It was really more just curiosity.
~ Max
|
|
|
|
01-06-2008, 12:27 AM
|
#12 (permalink)
|
|
SBR MVP
Join Date: 03-30-07
Posts: 1,650
|
Quote:
Originally Posted by Ganchrow
where CHIDIST() is the Excel chi-squared distribution function,
|

|
|
|
|
|
01-06-2008, 01:06 AM
|
#13 (permalink)
|
|
SBR Wise Guy
Join Date: 11-27-07
Location: U.S.S. Enterprise NCC-1701-E
Posts: 939
|
Ganchrow, excellent article, as always.
Quote:
Originally Posted by Ganchrow
This can also be an issue with the second method, especially to the extent that only a relatively small number of truly talented pickers (or successful strategies) exist within a large population.
|
Could you explain why and how that small number becomes an issue?
Quote:
|
If this becomes an issue there are certainly other (more complicated) testing methodologies featured in commercial statistical packages that you might consider.
|
I am thinking about getting SPSS or maybe even SAS. Can you elaborate on how those packages can help here and on their overall value for an analytical sports bettor?
|
|
|
|
01-06-2008, 11:28 AM
|
#14 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,587
|
Quote:
Originally Posted by Data
Ganchrow, excellent article, as always. 
|
Thanks.
Quote:
Originally Posted by Data
Could you explain why and how that small number becomes an issue?
|
It's apparent just by examining the test: Χ²[-2×Σ_{i≤M} ln(α_i); 2*M] (where α_i refers to the significance level of the i-th capper, and the integer M is the number of cappers being tested, corresponding to half the degrees of freedom). If we have a large number of talented cappers within the population, then their α's will be low and the Χ² will in turn show significance. As we increase the number of "average" cappers within the population we'd start seeing many more α's of around 50%. Even if these cappers were better than average, with α's of e^-1 ≈ 36.79%, each would add exactly 2 to the test statistic, so its value would approach the degrees of freedom (twice the number of cappers). As the d.o.f. approach infinity the Χ² approaches normality with a mean equal to the d.o.f., and so the significance of the test would approach 50%. Mind you, this occurs when filling in with cappers better than average.
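The dilution effect can be seen numerically with a small Python sketch. The setup is purely hypothetical: three skilled cappers, each with α = 1%, padded with a growing number of "filler" cappers whose α is exactly e^-1. The helper names are made up for this illustration, and the chi-squared tail uses a closed form that is valid only for even degrees of freedom.

```python
from math import exp, lgamma, log

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-squared probability for x > 0 and even df (Poisson-sum closed form)."""
    if df % 2 != 0 or x <= 0:
        raise ValueError("this sketch only handles x > 0 and even df")
    half = x / 2.0
    # Log-space terms keep the sum stable for thousands of degrees of freedom.
    return sum(exp(-half + k * log(half) - lgamma(k + 1)) for k in range(df // 2))

def fisher_p(alphas):
    """Fisher's method: -2 * sum(ln(alpha_i)) against chi-squared with 2*M d.o.f."""
    stat = -2 * sum(log(a) for a in alphas)
    return chi2_upper_tail_even_df(stat, 2 * len(alphas))

skilled = [0.01] * 3     # hypothetical: three cappers with very low alphas
filler_alpha = exp(-1)   # each filler adds exactly 2 to the test statistic

combined = []
for fillers in (0, 10, 100, 1000):
    p = fisher_p(skilled + [filler_alpha] * fillers)
    combined.append(p)
    print(f"{fillers:>4} fillers: combined p-value = {p:.4f}")
```

The combined p-value starts out tiny and drifts up toward 50% as fillers are added, even though every filler is itself better than average.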
Quote:
Originally Posted by Data
I am thinking about getting SPSS or maybe even SAS. Can you elaborate on how those packages can help here and on their overall value for an analytical sports bettor?
|
In the past I've used both Mathematica and S+ professionally. I haven't used either SAS or SPSS since grad school. Nowadays, for no particularly good reason, I primarily use self-programmed, purpose-written libraries.
Really, what needs to be done here is some form of categorical analysis. We aren't looking to determine how good the single best capper is, or how good the population is as a whole, but rather how good a particular unspecified category of capper is. To this end I believe a test known as Mantel-Haenszel may be applicable. To be perfectly honest I don't quite remember the details other than that it's a chi-squared test.
__________________
|
|
|
|
|