#1 | ||||
|
This is partly intended for Ganchrow, but anyone can respond:
Let's say I want to use statistics to evaluate a certain angle, or capper, or tout, or whatever. I know that if I have some observations, I can use the binomial distribution (or the normal approximation for large samples) to calculate a p-value, given a null hypothesis. So if I observe an ATS record of 9-2, and I make the null hypothesis that each of this system/capper/etc.'s picks will hit with probability q, then the probability of 9 or more wins in 11 picks is

p = (11 choose 2) * q^9 * (1-q)^2 + (11 choose 1) * q^10 * (1-q) + (11 choose 0) * q^11

And I can choose some significance level p*, and reject the null if p < p*. I hope that's right, anyway.

What I'd like to know is how things change if I have lots of sets of observations - say I'm data-mining, and I look at 100 different angles at once. Or I'm evaluating lots of handicappers - maybe the BTP contest - and I want to know, quantitatively, "Can any of these guys really cap?"

Obviously, I can calculate p-values for each individual set of picks. But intuitively, they don't appear to have the same meaning. If the smallest p-value out of 700 is less than some p*, well, you would probably expect somebody to do that well just from sheer luck and the large number of cappers.

I did get that if you take the null to be "all of these guys' picks hit with the same probability q" then you can add all the results together and get one p (for each q) for the entire sample. But if you rejected that null, all you would be able to say is "at least one of these cappers wins with probability greater than q", and that's not that useful.

What I'm wondering is if there is a way to come up with an "adjusted" p for each observation that takes into account that it was one of many. I hope that's clear. Thanks in advance for any input. ~ Max |
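For anyone who wants to check the arithmetic, here is a minimal Python sketch of the binomial tail sum in the opening post. The function name `binomial_tail` and the choice of q = 0.5 are assumptions made for illustration; they are not from the thread.

```python
from math import comb

def binomial_tail(wins: int, losses: int, q: float) -> float:
    """P(at least `wins` hits in wins+losses picks) if each pick hits with probability q."""
    n = wins + losses
    # Sum the binomial probabilities for wins, wins+1, ..., n hits.
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(wins, n + 1))

# A 9-2 ATS record under the null hypothesis q = 0.5 (illustrative choice).
p_value = binomial_tail(9, 2, 0.5)
print(f"p-value for 9-2 at q=0.5: {p_value:.4f}")
```

At q = 0.5 the three terms are 55/2048, 11/2048 and 1/2048, so the p-value is 67/2048, or roughly 3.3%.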
||||
|
|
#2 | ||||
|
Tennis evaluator
|
Paging Ganchrow...
|
||||
|
|
#3 | ||||
|
I'm pretty sure your intuition is wrong. I'm too lazy/tired to go through the math, but whether you're looking at one handicapper or a thousand, I don't think it would change anything.
|
||||
|
|
#4 | ||||
|
lost me after the 3rd paragraph
|
||||
|
|
#5 | ||||
|
USC ml
|
you're better than me. i stop reading after the first sentence.
__________________
话说天下大势,分久必合,合久必分。 钱 錢 argent Geld soldi お金 돈 dinheiro деньги dinero เงิน כסף, ממון raha λεφτά pengar danh từ |
||||
|
|
#6 | ||||
|
ganch, don, remp? whos here?
__________________
RIP #21 |
||||
|
|
#7 | ||||
|
excuse me, I got lost in the 3rd paragraph
|
||||
|
|
#8 | ||||
|
Seems like paralysis by analysis to me.
|
||||
|
|
#9 | ||||
|
I can't give an answer, but my gut tells me that when Ganch arrives there may be a chi-square distribution in our future.
|
||||
|
|
#10 | |||||
|
Quote:
With that caveat firmly in place, here are a couple of quick and dirty mathematical approaches. In my opinion, a clear understanding of the following would be a necessary, although by no means sufficient, precondition for embarking on any form of profit-oriented data mining sports betting project.

The first approach considers the likelihood of the single best observed outcome, while the second considers the likelihood of a complete data set at least as extreme as the one observed. Both tests are inherently one-tailed.

Let's say you're looking at a single handicapper making straight-up picks at unbiased lines. If he were a 50% handicapper, his probability of picking N or more games correctly out of 100 would be given by =1-BINOMDIST(N-1,100,50%,1), where BINOMDIST() is the Excel binomial distribution function. Hence, a 50% picker would only have a 4.4313% probability of picking 59 or more games correctly. (This is referred to as either the "p-value" or the "significance level".)

I'll note that when used to analyze contemporaneous contest results data, the above methods would need to be adjusted to take into account the underlying contest structure, possibly including correlation between contestants' picks and the impact of stale lines on winning percentages.

Another issue with the first method is that, holding the desired significance level constant, as you increase the number of sample sets considered (i.e., the number of contest participants or the number of alternative betting strategies considered), the incidence of Type II errors ("false negatives") would also increase, decreasing the statistical power of these tests and rendering this form of analysis effectively useless. This can also be an issue with the second method, especially to the extent that only a relatively small number of truly talented pickers (or successful strategies) exist within a large population.
If this becomes an issue there are certainly other (more complicated) testing methodologies featured in commercial statistical packages that you might consider.
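The Excel formula =1-BINOMDIST(N-1,100,50%,1) can be mirrored outside a spreadsheet. The sketch below is one way to do it in Python; `prob_at_least` and `threshold` are hypothetical names, and the 5% cutoff is an assumed example level rather than one the post prescribes.

```python
from math import comb

def prob_at_least(n_correct: int, n_games: int = 100, p: float = 0.5) -> float:
    """Same quantity as Excel's =1-BINOMDIST(n_correct-1, n_games, p, 1)."""
    return sum(comb(n_games, k) * p**k * (1 - p)**(n_games - k)
               for k in range(n_correct, n_games + 1))

# The post quotes 4.4313% for 59 or more correct picks out of 100.
print(f"P(59 or more of 100) = {prob_at_least(59):.4%}")

# Smallest win count whose one-tailed p-value drops below an assumed 5% level.
threshold = next(n for n in range(101) if prob_at_least(n) < 0.05)
print("fewest wins significant at 5%:", threshold)
```

Scanning win counts this way shows where a single capper crosses a chosen significance level before any adjustment for the number of cappers examined.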
__________________
|
|||||
|
|
#11 | ||||
|
Thanks Ganch.
So with the first method, you are essentially creating an adjusted cdf for the max of N observations, and testing using that. With the second method, you may have gone a bit deeper than I can follow, but I got the gist of it. FWIW, I'm not planning on doing any data-mining; you are right, I would be in over my head. It was really more just curiosity. ~ Max |
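One reading of the "adjusted cdf for the max of N observations" is sketched below. It assumes the cappers' results are independent, which the earlier reply warns may not hold in a real contest (correlated picks, stale lines). The function name `best_of_m_p_value` is hypothetical, and the 700-capper figure is borrowed from the opening post purely as an example.

```python
def best_of_m_p_value(p_single: float, m: int) -> float:
    """Chance that at least one of m independent no-skill cappers posts an
    individual p-value of p_single or lower (assumes independence)."""
    return 1.0 - (1.0 - p_single) ** m

# A 59-41 record has an individual p-value of about 4.43% (quoted earlier in the thread).
p_single = 0.044313
for m in (1, 10, 100, 700):
    print(m, f"{best_of_m_p_value(p_single, m):.4f}")

# Individual p-value the best of 700 would need for an adjusted 5% level.
needed = 1.0 - (1.0 - 0.05) ** (1.0 / 700)
print(f"needed individual p-value: {needed:.6f}")
```

Under that independence assumption, a 59-41 record that looks significant on its own becomes close to a sure thing for somebody once hundreds of coin-flipping cappers are in the pool.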
||||
|
|
#12 | |||||
|
Quote:
![]() |
|||||
|
|
#13 | ||||||
|
Ganchrow, excellent article, as always.
Quote:
Quote:
|
||||||
|
|
#14 | |||||
|
Thanks.
It's apparent just by examining the test: $\chi^2\left(-2\sum_{i \le M} \ln \alpha_i;\ 2M\right)$ (where $\alpha_i$ refers to the significance level of the $i$-th capper, and the integer $M$ is the number of cappers being tested, corresponding to half the degrees of freedom).

If we have a large number of talented cappers within the population, then their $\alpha$'s will be low and the $\chi^2$ will in turn show significance. As we increase the number of "average" cappers within the population we'd start seeing many more $\alpha$'s of around 50%. Even if these cappers were better than average, with $\alpha$'s of $e^{-1} \approx 36.79\%$, the value of the test statistic would approach the degrees of freedom (twice the number of cappers). As the d.o.f. approach infinity the $\chi^2$ approaches normality with a mean equal to the d.o.f., and so the significance of the test would approach 50%. Mind you, this occurs when filling in with cappers better than average.
Quote:
Really what needs to be done here is some form of categorical analysis. We aren't looking to determine how good the single best capper is, or how good the population is as a whole, but rather how good a particular unspecified category of capper is. To this end I believe a test known as Mantel-Haenszel may be applicable. To be perfectly honest I don't quite remember the details other than that it's a chi-squared test.
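The point about cappers with significance levels of e^-1 can be checked numerically. The following is a rough Python sketch of the Fisher statistic tested against a chi-square with 2M degrees of freedom. The helper names are hypothetical, and the chi-square tail uses the closed form available for even degrees of freedom (a Poisson cumulative sum) rather than any statistics library.

```python
import math

def chi2_sf_even_dof(x: float, half_dof: int) -> float:
    """Upper-tail probability of a chi-square with 2*half_dof degrees of freedom.

    For even d.o.f. this equals P(Poisson(x/2) <= half_dof - 1). Terms are
    computed in log space so large statistics do not underflow.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    total = sum(math.exp(-h + k * math.log(h) - math.lgamma(k + 1))
                for k in range(half_dof))
    return min(1.0, total)

def fisher_p_value(alphas) -> float:
    """Fisher's method: -2 * sum(ln alpha_i) against chi-square with 2*M d.o.f."""
    stat = -2.0 * sum(math.log(a) for a in alphas)
    return chi2_sf_even_dof(stat, len(alphas))

# M cappers who each show an individual significance level of e^-1 (about 36.79%).
for m in (10, 100, 1000):
    print(m, round(fisher_p_value([math.exp(-1)] * m), 4))
```

With every input equal to e^-1 the statistic is exactly 2M, the mean of the reference chi-square, so the combined p-value drifts toward 50% as M grows even though each capper is individually better than average.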
__________________
|
|||||
|
|
#15 | ||||
|
Why would we include the average cappers in our test? The Fisher method allows us to test "a basket" of cappers/strategies in the same manner we test a single capper using the first (null-hypothesis) method, and we want only the best cappers/strategies in that basket. What I do not immediately see is how to account for the low likelihood (Bayesian-wise) of a successful strategy in the general population when translated into our cherry-picked sample.
|
||||
|
|
#16 | |||||
|
Quote:
If you were to limit the test population to just a superior subset, then of course the chi-square would be significant because you'd only be including the most significant results. Fisher tests the joint significance of all the results -- testing only the most significant for significance makes no sense. Applying Fisher in this manner would be akin to applying the first described method to only a portion of the results. You can't pick out the best results within a contest and pretend the other results never happened.

Now of course if you had prior knowledge that a certain group was likely to outperform then you could certainly use Fisher on just that group -- but you can't use the data set itself to come to that conclusion (in other words you'd need separate in-sample and out-of-sample data sets). But as I understood the OP's initial question the whole point is to identify the talented cappers in the first place.

Now I'm not saying Fisher is useless in this regard, but rather that it couldn't be used in its raw form on a portion of the data set that's determined by the data set itself -- that would be very bad practice. And if the portion of the data set which is selected has "only a relatively small number of truly talented pickers existing within it" ... then the problem I identified in my initial post might crop up. In other words, even if you properly select your data set in-sample, there's no way to guarantee that this won't be an issue with Fisher out-of-sample.
__________________
|
|||||
|
|
#17 | ||||||
|
Quote:
I do not know what practical result can be achieved by applying this test to all the cappers in a given contest but I would guess none. Quote:
|
||||||
|
|
#18 | |||||
|
Quote:
You can choose to use Fisher in any manner you like. However, if your selection of cappers is determined by their individual significance levels, and then you use those same individual significance levels as inputs for Fisher, you're engaging in data dredging at its most blatant.

The point of testing the Fisher statistic against the chi-square is to determine the likelihood of attaining that product of significance levels or lower. If, however, you've not properly determined your significance levels because you've not properly conditioned them on their likelihood of appearing within your chosen subset in the first place -- then the Fisher Method will routinely deliver spurious results.

Try this experiment in Excel. Generate the results in column A from 500 samples of 100 randomized binomial trials assuming a 50% success rate. So cells A1:A500 would each look something like: =CRITBINOM(100,50%,RAND()). (These would represent the results of 500 talentless handicappers each picking 100 games.)

In column B, display each capper's significance level (so cell B1 would read =1-BINOMDIST(A1-1,100,50%,1), B2 would read =1-BINOMDIST(A2-1,100,50%,1), etc.)

In column C fill in the natural logarithms of the values in column B (so cell C1 =ln(B1)).

Then determine the test-wide Fisher statistic by setting cell D1 to =-2*SUM(C1:C500). To determine the significance, run a chi-square with 1,000 degrees of freedom by setting cell D2 =CHIDIST(D1,1000).

Press F9 a few times to recalculate, checking the p-value in cell D2 each time. You should be seeing numbers fairly close to 100%, implying a lack of statistical significance. (The fact that it's generally SO close to 1 is indicative of the issue with Type II errors I had earlier mentioned.)

But now ... let's look at Fisher results if we cherry-pick a sample. Let's say we only look at the top 50% of pickers. What kind of Fisher method results will we see?

Set cell D3 to the Fisher statistic of the cherry-picked subset =-2*SUMIF(A1:A500,">="&PERCENTILE(A1:A500,50%),C1:C500). The number of handicappers in the top half would naturally be given by =COUNTIF(A1:A500,">="&PERCENTILE(A1:A500,50%)). Set cell D4 to the chi-squared p-value for the cherry-picked sample =CHIDIST(D3,2*COUNTIF(A1:A500,">="&PERCENTILE(A1:A500,50%))).

See a difference? Hit F9 a few times to make sure you aren't looking at some crazy aberration. You should be seeing results almost indistinguishable from zero, implying extreme significance. And we're not looking at some absurd subset either -- we're just looking at the top 50% of pickers drawn from a population that flips coins to determine picks.

Now please don't get me wrong -- the problem here isn't inherent with Fisher itself, but rather with using incorrect p-values within the natural logarithm. Garbage-in, garbage-out, after all. The way in which one would need to properly apply Fisher in this particular instance would be by using p-values in the logs in column C that were conditioned on having been found in the top 50% of results. Without that conditioning you're going to get spurious results every time.

The easiest way to handle the conditioning would be by appealing to the Central Limit Theorem as much as possible, while the correct way, probably involving Clopper-Pearson binomial intervals, would certainly be much trickier. That said, as long as you're not overly proud, aren't trying to earn some sort of degree in statistics, and aren't serving time in prison, you're probably best off just appealing to that great equalizer among statisticians -- the Monte Carlo.
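The Excel experiment in this post can also be sketched in Python for anyone without a spreadsheet handy. This is only an approximation of the steps described: the seed, the variable names, and the use of the median as the "top 50%" cutoff are assumptions, and the chi-square tail is computed from the closed form for even degrees of freedom rather than from CHIDIST.

```python
import math
import random
import statistics

N_GAMES, N_CAPPERS = 100, 500

# tail[k] = P(X >= k) for X ~ Binomial(100, 0.5), i.e. =1-BINOMDIST(k-1,100,50%,1)
pmf = [math.comb(N_GAMES, k) / 2**N_GAMES for k in range(N_GAMES + 1)]
tail = [sum(pmf[k:]) for k in range(N_GAMES + 1)]

def chi2_sf_even_dof(x: float, half_dof: int) -> float:
    """Chi-square upper tail for 2*half_dof d.o.f., via a Poisson sum in log space."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    total = sum(math.exp(-h + k * math.log(h) - math.lgamma(k + 1))
                for k in range(half_dof))
    return min(1.0, total)

def fisher_p_value(wins) -> float:
    """Fisher statistic from unconditioned individual p-values, then its chi-square tail."""
    stat = -2.0 * sum(math.log(tail[w]) for w in wins)
    return chi2_sf_even_dof(stat, len(wins))

rng = random.Random(2009)  # arbitrary seed, only so reruns match
# 500 coin-flipping cappers, 100 picks each (column A in the post).
wins = [sum(rng.random() < 0.5 for _ in range(N_GAMES)) for _ in range(N_CAPPERS)]

cutoff = statistics.median(wins)            # stand-in for PERCENTILE(A1:A500,50%)
top_half = [w for w in wins if w >= cutoff]  # the cherry-picked subset

p_all = fisher_p_value(wins)
p_top = fisher_p_value(top_half)
print(f"all {len(wins)} cappers: p = {p_all:.4f}")
print(f"top {len(top_half)} cappers: p = {p_top:.3e}")
```

Changing the seed plays the role of pressing F9. The full-population p-value should typically sit high, while the cherry-picked one collapses toward zero, because the individual p-values fed into the logarithms were never conditioned on making the cut.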
__________________
|
|||||
|
|
#19 | |||||||
|
Quote:
Quote:
Quote:
|
|||||||