08-03-08, 09:02 PM   #1
radbet
Joined: 01-14-08
Posts: 11
Calculating Confidence Interval

For NCAA football, I have a classification tree algorithm built on the past 10 years of team data and my own ranking system. I am using the following equation to set up confidence intervals for the measured accuracy of specific leaf nodes in my system:

p = (2 * N * (measured accuracy) + Z^2 +/- Z * SQRT(Z^2 + 4 * N * (measured accuracy) - 4 * N * (measured accuracy)^2))
-------------------------------------------------------
(2 * (N + Z^2))

Z=1.96 for 95% confidence.
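
For readers who want to check the numbers, here is a minimal Python sketch of the interval described above, which appears to be the Wilson score interval. It is written in the standard form, with the square root multiplied by Z and a denominator of 2 * (N + Z^2). The function name and the 18-of-20 counts are illustrative assumptions, not part of the original system.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    f = successes / n  # measured accuracy
    center = 2 * n * f + z * z
    spread = z * math.sqrt(z * z + 4 * n * f - 4 * n * f * f)
    denom = 2 * (n + z * z)
    return (center - spread) / denom, (center + spread) / denom

# Illustrative counts only: 18 correct predictions out of 20 events.
low, high = wilson_interval(18, 20)
print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```

Unlike the plain normal-approximation interval, the bounds from this form always stay between 0 and 1, even when the measured accuracy is 100%.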


My issue arises when N (the number of events) gets small. There are several situations where the accuracy is high (90-100%) but N is between 10 and 20. I know that standard confidence interval equations are accurate when N is larger (at least 30).

Does anyone have a program or method for calculating confidence intervals with small sample sizes? I am wary of putting down $$$ without confirming the significance of these smaller nodes.

RadBet
08-03-08, 11:09 PM   #2
BuddyBear
SBR PRO
Joined: 08-10-05
Posts: 6,508

Quote:
Originally Posted by radbet
Does anyone have a program or method for calculating confidence intervals with small sample sizes? I am wary of putting down $$$ without confirming the significance of these smaller nodes.

RadBet

Yeah, that is a pretty small sample size. With a sample that small, the key normality assumptions are likely violated, so you'll probably have to use some sort of nonparametric method to figure this out. I am not very good with nonparametric statistics, but I am pretty sure there is a nonparametric equivalent for confidence intervals. I think it is something along the lines of bootstrapping.
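
As a rough illustration of the bootstrapping idea, here is a minimal percentile-bootstrap sketch in Python. The outcome list (18 wins, 2 losses), the function name, and the resample count are all illustrative assumptions. Note that the bootstrap itself struggles with very small samples: a node with 20 wins out of 20 would give a degenerate interval of exactly 100%.

```python
import random

def bootstrap_accuracy_interval(outcomes, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    # Resample the outcomes with replacement and record each resample's accuracy.
    stats = sorted(
        sum(rng.choices(outcomes, k=n)) / n
        for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]

# Illustrative data only: 18 correct predictions and 2 misses.
outcomes = [1] * 18 + [0] * 2
low, high = bootstrap_accuracy_interval(outcomes)
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
```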

I am sure Ganch knows.....
08-04-08, 12:19 PM   #3
Ganchrow
Nolite te bastardes carborundorum.
Joined: 08-28-05
Posts: 5,003

Why don't you give a specific example, including an explanation of what exactly you mean by "measured accuracy"?
08-04-08, 06:38 PM   #4
radbet
Joined: 01-14-08
Posts: 11
Example

What I mean by measured accuracy is the percentage of correctly predicted outcomes for a node in my decision tree output.

Here is an example of a node I have:

Over the past 4 years (1860 games), a situation has occurred 20 times, with 18 of them resulting in a positive outcome (i.e. a correctly predicted victory). This gives a measured accuracy of 90%.
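
For a node this small, one commonly used alternative is the exact (Clopper-Pearson) binomial interval, which works directly from the binomial distribution and does not rely on a normal approximation. The sketch below applies it to the 18-of-20 node using only the standard library; the function names and the bisection approach are illustrative choices, not something from the original system.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(func, iterations=100):
    """Root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

cp_low, cp_high = clopper_pearson(18, 20)
print(f"95% exact interval for 18/20: {cp_low:.3f} to {cp_high:.3f}")
```

The exact interval tends to be conservative (somewhat wider than necessary), which may be an acceptable trade-off when money is riding on a 20-game node.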

From my understanding (please correct me if I am wrong), if N > 30 in a classification tree node where the class variable has 2 values (i.e. win/loss), then the class variable outcome can safely be treated as normally distributed (provided the input variables are also normally distributed). But if N < 30, normality cannot be assumed and computing confidence intervals is more complicated.

By the way, the tree is built with 15 continuous variables based on team rating, offensive/defensive scoring, and SOS. I have evaluated the variables extensively and they are all normally distributed.
08-04-08, 11:05 PM   #5
BuddyBear
SBR PRO
Joined: 08-10-05
Posts: 6,508

Well, if your main dependent variable of interest is binary (i.e. win/loss), then you could do what is called a logistic regression. Logistic regression is similar to standard OLS regression, except that the DV is binary; your continuous variables can still be included in the model as predictors. This, to me, seems more rigorous than confidence intervals b/c logistic regression would allow you to control for all 15 of those variables in the model.
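
To make the logistic regression suggestion concrete, here is a minimal sketch that fits P(win) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood, using only the standard library. Everything in it is an assumption for illustration: the data are synthetic, there is a single made-up predictor (think of it as a rating difference) rather than the 15 real variables, and in practice a statistics package would be the usual choice.

```python
import math
import random

def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Return [intercept, coef_1, ...] fitted by batch gradient ascent."""
    n, n_features = len(xs), len(xs[0])
    weights = [0.0] * (n_features + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(xs, ys):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = y - p  # gradient of the log-likelihood for this game
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights

# Synthetic data only: 300 games, one hypothetical predictor, true slope 1.5.
rng = random.Random(1)
xs = [[rng.gauss(0, 1)] for _ in range(300)]
ys = [1 if rng.random() < sigmoid(1.5 * x[0]) else 0 for x in xs]

weights = fit_logistic(xs, ys)
p_win = sigmoid(weights[0] + weights[1] * 2.0)  # predicted P(win) at x = 2
print(f"intercept={weights[0]:.2f}, slope={weights[1]:.2f}, P(win | x=2)={p_win:.2f}")
```

The fitted model returns a win probability for every game rather than a single accuracy per leaf node, so it does not depend on how many games happen to fall into one node.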

It's not entirely clear to me what you are trying to do, but it seems like you are trying to predict win/loss from the set of variables you've collected data on. If that is the case, regression would be able to do that for you.
All times are GMT -5.