Old 08-03-08, 08:02 PM   #1
radbet
 
Calculating Confidence Interval

For NCAA football, I have a classification tree algorithm built on the past 10 years of team data and my own ranking system. I am using the following equation to set up confidence intervals for the measured accuracy of specific leaf nodes in my system:

$$p = \frac{2N\,\mathrm{acc} + Z^2 \pm Z\sqrt{Z^2 + 4N\,\mathrm{acc} - 4N\,\mathrm{acc}^2}}{2\,(N + Z^2)}$$

where acc is the measured accuracy and N is the number of events in the node.

Z=1.96 for 95% confidence.
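
The interval in this post has the shape of the Wilson score interval for a binomial proportion. As a minimal sketch, assuming the standard textbook form (a factor of Z in front of the square root and 2(N + Z^2) in the denominator), it could be computed as below. The function name and the 18-of-20 counts are illustrative only, not taken from the poster's system.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    acc = successes / n                      # measured accuracy of the node
    centre = 2 * n * acc + z * z
    spread = z * math.sqrt(z * z + 4 * n * acc - 4 * n * acc * acc)
    denom = 2 * (n + z * z)
    return (centre - spread) / denom, (centre + spread) / denom

# Illustrative node: 18 correct predictions out of 20 events.
low, high = wilson_interval(18, 20)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Unlike the plain normal approximation, the bounds from this form always stay inside [0, 1], which matters for nodes near 100% accuracy.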


My issue is when N (the number of events) gets small. There are several situations where the accuracy is high (90-100%) but N is between 10 and 20. I know that standard confidence interval equations are only considered accurate when N is larger (at least 30).

Does anyone have a program or method of calculating confidence intervals with small sample sizes? I am wary of putting down $$$ without confirming the significance of these smaller nodes.

RadBet
Old 08-03-08, 10:09 PM   #2
BuddyBear
 
Joined: 08-10-05
Posts: 6,509

Quote:
Originally Posted by radbet
Does anyone have a program or method of calculating confidence intervals with small sample sizes? I am wary of putting down $$$ without confirming the significance of these smaller nodes.

RadBet

Yeah, that is a pretty small sample size. With so few observations, the normality assumptions behind the standard interval are likely violated, so you'll probably have to use some sort of nonparametric method to figure this problem out. I am not very good with nonparametric statistics, but I am pretty sure there is a nonparametric equivalent for building confidence intervals... I think it is something along the lines of bootstrapping.
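
For reference, a rough sketch of what a percentile bootstrap for a win rate might look like, assuming each game in a node is recorded as 1 (correct) or 0 (incorrect). The function name, the resample count, and the 18-of-20 data are hypothetical choices for illustration.

```python
import random

def bootstrap_interval(outcomes, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)                # fixed seed for repeatability
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Resample n games with replacement and record the win rate.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(resample) / n)
    rates.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return rates[lo_idx], rates[hi_idx]

# Illustrative node: 18 correct predictions and 2 misses.
outcomes = [1] * 18 + [0] * 2
low, high = bootstrap_interval(outcomes)
print(f"bootstrap 95% interval: {low:.2f} to {high:.2f}")
```

One caveat with this approach: if every game in a node is a win, every resample is also all wins, so the interval collapses to a single point at 100%. That makes it a poor fit for exactly the small, near-perfect nodes in question.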

I am sure Ganch knows.....
Old 08-04-08, 11:19 AM   #3
Ganchrow
Nolite te bastardes carborundorum.
 
Joined: 08-28-05
Posts: 4,784

Why don't you give a specific example, including an explanation of what exactly you mean by "measured accuracy"?
Old 08-04-08, 05:38 PM   #4
radbet
 
example

What I mean by measured accuracy is the percentage of correctly predicted outcomes of a node within my decision tree output.

Here is an example of a node I have:

Over the past 4 years (1860 games), a situation has occurred 20 times, with 18 of them resulting in a positive outcome (i.e. a correctly predicted victory). This gives an observed accuracy of 90%.
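
One option for a node this small is an exact (Clopper-Pearson) interval, which works directly from the binomial distribution and makes no normality assumption. The sketch below is a swapped-in technique, not the poster's equation. It solves for the bounds by bisection using only the standard library, and the helper names are made up for this example.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided interval for k successes in n trials."""
    def root(j, target):
        # binom_cdf(j, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(j, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2; upper: P(X <= k | p) = alpha/2.
    lower = 0.0 if k == 0 else root(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else root(k, alpha / 2)
    return lower, upper

low, high = clopper_pearson(18, 20)
print(f"exact 95% interval: {low:.3f} to {high:.3f}")
```

The exact interval tends to be conservative (wider than strictly necessary), which is arguably the safer direction when deciding whether a 20-game node is worth betting on.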

From my understanding (please correct me if I am wrong), if N>30 in a classification tree node where the class variable has 2 values (i.e. win/loss), then the probability of the class variable outcome can safely be assumed to follow a normal distribution (if the known distribution of the variables is also normal). But if N<30, normality cannot be assumed and computing confidence intervals is more complicated.
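
A quick way to see the normal approximation strain on a node like this is to compute the plain normal-approximation (Wald) interval, p +/- Z * sqrt(p(1-p)/N), for 18 of 20. This is only an illustration of the failure mode, and the function name is hypothetical.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Plain normal-approximation interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = wald_interval(18, 20)
print(f"18/20: {low:.3f} to {high:.3f}")     # upper bound exceeds 1

low, high = wald_interval(20, 20)
print(f"20/20: {low:.3f} to {high:.3f}")     # zero-width interval
```

At 18 of 20 the upper bound lands above 100%, and at 20 of 20 the interval has no width at all. A common rule of thumb asks for both N*p and N*(1-p) to be reasonably large (often 5 or 10) before trusting this approximation; here N*(1-p) is only 2.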

By the way, the tree is built with 15 continuous variables, which are based on team rating, offensive/defensive scoring, and SOS. I have evaluated the variables extensively and they are all normally distributed.
Old 08-04-08, 10:05 PM   #5
BuddyBear
 
Joined: 08-10-05
Posts: 6,509

Well, if your main dependent variable of interest is binary (i.e. win/loss), then you could do what is called a logistic regression. Logistic regression is similar to standard OLS regression, except that the DV is binary; your continuous variables go in as predictors. This, to me, seems more rigorous than confidence intervals b/c logistic regression would allow you to control for all 15 of those variables in the model.

It's not really clear to me what you are trying to do, except that it seems like you are trying to predict win/loss based on a certain set of variables you've collected data on. If that is the case, regression would be able to do that for you.
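
As a rough sketch of the idea, here is a bare-bones logistic regression fitted by gradient descent on synthetic data. Everything in it is an assumption for illustration: the two predictors (`rating_diff`, `sos_diff`), the coefficients used to generate the fake games, and the learning rate. A real analysis would more likely use a statistics package and the actual 15 variables.

```python
import math
import random

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)

def predict_proba(w, b, x):
    """Predicted win probability for one game's predictor values."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic games (NOT real data): win chance rises with rating_diff.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    rating_diff, sos_diff = rng.gauss(0, 1), rng.gauss(0, 1)
    p_win = sigmoid(1.5 * rating_diff + 0.5 * sos_diff)
    X.append([rating_diff, sos_diff])
    y.append(1 if rng.random() < p_win else 0)

w, b = fit_logistic(X, y)
print("weights:", w, "intercept:", b)
print("P(win) at rating_diff=1, sos_diff=0:", predict_proba(w, b, [1.0, 0.0]))
```

The fitted model returns a win probability for any combination of predictor values, rather than a single accuracy figure per leaf node.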