  1. #1

    Correlated Data

    Any suggestions on what to do when you have correlated data and your model starts giving wacky results?

    E.g., home teams coming off a bye come out as less likely to cover the spread, coefficients change sign from positive to negative in the presence of other variables, etc.
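To make the sign-flip symptom concrete, here is a minimal sketch using made-up toy numbers (not real NFL data). Two predictors, `x1` and `x2`, are nearly copies of each other; `x1` looks positive on its own but turns negative once `x2` is in the model. All names and values are illustrative assumptions.

```python
# Toy data, made up for illustration only (not real NFL numbers).
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
x1 = [a + e for a, e in zip(x2, noise)]   # x1 is nearly a copy of x2
y = [2 * b - a for a, b in zip(x1, x2)]   # built so that y = -1*x1 + 2*x2


def centered(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x1c, x2c, yc = centered(x1), centered(x2), centered(y)

# One-variable regression of y on x1 alone.
slope_x1_alone = dot(x1c, yc) / dot(x1c, x1c)

# Two-variable least squares via the 2x2 normal equations.
s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
s1y, s2y = dot(x1c, yc), dot(x2c, yc)
det = s11 * s22 - s12 * s12
coef_x1 = (s22 * s1y - s12 * s2y) / det
coef_x2 = (s11 * s2y - s12 * s1y) / det

# Pearson correlation between the two predictors.
corr_x1_x2 = s12 / (s11 * s22) ** 0.5

print("x1 alone:", slope_x1_alone)
print("x1 with x2 in the model:", coef_x1, " x2:", coef_x2)
print("correlation between x1 and x2:", corr_x1_x2)
```

In this toy setup neither answer is a calculation error: the one-variable slope is picking up the effect of the correlated variable that was left out.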

  2. #2

    almost all data in a game is correlated in some fashion; uniform color might be the odd factor out.

    if you don't like negative coefficients, suppress 'em to always be positive.

  3. #3

    Could be a bunch of things. If you have a weak correlation one way, any strongly correlated data going the other way could simply overwhelm the prior relationship.

    Stating the obvious, it could be an error in your calculations.

    Could be that trying to correlate home teams coming off a bye with their ability to cover the spread is wacky, and hence a wacky result is appropriate.

    CHARITY DONOR
    11/28/2011 $25 donation


  4. #4

    Here's a better example. In my NFL data set, home teams won the game outright 58.5% of the time and road teams 41.5%. Home teams coming off a bye week won 57.1% of the time, a slight decrease that isn't really significant. However, road teams coming off a bye won outright a whopping 46.9% of the time, which is a significant increase. If you build a model from the perspective of the home team, byes aren't significant, or maybe slightly negative; if you build it from the perspective of the road team, byes are a significant positive variable.

    I think the bye is correlated with home field advantage, any suggestions on what to do?
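One way to look at those four percentages is on the log-odds scale, which is what a logistic model works in. The sketch below only uses the win rates quoted in this post; the variable names are made up, and comparing "off a bye" against "all games" is an approximation, since the all-games rate includes the bye games.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


# Win rates quoted in the post, as fractions.
home_all, home_off_bye = 0.585, 0.571
road_all, road_off_bye = 0.415, 0.469

# Shift in log-odds associated with coming off a bye, by venue.
home_bye_shift = logit(home_off_bye) - logit(home_all)
road_bye_shift = logit(road_off_bye) - logit(road_all)

print("home bye shift (log-odds):", round(home_bye_shift, 3))
print("road bye shift (log-odds):", round(road_bye_shift, 3))
```

If the two shifts really differ by venue, a single "bye" coefficient cannot represent both. One possible approach, offered as a suggestion and not as the poster's method, is to give the model separate indicators such as "team off a bye" and "opponent off a bye", or a venue-by-bye interaction term.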

  5. #5

    You seem to have a good intuitive grasp of what the results should be, which is great, but without seeing the raw data I doubt you have a sample large enough to draw statistically significant conclusions from such a weak difference in home team performance due to a bye. Your results for road teams, however, may benefit from further analysis. I, however, would just take the data as they are.
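For a rough feel of how sample size drives significance here, a minimal two-proportion z-test sketch follows. The win rates are the ones quoted in the thread, but the game counts (`n_other`, `n_bye`) are purely hypothetical placeholders, since the actual sample sizes were never posted. It also treats the two groups as independent samples, which is a simplification.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# HYPOTHETICAL game counts, for illustration only.
n_other, n_bye = 2000, 200

z_home, p_home = two_proportion_z(0.585, n_other, 0.571, n_bye)
z_road, p_road = two_proportion_z(0.415, n_other, 0.469, n_bye)

print("home teams off a bye: z =", round(z_home, 2), " p =", round(p_home, 3))
print("road teams off a bye: z =", round(z_road, 2), " p =", round(p_road, 3))
```

Plugging in the real counts in place of the placeholders is the whole point; the conclusion can change a lot with the number of bye games in the sample.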

    Your results, which are very interesting (thanks!), jibe quite well with what I've read on NBA home/road adjustments:

    Home court: 3.5 pts
    Both rested: 3.2 pts
    Home rested, visitor on a back-to-back: 4.7 pts

    I grabbed this from somewhere on the forum, I'd give credit and a link if I could recall where.

    Cheers,

    Benjy

  6. #6

    DT - impossible to infer much without knowing the sample size.

    Also, I would want to know the subsets of the two categories: how many home teams were favs that lost outright, and how many road teams off a bye were favs. I would also want to look at the timing of the bye week, as I would assume a later bye week gives teams an advantage over teams not yet at their bye or a month or so removed from it. You could really drill down to get specifics that would probably reveal some win/loss percentages that top 60 or 65%.

    GL

  7. #7

    I think the bye is correlated with home field advantage, any suggestions on what to do?
    I am prone to liking single factor plays, or "angles" as some call them. Give me a 60% angle to bet blindly at -110, I am a happy camper. Of course the single angle could be "road dogs coming off a bye week SU when......" if you find a piece of information that increases your percentage dramatically.
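For reference, the arithmetic behind "60% at -110" can be sketched as below. The function names are made up for illustration, and pushes are ignored.

```python
def breakeven_prob(american_odds):
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        risk, win = -american_odds, 100
    else:
        risk, win = 100, american_odds
    return risk / (risk + win)


def expected_roi(win_prob, american_odds):
    """Expected profit per unit risked, ignoring pushes."""
    if american_odds < 0:
        payout_per_unit = 100 / -american_odds
    else:
        payout_per_unit = american_odds / 100
    return win_prob * payout_per_unit - (1.0 - win_prob)


print("breakeven at -110:", round(breakeven_prob(-110), 4))
print("ROI at 60% and -110:", round(expected_roi(0.60, -110), 4))
```

At -110 the breakeven rate is 110/210, a bit over 52%, so a true 60% angle would be worth roughly 14 to 15 cents per dollar risked.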

    Word of warning though: you could be starting down the slippery slope known as backfitting any time you start mining data too finely.
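A rough sketch of why backfitting bites: if each angle is really a coin flip, some will still show 60% over a small sample by luck alone, and the more angles you test, the more likely you are to find one. The sample size and the number of angles below are hypothetical, and the angles are assumed to be independent, which real data-mined angles usually are not.

```python
import math


def prob_at_least(n_games, min_wins, p=0.5):
    """Binomial probability of at least min_wins wins in n_games at win rate p."""
    return sum(
        math.comb(n_games, k) * p ** k * (1.0 - p) ** (n_games - k)
        for k in range(min_wins, n_games + 1)
    )


# HYPOTHETICAL numbers, for illustration only.
n_games = 50      # games in the sample for one angle
min_wins = 30     # 60% of 50
n_angles = 100    # how many different angles were tried

p_one_lucky = prob_at_least(n_games, min_wins)
p_any_lucky = 1.0 - (1.0 - p_one_lucky) ** n_angles

print("chance one coin-flip angle shows 60%+:", round(p_one_lucky, 4))
print("chance at least one of", n_angles, "does:", round(p_any_lucky, 4))
```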

  8. #8

    Originally posted by Peep:
    "Give me a 60% angle to bet blindly at -110, I am a happy camper."
    Yeah, me too. Please show me one.

  9. #9

    Originally posted by Wrecktangle:
    "Yeah, me too. Please show me one."
    Historically, when the best turnover differential team plays the worst turnover differential team in the NFL, the underdog has hit at a ridiculous clip. But you only get a game or two a year this way.
