  1. #1

    Question: Sabermetrics - Creating Fair Odds in Baseball

    Did a search on Amazon and couldn't find that many books on sabermetrics. Michael Murray's book (Betting Baseball 2007) came up, but the reviews are horrible. I am not really interested in handicapping sports other than regular season MLB games, so I think the best way to start is to learn how to work out the fair odds in MLB games.

    Any books or articles that you've come across and found helpful?

  2. #2

    also found this book:
    Understanding Sabermetrics: An Introduction to the Science of Baseball Statistics (Paperback)

    From reading the reviews, this book mainly covers what each stat means and how to use it. I already know all the terms, so I'm not sure the book is all that useful.

    http://www.amazon.com/Understanding-...6202661&sr=1-1

  3. #3

    You might look at the old Strat-o-Matic games. They have good ideas on how to simulate a single matchup between a pitcher and a batter. Once you know this, you can simulate the rest if you don't mind stat-mining for all the players.

    I actually developed an MLB simulator, and it did fantastic for 2 years of testing. When I started betting it live, it did so badly, I had offers to buy it so people could fade the plays.

  4. #4

    Everything you need you can find easily enough online.

    Tangotiger is a good place to start. He has a book as well.

  5. #5

    Originally Posted by Justin7:
    You might look at the old Strat-o-Matic games. They have good ideas on how to simulate a single matchup between a pitcher and a batter. Once you know this, you can simulate the rest if you don't mind stat-mining for all the players.

    I actually developed an MLB simulator, and it did fantastic for 2 years of testing. When I started betting it live, it did so badly, I had offers to buy it so people could fade the plays.
    Thanks, Justin. My plan for this season is to simulate games using Monte Carlo. I think simulating a single game is a lot like how video games do it. It'll just take me some time to put in all the stats for the rosters.

    This is the idea I have for figuring out the ML odds for a particular game. I'll use Monte Carlo simulation to replay the game one million times (10k or 100k might be enough... but electricity is cheap) and find the % of times that team A wins. I can use that to estimate the ML odds. For example, if team A wins 640k of 1 million simulated games, its win probability is 64%, so the fair ML should be about -178 (100 * 0.64 / 0.36). I would then expect the book to post something like -190/+170. I think this is how AccuScore does it. But it is going to be a bitch to factor in minor details like days of rest, weather, ball park... or I could just ignore all the small details and crunch the numbers based on the most used stats. Anyway, I'll create a couple of versions and see how they do live.
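
    The step from simulated win % to a moneyline can be sketched roughly as below. This is only an illustration under simple assumptions: the function names are made up for this example, the simulated win rate is treated as the true probability, and no vig is added. A 64% win rate works out to about -178 on the favorite.

```python
def prob_to_moneyline(p: float) -> int:
    """Convert a win probability (0 < p < 1) into a no-vig American moneyline."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        # Favorite: the amount you must risk to win 100.
        return -round(100 * p / (1 - p))
    # Underdog: the amount you win for risking 100.
    return round(100 * (1 - p) / p)


def moneyline_to_prob(line: int) -> float:
    """Implied probability of an American moneyline (includes any vig)."""
    if line < 0:
        return -line / (-line + 100)
    return 100 / (line + 100)


wins, sims = 640_000, 1_000_000
p = wins / sims
print(prob_to_moneyline(p))      # -178
print(prob_to_moneyline(1 - p))  # 178

# A posted two-way line has implied probabilities that sum to more than 1;
# the excess is the bookmaker's margin.
overround = moneyline_to_prob(-190) + moneyline_to_prob(170) - 1
print(f"margin on -190/+170: {overround:.2%}")
```

    Comparing the no-vig number from the simulator with the implied probability of the posted price is what tells you whether there is any edge at all.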

  6. #6

    That's what my program does... But I use 10 million as a base. You'd be surprised how much volatility you can get in a mere million games.
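
    One rough way to gauge how much the simulated win % can move with the number of runs is the binomial standard error. This is a sketch that assumes every simulated game is an independent draw with the same win probability; it measures only the sampling noise of the simulation, not errors in the stats fed into it. The function name is made up for this example.

```python
import math


def mc_standard_error(p: float, n_sims: int) -> float:
    """Standard error of a win-probability estimate from n_sims independent games."""
    if n_sims <= 0:
        raise ValueError("n_sims must be positive")
    return math.sqrt(p * (1 - p) / n_sims)


for n in (10_000, 100_000, 1_000_000, 10_000_000):
    se = mc_standard_error(0.64, n)
    # Report the noise in percentage points of win probability.
    print(f"{n:>12,} sims: +/- {100 * se:.4f} points of win %")
```

    Each tenfold increase in runs shrinks the noise by a factor of about 3.16 (the square root of 10), so the extra precision gets expensive quickly.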

  7. #7

    Murray's book leaves the reader to connect a lot of dots, but it isn't terrible.

    SBR Founder Join Date: 8/10/2005


  8. #8

    Originally Posted by Red_Sux:
    But it is going to be a bitch to factor in minor details like days of rest, weather, ball park...
    Understatement.

    You'll be amazed how difficult it turns out to be if you're aiming for something reasonably sophisticated.

  9. #9

    Originally Posted by Red_Sux:
    Thanks, Justin. My plan for this season is to simulate games using Monte Carlo. I think simulating a single game is a lot like how video games do it. It'll just take me some time to put in all the stats for the rosters.

    This is the idea I have for figuring out the ML odds for a particular game. I'll use Monte Carlo simulation to replay the game one million times (10k or 100k might be enough... but electricity is cheap) and find the % of times that team A wins. I can use that to estimate the ML odds. For example, if team A wins 640k of 1 million simulated games, its win probability is 64%, so the fair ML should be about -178 (100 * 0.64 / 0.36). I would then expect the book to post something like -190/+170. I think this is how AccuScore does it. But it is going to be a bitch to factor in minor details like days of rest, weather, ball park... or I could just ignore all the small details and crunch the numbers based on the most used stats. Anyway, I'll create a couple of versions and see how they do live.
    You'd better also look at the actual lineup used and not just "overall" stats. Here is why. For each slot in the lineup there are hitter characteristics which are optimal. Each slot in the lineup behaves differently because of the base-runner situations that slot tends to see. For example, most managers put their best RBI hitter in the #3 slot. For the National League this is a mistake, because the situational factor for the #3 slot, in terms of "what runners on which bases with how many outs", is worse than for the #5 slot due to the pitcher hitting before the #3 hitter (more often than you might think).

    Another example. What is the most important factor for a leadoff hitter? If you said Batting Average you are wrong. If you said stolen base % you are wrong. THE most important factor for a leadoff hitter is the % of times that the hitter moved past 1st base due to his own efforts per plate appearance. That means you want a leadoff hitter who gets a lot of extra base hits per plate appearance AND steals a lot of bases per number of times reached first base. The reason for this is simple. The % of runs scored in the situations "runner on 2nd with zero outs", or "runner on 3rd with zero outs" is much higher than "runner on 1st with zero outs".

    For a number two hitter you want a hitter who does not ground into double plays, doesn't walk very much, and gets a lot of hits per plate appearance. That speedy runner on 1st cannot take extra bases if the #2 hitter gets a walk.

    Some managers get these kinds of stats; others don't. So even though the team stats might look good, a team can still lose games if the manager puts the players in the wrong slots in the lineup.
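
    The leadoff measure described above could be approximated from a standard batting line along these lines. This is only one possible reading of it: it counts extra-base hits plus stolen bases per plate appearance, and it assumes every stolen base started from first, which overcounts steals of third. The class, the function, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    pa: int            # plate appearances
    doubles: int
    triples: int
    home_runs: int
    stolen_bases: int


def past_first_rate(line: BattingLine) -> float:
    """Share of plate appearances where the hitter got past first on his own."""
    if line.pa <= 0:
        raise ValueError("pa must be positive")
    extra_base_hits = line.doubles + line.triples + line.home_runs
    # Assumption: each stolen base is a steal of second after reaching first.
    return (extra_base_hits + line.stolen_bases) / line.pa


# Hypothetical season line, not a real player.
candidate = BattingLine(pa=600, doubles=30, triples=5, home_runs=10, stolen_bases=15)
print(f"{past_first_rate(candidate):.3f}")
```

    Splitting steals by base and adding advances on errors or wild pitches would need play-by-play data rather than a season line.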

  10. #10

    Interesting thread, fellas. I have always gotten by with a combination of about 6 factors, 3 of which are pitching and one of which is based on expected lineup, hitting about 55-58% each baseball season. Sabermetrics might sharpen that up a bit.

  11. #11

    Do sabermetrics really help, or are they just another useless stat?

  12. #12

    Originally Posted by Arnold:
    Do sabermetrics really help, or are they just another useless stat?
    I dabbled in sabermetrics very briefly a few years ago and it just mucked me up. I am willing to give it another shot though.

  13. #13

    Originally Posted by Arnold:
    Do sabermetrics really help, or are they just another useless stat?
    I dunno, but I'm kind of interested in building a baseball simulator.

  14. #14

    Originally Posted by Red_Sux:
    I dunno, but I'm kind of interested in building a baseball simulator.
    I think the success of it will depend on how close it is to reality. For example, you'd need to implement strategy into the game. Let's say you're down by 1 run in the 9th inning, and one of your guys just got a single. Normally, you'd put your fastest pinch runner on 1st, then try stealing second, maybe even 3rd. So you'd need some kind of aggressiveness level based on the situation in the game. Also, how would you determine base running? Let's say you have someone on 1st, then you get a single: how far do you advance the runner on 1st? He can go to 2nd or 3rd, or be thrown out. Is there any stat available for this kind of stuff? Things like this shouldn't be ignored, in my opinion, and must be accurate.

  15. #15

    Originally Posted by Arnold:
    I think the success of it will depend on how close it is to reality. For example, you'd need to implement strategy into the game. Let's say you're down by 1 run in the 9th inning, and one of your guys just got a single. Normally, you'd put your fastest pinch runner on 1st, then try stealing second, maybe even 3rd. So you'd need some kind of aggressiveness level based on the situation in the game. Also, how would you determine base running? Let's say you have someone on 1st, then you get a single: how far do you advance the runner on 1st? He can go to 2nd or 3rd, or be thrown out. Is there any stat available for this kind of stuff? Things like this shouldn't be ignored, in my opinion, and must be accurate.
    For my version 1, I'll keep it simple: basically give single, double, triple, HR, double play, and out percentages for each hitter and pitcher matchup. Maybe I'll just use the SP, one RP, and one closer. It is not going to be accurate, but it will give me a sense of how the game would turn out if everyone performs according to their career/season averages.
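
    A bare-bones "version 1" along these lines might look like the sketch below. It makes heavy simplifying assumptions that are not from the thread: each lineup slot already has matchup-adjusted outcome probabilities, there are no walks, steals, or substitutions, runners advance exactly as many bases as the hit, and a double play removes the runner on first. All names and probabilities are hypothetical.

```python
import random

OUTCOMES = ("1B", "2B", "3B", "HR", "DP", "OUT")
BASES_GAINED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def half_inning(lineup, slot, rng):
    """Play one half inning; return (runs scored, next lineup slot)."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        probs = lineup[slot]
        outcome = rng.choices(OUTCOMES, weights=[probs[o] for o in OUTCOMES])[0]
        slot = (slot + 1) % len(lineup)
        if outcome == "OUT":
            outs += 1
        elif outcome == "DP":
            if bases[0] and outs < 2:
                bases[0] = False  # runner on first is doubled off
                outs += 2
            else:
                outs += 1         # no double play possible: ordinary out
        else:
            n = BASES_GAINED[outcome]
            new_bases = [False, False, False]
            for i in (2, 1, 0):   # advance every runner n bases
                if bases[i]:
                    if i + n >= 3:
                        runs += 1
                    else:
                        new_bases[i + n] = True
            if n == 4:
                runs += 1         # the batter scores on a home run
            else:
                new_bases[n - 1] = True
            bases = new_bases
    return runs, slot


def simulate_game(away, home, rng):
    """Return True if the home team wins; ties go to extra innings."""
    away_runs = home_runs = 0
    away_slot = home_slot = 0
    inning = 0
    while inning < 9 or away_runs == home_runs:
        r, away_slot = half_inning(away, away_slot, rng)
        away_runs += r
        r, home_slot = half_inning(home, home_slot, rng)
        home_runs += r
        inning += 1
    return home_runs > away_runs


def home_win_prob(away, home, n_sims=5_000, seed=1):
    """Estimate the home team's win probability by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(simulate_game(away, home, rng) for _ in range(n_sims))
    return wins / n_sims


# Hypothetical per-plate-appearance probabilities (each dict sums to 1).
WEAK = {"1B": 0.15, "2B": 0.04, "3B": 0.005, "HR": 0.02, "DP": 0.02, "OUT": 0.765}
STRONG = {"1B": 0.17, "2B": 0.05, "3B": 0.005, "HR": 0.035, "DP": 0.02, "OUT": 0.72}

print(home_win_prob(away=[WEAK] * 9, home=[STRONG] * 9))
```

    The home team always bats in the bottom half here even when it is already ahead, which changes final scores but not who wins.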

  16. #16

    Originally Posted by Arnold:
    Let's say you have someone on 1st, then you get a single: how far do you advance the runner on 1st? He can go to 2nd or 3rd, or be thrown out. Is there any stat available for this kind of stuff? Things like this shouldn't be ignored, in my opinion, and must be accurate.
    Same issues for runner advancement on outs, as well.

    And what are you going to use to determine likelihoods of batter/pitcher matchups? Last year's stats? Two years? Career? Are you going to account for aging? Regression to the mean?

    What about lefty/righty? Do you use generic tendencies for lefty/righty matchups, or individual ones? If individual, you're going to run into a lot more sample size issues.

    Speaking of sample size, what about rookies and sophomores? Major League Equivalencies?

    Just a warning, this stuff just keeps coming and coming. It's a lot of work. And if the plan is to ignore the intricate details that a good simulator would address, you'd probably have better results with much less work using regression analysis instead of simulation.

    The good news is that everything you need to know about runner advancement, etc., can be squeezed out of the event files available for free from Retrosheet. And if you can come up with an accurate simulator, it could be hugely valuable.

    Also, you'd be crazy not to backtest your simulator. With the availability of data, there's no reason to press forward without testing your model.
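
    On the regression-to-the-mean and sample size questions above, one common approach is to shrink an observed rate toward the league rate by adding a fixed number of "league average" trials. A minimal sketch follows; the function name, the league rate, and the prior weight of 200 plate appearances are placeholders for illustration, not estimated values.

```python
def regress_to_mean(successes: float, trials: float,
                    league_rate: float, prior_weight: float) -> float:
    """Shrink an observed rate toward league_rate.

    prior_weight is the number of league-average trials mixed in; the
    smaller the player's own sample, the more the result leans on the league.
    """
    if trials < 0 or prior_weight <= 0:
        raise ValueError("trials must be >= 0 and prior_weight must be > 0")
    return (successes + prior_weight * league_rate) / (trials + prior_weight)


LEAGUE_OBP = 0.33    # placeholder league on-base rate
PRIOR_WEIGHT = 200   # placeholder, in plate appearances

# A rookie who reached base 20 times in 50 PA (raw rate 0.400).
print(f"{regress_to_mean(20, 50, LEAGUE_OBP, PRIOR_WEIGHT):.3f}")
# A veteran with the same raw rate over 600 PA is pulled in far less.
print(f"{regress_to_mean(240, 600, LEAGUE_OBP, PRIOR_WEIGHT):.3f}")
```

    The right prior weight differs by stat and would have to be estimated from historical data, which is part of the backtesting work.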

  17. #17

    Originally Posted by MrX:
    Same issues for runner advancement on outs, as well.

    And what are you going to use to determine likelihoods of batter/pitcher matchups? Last year's stats? Two years? Career? Are you going to account for aging? Regression to the mean?
    I would ignore batter/pitcher matchups and just use the overall averages. Also, I would use only season stats from before the game. It is useless to backtest a system/simulation with "future" stats factored in. For anything like this, one would need to keep track of individual stats and update them after each game. That's a lot of work if you don't have it automated. I agree that a simulation like this is a looooooot of work. I'm not sure it's worth trying, unless just for the fun of developing one.

    How about existing simulators? Are they any good? For example, the accuscore.com simulator?
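
    The "only stats prior to the game" rule can be automated by walking through games in date order and snapshotting each player's totals before adding that day's line. The sketch below assumes a made-up data layout (a list of dicts with "date" and per-player "lines"); the dates and names are hypothetical.

```python
from collections import defaultdict


def walk_forward(games):
    """Yield (game, stats_before_game) in date order, with no look-ahead."""
    totals = defaultdict(lambda: {"pa": 0, "hits": 0})
    for game in sorted(games, key=lambda g: g["date"]):
        # Copy the totals as they stood BEFORE this game was played.
        snapshot = {player: dict(totals[player]) for player in game["lines"]}
        yield game, snapshot
        # Only now fold this game's results into the running totals.
        for player, line in game["lines"].items():
            totals[player]["pa"] += line["pa"]
            totals[player]["hits"] += line["hits"]


# Hypothetical box-score lines, deliberately out of order.
games = [
    {"date": "2009-04-07", "lines": {"Player A": {"pa": 4, "hits": 2}}},
    {"date": "2009-04-06", "lines": {"Player A": {"pa": 5, "hits": 1}}},
]

for game, before in walk_forward(games):
    print(game["date"], before)
```

    Feeding each snapshot into the simulator, instead of end-of-season stats, keeps a backtest from seeing results it could not have known on game day.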

  18. #18

    The toughest part with a simulator is the AI for the team's manager, especially with regard to the bullpen. But the bottom line for any stats is they're only as good as the person reading them.

  19. #19

    Originally Posted by Arnold:
    I would ignore batter/pitcher matchups and just use the overall averages. Also, I would use only season stats from before the game. It is useless to backtest a system/simulation with "future" stats factored in. For anything like this, one would need to keep track of individual stats and update them after each game. That's a lot of work if you don't have it automated. I agree that a simulation like this is a looooooot of work. I'm not sure it's worth trying, unless just for the fun of developing one.

    How about existing simulators? Are they any good? For example, the accuscore.com simulator?
    I am wondering about that too. Maybe I'll start a thread to track AccuScore's win % using full Kelly and see how it does for this year's MLB games. I am going to use the opening and closing lines for comparison. Sounds like an interesting project.
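
    Tracking with full Kelly means staking the Kelly fraction of the bankroll on each play. A minimal sketch, assuming the published win % is taken as the true probability and the price is an American moneyline; the function names are made up for this example.

```python
def net_odds(moneyline: int) -> float:
    """Profit per unit staked for an American moneyline."""
    if abs(moneyline) < 100:
        raise ValueError("American moneylines are <= -100 or >= +100")
    if moneyline < 0:
        return 100 / -moneyline
    return moneyline / 100


def kelly_fraction(p: float, moneyline: int) -> float:
    """Full-Kelly share of bankroll to stake; 0 when there is no edge."""
    b = net_odds(moneyline)
    f = (b * p - (1 - p)) / b
    return max(f, 0.0)


# A side given a 64% chance, priced at -150 at the open and -170 at the close.
for label, line in (("open", -150), ("close", -170)):
    print(f"{label} {line}: stake {kelly_fraction(0.64, line):.1%} of bankroll")
```

    Running the same win % against both the opening and closing price shows how much of the apparent edge is gone by the time the line closes.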
