  1. #1

    Use math to predict baseball games - Markov Chain Method.

    By William Atkins
    Friday, 06 April 2007
    The use of mathematics may not seem very interesting to the average person, but U.S. math professor and N.Y. Mets fan Bruce Bukiet can consistently beat sports experts using his copyrighted “Markov Chain” method.

    For the 2007 major league baseball season, Bukiet is predicting that the New York Yankees will be the winningest team with 110 victories out of 162 games.

    Bukiet, who teaches at New Jersey Institute of Technology (Newark), uses a mathematical model that determines the likelihood of victory or defeat on a particular day based on each team's starting batting order (along with five reserves) and its starting pitcher (along with six relievers). His model predicts the outcome of individual games based on how well each player is likely to perform against each pitcher. Bukiet also predicts outcomes for the whole baseball season.

    The model is built on a “Markov chain”: a sequence of states of a system (in this case, the situations that can arise in a baseball game), where the system can only be in one of a finite number of possible states. Each move from one state to the next is called a transition. Past states carry no information about the future beyond what is already contained in the current state, so only the current state is used to predict what happens next. When Bukiet makes his predictions for the 2007 season, he will input statistics from the past three years (2004, 2005, and 2006) into his Markov model.
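    To make the Markov chain idea concrete, here is a minimal Python sketch of a base-out chain for a single half-inning. It is not Bukiet's model: the states (number of outs plus which bases are occupied, 24 in all, with three outs as the absorbing end state) are a common textbook setup, and the outcome probabilities and runner-advancement rules are hypothetical simplifications. A model along the lines the article describes would instead use player-specific probabilities for each batter in the lineup. The sketch estimates expected runs from every state by repeatedly applying one plate appearance's worth of transitions until the values settle.

```python
from __future__ import annotations

# Hypothetical outcome probabilities for every plate appearance (they sum to 1).
OUTCOME_PROBS = {
    "out": 0.68,
    "walk": 0.09,
    "single": 0.15,
    "double": 0.05,
    "triple": 0.005,
    "home_run": 0.025,
}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def advance(bases: int, outcome: str) -> tuple[int, int]:
    """Apply a non-out outcome; return (new_bases, runs_scored).

    `bases` is a 3-bit mask: bit 0 = runner on first, bit 1 = second, bit 2 = third.
    Simplification: on a hit, every runner moves up exactly as many bases as the batter.
    """
    if outcome == "walk":
        if not bases & 0b001:
            return bases | 0b001, 0          # first base open: batter just takes it
        if not bases & 0b010:
            return bases | 0b011, 0          # runner forced from first to second
        if not bases & 0b100:
            return 0b111, 0                  # forced runners fill the bases
        return 0b111, 1                      # bases loaded: forced run scores
    n = HIT_BASES[outcome]
    new_bases, runs = 0, 0
    for base in range(3):                    # 0 = first, 1 = second, 2 = third
        if bases & (1 << base):
            if base + n >= 3:
                runs += 1                    # runner crosses home plate
            else:
                new_bases |= 1 << (base + n)
    if n == 4:
        runs += 1                            # home run: batter scores too
    else:
        new_bases |= 1 << (n - 1)            # batter stops on first, second or third
    return new_bases, runs


def expected_runs(iterations: int = 500) -> dict[tuple[int, int], float]:
    """Expected runs from each (outs, bases) state to the end of the half-inning.

    Value iteration: each pass looks one more plate appearance ahead; three
    outs is an absorbing state worth 0 further runs.
    """
    value = {(outs, bases): 0.0 for outs in range(3) for bases in range(8)}
    for _ in range(iterations):
        updated = {}
        for outs, bases in value:
            total = 0.0
            for outcome, p in OUTCOME_PROBS.items():
                if outcome == "out":
                    # Outs do not move runners in this simplified model.
                    total += p * (0.0 if outs == 2 else value[(outs + 1, bases)])
                else:
                    new_bases, runs = advance(bases, outcome)
                    total += p * (runs + value[(outs, new_bases)])
            updated[(outs, bases)] = total
        value = updated
    return value


if __name__ == "__main__":
    runs = expected_runs()
    print(f"bases empty, 0 outs: {runs[(0, 0)]:.3f} expected runs")
    print(f"bases loaded, 0 outs: {runs[(0, 0b111)]:.3f} expected runs")
```

    Only the current (outs, bases) state enters each update, which is exactly the "no memory of past states" property described above.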

    His model works only for games like baseball, where one-on-one events occur, such as one pitcher pitching to one batter. The model doesn't work for more team-intensive sports such as basketball, where, for instance, two teams of five players each must move up and down the court in unison to defend and shoot baskets, and ultimately win or lose the game.

    Bukiet says that in five of the last six years he has had more right than wrong predictions. His findings have been published in the paper "A Markov Chain Approach to Baseball," in the February 1997 issue of the journal Operations Research.

    He first started using his model as a way to show students that mathematics CAN BE FUN!

    Bruce Bukiet’s Web page is at: http://cams.njit.edu/~bukiet/.

    More information on Markov Chain appears at: http://mathworld.wolfram.com/MarkovChain.html.


    http://www.itwire.com.au/content/view/11130/1066/

    SBR Founder Join Date: 8/10/2005


  2. #2
    Ganchrow
    Join Date: 08-28-05
    Posts: 5,014
    SBR Points: 119

    I'd be interested to learn whether or not Prof. Bukiet's results outperform closing lines.

    SBR Founder Join Date: 8/28/2005


  3. #3


    Quote Originally Posted by Ganchrow View Post
    I'd be interested to learn whether or not Prof. Bukiet's results outperform closing lines.
    I'll lay you +900 that they don't.

  4. #4
    durito (SBR PRO)
    Join Date: 07-03-06
    Posts: 13,086

    Quote Originally Posted by Ganchrow View Post
    I'd be interested to learn whether or not Prof. Bukiet's results outperform closing lines.
    according to this page:

    http://www.egrandslam.com/pastresults.html

    It's been profitable in 5/6 seasons.

  5. #5
    Ganchrow

    Quote Originally Posted by durito View Post
    according to this page:

    http://www.egrandslam.com/pastresults.html

    It's been profitable in 5/6 seasons.
    Right, but are those results determined ex-post or ex-ante?

    In other words, was he actually making these picks year to year using his model, or did he formulate his model simply to maximize past-year results? The former represents predictive statistics and the latter descriptive statistics (aka "data mining").

    I don't pretend to know what he's done, but I would be interested in finding out.
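    The ex-post versus ex-ante distinction can be illustrated with a short simulation. This Python sketch is purely hypothetical: game results are fair coin flips, so no betting rule has a real edge, and each "system" is just a random sequence of picks. Selecting the system that did best on the first half of the data (in-sample) makes it look like a winner, while its record on the second half (out-of-sample) typically falls back toward 50%.

```python
from __future__ import annotations

import random


def best_system_rates(num_systems: int = 200, num_games: int = 400,
                      seed: int = 1) -> tuple[float, float]:
    """Return (in-sample, out-of-sample) win rate of the system chosen by data mining."""
    rng = random.Random(seed)
    total = 2 * num_games
    # Hypothetical results: each game is a fair coin flip, so no rule has an edge.
    results = [rng.random() < 0.5 for _ in range(total)]
    # Hypothetical "systems": each one is an independent random sequence of picks.
    systems = [[rng.random() < 0.5 for _ in range(total)] for _ in range(num_systems)]

    def win_rate(picks: list[bool], outcomes: list[bool]) -> float:
        return sum(p == o for p, o in zip(picks, outcomes)) / len(outcomes)

    # Data mining: keep whichever system looks best on the first half only.
    best = max(systems, key=lambda s: win_rate(s[:num_games], results[:num_games]))
    return (win_rate(best[:num_games], results[:num_games]),
            win_rate(best[num_games:], results[num_games:]))


if __name__ == "__main__":
    in_rate, out_rate = best_system_rates()
    print(f"in-sample win rate: {in_rate:.3f}, out-of-sample win rate: {out_rate:.3f}")
```

    The more candidate systems are tried, the more impressive the best in-sample record tends to look, even though every one of them is worthless by construction.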



  6. #6
    durito

    As far as I can tell from looking at his web page, he has made the picks year to year using the model. At least, that is what is implied.

  7. #7
    Ganchrow

    Quote Originally Posted by durito View Post
    As far as I can tell from looking at his web page, he has made the picks year to year using the model. At least, that is what is implied.
    I've seen it in quantitative finance and I've seen it in quantitative sports betting ... and that's what's always implied.

    You'd be surprised how many otherwise competent quantitative individuals simply don't understand the practical importance of segmenting a data set into in-sample and out-of-sample partitions. The fact that Prof. Bukiet is not proactive about making explicit his sampling methodology is what concerns me.

    The truth is, it's rather easy to come up with a model that describes the past ... predicting the future is another matter entirely.
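    One simple way to keep in-sample and out-of-sample partitions apart is a walk-forward split by season: fit on a window of past seasons, then evaluate only on the following season. The Python sketch below assumes a rolling three-season window, loosely mirroring the article's statement that 2004-2006 statistics feed the 2007 predictions; the window length and the `walk_forward` helper are hypothetical illustrations, not Bukiet's documented procedure.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def walk_forward(seasons: Iterable[int], window: int = 3) -> Iterator[tuple[list[int], int]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Every test season comes strictly after all of its training seasons, so a
    model fitted on the training window is never scored on data it was fitted to.
    """
    ordered = sorted(seasons)
    for i in range(window, len(ordered)):
        yield ordered[i - window:i], ordered[i]


if __name__ == "__main__":
    for train, test in walk_forward(range(2001, 2008)):
        print(f"fit on {train}, evaluate on {test}")
```

    Only results from the test seasons would then count as evidence of predictive power; a record computed over the training seasons is descriptive.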



  8. #8


    Most of these formulas are backfitted when they say that it's won 5 out of 6 years. If that's the case then it doesn't mean much.

  9. #9
    Ganchrow

    Quote Originally Posted by raiders72002 View Post
    Most of these formulas are backfitted when they say that it's won 5 out of 6 years. If that's the case then it doesn't mean much.
    Yeah, that's exactly it.

    I just want to be clear that I have no specific reason to suspect that Prof. Bukiet's model in particular is backfitted; it's just that, in my experience, this is what typically seems to be the sticking point.



  10. #10


    Many people can predict games without odds, as this guy might be able to do. But after reading his prediction of the Yanks winning 110 games, I'd say he is just a fraud.

    Lol, the Yanks are lucky to win 90.

    SBR Founder Join Date: 7/20/2005


  11. #11


    Agreed. If he's smart enough to have found the grail then you'd think he'd know he was smart enough to have found the grail and his New Jersey Institute of Technology days would be behind him.

    There's a long history of academics being attracted to the possibility of finding in sports betting an image of their own intelligence.
    1000pts

    TOP SPORTSBOOK
    WINNER
    5/4/2012

    1050pts

    TOP SPORTSBOOK
    WINNER
    5/10/2012

    1000pts

    TOP SPORTSBOOK
    WINNER
    05/05/2012



  12. #12

  13. #13

  14. #14


    Quote Originally Posted by raiders72002 View Post
    He'll tell you tomorrow.

  15. #15
    Ganchrow

    Quote Originally Posted by bookie View Post
    Agreed. If he's smart enough to have found the grail then you'd think he'd know he was smart enough to have found the grail and his New Jersey Institute of Technology days would be behind him.

    There's a long history of academics being attracted to the possibility of finding in sports betting an image of their own intelligence.
    I think you're being a little bit harsh. Many people enjoy the life of academia and have made the conscious decision to seek knowledge and academic fame in preference to monetary success. There's nothing inherently inconsistent between finding the Holy Grail and living a life of academic austerity.

    That said, I think where many academics fail (especially in the fields of economics and finance) is in trying to relate interesting theoretical constructs to real world practicalities. A Markov chain as it relates to baseball can make for really, really interesting cocktail party conversation (at least in certain circles) and might be exceptionally hard to pass over intellectually. However, the extent to which a given theory is jointly true out-of-sample and in excess of market efficiency is the real unknown and that which is all too frequently overlooked by academic economists and applied mathematicians (especially those either overly accustomed to dealing with descriptive statistics or too comfortable working with qualitative predictions that don't need to out-perform any market index).



  16. #16


    Actually, his plays have a bigger problem. When Kazmir pitched against Pavano in week 1, he had the Yankees at +100. He is betting vig-free into newspapers' opening lines, and when there is no line he assumes pick'em. He's a fraud.
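    The objection here is about prices rather than predictions. A short Python sketch of the arithmetic, assuming standard American odds: converting each side of a line to its break-even probability shows how much margin (vig) the book builds in. The -110/-110 line is a hypothetical example of a typical two-way price, not a line quoted in the thread.

```python
def implied_probability(american: int) -> float:
    """Break-even win probability for a bet at the given American odds."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)


def overround(side_a: int, side_b: int) -> float:
    """Bookmaker margin: how far the two break-even probabilities sum above 1."""
    return implied_probability(side_a) + implied_probability(side_b) - 1


if __name__ == "__main__":
    # A vig-free pick'em: both sides at +100, break-even 50%, no margin.
    print(implied_probability(100), overround(100, 100))
    # Hypothetical two-way line with vig: break-even rises to about 52.4%.
    print(round(implied_probability(-110), 4), round(overround(-110, -110), 4))
```

    A backtest that assumes +100 wherever no line was available grades every pick against a 50% hurdle instead of the roughly 52.4% that a vigged line like this would demand, which can make a losing record look profitable.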

  17. #17
    Ganchrow

    Quote Originally Posted by Wheell View Post
    Actually, his plays have a bigger problem. When Kazmir pitched against Pavano in week 1, he had the Yankees at +100. He is betting vig-free into newspapers' opening lines, and when there is no line he assumes pick'em. He's a fraud.
    Yeah ... that would be a rather ginormous difficulty. Good catch.

    Nevertheless, I'd be exceedingly hesitant to flat-out label him a "fraud". It's methodologies just like this, fully undertaken in good faith, that plague academic economic literature.
    Last edited by Ganchrow; 04-07-07 at 05:35 PM. Reason: adverb needed



  18. #18


    Upon further reflection, you are right, if only because he actually puts out his numbers and allows you to keep your own records. He is not a fraud, he's an academic.

  19. #19


    Quote Originally Posted by RickySteve View Post
    I'll lay you +900 that they don't.
    BINGO!!!
    His system predicts a -300 pitcher will win 65% of his starts. So what?

    SBR Founder Join Date: 9/4/2005


  20. #20


    Quote Originally Posted by Ganchrow View Post
    I think you're being a little bit harsh. Many people enjoy the life of academia and have made the conscious decision to seek knowledge and academic fame in preference to monetary success.
    Many? I guess when I meet my first black swan academic who has the evaluative tools, capital, and savvy to crush sports betting but chooses to publish and teach I'll have to revise my conclusions.

    There are a number of interesting books on this topic. Two that I have enjoyed are Fortune's Formula (Poundstone) and If You're So Smart Why Aren't You Rich (McCloskey).
    Last edited by bookie; 04-09-07 at 01:00 PM. Reason: To include links...


  21. #21


    Quote Originally Posted by Scorpion View Post
    BINGO!!!
    His system predicts a -300 pitcher will win 65% of his starts. So what?
    That would be a tremendous system. I'll take the +250 starter who wins 35%.
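    This exchange comes down to expected value. A small Python sketch, assuming flat one-unit stakes and taking the 65% figure at face value: at -300 the favourite must win 75% of the time just to break even, while the +250 side of the same game (the 35% outcome) needs only about 28.6%.

```python
def profit_if_win(american: int) -> float:
    """Profit per 1 unit staked when a bet at these American odds wins."""
    return american / 100 if american > 0 else 100 / -american


def expected_value(win_prob: float, american: int) -> float:
    """Expected profit per 1 unit staked."""
    return win_prob * profit_if_win(american) - (1 - win_prob)


if __name__ == "__main__":
    print(f"-300 favourite winning 65%: {expected_value(0.65, -300):+.3f} units per bet")
    print(f"+250 underdog winning 35%: {expected_value(0.35, 250):+.3f} units per bet")
```

    Under these assumptions the favourite loses about 0.133 units per bet while the underdog gains about 0.225, which is why the sarcastic reply prefers the +250 side.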

  22. #22


    Quote Originally Posted by bookie View Post
    Many? I guess when I meet my first black swan academic who has the evaluative tools, capital, and savvy to crush sports betting but chooses to publish and teach I'll have to revise my conclusions.

    There are a number of interesting books on this topic. Two that I have enjoyed are Fortune's Formula (Poundstone) and If You're So Smart Why Aren't You Rich (McCloskey).
    You're either joking or have a tragically narrow view of the world. Committed academics in many fields forego tremendous riches in the private sector. Those that are lured away by financial gain are often met with opportunities which dwarf any potential profit from exploiting inefficiencies in sports markets.

    Maybe you should read Fortune's Formula again, since it is the story of one such individual.

    You also should look up the definition of 'black swan'.

  23. #23


    Quote Originally Posted by RickySteve View Post
    You're either joking or have a tragically narrow view of the world. Committed academics in many fields forego tremendous riches in the private sector. Those that are lured away by financial gain are often met with opportunities which dwarf any potential profit from exploiting inefficiencies in sports markets.

    Maybe you should read Fortune's Formula again, since it is the story of one such individual.

    You also should look up the definition of 'black swan'.
    I think Poundstone tries to present the success of Claude Shannon as linked to his interest in the Kelly formula, but it turns out his stock market success was due to his being a buy-and-hold investor in technology stocks.

    Are you an academic? Sorry if my comments hit a nerve.


  24. #24


    Quote Originally Posted by bookie View Post
    I think Poundstone tries to present the success of Claude Shannon as linked to his interest in the Kelly formula, but it turns out his stock market success was due to his being a buy-and-hold investor in technology stocks.
    As a general rule you should have actually read something you reference, to avoid embarrassing situations where the source completely contradicts your argument.

    Quote Originally Posted by bookie View Post
    Are you an academic? Sorry if my comments hit a nerve.
    Nope. Just continuing my mission to enlighten idiots and expose phonies, one at a time.

  25. #25


    Quote Originally Posted by durito View Post
    according to this page:

    http://www.egrandslam.com/pastresults.html

    It's been profitable in 5/6 seasons.
    Either I can't read or their 2005 season was a complete disaster: -3.34 per game for the year.

    link:
    http://www.egrandslam.com/cgi-bin/hwd2P.cgi?year=2005

  26. #26


    Quote Originally Posted by RickySteve View Post
    As a general rule you should have actually read something you reference, to avoid embarrassing situations where the source completely contradicts your argument.
    What do you imagine my argument to have been?


  27. #27


    Quote Originally Posted by Ganchrow View Post
    However, the extent to which a given theory is jointly true out-of-sample and in excess of market efficiency is the real unknown and that which is all too frequently overlooked by academic economists and applied mathematicians (especially those either overly accustomed to dealing with descriptive statistics or too comfortable working with qualitative predictions that don't need to out-perform any market index).
    If testing for weak-form efficiency (betting on price data alone), or some other systematic betting rule such as Hausch, Ziemba and Rubinstein (1981)* where there are no parameters to maximise in the model, is out-of-sample testing necessary (I'm assuming closing odds are used)?

    Am I correct in thinking out-of-sample testing involves testing your model across a range of time periods, which may all be in the past, meaning one doesn't need to wait for 'new' results (so long as you don't use the whole dataset for calibration)? In the case of a 'static' systematic model like the one above, wouldn't splitting the data into smaller time periods simply reduce the statistical significance of the results from each group? Or is the whole point that any model, parameter-less or otherwise, should produce returns even when the data is split into a range of time periods (and if you find that there is not enough data in each subset for statistical significance, you need a larger dataset)?


    *For bookie: this is an example of academics publishing profitable material. They used the so-called 'Harville formulas' to find inconsistencies between the place and show betting odds in horse racing. If the identified bets had been placed, they claimed a return of 1.15 at the various racetracks tested. This 'system' was later published in a book for laymen as the "Dr. Z System". Studies have since indicated that this inefficiency has greatly diminished in the intervening years.
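    For readers unfamiliar with the Harville formulas: they estimate finishing-order probabilities from win probabilities alone, by assuming that once a horse has finished, the remaining horses keep their relative chances. A Python sketch, assuming the win probabilities have already been derived from the win pool; that step, and the comparison against place and show pool payouts that such a system would rely on, are omitted. The four-horse race is hypothetical.

```python
from __future__ import annotations

from itertools import permutations


def finish_order_prob(win_probs: list[float], order: tuple[int, ...]) -> float:
    """Harville probability that the horses in `order` finish 1st, 2nd, ... in that order."""
    prob, remaining = 1.0, 1.0
    for horse in order:
        prob *= win_probs[horse] / remaining   # renormalise over horses not yet placed
        remaining -= win_probs[horse]
    return prob


def place_and_show(win_probs: list[float]) -> tuple[list[float], list[float]]:
    """Probability each horse finishes in the top two (place) and top three (show)."""
    n = len(win_probs)
    place, show = [0.0] * n, [0.0] * n
    for order in permutations(range(n), 2):
        q = finish_order_prob(win_probs, order)
        for horse in order:
            place[horse] += q
    for order in permutations(range(n), 3):
        q = finish_order_prob(win_probs, order)
        for horse in order:
            show[horse] += q
    return place, show


if __name__ == "__main__":
    # Hypothetical win probabilities for a four-horse race.
    place, show = place_and_show([0.4, 0.3, 0.2, 0.1])
    print([round(p, 3) for p in place], [round(s, 3) for s in show])
```

    Because the place and show estimates are a fixed function of the win odds, there are no fitted parameters, which is the sense in which the footnote calls the approach a parameter-less system.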

  28. #28
    Ganchrow

    Quote Originally Posted by ugard View Post
    If testing for weak-form efficiency (betting on price data alone), or some other systematic betting rule such as Hausch, Ziemba and Rubinstein (1981)* where there are no parameters to maximise in the model, is out-of-sample testing necessary (I'm assuming closing odds are used)?
    I don't quite understand your question, nor am I familiar with that particular paper. But if you want to make predictions of future events and formulate your model based on an historical sample, then an out-of-sample dataset upon which to test your predictions is essential.

    Quote Originally Posted by ugard View Post
    Am I correct in thinking out-of-sample testing involves testing your model across a range of time periods, which may all be in the past, meaning one doesn't need to wait for 'new' results (so long as you don't actually use the whole dataset for calibration)?
    Yes.

    Quote Originally Posted by ugard View Post
    In the case of a 'static' systematic model like the one above, wouldn't splitting the data into smaller time periods simply reduce the statistical significance of the results from each group?
    Yes.

    Quote Originally Posted by ugard View Post
    Or is the whole point that any model, parameter-less or otherwise, should produce returns even when splitting the data into a range of time periods (and if you find that there is not enough data in each subset for statistical significance, you need a larger dataset)?
    The point is you don't want to be testing your model on the same data set you used to formulate it.
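    On the splitting trade-off: if per-bet returns have a fixed mean edge and standard deviation, a subset's z-statistic grows with the square root of its size, so cutting one sample into k equal pieces divides each piece's z-score by roughly √k. A Python sketch with hypothetical numbers (a 2% edge per unit staked, a per-bet standard deviation of 1 unit, and 10,000 bets in total):

```python
import math


def expected_z(edge: float, sd: float, n_bets: int) -> float:
    """Approximate z-statistic for mean profit per bet: edge / (sd / sqrt(n))."""
    return edge * math.sqrt(n_bets) / sd


if __name__ == "__main__":
    total_bets = 10_000
    for k in (1, 4, 16):
        z = expected_z(0.02, 1.0, total_bets // k)
        print(f"{k:>2} subsets of {total_bets // k:>5} bets: z is about {z:.2f} each")
```

    Under these assumptions a result two standard errors from zero on the full sample is only about one standard error in each quarter, which is why a larger dataset is needed if every subset must stand on its own.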



  29. #29


    Quote Originally Posted by Ganchrow View Post
    I don't quite understand your question, nor am I familiar with that particular paper.
    It is available as a .doc here, or the .pdf (if you have JSTOR access) is here.

    When I say 'weak form' efficiency I'm using the definition popularised by Fama in the early '70s as part of the efficient-market hypothesis (EMH). To the best of my understanding, this means building a 'model' (although I feel this definition is where I am not explaining myself well) from price data alone.

    For example, the simplest test (which has been performed repeatedly) in sports betting is to group outcomes in the dataset by price level and test whether betting at a particular price level would have produced a profit. I can't see how this sort of efficiency test (or any other based on some sort of parameter-less 'model', such as the HZR system, famous in horse racing circles at least) would require out-of-sample testing.

    Quote Originally Posted by Ganchrow View Post
    But if you want to make predictions of future events and formulate this model based on an historical sample...
    I think this points at the discrepancy between the systematic rule based 'model' I was getting at, and the probability prediction model you mean.

    I'm not trying to claim anything you have written is wrong. Regardless, I'm sure you would point out that your definition of 'model' did not cover this (and I think I would agree that it is a tenuous use of the word).

    I'm just trying to add that there are (conceivably) profitable 'models' (or maybe a better term would be 'systems') that don't require out of sample testing.
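    The price-level test described above can be sketched in a few lines of Python. The input format (decimal odds paired with a win/loss flag), the band cut points, and the sample bets are all hypothetical; the function reports flat-stake return on investment per band.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def roi_by_price_band(bets: Iterable[tuple[float, bool]],
                      cut_points: list[float]) -> dict[int, float]:
    """Flat-stake ROI per price band.

    bets: (decimal_odds, won) pairs, each a 1-unit stake.
    cut_points: ascending decimal-odds boundaries; band i holds odds below
    cut_points[i] (and at or above the previous cut), the last band the rest.
    """
    totals: dict[int, list[float]] = defaultdict(lambda: [0.0, 0.0])  # band -> [stakes, profit]
    for odds, won in bets:
        band = next((i for i, cut in enumerate(cut_points) if odds < cut), len(cut_points))
        totals[band][0] += 1.0
        totals[band][1] += (odds - 1.0) if won else -1.0
    return {band: profit / stakes for band, (stakes, profit) in sorted(totals.items())}


if __name__ == "__main__":
    sample = [(1.5, True), (1.5, False), (3.0, True), (3.0, False), (3.0, False)]
    print(roi_by_price_band(sample, cut_points=[2.0]))  # prints {0: -0.25, 1: 0.0}
```

    Even this "parameter-less" test involves choices, such as where to put the cut points; picking them after looking at the results is itself a form of fitting to the data.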

  30. #30
    Ganchrow

    Quote Originally Posted by ugard View Post
    It is available as a .doc here, or the .pdf (if you have JSTOR access) is here.

    When I say 'weak form' efficiency I'm using the definition popularised by Fama in the early '70s as part of the efficient-market hypothesis (EMH). To the best of my understanding, this means building a 'model' (although I feel this definition is where I am not explaining myself well) from price data alone.

    For example, the simplest test (which has been performed repeatedly) in sports betting is to group outcomes in the dataset by price level and test whether betting at a particular price level would have produced a profit. I can't see how this sort of efficiency test (or any other based on some sort of parameter-less 'model', such as the HZR system, famous in horse racing circles at least) would require out-of-sample testing.



    I think this points at the discrepancy between the systematic rule based 'model' I was getting at, and the probability prediction model you mean.

    I'm not trying to claim anything you have written is wrong. Regardless, I'm sure you would point out that your definition of 'model' did not cover this (and I think I would agree that it is a tenuous use of the word).

    I'm just trying to add that there are (conceivably) profitable 'models' (or maybe a better term would be 'systems') that don't require out of sample testing.
    Nothing you've described would in any way obviate the need for proper out-of-sample hypothesis testing.

    This is just the point I've been making throughout this post. Too frequently, otherwise intelligent and quantitative people overlook proper testing methodology and then after losing their proverbial shirts, wonder why their models (which passed every statistical test imaginable in-sample) are so poor at predicting the future.



  31. #31


    Quote Originally Posted by Ganchrow View Post
    Nothing you've described would in any way obviate the need for proper out-of-sample hypothesis testing.
    I've thought about this a little more. I thought that because the simple model I was describing required no training (or other jiggery-pokery with variables, trying to modify it to fit the data at hand), testing it on two sets of data (neither having been used in the model's formulation) would be no better than lumping all the data together and getting (hopefully) one highly significant result. I see now that the central point is not whether a subset has been used to tune parameters, but simply that one tests on multiple subsets.

    The question now is, how many subsets, and what criterion does one use to decide between the favourability of, for example, 10 subsets all with positive returns at the 1% level and 50 subsets all with returns at the 5% level. Time for a little more reading.
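    There is no single answer to that question, but one standard way to summarise several independent p-values is Fisher's method, which is one option among several. This Python sketch assumes the subsets are independent and, for illustration only, that each subset's p-value sits exactly at the stated level. It uses the closed-form chi-square tail for even degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_combined_p(p_values: list) -> float:
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the joint null. For even degrees of freedom the
    upper tail is exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i                         # (X/2)**i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    print(f"10 subsets at p = 0.01: combined p = {fisher_combined_p([0.01] * 10):.3g}")
    print(f"50 subsets at p = 0.05: combined p = {fisher_combined_p([0.05] * 50):.3g}")
```

    Both scenarios give overwhelming combined evidence under these idealised assumptions, so in practice the comparison tends to hinge on whether the subsets really are independent and were never used to build the model.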

    Quote Originally Posted by Ganchrow View Post
    This is just the point I've been making throughout this post.
    Thank you for hammering it home, now it's finally got there I feel rather enlightened.

    The more I think about it, the more I realise how valid your criticism that this "plagues" the literature is. I have read, or skimmed, a substantial number of the papers testing horse racing for weak-form efficiency, and never have I seen a study where part of the data was reserved for out-of-sample testing (I assume that, as there have been so many studies testing the same thing, all the studies taken together count as a sort of informal, unfinished out-of-sample test).

    Come to think of it, in many of the papers I read in other areas of economics, the abstract ends with "...and we find our model predicts the observed data.", but I don't see any evidence of out-of-sample testing. Also, the academic I have had close dealings with on this subject (whilst very good with growth models!) knows very little about even basic significance testing.
