1. ## Impossible MLB stat research?

I have a ridiculous question.

Does anyone know what the average runs scored per inning is (AL and NL), based on how deep in the lineup the lead batter is? E.g., in the first inning, the first batter will always be #1 in the lineup. You would expect innings starting with positions 6-7 to be the lowest.

2. i think something like two hits will equal a run on average, so you could figure out the probability that those players will get a hit in that half inning.

of course if they have a higher OBP then the number of hits needed would be lower and i guess you would need to figure out some kind of clustering of hits.

4. If I get a chance later today, I will figure this out for you from Retrosheet data.

5. Here it is for 2011:

Sum of Runs per Inning LEAGUE
LINEUPPOS A N
1 0.528 0.520
2 0.569 0.534
3 0.498 0.482
4 0.474 0.434
5 0.495 0.457
6 0.441 0.416
7 0.464 0.348
8 0.470 0.408
9 0.516 0.470

This is all innings, all games, so there are some obvious biases due to:
--9th or later innings that abruptly end due to walkoff events
--interleague games
--other things I haven't thought of

(plus there's undoubtedly lots of variation among teams depending on their specific hitters, etc.)
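For anyone who wants to reproduce this kind of table, here is a rough sketch of the aggregation. It is not the query used above, and the record layout (game id, inning, batting side, lineup position, runs on the play) is a made-up simplification, not the actual Retrosheet schema. The idea is just: the first batter seen in each half inning sets the leadoff position, and runs are summed over that half inning.

```python
from collections import defaultdict

# Hypothetical plate-appearance records, NOT the real Retrosheet layout:
# (game_id, inning, batting_side, lineup_pos, runs_scored_on_play)
# Assumed to be in chronological order within each game.
plate_appearances = [
    ("G1", 1, "home", 1, 0),
    ("G1", 1, "home", 2, 0),
    ("G1", 1, "home", 3, 1),
    ("G1", 1, "home", 4, 0),
    ("G1", 1, "home", 5, 0),
    ("G1", 2, "home", 6, 0),
    ("G1", 2, "home", 7, 0),
    ("G1", 2, "home", 8, 0),
    ("G2", 1, "away", 1, 0),
    ("G2", 1, "away", 2, 0),
    ("G2", 1, "away", 3, 0),
]


def runs_by_leadoff_position(records):
    """Return {leadoff_pos: (innings, average_runs)} over all half innings."""
    lead_pos = {}             # half inning -> lineup spot of its first batter
    runs = defaultdict(int)   # half inning -> total runs scored in it
    for game, inning, side, pos, scored in records:
        key = (game, inning, side)
        lead_pos.setdefault(key, pos)  # only the first batter sets the value
        runs[key] += scored

    totals = defaultdict(lambda: [0, 0])  # leadoff_pos -> [innings, runs]
    for key, pos in lead_pos.items():
        totals[pos][0] += 1
        totals[pos][1] += runs[key]

    return {pos: (n, total / n) for pos, (n, total) in sorted(totals.items())}


result = runs_by_leadoff_position(plate_appearances)
```

Splitting by league would just mean adding a league field to the key used for `totals`.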
Nice
The more interesting question is what Justin is trying to do with the data presented, given the points bztips made above

7. He still needs the league average position in the lineup of the number one hitter per inning.

Depends what you're estimating. Trying to estimate total runs this way is very dubious, and I doubt he's doing that.

9. ....perhaps to do with live betting?

10. I agree with Mathy, can't figure out why this would be of use. But here's the counts anyway:

Sum of INNINGS LEAGUE
LINEUPPOS A N
1 4115 5076
2 1801 1982
3 1731 2027
4 2446 2764
5 2235 2427
6 1953 2368
7 2077 2286
8 2004 2286
9 1983 2120
Grand Total 20345 23336

For some reason it didn't format properly this time, but it's the same as before: American League first, then National. I never could figure out how to reliably import an Excel table :-(
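If you want to put the two tables together, a quick sketch like the one below weights each leadoff position's runs per inning by how often an inning actually starts there. The numbers are copied from the 2011 tables in this thread; the function name and structure are just for illustration.

```python
# Runs per inning by leadoff lineup position (1-9), from the 2011 table above.
runs_per_inning = {
    "AL": [0.528, 0.569, 0.498, 0.474, 0.495, 0.441, 0.464, 0.470, 0.516],
    "NL": [0.520, 0.534, 0.482, 0.434, 0.457, 0.416, 0.348, 0.408, 0.470],
}

# Number of innings that started at each lineup position (1-9), same season.
innings = {
    "AL": [4115, 1801, 1731, 2446, 2235, 1953, 2077, 2004, 1983],
    "NL": [5076, 1982, 2027, 2764, 2427, 2368, 2286, 2286, 2120],
}


def league_summary(league):
    """Return (share of innings per leadoff spot, innings-weighted runs/inning)."""
    counts = innings[league]
    rates = runs_per_inning[league]
    total = sum(counts)
    shares = [c / total for c in counts]
    overall = sum(r * c for r, c in zip(rates, counts)) / total
    return shares, overall


for league in ("AL", "NL"):
    shares, overall = league_summary(league)
    print(league, [round(s, 3) for s in shares], round(overall, 3))
```

The weighted figure is only a league-wide average, so it inherits all the biases listed earlier (walkoffs, interleague games, and so on).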

It was friendly of me to call this dubious. Even J7 would not do this, which is a strong statement given he bets menstrual cycles

Reads to me like he wants to run the average runs per inning against the league average of what number batter in the order starts the inning. Just the league average runs per inning would hardly qualify as a 'ridiculous question'.

Once you had such a league average you could use it in a per-inning projection for two specific teams, at the start of each inning.

14. You could but that would be a terrible projection to do before the game has started, and why would you need the average "leadoff position" data if you're betting at the start of an arbitrary inning? You would observe that, in which case all you need is the data that J7 asked for.

Anyways at the end of the day it is just an average so it provides a guide but if you're going to bet a team to win the x'th inning based on it, you are in for some pain.

I don't know what Justin needs this for. Perhaps a program to project the next inning live. The average leadoff position per inning would give you the hitters starting the inning. Assuming the question is about one piece of a larger puzzle, I don't see how you could discard it without deeper knowledge.

What in the world are you talking about?

How would the "average leadoff position" give you the hitters starting the inning when live betting? Pretty sure the LINEUP would do that.

And if it's "one piece of a larger puzzle," why would you assume that, "He still needs the league average position in the lineup of the number one hitter per inning?"

The only thing that I can think of is that it's some poor implementation of a Bayesian prior/log5 calculation.

17. You would think he would have his own database by now.

18. i gave the best solution to live betting in this thread. that is what i thought he wanted it for. specific people batting in game.

19. A lot of rocket science going on in this thread.

20. i wouldn't know about rocket science. i don't bet on menstrual cycles.

21. League averages are incredibly useless in this situation. It's very team-specific.

22. yea I agree, I am thinking very team specific, and also depending on what you are using it for you might want to consider using a different measure of central tendency than the mean.

I say this because it seems inherent that the more runs a team scores in a given inning, the less meaningful this data point is, i.e., once they score a run or two they are out of that section of the batting order that you are trying to measure.

So it might be more meaningful to ask how often does a team score at all when their inning starts at each spot in the order. Think about it, a 6 run inning could skew the mean (again I don't know what it is being used for so maybe not) so maybe it would be better to use mode than mean.

23. or you could make your own metric by capping the high end, for example you could compute all innings with more than 3 runs as 3 runs -- that would keep a few big innings from overshadowing the many zero-run innings

and in a way you are not losing anything because once three runs are scored, I would think pretty much all bets are off with regard to the explanatory power of batting order for that inning
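To make the alternatives from the last two posts concrete, here is a small sketch comparing the plain mean, the mean with big innings capped at 3 runs, the share of innings with any scoring, and the mode. The list of inning outcomes is invented purely for illustration and is not taken from the Retrosheet numbers above.

```python
from statistics import mean, mode

# Made-up runs scored in 15 innings that all started at the same lineup spot.
sample_innings = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 6, 0, 1, 0, 0]


def summarize(runs, cap=3):
    """Compare a few ways of summarizing runs per inning."""
    return {
        "mean": mean(runs),
        # Count any inning above the cap as exactly `cap` runs.
        "capped_mean": mean([min(r, cap) for r in runs]),
        # How often the team scored at all.
        "scoring_rate": sum(1 for r in runs if r > 0) / len(runs),
        "mode": mode(runs),
    }


summary = summarize(sample_innings)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In a sample like this one the single 6-run inning pulls the plain mean well above the capped mean, while the mode is just 0 because most innings are scoreless. That is why the scoring rate or a capped mean may say more about the lead batter's spot than the mode does.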