|
07-22-2008, 01:41 AM
|
#1 (permalink)
|
|
SBR Rookie
Join Date: 03-28-08
Location: Las Vegas, NV
Posts: 40
|
data/programming/updating model question
Hi all,
I have been working on modeling MLB for most of a year and have a (primarily econometric) model that I can confidently say should be a long-term winner. Problem is, I'm not much of a programmer. I did take an Intro Programming course (in Java) in my last semester of school (last spring) and I can program pretty well in Stata (the statistical program I use), but the real problem is updating the model every day with new data. I've written code to generate a prediction (with the two teams and pitchers as inputs), but I'll need to add new game data every day.
My question is this: Have any of you faced this issue? What is your solution? Do I pretty much need to learn to write a web crawler in Perl to scrape data off the web?
Any help/advice would be greatly appreciated.
Last edited by Rufus : 07-22-2008 at 03:51 AM.
Reason: specification
|
|
|
|
07-22-2008, 02:10 AM
|
#2 (permalink)
|
|
SBR Wise Guy
Join Date: 06-03-08
Posts: 743
|
Yes, scrape the data. I use Tcl to do it-- a lot easier than Perl. Python or Ruby would be other choices. Perl is popular for this kind of thing too but probably more difficult for a novice programmer.
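For anyone weighing the options above, here is a minimal sketch of the parsing half of a scraper in Python, using only the standard library. The sample HTML, the team codes and the column names are made up for illustration. A real page would have to be downloaded first (for example with urllib.request.urlopen) and will have its own layout.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for a downloaded stats page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Team</th><th>Runs</th></tr>
  <tr><td>2008-07-21</td><td>AAA</td><td>5</td></tr>
  <tr><td>2008-07-21</td><td>BBB</td><td>3</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The cells come back as strings, so numeric columns still need converting before they go into a model.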
|
|
|
|
07-22-2008, 04:33 AM
|
#3 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,746
|
I'm personally partial to Perl, in which I probably do close to 90% of my programming. If you have experience with Java you should have absolutely no problem with Perl.
You might also want to look into hiring programming help off of rentacoder.com or a similar site.
|
|
|
|
07-22-2008, 12:02 PM
|
#4 (permalink)
|
|
SBR Rookie
Join Date: 03-28-08
Location: Las Vegas, NV
Posts: 40
|
Any good book you would recommend to learn Perl?
|
|
|
|
07-22-2008, 12:05 PM
|
#5 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,746
|
Quote:
Originally Posted by modelman
Any good book you would recommend to learn Perl?
|
The O'Reilly Learning Perl and Programming Perl books are very user-friendly.
|
|
|
|
07-22-2008, 12:06 PM
|
#6 (permalink)
|
|
SBR Hall of Famer
Join Date: 07-03-06
Location: La Selva Lacandona
Posts: 5,005
|
I have a programmer I hired through rentacoder.com.
He's pretty cheap, but the work isn't quite what I want. If I can ever get my brain to work again, I'm going to try to learn again myself.
|
|
|
|
07-22-2008, 12:25 PM
|
#7 (permalink)
|
|
SBR Hall of Famer
Join Date: 07-03-06
Location: La Selva Lacandona
Posts: 5,005
|
Quote:
Originally Posted by Ganchrow
|
Ordered. Finding out that Amazon can deliver to Colombia has not been good for my spending habits.
|
|
|
|
07-22-2008, 01:09 PM
|
#8 (permalink)
|
|
SBR Wise Guy
Join Date: 11-27-07
Location: U.S.S. Enterprise NCC-1701-E
Posts: 986
|
Quote:
Originally Posted by durito
Ordered. Finding out that Amazon can deliver to Colombia has not been good for my spending habits.
|
This is a cheaper way:
http://proquest.safaribooksonline.com/
You read the books online (or save them on your computer as PDFs). A great time-saving benefit: you get all the sample code in downloadable files.
|
|
|
|
07-22-2008, 01:18 PM
|
#9 (permalink)
|
|
SBR MVP
Join Date: 01-10-06
Location: Kakapoopoopeepeeshire
Posts: 1,269
|
If you have the time, I'd definitely recommend learning enough to write your own scrapers.
Occasionally the site you're scraping from will make a slight change to their format, or some aspect of a report will be different enough from the norm to throw off your scraper, and it's sure nice to be able to make changes on the fly instead of waiting for your programmer.
As a side note, I scrape most of my data from MLB.com and they have remained blessedly consistent for a couple of years.
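One way to notice a format change early is to check the scraped rows against the layout you expect before using them. The sketch below assumes the rows arrive as lists of strings with a header row first. The column names and the helper name rows_to_records are hypothetical and not taken from any particular site.

```python
EXPECTED_HEADER = ["Date", "Team", "Runs"]  # hypothetical column layout


def rows_to_records(rows, expected_header=EXPECTED_HEADER):
    """Turn scraped rows into dicts, failing loudly if the layout changed."""
    if not rows:
        raise ValueError("no rows found; the page layout may have changed")
    header, body = rows[0], rows[1:]
    if header != expected_header:
        raise ValueError(
            f"unexpected header {header!r}, expected {expected_header!r}"
        )
    records = []
    # Row numbers start at 2 because row 1 is the header.
    for row_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"row {row_no} has {len(row)} cells, expected {len(header)}"
            )
        records.append(dict(zip(header, row)))
    return records


good = [["Date", "Team", "Runs"], ["2008-07-21", "AAA", "5"]]
print(rows_to_records(good))

# A renamed column is reported instead of silently producing bad data.
changed = [["Date", "Club", "Runs"], ["2008-07-21", "AAA", "5"]]
try:
    rows_to_records(changed)
except ValueError as err:
    print("scraper needs attention:", err)
```

Failing with an error is a deliberate choice here: a scraper that stops is easier to fix than one that quietly feeds shifted columns into a model.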
__________________
I'm completely in favor of the separation of Church and State. My idea is that these two institutions screw us up enough on their own, so both of them together is certain death. --George Carlin
|
|
|
|
07-22-2008, 01:52 PM
|
#10 (permalink)
|
|
SBR Rookie
Join Date: 03-28-08
Location: Las Vegas, NV
Posts: 40
|
Thanks everyone! I really appreciate the help.
|
|
|
|
07-22-2008, 01:55 PM
|
#11 (permalink)
|
|
Moderator
Join Date: 07-31-06
Posts: 2,341
|
I paid a programmer to write a scraper in Perl. It would automatically download stats from USAToday every day.
|
|
|
|
07-22-2008, 02:56 PM
|
#12 (permalink)
|
|
SBR Rookie
Join Date: 03-28-08
Location: Las Vegas, NV
Posts: 40
|
Quote:
Originally Posted by Justin7
I paid a programmer to write a scraper in Perl. It would automatically download stats from USAToday every day.
|
How much would that sort of thing cost?
|
|
|
|
07-22-2008, 04:38 PM
|
#13 (permalink)
|
|
SBR Hustler
Join Date: 02-23-08
Posts: 70
|
I use PHP. It works pretty well and I've never had a problem.
I use the Windows scheduler to run it once a day at 7am and load the results into a MySQL db.
Also, for MLB you can just use dougstats. He updates once a day, though I notice he's missing a couple of players (like E. Gonzalez from the Padres).
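A rough sketch of the "load it into a database once a day" step, written in Python rather than PHP. It uses SQLite through the standard sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name, column names and sample rows are hypothetical. Keying on date and team is a simplification, since doubleheaders would need a game number in the key as well.

```python
import sqlite3


def save_games(conn, games):
    """Insert (game_date, team, runs) tuples; re-running does not duplicate."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS games (
               game_date TEXT NOT NULL,
               team      TEXT NOT NULL,
               runs      INTEGER NOT NULL,
               PRIMARY KEY (game_date, team)
           )"""
    )
    # "with conn" commits on success and rolls back if an insert fails.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO games (game_date, team, runs) "
            "VALUES (?, ?, ?)",
            games,
        )


conn = sqlite3.connect(":memory:")  # a real job would use a file path
todays_games = [("2008-07-21", "AAA", 5), ("2008-07-21", "BBB", 3)]

save_games(conn, todays_games)
save_games(conn, todays_games)  # simulate the scheduled job running twice

count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
print(count)
```

The primary key plus INSERT OR REPLACE is what makes the job safe to re-run, which matters when a scheduler fires it again after a failure. That statement is SQLite syntax, so a MySQL version would need its own equivalent.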
|
|
|
|
07-22-2008, 06:02 PM
|
#14 (permalink)
|
|
SBR Rookie
Join Date: 03-28-08
Location: Las Vegas, NV
Posts: 40
|
I just looked at dougstats. It seems pretty good, except I need the game-by-game stats since I don't use uniform weights. I normally get it from baseball-reference (I have a subscription so I can use the Play Index) but it's a pain in the ass to copy and paste it all.
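To illustrate why non-uniform weights need game-by-game data rather than season totals, here is a small sketch of a recency-weighted average. The exponential decay scheme and the decay value are assumptions chosen for illustration, not the weighting actually used in the model described in this thread.

```python
def weighted_average(values, decay=0.9):
    """Average per-game values, ordered oldest to newest.

    The newest game gets weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    """
    if not values:
        raise ValueError("need at least one game")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up runs scored over four games, oldest first.
runs = [2, 4, 3, 7]
print(weighted_average(runs, decay=0.9))  # leans toward recent games
print(weighted_average(runs, decay=1.0))  # decay of 1 is the plain mean
```

A season total only supports the second call, which is why aggregated stats are not enough for this kind of weighting.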
|
|
|
|
07-22-2008, 06:52 PM
|
#15 (permalink)
|
|
SBR Rookie
Join Date: 03-28-08
Location: Las Vegas, NV
Posts: 40
|
What database/statistical software do other people use? Being an econ major in college, I learned Stata, which works well for me once I get data into it. I can do all the regressions, statistical analysis, and data management. Anybody else have other database preferences?
|
|
|
|
07-22-2008, 07:02 PM
|
#16 (permalink)
|
|
SBR MVP
Join Date: 01-10-06
Location: Kakapoopoopeepeeshire
Posts: 1,269
|
Quote:
Originally Posted by modelman
What database/statistical software do other people use? Being an econ major in college, I learned Stata, which works well for me once I get data into it. I can do all the regressions, statistical analysis, and data management. Anybody else have other database preferences?
|
I find that the statistical functions in Excel 2007 meet most of my needs. I've dabbled in a couple of other programs for regression analysis, but not lately.
MySQL for database needs.
|
|
|
|
07-22-2008, 08:12 PM
|
#17 (permalink)
|
|
Moderator
Join Date: 08-28-05
Location: Forest Hills, NY, Home of the Blitzkrieg Bop
Posts: 4,746
|
| |