  1. #1

    Database suggestions?

    Planning on web scraping NCAA basketball data, but I'm unsure which database solutions I should be looking into. Would like multiple filter/sorting options for teams, conferences, location, etc.

    What should I be looking into? Any books I should be reading (I already have a couple on data warehousing and statistical analysis, but any you guys recommend would be awesome)?

    Also, how happy are you with your automated web scraping solution? Any issues you've run into or tips you feel like sharing?

    I definitely appreciate whatever knowledge you guys can share on databases or issues you ran into when setting up your own.

    Thanks a lot.

  2. #2

    I use Excel for working with the data and Access for viewing it.

    I don't know how to scrape, so I just do copy and paste with the data I collect.
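
    For anyone in the same copy-and-paste boat, here is a minimal sketch of pulling rows out of an HTML table using only Python's standard-library html.parser. The HTML snippet, team names, and the TableRows class are made up for illustration; a real page would also need a fetch step and will probably have messier markup.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample page content (placeholder names and numbers).
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Conference</th><th>Points</th></tr>
  <tr><td>Team A</td><td>Conf X</td><td>70</td></tr>
  <tr><td>Team B</td><td>Conf X</td><td>65</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows
```

    After this runs, header holds the column names and body holds one list per data row, which can go straight into a spreadsheet or a database insert.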

  3. #3

  4. #4

    A SQL database is pretty much a slam dunk. The open-source ones are free, powerful, and flexible.
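
    To give a feel for the filter/sort side of it, here is a rough sketch using SQLite through Python's built-in sqlite3 module. The table layout, column names, and sample rows are all assumptions made up for the example, not anyone's actual schema.

```python
import sqlite3

# In-memory database for the demo; pass a file path instead to persist it.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: table and column names are illustrative only.
conn.executescript("""
CREATE TABLE teams (
    team_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    conference TEXT NOT NULL,
    state      TEXT NOT NULL
);
CREATE TABLE games (
    game_id    INTEGER PRIMARY KEY,
    game_date  TEXT NOT NULL,  -- ISO 8601 date string
    home_id    INTEGER NOT NULL REFERENCES teams(team_id),
    away_id    INTEGER NOT NULL REFERENCES teams(team_id),
    home_score INTEGER NOT NULL,
    away_score INTEGER NOT NULL
);
CREATE INDEX idx_teams_conference ON teams(conference);
""")

# Made-up placeholder rows, not real results.
conn.executemany(
    "INSERT INTO teams VALUES (?, ?, ?, ?)",
    [(1, "Team A", "Conf X", "NC"),
     (2, "Team B", "Conf X", "VA"),
     (3, "Team C", "Conf Y", "CA")],
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "2024-01-05", 1, 2, 70, 65),
     (2, "2024-01-09", 2, 3, 80, 60),
     (3, "2024-01-12", 3, 1, 55, 75)],
)


def home_margin_by_conference(conn, conference):
    """Average home scoring margin per team in one conference, best first."""
    return conn.execute(
        """
        SELECT t.name, AVG(g.home_score - g.away_score) AS avg_margin
        FROM games AS g
        JOIN teams AS t ON t.team_id = g.home_id
        WHERE t.conference = ?
        GROUP BY t.team_id
        ORDER BY avg_margin DESC
        """,
        (conference,),
    ).fetchall()


rows = home_margin_by_conference(conn, "Conf X")
```

    Filtering by location or date range is the same idea with a different WHERE clause, and the same SQL should carry over to MySQL or another server with minor changes.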

  5. #5

    I use MySQL.

  6. #6

    I want to get an idea about the size of the data. Is it in the MBs, GBs, or TBs? If you can scrape, store, query, and analyze as much data as you want, how much better would your models be? Thanks for the info.

  7. #7

    Originally posted by xyz:
    "I want to get an idea about the size of the data. Is it in the MBs, GBs, or TBs? If you can scrape, store, query, and analyze as much data as you want, how much better would your models be? Thanks for the info."

    MBs, and I specialize in the most data-intensive sport using data down to the individual pitch level, so I don't know how anyone's going to get into the TBs!

    In theory, the perfect model would use all of the pertinent data available. More tends to be better, but more can also lead to overfitting, too much complexity, excessive computing time, headaches, nausea, etc.
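
    On the size question, a back-of-envelope calculation shows why the answer tends to land in the MBs. Every input below is an assumed placeholder picked for illustration, not a measured figure, so plug in your own counts.

```python
def estimate_bytes(games_per_season, rows_per_game, bytes_per_row, seasons):
    """Rough raw storage estimate: total rows times average row size."""
    return games_per_season * rows_per_game * bytes_per_row * seasons


# All four inputs are hypothetical assumptions, not real counts.
total_bytes = estimate_bytes(
    games_per_season=6_000,  # assumed number of games stored per season
    rows_per_game=30,        # assumed stat lines (rows) kept per game
    bytes_per_row=200,       # assumed average row size in bytes
    seasons=20,              # assumed number of seasons kept
)
total_mb = total_bytes / 1_000_000  # decimal megabytes
```

    With these made-up inputs total_mb comes out to 720.0, so even a fairly generous set of assumptions stays well short of a terabyte.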

  8. #8
