1. #1
    pringles
    Join Date: 11-26-12
    Posts: 41
    Betpoints: 186

    Scraping SBObet, worth a shot?

    SBObet only offers its XML feed to bigger affiliates.
    As I am a simple bettor but still need the numbers, I've talked to a skilled programmer about writing me a scraper.

    The question is whether I will get blocked and, if not, how many minutes I should wait between scrapes of the lines.

    I appreciate your answers/suggestions.

    //

    I would like to know if scraping the main lines (AH, totals) on all the soccer leagues is too much data (excuse me, I don't know much about programming, but do you still have to scrape the whole page in order to do that?) ... will I get blocked if I do this, let's say, once every few minutes?



  2. #2
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by pringles View Post
    SBObet only offers its XML feed to bigger affiliates.
    As I am a simple bettor but still need the numbers, I've talked to a skilled programmer about writing me a scraper.

    The question is whether I will get blocked and, if not, how many minutes I should wait between scrapes of the lines.

    I appreciate your answers/suggestions.

    //

    I would like to know if scraping the main lines (AH, totals) on all the soccer leagues is too much data (excuse me, I don't know much about programming, but do you still have to scrape the whole page in order to do that?) ... will I get blocked if I do this, let's say, once every few minutes?


    You won't have any problems with the amount of data. You will be blocked if you request pages too often. Just download every few minutes and you should be fine. Also, send the User-Agent string of a well-known browser while you download stuff. If you do get blocked despite all that then a quick reset of the router should get you a new IP and sidestep the ban.
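
    A minimal sketch of that advice in Python, standard library only. The User-Agent string, the 3-minute interval and the function names are illustrative assumptions, not anything SBObet documents:

```python
import urllib.request

# Assumed values for illustration only; pick your own browser string and interval.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0"
MIN_INTERVAL_SECONDS = 180  # "every few minutes"


def build_request(url: str) -> urllib.request.Request:
    """Attach a browser-like User-Agent header to the request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def seconds_until_next_fetch(last_fetch: float, now: float,
                             min_interval: float = MIN_INTERVAL_SECONDS) -> float:
    """How long to keep waiting so that fetches stay min_interval apart."""
    return max(0.0, last_fetch + min_interval - now)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one page; the caller decides when it is polite to call this."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

    Call `seconds_until_next_fetch` before each `fetch` and sleep for whatever it returns.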

    I've written scrapers for many websites and I can scrape SBObet for you. Contact me via PM so we can discuss pricing if you're interested.

  3. #3
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    And really it doesn't even take a skilled programmer to do it.

  4. #4
    pringles
    Join Date: 11-26-12
    Posts: 41
    Betpoints: 186

    Quote Originally Posted by hubie69 View Post
    And really it doesn't even take a skilled programmer to do it.
    true
    Last edited by pringles; 07-29-13 at 09:25 AM.

  5. #5
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by pringles View Post
    true
    You don't need to be a star programmer simply to scrape a website, but you do need to have a firm understanding of the technologies involved in order to make your program resistant to bans, resistant to connection unreliability and feasible to run 24/7/365 (i.e. to use proper logging, automatically restart it in case the host server is rebooted etc.)

    As always, the devil is in the details.
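
    To give an idea of what "resistant to connection unreliability" means in practice, here is a rough sketch of retrying with exponential backoff plus logging. The function name, attempt count and delays are made-up defaults; restarting after a host reboot is left to the OS (cron, a service manager and so on) and is not shown:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("scraper")


def with_retries(action: Callable[[], T], attempts: int = 5, base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action(), retrying network-style failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as exc:  # urllib's URLError and socket timeouts are OSError subclasses
            if attempt == attempts:
                log.error("giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 2s, 4s, 8s, ... with the defaults
            log.warning("attempt %d failed (%s); retrying in %.0fs", attempt, exc, delay)
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

    Wrap whatever does the download, e.g. `with_retries(lambda: download(url))`, where `download` is your own function.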

  6. #6
    SportsInsights
    Join Date: 01-05-09
    Posts: 119
    Betpoints: 110

    SBO is a difficult site to scrape. You'll need an account as well as the ability to use a proxy network. They monitor usage.

  7. #7
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    Quote Originally Posted by HUY View Post
    You don't need to be a star programmer simply to scrape a website, but you do need to have a firm understanding of the technologies involved in order to make your program resistant to bans, resistant to connection unreliability and feasible to run 24/7/365 (i.e. to use proper logging, automatically restart it in case the host server is rebooted etc.)

    As always, the devil is in the details.
    I disagree to a point, but it depends on the OS and what you code in. Parsing XML relatively error-free can be done in under 50 lines of code, at least on a Linux box using either bash or PHP. The ban resistance is easy as there really is no need to scrape more than every 4 or 5 minutes; I doubt you'll get banned for that type of activity. And it doesn't need to run 24/7/365: you can do it on a per-sport basis and just cronjob it for once every X minutes. As well, you can just use curl or wget to grab the entire XML site and run the parsing locally.

    Hooray for Linux
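
    For illustration, the same idea in Python rather than bash or PHP (a swapped-in language, not what I described above). The XML layout is invented for the example; a real feed or page will use different element and attribute names:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout, made up for this example.
SAMPLE_FEED = """<odds>
  <match league="J1 League" home="Team A" away="Team B">
    <line market="AH" handicap="-0.5" home_price="1.95" away_price="1.93"/>
    <line market="OU" handicap="2.5" home_price="1.88" away_price="2.00"/>
  </match>
</odds>"""


def parse_lines(xml_text: str) -> list:
    """Flatten every <line> element into one dict per match and market."""
    root = ET.fromstring(xml_text)
    rows = []
    for match in root.iter("match"):
        for line in match.iter("line"):
            rows.append({
                "league": match.get("league"),
                "home": match.get("home"),
                "away": match.get("away"),
                "market": line.get("market"),
                "handicap": float(line.get("handicap")),
                "home_price": float(line.get("home_price")),
                "away_price": float(line.get("away_price")),
            })
    return rows
```

    Save the file fetched by curl or wget, read it in, pass the text to `parse_lines`, and let cron run the script every X minutes.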

  8. #8
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    Quote Originally Posted by SportsInsights View Post
    SBO is a difficult site to scrape. You'll need an account as well as the ability to use a proxy network. They monitor usage.

    You could use Hide My A$$ for this, it's free

  9. #9
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by hubie69 View Post
    I disagree to a point, but it depends on the OS and what you code in. Parsing XML relatively error-free can be done in under 50 lines of code, at least on a Linux box using either bash or PHP. The ban resistance is easy as there really is no need to scrape more than every 4 or 5 minutes; I doubt you'll get banned for that type of activity. And it doesn't need to run 24/7/365: you can do it on a per-sport basis and just cronjob it for once every X minutes. As well, you can just use curl or wget to grab the entire XML site and run the parsing locally.

    Hooray for Linux
    You don't need linux to parse xml, run bash, run php, run wget or run curl. Try to contribute something to the thread please.

  10. #10
    Fair
    Join Date: 11-25-10
    Posts: 216
    Betpoints: 203

    Sorry... but why scrape data from a website when you have other sites (like oddsportal) that have all the line movements?

  11. #11
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by Fair View Post
    Sorry... but why scrape data from a website when you have other sites (like oddsportal) that have all the line movements?
    So you should scrape oddsportal instead, is that what you are saying?

  12. #12
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    Quote Originally Posted by HUY View Post
    You don't need linux to parse xml, run bash, run php, run wget or run curl. Try to contribute something to the thread please.
    No, you don't need Linux to do it. It makes it easier once you learn it, though. I was simply stating that it may be helpful for the OP to use Linux if he currently doesn't. If that doesn't contribute enough for you, I also chimed in on ban resistance, from what I've found to be true over the past few years of scraping XML data myself. Learn to not only read my post, but understand the words that are typed in it before you slander me.

  13. #13
    strixee
    I think, therefore I win
    Join Date: 05-31-10
    Posts: 432

    pringles, approximately how much does such a scraper cost?

  14. #14
    pringles
    Join Date: 11-26-12
    Posts: 41
    Betpoints: 186

    Quote Originally Posted by strixee View Post
    pringles, approximately how much does such a scraper cost?
    Well, I'm using both a very skilled designer and a programmer. We have done the interface, and the programming starts in a few days.
    I'm paying around 3k€ for the whole set-up.

  15. #15
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    You are paying 3000€ for a website scraper? That only scrapes one site?

  16. #16
    sideloaded
    staring into the abyss
    Join Date: 08-21-10
    Posts: 7,561

    Quote Originally Posted by HUY View Post
    You don't need linux to parse xml, run bash, run php, run wget or run curl. Try to contribute something to the thread please.
    why on earth would you do all that on something NOT based on linux? You setting up your ultra complex scraper on Solaris?

  17. #17
    sideloaded
    staring into the abyss
    Join Date: 08-21-10
    Posts: 7,561

    Quote Originally Posted by pringles View Post
    Well, I'm using both a very skilled designer and a programmer. We have done the interface, and the programming starts in a few days.
    I'm paying around 3k€ for the whole set-up.
    You're overpaying. No need for a skilled programmer for this. Hire a 9th grader and buy him a 3DS or something.

  18. #18
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by sideloaded View Post
    why on earth would you do all that on something NOT based on linux? You setting up your ultra complex scraper on Solaris?
    Cygwin.

  19. #19
    sideloaded
    staring into the abyss
    Join Date: 08-21-10
    Posts: 7,561

    Yeah, but if you're scraping, 99 percent of the time you are deploying to a VPS running Linux.


    cygwin and windows is just gross

  20. #20
    Fair
    Join Date: 11-25-10
    Posts: 216
    Betpoints: 203

    I mean... if you are interested in line movements, there are a lot of sites that offer all the information you want, with all the historical data from so many bookies. So why pay 3000€ for information that is available for free? In the end... if you do this to bet and earn some money... you start with a bankroll of -3000... are you kidding me?

  21. #21
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    Quote Originally Posted by HUY View Post
    Cygwin.
    Sorry bud, but why try to put a Linux layer over the top of Windows when you can simply buy a $5 machine from a garage sale and actually run Linux? Scraping requires virtually zero resources, and if done on actual Linux it's portable to any *nix-based box. Not judging; as someone who works as a Linux admin and a network admin for a living, it just seems odd.
    Points Awarded:

    Maverick22 gave hubie69 2 SBR Point(s) for this post.


  22. #22
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    Quote Originally Posted by sideloaded View Post
    Yeah, but if you're scraping, 99 percent of the time you are deploying to a VPS running Linux.


    cygwin and windows is just gross


  23. #23
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by hubie69 View Post
    Sorry bud, but why try to put a Linux layer over the top of Windows when you can simply buy a $5 machine from a garage sale and actually run Linux? Scraping requires virtually zero resources, and if done on actual Linux it's portable to any *nix-based box. Not judging; as someone who works as a Linux admin and a network admin for a living, it just seems odd.
    More machines = more problems.

    Also, I'm working on a laptop and linux does not play very well with laptops.

  24. #24
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Dude... go to a pawn shop. Find the cheapest computer you can find. Put linux on it. Deploy all your code there. Then thank us later

  25. #25
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Plus... a dedicated server running a scraper makes your life easier... not harder.

    Sometimes more computers is more complexity... but not in this case. Not in this case at all.

  26. #26
    pringles
    Join Date: 11-26-12
    Posts: 41
    Betpoints: 186

    Quote Originally Posted by Maverick22 View Post
    You are paying 3000€ for a website scraper? That only scrapes one site?
    I'm using a designer + initial programmer for a scraper that takes everything, lines and statistics, and writes it to a DB on my server.
    Then a second programmer will add algorithms and make an iPhone app with alerts.
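
    Roughly speaking, the "writes to a DB" part could be as small as the sketch below. It uses SQLite from the Python standard library purely as an example; the table name, columns and function names are assumptions, and the real project may use a different database and schema:

```python
import sqlite3

# Hypothetical schema: one row per scraped line per scrape.
SCHEMA = """
CREATE TABLE IF NOT EXISTS odds_snapshot (
    scraped_at TEXT NOT NULL,   -- ISO 8601 timestamp of the scrape
    league     TEXT NOT NULL,
    home       TEXT NOT NULL,
    away       TEXT NOT NULL,
    market     TEXT NOT NULL,   -- e.g. "AH" or "OU"
    handicap   REAL NOT NULL,
    home_price REAL NOT NULL,
    away_price REAL NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_snapshot(conn: sqlite3.Connection, scraped_at: str, rows: list) -> int:
    """Insert one scrape's worth of lines; returns the number of rows written."""
    records = [
        (scraped_at, r["league"], r["home"], r["away"], r["market"],
         r["handicap"], r["home_price"], r["away_price"])
        for r in rows
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO odds_snapshot VALUES (?, ?, ?, ?, ?, ?, ?, ?)", records
        )
    return len(records)
```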

  27. #27
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    I would have a conversation with each developer and "designer".

    After the whole thing is finished, I would try to get a copy of all the source code, including all the database scripts and documentation (for those prices, it had better come with some documentation).

    Since you are paying for it, you (should) own it.

    You might not think you will need it, but you may one day. And you will not want to chase down a guy for the code years later.

    Just my thoughts anyways.

  28. #28
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by pringles View Post
    I'm using a designer + initial programmer for a scraper that takes everything, lines and statistics, and writes it to a DB on my server.
    Then a second programmer will add algorithms and make an iPhone app with alerts.
    What will those "algorithms" do? Tell you what to bet? If so, I have a whole new world waiting for you: They're called "touts".

  29. #29
    SportsInsights
    Join Date: 01-05-09
    Posts: 119
    Betpoints: 110

    Just so you know, SBOBet offers an XML feed.

  30. #30
    arwar
    Join Date: 07-09-09
    Posts: 208
    Betpoints: 1544

    Well, I have been writing scrapers for years. I have written so many of them I can do it in my sleep. For 4 years now I have been running a scraper against a popular site every 3 minutes, 24/7/365. Even though respected posters on here said you can get banned for overuse there, it has never happened to me on this site. I have a little random routine that adds 0-60 seconds to 2 minutes between scrapes. If it were to hit exactly every 3 minutes, it might attract attention. This site has an RSS feed, but I find it lags behind the real-time data.

    I have scraped hundreds of different sites, both for historical and real-time data. The only time I have ever been booted was once on Yahoo. If I remember correctly, they had some kind of weird numbering convention for MLB games, so in an attempt to grab all the data I ran some kind of loop like 100000 to 600000, and it returned a lot of 404 (page not found) errors. This apparently attracted the attention of some sysop, who gave me some kind of ping of death. I was able to get back in after 20 minutes (I have a static IP from my ISP), and after adjusting the logic of my program so it only requested pages that could actually be served, I never had any more problems.

    As for whoever posted about getting a different IP address: usually the DHCP server will assign a lease with the same IP to the same MAC address if possible. It used to be that it was always different, not so much anymore.

    The weirdest scraper I wrote was for some guy betting tennis at an offshore book that changed the line on the match after every point. I couldn't figure out how he could get a bet down between points. I guess he just bet between games.

    Some of these scrapers get very complicated: with all the advanced JavaScript, even readystate=4 doesn't work. I am competent with Linux, but curl isn't going to return data that's not there. I am curious now: can somebody post the URL to this site?
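
    The random routine mentioned above (2 minutes plus 0-60 seconds, so about 3 minutes on average) could look something like this in Python. This is a sketch, not my actual code; the function names are made up, and `scrape_once` stands for whatever does the real work:

```python
import random
import time

BASE_DELAY_SECONDS = 120   # the fixed 2 minutes
MAX_JITTER_SECONDS = 60    # plus a random 0-60 seconds


def next_delay(rng: random.Random) -> float:
    """Seconds to wait before the next scrape: between 120 and 180."""
    return BASE_DELAY_SECONDS + rng.uniform(0, MAX_JITTER_SECONDS)


def run(scrape_once, cycles: int, rng=None, sleep=time.sleep) -> None:
    """Scrape `cycles` times, pausing a jittered interval after each scrape."""
    rng = rng or random.Random()
    for _ in range(cycles):
        scrape_once()
        sleep(next_delay(rng))
```

    Because the gap is never exactly the same twice, the requests do not land on a fixed 3-minute beat.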

  31. #31
    arwar
    Join Date: 07-09-09
    Posts: 208
    Betpoints: 1544

    Well, I zipped over to SBObet.com and took a look at it. I think the guy wanted soccer, which of course is football. I didn't need any account to be able to see lines, but it looked like they had tons of different leagues (Japan, etc.) and a shitload of games. I didn't look under the hood at the way the site was coded, but it would be relatively simple to scrape. These are live odds, so I am not quite sure what the guy is looking for in a scraper. Maybe line movements? Otherwise just go there and look at the odds (decimal, btw). I saw a bunch of different days at the top of the odds page, but I didn't go in there. The only soccer scrapers I wrote before tracked results. The scraping itself is simple. Tracking all the different leagues, etc. adds a lot of overhead.
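
    If line movements are the goal, the core of it is just comparing two consecutive scrapes. A rough Python sketch, where the match ids, market codes and the shape of a snapshot are all assumptions for the example:

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]        # (match id, market), both hypothetical identifiers
Quote = Tuple[float, float]  # (handicap, decimal price)


def line_moves(previous: Dict[Key, Quote],
               current: Dict[Key, Quote]) -> List[Tuple[Key, Quote, Quote]]:
    """Return (key, old quote, new quote) for every line that changed."""
    moves = []
    for key, new_quote in current.items():
        old_quote = previous.get(key)
        # Lines that are new in this scrape have no old quote, so they are not "moves".
        if old_quote is not None and old_quote != new_quote:
            moves.append((key, old_quote, new_quote))
    return moves
```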

  32. #32
    slobib
    Join Date: 07-26-06
    Posts: 43
    Betpoints: 1935

    I would like a scraper, but unfortunately I don't have enough posts and can't send PMs. Anyone willing to write it for payment, please send me a PM for details.
    Thanks.

  33. #33
    aramakilx
    Join Date: 01-18-13
    Posts: 195
    Betpoints: 5635

    It's impossible for a simple user to obtain the XML feed; you only get it if you have a big site like SBR. What kind of odds do you need: pre-game or live? Maybe you can look for another bookie?

  34. #34
    slobib
    Join Date: 07-26-06
    Posts: 43
    Betpoints: 1935

    Pre-game odds. Maybe from other Asian bookies or Pinnacle.

  35. #35
    slobib
    Join Date: 07-26-06
    Posts: 43
    Betpoints: 1935

    I need it to compare with local bookies and get the best odds as fast as possible.
    There are sites that do this, but I think they lack quality and aren't optimal.
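
    Once the odds are collected, the comparison itself is simple. A minimal Python sketch, assuming decimal odds for the same selection keyed by bookie name (the names and numbers in the usage line are placeholders):

```python
from typing import Dict, Tuple


def best_price(prices_by_bookie: Dict[str, float]) -> Tuple[str, float]:
    """Return the bookie offering the highest decimal odds, with that price."""
    if not prices_by_bookie:
        raise ValueError("no prices to compare")
    bookie = max(prices_by_bookie, key=lambda name: prices_by_bookie[name])
    return bookie, prices_by_bookie[bookie]
```

    For example, `best_price({"local_a": 1.90, "pinnacle": 1.97, "local_b": 1.93})` picks `("pinnacle", 1.97)`.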
