Quote:
Originally Posted by ShamsWoof10
What's the advantage to scraping and what's the advantage to xml feeds..? It looks like a buch of bionary code... Forgive me for being unaware but I can't make heads or tails of it... Just looks like bionary code..
|
Well, first off, binary code will look something like this:
Quote:
|
00000000 00010001 00000100 11111111 00010011 00000111 00000100 00010010 00000100 11111111 00001111 00000100 00001110 00001111 00001011 00000100 11111111 00010001 00000100 00000000 00001011 00001011 00011000 11111111 00010011 00000111 00000000 00010011 11111111 00000101 00010100 00000010 00001010 00001000 00001101 00000110 11111111 00010010 00010011 00010100 00001111 00001000 00000011 11111111 11111111
|
While XML may not always be transparently human-readable, it's still much higher level and easier to visually parse than binary (not that a properly motivated person couldn't read binary with a modest effort ...).
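To make the "modest effort" concrete, here is a minimal Python sketch that turns space-separated 8-bit groups into integers. The function name is my own, and the sample is just the first three groups from the quote above. It makes no claim about what encoding maps those numbers to characters.

```python
def octets_to_ints(bits):
    """Convert space-separated 8-bit groups into a list of integers."""
    values = []
    for group in bits.split():
        if len(group) != 8 or set(group) - {"0", "1"}:
            raise ValueError(f"not an 8-bit binary group: {group!r}")
        values.append(int(group, 2))  # base-2 conversion
    return values


# First three groups from the quoted block.
sample = "00000000 00010001 00000100"
print(octets_to_ints(sample))
```

Turning the integers into text still requires knowing which character table was used, which is exactly the kind of out-of-band knowledge XML spares you.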
The advantage of an XML feed over scraping is that it's provided by design and follows a guaranteed format. HTML pages will frequently have their formatting modified, and that will happen without warning or description. Furthermore, many companies look askance at those scraping their pages. XML should be the default choice.
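A rough Python sketch of the difference, using only the standard library. The feed layout, the HTML table, and every element, attribute, and class name here are invented for illustration; no real book's feed or page is assumed to look like this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical XML feed: names are made up for this example.
FEED_XML = """
<feed>
  <event id="101">
    <home>Team A</home>
    <away>Team B</away>
    <moneyline home="-120" away="+110"/>
  </event>
</feed>
"""

# Hypothetical HTML page carrying the same prices.
PAGE_HTML = """
<table>
  <tr><td class="team">Team A</td><td class="price">-120</td></tr>
  <tr><td class="team">Team B</td><td class="price">+110</td></tr>
</table>
"""


def lines_from_xml(text):
    """Read prices by element and attribute name."""
    prices = {}
    for event in ET.fromstring(text).iter("event"):
        moneyline = event.find("moneyline")
        prices[event.findtext("home")] = int(moneyline.get("home"))
        prices[event.findtext("away")] = int(moneyline.get("away"))
    return prices


class PriceScraper(HTMLParser):
    """Scrape prices by relying on the page's current CSS class names."""

    def __init__(self):
        super().__init__()
        self.current_class = None
        self.team = None
        self.prices = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_class = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.current_class == "team":
            self.team = text
        elif self.current_class == "price" and self.team is not None:
            self.prices[self.team] = int(text)


def lines_from_html(text):
    scraper = PriceScraper()
    scraper.feed(text)
    return scraper.prices


print(lines_from_xml(FEED_XML))
print(lines_from_html(PAGE_HTML))
# A cosmetic redesign (class renamed) silently breaks the scraper:
print(lines_from_html(PAGE_HTML.replace('class="price"', 'class="odds"')))
```

The XML reader depends only on the names the feed is designed to expose, while the scraper depends on presentation details that can change at any time, and when they do it fails quietly rather than loudly.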
As a general rule, there really is no advantage to scraping over XML. However, there may well be case-specific reasons why scraping could be preferable.
For example:
- Pinnacle's XML feed does not provide access to the lines reflected in its drop-down box prices. If you wanted to automatically read this data you couldn't do so using their standard XML and instead would need to scrape the HTML.
- A sportsbook may not devote the same quality of technology to its XML provision server as it might to its HTML provision server (or servers). The reason for this is obvious: it's more profitable to invest resources in a website used by nearly all of its customers than in a feed used by a relatively small number. One effect of this could be what I mentioned in the first post of this thread, namely that accessing the feed could become an extremely slow process. Another effect could be an unreliable XML feed that often crashes.
- A company can effectively exert a lot more control over its XML feed than over its HTML. The reason is that if a company sees you accessing its feed, it knows exactly what you're doing and can hence forbid you from accessing it too frequently. If you're scanning a web page, however, a book can't know for certain what you're doing and will typically allow you more latitude in terms of query frequency.
- Data providers that offer their data as HTML for free sometimes charge money for dedicated XML feeds.
Anyway, this is by no means a complete list, just a few thoughts off the top of my head. These reasons (which may not even apply in any particular circumstance) may or may not outweigh the pronounced structural advantages of XML. Use XML unless you have a good reason not to.
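On the query-frequency point in the list above: whichever source you read, it's worth spacing out your requests. Here is a minimal Python sketch of a throttle. The class name and the interval are my own assumptions, since no book's actual limits are given in this thread. The clock and sleep functions are injectable only so the behaviour can be checked without really waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; an assumed value, not a published limit
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval has passed since the last call; return seconds slept."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited
```

Usage would be along the lines of creating `throttle = Throttle(5.0)` once and calling `throttle.wait()` immediately before each fetch of the feed or page.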