On Tue, May 3, 2016 at 2:43 AM, Max Semenik maxsem.wiki@gmail.com wrote:
At this point, I would say that everybody who screen-scrapes saw it coming, and breaking them is a good thing; sometimes lessons just have to be learned.
There aren't many options other than content-scraping if you want to transform Wikipedia articles into some semblance of structured data. We even do it ourselves, for media metadata. That code uses an XML parser, as PHP doesn't offer much in the way of parsing HTML5, so outputting HTML5-style empty tags might break it (although IIRC there is a hack to work around that, as file pages can contain ill-formed HTML anyway).
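
A minimal sketch of the empty-tag concern, in Python rather than the PHP actually involved. The markup snippets and the helper name parses_as_xml are made up for illustration; this is not the real metadata-extraction code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the same content with XML-style and HTML5-style empty tags.
XHTML_STYLE = '<div class="fileinfo"><img src="a.png" alt="x"/><br/>Author: Jane</div>'
HTML5_STYLE = '<div class="fileinfo"><img src="a.png" alt="x"><br>Author: Jane</div>'


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # <img> and <br> are never closed, so </div> is a mismatched tag.
        return False


if __name__ == "__main__":
    print("XHTML-style empty tags:", parses_as_xml(XHTML_STYLE))
    print("HTML5-style empty tags:", parses_as_xml(HTML5_STYLE))
```

A strict XML parser accepts the first fragment and rejects the second, which is why a scraper built on one would need either self-closed tags in the output or an HTML-aware parser.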