Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

3 May 2016


On Tue, May 3, 2016 at 2:43 AM, Max Semenik maxsem.wiki@gmail.com wrote:
> ...
> At this point, I would say that everybody who screen-scrapes saw it coming
> and breaking them is a good thing as sometimes, lessons just have to be
> learned.

There aren't many options other than content-scraping if you want to
transform Wikipedia articles into some semblance of structured data. We
even do it ourselves, for media metadata (and use an XML parser for it, as
PHP doesn't offer much in the way of parsing HTML5, so outputting
HTML5-style empty tags might break it - although IIRC there is a hack to
work around that as file pages can contain ill-formed HTML anyway).
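
To illustrate the breakage in question, here is a minimal sketch in Python rather than PHP (the sample markup and the helper name parses_as_xml are made up for illustration, not taken from MediaWiki). A strict XML parser accepts a self-closed empty tag but rejects the HTML5-style one:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets standing in for scraped page output.
WELL_FORMED = '<p>First line<br />second line</p>'   # XML-style empty tag
HTML5_STYLE = '<p>First line<br>second line</p>'     # HTML5-style empty tag


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The unclosed <br> makes the later </p> a mismatched tag.
        return False


print(parses_as_xml(WELL_FORMED))
print(parses_as_xml(HTML5_STYLE))
```

Any consumer that feeds page output to an XML parser would hit the second case once empty tags are no longer self-closed, unless it switches to an HTML5-aware parser or pre-processes the markup.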
