Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

3 May 2016


On Tue, May 3, 2016 at 4:34 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
...
> There aren't many options other than content-scraping if you want to
> transform Wikipedia articles into some semblance of structured data. We
> even do it ourselves, for media metadata (and use an XML parser for it
Actually, the XML parser was replaced with DOMDocument a while ago,
which can handle HTML5 fine. But the point stands: HTML scraping is hardly
an unusual requirement for reusers of our content.
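
To illustrate why the choice of parser matters here, a rough sketch follows. It is written in Python rather than PHP's DOMDocument, and the markup fragment, the class name and the helper names are all made up for the example. It only shows that output which is valid HTML but not well-formed XML (unquoted attribute values, an unclosed void element) trips a strict XML parser while a lenient HTML parser can still scrape it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: unquoted attribute values and an unclosed <img>,
# which is fine as HTML5 but is not well-formed XML.
FRAGMENT = (
    "<div class=fileinfo>"
    "<img src=Example.jpg alt=Example>"
    "<span class=licensetpl_short>CC BY-SA 4.0</span>"
    "</div>"
)


def xml_parses(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassTextCollector(HTMLParser):
    """Collect text inside the first-level elements carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.open_tag = None  # tag name of the element being captured
        self.depth = 0        # nesting depth of that same tag name
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.open_tag is None:
            if self.wanted_class in classes:
                self.open_tag = tag
                self.depth = 1
        elif tag == self.open_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.chunks.append(data)


def extract_text(markup, wanted_class):
    """Scrape the text of elements with the given class using an HTML parser."""
    collector = ClassTextCollector(wanted_class)
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks).strip()


if __name__ == "__main__":
    print("strict XML parser accepts it:", xml_parses(FRAGMENT))
    print("scraped with HTML parser:", extract_text(FRAGMENT, "licensetpl_short"))
```

A reuser relying on the strict path breaks as soon as the output stops being well-formed XML, while one using an HTML-aware parser keeps working, which is the scraping concern raised above.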
