Sorry, now correctly cross-posted.
Emmanuel
-------- Original Message --------
Subject: WMF XML dump title case problem
Date: Sun, 26 Jun 2011 17:07:19 +0200
From: Emmanuel Engelhart emmanuel@engelhart.org
To: Mailing list for Wikimedia CH wikimediach-l@lists.wikimedia.org, offline-l@lists.wikimedia.org
Hi
Titles should be stored in the "page" table with their first letter uppercased: http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restr...
Unfortunately, it seems that we have XML dumps (and consequently mwdumper-generated SQL) containing titles whose first letter is lowercase.
For example:
$ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pa...
$ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep "<title>" | grep tationery | more
<title>stationery</title>
<title>stationery shop</title>
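To list every affected title rather than grepping for a single word, a small script along these lines might help. It is only a sketch: the function names `lowercase_titles` and `scan_dump` are made up for illustration, and it assumes each `<title>` element sits on one line of the dump, as in the grep output above. It only looks at the very first character of the full title, so a lowercase page name after a namespace prefix would not be reported.

```python
import bz2
import re
from typing import Iterable, Iterator

# Assumption: each <title>...</title> element fits on a single line of the dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def lowercase_titles(lines: Iterable[str]) -> Iterator[str]:
    """Yield every title whose first character is a lowercase letter."""
    for line in lines:
        match = TITLE_RE.search(line)
        # [:1] avoids an IndexError on an empty title; islower() is False
        # for digits, punctuation and scripts without letter case.
        if match and match.group(1)[:1].islower():
            yield match.group(1)


def scan_dump(path: str) -> Iterator[str]:
    """Stream a bzip2-compressed XML dump and yield its lowercase-initial titles."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        yield from lowercase_titles(dump)
```

Calling `scan_dump("mywiktionary-20110617-pages-articles.xml.bz2")` and printing each result would then list the candidate titles without decompressing the whole dump to disk first.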
Is that a bug?
Regards,
Emmanuel