Hi,
I saw this thread back in October where someone was having trouble importing the English Wikipedia XML dump: http://lists.wikimedia.org/pipermail/wikitech-l/2009-October/045594.html The thread back in October seemed to end without resolution, and the tools still seem to be broken, so has anyone found a solution in the meantime?
I'm using mediawiki-1.15.1 and attempting to import enwiki-20100130-pages-articles.xml.bz2.
None of these options seem to work:
1) importDump.php: fails by spewing "Warning: xml_parse(): Unable to call handler in_() in ./includes/Import.php on line 437" repeatedly
2) xml2sql (http://meta.wikimedia.org/wiki/Xml2sql): fails with the error "xml2sql: parsing aborted at line 33 pos 16.", due to the new <redirect> tag introduced in the new dumps?
3) mwdumper (http://www.mediawiki.org/wiki/MWDumper): the current XML is schema v0.4, but the documentation says that it's for 0.3
4) mwimport (http://meta.wikimedia.org/wiki/Data_dumps/mwimport): fails immediately with "siteinfo: untested generator 'MediaWiki 1.16alpha-wmf', expect trouble ahead" and "page: expected closing tag in line 35"
Any tips? Thanks! Eric
On Thu, Feb 4, 2010 at 9:12 PM, Eric Sun esun@cs.stanford.edu wrote:
[...]
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Most of these errors are caused by the new(ish) <redirect /> tag within <page> elements. 0.4 is the correct version of the schema, but unfortunately the schema was updated and dumps were produced using it before the changes made it into a release.
1.15.1 cannot import pages with <redirect />; we should probably backport that. We should also rewrite the importers to not barf terribly when they encounter an unknown element.
-Chad
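A minimal sketch of the "don't barf on unknown elements" idea, written in Python rather than the PHP of the real importer. It is not how Import.php, xml2sql or mwdumper actually work; the names iter_pages, local_name and KNOWN_PAGE_CHILDREN and the sample document are made up for illustration. The reader handles the children it knows and merely records the rest, so a new element such as <redirect /> does not abort the run.

```python
import io
import xml.etree.ElementTree as ET

# Child elements of <page> that this sketch knows about (an assumption for
# the example); anything else is recorded and skipped instead of failing.
KNOWN_PAGE_CHILDREN = {"title", "id", "revision"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, skipped_tags) for each <page> in an export stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        skipped = []
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name not in KNOWN_PAGE_CHILDREN:
                skipped.append(name)  # e.g. "redirect": note it, keep going
        yield title, skipped
        elem.clear()  # release the memory held by the finished page


# Hypothetical, heavily shortened sample in the style of a 0.4 dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>AccessibleComputing</title>
    <id>10</id>
    <redirect />
    <revision><id>1</id><text>#REDIRECT [[Computer accessibility]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

The design choice here is to treat the schema as open: unknown children are reported rather than treated as a parse error, which is what would let an older importer survive a newer dump.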
I am still able to import the dumps using the old mwdumper (modified to fix the contributor), and xml2sql also works and is quite fast. I think importDump.php continues after it breaks.
bilal -- Verily, with hardship comes ease.
On Thu, Feb 4, 2010 at 9:24 PM, Chad innocentkiller@gmail.com wrote:
[...]
Would it be safe to strip out the <redirect /> tags from the xml and reimport, or will that cause other problems?
Thanks, Eric
Yes, it was safe in my case (import of Russian and English Wiktionary). See http://meta.wikimedia.org/wiki/Talk:Xml2sql for an example of a script or shell command to strip out the <redirect /> tags.
-- Andrew.
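For illustration, a rough Python sketch of such a stripping step. It is not the script from the Talk:Xml2sql page; the function names and the sample lines are assumptions. It relies on the dump putting <redirect /> in the page metadata as a self-closing tag, and on article text being XML-escaped inside <text>, so a literal "<redirect" typed into an article would appear as &lt;redirect and would not match.

```python
import bz2
import re

# Matches a self-closing <redirect /> element, with or without attributes.
REDIRECT_TAG = re.compile(r"<redirect\b[^>]*/>")


def strip_redirects(lines):
    """Yield the lines of a dump with <redirect ... /> elements removed."""
    for line in lines:
        cleaned = REDIRECT_TAG.sub("", line)
        # Drop a line only if removing the tag left it empty; lines that
        # were already blank are passed through unchanged.
        if cleaned.strip() or not line.strip():
            yield cleaned


def strip_dump(src_path, dst_path):
    """Stream a .xml.bz2 dump to an uncompressed copy without redirects."""
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(strip_redirects(src))


# Hypothetical fragment of a page element, used only to show the effect.
sample = [
    "  <page>\n",
    "    <title>AccessibleComputing</title>\n",
    "    <redirect />\n",
    "    <id>10</id>\n",
    "  </page>\n",
]
cleaned = list(strip_redirects(sample))
print("".join(cleaned))
```

A call such as strip_dump("enwiki-20100130-pages-articles.xml.bz2", "enwiki-noredirect.xml") would process the file line by line, so the whole dump never has to fit in memory; the output file name is just an example.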
On Fri, Feb 5, 2010 at 6:38 AM, Eric Sun esun@cs.stanford.edu wrote:
[...]
I stripped out the <redirect />'s and imported enwiki using xml2sql, but none of the templates rendered correctly--for example, navigating to /The_Matrix results in a page with lots of mediawiki source like
{{#if: |This {{#ifeq:||article|page}} is about . }}For {{#if:the series|the series|other uses}}, see {{#if:The Matrix (franchise)|The Matrix (franchise){{#ifeq:the setting|and| and {{#if:Matrix (fictional universe)|Matrix (fictional
Any ideas if this is a known problem with xml2sql, or did something get corrupted during my import? I haven't yet tried importDump.php because it seems to be extremely slow (can only import a few pages per second)
Eric
On Fri, Feb 5, 2010 at 1:13 AM, Andrew Krizhanovsky andrew.krizhanovsky@gmail.com wrote:
[...]
On 8 February 2010 06:44, Eric Sun esun@cs.stanford.edu wrote:
I stripped out the <redirect />'s and imported enwiki using xml2sql, but none of the templates rendered correctly--for example, navigating to /The_Matrix results in a page with lots of mediawiki source like
{{#if: |This {{#ifeq:||article|page}} is about . }}For {{#if:the series|the series|other uses}}, see {{#if:The Matrix (franchise)|The Matrix (franchise){{#ifeq:the setting|and| and {{#if:Matrix (fictional universe)|Matrix (fictional
You do not have all required MediaWiki extensions installed. Wikipedia uses many extensions (see http://en.wikipedia.org/wiki/Special:Version), some of them are required for proper rendering, especially ParserFunctions (http://www.mediawiki.org/wiki/Extension:ParserFunctions), as is the problem here (another important extension, off the top of my head, is the Cite extension – http://www.mediawiki.org/wiki/Extension:Cite – which provides for <ref> and <references>).
-- [[cs:User:Mormegil | Petr Kadlec]]
Thanks Petr. I installed the appropriate versions of the parser hook extensions that look relevant:
- CategoryTree (Version r48218)
- CharInsert (Version r36357)
- Cite (Version r47190)
- InputBox (Version r42791)
- ParserFunctions (Version 1.1.1)
I'm using MediaWiki 1.15.1 and I imported the dump using xml2sql. Most enwiki pages render correctly, but a bunch of pages (e.g. Jennifer_Garner) show spurious </span> tags (inspecting the page source shows a bunch of </span>).
Template:Marriage looks to be one of the offending templates (though the wikimarkup of that page matches the version on en.wikipedia.org).
I guess the problem could be:
1) a bug caused by xml2sql
2) I'm missing a non-obvious extension
3) a bug in one of the many templates included in Template:Marriage that has since been fixed on en.wikipedia.org
Has anyone seen this problem before?
Thanks, Eric
On Mon, Feb 8, 2010 at 1:30 AM, Petr Kadlec petr.kadlec@gmail.com wrote:
[...]
Hi,
On Sun, Feb 14, 2010 at 11:03 AM, Eric Sun esun@cs.stanford.edu wrote:
I'm using MediaWiki 1.15.1 and I imported the dump using xml2sql. Most enwiki pages render correctly, but a bunch of pages (e.g. Jennifer_Garner) show spurious </span> tags (inspecting the page source shows a bunch of </span>).
Are you using $wgUseTidy? It enables an HTML cleanup pass (Tidy) that is always enabled on WMF projects. Since it is there, template creators often miss closing spans and other things, or leave in extra close tags, and never notice because Tidy fixes them. It would not surprise me a bit if dozens of en.wp templates have such errors; I find them occasionally on en.wikt.
Robert
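To find which rendered pages are affected, a small diagnostic along these lines might help. It is only a sketch in Python using the standard html.parser module: it reports unbalanced tags and repairs nothing, so it is not a substitute for Tidy. The class and function names and the sample markup are made up for the example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Collect closing tags with no opener and openers never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open elements
        self.stray_closers = []  # closing tags with no matching opener
        self.unclosed = []       # openers skipped over by a later closer

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br /> and friends open and close in one step

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            self.stray_closers.append(tag)
            return
        # Anything opened after the matching opener was never closed.
        while True:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.unclosed.append(top)


def check_balance(html):
    """Return (stray closing tags, unclosed opening tags) for a fragment."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.stray_closers, checker.unclosed + checker.open_tags


# Hypothetical fragments: one extra </span>, then one missing </span>.
extra = check_balance('<p><span class="a">x</span></span></p>')
missing = check_balance("<div><span>x<br /></div>")
print(extra, missing)
```

Running check_balance over the HTML of a page rendered with $wgUseTidy off would list the tags a template left unbalanced, which narrows down where to look.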
That solved the problem. Thanks!
On Sun, Feb 14, 2010 at 4:00 AM, Robert Ullmann rlullmann@gmail.com wrote:
[...]
Even after setting $wgUseTidy = true, many of my pages show an error "Expression error: Missing operand for >" at the bottom in the References section. It looks like an error message produced by the ParserFunctions package. I've never seen this on en.wikipedia.org so I wonder if some customization or other cleanup has been done.
I'm running MediaWiki 1.15.1 and the versions of Cite and ParserFunctions that correspond to 1.15.x. Is this something easily fixable?
Thanks, Eric
On Sun, Feb 14, 2010 at 9:49 AM, Eric Sun esun@cs.stanford.edu wrote:
[...]
On Tue, Feb 16, 2010 at 10:42 PM, Eric Sun esun@cs.stanford.edu wrote:
Even after setting $wgUseTidy = true, many of my pages show an error "Expression error: Missing operand for >" at the bottom in the References section.
This is a ParserFunctions-related problem.
It looks like an error message produced by the ParserFunctions package. I've never seen this on en.wikipedia.org so I wonder if some customization or other cleanup has been done.
I'm running MediaWiki 1.15.1 and the versions of Cite and ParserFunctions that correspond to 1.15.x. Is this something easily fixable?
Wikipedia uses the latest alpha versions, in this case 1.16alpha. It's possible that some feature was added in more recent ParserFunctions versions that's incompatible. But I can't actually see any changes to Expr.php since the 1.15 branch, or any relevant ParserFunctions changes at all.
Do you have links to pages on your site that show the error? It should be possible to track down exactly what's causing it using Special:ExpandTemplates, if no one has any ideas.
Here's an example of an offending template:
{{cite journal |title=Mrs. Obama goes to Washington |author=Slevin, Peter |date=March 18, 2009 |journal=[[Princeton Alumni Weekly]] |volume=109 |number=10 |pages=18–22}}
From Special:ExpandTemplates, this renders as
Slevin, Peter (March 18, 2009). [Expression error: Missing operand for > "Mrs. Obama goes to Washington"]. Princeton Alumni Weekly 109 (10): 18–22.
The error goes away if I remove "|title=Mrs. Obama goes to Washington". The error remains if I change it to anything else, such as "|title=Obama".
I thought it might be a problem with the Cite extension, but updating with the latest trunk version of Cite didn't help.
The full XML output & result from Special:ExpandTemplates is below. How odd...
<root><template><title>cite journal </title><part><name>title</name>=<value>Mrs. Obama goes to Washington </value></part><part><name>author</name>=<value>Slevin, Peter </value></part><part><name>date</name>=<value>March 18, 2009 </value></part><part><name>journal</name>=<value>[[Princeton Alumni Weekly]] </value></part><part><name>volume</name>=<value>109 </value></part><part><name>number</name>=<value>10 </value></part><part><name>pages</name>=<value>18–22</value></part></template></root>
<span class="citation Journal">Slevin, Peter (March 18, 2009). [<strong class="error">Expression error: Missing operand for ></strong> "Mrs. Obama goes to Washington"]. ''<nowiki />[[Princeton Alumni Weekly]]<nowiki />'' '''<nowiki />109<nowiki />''' (10)<nowiki>: </nowiki> 18–22.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Mrs.+Obama+goes+to+Washington&rft.jtitle=%5B%5BPrinceton+Alumni+Weekly%5D%5D&rft.aulast=Slevin%2C+Peter&rft.au=Slevin%2C+Peter&rft.date=March+18%2C+2009&rft.volume=109&rft.issue=10&rft.pages=18%E2%80%9322&rfr_id=info:sid/en.wikipedia.org:Special:ExpandTemplates"><span style="display: none;"> </span></span>
On Wed, Feb 17, 2010 at 10:06 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
[...]
Hi,
There is a note here: http://www.mediawiki.org/wiki/Extension:ParserFunctions saying you should use a different version of ParserFunctions for 1.15.1. I'm not at all sure what that actually means; but the problem definitely seems to be within ParserFunctions ...
Robert
On 2/18/2010 5:30 AM, Robert Ullmann wrote:
[...]
Use the version for your install. If you don't know how to do that from the Subversion server [1], you can use the Extension Distributor [2].
[1] http://svn.wikimedia.org/svnroot/mediawiki/tags/REL1_15_1/extensions/ParserF...
[2] http://www.mediawiki.org/wiki/Special:ExtensionDistributor/ParserFunctions
On Sun, Feb 14, 2010 at 1:00 PM, Robert Ullmann rlullmann@gmail.com wrote:
Are you using $wgUseTidy? It enables an HTML cleanup pass (Tidy) that is always enabled on WMF projects. Since it is there, template creators often miss closing spans and other things, or leave in extra close tags, and never notice because Tidy fixes them. It would not surprise me a bit if dozens of en.wp templates have such errors; I find them occasionally on en.wikt.
What about turning wgUseTidy off for some time? Maybe some night hours... so that our template magicians are forced to clean up the templates and the other crap that lies buried deep in the wikitext. It is not a third party's responsibility to install extra software just to be able to render our content... or at least, it shouldn't be. We should not artificially make it difficult to get our content forked, duplicated or otherwise re-used.
Marco
On Sun, Feb 14, 2010 at 7:34 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
What about turning wgUseTidy off for some time? Maybe some night hours... so that our template magicians are forced to clean up the templates and the other crap that lies buried deep in the wikitext.
They would just complain until we turned it back on. Unless we left it off for good, which might be reasonable, but then we'd have to improve Sanitizer, I guess. (Maybe once html5lib is more usable.)
We should not artificially make it difficult to get our content forked, duplicated or otherwise re-used.
This is not artificial. Tidy serves a useful purpose. It might be less useful when we switch to HTML5, though -- my impression is that we mainly use it so that our HTML validates, but our HTML5 won't reliably validate in any case.
On Mon, Feb 15, 2010 at 2:30 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
They would just complain until we turned it back on. Unless we left it off for good, which might be reasonable, but then we'd have to improve Sanitizer, I guess. (Maybe once html5lib is more usable.)
Why? Why must software take care of the crap that users do? Either we force them to write proper code, or they never will. It's like with PHP, with the difference that we still have the option to make our template writers and coders learn, instead of having to support stone-age stuff with crappy workarounds. Every single extra step that has to be taken to get a WP fork up and running is one step too much.
Marco
On Sun, Feb 14, 2010 at 8:40 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
Why? Why must software take care of the crap that users do? Either we force them to write proper code, or they never will.
So they never will. So what? They're supposed to be writing an encyclopedia, they're not required or expected to be HTML experts. Ideally, they wouldn't have to touch HTML at all, and could do all their editing in a WYSIWYG editor. Allowing HTML but tolerating mistakes is the least we can do to make their lives easier.
Every single extra step that has to be taken to get a WP fork up and running is one step too much.
That's a separate issue. I agree that it would be nice if we had less of a Tidy dependency, for multiple reasons.
On Sun, Feb 14, 2010 at 7:34 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
What about turning wgUseTidy off for some time?
The doctype that we serve is XHTML, and various AJAX tools rely on being able to parse the DOM tree as an XML document. But there are certain valid wikitext constructions that are ''guaranteed'' to generate invalid XML without Tidy, because of MediaWiki bugs; one example is putting a list inside a table cell (bug 17486). So Tidy seems to be a requirement for the time being.
I hope that, before the doctype is changed to html5, a substantial grace period is given for people to change to an HTML5 parser in their javascript code.
One high-profile use case here is the Twinkle script.
- Carl
On Mon, Feb 15, 2010 at 2:19 PM, Carl (CBM) cbm.wikipedia@gmail.com wrote:
I hope that, before the doctype is changed to html5, a substantial grace period is given for people to change to an HTML5 parser in their javascript code.
We will continue with well-formed XML output for the foreseeable future for exactly this reason. We don't need to stop emitting well-formed XML to use HTML5. I have tested trunk defaults ($wgHtml5 = true; $wgWellFormedXml = true;) with the Python SAX parser and it parses pages correctly, so no tools should be broken.
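A check of this kind can be sketched with Python's standard xml.sax module. This is an illustration, not the exact test that was run; is_well_formed and the two sample fragments are assumptions, and the broken fragment is hand-written rather than real MediaWiki output for bug 17486.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(markup):
    """Return True if the SAX parser accepts the markup as XML."""
    try:
        # The default ContentHandler ignores all events; we only care
        # whether parsing finishes without a well-formedness error.
        xml.sax.parseString(markup.encode("utf-8"), ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


# Hypothetical page fragments: a list inside a table cell, once closed
# properly and once with the closing </li> and </ul> missing.
GOOD = ("<html><body><table><tr><td><ul><li>item</li></ul></td></tr>"
        "</table></body></html>")
BAD = ("<html><body><table><tr><td><ul><li>item</td></tr>"
       "</table></body></html>")

print(is_well_formed(GOOD), is_well_formed(BAD))
```

A tool that parses pages as XML would fail on the second fragment in the same way, which is the breakage the well-formed output setting is meant to avoid.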
wikitech-l@lists.wikimedia.org