Dear Ariel,
I added a function to WP-MIRROR 0.7 that cleans up <title>. It removes the following namespace words from page titles:
Category: 24575
Template: 15082
Wikipedia: 4072
MediaWiki: 520
Help: 108
Module: 27
The number beside each namespace word indicates the number of <title>s found in the dump file `simplewiki-20140220-pages-articles.xml.bz2' that were cleaned up. MediaWiki now does a much better job of rendering the mirror.
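A minimal sketch of this kind of cleanup, in Python, might look like the following. It is not the WP-MIRROR code; the names `NAMESPACE_WORDS`, `clean_title`, and `tally` are hypothetical, and the prefix list is simply the six namespace words listed above.

```python
from collections import Counter

# Hypothetical prefix list: the six namespace words named in the message above.
NAMESPACE_WORDS = ("Category", "Template", "Wikipedia", "MediaWiki", "Help", "Module")


def clean_title(title, words=NAMESPACE_WORDS):
    """Return (cleaned_title, removed_word); removed_word is None if nothing was stripped."""
    for word in words:
        prefix = word + ":"
        # Only strip when something is left over after the prefix.
        if title.startswith(prefix) and len(title) > len(prefix):
            return title[len(prefix):], word
    return title, None


def tally(titles):
    """Count how many titles lost each namespace word."""
    counts = Counter()
    for title in titles:
        _, word = clean_title(title)
        if word is not None:
            counts[word] += 1
    return counts
```

Running `tally` over every <title> in a dump would give a per-namespace count of the same shape as the list above. Only one leading prefix is removed per title, so a title such as "Template:Template:X" would keep its second "Template:".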
Still, it would be nice if the dump files could be fixed.
Sincerely Yours, Kent
On 2/21/14, wp mirror wpmirrordev@gmail.com wrote:
Dear Ariel,
- Problem
The dump files contain a great number of pages where the page_title contains the namespace. These page_titles are imported (via mwxml2sql and wp-mirror) into my database. One consequence: most templates are not expanded by MediaWiki; rather, they are rendered as red-links.
- Example
(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20140220/simplewiki-20140220-pages-articles.xml.bz2 .
(shell)$ bunzip2 simplewiki-20140220-pages-articles.xml.bz2
(shell)$ cat simplewiki-20140220-pages-articles.xml | grep "Template:" | head
<title>Template:Stub</title>
<title>Template:NPOV</title>
<title>Template:Disputed</title>
<title>Template:Disambiguation</title>
<title>Template:TOC</title>
<title>Template:Uw-test1</title>
<title>Template:1911</title>
<title>Template:Please do not change this line</title>
<title>Template:Solar System</title>
<title>Template:Months</title>
(shell)$ cat simplewiki-20140220-pages-articles.xml | grep "<title>Category:" | head
<title>Category:Computer science</title>
<title>Category:Sports</title>
<title>Category:Athletics</title>
<title>Category:Body parts</title>
<title>Category:Tools</title>
<title>Category:Movies</title>
<title>Category:Grammar</title>
<title>Category:Mathematics</title>
<title>Category:Alphabet</title>
<title>Category:Countries</title>
(shell)$ cat simplewiki-20140220-pages-articles.xml | grep "<title>Help:" | head
<title>Help:How to change pages</title>
<title>Help:Minor change</title>
<title>Help:User settings</title>
<title>Help:Writing articles for Wikipedia</title>
<title>Help:Contents</title>
<title>Help:Revert a page</title>
<title>Help:Editing</title>
<title>Help:How to use images</title>
<title>Help:How to write simple English articles</title>
<title>Help:User preferences help</title>
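The same check can be made without decompressing the dump to disk. The sketch below streams the .bz2 file with Python's standard `bz2` module and returns the first few matching titles, roughly what `grep ... | head` does above. The function name `titles_with_prefix` is hypothetical, and the sketch assumes each <title> element sits on a single line, as in the output shown.

```python
import bz2
import itertools
import re

TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_with_prefix(path, prefix, limit=10):
    """Return up to `limit` titles from a .xml.bz2 dump that start with `prefix`."""
    def matches():
        # bz2.open in text mode decompresses lazily, line by line.
        with bz2.open(path, "rt", encoding="utf-8") as dump:
            for line in dump:
                m = TITLE_RE.search(line)
                if m and m.group(1).startswith(prefix):
                    yield m.group(1)
    # islice stops reading as soon as `limit` titles have been found.
    return list(itertools.islice(matches(), limit))
```

For example, `titles_with_prefix("simplewiki-20140220-pages-articles.xml.bz2", "Template:")` would be the counterpart of the first grep above, restricted to <title> lines.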
- Solution
I would like your advice as to where the solution should be attempted:
a) Should the dump file generating process be fixed?
b) Should `mwxml2sql' be altered to edit the <title> content?
c) Should `wp-mirror' be altered to edit the <title> content?
d) Should `wp-mirror' be able to detect and correct such `page_title' content in the underlying database?
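To make option d) concrete, here is a rough sketch of a detect-and-correct pass over the page table. It uses Python's `sqlite3` module with an in-memory table as a stand-in for the real MySQL database, so the SQL dialect is an assumption, as are the name `strip_prefixes` and the `PREFIXES` mapping (the usual MediaWiki namespace numbers, with "Wikipedia" as the project namespace on this wiki).

```python
import sqlite3

# Assumed namespace id -> prefix mapping for simplewiki (hypothetical constant).
PREFIXES = {4: "Wikipedia", 8: "MediaWiki", 10: "Template",
            12: "Help", 14: "Category", 828: "Module"}


def strip_prefixes(conn, prefixes=PREFIXES):
    """Remove a redundant namespace prefix from page_title; return the number of rows changed."""
    changed = 0
    for ns, word in prefixes.items():
        prefix = word + ":"
        # Only touch rows whose title repeats the prefix of their own namespace.
        cur = conn.execute(
            "UPDATE page SET page_title = substr(page_title, ?) "
            "WHERE page_namespace = ? AND substr(page_title, 1, ?) = ?",
            (len(prefix) + 1, ns, len(prefix), prefix),
        )
        changed += cur.rowcount
    return changed
```

Because the row is matched on both page_namespace and the prefix, a main-namespace page is never altered. In a real MediaWiki database the pair (page_namespace, page_title) is uniquely indexed, so a pass like this could fail if a correctly titled row already exists alongside the prefixed one.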
Sincerely Yours, Kent
On 2/21/14, gnosygnu gnosygnu@gmail.com wrote:
Hi. I believe the problem is with the import of the [[Template]] pages into the page table.
Your SQL output shows the following:
page_title: Template:Ndash
Instead, the page_title should just be "Ndash", not "Template:Ndash". Note that the page is already marked as page_namespace = 10. Also, note that no other namespace (Category, Help, Project, etc.) will have a "page_title" with the namespace name in front of it. For example, Category "Earth" will be in the page table with a page_title of "Earth", not "Category:Earth".
MediaWiki has code that takes {{Template:A}} and treats it as effectively the same as {{A}}. Note that this is just regular page transclusion by namespace: you can write "{{Category:Earth}}" and it will transclude the contents of the page "Category:Earth".
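The lookup described here can be modelled in a few lines of Python. This is a simplified illustration and not MediaWiki's own code: the function name `resolve_transclusion` and the `NAMESPACES` table are assumptions, namespace names are matched case-sensitively, and aliases are ignored.

```python
# Assumed namespace name -> id table for simplewiki (hypothetical constant).
NAMESPACES = {"Wikipedia": 4, "MediaWiki": 8, "Template": 10,
              "Help": 12, "Category": 14, "Module": 828}
NS_MAIN, NS_TEMPLATE = 0, 10


def resolve_transclusion(name):
    """Map the text inside {{...}} to a (page_namespace, page_title) lookup key."""
    name = name.strip()
    default_ns = NS_TEMPLATE           # a bare {{A}} means the page A in the Template namespace
    if name.startswith(":"):           # {{:Foo}} transcludes from the main namespace
        name, default_ns = name[1:], NS_MAIN
    prefix, sep, rest = name.partition(":")
    if sep and rest and prefix.strip() in NAMESPACES:
        ns, title = NAMESPACES[prefix.strip()], rest
    else:
        ns, title = default_ns, name
    # Titles are stored with underscores and an upper-case first letter.
    title = title.strip().replace(" ", "_")
    return ns, title[:1].upper() + title[1:]
```

Under this model, {{Ndash}} and {{Template:Ndash}} both resolve to (10, "Ndash"), while {{Template:Template:Ndash}} resolves to (10, "Template:Ndash"). A page table that stores the row as page_namespace = 10, page_title = "Template:Ndash" therefore only answers the doubly prefixed form, which matches the behaviour reported for the mirror.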
Hope this helps.
On Fri, Feb 21, 2014 at 5:21 PM, wp mirror wpmirrordev@gmail.com wrote:
Dear Sir or Madam,
I am not sure to which person or list I should address this question.
- Objective
I am in the process of building DEB packages for: WP-MIRROR 0.7, the latest development version of MediaWiki 1.23, and a set of MediaWiki extensions.
The objective is this: a page rendered by a mirror should look the same as that page rendered by the WMF site.
- Problem
In the process of testing mirrors, I noticed that many templates were not expanding, and were instead being rendered as red-links.
- Example
To illustrate, consider the Ndash template, which appears on many pages such as http://simple.wikipedia.org/wiki/August. It appears in the underlying database:
mysql> select page_id,page_title,rev_len,old_text from simplewiki.page,simplewiki.revision,simplewiki.text where page_id=rev_page and rev_text_id=old_id and page_title like 'Template:Ndash' limit 10\G
*************************** 1. row ***************************
   page_id: 132985
page_title: Template:Ndash
   rev_len: 65
  old_text: –<noinclude> [[Category:Formatting templates]]
</noinclude>
1 row in set (0.25 sec)
- Special:ExpandTemplates
To test the above example ``Template:Ndash'', I use Special:ExpandTemplates.
3.1) Input text
Today is the {{CURRENTDAY}} day.</br>
This server is {{SERVER}}, script path {{SCRIPTPATH}}, current MW version {{CURRENTVERSION}}.</br>
This site is {{SITENAME}}. Full page name is {{FULLPAGENAME}}.</br>
<table>
<tr><th>Template</th><th>Expanded</th><th>page_id</th><th>rev_len</th></tr>
<tr><td>Ndash</td><td>{{Ndash}}</td><td>{{PAGEID: Ndash}}</td><td>{{PAGESIZE: Ndash}}</td></tr>
<tr><td>Template:Ndash</td><td>{{Template:Ndash}}</td>
    <td>{{PAGEID: Template:Ndash}}</td><td>{{PAGESIZE: Template:Ndash}}</td></tr>
<tr><td>Template:Template:Ndash</td><td>{{Template:Template:Ndash}}</td>
    <td>{{PAGEID: Template:Template:Ndash}}</td><td>{{PAGESIZE: Template:Template:Ndash}}</td></tr>
</table>
3.2) http://simple.wikipedia.org/wiki/Special:ExpandTemplates Preview
Here is the result from the WMF site:
Today is the 21 day.
This server is //simple.wikipedia.org, script path /w, current MW version 1.23wmf14 (f8b9201).
This site is Wikipedia. Full page name is My template.

Template                 Expanded                 page_id  rev_len
Ndash                    -                        0        0
Template:Ndash           -                        132985   65
Template:Template:Ndash  Template:Template:Ndash  0        0
Both {{Ndash}} and {{Template:Ndash}} expand as expected.
3.3) http://simple.wikipedia.site/wiki/Special:ExpandTemplates Preview
Here is the result from the mirrored site:
Today is the 21 day.
This server is http://simple.wikipedia.site, script path /w, current MW version 1.23alpha.
This site is simplewiki. Full page name is My template.

Template                 Expanded        page_id  rev_len
Ndash                    Template:Ndash  0        0
Template:Ndash           Template:Ndash  0        0
Template:Template:Ndash  -               132985   65
Only {{Template:Template:Ndash}} expands!
- Question
Why do I need to prepend an extra ``Template:'' to make the templates work for the mirror?
Better yet: Could someone tell me where in the MediaWiki core I can find the code that takes the template (e.g. {{Ndash}} or {{Template:Ndash}}) and converts it into an SQL query that SELECTs the template expansion from the underlying database?
Sincerely Yours, Kent
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l