Hi,
Chinese Wikipedia supports several variants (zh-cn, zh-tw, zh-hk), and the same wikitext is rendered differently under each of them, e.g. "software" in zh-cn [1] and "software" in zh-tw [2]. But it seems no HTML is included in the zhwiki dump files.
Do you know where I can get the HTML version of articles on Chinese Wikipedia?
Thanks
[1] http://zh.wikipedia.org/zh-cn/%E8%BD%AF%E4%BB%B6 [2] http://zh.wikipedia.org/zh-tw/%E8%BD%AF%E4%BB%B6
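For reference, the two links above follow a simple pattern: the variant code is a path segment and the title is UTF-8 percent-encoded. A minimal sketch of that pattern, where the function name `variant_url` and the default host are only illustrative assumptions:

```python
from urllib.parse import quote


def variant_url(title: str, variant: str, host: str = "zh.wikipedia.org") -> str:
    """Build the URL that serves `title` rendered in one language variant."""
    # Spaces in titles are written as underscores; everything non-ASCII is
    # percent-encoded as UTF-8 by urllib.parse.quote.
    return f"http://{host}/{variant}/{quote(title.replace(' ', '_'))}"


if __name__ == "__main__":
    for variant in ("zh-cn", "zh-tw", "zh-hk"):
        print(variant_url("软件", variant))
```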
On 20/03/13 15:43, Jiang BIAN wrote:
Hi,
Chinese Wikipedia supports several variants (zh-cn, zh-tw, zh-hk), and the same wikitext is rendered differently under each of them, e.g. "software" in zh-cn [1] and "software" in zh-tw [2]. But it seems no HTML is included in the zhwiki dump files.
That's right. The dumps contain the wikitext source, not the rendered HTML (both variants are created from that wikitext).
Do you know where I can get the HTML version of articles on Chinese Wikipedia?
It can be rendered with MediaWiki: https://www.mediawiki.org/
You may also be interested in https://www.mediawiki.org/wiki/Extension:DumpHTML
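Since the dumps hold only wikitext, a first step is usually just reading the pages out of the XML. The following is a rough sketch using only the Python standard library; it assumes the usual `<page>`/`<title>`/`<revision>`/`<text>` layout of a pages-articles export, and the function names are made up for illustration. It does no rendering at all.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's subtree to limit memory use


def iter_dump(path):
    """Stream pages straight from a .xml.bz2 dump file without unpacking it."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

Calling `iter_dump` on a downloaded pages-articles `.xml.bz2` file yields the raw wikitext of each page, which still has to be rendered by MediaWiki to get per-variant HTML.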
If we render the content using the MediaWiki software and that extension, will the output be the same as on Chinese Wikipedia?
I know some content also relies on the site config/settings, e.g. parsing interwiki links like [[:File:Mediawiki.png]] and [[:fr:Help:Link]].
Also, variant rendering seems to need particular templates/categories, e.g. {{noteTA}} is a template defined on zh.wikipedia.org [1] but not on en.wikipedia.org [2].
[1] http://zh.wikipedia.org/wiki/Template:NoteTA [2] http://en.wikipedia.org/wiki/Template:NoteTA
On 21/03/13 02:50, Jiang BIAN wrote:
If we render the content using the MediaWiki software and that extension, will the output be the same as on Chinese Wikipedia?
Yes...
I know some content also relies on the site config/settings, e.g. parsing interwiki links like [[:File:Mediawiki.png]] and [[:fr:Help:Link]].
...but you will need a configuration similar to Wikipedia's, and you will need to install most of the extensions used on Chinese Wikipedia.
Also, variant rendering seems to need particular templates/categories, e.g. {{noteTA}} is a template defined on zh.wikipedia.org [1] but not on en.wikipedia.org [2].
[1] http://zh.wikipedia.org/wiki/Template:NoteTA [2] http://en.wikipedia.org/wiki/Template:NoteTA
zh.wikipedia.org only uses templates defined on zh.wikipedia.org; it is completely independent of the pages at en.wikipedia.org. (The only exception is that it can use *files* from commons.wikimedia.org, as if you had InstantCommons enabled.)
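To illustrate why the link examples above depend on site configuration: whether a prefix such as "File" or "fr" is treated as a namespace, an interwiki prefix, or just part of a title depends on tables the wiki is configured with. The toy sketch below is not MediaWiki's actual parser; the two prefix sets are hypothetical, heavily abbreviated stand-ins for the real configuration.

```python
# Hypothetical, heavily abbreviated stand-ins for the real site configuration.
NAMESPACES = {"file", "category", "template", "help"}
INTERWIKI_PREFIXES = {"en", "fr", "commons"}


def classify_link(target, namespaces=NAMESPACES, interwikis=INTERWIKI_PREFIXES):
    """Return (kind, prefix, rest) for the target of a [[...]] link."""
    # A leading colon asks for a plain link instead of embedding a file
    # or adding the page to a category.
    target = target.lstrip(":")
    prefix, sep, rest = target.partition(":")
    if sep:
        key = prefix.strip().lower()
        if key in namespaces:
            return ("namespace", key, rest)
        if key in interwikis:
            return ("interwiki", key, rest)
    # Unknown prefix: the whole string is an ordinary page title.
    return ("article", "", target)
```

With these sets, `[[:fr:Help:Link]]` is an interwiki link; on a wiki where `fr` is not configured, the same wikitext would point at a local page literally titled "fr:Help:Link", which is why a local render needs the same setup.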
The only fifth exception is Wikidata:
So you need
1. MediaWiki + the same extensions zh:Wikipedia uses
2. The dumps (including cats and templates)
3. InstantCommons (hadn't heard of this before, sounds cool)
4. The config setup of zh:Wikipedia, most of which is public (including all the bits you need, probably)
5. Some way to talk to Wikidata
All the best, Richard.
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
On 21/03/13 21:02, Richard Farmbrough wrote:
The only fifth exception is Wikidata:
So you need
1. MediaWiki + the same extensions zh:Wikipedia uses
2. The dumps (including cats and templates)
3. InstantCommons (hadn't heard of this before, sounds cool)
4. The config setup of zh:Wikipedia, most of which is public (including all the bits you need, probably)
5. Some way to talk to Wikidata
All the best, Richard.
Good point, although at this point he probably doesn't need Wikidata yet.
Notes:
2) The category and template link tables aren't *required* (they can be derived from the wikitext), but if you are going to work with them, downloading the SQL dumps can speed things up.
3) https://www.mediawiki.org/wiki/InstantCommons You only need to set $wgUseInstantCommons = true; in LocalSettings.php to have images work.
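As a rough illustration of note 2, template and category links can be approximated directly from the wikitext with regular expressions. This is only a sketch: the function names are invented here, it ignores redirects, `<nowiki>` sections and transcluded content, and the default category prefix is an assumption (a zh wiki may also accept localized prefixes, which can be passed in).

```python
import re

# "{{Name" followed by "|" or "}}". The lookarounds skip {{{parameter}}}
# syntax; excluding "#" and ":" skips parser functions and magic words.
TEMPLATE_RE = re.compile(r"(?<!\{)\{\{(?!\{)\s*([^{}|#:]+?)\s*(?:\||\}\})")


def template_names(wikitext):
    """Return the names of templates transcluded directly in `wikitext`."""
    return TEMPLATE_RE.findall(wikitext)


def category_links(wikitext, prefixes=("Category",)):
    """Return category names linked from `wikitext` for the given prefixes."""
    names = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(
        r"\[\[\s*(?:%s)\s*:\s*([^\]|]+?)\s*(?:\||\]\])" % names,
        re.IGNORECASE,
    )
    return pattern.findall(wikitext)
```

For anything beyond a quick survey, the SQL link tables mentioned in the note are the more reliable source, since they reflect what MediaWiki itself computed.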
Thanks for the detailed instructions. A few minor things are still not clear to me; see inline:
On Thu, Mar 21, 2013 at 1:02 PM, Richard Farmbrough <richard@farmbrough.co.uk> wrote:
The only fifth exception is Wikidata:
So you need
1. MediaWiki + the same extensions zh:Wikipedia uses
How can I find out which extensions are used on zh:Wikipedia, and what its config is?
2. The dumps (including cats and templates)
It looks to me like zhwiki-20130315-pages-articles-multistream.xml.bz2 (http://dumps.wikimedia.org/zhwiki/20130315/zhwiki-20130315-pages-articles-multistream.xml.bz2) is the one I want, right?
3. InstantCommons (hadn't heard of this before, sounds cool)
Sounds like this is a config setting, right?
4. The config setup of zh:Wikipedia, most of which is public (including all the bits you need, probably)
5. Some way to talk to Wikidata
All the best, Richard.
On 22/03/13 01:16, Jiang BIAN wrote:
Thanks for the detailed instructions. A few minor things are still not clear to me; see inline:
On Thu, Mar 21, 2013 at 1:02 PM, Richard Farmbrough <richard@farmbrough.co.uk> wrote:
The only fifth exception is Wikidata:
So you need
1. MediaWiki + the same extensions zh:Wikipedia uses
How can I find out which extensions are used on zh:Wikipedia, and what its config is?
The configuration of the Wikipedias is at http://noc.wikimedia.org/conf/ (although it is not very "clean"...)
For the list of extensions used, I recommend looking at http://zh.wikipedia.org/wiki/Special:Version
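The same list that Special:Version shows can also be read in machine-readable form through the MediaWiki API's siteinfo query. A small sketch follows, assuming the standard `/w/api.php` endpoint over HTTPS; the helper names are illustrative only, and the parsing function expects the JSON text of the response.

```python
import json
from urllib.parse import urlencode


def siteinfo_url(host="zh.wikipedia.org"):
    """Build an API URL asking the wiki which extensions it has installed."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "extensions",
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def extension_names(response_text):
    """Extract the sorted extension names from the JSON body of the response."""
    data = json.loads(response_text)
    return sorted(ext["name"] for ext in data["query"]["extensions"])
```

The response body can be fetched with any HTTP client (for example `urllib.request.urlopen(siteinfo_url())`) and passed to `extension_names` as text.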
2. The dumps (including cats and templates)
It looks to me like zhwiki-20130315-pages-articles-multistream.xml.bz2 (http://dumps.wikimedia.org/zhwiki/20130315/zhwiki-20130315-pages-articles-multistream.xml.bz2) is the one I want, right?
Yes, either zhwiki-20130330-pages-articles.xml.bz2 or zhwiki-20130330-pages-articles.xml.bz2 should be enough (it's the same content), as you probably only need the articles.
3. InstantCommons (hadn't heard of this before, sounds cool)
Sounds like this is a config setting, right?
Right.
4. The config setup of zh:Wikipedia, most of which is public (including all the bits you need, probably)
5. Some way to talk to Wikidata
All the best, Richard.
Yes, either zhwiki-20130330-pages-articles.xml.bz2 or zhwiki-20130330-pages-articles.xml.bz2 should be enough (it's the same content), as you probably only need the articles.
Those are the same files. Did you mean zhwiki-20130330-pages-articles.xml.7z for the second one?
Petr Onderka [[en:User:Svick]]
Thanks!