Hi,
I don't know if this issue has come up already. If it has and was
dismissed, I beg your pardon. If it hasn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the XML dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't have. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804) together with some test
results obtained a few hours before that.
The results indicate that:
bzip2 and pbzip2 are mutually compatible: each one can create
archives that the other one can read. When it comes to decompressing,
however, only pbzip2-compressed archives work well with pbunzip2.
I propose compressing the archives with pbzip2 for the following
reasons:
1) If your archiving machines are SMP systems, this could lead to
better use of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should run for these people as usual.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs in the
host.
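As a rough illustration of point 2, here is a minimal Python sketch. It assumes (this is not stated above) that a pbzip2 archive is essentially a concatenation of independently compressed bzip2 streams, one per chunk. The helper name compress_in_chunks and the chunk size are made up for the example, and the sketch runs sequentially rather than in parallel.

```python
import bz2


def compress_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Compress each chunk as its own bzip2 stream and concatenate the streams.

    Hypothetical helper: it mimics the assumed layout of a pbzip2 archive,
    without doing any of the work in parallel.
    """
    streams = []
    for start in range(0, len(data), chunk_size):
        streams.append(bz2.compress(data[start:start + chunk_size]))
    return b"".join(streams)


# Made-up payload standing in for a piece of an XML dump.
payload = b"<page><title>Example</title></page>\n" * 1000

multi_stream = compress_in_chunks(payload, chunk_size=4096)

# bz2.decompress() accepts input made of several concatenated streams
# (Python 3.3 and later), so an ordinary reader recovers the full payload.
restored = bz2.decompress(multi_stream)
```

Because every chunk is a complete stream of its own, a parallel decompressor could in principle hand each one to a different CPU, which is where the speedup in point 3 would come from.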
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And that's just because pbunzip2 is slightly buggy. Isn't that
interesting? :-)
cheers,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com Geschäftsführer: Richard Jelinek
Human Language Technology Experts Sitz der Gesellschaft: Fürth
69216618 Mind Units Registergericht: AG Fürth, HRB-9201
Hi. I'm not sure if this is a dump issue, but I thought I'd start off here.
I had a user report a missing page in zh.wiktionary.org:
https://sourceforge.net/p/xowa/tickets/291/. It seems that a Main namespace
page (學生) references a template (Template:漢語寫法) that is not in the dump.
Here are more details:
* The referencing page is zh.wiktionary.org/wiki/學生
<http://zh.wiktionary.org/wiki/%E5%AD%B8%E7%94%9F> (or %E5%AD%B8%E7%94%9F)
* It uses a template from https://zh.wiktionary.org/wiki/Template:漢語寫法 or
Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95
* When I plug this URL into my Firefox address bar, I get redirected to
https://zh.wiktionary.org/wiki/Template:汉语写法 or
Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95
* I can't read Chinese, but Google translate tells me that both terms mean
"Chinese wording", so this looks like a redirect to me
* However, unlike other redirects, I don't get a "Redirected from" link.
- For example, https://zh.wiktionary.org/wiki/Template:En redirects me to
Template:en but shows a link for "(重定向自Template:En)"
- Template:汉语写法 doesn't show any corresponding "redirected from" link
* More importantly, Template:漢語寫法 or
Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95 is not in the dump file
* I've tried grepping the XML for both 漢語寫法</title> and
%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95</title> but neither returns a match
* However, grepping for 汉语写法</title> (the redirected page) does return one
match:
$ grep -f zhfind.txt zhwiktionary-latest-pages-articles.xml
<title>Template:汉语写法</title>
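The same check can be sketched in Python, which also makes it easy to confirm that the percent-encoded strings above really correspond to the two titles. This is only a sketch: find_titles and the three sample lines are made up for illustration, and it assumes each <title> element sits on a single line, as in the grep output above. Titles containing characters such as & are stored XML-escaped in the dump and would need unescaping first.

```python
import re
from urllib.parse import quote

TITLE_RE = re.compile(r"<title>(.*?)</title>")


def find_titles(lines, wanted):
    """Return the subset of `wanted` titles that appear in <title> elements."""
    wanted = set(wanted)
    found = set()
    for line in lines:
        match = TITLE_RE.search(line)
        if match and match.group(1) in wanted:
            found.add(match.group(1))
    return found


traditional = "Template:漢語寫法"
simplified = "Template:汉语写法"

# Made-up stand-in for a few lines of the dump file.
sample = [
    "  <page>",
    "    <title>Template:汉语写法</title>",
    "  </page>",
]

found = find_titles(sample, [traditional, simplified])

# Percent-encoding of the UTF-8 bytes, for comparison with the URLs above.
encoded_traditional = quote("漢語寫法")
encoded_simplified = quote("汉语写法")
```

For the real dump, the sample list could be replaced by an open file object, for example open("zhwiktionary-latest-pages-articles.xml", encoding="utf-8"), so that the file is read line by line rather than loaded into memory.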
Is this a configuration issue with the language / wiki? Or is the page
itself defective?
Any pointers / help would be appreciated.
Thanks.
xmldatadumps-l is the mailing list for this question.
On Wed, Dec 11, 2013 at 5:12 PM, Liu Chenheng <liuchenheng(a)gmail.com> wrote:
> Hi, All,
>
> I found that all the XML files for log events are incorrect, i.e. they
> are missing the final </mediawiki> tag.
>
> For example:
> http://dumps.wikimedia.org/szlwiki/20131210/szlwiki-20131210-pages-logging.…
>
> Thanks.
> Chenheng Liu
>
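A quick way to test a dump for the problem described in the quoted message is to look at the end of the file. The sketch below assumes the dump has already been decompressed to a plain XML file. The function name ends_with_closing_tag and the 1024-byte tail size are arbitrary choices for the example.

```python
import os


def ends_with_closing_tag(path, tag=b"</mediawiki>", tail_bytes=1024):
    """Return True if the file's last non-whitespace bytes are `tag`.

    Only the final `tail_bytes` bytes are read, so this stays cheap
    even for very large dump files.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        handle.seek(max(0, size - tail_bytes))
        tail = handle.read()
    return tail.rstrip().endswith(tag)
```

A file that fails this check is missing its closing tag, although the check cannot tell whether the dump was cut off part-way or only the final tag was left out.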
--
Jiang BIAN
Hello! I am using WikiTaxi to import wiki databases and use them
offline. I've never had issues before, but the latest two English
Wiktionary databases haven't been working correctly. As seen in the
screenshot, the translations from various languages can't be seen;
instead I can see only the code.
I noticed that first in the data dump from November but thought it was
a one-time issue, as sometimes happens when I import a fresh database.
But the same happened with the latest December dump.
The last data dump that didn't give me any issues was from September, I think.
Does anyone have any ideas why this is happening now and how to make it work?
Thank you in advance