Xmldatadumps-l December 2013

xmldatadumps-l@lists.wikimedia.org

5 participants
4 discussions

pbzip2 proposal
by Richard Jelinek 16 Jan '16


Hi,

I don't know whether this issue has come up already; if it has and was dismissed, I beg your pardon. If it hasn't, I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used to compress the XML dumps instead of bzip2. Why? Because its sibling (pbunzip2) has a bug that bunzip2 doesn't have. :-) Strange? Read on.

A few hours ago, I filed a bug report for pbzip2 (see https://bugs.launchpad.net/pbzip2/+bug/922804), together with test results from a few hours before that. The results indicate that bzip2 and pbzip2 are mutually compatible: each one can read the archives the other one creates. But when it comes to decompressing, only pbzip2-compressed archives work with pbunzip2.

I propose compressing the archives with pbzip2 for the following reasons:

1) If your archiving machines are SMP systems, this could make better use of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2, so everything should keep working for them as usual.
3) pbzip2-compressed archives can be decompressed with pbunzip2 with a speedup that scales nearly linearly with the number of CPUs in the host.

So to sum up: it's a no-lose, two-win situation if you migrate to pbzip2. And that's just because pbunzip2 is slightly buggy. Isn't that interesting? :-)

cheers,
--
Dipl.-Inf. Univ. Richard C. Jelinek

PetaMem GmbH - www.petamem.com
Human Language Technology Experts
69216618 Mind Units
Managing Director: Richard Jelinek
Registered office: Fürth
Registration court: AG Fürth, HRB-9201
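The compatibility described above follows from how pbzip2 is generally understood to work: it splits the input into chunks, compresses each chunk as a complete, independent bzip2 stream, and concatenates the streams. Plain bunzip2 reads concatenated streams one after another, while a parallel decompressor can only split work at stream boundaries, which a single-stream bzip2 archive does not have. The Python sketch below imitates that layout; `parallel_bz2_compress`, `count_bz2_streams` and the chunk size are hypothetical illustrations, not pbzip2's actual code or defaults.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for illustration only; pbzip2 picks its own
# block size, which this sketch does not try to reproduce.
DEFAULT_CHUNK = 900_000


def parallel_bz2_compress(data: bytes, chunk_size: int = DEFAULT_CHUNK,
                          workers: int = 4) -> bytes:
    """Compress each chunk as an independent bzip2 stream and concatenate them."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) stream
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


def count_bz2_streams(blob: bytes) -> int:
    """Walk a (possibly multi-stream) bzip2 payload and count its streams."""
    count = 0
    while blob:
        decomp = bz2.BZ2Decompressor()
        decomp.decompress(blob)
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        blob = decomp.unused_data  # bytes following this stream's end marker
        count += 1
    return count
```

Python's `bz2.decompress` reads every concatenated stream in turn, much as plain bunzip2 does, so the multi-stream output round-trips through an ordinary single-threaded reader. A file from single-threaded `bz2.compress` counts as one stream, which leaves nothing for a stream-splitting decompressor to parallelize.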


Missing page in zh.wiktionary.org dump?: https://zh.wiktionary.org/wiki/Template:漢語寫法 or %E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95
by gnosygnu 20 Dec '13


Hi.

I'm not sure if this is a dump issue, but I thought I'd start here. A user reported a missing page in zh.wiktionary.org: https://sourceforge.net/p/xowa/tickets/291/. It seems that a Main namespace page (學生) references a template (Template:漢語寫法) that is not in the dump.

Here are more details:

* The referencing page is zh.wiktionary.org/wiki/學生 (or %E5%AD%B8%E7%94%9F)
* It uses the template https://zh.wiktionary.org/wiki/Template:漢語寫法 (or Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95)
* When I plug this URL into my Firefox address bar, I get redirected to https://zh.wiktionary.org/wiki/Template:汉语写法 (or Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95)
* I can't read Chinese, but Google Translate tells me that both terms mean "Chinese wording", so this looks like a redirect to me
* However, unlike other redirects, I don't get a "Redirected from" link.
  - For example, https://zh.wiktionary.org/wiki/Template:En redirects me to Template:en but shows a "（重定向自Template:En)" ("Redirected from Template:En") link
  - Template:汉语写法 doesn't show any corresponding "Redirected from" link
* More importantly, Template:漢語寫法 (or Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95) is not in the dump file
* I've tried grepping the XML for both 漢語寫法</title> and %E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95</title>, but neither returns a match
* However, grepping for 汉语写法</title> (the redirect target) does return one match:

    $ grep zhwiktionary-latest-pages-articles.xml -f zhfind.txt
    <title>Template:汉语写法</title>

Is this a configuration issue with the language / wiki? Or is the page itself defective? Any pointers / help would be appreciated.

Thanks.
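The grep check in this report can be reproduced in Python, as a sketch assuming the uncompressed pages-articles XML is read line by line as UTF-8. It shows that the traditional title (漢語寫法) and the simplified one (汉语写法) percent-encode to different byte sequences, and that dump `<title>` elements hold the plain, unencoded title (as the grep output above shows), so searching for the `%E6...` form cannot match. The helper names `percent_encode_title` and `find_titles` are hypothetical.

```python
import html
import re
from urllib.parse import quote

# Matches the same "<title>...</title>" lines the grep in the post targets.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def percent_encode_title(title: str) -> str:
    """Percent-encode a page title as UTF-8, keeping ':' and '/' readable."""
    return quote(title, safe=":/")


def find_titles(lines, wanted):
    """Return the subset of `wanted` titles that appear as <title> elements."""
    wanted = set(wanted)
    found = set()
    for line in lines:
        match = TITLE_RE.search(line)
        if match:
            # Titles in the XML are entity-escaped (e.g. &amp;), so unescape first.
            title = html.unescape(match.group(1))
            if title in wanted:
                found.add(title)
    return found
```

Usage against a local copy of the dump, for example: `with open("zhwiktionary-latest-pages-articles.xml", encoding="utf-8") as f: print(find_titles(f, ["Template:漢語寫法", "Template:汉语写法"]))`. Comparing decoded titles rather than raw bytes avoids false misses when a title contains characters that the XML escapes.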


Re: [Xmldatadumps-l] [Mediawiki-api] The format of log events dumps is incorrect.
by Jiang BIAN 11 Dec '13


xmldatadumps-l is the mailing list for this question.

On Wed, Dec 11, 2013 at 5:12 PM, Liu Chenheng <liuchenheng(a)gmail.com> wrote:
> Hi, All,
>
> I found that all the XML files for log events are incorrect, i.e. they are missing
> the final </mediawiki> tag.
>
> For example:
> http://dumps.wikimedia.org/szlwiki/20131210/szlwiki-20131210-pages-logging.…
>
> Thanks.
> Chenheng Liu
>
> _______________________________________________
> Mediawiki-api mailing list
> Mediawiki-api(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

--
Jiang BIAN

This email may be confidential or privileged. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it went to the wrong person. Thanks.
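A quick way to confirm the reported problem on a downloaded file is to stream it and check whether its last non-whitespace bytes are the closing root tag. This is a minimal Python sketch; the helper name `ends_with_closing_tag` is hypothetical, and it assumes a file is gzip-compressed exactly when its name ends in `.gz` (the example URL above is truncated, so the real extension is not stated here).

```python
import gzip


def ends_with_closing_tag(path: str, tag: bytes = b"</mediawiki>",
                          block_size: int = 1 << 16) -> bool:
    """Stream a (possibly gzipped) dump and check that it ends with `tag`."""
    # Assumption: ".gz" suffix means gzip; anything else is read as-is.
    opener = gzip.open if path.endswith(".gz") else open
    keep = len(tag) + 4096  # slack for trailing whitespace after the tag
    tail = b""
    with opener(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # Keep only the last `keep` bytes so memory stays bounded.
            tail = (tail + block)[-keep:]
    return tail.rstrip().endswith(tag)
```

Only the tail is kept in memory, so this works on large dumps, but it checks nothing except the final tag; a full well-formedness check would need an actual XML parser.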


ENwiktionary database dump issue
by Tina Lukša 10 Dec '13


Hello! I use WikiTaxi to import wiki databases for offline use. I've never had issues before, but the latest two English Wiktionary databases haven't been working correctly. As seen in the screenshot, the translations in various languages can't be seen; instead I see only the code. I first noticed this in the November data dump but thought it was a one-time issue, as sometimes happens when I import a fresh database. But the same thing happened with the latest December dump. The latest data dump that didn't give me any issues was from September, I think. Does anyone have any idea why this is happening now and how to make it work? Thank you in advance.
