> Date: Fri, 19 Feb 2010 18:25:50 +0100
> From: Tomasz Finc <tfinc(a)wikimedia.org>
> Subject: Re: [Wikitech-l] enwiki complete page edit history
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Message-ID: <4B7EC99E.4040907(a)wikimedia.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> >
> > The pages-meta-history.xml.bz2 is showing 115.4GB written (in
> > progress) at:
> > http://download.wikipedia.org/enwiki/20100130/
> >
> > The older pages-meta-history.xml.bz2 from
> > http://download.wikipedia.org/enwiki/20091128/ shows 255.1GB
> > written (failed build)
> >
> > So once the 20100130 current pages-meta-history.xml.bz2 dump
> > is finished writing, will it be over 255GB
> > as it is newer than the older copy and contains more info?
>
> Correct.
>
> >
> > Also these big files aren't weblinked for download lately I
> > noticed. I think they should be as they contain
> > the full wikipedia history/discussion pages which have
> > humongous amounts of useful information that should be
> > available for easy distribution. What is the
> > reason they aren't weblinked, the bandwidth costs?
>
> Do you mean that the failed runs aren't web linked? If so then I'd
> rather not point people to corrupted files.
Hi Tomasz,
I don't think any "pages-meta-history.xml.bz2" or "pages-meta-history.xml.7z" files for enwiki (failed or successful) are currently weblinked on the Wikimedia download server. There must be a successful enwiki "pages-meta-history" dump from 2009 floating around somewhere (September 2009, at a guess), and the last successful dump should always stay linked for download. If you have a copy of the latest successful "pages-meta-history" build (.bz2 or .7z) for enwiki, I'd appreciate it if you posted a link, thanks.
cheers,
Jamie
>
> --tomasz
>
Hi all, I'm new to the list. Please be patient if my questions are not very
deep or have been discussed previously (I'm mainly an "it.wikisourcian").
1. As far as I can see, the .png images generated by the <math> tag have a white
background. This is a problem when such images are placed on a page with a
coloured background. Would it be possible to replace the white background with a
transparent one?
2. I know that the <math> tag accepts CSS styling, so it's easy to
resize/align the resulting .png image (a very useful trick for simple inline
formulas). However, resizing ruins the default font, since its glyphs are
light with thin strokes. Is there a trick to solve this issue?
Thanks!
--
Alex
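On question 1, until the renderer produces transparent PNGs itself, one blunt workaround is post-processing the generated images so pure white becomes transparency. A rough sketch using the Python Imaging Library (the file names are just examples, and anti-aliased edges will still carry a white tint against dark backgrounds):

# Rough workaround sketch: turn the white background of a rendered formula PNG
# into transparency. Requires the Python Imaging Library; filenames are examples.
from PIL import Image

img = Image.open("formula.png").convert("RGBA")
pixels = [
    (r, g, b, 0) if (r, g, b) == (255, 255, 255) else (r, g, b, a)
    for (r, g, b, a) in img.getdata()
]
img.putdata(pixels)
img.save("formula-transparent.png")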
Hi,
There hasn't been a successful pages-meta-history.xml.bz2 or pages-meta-history.xml.7z dump on the http://download.wikimedia.org/enwiki/ site in the last five dump runs. How is the new dump system for these large wiki files coming along? I am personally a bit concerned that these files haven't been available for at least ~4 months. Maybe the problems could be publicized to get more feedback on how to fix them, instead of just telling us that:
The current dump system is not sustainable on very large wikis and is
being replaced. You'll hear about it when we have the new one in place. :)
-- brion
Sorry for complaining, but this has been broken for a long time now. What are the details of the problem?
I hope you are also planning to add some way to download the Wikimedia Commons images at some point. I was thinking a multi-file torrent could work, with the images used on enwiki in one file and those for the other wikis in other files. The enwiki images could also be split into popularity tiers based on access.log files, so a space-restricted user might download only the folder labelled "top10percent" and get the ten percent most popular images on Wikipedia, which could still make a fairly complete encyclopedia for most offline users while saving 90% of the disk space. Creating a multi-file torrent like this is standard practice; if you have downloaded from The Pirate Bay you know what I mean. The only drawback with torrents is the lack of geographical awareness in the data transfer, as someone mentioned before, but I think the decentralized nature of BitTorrent, with many possible uploaders, makes this irrelevant, since Wikimedia won't be paying for the bandwidth if other people choose to seed the torrents.
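To make the popularity-bucketing idea concrete, here is a rough sketch; the log format, the image extensions, and the flat 10% cut-off are all assumptions on my part, not anything Wikimedia actually publishes in this form:

# Hypothetical sketch: pick the most-requested images from an access log so
# they can go into a "top10percent" folder of a multi-file torrent.
from collections import Counter

hits = Counter()
with open("access.log") as log:      # assumed Apache-style log, request path in field 7
    for line in log:
        fields = line.split()
        if len(fields) > 6 and fields[6].endswith((".jpg", ".png", ".gif", ".svg")):
            hits[fields[6]] += 1

ranked = [name for name, _ in hits.most_common()]   # most requested first
top = ranked[: max(1, len(ranked) // 10)]           # keep the top 10% of distinct images

with open("top10percent.txt", "w") as out:
    out.write("\n".join(top) + "\n")
print("kept %d of %d images" % (len(top), len(ranked)))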
What about www.wikipirate.org or wikitorrent.org for a list of wikimedia torrents? :)
both are available!
cheers,
Jamie
Date: Wed, 17 Feb 2010 05:01:43 +0100
From: Tomasz Finc <tfinc(a)wikimedia.org>
Subject: Re: [Wikitech-l] enwiki complete page edit history
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID: <4B7B6A27.9040200(a)wikimedia.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
It sadly failed as noted in
http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/2010-January/0000…
I've updated the index to clear that up.
--tomasz
Hi Tomasz,
The pages-meta-history.xml.bz2 is showing 115.4GB written (in progress) at:
http://download.wikipedia.org/enwiki/20100130/
The older pages-meta-history.xml.bz2 from http://download.wikipedia.org/enwiki/20091128/
shows 255.1GB written (failed build)
So once the 20100130 current pages-meta-history.xml.bz2 dump is finished writing, will it be over 255GB as it is newer than the older copy and contains more info?
Also, I noticed these big files haven't been weblinked for download lately. I think they should be, since they contain the full Wikipedia history/discussion pages, which hold a humongous amount of useful information that should be available for easy distribution. What is the reason they aren't weblinked; is it the bandwidth cost?
cheers,
Jamie
Jamie Morken wrote:
> Hi,
>
> I was looking at the enwiki dump progress and noticed the file size for the enwiki pages-meta-history.xml.bz2 has decreased
> from 255GB on 20100125 down to 105GB on 20100203. Is it possible that
> old page revision edit data is being lost due to the smaller archive file
> size?
>
> 2009-12-03 12:53:43 in-progress: All pages with complete page edit history (.bz2)
>     2010-01-25 16:02:21: enwiki 14833408 pages (3.231/sec), 284292000 revs
>     (61.930/sec), 54.7% prefetched, ETA 2010-02-03 02:34:19 [max 329446505]
>     pages-meta-history.xml.bz2  255.1 GB (written)
>
> 2010-02-03 17:28:43 in-progress: All pages with complete page edit history (.bz2)
>     2010-02-16 00:32:55: enwiki 747550 pages (0.704/sec), 95964000 revs
>     (90.340/sec), 95.8% prefetched, ETA 2010-03-19 12:10:50 [max 341714004]
>     pages-meta-history.xml.bz2  105.1 GB (written)
>
> (Both entries carry the site's standard note: "These dumps can be *very*
> large, uncompressing up to 20 times the archive download size. Suitable for
> archival and statistical use, most mirror sites won't want or need this.")
> cheers,
> Jamie
>
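For a rough sense of how big the finished 20100130 file might get, a naive linear extrapolation from the numbers quoted above (assuming compressed bytes per revision stay roughly constant, which is only a guess):

# Back-of-the-envelope extrapolation from the figures quoted above; assumes
# compressed bytes per revision stay roughly constant, which is only a guess.
written_gb = 105.1        # GB written so far in the 20100203 run
revs_done = 95964000      # revisions dumped so far
revs_total = 341714004    # "max" revisions reported for this run

estimate = written_gb * revs_total / revs_done
print("projected final size: about %.0f GB" % estimate)   # roughly 374 GB

That lands around 374 GB, comfortably past the older 255 GB figure.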
Hi
I wanted to use http://en.wikipedia.org/wiki/Template:Clayden on a wiki, so I
went to Special:Export, added Template:Clayden, checked "Include templates",
and imported the resulting file into a new MediaWiki install. I expected to
get the same output as on enwiki, but I got an error instead.
enwiki:
J. P. Clayden, N. Greeves, S. G. Warren, P. D. Wothers (2000), Organic
Chemistry (1st ed.), Oxford: Oxford University Press, ISBN 978-0-19-850346-0
mywiki:
J. P. Clayden, N. Greeves, S. G. Warren, P. D. Wothers (2000), [Expression
error: Missing operand for > Organic Chemistry] (1st ed.), Oxford: Oxford
University Press, ISBN 978-0-19-850346-0
My question is: why does that happen, and how can I fix it? Should I import the
template with a different method?
MediaWiki 1.15.1
PHP 5.2.10
ParserFunctions (Version 1.1.1)
$wgUseTidy = true;
Hi,
>
> Well, Google's translate service is an example of exactly what
> they were
> *trying* to block, people hotloading Wikipedia for fun and profit.
>
I am sure the intentions are good; people just want to protect Wikipedia
(including me). I am most interested in learning how to make a local searchable
backup of the full Wikipedia; it seems a bit tricky with the XML format. I plan
on setting up Apache/Lucene/MySQL/PHP/MediaWiki and seeing how it works! :)
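As a starting point, the dump can at least be read as a stream rather than loaded whole. A minimal Python sketch; the element names follow the pages-articles export-0.4 schema as I understand it, and the filename is just the 20100130 dump as an example:

# Minimal sketch: stream pages out of a MediaWiki XML dump without loading
# the whole file into memory (element names per the export-0.4 schema; adjust
# the namespace string if your dump declares a different version).
import bz2
import xml.etree.ElementTree as etree

NS = "{http://www.mediawiki.org/xml/export-0.4/}"

def pages(path):
    with bz2.open(path, "rb") as f:
        for _event, elem in etree.iterparse(f):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                text = elem.findtext(NS + "revision/" + NS + "text") or ""
                yield title, text
                elem.clear()   # drop the parsed subtree so memory stays bounded

if __name__ == "__main__":
    for title, text in pages("enwiki-20100130-pages-articles.xml.bz2"):
        print(title, len(text))   # feed these into Lucene/MySQL instead of printing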
PS: there are a lot of question marks showing up in my posts, but I am not putting most of them there; all my real questions have been kindly answered so far, thanks!
cheers,
Jamie
I can imagine a URL like this:
http://someserver.com/wiki/api.php?wikicode=<code here, urlencoded>&format=json&device=handheld
as a way of asking the MediaWiki installed on someserver.com to render the
provided wikicode as HTML inside JSON.
Does MediaWiki already support something like this?
It would be quite interesting, as it makes the engine "portable": any device
could render wikicode into HTML, even client-side JavaScript (with a single
AJAX call).
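Something very close to this already exists in the MediaWiki web API: action=parse can take raw wikitext and return the rendered HTML wrapped in JSON. There is no device=handheld switch as far as I know, and the parameter names below are from memory, so check api.php's self-documentation on your install. A rough Python sketch:

# Rough sketch: ask a MediaWiki install to render wikitext to HTML via the
# web API's action=parse (parameter names from memory; api.php documents itself).
import json
import urllib.parse
import urllib.request

def render(api_url, wikitext):
    data = urllib.parse.urlencode({
        "action": "parse",
        "format": "json",
        "text": wikitext,      # raw wikicode to render
        "prop": "text",        # ask only for the rendered HTML
    }).encode("utf-8")
    with urllib.request.urlopen(api_url, data) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    return reply["parse"]["text"]["*"]   # the HTML lives under parse.text.*

if __name__ == "__main__":
    html = render("http://someserver.com/wiki/api.php",
                  "'''Hello''' from [[wikitext]]")
    print(html)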
--
End of the Message.
Hi,
I saw this thread back in October where someone was having trouble
importing the English Wikipedia XML dump:
http://lists.wikimedia.org/pipermail/wikitech-l/2009-October/045594.html
That thread seemed to end without resolution, and the tools still seem to be
broken, so has anyone found a solution in the meantime?
I'm using mediawiki-1.15.1 and attempting to import
enwiki-20100130-pages-articles.xml.bz2.
None of these options seem to work:
1) importDump.php
Fails by repeatedly spewing "Warning: xml_parse(): Unable to call handler
in_() in ./includes/Import.php on line 437".
2) xml2sql (http://meta.wikimedia.org/wiki/Xml2sql):
Fails with the error "xml2sql: parsing aborted at line 33 pos 16.", presumably
because of the new <redirect> tag introduced in the recent dumps?
3) mwdumper (http://www.mediawiki.org/wiki/MWDumper):
The current XML is schema v0.4, but the documentation says MWDumper is only for 0.3.
4) mwimport (http://meta.wikimedia.org/wiki/Data_dumps/mwimport):
Fails immediately:
siteinfo: untested generator 'MediaWiki 1.16alpha-wmf', expect trouble ahead
page: expected closing tag in line 35
Any tips?
Thanks!
Eric
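A possible workaround for the <redirect> problem, sketched here purely as an untested idea: strip those elements out before handing the dump to the 0.3-era tools. This assumes each <redirect> tag sits on its own line, which is how the current dumps appear to be written but is not guaranteed by the schema, and it may or may not be the actual reason xml2sql aborts.

# Untested workaround sketch: remove the <redirect .../> elements that
# schema 0.4 introduced, which the poster suspects the older tools choke on.
# Assumes each redirect tag occupies its own line in the dump.
import bz2

src = "enwiki-20100130-pages-articles.xml.bz2"
dst = "enwiki-20100130-pages-articles-noredirect.xml"

with bz2.open(src, "rt", encoding="utf-8") as inp, \
        open(dst, "w", encoding="utf-8") as out:
    for line in inp:
        if line.lstrip().startswith("<redirect"):
            continue          # drop the element the 0.3-era tools don't expect
        out.write(line)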