In general we don't recombine the pieces; it is extremely easy for the
end user to do so if a single file is really needed. I probably have a
shell (bash) script around here that would do it. But people have
expressed a preference for more, smaller files, either so that they can
process a piece that contains the pages they like, or so that they can
process the data in parallel.
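The bash script mentioned above isn't part of this thread. What follows is only a rough sketch of what recombining might involve, written in Python rather than bash. It assumes that each piece is a complete `<mediawiki>` document, meaning the same `<mediawiki>`/`<siteinfo>` header, then `<page>` elements, then a closing `</mediawiki>` line. It also assumes the pieces are bzip2-compressed and are passed in page-id order. The names `recombine` and `main` are hypothetical.

```python
import bz2
import sys
from typing import Iterable, Sequence, TextIO


def recombine(pieces: Sequence[Iterable[str]], out: TextIO) -> None:
    """Write dump pieces to `out` as one XML document.

    Assumption: every piece repeats the same <mediawiki>/<siteinfo> header
    before its first <page>, and ends with a closing </mediawiki> line.
    """
    last = len(pieces) - 1
    for i, piece in enumerate(pieces):
        seen_page = i == 0  # the first piece keeps its header
        for line in piece:
            stripped = line.strip()
            if stripped == "</mediawiki>":
                # Only the last piece closes the combined document.
                if i == last:
                    out.write(line)
                continue
            if not seen_page:
                if stripped.startswith("<page>"):
                    seen_page = True
                else:
                    continue  # drop the repeated header of later pieces
            out.write(line)


def main(paths: Sequence[str]) -> None:
    # Hypothetical usage: pass the bzip2-compressed pieces in page-id order.
    files = [bz2.open(p, "rt", encoding="utf-8") for p in paths]
    try:
        recombine(files, sys.stdout)
    finally:
        for f in files:
            f.close()


if __name__ == "__main__":
    main(sys.argv[1:])
```

The script works line by line, so memory use stays flat even for the history dumps. Markup inside page text is escaped in the XML (for example, `&lt;page&gt;`), so a bare `<page>` or `</mediawiki>` line should only ever be real structure.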
Which brings up a point: a few months back I mentioned that I'd like to
produce a large number (~125) of small files for the English Wikipedia
history dumps, rather than the 30 larger ones we produce now. These files
would have the first and last page id of their contents embedded in the
filename. Once again I would not plan to recombine these files; doing so
adds extra days to the run after the data has already been made
available for download. I'd like people's comments on this.
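If the files carry their page-id range in the name, a reader can pick out the single piece that holds a given page, or hand each piece to a separate worker. The thread does not settle the exact naming scheme. The `-p<first>p<last>` pattern below and the helper names `page_range`, `sort_by_first_page` and `file_for_page` are assumptions for illustration only.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical naming pattern: "...-p<first>p<last>.<ext>", e.g.
# "enwiki-YYYYMMDD-pages-meta-history7.xml-p000001000p000002000.bz2".
RANGE_RE = re.compile(r"-p(\d+)p(\d+)\.")


def page_range(filename: str) -> Tuple[int, int]:
    """Return the (first, last) page ids encoded in a filename."""
    m = RANGE_RE.search(filename)
    if m is None:
        raise ValueError(f"no page-id range in {filename!r}")
    return int(m.group(1)), int(m.group(2))


def sort_by_first_page(filenames: List[str]) -> List[str]:
    """Order pieces by the first page id they contain."""
    return sorted(filenames, key=lambda f: page_range(f)[0])


def file_for_page(filenames: List[str], page_id: int) -> Optional[str]:
    """Return the piece whose range covers page_id, or None."""
    for f in filenames:
        first, last = page_range(f)
        if first <= page_id <= last:
            return f
    return None
```

With about 125 files, a linear scan is more than fast enough. Sorting by first page id also gives the order you would need if you ever did concatenate the pieces yourself.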
Ariel
On Wed, 03-08-2011 at 18:12 +0200, Oliver Ferschke
wrote:
Glad I could help.
And YES, we would love to have volunteer contributions to the JWPL documentation.
Any help is greatly appreciated. We are also trying to improve the documentation ourselves,
but we don't always have the time.
Thanks,
Oliver
-----Original Message-----
From: xmldatadumps-l-bounces(a)lists.wikimedia.org
[mailto:xmldatadumps-l-bounces@lists.wikimedia.org] On Behalf Of Napolitano, Diane
Sent: Wednesday, August 3, 2011 17:58
To: xmldatadumps-l(a)lists.wikimedia.org
Subject: Re: [Xmldatadumps-l] 7/22 enwiki dump pages-meta-history
Hi Oliver, thanks for your response. That answers my question, and in that case the 27
individual files (!) will work just fine.
On a side note, would you welcome any volunteer effort for documentation contributions to
JWPL? ;)
Thanks,
Diane
-----Original Message-----
From: xmldatadumps-l-bounces(a)lists.wikimedia.org
[mailto:xmldatadumps-l-bounces@lists.wikimedia.org] On Behalf Of Oliver Ferschke
Sent: Wednesday, August 03, 2011 11:56 AM
To: xmldatadumps-l(a)lists.wikimedia.org
Subject: Re: [Xmldatadumps-l] 7/22 enwiki dump pages-meta-history
Dear Diane,
I cannot give you an answer to your original question, but maybe I can still help. What
exactly do you need the data for?
For the JWPL DataMachine, you won't need the pages-meta-history files - only
meta-current, which is available as a single file.
For the RevisionMachine, you can define multiple input files. Consequently, there is no
problem using the archives without recombining them.
Only if you want to recreate an old/historic dump (or a series of old dumps) from the
current history dump using the TimeMachine will you need the pages-meta-history files
recombined. Is this the case?
Best,
Oliver
--
-------------------------------------------------------------------
Oliver Ferschke, M.A.
Doctoral Researcher
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
ferschke(a)tk.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC)
www.werc.tu-darmstadt.de
-------------------------------------------------------------------
-----Original Message-----
From: xmldatadumps-l-bounces(a)lists.wikimedia.org
[mailto:xmldatadumps-l-bounces@lists.wikimedia.org] On Behalf Of Napolitano, Diane
Sent: Wednesday, August 3, 2011 17:36
To: xmldatadumps-l(a)lists.wikimedia.org
Subject: [Xmldatadumps-l] 7/22 enwiki dump pages-meta-history
Hello, are there any plans to combine all of the pages-meta-history XML dumps from the
7/22 dump into one file? This is useful for importing into JWPL.
Thanks,
Diane M. Napolitano
Associate Research Engineer
Educational Testing Service
Turnbull Hall R-239
Princeton, New Jersey 08540
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l