I am looking to create a script that makes manual dumps of wikis that either don't or won't publish their own dumps and that I don't have server access to. To that end I am writing a Python dump creator, but I would like to ensure that my format matches the existing one. I could reverse engineer it by looking at multiple dumps, but that takes a lot of time and is not foolproof. Is there documentation, or can I get details, on exactly how the XML dumps are formatted?
On 09/11/12 20:21, John wrote:
I am looking to create a script that makes manual dumps of wikis that either don't or won't publish their own dumps and that I don't have server access to. To that end I am writing a Python dump creator, but I would like to ensure that my format matches the existing one. I could reverse engineer it by looking at multiple dumps, but that takes a lot of time and is not foolproof. Is there documentation, or can I get details, on exactly how the XML dumps are formatted?
Such a script already exists. See http://code.google.com/p/wikiteam/ and its mailing list https://groups.google.com/forum/?fromgroups=#!forum/wikiteam-discuss
If you are proficient in Python, you arrive at the best moment, since there's currently a discussion about fixing a bug in the tool when it uses the API. :)
I am actually looking to rewrite that tool to avoid those bugs, which is why I was asking :)
Platonides, this reminds me: have you/we ever documented https://gerrit.wikimedia.org/r/#/c/6717/ somewhere? And do we have some system in place to prevent such problems (import/export incompatibilities) from coming up again?
John, 09/11/2012 22:32:
I am actually looking to rewrite that tool to avoid those bugs, which is why I was asking :)
As Platonides mentioned, we're talking about https://meta.wikimedia.org/wiki/WikiTeam/Dumpgenerator_rewrite
Nemo
On 10/11/12 00:17, Federico Leva (Nemo) wrote:
Platonides, this reminds me: have you/we ever documented https://gerrit.wikimedia.org/r/#/c/6717/ somewhere?
I don't think so. Do we really need to document that "Import on 1.16 no longer breaks with 1.18 export format"? It's not as if I had added those values.
And do we have some system in place to prevent such problems (import/export incompatibilities) from coming up again?
As commented in the change, the Import format was rewritten in 1.17, so I don't think it explodes on unrecognized tags from that version on.
John, you can find the XSD of the export format at https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=docs/exp... (the latest version is 0.8). As for how the elements are really used, I think you will have to read the export code.
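For illustration only, here is a rough sketch of what a Python dump writer targeting the 0.8 schema might look like. The namespace URI, element names and element order are assumptions recalled from existing dumps and should be checked against the XSD. `build_export`, `_sub` and the shape of the page dictionaries are hypothetical names made up for this example. The sketch also leaves out `siteinfo` and several revision fields that the schema may require.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for export format 0.8; confirm it against the XSD.
EXPORT_NS = "http://www.mediawiki.org/xml/export-0.8/"
# Standard XML namespace, used here for the xml:space attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def _sub(parent, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    el = ET.SubElement(parent, f"{{{EXPORT_NS}}}{tag}")
    if text is not None:
        el.text = str(text)
    return el


def build_export(pages):
    """Build a <mediawiki> tree from page dicts (hypothetical input shape)."""
    # Serialize the export namespace as the default xmlns, not as ns0:.
    ET.register_namespace("", EXPORT_NS)
    root = ET.Element(f"{{{EXPORT_NS}}}mediawiki", {"version": "0.8"})
    for page in pages:
        page_el = _sub(root, "page")
        _sub(page_el, "title", page["title"])
        _sub(page_el, "ns", page["ns"])
        _sub(page_el, "id", page["id"])
        for rev in page["revisions"]:
            rev_el = _sub(page_el, "revision")
            _sub(rev_el, "id", rev["id"])
            _sub(rev_el, "timestamp", rev["timestamp"])
            contributor = _sub(rev_el, "contributor")
            _sub(contributor, "username", rev["username"])
            text_el = _sub(rev_el, "text", rev["text"])
            # Ask consumers to keep whitespace in the wikitext as-is.
            text_el.set(f"{{{XML_NS}}}space", "preserve")
    return root


if __name__ == "__main__":
    # Made-up sample data, not taken from any real wiki.
    sample = [{
        "title": "Main Page", "ns": 0, "id": 1,
        "revisions": [{
            "id": 10, "timestamp": "2012-11-09T20:21:00Z",
            "username": "Example", "text": "Hello, wiki!",
        }],
    }]
    print(ET.tostring(build_export(sample), encoding="unicode"))
```

Validating the output against the XSD with a schema-aware tool, and test-importing it into a scratch wiki, would be the real check that the format matches.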
xmldatadumps-l@lists.wikimedia.org