Gentlemen, let's say one is afraid one day the forces of evil will confiscate one's small wiki, so one wants to encourage all loyal users to keep a backup of the whole wiki (current revisions fine, no need for full history).
OK, we want this to be as simple as possible for our loyal users, just one click needed. (So forget Special:Export!)
And, we want this to be as simple as possible for our loyal administrator, me. I.e., use existing facilities, no cronjobs to run dumpBackup.php (or even mysqldump, which would be giving up too much information) and then offering a link to what they produce.
The format desired is for later making a new wiki via Special:Import, so indeed the Special:Export or dumpBackup.php --current outputs are the desired format.
I just can't figure out the right http://www.mediawiki.org/wiki/API URL recipe... api.php ? action=query & generator=allpages & format=xmlfm & ...? Could it be that the API lacks the "bulk export of XML formatted data" capability of Special:Export?
If one click is not enough, then at least one click per Namespace. I would just have the users back up Main: and Category:, for example.
Embedding the API URL would be no problem, I would just use [{{SERVER}}/api.php?... Backup this whole site to your disk]
2009/1/11 jidanni@jidanni.org:
> Gentlemen, let's say one is afraid one day the forces of evil will confiscate one's small wiki, so one wants to encourage all loyal users to keep a backup of the whole wiki (current revisions fine, no need for full history).
> OK, we want this to be as simple as possible for our loyal users, just one click needed. (So forget Special:Export!)
Heh, I guess it would be a pain to do it by API, why don't you just download the backup from http://download.wikimedia.org ? ;) Looking at the sizes of these files, you can prolly tell why it cannot be done by API.
-- Michał "Hołek" Połtyn - holek.n@gmail.com - michal.poltyn@gmail.com http://pl.wikipedia.org/wiki/User:Holek - http://pl-wikiblog.blogspot.com
Isn't there any way via the API to generate the same output as Special:Export?
What query string can generate the same output as dumpBackup.php --current?
OK, just for all pages in one Namespace.
OK, just for all pages matching some prefix string. Or just for even one page.
And what query string can take the same input as Special:Import to import pages?
Don't worry. The wiki involved only has a handful of pages.
Anyway, how could the mighty API lack the features of Special:Export, whose user interface requires cutting and pasting?
jidanni@jidanni.org schreef:
> Isn't there any way via the API to generate the same output as Special:Export?
> What query string can generate the same output as dumpBackup.php --current?
> OK, just for all pages in one Namespace.
> OK, just for all pages matching some prefix string. Or just for even one page.
You can't generate the same output as dumpBackup.php or Special:Export or anything of that sort. You can, of course, get info and current revisions of pages with a generator: api.php?action=query&generator=allpages&prop=info|revisions&rvprop=content . This only does the main namespace (for other namespaces, use apnamespace) and only a limited number of pages at once (with aplimit=max and the apihighlimits right you'll get 5000), and the output won't be in Special:Export format (although, if all the required data is there, it shouldn't be too hard to write a converter).
> And what query string can take the same input as Special:Import to import pages?
Nothing, sorry.
> Don't worry. The wiki involved only has a handful of pages.
> Anyway, how could the mighty API lack the features of Special:Export, whose user interface requires cutting and pasting?
That the API lacks export and import functionality is mainly because nobody asked for it (you're the first one, IIRC). I once intended to add them, but dropped the idea for lack of time. I'll probably have time to implement these modules some time soon, though.
Roan Kattouw (Catrope)
RK> I'll probably have time to implement these modules some time soon, though.

OK, the outlines of this project are clear: the format to be produced is http://www.mediawiki.org/xml/export-0.3/ and up. The query string would be api.php?action=query&format=export ... That will finally give Special:Export an API.

Probably no import API is needed, as that could all be achieved with an HTTP GET to the existing Special:Import. The same could be said about Special:Export, but the latter lacks batch-job generator capabilities.
jidanni@jidanni.org schreef:
> RK> I'll probably have time to implement these modules some time soon, though.
>
> OK, the outlines of this project are clear: the format to be produced is http://www.mediawiki.org/xml/export-0.3/ and up. The query string would be api.php?action=query&format=export ...
I'll probably add an &export parameter to action=query, which will produce an XML dump of all pages in the result set. This dump would then be wrapped in whatever the &format= is, unless &exportnowrap is set, in which case you'll just get the export XML, not wrapped in anything.
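If that lands as described, the "one click" backup from the start of this thread could boil down to fetching a single URL. A minimal sketch, assuming the proposed &export / &exportnowrap parameters behave as described above; the example.org base URL and the Main Page title are placeholders, not anything from this thread:

  import urllib.request

  # Hypothetical: &export / &exportnowrap are only proposed at this point.
  url = ("http://example.org/w/api.php?action=query"
         "&titles=Main%20Page&export&exportnowrap")

  with urllib.request.urlopen(url) as resp:
      with open("wiki-backup.xml", "wb") as out:
          out.write(resp.read())  # raw export-format XML, suitable for Special:Import

The same request with a generator instead of titles= would cover whole batches of pages.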
> That will finally give Special:Export an API. Probably no import API is needed, as that could all be achieved with an HTTP GET to the existing Special:Import. The same could be said about Special:Export, but the latter lacks batch-job generator capabilities.
It would be good to have an import API as well for stuff like error handling.
Roan Kattouw (Catrope)
RK> for other namespaces, use apnamespace

Odd, http://radioscanningtw.jidanni.org/api.php?action=query&generator=allpag... and http://radioscanningtw.jidanni.org/api.php?action=query&generator=allpag... seem to both give just namespace 0.
jidanni@jidanni.org schreef:
> RK> for other namespaces, use apnamespace
>
> Odd, http://radioscanningtw.jidanni.org/api.php?action=query&generator=allpag... and http://radioscanningtw.jidanni.org/api.php?action=query&generator=allpag... seem to both give just namespace 0.
Oops, that should be gapnamespace of course.
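For reference, a minimal client-side sketch of the corrected query (Python; the example.org base URL is a placeholder, and 14 is the standard Category: namespace number):

  import json
  import urllib.parse
  import urllib.request

  API = "http://example.org/w/api.php"  # placeholder wiki

  params = {
      "action": "query",
      "generator": "allpages",
      "gapnamespace": 14,    # generator parameters take the g prefix
      "gaplimit": "max",
      "prop": "revisions",
      "rvprop": "content",   # current revision text of each page
      "format": "json",
  }
  with urllib.request.urlopen(API + "?" + urllib.parse.urlencode(params)) as resp:
      data = json.load(resp)

  for page in data.get("query", {}).get("pages", {}).values():
      print(page["title"])

Continuation parameters in the response still have to be followed to get past the first batch.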
Roan Kattouw (Catrope)
RK> Oops, that should be gapnamespace of course.
Ah, and the vital words

  Parameters passed to a generator must be prefixed with a g. For instance, when using generator=backlinks, use gbltitle instead of bltitle.

are missing from e.g. http://www.mediawiki.org/w/api.php. That's right: nowhere in it are the two worlds (g and non-g) linked.
By the way, to generate the names of all the pages of a site, one needs some kind of recursive generation:

  for namespace in $(namespaces); do
    for page in $(pages $namespace); do
      dump $page
    done
  done

I wonder if that is possible in a single URL.
jidanni@jidanni.org schreef:
> RK> Oops, that should be gapnamespace of course.
>
> Ah, and the vital words
>
>   Parameters passed to a generator must be prefixed with a g. For instance, when using generator=backlinks, use gbltitle instead of bltitle.
>
> are missing from e.g. http://www.mediawiki.org/w/api.php. That's right: nowhere in it are the two worlds (g and non-g) linked.
Yeah, they're not in the api.php help, but they are in the docs at MW.org. I'll work on putting a sentence or two about this in the api.php help as well.
> By the way, to generate the names of all the pages of a site, one needs some kind of recursive generation: [...] I wonder if that is possible in a single URL.
Currently, no. It's probably technically possible, but you'd still get your pages ordered by namespace first, then title (i.e. the same order as with separate requests) and paging through multiple namespaces screws up all kinds of wonderful features like apprefix= and apfrom=.
Roan Kattouw (Catrope)
By the way, to generate the names of all the pages of a site, one needs some kind of recursive generation:
RK> Currently, no. It's probably technically possible, but you'd still get
RK> your pages ordered by namespace first, then title (i.e. the same order
RK> as with separate requests) and paging through multiple namespaces screws
RK> up all kinds of wonderful features like apprefix= and apfrom=.

That would be fine, as we will only be feeding this into e.g. Special:Import (where order doesn't matter) on another computer.
Anyway, the whole impetus here is that one sees all the Free copyright banners, and then thinks "OK, if it is so free, let me have a hunk of the raw data so I can try (making a wiki like yours) too", etc.
Only to find Special:Export and its tiny ticket window of data export: "Yes Sir, which page would you like to export?" "What, you want more than one page? Well, you'll have to list them one by one. No, you can't just 'have them all'."
Also for small wikis that might drop dead, loyal followers could also keep a backup without anybody having to set up anything beyond just vanilla MediaWiki.
Anyway, an import feature would also be an answer to an earlier post: Michael Dale, "api file uploading".
P.S., of course I favor limits. Any "Export this whole site!" button needs to note "must change &limit= by hand, max=... sorry".
jidanni@jidanni.org schreef:
> By the way, to generate the names of all the pages of a site, one needs some kind of recursive generation:
>
> RK> Currently, no. It's probably technically possible, but you'd still get
> RK> your pages ordered by namespace first, then title (i.e. the same order
> RK> as with separate requests) and paging through multiple namespaces screws
> RK> up all kinds of wonderful features like apprefix= and apfrom=.
>
> That would be fine, as we will only be feeding this into e.g. Special:Import (where order doesn't matter) on another computer.
I get that, but the main point is that getting all page titles in all namespaces is already possible, it just requires a few more requests and consequently a few more lines of client code. Still, the process of getting all namespace IDs, iterating over them and calling list=allpages for each of them is quite painless. It's not worth going through a lot of trouble on the server side just to make life on the client side a tiny bit easier.
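The loop in question, as a rough client-side sketch (Python; example.org is again a placeholder, and only the first batch per namespace is fetched -- real code would also follow the continuation parameters in each response):

  import json
  import urllib.parse
  import urllib.request

  API = "http://example.org/w/api.php"  # placeholder wiki

  def query(**params):
      params.update(action="query", format="json")
      url = API + "?" + urllib.parse.urlencode(params)
      with urllib.request.urlopen(url) as resp:
          return json.load(resp)

  # 1. Get all namespace IDs from the wiki itself.
  namespaces = query(meta="siteinfo", siprop="namespaces")["query"]["namespaces"]

  # 2. Iterate over them, calling list=allpages for each one.
  for ns in sorted(int(k) for k in namespaces):
      if ns < 0:
          continue  # Special: and Media: have no stored pages to list
      for page in query(list="allpages", apnamespace=ns, aplimit="max")["query"]["allpages"]:
          print(ns, page["title"])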
> Anyway, the whole impetus here is that one sees all the Free copyright banners, and then thinks "OK, if it is so free, let me have a hunk of the raw data so I can try (making a wiki like yours) too", etc.
>
> Only to find Special:Export and its tiny ticket window of data export: "Yes Sir, which page would you like to export?" "What, you want more than one page? Well, you'll have to list them one by one. No, you can't just 'have them all'."
You could of course request that such a feature be added to Special:Export (either in core or through an extension), or even have a go at it yourself if you know PHP.
> Also for small wikis that might drop dead, loyal followers could also keep a backup without anybody having to set up anything beyond just vanilla MediaWiki.
>
> Anyway, an import feature would also be an answer to an earlier post: Michael Dale, "api file uploading".
I'm planning to add action=import to the API as well.
> P.S., of course I favor limits. Any "Export this whole site!" button needs to note "must change &limit= by hand, max=... sorry".
In this case, Special:Export could just provide multiple dumps, each one containing a portion of the wiki (cf. [[Special:Allpages]] at enwiki).
Roan Kattouw (Catrope)
RK> getting all namespace IDs, iterating over them and calling list=allpages
RK> for each of them is quite painless. It's not worth going through a lot
RK> of trouble on the server side just to make life on the client side a
RK> tiny bit easier.
Anyway, just like

  $ ssh example.org who
  $ ssh example.org ps aux
  $ ssh example.org last

can be bundled into

  $ ssh example.org 'who; ps aux; last'

or even

  $ ssh example.org <<EOF ...

there in general should be a way __with one request URL__ to pack an unlimited number of operations in, and get all the results back in __one reply__. One would need some temporary storage $variables too. Hmmm, not exactly "just 'su nobody' and do what they ask in the maintenance directory :-)".
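For what it's worth, action=query can already combine several independent modules in one request URL, e.g. the namespace list and a first batch of page titles together; a sketch, same placeholder URL as above (it still won't page through several namespaces at once, which was the sticking point):

  import urllib.request

  # One request, two query modules: meta=siteinfo and list=allpages.
  url = ("http://example.org/w/api.php?action=query&format=json"
         "&meta=siteinfo&siprop=namespaces"
         "&list=allpages&aplimit=max")

  print(urllib.request.urlopen(url).read()[:500])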
RK> You could of course request that such a feature be added to Special:Export

http://bugzilla.wikimedia.org/show_bug.cgi?id=9474
RK> Special:Export could just provide multiple dumps, each one
RK> containing a portion of the wiki (cf. [[Special:Allpages]] at enwiki).
For monster wikis probably the current Special:Export category helper button is the only way to go, as any larger slices of the wiki are probably mistaken slices of death for the user's modem.
However, on small wikis (triggered by a check of some $TOTAL_PAGES, nah, or better a new $wgWelcomeToDownload (default=false)) there could be:

  We're serious about openness. Welcome to download this whole wiki in portable export format.
  Namespaces desired: [*]Main [*]Main Talk ... [ ]Mediawiki... (boring, not checked by default)
  Filter rules: [___]
  Limit: [10] pages. Max 250.
  [*]Save to file  [ ]Browse as plain text

http://bugzilla.wikimedia.org/show_bug.cgi?id=16963
P.S. http://bugzilla.wikimedia.org/show_bug.cgi?id=9889
However, as I don't really "have only one burnin' desire" (http://www.mp3lyrics.org/j/jimi-hendrix/fire/) about these features I'm recommending, I'll just er, like fade away for now...
> Only to find Special:Export and its tiny ticket window of data export: "Yes Sir, which page would you like to export?" "What, you want more than one page? Well, you'll have to list them one by one. No, you can't just 'have them all'."
You can add categories to grab groups of pages. Offering a daily dump is easy enough for my wiki, and I'd rather generate it daily with a "nice" process than have someone click "export all" and the site go down because everyone is running 10-minute+ queries.
It is easy enough to make a link from the export page to a full "xml" dump too.
Best Regards
Jools
Jools Smyth schreef:
> > Only to find Special:Export and its tiny ticket window of data export: "Yes Sir, which page would you like to export?" "What, you want more than one page? Well, you'll have to list them one by one. No, you can't just 'have them all'."
>
> You can add categories to grab groups of pages.
You could, of course, put all articles in a generic category like [[Category:A to Z]].
Roan Kattouw (Catrope)
jidanni@jidanni.org wrote:
> Isn't there any way via the API to generate the same output as Special:Export?
> What query string can generate the same output as dumpBackup.php --current?
You could modify dumpBackup.php to work as a web script. It would be something like this:
<?php
define( 'MEDIAWIKI', true );
$IP = realpath( dirname( __FILE__ ) . '/..' );
require_once( "$IP/includes/AutoLoader.php" );
require_once( "$IP/includes/Defines.php" );
require_once( "$IP/LocalSettings.php" );

// Assumes this script sits next to dumpBackup.php in the maintenance/ directory.
require_once( dirname( __FILE__ ) . '/backup.inc' );

$dumper = new BackupDumper( array() );
$dumper->reporting = false;
// Current revisions only, with the full page text (not stubs).
$dumper->dump( WikiExporter::CURRENT, WikiExporter::TEXT );
It will be very large and take too much memory and too much time... Daily crontab dumps would be a much better system.