Hi, this idea has been floating around for quite some time, but now that bug 34257[1] has been added to the long list of problems, I would like to step up and make some progress. We[2] propose to remove the following formats[3]:
* WDDX - doesn't seem to be used by anyone. Doesn't look sane either.
* YAML - we don't serve real YAML anyway, currently it's just a subset of JSON.
* rawfm - was created for debugging the JSON formatter aeons ago, not useful for anything now.
* txt, dbg, dump - the only reason they were added is that it was possible to add them, they don't serve the purpose of machine/machine communication.
So, only 3 formats would remain:
* JSON - *the* recommended API format
* XML - evil and clumsy but sadly used too widely to be removed in the foreseeable future
* php - this one is used by several extensions and probably by some third-party reusers, so we won't remove it this time. However, any new uses of it should be discouraged.
We plan to remove the aforementioned formats as soon as MediaWiki 1.19 is branched so that these changes will take effect in 1.20, but would like to hear from you first if there are good reasons why we shouldn't do it or postpone it. Please have your say.
------
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=34257
[2] Me and Roan Kattouw, one of the API's primary developers
[3] https://www.mediawiki.org/wiki/API:Data_formats
I personally prefer using the php format over JSON, due to multiple encoding issues I've hit in the past and not having to deal with the small intricacies (and potential problems) of JSON encoding/decoding on each end (the main one that comes to mind is the issue of associative array key names in the JSON->PHP conversion).
For me, since I'm writing my MW interaction/bot/script/etc code in php, it's VERY nice to easily get a native php array of info via format=php and unserialize(). On that same note, the pretty html print mate to php (txtfm) is invaluable for debugging/viewing the format of the return structure using a web browser and the GET url format.
If I were writing in another language, be it an application or live JS, I would indeed see the prevalence of JSON and/or XML, but if I'm botting from php, accessing MW in php, why not keep the info in an encoded array format that is more native from end to end?
Just my 3 cents.
That being said, I agree and support the idea of cleaning up some of those more ancient formats if they are not being used.
Would there be any way to hack in some temp tracking code to record/track the usage of each format for the top popular WMF sites (thinking EN:WP and commons are the first 2 coming to mind) to show if any of them are being used at all, and to what extent?
On Wed, Feb 8, 2012 at 8:51 PM, C Stafford c.stafford@gmail.com wrote:
> I personally prefer using the php format over JSON, due to multiple encoding issues I've hit in the past and not having to deal with the small intricacies (and potential problems) of JSON encoding/decoding on each end (the main one that comes to mind is the issue of associative array key names in the JSON->PHP conversion).
I've never had such problems. What issues were you having with array key names? You should be fine as long as you pass the flag that tells the JSON decoder to output associative arrays instead of objects (for the standard JSON decoder in PHP, this is done with json_decode( $data, true ) ).
> For me, since I'm writing my MW interaction/bot/script/etc code in php, it's VERY nice to easily get a native php array of info via format=php and unserialize(). On that same note, the pretty html print mate to php (txtfm) is invaluable for debugging/viewing the format of the return structure using a web browser and the GET url format.
> If I were writing in another language, be it an application or live JS, I would indeed see the prevalence of JSON and/or XML, but if I'm botting from php, accessing MW in php, why not keep the info in an encoded array format that is more native from end to end?
I tend to use JSON even in PHP clients (MediaWiki has some API clients built in, and they do this too; InstantCommons comes to mind), mostly because I'm paranoid about unserializing untrusted content from the web (there's an outside chance someone could exploit a __wakeup() function lying around somewhere).
The problem with PHP serialized format is that it has no way of escaping angle brackets (and why should it, it's a serialization format after all), so the only reason you can't exploit it for XSS purposes already is the fact that we set Content-Type: application/vnd.php.serialized on the response, so browsers don't interpret it as HTML. That's not something you can rely on all that comfortably, because IE will disregard the Content-Type and determine its own in many cases, and its MIME type sniffer is very happy to detect things as HTML. The application/vnd.php.serialized MIME type doesn't trigger sniffing in IE, but for instance text/plain does, which is why txt and dbg are delivered with the (invalid!) content type text/text instead.
So this unescaped HTML thing is something that should be fine for now, but it'd be nice if we could just get rid of it. I think dropping PHP format support would be acceptable if 1) we have a longer deprecation path than Max proposed, more like what Chad proposed, and 2) the encoding problems that C. Stafford mentions are easily addressed or not a problem in practice. json_decode() is bundled with PHP and enabled by default these days, so no one who's using PHP serialized should have a problem finding a JSON decoder.
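To illustrate the escaping point above: JSON, unlike the PHP serialized format, has a string escape syntax, so angle brackets can be written as \u003c and \u003e without changing the decoded data. What follows is only a rough sketch in Python, not what the API actually does; the helper name json_for_sniffing_browsers and the sample payload are made up for the example.

```python
import json


def json_for_sniffing_browsers(value):
    """Serialize value to JSON with angle brackets written as \\u003c / \\u003e.

    Hypothetical helper for illustration. In valid JSON, '<' and '>' can only
    occur inside string literals, so a plain textual replace is safe here.
    """
    text = json.dumps(value)
    return text.replace("<", "\\u003c").replace(">", "\\u003e")


# Made-up payload containing markup a MIME-sniffing browser might act on.
payload = {"title": "<script>alert(1)</script>"}
encoded = json_for_sniffing_browsers(payload)

# No raw angle brackets remain in the output, yet decoding restores the data.
decoded = json.loads(encoded)
```

The decoded value is identical to the original payload, so a client using a standard JSON decoder would not notice the difference.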
> Would there be any way to hack in some temp tracking code to record/track the usage of each format for the top popular WMF sites (thinking EN:WP and commons are the first 2 coming to mind) to show if any of them are being used at all, and to what extent?
Recently Diederik (WMF analytics guy) gave me some basic stats on which &action= parameters are used how often. I'll ask him to do the same for &format= .
Roan
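For reference, the kind of per-format usage count discussed above could be sketched roughly as follows, assuming the request URLs are available as plain strings. This is not the analytics team's actual tooling; the function name tally_formats, the "(unspecified)" label, and the sample URLs are all hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical label for requests that carry no format= parameter.
UNSPECIFIED = "(unspecified)"


def tally_formats(request_urls):
    """Count how often each format= value appears in a list of request URLs."""
    counts = Counter()
    for url in request_urls:
        params = parse_qs(urlsplit(url).query)
        # parse_qs maps each key to a list of values; this sketch takes the last.
        fmt = params.get("format", [UNSPECIFIED])[-1]
        counts[fmt.lower()] += 1
    return counts


# Made-up sample requests, not real traffic.
sample = [
    "/w/api.php?action=query&format=json&titles=Foo",
    "/w/api.php?action=query&format=php&titles=Bar",
    "/w/api.php?action=query&titles=Baz",
    "/w/api.php?action=parse&format=JSON&page=Foo",
]
usage = tally_formats(sample)
```

Lowercasing the value is a choice made for this sketch so that "JSON" and "json" are counted together.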
On Wed, Feb 8, 2012 at 2:31 PM, Max Semenik maxsem.wiki@gmail.com wrote:
> We plan to remove the aforementioned formats as soon as MediaWiki 1.19 is branched so that these changes will take effect in 1.20, but would like to hear from you first if there are good reasons why we shouldn't do it or postpone it. Please have your say.
For a stable API, that's far too fast of a deprecation path. We don't even remove core functions that fast (or shouldn't). I'd suggest throwing some kinds of warnings in the output for at least one release (1.20) and then target them for removal in 1.21.
-Chad
mediawiki-api@lists.wikimedia.org