On Wed, Feb 8, 2012 at 8:51 PM, C Stafford <c.stafford@gmail.com> wrote:
I personally prefer using the php format over the JSON, due to multiple encoding issues I've hit in the past and not having to deal with the small intricacies (and potential problems) of the JSON encoding/decoding on each end (the main one that comes to mind is the issue of associative array key names in the JSON->PHP direction)
I've never had such problems. What issues were you having with array key names? You should be fine as long as you pass the flag that tells the JSON decoder to output associative arrays instead of objects (for the standard JSON decoder in PHP, this is done with json_decode( $data, true ) ).
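To illustrate the difference that flag makes, here is a minimal sketch. The JSON string and its field names are made up for the example and are not meant to be an exact API response.

```php
<?php
// Hypothetical API-style response; the structure is illustrative only.
$json = '{"query":{"pages":{"15580374":{"title":"Main Page"}}}}';

// Default: JSON objects are decoded to stdClass instances.
$asObject = json_decode( $json );
// Numeric-looking keys need the brace syntax when read as properties.
$titleFromObject = $asObject->query->pages->{'15580374'}->title;

// With the second argument set to true, JSON objects are decoded to
// associative arrays, much like what unserialize() gives for format=php.
$asArray = json_decode( $json, true );
$titleFromArray = $asArray['query']['pages']['15580374']['title'];
```

Both variables end up holding the same title; only the access syntax differs.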
For me, since I'm writing my MW interaction/bot/script/etc. code in PHP, it's VERY nice to easily get a native PHP array of info via format=php and unserialize(). On that same note, the pretty HTML print mate to php (txtfm) is invaluable for debugging/viewing the format of the return structure using a web browser and the GET URL format.
If I were writing in another language, be it an application or live JS, I would indeed see the prevalence of JSON and/or XML, but if I'm botting from PHP and accessing MW in PHP, why not keep the info in an encoded array format that is more native from end to end?
I tend to use JSON even in PHP clients (MediaWiki has some API clients built in, and they do this too; InstantCommons comes to mind), mostly because I'm paranoid about unserializing untrusted content from the web (there's an outside chance someone could exploit a __wakeup() function lying around somewhere).
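A rough sketch of the kind of thing I'm worried about follows. CacheFile is a hypothetical class standing in for whatever happens to be loaded in the client, and the allowed_classes option shown at the end is an assumption about the runtime: it only exists in PHP 7.0 and later.

```php
<?php
// Hypothetical class standing in for "a __wakeup() lying around somewhere".
class CacheFile {
	public $path = '';
	public static $wakeups = 0;

	public function __wakeup() {
		// Real code might open, write or delete $this->path here.
		self::$wakeups++;
	}
}

// A malicious response pretending to be an ordinary format=php result.
$payload = 'O:9:"CacheFile":1:{s:4:"path";s:11:"/etc/passwd";}';

// Plain unserialize() instantiates the class and runs __wakeup().
$obj = unserialize( $payload );

// Assumption: PHP 7.0+. With allowed_classes disabled, the object is
// turned into __PHP_Incomplete_Class and __wakeup() is not called.
$safe = unserialize( $payload, array( 'allowed_classes' => false ) );
```

json_decode() has no equivalent problem, since it can only produce scalars, arrays and stdClass objects.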
The problem with PHP serialized format is that it has no way of escaping angle brackets (and why should it, it's a serialization format after all), so the only reason you can't exploit it for XSS purposes already is the fact that we set Content-Type: application/vnd.php.serialized on the response, so browsers don't interpret it as HTML. That's not something you can rely on all that comfortably, because IE will disregard the Content-Type and determine its own in many cases, and its MIME type sniffer is very happy to detect things as HTML. The application/vnd.php.serialized MIME type doesn't trigger sniffing in IE, but for instance text/plain does, which is why txt and dbg are delivered with the (invalid!) content type text/text instead.
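The escaping difference can be sketched as below. This only shows that JSON has an escape mechanism for angle brackets and the serialized format does not; it uses PHP's own json_encode() with JSON_HEX_TAG (available since PHP 5.3) and is not a description of what MediaWiki's formatter does internally.

```php
<?php
$value = array( 'title' => '<script>alert(1)</script>' );

// The serialized form carries the angle brackets through verbatim.
$serialized = serialize( $value );
$rawTagInSerialized = strpos( $serialized, '<script>' ) !== false;

// JSON can represent < and > as \u003C and \u003E, which JSON_HEX_TAG emits.
$json = json_encode( $value, JSON_HEX_TAG );
$rawTagInJson = strpos( $json, '<' ) !== false;

// The escaped JSON still decodes back to the original data.
$roundTrip = json_decode( $json, true );
```

So a sniffing browser that decides to treat the serialized output as HTML gets a live script tag, while the escaped JSON contains no angle brackets at all.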
So this unescaped HTML thing is something that should be fine for now, but it'd be nice if we could just get rid of it. I think dropping PHP format support would be acceptable if 1) we have a longer deprecation path than Max proposed, more like what Chad proposed, and 2) the encoding problems that C. Stafford mentions are easily addressed or not a problem in practice. json_decode() is bundled with PHP and enabled by default these days, so no one who's using PHP serialized should have a problem finding a JSON decoder.
Would there be any way to hack in some temp tracking code to record/track the usage of each format for the most popular WMF sites (EN:WP and Commons are the first two that come to mind) to show if any of them are being used at all, and to what extent?
Recently Diederik (WMF analytics guy) gave me some basic stats on which &action= parameters are used how often. I'll ask him to do the same for &format= .
Roan