On Wed, Feb 8, 2012 at 8:51 PM, C Stafford <c.stafford(a)gmail.com> wrote:
I personally prefer using the php format over the JSON, due to
multiple encoding issues I've hit in the past and not having to deal
with the small intricacies (and potential problems) of the JSON
encoding/decoding on each end (the main one that comes to mind is the
issue of associative array key names in the JSON->PHP conversion)
I've never had such problems. What issues were you having with array
key names? You should be fine as long as you pass the flag that tells
the JSON decoder to output associative arrays instead of objects (for
the standard JSON decoder in PHP, this is done with json_decode(
$data, true ) ).
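To illustrate the flag, here is a minimal sketch. The response body is made up for the example (the page ID and title are placeholders, not real API output); only json_decode() and its second argument are standard PHP.

```php
<?php
// Hypothetical API response body; the field values are illustrative only.
$json = '{"query":{"pages":{"736":{"pageid":736,"title":"Albert Einstein"}}}}';

// Second argument true: JSON objects are returned as associative arrays.
$asArray = json_decode( $json, true );
echo $asArray['query']['pages']['736']['title'], "\n";

// Without the flag, JSON objects become stdClass instances, and numeric
// keys such as "736" have to be reached with the brace syntax.
$asObject = json_decode( $json );
echo $asObject->query->pages->{'736'}->title, "\n";
```

With the flag set, the result can be walked with ordinary array syntax, much like the output of unserialize() on a format=php response.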
For me, since I'm writing my MW interaction/bot/script/etc code in php,
it's VERY nice to easily get a native php array of info via format=php
and unserialize(). On that same note, the pretty html print mate to
php (txtfm) is invaluable for debugging/viewing the format of the
return structure using a web browser and the GET url format.
If I were writing in another language, be it an application or live JS, I
would indeed see the prevalence of JSON and/or XML, but if I'm
botting from php, accessing MW in php, why not keep the info in an
encoded array format that is more native from end to end?
I tend to use JSON even in PHP clients (MediaWiki has some API clients
built in, and they do this too; InstantCommons comes to mind), mostly
because I'm paranoid about unserializing untrusted content from the
web (there's an outside chance someone could exploit a __wakeup()
function lying around somewhere).
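A rough sketch of the concern, under stated assumptions: Gadget is a hypothetical class standing in for anything with a __wakeup() method that happens to be loaded in the client, and the payload is hand-written for the example. The last call relies on the options array that unserialize() accepts from PHP 7.0 onwards, so it does not apply to older installations.

```php
<?php
// Hypothetical class; any loaded class with a __wakeup() method would do.
class Gadget {
    public $name = 'default';
    public static $wakeups = 0;

    public function __wakeup() {
        // Runs automatically whenever a Gadget is unserialized.
        self::$wakeups++;
    }
}

// Payload a hostile or compromised server could send instead of an array.
$payload = 'O:6:"Gadget":1:{s:4:"name";s:4:"evil";}';

// Plain unserialize() instantiates Gadget and calls __wakeup().
$obj = unserialize( $payload );

// JSON has no notion of PHP classes, so decoding with the assoc flag
// only ever yields arrays, scalars and null.
$data = json_decode( '{"name":"evil"}', true );

// Assumption: PHP 7.0 or later. With allowed_classes set to false, the
// object comes back as __PHP_Incomplete_Class and __wakeup() is not called.
$inert = unserialize( $payload, [ 'allowed_classes' => false ] );
```

After this runs, Gadget::$wakeups is 1: only the first unserialize() call triggered the method.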
The problem with PHP serialized format is that it has no way of
escaping angle brackets (and why should it, it's a serialization
format after all), so the only reason you can't exploit it for XSS
purposes already is the fact that we set Content-Type:
application/vnd.php.serialized on the response, so browsers don't
interpret it as HTML. That's not something you can rely on all that
comfortably, because IE will disregard the Content-Type and determine
its own in many cases, and its MIME type sniffer is very happy to
detect things as HTML. The application/vnd.php.serialized MIME type
doesn't trigger sniffing in IE, but for instance text/plain does,
which is why txt and dbg are delivered with the (invalid!) content
type text/text instead.
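The difference between the two formats can be seen in a small sketch. The input string is a made-up example, and JSON_HEX_TAG is shown only as one standard json_encode() option that escapes angle brackets; I'm not claiming this is exactly what the API's JSON formatter does.

```php
<?php
// Hypothetical value containing markup, e.g. a page title or edit summary.
$data = array( 'title' => '<script>alert(1)</script>' );

// PHP serialization copies string bytes verbatim, angle brackets included.
$php = serialize( $data );

// JSON can express the same string with \u escapes; JSON_HEX_TAG asks
// json_encode() to do that for < and >.
$json = json_encode( $data, JSON_HEX_TAG );

// The escaped form still decodes back to the original value.
$roundTrip = json_decode( $json, true );
```

The serialized string contains a literal script tag, so it is only safe as long as the browser honours the Content-Type; the JSON string contains no angle brackets at all.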
So this unescaped HTML thing is something that should be fine for now,
but it'd be nice if we could just get rid of it. I think dropping PHP
format support would be acceptable if 1) we have a longer deprecation
path than Max proposed, more like what Chad proposed, and 2) the
encoding problems that C. Stafford mentions are easily addressed or
not a problem in practice. json_decode() is bundled with PHP and
enabled by default these days, so no one that's using PHP serialized
should have a problem finding a JSON decoder.
Would there be any way to hack in some temp tracking code to
record/track the usage of each format for the most popular WMF sites
(EN:WP and Commons are the first two that come to mind) to show if
any of them are being used at all, and to what extent?
Recently Diederik (WMF analytics guy) gave me some basic stats on
which &action= parameters are used how often. I'll ask him to do the
same for &format= .
Roan