Brion Vibber wrote:
Why do you use JSON here? Wouldn't it be better/faster/more intuitive/whatever to use format=php and unserialize()? Surely PHP can interpret its own format faster than JSON?
Unserializing PHP serialized data that comes from a foreign server is potentially unsafe. Some versions of PHP have had outright security vulnerabilities in unserialization, but even in the best of cases you aren't necessarily sure what data type you're going to get out -- a compromised or malicious server could hand us an instance of any class in MediaWiki, with completely arbitrary contents.
Rather than do a full audit of the entire system, including all extensions, to ensure that no class does anything unsafe in its unserialization wake-up functions, using a data format that will just give us an associative array full of strings and ints sounds nice and safe.
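For illustration, the difference boils down to something like this (sketch only; the exact fetch and decode calls in ForeignAPIRepo may differ):

	$raw = Http::get( $url ); // response body from the foreign repo's api.php

	// json_decode() with $assoc = true can only ever hand back arrays, strings,
	// numbers, booleans and nulls -- never an instance of a MediaWiki class.
	$data = json_decode( $raw, true );

	// unserialize(), on the other hand, will happily instantiate whatever class
	// the payload names and run its __wakeup() method:
	// $data = unserialize( $raw ); // unsafe on untrusted input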
We're handling an *array* here, right? How the hell can calling unserialize() on something, then using [] on it, cause anything other than a fatal error? Or does PHP allow overloading of the [] operator? (I thought PHP didn't *do* operator overloading, but I could be wrong.) Anyway, is_array() should be safe enough here.
As for performance, the overhead of JSON encoding/decoding should be fairly trivial compared to the network time and the backend's work. Feel free to benchmark it, but I doubt using PHP serialization would have any significant performance impact.
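A quick-and-dirty benchmark along these lines would settle it ($jsonBlob and $phpBlob stand in for sample responses in each format; sketch only):

	$start = microtime( true );
	for ( $i = 0; $i < 10000; $i++ ) {
		json_decode( $jsonBlob, true );
	}
	$jsonTime = microtime( true ) - $start;

	$start = microtime( true );
	for ( $i = 0; $i < 10000; $i++ ) {
		unserialize( $phpBlob );
	}
	$phpTime = microtime( true ) - $start;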
(It might be nice for compatibility, as PHP's native JSON decode module isn't guaranteed to be present, but we may have a JSON decoder sitting around for some of the AJAX stuff anyway, I haven't bothered to look.)
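If it comes to that, the fallback could be as simple as this (a sketch, assuming the Services_JSON class MediaWiki bundles for its AJAX code; the actual wrapper may differ):

	if ( function_exists( 'json_decode' ) ) {
		$data = json_decode( $raw, true );
	} else {
		$json = new Services_JSON( SERVICES_JSON_LOOSE_TYPE ); // decode to arrays
		$data = $json->decode( $raw );
	}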
I'm not saying it makes a huge difference, and I know trying to optimize non-bottleneck code is discouraged, but if you can get better performance by just using a different function, it's such an easy win that there's no reason not to take it. You're also right about the compatibility thing.
Also, I thought Http::get() was a faux pas; why use it here?
On the contrary -- *not* using Http::get() is a faux pas. :)
We can control the timeout values etc. with Http::get(), and choose the best backend (PHP's URL wrappers or the cURL module), all nice and encapsulated. Duplicating all that by hand is a big no-no.
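Roughly, the request ends up looking like this (a sketch; the member name $this->mApiBase and the exact query building are assumptions about the surrounding code):

	$url = $this->mApiBase . '?' . wfArrayToCGI( $query );
	$raw = Http::get( $url ); // picks cURL or PHP's URL wrappers, applies the timeout
	if ( $raw === false ) {
		return false; // request failed or timed out
	}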
OK. I remember someone saying, in relation to bug 14024 (the interwiki API), that calling Http::get() would take too long, etc.
As to the FIXME, it's kind of cryptic so I don't know exactly what you mean there, but the client API will normalize titles. Or are you worried about de requesting "Bild:Foo.jpg" from en?
Right. If the local wiki's not in English, it may construct the output using something other than the canonical 'Image:', which won't be recognized by the foreign repo.
In this case, we need to canonicalize the title first (i.e. replace Bild: with Image: on a German wiki), then submit that title to the remote wiki, which will normalize Image: to whatever it uses.
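Something along these lines should do for the canonicalization step (a sketch, assuming we already hold a Title object and the MWNamespace helper):

	// "Bild:Foo.jpg" on a German wiki becomes "Image:Foo.jpg", which the
	// foreign repo will then normalize to whatever it uses locally.
	$canonicalName = MWNamespace::getCanonicalName( $title->getNamespace() ) .
		':' . $title->getDBkey();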
if( isset( $data['query']['pages'] ) ) {
	// Walk the page entries returned by the query and pick out the image info
	foreach( $data['query']['pages'] as $pageid => $info ) {
		if( isset( $info['imageinfo'][0] ) ) {
			return new ForeignAPIFile( $title, $this, $info['imageinfo'][0] );
		}
	}
}
I don't really get what the foreach is for: it's only one file, right? You can use reset() to get the first element of an array if it doesn't have index 0.
IMHO, reset() and foreach() are equally ugly here. :)
Personal preference then, I guess. To me, foreach() implies that something runs more than once (although that's impossible in this case because of the return statement).
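For comparison, the reset() variant would look something like this (sketch only):

	if( isset( $data['query']['pages'] ) ) {
		// The query named a single title, so take the first (and only) page entry
		$info = reset( $data['query']['pages'] );
		if( isset( $info['imageinfo'][0] ) ) {
			return new ForeignAPIFile( $title, $this, $info['imageinfo'][0] );
		}
	}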
Speaking of which, there should probably be some sanity check that verifies that $title is a legal title, to prevent unnecessary requests and to keep people from sneaking in a '|' and not getting what they expect.
An invalid title shouldn't have reached us by this point; we're handed a Title object, not a string.
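For what it's worth, the '|' case is already caught when the Title object gets made (sketch of the caller's side):

	$title = Title::newFromText( $input );
	if ( $title === null ) {
		// invalid title, e.g. one containing '|'; never reaches the repo code
	}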
OK then. I didn't look at the calling code at all, just at the functions themselves.
Roan Kattouw (Catrope)