On 02/05/2013 02:35 AM, Jens Ohlig wrote:
I'm wondering if some of the specialized functionality can be avoided by fetching JSON data from wikibase / wikidata through a web API. This would be more versatile, and could be used by alternative templating systems.
This was actually my first idea! However, since the client (i.e. Wikipedia) currently must have access to the database at the repo (i.e. Wikidata) anyway, this would result in a huge performance loss without any obvious gain.
Jens,
I am not so sure about the potential performance loss. I am guessing that you fear the overhead of JSON serialization, which tends to be relatively low with current libraries. Moving PHP objects to Lua, or accessing them from Lua, involves some overhead too, which is avoided by decoding JSON directly in Lua.
Apart from making the data generally available, using a web API means that the execution can be parallelized / distributed and potentially cached. It also tends to lead to narrow interfaces with explicit handling of state. Is direct DB access just needed because an API is missing, or are there technical issues that are hard to handle in a web API?
Adding specialized Wikidata methods to Lua has a cost for users and developers. Users probably need to learn larger and less general APIs. Developers need to continue to support these methods once the content is there, which can be difficult if the specialized methods don't cleanly map to a future web API.
Gabriel
On Tue, Feb 5, 2013 at 11:43 AM, Gabriel Wicke gwicke@wikimedia.org wrote:
Apart from making the data generally available, using a web API means that the execution can be parallelized / distributed and potentially cached. It also tends to lead to narrow interfaces with explicit handling of state.
It would also mean that MediaWiki would be making uncontrolled API calls *during the page parse*. That would probably not work out too well; I know on my local test wiki it's a pain just having to wait for the ForeignAPIRepo calls for images.
On 02/05/2013 10:53 AM, Brad Jorsch wrote:
It would also mean that MediaWiki would be making uncontrolled API calls *during the page parse*.
To me it is not clear why a Wikidata web API would be less controlled than a Wikidata Lua API with direct access to the DB.
That would probably not work out too well; I know on my local test wiki it's a pain just having to wait for the ForeignAPIRepo calls for images.
Slow operations will be slow, no matter how you call them. With a web API you at least get to parallelize and distribute the execution, so that you don't have to wait for a sequence of slow operations.
Gabriel
There will be (actually, there is already) a web API offering the kind of data required, and for client wikis not running on WMF infrastructure this will eventually be the way to access the data.
For WMF clients, like the Wikipedias, our decision was not to use HTTP web requests, but to get the data internally, directly from the respective DB. The performance difference was deemed the deciding factor: internal DB queries vs. HTTP requests.
The cost of serializing to and from JSON was not deemed significant, just as you argue.
This sums up my understanding of the situation; I might easily be wrong.
Cheers, Denny
On 02/06/2013 04:49 AM, Denny Vrandečić wrote:
There will be (actually, there is already) a web API offering the kind of data required, and for client wikis not running on WMF infrastructure this will eventually be the way to access the data.
For WMF clients, like the Wikipedias, our decision was not to use HTTP web requests, but to get the data internally, directly from the respective DB. The performance difference was deemed the deciding factor: internal DB queries vs. HTTP requests.
Local HTTP requests have pretty low overhead (1-2ms), but api.php suffers from high start-up costs (35-40ms). This is more an issue with api.php and the PHP execution model than with HTTP though, and might be improved in the future.
It should be possible to hide optimization details like local DB access vs. an actual HTTP request behind the same interface. A URL-based query interface can support local handlers for specific URL prefixes. Decoding a URL is pretty cheap, and the logic for mapping it onto local DB queries likely already exists to support the web API.
The URL-based interface would let users learn a single, simple JSON interface and use it to access multiple (possibly whitelisted) data sources. Wikidata-using Lua modules would be reusable on another wiki, independent of local vs. remote access or software version.
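To make the dispatch idea concrete, here is a rough sketch, with all names invented for illustration (the real prefix-to-handler mapping would presumably live on the PHP side):

-- Hypothetical sketch: URL prefixes with a registered local handler are
-- answered without any HTTP request; everything else goes out as a
-- whitelisted remote fetch.
local function wikidataFromLocalDB(url, params)
  -- ... direct DB lookup, reusing the web API's query mapping ...
end

local function httpGetJSON(url, params)
  -- ... placeholder for a real, whitelisted HTTP request ...
end

local localHandlers = {
  ["http://wikidata.org/api/"] = wikidataFromLocalDB,
}

local function fetchJSON(url, params)
  for prefix, handler in pairs(localHandlers) do
    if url:sub(1, #prefix) == prefix then
      return handler(url, params)  -- local call, no URL encoding needed
    end
  end
  return httpGetJSON(url, params)  -- remote HTTP request
end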
Alternatively, a specialized Lua Wikidata API could be mapped to the web API behind the scenes to support both local and remote access to Wikidata. Care would be needed in the design of the interface to make this possible, and little existing code could be reused. New Wikidata features would only become available to clients after upgrading their Lua bindings.
I believe that we should try to keep our public APIs small and versatile, both for the benefit of users and of developers. Maybe there is a reason why URLs are too simple an interface for Wikidata, but I am not so sure about that yet.
Gabriel
On Wed, Feb 6, 2013 at 8:54 AM, Gabriel Wicke gwicke@wikimedia.org wrote:
Local HTTP requests have pretty low overhead (1-2ms), but api.php suffers from high start-up costs (35-40ms). This is more an issue with api.php and the PHP execution model than with HTTP though, and might be improved in the future.
I would vote against local HTTP requests, if we can avoid it. They can certainly be done safely if you design them correctly, but for example: you write a Lua template that calls an API that uses the same Lua template that calls the API... single-request DoS!
We should definitely pick the design that makes the most sense, but keeping new attack vectors to a minimum would be good.
* Chris Steipp wrote:
I would vote against local HTTP requests, if we can avoid it. They can certainly be done safely if you design them correctly, but for example: you write a Lua template that calls an API that uses the same Lua template that calls the API... single-request DoS!
(That's usually trivially addressed by, say, including a counter in some request header and refusing to serve requests where the recursion goes beyond some configured limit. And it is usually possible to do this at a very high level, so that should not be a major concern.)
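A minimal sketch of such a guard, with the header name and limit invented for illustration:

-- Hypothetical recursion guard: each forwarded request carries a depth
-- counter in a header; requests beyond a configured limit are refused.
local MAX_REQUEST_DEPTH = 5

local function nextRequestHeaders(incomingHeaders)
  local depth = tonumber(incomingHeaders["X-Request-Depth"] or "0")
  if depth >= MAX_REQUEST_DEPTH then
    error("request recursion limit exceeded")
  end
  return { ["X-Request-Depth"] = tostring(depth + 1) }
end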
On Wed, Feb 6, 2013 at 10:04 AM, Bjoern Hoehrmann derhoermi@gmx.net wrote:
(That's usually trivially addressed by, say, including a counter in some request header and refusing to serve requests where the recursion goes beyond some configured limit. And it is usually possible to do this at a very high level, so that should not be a major concern.)
I totally agree, but that was just the first attack that popped into my head. There are many more, I'm sure.
In general, it seems to me like there will be more attacks opened up by having Lua open network requests to the API than there would be by defining an internal API. But if that turns out to be the best way to handle it, then we'll just need to spend the time making sure it's done in a safe way.
On 02/06/2013 10:49 AM, Chris Steipp wrote:
In general, it seems to me like there will be more attacks opened up by having Lua open network requests to the API than there would be by defining an internal API.
Initially the use case will be providing access to the Wikidata API, not the MediaWiki API in general. A URL-style API can be opened up to provide access to some end points in the local MediaWiki API in the future if those are indeed safe, but I agree that we should be careful about this. Those local end points could also be handled as local method calls instead of actually performing an HTTP request.
But if that turns out to be the best way to handle it, then we'll just need to spend the time making sure it's done in a safe way.
Agreed. If we started out restricted to the Wikidata API only, the initial effort to verify safety should be quite manageable though. Additional URL-based APIs would need to be vetted before being whitelisted, but would not require a new Lua API.
Gabriel
Please don't forget about the hybrid approach -- the API supports FauxRequest, so an API call can be made as an internal call instead of a web request, without any JSON or startup overhead:
http://www.mediawiki.org/wiki/API:Calling_internally
On 02/06/2013 09:29 AM, Chris Steipp wrote:
I would vote against local HTTP requests, if we can avoid it. They can certainly be done safely if you design them correctly, but for example: you write a Lua template that calls an API that uses the same Lua template that calls the API... single-request DoS!
The supported API URLs can be restricted to straightforward JSON query APIs with a whitelist. If this whitelist initially only contains Wikidata, the effect would be the same as using a specialized Wikidata API.
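A sketch of what that restriction could look like (whitelist contents and names invented):

-- Hypothetical whitelist of allowed API base URLs; initially Wikidata only.
local allowedApis = {
  ["http://wikidata.org/api/query/"] = true,
}

local function checkWhitelist(baseUrl)
  if not allowedApis[baseUrl] then
    error("API not whitelisted: " .. baseUrl)
  end
end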
Gabriel
On Wed, Feb 6, 2013 at 11:54 AM, Gabriel Wicke gwicke@wikimedia.org wrote:
It should be possible to hide optimization details like local DB access vs. an actual HTTP request behind the same interface. A URL-based query interface can support local handlers for specific URL prefixes.
Or the interface can just look like a function call. Which seems a whole lot more straightforward than forcing people to encode a URL in Lua which is then passed to PHP and decoded to determine if it should be sent out as a remote request or looked up in the local DB.
On 02/06/2013 11:43 AM, Brad Jorsch wrote:
Or the interface can just look like a function call. Which seems a whole lot more straightforward than forcing people to encode a URL in Lua which is then passed to PHP and decoded to determine if it should be sent out as a remote request or looked up in the local DB.
I don't know much Lua, but a function call like this does not seem to be *that* hard to use:
-- Would fetch JSON from
-- http://wikidata.org/api/query/?param1=foo&param2=bar
-- if no local handler is defined and the base URL is in a whitelist
jsonObject = JSONRequest("http://wikidata.org/api/query/",
                         { param1="foo", param2="bar" })
No manual encoding, and URL encoding can be completely skipped if the prefix happens to match a registered local handler. An optional third parameter can pass in a table to specify the request method and other options.
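For instance, the call with the hypothetical options table (option names invented) might look like:

-- Hypothetical third argument with request options:
jsonObject = JSONRequest("http://wikidata.org/api/query/",
    { param1 = "foo", param2 = "bar" },
    { method = "POST", timeout = 2 })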
Gabriel
On Wed, Feb 6, 2013 at 4:04 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
Having a method with a very generic and suggestive name like "JSONRequest" that only works for one long magic value of the first parameter seems like a bad design to me.
Also note it's not actually taking JSON as input; it's taking a Lua table.
On 02/06/2013 01:30 PM, Brad Jorsch wrote:
Having a method with a very generic and suggestive name like "JSONRequest" that only works for one long magic value of the first parameter seems like a bad design to me.
If a whitelist is enforced, then giving this URL a symbolic name looks like a solvable problem. Alternatively, each configured API end point could be a generic object with a request method:
jsonData = JSONAPI.wikidata.request({ param1="foo", param2="bar" })
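Such endpoint objects could, for example, be generated from the whitelist configuration; a sketch, with names invented:

-- Hypothetical: build JSONAPI.<name>.request from configured endpoints.
local endpoints = {
  wikidata = "http://wikidata.org/api/query/",
}

JSONAPI = {}
for name, baseUrl in pairs(endpoints) do
  JSONAPI[name] = {
    request = function(params)
      return JSONRequest(baseUrl, params)  -- reuses the generic call
    end,
  }
end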
Also note it's not actually taking JSON as input; it's taking a Lua table.
Right. Can you describe why you would want to pass JSON into a method that requests JSON data from an API?
Gabriel
On Wed, Feb 6, 2013 at 4:48 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
jsonData = JSONAPI.wikidata.request({ param1="foo", param2="bar" })
At that point, we may as well make it "mw.wikidata.request{ param1="foo", param2="bar" }" (or mw.ext.wikidata.request, no one has replied to that suggestion yet).
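Either spelling could be a thin wrapper around the same generic machinery; hypothetically:

-- Hypothetical: expose the whitelisted Wikidata endpoint under mw.ext.
mw.ext = mw.ext or {}
mw.ext.wikidata = {
  request = function(params)
    return JSONRequest("http://wikidata.org/api/query/", params)
  end,
}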