(I'm going to use "local wiki" here for what Peter is calling "distant wiki", and "foreign wiki" for what he's calling "home wiki". This seems to better match the terminology we use for Commons.)
On Tue, May 25, 2010 at 7:41 AM, Peter17 peter017@gmail.com wrote:
> Yes. The shared database would be only for invalidating the cache when a template is edited. In my 3rd (preferred) solution, the templates are still fetched through the API. External wikis can transclude them and cache them for an arbitrary time, as ForeignAPIRepo does.
> Ok, I will keep this in mind. Parsing the template on the home wiki seems necessary because it can use other templates hosted on that wiki to render correctly... I think it is the most logical way to do it, isn't it?
I think parsing the template on the local wiki is better, because it gives you more flexibility. For instance, it can use local {{SITENAME}} and so forth. {{CONTENTLANG}} would be especially useful, if we're assuming that templates will be transcluded to many languages.
This doesn't mean that it has to use the local wiki's templates. There would be two ways to approach this:
1) Just don't use the local wiki's templates. Any template calls from the foreign wiki's template should go to the foreign wiki, not the local wiki. If this is being done over the API, then as an optimization, you could have the foreign wiki send back all templates that will be required, not just the actual template requested.
2) Use the local wiki's templates, and assume that the template on the foreign wiki is designed to be used remotely and will only call local templates when it's really desired. This gives even more flexibility if the foreign template is designed for this use, but it makes it harder to use templates that aren't designed for foreign use.
At first glance, it seems to me that (1) is the best -- do all parsing on the local wiki, but use templates from the foreign wiki. This will cause errors if the local wiki doesn't have the necessary extensions installed, like ParserFunctions, but it gives more flexibility overall.
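To make (1) a bit more concrete, here's a rough sketch of what the fetching side could look like. The function names are made up, and the API parameters (in particular generator=templates) are written from memory, so check them against a live api.php; a purpose-built API module on the foreign wiki could fold the two requests into a single response, which is the optimization mentioned above.

<?php
// Rough sketch only: fetch a template and the templates it transcludes
// from the foreign wiki's api.php, so the *local* parser can expand it
// with local magic words like {{SITENAME}} and {{CONTENTLANG}}.

function apiQuery( $apiBase, array $params ) {
	$params += array( 'action' => 'query', 'format' => 'json' );
	$json = file_get_contents( $apiBase . '?' . http_build_query( $params ) );
	return json_decode( $json, true );
}

function fetchForeignTemplate( $apiBase, $title ) {
	// 1) The requested template itself.
	$main = apiQuery( $apiBase, array(
		'titles' => $title,
		'prop'   => 'revisions',
		'rvprop' => 'content',
	) );

	// 2) Everything it transcludes, using 'templates' as a generator.
	$deps = apiQuery( $apiBase, array(
		'titles'    => $title,
		'generator' => 'templates',
		'prop'      => 'revisions',
		'rvprop'    => 'content',
	) );

	// Collect title => wikitext pairs for the local parser to use.
	$texts = array();
	foreach ( array( $main, $deps ) as $result ) {
		if ( !isset( $result['query']['pages'] ) ) {
			continue;
		}
		foreach ( $result['query']['pages'] as $page ) {
			if ( isset( $page['revisions'][0]['*'] ) ) {
				$texts[$page['title']] = $page['revisions'][0]['*'];
			}
		}
	}
	return $texts;
}

// e.g. fetchForeignTemplate( 'http://en.wikipedia.org/w/api.php',
//                            'Template:Infobox' );

The local parser would then be told to look titles up in that array before (or instead of) the local template namespace.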
Another issue here is performance. Parsing is one of the most expensive operations MediaWiki does. Nobody's going to care much if foreign sites request a bunch of templates that can be served out of Squid, but if there are lots of foreign sites that are requesting giant infoboxes and those have to be parsed by Wikimedia servers, Domas is going to come along with an axe pretty soon and everyone's sites will break. Better to head that off at the pass.
> Mmmh... sorry, I'm not really sure I understand... My suggestion is to use a shared database that would store the remote calls, not the content of the pages... In my mind, fetching the distant pages would be done through the API, not by directly accessing the distant database. External wikis will soon be able to access our images very easily with $wgUseInstantCommons, but that is still not direct database access...
What you're proposing is that Wikimedia servers do this on a cache miss:
1) An application server sends an HTTP request to a Squid with If-Modified-Since.
2) The Squid checks its cache, finds it's a miss, and passes the request to another Squid.
3) The other Squid checks its cache, finds it's a miss, and passes the request to a second application server.
4) The second application server loads up the MediaWiki API and sends a request to a database server.
5) The database server returns the result to the second application server.
6) The second application server returns the result to the Squids, which cache it and return it to the first application server.
7) The first application server caches the result in the database.
What I'm proposing is that they do this:
1) An application server sends a database query to a database server (maybe even using an already-open connection).
2) The database server returns the result.
Having Wikimedia servers send HTTP requests to each other instead of just doing database queries does not sound like a great idea to me. You're hitting several extra servers for no reason, including extra requests to an application server. On top of that, you're caching stuff in the database which is already *in* the database! FileRepo does this the Right Way, and you should definitely look at how that works. It uses polymorphism to use the database if possible, else the API.
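For illustration only, the same pattern applied to templates might look roughly like this. The class and method names are invented for the example (modelled on ForeignDBRepo/ForeignAPIRepo), not actual MediaWiki classes:

<?php
// Not real MediaWiki code -- just the shape of the FileRepo pattern:
// one abstract repo, two interchangeable backends, chosen by config.

abstract class ForeignTemplateRepo {
	// Return the wikitext of a template, or null if it doesn't exist.
	abstract public function getTemplateText( $title );
}

// For wikis on the same farm: talk to the other wiki's database
// directly, no HTTP involved.
class ForeignDBTemplateRepo extends ForeignTemplateRepo {
	private $wikiId;

	public function __construct( $wikiId ) {
		$this->wikiId = $wikiId;
	}

	public function getTemplateText( $title ) {
		// Assumes wfGetDB() accepts a wiki ID as its third argument,
		// as it does for cross-wiki access within a farm.
		$dbr = wfGetDB( DB_SLAVE, array(), $this->wikiId );
		// ...load the latest revision text of $title from $dbr...
		return null; // placeholder for the sketch
	}
}

// For third parties: go through api.php on the foreign wiki, the way
// ForeignAPIRepo does for images.
class ForeignAPITemplateRepo extends ForeignTemplateRepo {
	private $apiBase;

	public function __construct( $apiBase ) {
		$this->apiBase = $apiBase;
	}

	public function getTemplateText( $title ) {
		// ...HTTP request to $this->apiBase, as in the earlier sketch...
		return null; // placeholder for the sketch
	}
}

The calling code just asks its configured repo for a template and never needs to know which backend it got, which is exactly how Wikimedia servers would end up doing plain database queries while third parties fall back to HTTP.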
However, someone like Tim Starling should be consulted for a definitive performance assessment; don't rely on my word alone.
On Tue, May 25, 2010 at 9:11 AM, church.of.emacs.ml church.of.emacs.ml@googlemail.com wrote:
> Yes. When I think about this a bit more, it makes sense to parse on the home wiki, because otherwise (a) you couldn't include other remote templates or (b) you would need one API call per included template. Neither is feasible.
Just have it return all needed templates at once if you want to minimize round-trips.
> However, you'd have to make sure that each distant wiki uses only a fair share of the home wiki server's resources. E.g. set a limit on inclusions (that limit would have to be enforced on the home wiki's side) and disallow infinite loops (they're always fun).
This is probably not enough. I really doubt Wikimedia is going to let a sizable fraction of its CPU time go to foreign template use. Serving images or plain old wikitext from Squid cache is very cheap, so that's not a big deal, but large-scale parsing will be too much, I suspect. (But again, ask Tim about this.)
> Do I understand this correctly... you can either access a foreign repository via the API (if you're on another server) or directly via the database (if you're on the same wiki farm)? Very cool stuff.
Yes, that's how FileRepo works.
On Tue, May 25, 2010 at 9:22 AM, Platonides Platonides@gmail.com wrote:
> He can internally call the API of the other wiki via FauxRequest.
How will that interact with different configuration settings? I thought FauxRequest only handles requests to the current wiki.
> I'm afraid that it will produce the opposite. A third party downloads an XML dump for offline use, but it doesn't work because it needs a dozen templates from Meta (in the worst case, templates from a dozen other wikis).
My point is that ideally, you'd be able to copy-paste enwiki pages and then get the templates to work by configuring them to be fetched from enwiki. Even more ideally, you might want to fetch the enwiki templates as of the point in time your page was downloaded, in case the templates changed syntax (and also to allow indefinite caching).
But I guess that's much better handled by just using a proper export, and having the templates included in that, so never mind.
On Tue, May 25, 2010 at 9:30 AM, Platonides Platonides@gmail.com wrote:
> Infinite loops could only happen if both wikis can fetch from each other. A simple solution would be to pass along with the query who originally requested it. If the home wiki calls a different wiki, it would blame the one who asked for it (or maybe build up a wiki + template path).
An even simpler solution would be to only set up one wiki to allow this kind of foreign template request, the way Commons is set up now. But that might be limiting.
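And if the path-passing route is taken, the check itself is trivial. A sketch, with the path format and function names invented for the example:

<?php
// Sketch of the "pass the path along" idea: each hop appends its own
// wiki ID to a path the request carries, and a wiki refuses to serve
// a request whose path already contains it (a loop) or that has grown
// too long (a cheap depth limit).

function checkTransclusionPath( $path, $ownWikiId, $maxDepth = 5 ) {
	$hops = ( $path === '' ) ? array() : explode( '|', $path );

	if ( in_array( $ownWikiId, $hops ) ) {
		return false; // we're already in the chain: a loop
	}
	if ( count( $hops ) >= $maxDepth ) {
		return false; // chain too long
	}
	return true;
}

// Before forwarding a template request to yet another wiki, a wiki
// appends itself to the chain it received:
function extendTransclusionPath( $path, $ownWikiId ) {
	return ( $path === '' ) ? $ownWikiId : $path . '|' . $ownWikiId;
}

The depth limit also doubles as the per-request inclusion cap mentioned earlier.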